Postdoctoral Fellow

Improved probabilistic models of insertion/deletion for phylogenetic inference

PI(s): Benjamin D Redelings
Start Date: 1-Sep-2009
End Date: 31-Aug-2012
Keywords: phylogenetics, gene structure and function

NESCent Project:
Recent advances in statistical methodology allow phylogeny inference to make use of information in insertions and deletions, and to average over uncertainty in multiple sequence alignments. However, the accuracy of these methods could be improved by including some key features of the biological process that generates insertion and deletion mutations (indels). Two of these features are (I) spatial variation in the rate of insertion and deletion, and (II) elevated rates of variation in the number of tandem repeats (VNTR). Ignoring spatial variation in insertion/deletion rates can decrease phylogenetic accuracy because the evidential weight of a shared indel is determined by the local indel rate. Proteins have higher indel rates in regions that are exposed to solvent, and so indels in those regions should be down-weighted relative to indels that occur in the hydrophobic core. Additionally, when handling nearly-neutral sequences such as intergenic spacers, ignoring VNTR mutations can undermine phylogeny inference by giving shared changes of this type too much weight.
I propose to extend the software BAli-Phy, which jointly estimates alignments and phylogenies, to handle indel hotspots. I have developed a simple transducer-based model for multiple alignments that allows each column to fall into a “fast” or “slow” rate category and clusters fast columns together. I am developing MCMC transition kernels that Gibbs sample alignments and column labels simultaneously. Additionally, I plan to use importance sampling on posterior samples from BAli-Phy to correctly weight VNTR mutations. I will then estimate indel rate heterogeneity and the VNTR rate increase in several data sets.
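
The importance-sampling step can be illustrated with a minimal, generic sketch. It assumes that each posterior sample comes with two log densities: one under the model that produced the samples and one under a model that also accounts for VNTR mutations. The function names and the toy numbers are hypothetical and are not part of BAli-Phy; this shows only the standard self-normalized reweighting idea, not the project's actual implementation.

```python
import math

def importance_weights(log_target, log_proposal):
    """Self-normalized importance weights from per-sample log densities.

    log_target[i]   : log density of sample i under the model we want
                      (hypothetically, one that includes VNTR mutations).
    log_proposal[i] : log density of sample i under the model that
                      actually generated the samples.
    """
    log_w = [t - p for t, p in zip(log_target, log_proposal)]
    m = max(log_w)  # subtract the maximum to avoid overflow in exp()
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def reweighted_mean(values, weights):
    """Estimate an expectation under the target from weighted samples."""
    return sum(v * w for v, w in zip(values, weights))

def effective_sample_size(weights):
    """Kish effective sample size; low values signal unreliable weights."""
    return 1.0 / sum(w * w for w in weights)

# Toy, made-up numbers: three samples and an indicator for whether
# a given clade is present in each sampled tree.
log_p_target = [-10.2, -11.0, -9.7]
log_p_proposal = [-10.0, -10.5, -10.1]
clade_present = [1.0, 0.0, 1.0]

w = importance_weights(log_p_target, log_p_proposal)
print(reweighted_mean(clade_present, w), effective_sample_size(w))
```

If the effective sample size is much smaller than the number of samples, a few samples dominate the estimate, which suggests that the two models differ too much for reweighting alone to be reliable.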
