Graduate Fellow

Methods for reconstructing viral haplotypes and their phylogenies based on DNA sequencing data

PI(s): Raunaq Malhotra (Pennsylvania State University (University Park, PA))
Mary Poss (Pennsylvania State University (University Park, PA))
Start Date: 13-Jan-2014
End Date: 9-May-2014
Keywords: computational modeling, phylogenetics, software, recombination, genomics

A viral population consists of closely related genetic variants that arise through mutation and recombination among the viruses. Understanding their phylogenetic relatedness is important for studying viral evolution and is an active field of research. It requires knowing the genetic sequences of the viruses present in the population and their relative frequencies. Typically, viral sequences (or reads) obtained from next-generation sequencing (NGS) technologies are short fragments of the viral genome; these are compared to a known reference sequence of the virus to determine the genetic variants present in the population. However, high mutation rates in the population and sequencing errors in NGS reads mean that many fragments will not map to the reference genome. The aim of this project is to develop new methods for viral haplotype reconstruction and phylogenetic analysis based on counts of k-mers (short runs of k consecutive bases within reads) obtained from a viral population. Such count-based methods estimate viral haplotypes and their phylogeny directly from k-mer counts. These methods will help in understanding the phylogenetic relationships among viral haplotypes in a population, and thus in understanding viral evolution and informing drug design.
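
As a minimal illustration of the k-mer counts mentioned above, the following Python sketch tallies every run of k consecutive bases across a set of reads. It is not the project's method, only the basic counting step that such methods take as input. The function name `count_kmers`, the example reads, and the choice to skip k-mers containing bases other than A, C, G, or T are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads.

    Hypothetical helper for illustration; k-mers containing characters
    other than A, C, G, T (for example N) are skipped by assumption.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts

# Made-up example reads, not real sequencing data.
reads = ["ACGTAC", "CGTACG", "ACGNAC"]
kmer_counts = count_kmers(reads, k=3)
for kmer, n in sorted(kmer_counts.items()):
    print(kmer, n)
```

Because the counts are taken directly from the reads, this step does not require aligning the reads to a reference sequence.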