Abstract
In the wake of new sequencing and genotyping technologies, whole-genome studies are now being undertaken to understand the genetic basis of phenotypes. Many of the principles underlying the measurement of genotype-phenotype relationships, as well as the computation of related population genetic parameters, are relatively well understood. However, the upcoming technologies dramatically change the scale and scope of these studies, which already encompass tens of thousands of individuals across the entire genome. The analysis of these data requires novel algorithmic and statistical techniques. This project focuses on a subset of the problems that could arise in a typical whole-genome association study. These include:
- Phasing of genotypes into haplotypes using overlapping sequence data, and the application of this algorithm to phasing individual human sequences. The availability of high-coverage, long sequence data will make this approach the method of choice for phasing in the near future.
- Fast filtering for pairs of loci that interact to influence a phenotype, and its application to multiple-locus testing of common disease phenotypes. The proposed work reduces the computational bottleneck in multiple-locus testing.
- Detection of regions under balancing selection. Available tests focus on detecting regions under positive selection. The proposed research looks for evidence of balancing selection in the genome, with specific attention to genes associated with bipolar disorder.
- Reconstruction of regulatory pathways using associations between genetic variation and gene expression.
All software from this research is freely available as source code or as web tools for academic, research, and non-commercial purposes, in accordance with University policy.
Students
Publications
- Christos Kozanitis, Chris Saunders, Semyon Kruglyak, Vineet Bafna, and George Varghese.
Compressing genomic sequence fragments using SlimGene.
In Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2010. (to appear).
- Dumitru Brinza, Glenn Tesler, and Vineet Bafna.
Rapid detection of gene-gene interactions in genome-wide association studies. Submitted, 2010.
- Gaurav Bhatia, Vikas Bansal, Olivier Harismendy, Nicholas J. Schork, Eric Topol, Kelly Frazer, and Vineet Bafna.
A covering method for detecting genetic associations between rare variants and common phenotypes. (revision submitted).
- Banu Dost, Chunlei Wu, Andrew Su, and Vineet Bafna.
TCLUST: A fast algorithm for clustering large, sparse, gene expression data.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010. (to appear).
- Vikas Bansal and Vineet Bafna.
HapCUT: an efficient and accurate algorithm for the haplotype assembly problem.
Bioinformatics. 2008 Aug 15;24(16):i153-9.
- Vikas Bansal, Aaron L. Halpern, Nelson Axelrod, and Vineet Bafna.
An MCMC algorithm for haplotype assembly from whole-genome sequence data.
Genome Res. August 2008 18: 1336-1346.
Courses