Improved Pairwise Alignment of Genomic DNA

Open Access
- Author:
- Harris, Robert S.
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 24, 2007
- Committee Members:
- Webb Colby Miller, Committee Chair/Co-Chair
Padma Raghavan, Committee Member
Francesca Chiaromonte, Committee Member
Raj Acharya, Committee Member
- Keywords:
- comparative genomics
homology search
BLAST
local alignment
sequence alignment
- Abstract:
- Advances in DNA sequencing technology have fueled a rapid increase in the number of sequenced vertebrate genomes, and we anticipate an explosion in the number of genomes sequenced in the near future. Detecting similarities between genomes is a valuable technique for discovering functional elements, and sequence alignment is the primary tool for detecting those similarities. The quality of alignments is affected by several user-specified control parameters, which are so poorly understood that most users simply use the default settings. We seek to change that by having the program automatically infer appropriate parameter choices from statistics derived from the sequences themselves. We introduce a program, INFERZ, which addresses part of the inference problem by inferring substitution and gap scores according to a mathematically sound model. Further, we explore the usefulness of iterating inferred scores to convergence. We test this process on both simulated and actual genomic data and show that iteration generally converges, but that the converged scores are not a consistent improvement. INFERZ has a synergistic relationship with LASTZ, our improved drop-in replacement for the widely used alignment program BLASTZ: INFERZ makes repeated calls to LASTZ to test score sets, and LASTZ gives the user an option to have INFERZ decide which scoring parameters to use. Compared to BLASTZ, LASTZ adds a richer set of seeding strategy choices, supports alignment to probabilistic sequences, and reduces memory requirements. Additionally, disciplined software techniques make it a better platform for continued experimentation.
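
The abstract does not say which model INFERZ uses. As an illustration only, the sketch below shows one common statistically grounded way to derive substitution scores from aligned sequences: log-odds scoring. The function name `infer_log_odds_scores`, its `scale` parameter, and the input format are hypothetical and are not taken from INFERZ or LASTZ.

```python
import math
from collections import Counter


def infer_log_odds_scores(aligned_pairs, scale=1.0):
    """Estimate substitution scores as log-odds ratios.

    aligned_pairs: iterable of (a, b) nucleotide pairs taken from
    gap-free alignment columns (hypothetical input format).
    Returns a dict mapping each observed (a, b) pair to
    scale * log2(p_ab / (q_a * q_b)).
    """
    pair_counts = Counter(aligned_pairs)
    total = sum(pair_counts.values())

    # Background nucleotide counts, tallied separately for each sequence.
    counts_first = Counter()
    counts_second = Counter()
    for (a, b), n in pair_counts.items():
        counts_first[a] += n
        counts_second[b] += n

    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total                 # observed frequency of the aligned pair
        q_a = counts_first[a] / total    # background frequency in sequence 1
        q_b = counts_second[b] / total   # background frequency in sequence 2
        scores[(a, b)] = scale * math.log2(p_ab / (q_a * q_b))
    return scores


# Toy data (invented for illustration): matches dominate mismatches.
pairs = ([("A", "A")] * 40 + [("C", "C")] * 40
         + [("A", "C")] * 10 + [("C", "A")] * 10)
scores = infer_log_odds_scores(pairs)
```

With this toy input, matched pairs receive positive scores and mismatched pairs receive negative ones. An iterated scheme of the kind the abstract describes would presumably realign with the new scores, collect pairs again, and repeat until the scores stop changing. That loop needs an aligner and is omitted here.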