Improved Pairwise Alignment of Genomic DNA

Open Access
Harris, Robert S.
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
September 24, 2007
Committee Members:
  • Webb Colby Miller, Committee Chair
  • Padma Raghavan, Committee Member
  • Francesca Chiaromonte, Committee Member
  • Raj Acharya, Committee Member
Keywords:
  • comparative genomics
  • homology search
  • local alignment
  • sequence alignment
Advances in DNA sequencing technology have fueled a rapid increase in the number of sequenced vertebrate genomes, and we anticipate an explosion in the number of genomes sequenced in the near future. Detecting similarities between genomes is a valuable technique for discovering functional elements, and sequence alignment is the primary tool for detecting those similarities. The quality of alignments is affected by several user-specified control parameters. These parameters are so poorly understood that most users simply use the default settings. We seek to change that by having the program infer appropriate parameter choices automatically from statistics derived from the sequences. We introduce a program, INFERZ, which addresses part of the inference problem by inferring substitution and gap scores according to a mathematically sound model. Further, we explore the usefulness of iterating inferred scores to convergence. We test this process on both simulated and actual genomic data and show that iteration converges in general, but we find that the converged scores are not a consistent improvement. INFERZ has a synergistic relationship with LASTZ, our improved drop-in replacement for the widely used alignment program BLASTZ. INFERZ makes repeated calls to LASTZ to test score sets, and LASTZ gives the user an option to have INFERZ decide which scoring parameters to use. Compared to BLASTZ, LASTZ adds a richer set of seeding strategy choices, supports alignment to probabilistic sequences, and reduces memory requirements. Additionally, disciplined software techniques make it a better platform for continued experimentation.
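To make the idea of inferring scores and iterating them to convergence concrete, the following is a minimal Python sketch. It is not taken from INFERZ or LASTZ. It assumes a common log-odds formulation, s(a, b) = log(p_ab / (q_a q_b)), estimated from the gap-free columns of existing alignments, and it leaves out gap scores entirely. The function names, the pseudocount, and the `align` callable (a stand-in for whatever aligner produces alignments under a given score set) are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def log_odds_scores(aligned_pairs, pseudocount=1.0, scale=1.0):
    """Estimate s(a, b) = scale * log(p_ab / (q_a * q_b)) from aligned columns.

    aligned_pairs is an iterable of (top, bottom) strings of equal length;
    columns containing a gap or an ambiguous base are skipped.
    """
    pair_counts = Counter()
    for top, bottom in aligned_pairs:
        for a, b in zip(top, bottom):
            if a in ALPHABET and b in ALPHABET:
                pair_counts[(a, b)] += 1

    # Pseudocounts keep every probability positive so the log is defined.
    total = sum(pair_counts.values()) + pseudocount * len(ALPHABET) ** 2
    p = {(a, b): (pair_counts[(a, b)] + pseudocount) / total
         for a in ALPHABET for b in ALPHABET}

    # Background frequencies are the marginals of the pair distribution.
    q_top = {a: sum(p[(a, b)] for b in ALPHABET) for a in ALPHABET}
    q_bot = {b: sum(p[(a, b)] for a in ALPHABET) for b in ALPHABET}

    return {(a, b): scale * math.log(p[(a, b)] / (q_top[a] * q_bot[b]))
            for a in ALPHABET for b in ALPHABET}

def iterate_scores(align, scores, max_rounds=10, tol=1e-3):
    """Re-align with the current scores and re-estimate until they stabilize.

    align is a hypothetical callable: scores -> iterable of (top, bottom).
    Stops when no score changes by more than tol, or after max_rounds.
    """
    for _ in range(max_rounds):
        new_scores = log_odds_scores(align(scores))
        delta = max(abs(new_scores[k] - scores[k]) for k in scores)
        scores = new_scores
        if delta < tol:
            break
    return scores
```

In this sketch, pairs that occur more often than chance would predict receive positive scores and rarer pairs receive negative ones. The loop in `iterate_scores` mirrors only the general shape of the iteration described in the abstract. As the abstract notes, reaching convergence does not guarantee that the converged scores are better.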