PHYLOGENETIC CONSERVATION OF CIS-REGULATORY REGIONS USING SEQUENCE ALIGNABILITY AND CLADISTIC MOTIFS

Open Access
- Author:
- King, David
- Graduate Program:
- Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 09, 2009
- Committee Members:
- Ross Cameron Hardison, Dissertation Advisor/Co-Advisor
Ross Cameron Hardison, Committee Member
Webb Colby Miller, Committee Member
Benjamin Franklin Pugh, Committee Chair/Co-Chair
Kateryna Dmytrivna Makova, Committee Member
Francesca Chiaromonte, Committee Member
- Keywords:
- lineage-specific regulatory elements
cis-regulatory modules
- Abstract:
- A growing body of research shows that conservation of regulatory regions across wide phylogenetic spans (e.g., pan-vertebrate) is the exception rather than the rule. Here we study the conservation of regulatory regions without predisposition toward perfect alignment or deep conservation, allowing every observation to be interpreted a posteriori as conserved, lineage-specific, misaligned, or ambiguous. To do this, we use the multi-species alignment as the measurement and define the conservation of any given region as its “alignability” to various species. We also define a conservation-agnostic data type, called a cladistic motif, which is produced by scanning each row of an alignment as a single sequence and then organizing the matches by their placement within the alignment. Because no a priori assumptions are made about the strength of the alignment, cladistic motifs can describe simple as well as irregular forms of conservation. We explore cladistic motifs as components of common regulatory regions, i.e., those that fall outside the conventional classification of a multi-species constrained sequence and that are the product of a variety of evolutionary source material for binding sites, such as sites that arose by adaptation in human accelerated regions or by motif turnover. These cases emphasize the spontaneity of cis-regulatory evolution and may help explain why functional regulatory regions are a lesser-conserved fraction of the genome.
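
The two ideas in the abstract, scanning each alignment row as a single sequence and measuring "alignability", can be illustrated with a minimal sketch. This is not the dissertation's implementation. The function names, the toy three-species alignment, the use of a plain string pattern for the motif, the "-" gap character, and the definition of alignability as the fraction of reference positions aligned to a non-gap character are all assumptions made for illustration.

```python
import re
from collections import defaultdict

GAP = "-"  # assumed gap character

def ungap_with_columns(row):
    """Return the ungapped sequence and the alignment column of each residue."""
    columns = [i for i, ch in enumerate(row) if ch != GAP]
    sequence = "".join(row[i] for i in columns)
    return sequence, columns

def cladistic_motif(alignment, pattern):
    """Scan each row as one ungapped sequence, then group hits by alignment column.

    Returns {start column: [species with a hit starting at that column]}.
    """
    regex = re.compile("(?=(%s))" % pattern)  # lookahead also finds overlapping hits
    hits = defaultdict(list)
    for species, row in alignment.items():
        sequence, columns = ungap_with_columns(row.upper())
        for match in regex.finditer(sequence):
            hits[columns[match.start(1)]].append(species)
    return dict(hits)

def alignability(alignment, reference, species):
    """Fraction of reference positions aligned to a non-gap character in `species`
    (an assumed, simplified definition)."""
    ref_row, other_row = alignment[reference], alignment[species]
    ref_cols = [i for i, ch in enumerate(ref_row) if ch != GAP]
    if not ref_cols:
        return 0.0
    return sum(1 for i in ref_cols if other_row[i] != GAP) / len(ref_cols)

# Toy alignment, invented for illustration only.
alignment = {
    "human":   "AGATAAG--CTT",
    "mouse":   "AGATAAGGACTT",
    "chicken": "AC-TAAGGATAT",
}

motif_hits = cladistic_motif(alignment, "GATA")
print(motif_hits)
print(alignability(alignment, "human", "chicken"))
```

In this toy case the human and mouse hits share an alignment column while the chicken hit starts at a different column. The sketch records such hits without requiring them to align, leaving the choice between conservation, a lineage-specific site, and misalignment to later interpretation.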