Distinctive Genomic Features of Erythroid Cis-Regulatory Modules
Open Access
- Author:
- Zhang, Ying
- Graduate Program:
- Genetics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 16, 2009
- Committee Members:
- Ross Cameron Hardison, Dissertation Advisor/Co-Advisor
Robert Paulson, Committee Chair/Co-Chair
Ross Cameron Hardison, Committee Member
Douglas Cavener, Committee Member
Francesca Chiaromonte, Committee Member
Kateryna Dmytrivna Makova, Committee Member
- Keywords:
- Cis-Regulatory Modules
GATA1
ESPERR
Word Enumeration
ChIP
- Abstract:
- Understanding the regulation of gene expression is a major challenge in biology. My dissertation aims to improve our ability to reliably identify cis-regulatory modules (CRMs) in vertebrates. With the growing number of completed and high-quality draft sequences of vertebrate genomes, comparative genomics and other bioinformatic methods have become first-line approaches for predicting and analyzing CRMs. Recently, our lab reported two large-scale investigations of erythroid cis-regulatory modules. One used a systematic approach to predict and test erythroid CRMs (RP-based computational predictions followed by reporter-gene assays); the other used chromatin immunoprecipitation coupled with microarrays to identify sites occupied by GATA1 in vivo. These studies successfully identified 42 functional CRMs and 63 sites occupied by GATA1 in vivo. To improve the predictive power of the computational models and to investigate how well motifs predict occupancy, both conservation-based (the ESPERR algorithm) and motif-based (direct enumeration of words) bioinformatic methods were applied to the current datasets in an attempt to decode the genomic signals associated with active DNA fragments. ESPERR can distinguish known erythroid CRMs from neutral DNA, but it reached its limits when used to discriminate GATA1-occupied sites from unoccupied ones. Direct enumeration of words can identify motifs that are predictive of occupancy given the presence of WGATAR, but additional signals are needed to correctly identify the real binding sites among dozens of candidates. Repeated cycles of computational prediction and biological testing, with new knowledge incorporated into each model, should refine our ability to correctly identify cis-regulatory modules.
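As a rough illustration of the motif-based idea mentioned in the abstract, the following Python sketch scans a sequence for WGATAR on both strands and enumerates fixed-length words. It is not the dissertation's implementation: the function names (`find_wgatar`, `count_words`, `word_presence_fraction`) and the choice to report the fraction of sequences containing each word are assumptions made for illustration only. The motif is expanded using the standard IUPAC codes (W = A or T, R = A or G).

```python
import re
from collections import Counter

# IUPAC expansion of WGATAR: W = A or T, R = A or G.
# Lookahead groups let overlapping occurrences be reported.
WGATAR_FWD = re.compile(r"(?=([AT]GATA[AG]))")
# Reverse complement of WGATAR is YTATCW (Y = C or T).
WGATAR_REV = re.compile(r"(?=([CT]TATC[AT]))")


def find_wgatar(seq):
    """Return sorted (start, strand) pairs for WGATAR matches in seq."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in WGATAR_FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in WGATAR_REV.finditer(seq)]
    return sorted(hits)


def count_words(seq, k):
    """Count every length-k word in seq, skipping words with non-ACGT letters."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))


def word_presence_fraction(seqs, k):
    """Hypothetical helper: fraction of sequences containing each k-mer.

    Comparing these fractions between two sets of sequences (for example,
    occupied versus unoccupied sites) is one simple way to look for words
    that differ between them; it is an assumption, not the source's method.
    """
    present = Counter()
    for s in seqs:
        present.update(set(count_words(s, k)))
    n = len(seqs)
    return {w: c / n for w, c in present.items()}
```

Each returned position is a 0-based start coordinate on the input sequence, with the strand indicating whether WGATAR or its reverse complement matched there.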