Genome-Wide Modeling of Transcription Preinitiation Complex Disassembly Mechanisms Using Chromatin Immunoprecipitation Data

Open Access
- Author:
- Samorodnitsky, Eric
- Graduate Program:
- Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- December 17, 2010
- Committee Members:
- Benjamin Franklin Pugh, Dissertation Advisor/Co-Advisor
- Benjamin Franklin Pugh, Committee Chair/Co-Chair
- Istvan Albert, Committee Member
- Stephen Wade Schaeffer, Committee Member
- Claude Walker Depamphilis, Committee Member
- Keywords:
- workflow
- modeling
- chemical kinetics
- Chromatin Immunoprecipitation
- preinitiation complex assembly and disassembly
- binding sites
- Abstract:
- Eukaryotic genes are regulated by hundreds of proteins that assemble into a preinitiation complex (PIC), which initiates transcription. How the PIC assembles and disassembles in vivo is largely unknown. To address this issue, in Chapter 2, I wrote the computational tool PathCom (short for PATHway COMpatibility). PathCom takes as input an assumed PIC assembly pathway and genome-wide occupancy data. Assuming that occupancy can be treated as binding duration and that the assembly/disassembly steps are explicitly defined, PathCom outputs plausible PIC disassembly pathways. I illustrate this process by modeling ChIP-chip data for the general transcription factors (GTFs) TBP, TFIIB, TFIIE, TFIIF, TFIIH, and RNA polymerase II, as well as sequence-specific regulators and chromatin remodelers, of the budding yeast Saccharomyces cerevisiae. Given the assumptions inherent in the system, my modeling found the occupancy of TBP, sequence-specific regulators, and chromatin remodelers to be transient compared with that of the other GTFs. I also used PathCom to model GTF disassembly under heat shock conditions and found that TBP occupancy remains transient under heat shock. PathCom can model any assembly/disassembly process, provided that all species together form a single complex. The reliability of PathCom's output depends on the accuracy of the assumptions inherent in the modeling and on the quality of the input occupancy data. To improve ChIP data quality, ChIP-chip is being replaced by ChIP-seq, which offers higher resolution and lower background for identifying transcription factor binding sites. Therefore, in Chapter 3, I wrote a four-step algorithm to infer true transcription factor binding sites from ChIP-seq data. The algorithm accounts for peak representation on both DNA strands, peak reproducibility, and the presence of a motif. It was written for the genomic toolkit Galaxy.
Where possible, steps in the algorithm were implemented with preexisting Galaxy tools; the remaining steps were written in Python. The algorithm is freely available at http://dancluster.g2.bx.psu.edu.
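The abstract describes PathCom (Chapter 2) only in terms of its inputs and outputs. The Python sketch below shows one way a disassembly order could be checked against occupancy data. It rests on assumptions that are ours, not the dissertation's: assembly is strictly sequential (one factor added per step), each factor stays bound over a single contiguous interval, and occupancy is proportional to binding duration. Under those assumptions, a factor whose bound interval lies inside another factor's interval cannot have the higher occupancy. The function names `bound_interval`, `is_compatible`, and `plausible_disassembly` are hypothetical, and the nesting rule is only a necessary condition, not PathCom's actual compatibility criterion.

```python
from itertools import permutations


def bound_interval(factor, assembly, disassembly):
    """Return (start, end) step indices over which `factor` stays bound.

    Hypothetical model: steps 0..n-1 add factors in `assembly` order and
    steps n..2n-1 remove them in `disassembly` order, so every factor is
    bound over one contiguous span of steps.
    """
    n = len(assembly)
    return assembly.index(factor), n + disassembly.index(factor)


def is_compatible(assembly, disassembly, occupancy):
    """Necessary-condition check (an assumption, not PathCom's actual rule).

    If occupancy is proportional to binding duration, a factor bound over an
    interval nested inside another factor's interval cannot have the higher
    occupancy of the two.
    """
    spans = {f: bound_interval(f, assembly, disassembly) for f in assembly}
    for inner in assembly:
        for outer in assembly:
            if inner == outer:
                continue
            (i_start, i_end), (o_start, o_end) = spans[inner], spans[outer]
            nested = o_start <= i_start and i_end <= o_end
            if nested and occupancy[inner] > occupancy[outer]:
                return False
    return True


def plausible_disassembly(assembly, occupancy):
    """Brute-force every removal order and keep those passing the check."""
    return [order for order in permutations(assembly)
            if is_compatible(assembly, list(order), occupancy)]


if __name__ == "__main__":
    # Toy occupancy values in arbitrary units; NOT data from the dissertation.
    assembly = ["TBP", "TFIIB", "PolII"]
    occupancy = {"TBP": 1.0, "TFIIB": 2.0, "PolII": 3.0}
    print(plausible_disassembly(assembly, occupancy))
    # [('TBP', 'TFIIB', 'PolII')]
```

With these toy numbers, the lowest-occupancy factor that assembled first must also leave first, so only one order survives. Brute-force enumeration over all n! removal orders stays cheap for a handful of factors (720 orders for six), which is why no search pruning is attempted here.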
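The abstract names three criteria used by the Chapter 3 algorithm (representation on both strands, reproducibility, and motif presence) but does not spell out its four steps. The plain-Python sketch below illustrates only those three criteria; it is not the Galaxy workflow, and the unnamed fourth step is not reproduced. Assumptions, all ours: peaks are integer positions per strand, a forward-strand peak pairs with the nearest reverse-strand peak at or downstream of it, sites are called from the first replicate and checked against the others, the motif is a regular expression searched on both strands, and the distance parameters `max_gap`, `tol`, and `half_width` are arbitrary placeholders. Every function name is hypothetical.

```python
import re
from bisect import bisect_left

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pair_strands(fwd_peaks, rev_peaks, max_gap):
    """Hypothetical strand-pairing step: match each forward-strand peak with
    the nearest reverse-strand peak at or downstream of it (within max_gap bp)
    and return the pair midpoints as candidate binding sites."""
    rev = sorted(rev_peaks)
    sites = []
    for f in sorted(fwd_peaks):
        i = bisect_left(rev, f)  # first reverse peak at or after f
        if i < len(rev) and rev[i] - f <= max_gap:
            sites.append((f + rev[i]) // 2)
    return sites


def is_reproduced(site, other_sites, tol):
    """True if the sorted list `other_sites` has a site within tol bp."""
    i = bisect_left(other_sites, site - tol)
    return i < len(other_sites) and other_sites[i] <= site + tol


def has_motif(seq, site, half_width, motif):
    """Search a window centred on `site` for the regex `motif` on both strands."""
    window = seq[max(0, site - half_width): site + half_width]
    return bool(re.search(motif, window) or re.search(motif, revcomp(window)))


def call_sites(replicates, seq, motif, max_gap=60, tol=20, half_width=25):
    """Keep first-replicate sites that are strand-paired, reproduced in every
    other replicate, and motif-containing. Defaults are arbitrary placeholders."""
    per_rep = [sorted(pair_strands(f, r, max_gap)) for f, r in replicates]
    first, others = per_rep[0], per_rep[1:]
    return [s for s in first
            if all(is_reproduced(s, o, tol) for o in others)
            and has_motif(seq, s, half_width, motif)]


if __name__ == "__main__":
    # Made-up 300-bp sequence with a toy motif at positions 140 and 276.
    seq = "C" * 140 + "TATAAA" + "C" * 130 + "TATAAA" + "C" * 18
    rep1 = ([120, 200, 260], [170, 240, 300])  # (forward peaks, reverse peaks)
    rep2 = ([118, 205], [166, 235])
    print(call_sites([rep1, rep2], seq, motif="TATA[AT]A"))  # [145]
```

In this toy run, the site at 220 is reproduced but lacks the motif, and the site at 280 has the motif but appears in only one replicate, so each filter removes a different candidate. Anchoring on the first replicate keeps the sketch short; treating replicates symmetrically would be a reasonable alternative.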