CONTRIBUTION OF TRANSPOSABLE ELEMENTS TO GENOMIC NOVELTY: A COMPUTATIONAL APPROACH

Open Access
- Author:
- Gotea, Valer
- Graduate Program:
- Biology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 11, 2007
- Committee Members:
- Wojciech Makalowski, Committee Chair/Co-Chair
- Stephen Wade Schaeffer, Committee Member
- Kateryna Dmytrivna Makova, Committee Member
- Piotr Berman, Committee Member
- Keywords:
- transposable elements
- exaptation
- proteome
- Alu elements
- transcription factor binding sites
- ScrapYard database
- Abstract:
- Transposable elements (TEs) are DNA entities that can move and multiply within genomes, and can thus influence genome function and evolution. Their impact varies greatly among species, yet they have made an important contribution to eukaryotic genomes, including those of vertebrate and mammalian species. Almost half of the human genome itself originated from various TEs, only a few of which are still active. TEs can often disrupt the function of genes and generate disease phenotypes, but over long evolutionary times they can also offer evolutionary advantages to their host genome. For example, they can serve as recombination hotspots, influence gene regulation, or even contribute to the sequence of protein coding genes. Here I used multiple computational tools to investigate a few of these aspects in more detail. Starting with a set of well characterized proteins to complement inferences made at the level of transcripts, I investigated the contribution of TEs to protein coding sequences. I show that old TEs can indeed be found in functional proteins, albeit to a lesser extent than previously thought. I also investigated the protein coding potential of Alu elements, which are found in the alternatively spliced forms of many genes. No strong evidence could be found to support such a coding role, but novel ways in which Alu elements can contribute to the human proteome are proposed. Complementing their protein coding potential, TEs were also shown to have considerable potential to influence the cis regulation of genes owing to their sequence composition. This appears to be one important factor that determines their fate after insertion at new loci. Finally, a database called the ScrapYard database (SYDB) was created to provide an efficient tool for addressing various questions related to the presence of TE sequences in vertebrate transcripts.
Now that genomic and genetic data are generated at ever-increasing rates, important advancements in biological knowledge can only be made with the help of computational tools. This work provides an example of how computational biology can advance biological knowledge by exploring current hypotheses and testing them with available data while also proposing new ones that can be further tested experimentally.