ASSEMBLY AND ANALYSIS OF GREAT APE Y CHROMOSOMES
Open Access
- Author:
- Rangavittal, Samarth
- Graduate Program:
- Bioinformatics and Genomics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- August 01, 2018
- Committee Members:
- Kateryna Dmytrivna Makova, Dissertation Advisor/Co-Advisor
Kateryna Dmytrivna Makova, Committee Chair/Co-Chair
Raquel Assis, Committee Member
Wansheng Liu, Committee Member
Paul Medvedev, Outside Member
- Keywords:
- genome assembly
great ape
Y chromosome
k-mers
- Abstract:
- Significance: Elucidating the structure and content of the Y chromosome in primates is important for understanding human sex determination, male fertility, and ancestry. Among the great apes, only the human and chimpanzee Y chromosomes have been sequenced and assembled, both using laborious and expensive techniques. Further, fewer than one in five of the available mammalian reference genomes include a sequenced Y chromosome, and those that do vary in their percentage of completion. This paucity of sequenced Y chromosomes can be overcome by leveraging fast and cost-effective methods based on Next-Generation Sequencing. Draft Y chromosomes generated by such a strategy may then be analyzed to further understand variation between species. In this dissertation, I develop and apply novel computational methods for assembly of great ape Y chromosomes (Chapters 2 and 3) and discuss the use of these methods to generate a de novo assembly of the gorilla Y chromosome (Chapter 4). Specifically, this dissertation addresses the following questions:
1. How can we assemble a Y chromosome from enriched sequencing datasets? I developed a method called RecoverY that leverages k-mer abundance to classify sequencing reads as Y-linked. As a consequence of the enrichment process, k-mers originating from Y-specific reads occur at a higher abundance than k-mers originating from non-Y reads. Reads composed mostly of high-abundance k-mers are thus selected as Y-specific and are subsequently assembled using existing short-read assemblers.
2. How do we isolate Y-linked contigs from non-enriched sequencing datasets? I developed a method called DiscoverY to isolate Y-contigs from male whole-genome assemblies of non-enriched DNA. This method uses the depth of coverage from male reads and the proportion of k-mers shared with a female reference to inform a machine learning model that classifies contigs as Y-linked. Contigs that have a low depth of coverage from male reads and share a low proportion of k-mers with the female reference are shortlisted as Y-contigs, while the remaining contigs are filtered out as likely autosomal or X-chromosomal. This method has the advantage of being sequencing-platform agnostic, and it does not require laborious, specialized enrichment techniques.
3. What can we say about the Y chromosome of gorilla compared to other great apes? Applying the RecoverY method for enriched datasets, I assembled a draft gorilla Y chromosome using Illumina short reads. By further improving this assembly with a hybrid approach that adds PacBio long reads, my colleagues and I were able to demonstrate that the gorilla Y is more similar to the human Y than to the chimpanzee Y in multiple features, including alignable sequence length, gene presence, gene copy number, interspersed repetitive element content, and number of palindromes.
Broadly, results from this dissertation are relevant to understanding the structure and content of great ape Y chromosomes, thus enabling a comparative genomic analysis across these Y chromosomes. Further, the freely available methods developed herein can aid the assembly of a heteromorphic sex chromosome in any species.
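The two k-mer ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the RecoverY or DiscoverY implementation: the function names, the parameters `min_abundance` and `min_fraction`, and the toy sequences are all hypothetical, reverse complements are ignored, and the machine learning step of DiscoverY is omitted (only the shared-k-mer feature is computed).

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def select_enriched_reads(reads, k, min_abundance, min_fraction):
    """Keep reads made up mostly of high-abundance k-mers.

    min_abundance and min_fraction are hypothetical thresholds chosen
    for illustration; they are not values from the dissertation.
    """
    counts = count_kmers(reads, k)
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: no evidence either way
        high = sum(1 for km in read_kmers if counts[km] >= min_abundance)
        if high / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


def shared_kmer_fraction(contig, reference_kmers, k):
    """Fraction of a contig's k-mers that also occur in a reference k-mer set."""
    contig_kmers = list(kmers(contig, k))
    if not contig_kmers:
        return 0.0
    shared = sum(1 for km in contig_kmers if km in reference_kmers)
    return shared / len(contig_kmers)


# Toy data: one sequence over-represented (as if enriched), one seen once.
toy_reads = ["ACGTACGT"] * 5 + ["TTGACCAG"]
enriched = select_enriched_reads(toy_reads, k=4, min_abundance=3, min_fraction=0.8)

# Toy "female reference" k-mer set and a contig that shares little with it.
toy_female_kmers = set(kmers("TTGACCAG", 4))
fraction = shared_kmer_fraction("ACGTTTGA", toy_female_kmers, 4)
```

In this sketch a low `fraction` would count as one piece of evidence that a contig is Y-linked; in the method described above that feature is combined with depth of coverage from male reads before classification.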