Unveiling Structural Variation: Detection, Characterization, Phasing, and Its Role In 3D Genome Reorganization in Cancer

Open Access
- Author:
- Song, Fan
- Graduate Program:
- Bioinformatics and Genomics (PhD)
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- May 25, 2021
- Committee Members:
- Zhonghua Gao, Major Field Member
Sinisa Dovat, Outside Unit & Field Member
George Perry, Program Head/Chair
Shaun Mahony, Chair & Dissertation Advisor
Ross Hardison, Major Field Member
Feng Yue, Special Member
- Keywords:
- Structural Variation
Chromatin Structure
Acute Myeloid Leukemia
Epigenomics
Hi-C
DNA methylation
Histone modification
- Abstract:
- Structural variations (SVs) are a hallmark of the cancer genome, with an estimated occurrence of ~20,000 per genome. SVs cause many kinds of gene misregulation, such as changes in gene dosage, expression of proteins with altered functions, ectopic gene expression, and gene inactivation, potentially leading to pathogenic phenotypes or even cancer. Despite these important implications, detecting SVs remains challenging. Traditional techniques, such as cytogenetics, microarrays, and next-generation sequencing, either have limited resolution or have trouble detecting specific types of SVs. To address these limitations and comprehensively detect SVs in cancer genomes, we proposed an integrative framework that combines Hi-C, optical mapping, and whole-genome sequencing (WGS) for SV detection. This is the first time these three techniques have been systematically compared and integrated to increase detection power for SVs. In particular, we developed a novel algorithm that uses Hi-C to detect all types of SVs genome-wide. By comparing the SVs detected by the three techniques, we found that each technique has its own strengths in SV detection. Hi-C and optical mapping excel at detecting large SVs and complex SVs, while WGS is more suitable for small deletions/insertions and for translocations at base-pair resolution. Combining all three techniques achieves the highest sensitivity for SV detection. Further, we observed widespread SV-induced chromatin reorganization, such as the formation of “neo-TADs” and “neo-loops” that underlie upregulation of known oncogenes including MYC, ETV4 and TERT.
In acute myeloid leukemia (AML), somatic mutations frequently affect genes involved in DNA methylation, histone modification, and the cohesin complex. Misregulation of these genes potentially leads to altered 3D genome structures. However, the chromatin reorganization associated with different AML subtypes, and its functional consequences, are poorly understood. To address this question, we performed Hi-C, whole-genome sequencing, and RNA-seq on 29 primary AML samples covering nine major subtypes. We observed strikingly subtype-specific alterations of chromatin compartmentalization and loops, with concordant changes in gene expression. Moreover, we identified a novel type of chromatin loop, dubbed promoter-repressor or “PR” loops, and experimentally validated their repressive effects on gene expression. Using a method recently developed by our lab, we identified SV-induced neo-loops potentially leading to enhancer-hijacking or repressor-hijacking events. Further, we performed whole-genome bisulfite sequencing on patient samples and found that altered methylation correlates with A/B compartment switching. Methylation of CTCF motifs disrupts CTCF binding and its insulator function, leading to an extensive gain of loops across normal TAD boundaries in AML. Finally, treating AML cells with the DNA hypomethylating agent 5-azacytidine can restore the altered chromatin structure and gene expression, with switched compartments reverted and aberrantly gained loops dissociated, alongside compromised AML cell proliferation.
Understanding the cis-acting effects of SVs is a critical step towards functional interpretation of non-coding SVs. Allelic expression imbalance is a powerful measure of the cis-acting effects of genetic variants. However, associating SVs with allele-specific expression often requires haplotype information linking SVs to heterozygous single-nucleotide variant (SNV) alleles. To address this, we developed a novel computational method for detecting cis-acting SVs through phasing of SVs and SNVs and analysis of allelic expression imbalance. Our method leverages WGS and Hi-C reads for local SV-SNV phasing, extends the SV-SNV haplotype with Hi-C-based SNV phasing, and identifies alleles with biased expression that are phased with SVs. Applying this method to a collection of cancer cell lines and primary AML samples for which WGS and Hi-C data are available, we showed widespread cis-acting SVs that are associated with allelic expression imbalance of hundreds of genes, including known oncogenes. As an example, in the LNCaP cell line (prostate cancer derived), ETV1 is expressed exclusively from the allele that is translocated to a different chromosome. Interestingly, ETV1 activates the androgen receptor transcriptional program and promotes autonomous testosterone production, leading to poor clinical outcomes. Phasing of SVs with gene alleles provides more direct evidence that SVs regulate gene expression in cis. These cis-acting SVs are more likely to be functional than those not affecting allelic expression. Therefore, our method is useful for prioritizing functional SVs and their target genes.
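The final step described above, testing whether the allele phased with an SV shows biased expression, can be illustrated with a minimal sketch. The abstract does not state which statistical test the dissertation uses; the exact two-sided binomial test below, the function names, the read-count inputs, and the `alpha` and `min_reads` thresholds are all assumptions made for illustration, and the sketch ignores reference-mapping bias and multiple-testing correction.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (assumed test, not from the source).

    Sums the probability of every outcome that is no more likely than
    the observed count k, computed in log space so large read counts
    do not overflow.
    """
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))

    probs = [exp(log_pmf(i)) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so symmetric ties are counted on both sides.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def sv_allele_bias(sv_hap_reads, other_hap_reads, alpha=0.05, min_reads=10):
    """Test one gene for allelic imbalance relative to the SV haplotype.

    sv_hap_reads    -- RNA-seq reads assigned to the haplotype carrying the SV
    other_hap_reads -- reads assigned to the other haplotype
    Returns None when coverage is below min_reads (hypothetical cutoff).
    """
    n = sv_hap_reads + other_hap_reads
    if n < min_reads:
        return None
    p_value = binomial_two_sided_p(sv_hap_reads, n)
    return {
        "sv_allele_fraction": sv_hap_reads / n,
        "p_value": p_value,
        "imbalanced": p_value < alpha,
    }
```

For example, with hypothetical counts of 30 reads on the SV haplotype and 0 on the other, `sv_allele_bias(30, 0)` reports an SV-allele fraction of 1.0 and flags the gene as imbalanced, which is the pattern the abstract describes for ETV1 in LNCaP. In practice the p-values from many genes would need multiple-testing correction before calling cis-acting SVs.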