Predicting Transcription Factor Binding Using Neural Structured Learning
Open Access
- Author:
- Zesati, Natalie
- Graduate Program:
- Bioinformatics and Genomics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- November 09, 2020
- Committee Members:
- Shaun Mahony, Thesis Advisor/Co-Advisor
- George H Perry, Program Head/Chair
- Reka Z Albert, Committee Member
- Keywords:
- Neural Structured Learning
- Convolution Neural Network
- NSL
- CNN
- Transcription Factors
- Abstract:
- Transcription factors (TFs) bind to sequence-specific deoxyribonucleic acid (DNA) sites. Although a TF's recognition sequence is likely to occur at a multitude of locations throughout the genome, TFs appear to bind only to specific sites. It is currently not well understood which factors contribute to this selection process. In recent years, the accessibility and reduced cost of assay technologies such as ATAC-seq, ChIP-seq, and Hi-C sequencing have provided the scientific community with an abundance of valuable data that is growing exponentially year over year. In parallel, advances in computing power and machine learning techniques are being applied to bioinformatics and genomics, resulting in new methods to process, store, and analyze data. This combination has produced significant discoveries and insights pertaining to TF binding sites, with many models able to predict binding sites for some TFs with a high level of accuracy. Although powerful, these state-of-the-art machine learning methods are constrained to linear relationships within the genome, i.e., sequentially adjacent genome segments. Here we develop a new method using TensorFlow Neural Structured Learning (NSL), which allows us to leverage genome-wide interaction patterns during training. Including more distant interactions alongside sequential ones allows us to detect TF binding (individual sites and clusters) as well as cis-regulatory modules (CRMs), which include promoters, enhancers, and repressors that may act independently or cooperatively in the regulatory process (Alberts 2015). We show that by associating these more distant interactions via graph regularization, we can increase the accuracy of predicting Ascl1 TF binding sites.
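The graph regularization mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation and it does not call the NSL library; it only shows the general idea of adding a neighbor penalty to a supervised loss. The function names, the squared L2 distance, the `alpha` multiplier, and the edge format `(i, j, weight)` are all assumptions made for illustration, as is the idea that edges would come from interaction data such as Hi-C contacts.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Supervised loss for one genomic region: bound (1) vs. unbound (0)."""
    p = min(max(y_prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def squared_l2(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(labels, probs, embeddings, edges, alpha=0.1):
    """Mean supervised loss plus alpha times the mean weighted neighbor distance.

    edges is a list of (i, j, weight) tuples linking region i to region j
    (hypothetical format; assumed here to encode genome-wide interactions).
    """
    supervised = sum(
        binary_cross_entropy(y, p) for y, p in zip(labels, probs)
    ) / len(labels)
    if not edges:
        return supervised
    # Penalize connected regions whose learned embeddings are far apart.
    neighbor = sum(
        w * squared_l2(embeddings[i], embeddings[j]) for i, j, w in edges
    ) / len(edges)
    return supervised + alpha * neighbor


# Toy example: two regions joined by one interaction edge.
labels = [1, 0]
probs = [0.9, 0.2]
embeddings = [[1.0, 2.0], [3.0, 2.0]]
edges = [(0, 1, 0.5)]
loss = graph_regularized_loss(labels, probs, embeddings, edges, alpha=0.1)
```

With an empty edge list the function reduces to the ordinary supervised loss, which corresponds to a model that sees only the sequence of each region. The penalty grows as the embeddings of interacting regions diverge, so training is nudged toward similar representations for regions that are linked in the graph.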