Cross-Species Prediction of Transcription Factor Binding
Restricted (Penn State Only)
- Author:
- Agarwala, Vandana
- Graduate Program:
- Statistics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 20, 2022
- Committee Members:
- Shaun Mahony, Thesis Advisor/Co-Advisor
Qunhua Li, Thesis Advisor/Co-Advisor
Ephraim Hanks, Professor in Charge/Director of Graduate Studies
Xiang Zhu, Committee Member
- Keywords:
- Bioinformatics
Deep Learning
Gene Regulation
Domain Adaptation
Transfer Learning
Computational Biology
Machine Learning
- Abstract:
- Transfer learning, the application of knowledge gained in one machine learning task to a new and related task, represents an attractive approach to studying gene regulation across different species. Here, we apply transfer learning to study the binding motif patterns of four transcription factors (TFs) in up to seven different species. We expect TF binding preferences to generalize across species, so a model trained on one species' genome should be able to predict binding in another species' genome reasonably well. However, species-specific genomic features, such as repeat elements, prevent trained models from generalizing perfectly across species. To account for this, we propose a domain-adaptive model architecture that discourages the learning of species-specific genomic sequence features. Our results demonstrate that prediction from species-agnostic genomic features is feasible when such an architecture is used to account for domain shifts, i.e., differences in the underlying genomic background. Our results also suggest that the analysis may be more informative if evolutionary distance is taken into account in prediction.
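The abstract does not say how the architecture discourages species-specific features. One widely used mechanism for this kind of domain adaptation is a gradient reversal layer placed in front of a species discriminator, and the sketch below illustrates that idea only as an assumption, not as the thesis's actual method. All names here (`one_hot`, `GradientReversal`, `combined_feature_gradient`, `lam`) are hypothetical, and the code uses plain Python lists rather than a deep learning framework.

```python
# Illustrative sketch only: gradient reversal is an ASSUMED mechanism,
# and every name below is hypothetical (not taken from the thesis).

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversal (assumed hyperparameter)

    def forward(self, features):
        # Features pass through unchanged to the species discriminator.
        return list(features)

    def backward(self, grad_output):
        # The sign flip pushes the shared feature extractor to make the
        # species discriminator's task harder rather than easier.
        return [-self.lam * g for g in grad_output]


def combined_feature_gradient(grad_binding, grad_species, grl):
    """Gradient reaching the shared features: the binding-task gradient
    plus the reversed species-discriminator gradient."""
    reversed_species = grl.backward(grad_species)
    return [gb + gs for gb, gs in zip(grad_binding, reversed_species)]


if __name__ == "__main__":
    print(one_hot("ACGN"))
    grl = GradientReversal(lam=0.5)
    print(combined_feature_gradient([0.2, -0.1], [0.4, 0.6], grl))
```

In this setup the binding classifier is trained as usual, while the reversed gradient from the species discriminator penalizes features that make the source and target species easy to tell apart, such as species-specific repeat content.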