Advanced algorithms for the design and analysis of synthetic genetic parts
Restricted (Penn State Only)
- Author:
- Hossain, Ayaan
- Graduate Program:
- Bioinformatics and Genomics (PhD)
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- November 02, 2021
- Committee Members:
- Paul Babitzke, Outside Unit Member
- Howard Salis, Chair & Dissertation Advisor
- Costas Maranas, Major Field Member
- Reka Albert, Outside Field Member
- George Perry, Program Head/Chair
- Keywords:
- automated design synthetic genetic parts combinatorial optimization machine learning algorithm promoter mRNA transcription translation decay repression CRISPR repeats interference graph theory path finding nonrepetitive repetitive gene synthesis assembly oligopool discovery barcodes primers counting NGS reads classifier classification regressor regression random forest LASSO ridge homologous recombination instability tuning efficiency plasmids genome engineering degenerate IUPAC vertex cover independent set initiation
- Abstract:
- Chip-based oligopool synthesis and high-throughput sequencing have paved the way for synthetic biologists to systematically design and characterize hundreds of thousands of uniquely defined sequence variants encoding genetic part functions of interest, using an iterative design-build-test-learn approach. In this dissertation, I will discuss novel algorithms for efficiently designing and discovering highly non-repetitive genetic parts based on user-defined design constraints that encode arbitrary part functions. I will then demonstrate the application of these algorithms in engineering thousands of de novo non-repetitive promoters for E. coli and S. cerevisiae for large-scale genetic systems engineering. Next, I will introduce an end-to-end pipeline that automates the design and analysis of massively parallel reporter assays with millions of defined variants, enabling the scalable design of orthogonal and unique oligopool elements, the splitting of long constructs into multiple oligos, and the rapid mapping, packing, and counting of reads to build and characterize thousands of promoters, ribozymes, and mRNA stability elements. I will then show synergistic applications of non-repetitive part design and machine learning-based techniques for designing completely new CRISPR parts for massively multiplexed CRISPRi applications using extra-long single guide RNA arrays (ELSAs). I will also discuss how de novo design of genetic parts can be used to create machine-learned models of part function, which can in turn be used to design unnatural variants with desired levels of activity; these include biophysical models of transcription initiation rates for bacterial σ⁷⁰ promoters and of mRNA decay rates in E. coli. Finally, I will cover a machine learning model that predicts whether a large genetic system encoded as a gene fragment can be synthesized by commercial synthesis providers with a short turnaround time.
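
As a rough illustration of the non-repetitive design idea mentioned in the abstract, the sketch below assumes one common working definition: a set of parts is non-repetitive if no substring of length k appears more than once, either within a part or across parts. The abstract does not state this definition, so both it and the threshold k are assumptions, and the function names (kmers, has_internal_repeat, greedy_nonrepetitive) are hypothetical. The simple greedy filter shown is a stand-in for illustration, not the algorithm developed in the dissertation, and it ignores reverse complements.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_internal_repeat(seq, k):
    """True if some length-k substring occurs more than once within seq."""
    return len(kmers(seq, k)) < len(seq) - k + 1


def greedy_nonrepetitive(candidates, k):
    """Keep candidates, in input order, that share no k-mer with kept parts.

    Assumed definition (not from the source): two parts "repeat" each other
    if they have any common substring of length k.
    """
    seen = set()   # every k-mer used by the parts kept so far
    kept = []
    for seq in candidates:
        if has_internal_repeat(seq, k):
            continue  # the part repeats itself
        ks = kmers(seq, k)
        if ks.isdisjoint(seen):
            kept.append(seq)
            seen |= ks
    return kept


if __name__ == "__main__":
    pool = ["ACGTAC", "TTGACA", "GTACGG", "ACACAC"]
    # "GTACGG" shares GTAC with the first part; "ACACAC" repeats ACAC internally.
    print(greedy_nonrepetitive(pool, 4))  # ['ACGTAC', 'TTGACA']
```

A greedy pass like this depends on the order of the candidates and generally does not find the largest possible non-repetitive set; selecting a maximum set of mutually compatible parts is an independent-set style problem, which is consistent with the graph-theory keywords listed for this record.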