Integrated Modeling of Phototrophic Metabolism Leveraging Multi-Omics Datasets

Open Access
- Author:
- Sarkar, Debolina
- Graduate Program:
- Chemical Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- December 08, 2021
- Committee Members:
- Andrew Zydney, Major Field Member
Phillip Savage, Major Field Member
Costas Maranas, Chair & Dissertation Advisor
Seong H. Kim, Professor in Charge/Director of Graduate Studies
Sally Mackenzie, Outside Unit & Field Member
- Keywords:
- flux balance analysis
metabolic engineering
SNPs
poplar
arabidopsis
cyanobacteria
constraints based analysis
optimization
- Abstract:
- Rapid progress in high-throughput experimental technologies has enabled the generation of large-scale systems biology datasets. These span all biological hierarchies: genomics, describing the genetic make-up; transcriptomics and proteomics, at the gene and enzyme expression level; metabolomics, which helps quantify the amount and nature of the resultant biomolecules; and finally phenomics, which describes the overall traits of an individual. This data deluge necessitates algorithmic and computational advances that can leverage multi-omics integration to facilitate the analysis of complex systems and extract meaningful insights. Flux balance analysis (FBA) using genome-scale metabolic (GSM) models provides an advantageous platform for doing so, as these models are (relatively) parameter-free, can be constructed using the annotated genome alone, and can be simulated in linear time, offering scale-up benefits. GSMs model a network view of metabolism, wherein metabolites are cast as nodes in a graph linked via edges representing all possible biochemical conversions occurring within an organism. In Chapter 1, we present an overview of constraints-based analysis of metabolic networks, including the reconstruction of GSM models, their use within an optimization-based scheme such as FBA, and the various applications of such models. Next, we describe the extension of metabolic modeling frameworks, originally designed for microbial systems, to the study of plants. This extension brings its own set of challenges, such as accurately capturing the division of roles between the various tissue and organ systems and dealing with the systematic biases typically associated with poorly annotated non-model systems. Finally, we explore how the incorporation of new data types, modeling schemes, and computational tools has impacted FBA by helping increase its predictive power and scope.
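For orientation, the FBA problem referred to above is conventionally posed as a linear program over reaction fluxes at steady state. The block below is the standard textbook form, not a formulation quoted from the dissertation; the symbols S (stoichiometric matrix, metabolites by reactions), v (flux vector), c (objective weights, for example selecting a biomass reaction), and the bounds v^lb and v^ub are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

The equality constraint states that every metabolite is produced and consumed at equal rates, and the bounds encode reaction reversibility and uptake limits. Because the objective and all constraints are linear in v, no kinetic parameters are needed.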
FBA has proven to be quite adept at describing aggregated metabolite flows, i.e., providing a snapshot of metabolism as averaged over the entire growth cycle. However, it is also time invariant, and thus does not accommodate temporally varying cell processes such as sequestering different biomass components at various time points in a growth cycle. Yet we know from experiments that many organisms, including cyanobacteria, have a lifestyle that is heavily tailored around light availability and thus show metabolic oscillations. In Chapter 2, we present a framework called CycleSyn that augments FBA by accounting for such temporal trends. CycleSyn discretizes a growth cycle into individual time periods (called Time Point Models or TPMs), each described by its own GSM model. Metabolites are allowed to flow across TPMs, with metabolite levels inventoried so that only currently or previously produced compounds can be utilized. Additional time-dependent constraints can also be imposed to capture the cyclic nature of cellular processes. CycleSyn was used to develop a diurnal FBA model of Synechocystis sp. PCC 6803 metabolism. Predicted fluxes and metabolite pools were in line with published studies, paving the way for constructing time-resolved GSM models. Additionally, the metabolic reorganization that would be required to enable Synechocystis sp. PCC 6803 to fix nitrogen by temporally separating it from photosynthesis was also explored. Just as CycleSyn simulates multiple metabolic models at once, in Chapter 3 we extend this approach to modeling multiple organisms together as a community, so as to discern their underlying interactions. This community comprised a genetically streamlined unicellular cyanobacterium called Candidatus Atelocyanobacterium thalassa (or UCYN-A) living in symbiosis with a phototrophic microalga. We used metabolic modeling to glean insights into UCYN-A’s unique physiology and the metabolic processes governing the symbiotic association.
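Returning to the CycleSyn inventory rule described earlier in this paragraph (only currently or previously produced compounds may be used), the following is a minimal sketch of that bookkeeping for a single metabolite. It is not the CycleSyn implementation and performs no optimization: it only checks a given series of per-period net production values. The function names, the zero initial pool, and the example numbers are hypothetical and chosen purely for illustration.

```python
from itertools import accumulate


def inventory_pools(net_production, initial_pool=0.0):
    """Return the pool of one metabolite after each time period.

    net_production[t] is production minus consumption in period t
    (positive = stored, negative = drawn from the pool).
    """
    running = accumulate(net_production, initial=initial_pool)
    return list(running)[1:]  # drop the initial value itself


def is_feasible(net_production, initial_pool=0.0, tol=1e-9):
    """True if the pool never goes negative, i.e. nothing is consumed
    before it has been produced (the inventory rule, as assumed here)."""
    pools = inventory_pools(net_production, initial_pool)
    return all(p >= -tol for p in pools)


def is_cyclic(net_production, initial_pool=0.0, tol=1e-9):
    """True if the pool returns to its starting level at the end of the
    cycle, an illustrative stand-in for a cyclic time-dependent constraint."""
    final = inventory_pools(net_production, initial_pool)[-1]
    return abs(final - initial_pool) <= tol


# Hypothetical storage compound: built up in two light periods,
# drawn down in two dark periods.
light_then_dark = [2.0, 1.0, -1.5, -1.5]
# Same totals, but consumption is attempted before any production.
dark_first = [-1.5, -1.5, 2.0, 1.0]

print(inventory_pools(light_then_dark))
print(is_feasible(light_then_dark), is_cyclic(light_then_dark))
print(is_feasible(dark_first))
```

In a full time-resolved model these pool balances would appear as linear constraints coupling the flux vectors of consecutive TPMs, so the solver would choose the fluxes rather than having them supplied as data.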
To this end, we developed an optimization-based framework that infers all possible trophic scenarios consistent with the observed data. Possible mechanisms employed by UCYN-A to accommodate diazotrophy with daytime carbon fixation by the host (i.e., two mutually incompatible processes) were also elucidated. We found that the metabolic functions of the two constituents, together with UCYN-A’s streamlined genome, are optimized to support maximal nitrogen fixation flux, suggesting that this symbiosis is as close to being a functional ‘nitroplast’ as any observed to date. We envision that the framework developed using UCYN-A and its algal host will serve as a roadmap and motivate the study of similarly unique microbial systems in the future. Understanding how genomic mutations impact the overall phenotype of an organism has been a focus of efforts aimed at improving growth yield, determining genetic markers governing a trait, and understanding adaptive processes. This has conventionally been done using genome-wide association studies, which seek to identify the genetic background behind a trait by examining associations between phenotypes and single-nucleotide polymorphisms (SNPs). Although such studies are common, biological interpretation of the results remains a challenge, especially because of the confounding nature of population structure and the systematic biases it introduces. In Chapter 4, we propose a complementary tool called SNPeffect that offers putative genotype-to-phenotype mechanistic interpretations by integrating biochemical knowledge encoded in metabolic models. SNPeffect was used to explain differential growth rate and metabolite accumulation in Arabidopsis and poplar as the outcome of SNPs in enzyme-coding genes. To this end, we also constructed a genome-scale metabolic model for Populus trichocarpa, the first for a perennial woody tree. As expected, our results indicated that growth is a complex polygenic trait governed by carbon and energy partitioning.
The predicted set of functional SNPs in both species is associated with experimentally characterized growth-determining genes and also suggests putative ones. Functional SNPs were found in pathways such as amino-acid metabolism, nucleotide biosynthesis, and cellulose and lignin biosynthesis, in line with breeding strategies that target pathways governing carbon and energy partitioning. Thus far, we have developed computational frameworks that examine how the metabolism of an organism dictates its total phenotype and its interactions with other organisms in a community. In Chapter 5, we take the next step by examining ways in which an organism can impact its host, specifically how the infant gut microbiome is shaped. Fecal samples from newborn infants showed that gut bacteria are detectable by 16 h after birth. However, analysis of the microbiome, proteome, and metabolome data did not suggest a single genomic signature for neonatal gut colonization. Using flux balance modeling, we found E. coli to be the most common early colonizer. The appearance of bacteria was associated with decreased levels of free amino acids and an increase in products of bacterial fermentation, primarily acetate and succinate. Among all the microbial species found, these observations were only consistent with E. coli growing under anaerobic conditions using amino acid fermentation to support maximal ATP yield. These results provide a deep characterization of the first microbes in the human gut and show how the biochemical environment is altered by their appearance. Finally, in Chapter 6, we conclude with our efforts to develop computational frameworks enabling the integration of heterogeneous datasets within constraints-based optimization. We discuss current challenges associated with such modeling frameworks and their uses, and finally present future perspectives for augmenting these models through the incorporation of diverse data types, multi-scale modeling, and cross-cutting applications.