Bioinformatics discovery of chemical diversity in enzyme superfamilies

Open Access
- Author:
- Hu, Kai
- Graduate Program:
- Molecular, Cellular and Integrative Biosciences
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 24, 2019
- Committee Members:
- Benjamin D Allen, Dissertation Advisor/Co-Advisor
Joseph M Bollinger, Jr., Committee Chair/Co-Chair
Amie Kathleen Boal, Outside Member
Ross Cameron Hardison, Committee Member
Melissa Rolls, Program Head/Chair
- Keywords:
- Ribonucleotide Reductase (RNR)
Bioinformatics
Radical chemistry
Class Ie RNR
Deep Mutational Scanning (DMS) strategy
- Abstract:
- Enzymes are biological macromolecular catalysts that play essential roles in many fundamental biological processes, including photosynthesis (the major source of biomass formation) and cellular respiration (the major source of ATP production). Compared to simple small-molecule catalysts, enzymes catalyze reactions through far more complicated processes. Among the many factors that may affect the activity of an enzyme, cofactors (non-protein chemical compounds) are usually required for activity. Previous studies have revealed great diversity in cofactor usage among enzymes. Within certain sets of highly similar enzymes that share a common ancestor (a protein superfamily), the same set of cofactors can be used to catalyze different reactions. It is also very common for members of certain protein families to catalyze exactly the same reaction with distinct cofactors. Studying this diversity can help us better understand the catalytic mechanisms and evolutionary histories of enzyme families and can provide clues for guided protein engineering. Two distinct enzyme systems are discussed in this dissertation, namely ribonucleotide reductase (RNR) and the Fe(II)- and 2-oxoglutarate (2OG)-dependent oxygenases (FE2OG enzymes). RNR catalyzes the only de novo biosynthetic route to deoxyribonucleotides (dNTPs, the building blocks of DNA) from ribonucleotides (NTPs or NDPs) in nearly all cellular organisms and many viruses. Although all RNRs share a very similar active-site core and catalyze exactly the same dNTP-forming reaction, the metallocofactors they rely on differ, so RNRs can be divided into multiple subcategories. Members of the FE2OG enzyme superfamily, in contrast, catalyze very distinct reactions, but they all require the same set of cofactors, namely Fe(II) and 2OG. 
Traditional ways of analyzing enzyme diversity rely mainly on biochemistry and molecular biology methods, such as single-site mutagenesis and crystallography, which are both labor-intensive and time-consuming. Recent developments in computational biology have enabled efficient and reliable analysis of chemical diversity. This dissertation describes computational pipelines that incorporate various bioinformatics and statistical packages to facilitate the investigation of diversity in these two important enzyme superfamilies. Chapter 1 discusses the current understanding of ribonucleotide reductase, covering a brief introduction to the reaction mechanism, the current categorization of RNRs, and a short history of the discovery of class I RNR subclasses. Chapter 2 describes a computational pipeline that aims to promote the discovery of potential novel class I RNR subclasses. This pipeline successfully identifies several candidates that may use catalytic mechanisms not found in canonical RNRs. Among all discovered sequence candidates, one subset (termed subclass Ie RNR) stands out because of its distinctive structural features. A series of complementation assays confirms the functionality of this Ie RNR. Structures of Ie RNR solved by our collaborators suggest a unique radical-initiating mechanism in its active center. Besides Ie RNR, some other candidates are discussed in Chapter 2 as well. To provide more evidence for investigating the chemical mechanism behind each class I RNR subclass, Chapter 3 proposes another pipeline, called the Deep Mutational Scanning (DMS) assay. This pipeline was initially designed to investigate functionally important sites in Ie RNR but can be applied to other systems as well, given appropriate inputs. 
A Graphical User Interface (GUI) is added in version 1.0 of the DMS pipeline, alongside the command-line mode, to make it more user-friendly. Results from this pipeline confirm the critical sites in Ie RNR that were proposed on the basis of solved structures, and they can be used to guide hypothesis building about its chemical mechanism. In addition to the RNR system, this dissertation also describes a computational pipeline for investigating the diversity of the Fe(II)/2OG enzyme system. Chapter 4 gives a brief summary of Fe(II)/2OG enzymes, including the various reactions they catalyze and the proposed underlying mechanisms. To aid rational design and protein engineering, sequence and geometric features, including sequence conservation patterns, target substrate-atom distributions, and various angle metrics, are measured and visualized. These features can be fed into machine learning algorithms to build a working model for predicting alternative reactions. Chapter 5 concisely summarizes the work in this thesis and outlines potential future research directions. In short, this dissertation presents multiple computational pipelines that incorporate both bioinformatics and statistical packages and that provide key insights into the chemical diversity exhibited by class I RNRs and Fe(II)/2OG enzymes.