Computational Design and Experimental Characterization of Protein Domains

Open Access
Author:
Pecore, Christina M. Kraemer
Graduate Program:
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
July 24, 2002
Committee Members:
  • Juliette T J Lecomte, Committee Chair
  • B Tracy Nixon, Committee Member
  • Peter C Jurs, Committee Member
  • John R Desjarlais, Committee Member
  • Christain J Falzone, Committee Member
Keywords:
  • biophysical techniques
  • genetic algorithm
  • protein design
This thesis describes the first successful computational redesign of a β-sheet protein. The high-resolution backbone structure of the WW domain from human peptidyl-prolyl cis-trans isomerase (hPin1) was used as input to a previously developed algorithm that generates amino acid sequences compatible with a given backbone geometry (Sequence Prediction Algorithm, or SPA). Various weightings of protein folding parameters (hydrophobic burial, hydrophilic exposure, electrostatic interactions, Lennard-Jones potential, and side chain torsional energy) were applied to select a sequence of amino acids which, when placed on the target experimental backbone in the predicted conformations, resulted in a native-like protein model in terms of packing, electrostatic interactions, hydrophobic and hydrophilic patterning, and other necessary criteria. One protein sequence was chosen for experimental characterization to assess the usefulness of SPA in selecting a primary structure compatible with the wild-type WW fold. A synthetic gene for this protein was cloned into a previously developed plasmid using standard molecular biology techniques. Although the protein was expressed at a high level as a fusion with the N-terminus of calmodulin, difficulties in purifying the product by a variety of modern techniques prevented structural analysis of the material, and a different approach to the design was adopted. To avoid the summary rejection of sequences on the basis of a few minor steric clashes during computational design, backbone flexibility was implemented to relieve potential strain. Flexibility was mimicked through the application of a Monte Carlo algorithm, which allowed random movements in the (φ,ψ) space of the wild-type backbone. These manipulations produced an ensemble of backbones, all of which had a root-mean-square deviation of less than 0.3 Å from the wild-type backbone.
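The ensemble-generation step described above can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract specifies only random moves in torsion space and an RMSD bound, so the following are all assumptions: the hard-cutoff acceptance rule, the step size, the number of moves per model, and the function names `build_chain`, `rmsd`, and `sample_ensemble`. To keep the example self-contained, a toy planar chain stands in for real backbone reconstruction from (φ,ψ) angles, and RMSD is computed without superposition.

```python
import math
import random


def build_chain(angles, bond=1.0):
    """Toy stand-in for backbone reconstruction (assumption, not real
    protein geometry): a planar chain in which each angle is the turn,
    in radians, relative to the previous bond."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for angle in angles:
        heading += angle
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        coords.append((x, y))
    return coords


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate
    lists, computed without superposition."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def sample_ensemble(ref_angles, n_models, max_rmsd=0.3, step=0.02,
                    moves=50, seed=0):
    """Generate n_models perturbed angle sets by a random walk in angle
    space. A trial move is accepted only if the rebuilt chain stays within
    max_rmsd of the reference (hard cutoff; an assumed acceptance rule)."""
    rng = random.Random(seed)
    ref_coords = build_chain(ref_angles)
    ensemble = []
    for _ in range(n_models):
        angles = list(ref_angles)
        for _ in range(moves):
            i = rng.randrange(len(angles))
            trial = list(angles)
            trial[i] += rng.uniform(-step, step)  # random torsion move
            if rmsd(build_chain(trial), ref_coords) < max_rmsd:
                angles = trial
        ensemble.append(angles)
    return ensemble


if __name__ == "__main__":
    reference = [0.1, -0.2, 0.3, 0.0, -0.1]
    models = sample_ensemble(reference, n_models=5)
    ref_coords = build_chain(reference)
    for m in models:
        print(round(rmsd(build_chain(m), ref_coords), 3))
```

Because every walk starts at the reference and only accepts moves that remain under the cutoff, each returned model satisfies the RMSD bound by construction. A real implementation would rebuild three-dimensional backbone coordinates from (φ,ψ) and would typically superimpose structures before measuring RMSD.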
Each backbone in the ensemble was used as input to SPA; the result was a free energy matrix of each allowed amino acid and its rotamers, which was used to determine the probability of occurrence of an amino acid and rotamer combination in the lowest-energy sequence. This new method was referred to as SPANS, for Sequence Prediction Algorithm on Numerous States. Three promising sequences of 36 amino acids were chosen to test the power of the improved algorithm. The corresponding proteins were prepared with standard molecular biology methods, again as fusions with the N-terminus of calmodulin. Circular dichroism (CD) data indicated the presence of a WW-like target fold in one of the designed proteins (referred to as SPANS-WW2) via positive ellipticity centered on ~230 nm. Others have identified this feature as a characteristic of the CD spectrum of wild-type WW domains, although in the WW domain from hPin1, the wild-type protein used in this study, the signal is stronger. The positive ellipticity of SPANS-WW2 increased in intensity two-fold after the sample was heated to 95 °C for five minutes at pH 7 and subsequently cooled, a process referred to as annealing. Additionally, a single mutation in the wild-type WW domain (W29A) yielded a protein with a positive ellipticity comparable to that of the SPANS-WW2 designed protein. These observations suggested that the reduction in CD signal was due to interactions between the solvent-exposed tryptophan and the aromatic amino acids located near this residue, rather than to backbone conformation. The specificity of the SPANS-WW2 fold was confirmed by initial 1D and 2D proton NMR spectroscopy, which indicated the presence of the target WW-like fold. However, the thermal stability of the designed protein was lower than that of the wild-type WW domain, as shown by CD thermal denaturation and 1D variable-temperature NMR spectroscopy.
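One plausible way to turn a per-position free energy matrix into occurrence probabilities, as described for SPANS above, is Boltzmann weighting followed by averaging over the backbone ensemble. The abstract does not state the formula used, so the weighting scheme, the value of kT, the simple averaging, and the function names `boltzmann_probabilities` and `ensemble_average` are assumptions made for illustration only.

```python
import math

# Approximate thermal energy at 298 K in kcal/mol (assumed unit and temperature).
KT = 0.592


def boltzmann_probabilities(energies, kt=KT):
    """Convert energies for one sequence position into probabilities.

    energies maps (amino_acid, rotamer_id) -> free energy. Energies are
    shifted by the minimum before exponentiation for numerical stability;
    the shift cancels in the normalization."""
    e_min = min(energies.values())
    weights = {key: math.exp(-(e - e_min) / kt) for key, e in energies.items()}
    z = sum(weights.values())  # partition function for this position
    return {key: w / z for key, w in weights.items()}


def ensemble_average(per_backbone):
    """Average probabilities over a list of per-backbone dictionaries.
    A combination missing from a backbone counts as probability zero."""
    keys = set().union(*per_backbone)
    n = len(per_backbone)
    return {key: sum(p.get(key, 0.0) for p in per_backbone) / n
            for key in keys}


if __name__ == "__main__":
    # Hypothetical energies for one position on two backbones.
    backbone_1 = {("TRP", 0): -2.0, ("TRP", 1): -1.0, ("PHE", 0): -1.5}
    backbone_2 = {("TRP", 0): -1.8, ("PHE", 0): -2.2}
    probs = [boltzmann_probabilities(b) for b in (backbone_1, backbone_2)]
    for key, p in sorted(ensemble_average(probs).items()):
        print(key, round(p, 3))
```

Averaging over the ensemble means that a residue penalized by a minor clash on one backbone can still score well overall if it fits the neighboring backbones, which is the motivation the abstract gives for introducing backbone flexibility.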
Efforts to characterize the other two proteins designed by the SPANS method met with limited success according to CD data. SPANS highlighted several potentially useful point mutations in these two designed proteins; only one of the mutations increased the presence of a WW-type fold, as evidenced by slight changes in the CD signature. The ability to identify interesting or useful point mutations increased the efficacy of the SPANS algorithm in the design of a protein that has high specificity for a target fold. This work provides strong experimental evidence of the success of the algorithm in selecting a sequence that adopted and maintained a β-sheet fold with marginal stability. As most previous attempts at protein design have focused on proteins that were entirely or primarily α-helical, this success is especially noteworthy.