Restricted (Penn State Only)
Abelman, Rebecca Lynn
Graduate Program:
Food Science
Master of Science
Document Type:
Master Thesis
Date of Defense:
June 13, 2018
Committee Members:
  • Edward G Dudley, Thesis Advisor
  • Catherine Nettles Cutter, Committee Member
  • Jasna Kovac, Committee Member
Keywords:
  • shigella
  • whole genome sequencing
  • food microbiology
Shigella is the third most common cause of bacterial foodborne disease, resulting in approximately 500,000 cases of gastrointestinal illness annually in the United States and 80-165 million cases worldwide. Shigella sonnei is the dominant species in developed countries and is responsible for over 80% of the shigellosis cases seen in the United States. Recently, technological advances and decreased costs have made whole genome sequencing (WGS) readily available, and government and state laboratories are moving to implement WGS in outbreak analysis, surveillance, and antimicrobial resistance monitoring of major foodborne pathogens. Accordingly, our study, in collaboration with the Pennsylvania Department of Health, examined a unique set of 22 S. sonnei isolates from Pennsylvania patients who acquired their infections between 2009 and 2014, either within the United States or while travelling abroad. A previous study using isolates from this collection demonstrated phenotypic differences in antimicrobial resistance between international and domestic isolates; in response, this study investigated the genetic determinants of these observations and applied WGS to characterize these strains in more detail. We analyzed the phylogeny of these strains, determining relatedness via SNP calling with the program SNVPhyl, and observed major phylogenetic segmentation based on the previously described Global Lineages of S. sonnei. This analysis included United States isolates of a wider scope than previous research, and it showed that strains from the United States, specifically from the state of Pennsylvania, fall into both Global Lineage II and III. Following these results, 17 genes with lineage-specific SNPs were identified and developed into a lineage prediction test to determine the Global Lineage of uncharacterized S. sonnei before performing a full phylogenetic analysis.
This prediction test was performed on 39 additional S. sonnei genomes retrieved from the NCBI Sequence Read Archive, and it showed high accuracy (greater than 97%) in Global Lineage prediction. The lineage prediction genes were also specific for S. sonnei when compared to both Escherichia coli and other Shigella species. Lastly, to determine whether there were differences between the international and domestic isolates or between the Global Lineages, the antimicrobial resistance determinants were identified from the 22 genomes using two antimicrobial resistance databases: the Bacterial Antimicrobial Resistance Reference Gene Database and ResFinder 3.0. A variety of acquired resistance genes and chromosomal mutations were found within the genomes, and while the international and domestic S. sonnei carried similar resistance determinants, macrolide, quinolone, and chloramphenicol resistance determinants appeared to be carried exclusively by Lineage III S. sonnei. This research adds to our knowledge of S. sonnei within the United States and how it compares to strains found internationally in the context of comparative genomics. Additionally, the lineage prediction test not only characterizes S. sonnei sequences effectively and efficiently but also doubles as a method of differentiating S. sonnei from other Shigella species and E. coli. In addition to using WGS to characterize S. sonnei, the sequencing quality control (QC) parameters used to determine the acceptability of genome sequences were examined. Three QC parameters (coverage, N50, and sequence length) were collected from each S. sonnei sequence used in Chapter 3. The QC parameters for these S. sonnei were each subjected to statistical testing in R. First, each QC parameter was examined for its statistical distribution. Each parameter had a wide distribution, and similarities were seen in the parameters across isolates.
In addition, a variance calculation was performed to determine the variance of the S. sonnei isolates' sequence length when compared to the closed S. sonnei genome Ss046. The variance and the average distance from the known sequence length were larger than expected. Next, using a centroid-based clustering program, the S. sonnei QC parameters were sorted in an attempt to separate acceptable from unacceptable genomes into two distinct clusters. Instead, two acceptable clusters were formed, and the unacceptable S. sonnei genomes were likely outliers. Lastly, based on observations from Chapter 3, a comparative analysis between S. sonnei QC parameter values and E. coli genome QC parameters was performed using a two-sample t-test. Based on this testing, all QC parameters were significantly different (p < 0.05) between these highly related bacteria. Overall, this work generated useful data for the interpretation and application of whole genome sequencing of S. sonnei, in the hope of assisting the ongoing struggle with this bacterial pathogen.
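
As an illustration of two of the QC quantities mentioned above, the following minimal sketch computes an N50 from a list of contig lengths and the spread of assembly lengths around a reference length. It is written in Python rather than the R used in the thesis, and the function names and example numbers are hypothetical placeholders, not thesis data. The abstract does not give the exact variance formula, so the sketch assumes a mean squared deviation about the reference length.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_squared_deviation(lengths: Sequence[float], reference: float) -> float:
    """Variance-like spread of assembly lengths about a reference length
    (assumed formula; the reference replaces the sample mean)."""
    if not lengths:
        raise ValueError("need at least one assembly length")
    return sum((x - reference) ** 2 for x in lengths) / len(lengths)


def mean_absolute_deviation(lengths: Sequence[float], reference: float) -> float:
    """Average absolute distance of assembly lengths from the reference."""
    if not lengths:
        raise ValueError("need at least one assembly length")
    return sum(abs(x - reference) for x in lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to make the arithmetic easy to follow.
    print(n50([100, 200, 300, 400, 500]))
    print(mean_squared_deviation([90, 110, 100], 100))
    print(mean_absolute_deviation([90, 110, 100], 100))
```

In practice the contig lengths would come from each draft assembly, and the reference length would be that of the closed genome used for comparison.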