Identification of small RNA producing genes in the moss Physcomitrella patens

Open Access
Coruh, Ceyda
Graduate Program:
Plant Biology
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
June 16, 2014
Committee Members:
  • Michael Axtell, Dissertation Advisor
  • Michael Axtell, Committee Chair
  • Claude Walker Depamphilis, Committee Member
  • Sarah Mary Assmann, Committee Member
  • Anton Nekrutenko, Special Member
Keywords:
  • Heterochromatic siRNA
  • microRNA
  • small RNA
  • small RNA-seq
  • dicer
  • Physcomitrella patens
Abstract:
In plants, a significant fraction of the genome is devoted to producing regulatory small RNAs. These ubiquitous, endogenous small RNAs are currently categorized into two groups: microRNAs (miRNAs) and small interfering RNAs (siRNAs). They are produced by Dicer-Like (DCL) proteins and used by Argonaute (AGO) proteins to guide repression of target mRNAs and/or chromatin, which are selected on the basis of small RNA-target complementarity, at the transcriptional or post-transcriptional level. The two major types of small RNAs found in angiosperms (flowering plants) are 21 nt miRNAs and 24 nt heterochromatic siRNAs. Angiosperm small RNA populations are dominated by 24 nt heterochromatic siRNAs, which derive from intergenic, repetitive regions and direct DNA methylation and repressive histone modifications to targeted loci. However, the existence and extent of heterochromatic siRNAs in other land plant lineages have been less clear. The failure of initial small RNA-seq attempts to detect 24 nt heterochromatic siRNA accumulation in several other species, including gymnosperms (Dolgosheina et al. 2008) and the lycophyte Selaginella (Banks et al. 2011), has raised the question of whether the heterochromatic siRNA pathway is angiosperm-specific. Previous work in Physcomitrella supports the hypothesis that the heterochromatic siRNA pathway is an ancestral trait that was present in the last common ancestor of bryophytes and all subsequently diverged plant lineages (Cho et al. 2008). However, a comprehensive annotation of small RNA genes in the basal lineage Physcomitrella is still lacking, and an investigation of small RNA populations in this model organism would shed more light on the evolution of regulatory small RNA pathways in land plants. With the advent of next-generation sequencing, small RNA-seq has become a rich source of data on plant miRNA and siRNA expression.
Therefore, we produced extensive small RNA-seq data (more than 10^8 mapped reads) to annotate small RNA genes in ten-day-old protonemata from wild-type Physcomitrella. ShortStack is a recently developed tool that analyzes small RNA-seq data with respect to a reference genome and provides a comprehensive annotation of de novo discovered small RNA genes. Using ShortStack, we identified 16,024 distinct DCL-dependent small RNA producing loci and classified them into five groups based on predicted RNA secondary structure and predominant small RNA size. These Physcomitrella small RNA producing loci are now available on our developing web server. To investigate the features of heterochromatic siRNAs, we revisited the Physcomitrella genome to find functional orthologs of genes in the heterochromatic siRNA pathway. We identified candidate proteins that could be involved in the accumulation of heterochromatic siRNAs and created mutants for genetic analysis. Supported by consistent biological replicates, differential expression analysis of the small RNA-seq data revealed that the accumulation of siRNAs from 23-24 nt siRNA loci depends upon Physcomitrella homologs of DICER-LIKE 3 (DCL3), RNA-DEPENDENT RNA POLYMERASE 2 (RDR2), and the largest subunit of DNA-DEPENDENT RNA POLYMERASE IV (Pol IV), with the largest subunit of a Pol V homolog contributing to expression at a smaller subset of the loci. These data lead us to conclude that Physcomitrella utilizes a heterochromatic siRNA pathway fundamentally similar to that of flowering plants. We also identified a Physcomitrella-specific MINIMAL DICER-LIKE (mDCL) gene, whose product lacks the N-terminal helicase domain typical of DCL proteins but retains their ‘catalytic core’ (the PAZ domain and the twin RNase III domains).
We showed that Physcomitrella heterochromatic siRNAs are not composed solely of 24 nt siRNAs, as in angiosperms, but instead contain equal mixtures of 23 and 24 nt siRNAs. Interestingly, the Physcomitrella-specific mDCL is specifically required for 23 nt siRNA accumulation from these loci. Overall, our data lead us to conclude that heterochromatic siRNAs and their biogenesis pathways are largely, but not completely, identical between angiosperms and basal land plants, as represented by the bryophyte Physcomitrella patens. Significant effort has been devoted to small RNA gene annotation, but progress has been uneven, with MIRNA loci in particular receiving a disproportionate share of the attention. We believe that further efforts toward comprehensive and consistent reference annotations of all types of small RNA producing genes, and improvements in the dissemination of such annotations, will greatly benefit plant genomics. Our developing web server, which currently hosts small RNA gene annotations for just two species, Amborella trichopoda and Physcomitrella patens, is intended to serve this purpose. In particular, we look forward to the day when researchers studying small RNAs are freed from the need to "re-invent the wheel" by generating their own de novo annotations of small RNA-producing genes for each analysis.