Analysis of Arabidopsis thaliana RNA Dependent RNA Polymerase Mutants Reveals Novel Small RNAs and Improves Existing Annotations

Open Access
Polydore, Seth
Graduate Program:
Doctor of Philosophy
Document Type:
Date of Defense:
November 14, 2018
Committee Members:
  • Michael Axtell, Dissertation Advisor
  • Claude Walker Depamphilis, Committee Chair
  • Michael Axtell, Committee Member
  • Charles T Anderson, Committee Member
  • Naomi S Altman, Outside Member
Keywords:
  • small RNAs
  • micro RNA
  • phased small RNA
  • heterochromatic small RNA
  • rdr1/2/6
Abstract:
Small RNAs are molecules that regulate key physiological functions in land plants through transcriptional and post-transcriptional silencing. Small RNAs can be divided into two major categories: microRNAs (miRNAs), which are precisely processed from single-stranded, stem-loop RNA precursors, and short interfering RNAs (siRNAs), which are derived from double-stranded RNA precursors. Small RNA precursors are processed by RNASEIII-like endonucleases known as DICER-LIKE (DCL) proteins, which cleave the precursors into 20-24 nt double-stranded RNA molecules. One strand of each double-stranded molecule is loaded into an ARGONAUTE (AGO) protein, which uses the loaded single-stranded RNA to target other RNA molecules for repression. Depending on the type of small RNA, a third protein family is also necessary for biogenesis: RNA DEPENDENT RNA POLYMERASE (RDR). RDRs convert single-stranded RNAs into double-stranded RNAs, which can then be processed by DCL proteins.
Small RNAs have been the subject of intense research since their discovery, and much has been learned about their modes of biogenesis and their functions. However, some questions remain unanswered and some issues have arisen. First, multi-mapping reads make up the majority of a typical land plant small RNA transcriptome. Because multi-mapped reads are difficult to handle, most small RNA data in a typical study are ultimately ignored or incorrectly aligned. Second, many known types of small RNAs are incompletely or incorrectly annotated; many MIRNA annotations in particular are known to be erroneous. In my research, I use mutants of the RDR genes known to be involved in small RNA biogenesis to re-examine existing annotations of different types of small RNAs in Arabidopsis thaliana. I also use these mutants to determine whether the A. thaliana genome encodes other, previously unknown types of small RNAs. Ultimately, I found 58 erroneous MIRNA annotations and 38 small RNA loci that do not follow any known pathway of small RNA biogenesis.
siRNAs can be further subdivided into several groups, including phased siRNAs (phasiRNAs) and heterochromatic siRNAs (hc-siRNAs). Predominantly 24 nt in length, hc-siRNAs are produced through the biochemical actions of DCL3, DNA DEPENDENT RNA POLYMERASE (Pol) IV, and RDR2. phasiRNAs are produced through the biochemical actions of Pol II, RDR6, and DCL4. In every land plant sequenced to date, 21-22 nt-dominated phasiRNAs are well represented, but 24 nt-dominated phasiRNAs have so far been identified only in certain monocots, especially Asparagus officinalis, Hemerocallis lilioasphodelus, Lilium maculatum, and the Poaceae family. I was interested in determining whether the A. thaliana genome encodes 24 nt-dominated phasiRNAs. I therefore used multiple methods to carefully examine A. thaliana 24 nt-dominated loci that consistently passed PHAS-locus detection algorithms, and found that they are likely just hc-siRNA loci. Because 24 nt-dominated siRNA loci are very numerous in angiosperms, they are a potential source of false positives in searches for phasiRNA-generating loci. Overall, this analysis shows that using existing phasing score algorithms to detect novel PHAS loci can lead to false positives.