Three novel procedures to control the false discovery rate
Open Access
- Author:
- Philtron, Daisy Lahaina
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 26, 2014
- Committee Members:
- Debashis Ghosh, Dissertation Advisor/Co-Advisor
Qunhua Li, Committee Member
Murali Haran, Committee Member
Ross Cameron Hardison, Committee Member
- Keywords:
- Multiple testing
Reproducibility
Voronoi Diagram
Disjunction hypothesis
- Abstract:
- The field of multiple testing has seen a resurgence in the last twenty years after the seminal work of Benjamini and Hochberg (1995) that introduced the false discovery rate. With the proliferation of high-throughput data generation and very large-scale simultaneous testing problems in the arena of genetic analysis, the development of procedures to control the false discovery rate has taken on increased importance. In this dissertation we introduce three novel procedures with this specific goal. Each procedure is tailored to a different situation in multiple testing. The first procedure controls the false discovery rate when hypotheses are tested using next-generation sequencing data without experimental replication. In this situation the p-values used are discrete and, as a result, classical error control procedures are conservative. Existing approaches designed for discrete p-values require the complete specification of each p-value's distribution function. When a small number of p-values have complicated distribution functions, these approaches can be very slow. Our proposed procedure offers good error control properties, comparable power properties, and a computational advantage over existing procedures. We further propose a procedure developed specifically to test the disjunction hypothesis, which is appropriate when each gene or location studied is associated with multiple p-values of individual interest. This can occur when more than one aspect of an underlying process is measured. For example, cancer researchers may hope to detect genes that are both differentially expressed on a transcriptomic level and show evidence of copy number aberration. Currently used methods of p-value combination for this setting are overly conservative, resulting in very low power for detection. We introduce a method to test the disjunction hypothesis by using cumulative areas from the Voronoi diagram of two-dimensional vectors of p-values. Our method offers much improved power over existing methods, even in challenging situations, while maintaining appropriate error control. Finally, we introduce a non-parametric procedure to control the false discovery rate while identifying reproducibility from the results of replicated high-throughput experiments. Experiments of this type are important because their results can identify sets of genes or binding sites for focused follow-up studies; however, the variability from one experiment to another presents a well-known difficulty to researchers. We present a novel procedure to identify genes with consistent signals across replicated experiments. This procedure makes no model assumptions about reproducible genes and is free of tuning parameters. We show that it has good error control and power properties in a variety of settings, and we present some theoretical results.
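
For context, the classical step-up procedure of Benjamini and Hochberg (1995), cited at the start of the abstract, can be sketched as below. This is only an illustration of that baseline and not one of the three procedures proposed in the dissertation. The function name `benjamini_hochberg` and the default level `alpha=0.05` are assumptions chosen for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a small p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold. With discrete p-values, as in the first setting the abstract describes, this baseline tends to be conservative.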