False Discovery Rates when the Statistics are Discrete

Open Access
- Author:
- Dialsingh, Isaac
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- December 01, 2011
- Committee Members:
- Naomi S Altman, Dissertation Advisor/Co-Advisor
Naomi S Altman, Committee Chair/Co-Chair
Debashis Ghosh, Committee Member
James Landis Rosenberger, Committee Member
Bruce G Lindsay, Committee Member
Claude Walker Depamphilis, Committee Member
Runzi Li, Special Member
- Keywords:
- false discovery rate
discrete test statistics
- Abstract:
- While much work has gone into the study of the Family-Wise Error Rate (FWER) and the False Discovery Rate (FDR) when the test statistics are continuous, comparatively little has addressed FDR control for discrete test statistics. This thesis addresses three problems concerning high-dimensional discrete tests. The first is the estimation of the proportion of true null hypotheses ($\pi_{0}$) when the test statistics are discrete. When the test statistics are continuous, the distribution of the p-values under the null hypothesis is Uniform(0,1), which has made it relatively easy to estimate $\pi_{0}$. When the test statistics are continuous and independent, the Benjamini and Hochberg (1995) FDR-controlling procedure \cite{BH1995}, which is based on a p-value algorithm, controls the FDR exactly at $\pi_{0}q_{FDR}$, where $q_{FDR}$ is the target FDR level. Estimation of $\pi_{0}$ is critical since it is an input to other forms of FDR such as the local FDR (\textit{l}FDR). When the test statistics are discrete, the distribution of the p-values is quite \textit{erratic}, which makes $\pi_{0}$ difficult to estimate. Secondly, the FDR-controlling procedures developed in the past have focused on continuous test statistics, yet categorical data from high-throughput experiments are becoming more common. Gilbert (2005) \cite{Gilbert2005} developed a method for FDR control for discrete test statistics. In this thesis we introduce an adaptive version of Gilbert's method and discuss its advantages.

Finally, we address the problem of finding the local FDR (\textit{l}FDR) when the test statistics are discrete. The local FDR of the $i^{th}$ hypothesis is defined as
\begin{equation}
\mathit{l}FDR_{i} = \frac{\pi_{0}f_{0i}}{\pi_{0}f_{0i} + \pi_{1}f_{1i}}
\end{equation}
where $f_{0i}$ and $f_{1i}$ are the densities under the null and the alternative, respectively, for the test statistic of the $i^{th}$ hypothesis.
In addition, $\pi_{1} = 1 - \pi_{0}$ is the proportion of truly non-null hypotheses. Efron (2005) \cite{Efron2005} pioneered work on the case where the test statistics are continuous. However, one drawback of his method is that it relies on a density estimate for $f = \pi_{0}f_{0} + \pi_{1}f_{1}$. This method works well only when $\pi_{0}$ is large. We propose rewriting the equation for the \textit{l}FDR as
\begin{equation}
\mathit{l}FDR_{i} = \frac{1}{1 + \frac{\pi_{1}}{\pi_{0}}\frac{f_{1i}}{f_{0i}}}
\end{equation}
and using the likelihood ratio estimator $\widehat{\frac{f_{1i}}{f_{0i}}}$ to estimate $\frac{f_{1i}}{f_{0i}}$. We first show that this estimator gives reasonable results for the continuous case. Thereafter, we show how this method can be extended to discrete data.
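
As a small numerical illustration of the two expressions for the local FDR given in the abstract, the following Python sketch checks that the mixture form and the likelihood-ratio form agree for a discrete test statistic. It is not the estimator developed in the thesis: here the null and alternative distributions are assumed to be known binomial pmfs, and the values of `n`, `p_null`, `p_alt`, and `pi0`, as well as the function names, are hypothetical choices made only for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability mass function P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lfdr_mixture(pi0, f0, f1):
    """Local FDR in mixture form: pi0*f0 / (pi0*f0 + pi1*f1)."""
    pi1 = 1.0 - pi0
    return pi0 * f0 / (pi0 * f0 + pi1 * f1)


def lfdr_ratio(pi0, ratio):
    """Local FDR in likelihood-ratio form: 1 / (1 + (pi1/pi0) * f1/f0)."""
    pi1 = 1.0 - pi0
    return 1.0 / (1.0 + (pi1 / pi0) * ratio)


# Hypothetical setting (assumed for illustration, not taken from the thesis):
# each test statistic is a count out of n trials, with success probability
# p_null under the null and p_alt under the alternative.
n, p_null, p_alt, pi0 = 10, 0.5, 0.8, 0.9

rows = []
for k in range(n + 1):
    f0 = binom_pmf(k, n, p_null)  # null pmf at the observed count
    f1 = binom_pmf(k, n, p_alt)   # alternative pmf at the observed count
    rows.append((k, lfdr_mixture(pi0, f0, f1), lfdr_ratio(pi0, f1 / f0)))

for k, mix, rat in rows:
    print(f"k={k:2d}  mixture form={mix:.6f}  ratio form={rat:.6f}")
```

The ratio form depends on the data only through $f_{1i}/f_{0i}$, so in this sketch replacing the known ratio `f1 / f0` with an estimate of it is the only change needed to move from known to estimated distributions.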