Multiple Hypothesis Testing of Discrete Data

Open Access
Austin, Stefanie Rose
Graduate Program: Master of Science
Document Type: Master Thesis
Date of Defense:
Committee Members:
  • Naomi S Altman, Thesis Advisor
Keywords:
  • statistics
  • hypothesis testing
  • multiple testing
  • false discovery rate
  • FDR
  • discrete
  • Benjamini and Hochberg
  • null hypotheses
Multiple hypothesis testing remains an area of great interest as data sets with large numbers of variables continue to surface in many fields, including genomics and image analysis. While much work has been done for continuous test statistics, comparatively little addresses discrete data. This thesis addresses the problems posed by high-dimensional, independent, discrete tests, namely estimating the number of true null hypotheses ('pi0') and controlling the false discovery rate (FDR). The 'pi0' estimators under consideration include those developed by Nettleton, Pounds and Cheng, Storey, and Bancroft and Nettleton, as well as a more recently developed regression procedure. We also apply Tarone's idea of removing tests that have no power before estimating 'pi0', giving eight methods in total.

We generated two types of data, each with two configurations, and simulated data for eleven values of 'pi0' and two values of m, the total number of simultaneous tests. These data sets were then used to compare the eight methods for estimating 'pi0'. Several estimators proved useful for our data, including those developed for continuous test statistics. With discrete data it is possible for a test to have zero power, and filtering out such tests generally improves the estimate of 'pi0'.

We then used the same data to compare eleven procedures for controlling the FDR. Each procedure provides a rule for when to reject a hypothesis while maintaining the FDR at a specified level, usually q = 0.05. The most common methods are based on the Benjamini-Hochberg (BH) procedure, developed in 1995 for continuous tests. Adaptive methods improve on these by replacing the total number of hypotheses in the algorithm with an estimate of 'pi0'. Filtering out tests with zero power generally improves the power of the remaining tests after the FDR adjustment.
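To make the quantities above concrete, here is a minimal Python sketch, not taken from the thesis, of Storey's 'pi0' estimator and the BH step-up rule; the function names, the tuning choice lambda = 0.5, and the way the adaptive variant is exposed (passing pi0 < 1) are illustrative assumptions:

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey's estimator: the fraction of p-values above lam,
    rescaled by the width of the interval (lam, 1]."""
    pvals = np.asarray(pvals)
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

def bh_reject(pvals, q=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up rule at level q.
    Passing pi0 < 1 replaces m with pi0 * m, i.e. the adaptive
    variant.  Returns a boolean array: True = reject."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Step-up thresholds i * q / (pi0 * m) for i = 1, ..., m.
    thresh = q * np.arange(1, m + 1) / (pi0 * m)
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        # Reject everything up to the largest i with p_(i) <= threshold.
        reject[order[: below[-1] + 1]] = True
    return reject
```

A smaller effective m (via pi0 < 1) raises every threshold, which is why adaptive procedures can reject more hypotheses at the same FDR level.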
We conclude by presenting areas for future research, including accounting for possible dependence among tests and further developing methods for discrete data.
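The zero-power filtering discussed above can be illustrated with a one-sided binomial test, where the discreteness of the test statistic bounds the smallest attainable p-value. This is a sketch under assumed conditions (the binomial setting and the alpha = 0.05 cutoff are illustrative, not the thesis's exact configuration):

```python
def min_attainable_p(n):
    # One-sided binomial test of H0: p = 0.5 with n trials.
    # The smallest attainable p-value occurs when all n trials
    # succeed: P(X >= n) = 0.5 ** n.
    return 0.5 ** n

def tarone_filter(sample_sizes, alpha=0.05):
    # Tarone-style filter: keep only tests that could possibly
    # reject at level alpha; the rest have zero power and are
    # dropped before estimating pi0 or adjusting for FDR.
    return [n for n in sample_sizes if min_attainable_p(n) <= alpha]
```

For example, a test with only four trials can never reach p <= 0.05 (its smallest p-value is 0.0625), so it contributes nothing but noise to 'pi0' estimation and FDR adjustment.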