Two Topics: A Jackknife Maximum Likelihood Approach to Statistical Model Selection and a Convex Hull Peeling Depth Approach to Nonparametric Massive Multivariate Data Analysis with Applications

Open Access
- Author:
- Lee, Hyunsook
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 16, 2006
- Committee Members:
- G Jogesh Babu, Committee Chair/Co-Chair
Thomas P Hettmansperger, Committee Member
Bing Li, Committee Member
William Kenneth Jenkins, Committee Member
James Landis Rosenberger, Committee Member
- Keywords:
- Kullback-Leibler distance
information criterion
statistical model selection
jackknife
jackknife information criterion
bias reduction
maximum likelihood
unbiased estimation
non-nested model
computational geometry
convex hull
convex hull peeling
statistical data depth
nonparametric statistics
generalized quantile process
descriptive statistics
skewness
kurtosis
volume functional
convex hull level set
massive data
outlier detection
balloon plot
multivariate analysis
- Abstract:
- This dissertation presents two topics from opposite realms: one parametric, the other based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications to massive astronomical data. First, we present a model selection criterion that minimizes the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model selection problems, arises from estimating the distance between an unknown true model and an estimated model. We show that (a) the jackknife maximum likelihood estimator is consistent for the parameter of interest, (b) the jackknife estimate of the log likelihood is asymptotically unbiased, and (c) the stochastic order of the jackknife log likelihood estimate is $O(\log \log n)$. Because of these properties, the jackknife information criterion is applicable to problems of choosing a model from non-nested candidates, especially when the true model is unknown. Compared to popular information criteria that are only applicable to nested models, the jackknife information criterion is more robust in filtering various types of candidate models to choose the best approximating model. However, this robust method has the drawback that the jackknife criterion is unable to discriminate between nested models.
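The bias-reduction idea can be illustrated with the generic leave-one-out jackknife. The sketch below is not the dissertation's jackknife information criterion, whose exact definition is not given in this abstract; it only applies the textbook correction $n\,T_n - (n-1)\,\overline{T}_{(-i)}$ to a statistic. The function names and the normal candidate model are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def jackknife_bias_corrected(stat, data):
    """Generic jackknife bias correction (assumed textbook form):
    n * T(data) - (n - 1) * mean of T over the leave-one-out samples."""
    n = len(data)
    full = stat(data)
    # Recompute the statistic n times, each time dropping one observation.
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * sum(leave_one_out) / n


def mean_gaussian_loglik(data):
    """Average log likelihood per observation of a normal model evaluated
    at its maximum likelihood estimates (hypothetical candidate model)."""
    var = statistics.pvariance(data)  # MLE of the variance (divides by n)
    return -0.5 * (math.log(2 * math.pi * var) + 1.0)


random.seed(0)
sample = [random.gauss(1.0, 2.0) for _ in range(200)]

plug_in = mean_gaussian_loglik(sample)
corrected = jackknife_bias_corrected(mean_gaussian_loglik, sample)
print(plug_in, corrected)
```

A familiar check on the correction: applied to the plug-in variance (divisor $n$), it returns the unbiased sample variance (divisor $n-1$).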
Next, we explore the convex hull peeling process to develop empirical tools for statistical inference on massive multivariate data. The convex hull and its peeling process have intuitive appeal for robust location estimation. We define the convex hull peeling depth, which makes it possible to order multivariate data. This depth-based ordering provides ways to obtain multivariate quantiles, including the median. Based on the generalized quantile process, we define a convex hull peeling central region, a convex hull level set, and a volume functional, which lead to one-dimensional mappings that describe the shapes of multivariate distributions along data depth. We define empirical skewness and kurtosis measures based on the convex hull peeling process. In addition to these empirical descriptive statistics, we propose several methods for finding multivariate outliers in massive data sets. These outlier detection algorithms are (1) estimating multivariate quantiles up to the level $\alpha$, (2) detecting changes in a measure sequence of convex hull level sets, and (3) constructing a balloon to exclude outliers. The convex hull peeling depth is robust, so that the presence of outliers does not affect the properties of inner convex hull level sets. We illustrate these characteristics of the convex hull peeling process on bivariate synthetic data sets. We show that these empirical procedures are applicable to real massive data sets by analyzing quasars and galaxies from the Sloan Digital Sky Survey. Interesting scientific results from the convex hull peeling multivariate data analysis are also provided.
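The peeling process itself can be sketched for bivariate data as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: depth is taken to be the index of the hull layer on which a point sits (1 for the outermost hull), hulls are computed with Andrew's monotone chain, and points lying on a hull edge without being vertices are deferred to the next layer. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Convex hull vertices of 2D points via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop while the last turn is clockwise or collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def peeling_depth(points):
    """Map each distinct point to its peeling layer (1 = outermost hull)."""
    remaining = set(points)
    depth = {}
    level = 0
    while remaining:
        level += 1
        # Peel the current hull and continue with the interior points.
        for p in hull_vertices(list(remaining)):
            depth[p] = level
            remaining.discard(p)
    return depth


# Two nested squares around a central point (synthetic example).
data = [(0, 0), (4, 0), (4, 4), (0, 4),
        (1, 1), (3, 1), (3, 3), (1, 3),
        (2, 2)]
depth = peeling_depth(data)
deepest = max(depth, key=depth.get)  # a depth-based median candidate
print(depth, deepest)
```

Here the outer square is peeled first, the inner square second, and the central point last, so the deepest point serves as a median-like location estimate that the outermost points do not influence.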