LIKELIHOOD-TUNED DENSITY ESTIMATOR AND ITS APPLICATION TO CLUSTERING
Open Access
- Author:
- Chung, Yeojin
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 25, 2010
- Committee Members:
- Bruce G Lindsay, Dissertation Advisor/Co-Advisor
Jia Li, Dissertation Advisor/Co-Advisor
Bruce G Lindsay, Committee Chair/Co-Chair
Jia Li, Committee Chair/Co-Chair
Bing Li, Committee Member
David Russell Hunter, Committee Member
Jesse Louis Barlow, Committee Member
- Keywords:
- Nonparametric Density Estimation
Nonparametric Maximum Likelihood
Clustering
Bandwidth Selection
Nonparametric Mixture
- Abstract:
- Nonparametric density estimation is widely used for investigating underlying features of data. We introduce a likelihood-enhanced nonparametric density estimator which arises from treating the kernel density estimator as an element of the model consisting of all mixtures of the kernel, continuous or discrete. One can obtain the kernel density estimator by "likelihood-tuning," using the uniform density as the starting value in an EM algorithm. We prove algorithmic convergence of this EM algorithm to the nonparametric mixture maximum likelihood estimator. A second tuning step leads to a fitted density with higher likelihood than the kernel density estimator. This twice-tuned density estimator reduces the bias of the kernel density estimator while keeping the variance of the same order. Our simulation study shows that the twice-tuned estimator performs robustly across density types, although this advantage weakens relative to a competing estimator as the data dimension grows. Beyond the choice of density estimator, bandwidth selection is crucial in nonparametric density estimation, particularly in higher dimensions. We introduce a new bandwidth selection method using the spectral degrees of freedom (sDOF) introduced in Lindsay et al. (2008). Investigating the theoretical sDOF and simulation results, we find that the bandwidth needs to increase in proportion to the square root of the dimension to achieve adequate smoothing in higher dimensions. We also develop a penalized version of the likelihood-tuning procedure that allows the mixture model to adapt to local shape and scale features. With the first tuning, this model gives the kernel density estimator with a t-kernel. The second penalized tuning leads to a density estimator with local shape adaptation in the t-kernel. We compare the performance of the new density estimators with the unpenalized likelihood-tuned density estimators.
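The sketch below illustrates the likelihood-tuning idea described in the abstract in its discrete-support form: a mixture of kernels placed at the data points, whose mixing weights are updated by EM. With uniform weights the mixture coincides with the ordinary kernel density estimator, and each EM reweighting step raises the likelihood. The Gaussian kernel, the bandwidth value, and all function names here are illustrative assumptions, not the dissertation's exact formulation.

```python
# Minimal sketch of "likelihood-tuning" as EM weight updates for a discrete
# mixture of Gaussian kernels supported on the data points (an assumed,
# simplified version of the procedure described in the abstract).
import numpy as np

def gaussian_kernel_matrix(x, h):
    """K[i, j] = Gaussian kernel with bandwidth h evaluated at x_i - x_j (1-d data)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))

def em_weight_update(K, w):
    """One EM step for the mixing weights, support points held fixed."""
    resp = K * w[None, :]                      # unnormalized responsibilities
    resp /= resp.sum(axis=1, keepdims=True)    # normalize over components
    return resp.mean(axis=0)                   # updated mixing weights

def tuned_density(x, h, n_tuning_steps=1):
    """Mixture density after n_tuning_steps EM reweightings.

    n_tuning_steps=0 recovers the usual kernel density estimator
    (uniform weights); each further step cannot decrease the likelihood.
    """
    K = gaussian_kernel_matrix(x, h)
    w = np.full(len(x), 1.0 / len(x))          # uniform starting weights -> KDE
    for _ in range(n_tuning_steps):
        w = em_weight_update(K, w)
    return lambda grid: (np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
                         / (h * np.sqrt(2 * np.pi))) @ w

# Example: the reweighted estimator has a higher in-sample log-likelihood.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
for steps in (0, 1, 2):
    f = tuned_density(x, h=0.4, n_tuning_steps=steps)
    print(steps, np.log(f(x)).sum())
```

In this simplified setting the uniform-weight starting point already equals the kernel density estimator, so the printed log-likelihoods are nondecreasing in the number of tuning steps, which is the monotonicity property the abstract attributes to the second tuning.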