MIXTURE INFERENCE AT THE EDGE OF IDENTIFIABILITY

Open Access
Author:
KIM, DAEYOUNG
Graduate Program:
Statistics
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
June 16, 2008
Committee Members:
  • Bruce G Lindsay, Committee Chair
  • Thomas P Hettmansperger, Committee Member
  • Bing Li, Committee Member
  • Jia Li, Committee Member
  • James Z Wang, Committee Member
Keywords:
  • Local identifiability
  • Asymptotic identifiability
  • Nonidentifiability
  • Finite mixture models
  • Estimation of parameters
  • Labelling confidence
Abstract:
Parameter identifiability is important when one wishes to make inferences in a statistical model. Finite mixture models exhibit two important nonidentifiabilities: boundary nonidentifiability and label nonidentifiability. Although the parameters are not identifiable in the strict sense, in this thesis we show that there is a form of asymptotic identifiability that can provide reasonable answers when the component densities are well separated relative to the sample size. Asymptotic identifiability is related to local identifiability. Very few studies address the role of asymptotic identifiability and the two key nonidentifiabilities in inference for the mixture model, especially when the sample size is not large. In this thesis, we examine the concept of local identifiability and investigate estimation, labelling of parameter estimators, and testing for the number of components when the identifiability of the finite mixture model is weak relative to the sample size. We then propose new methods that address several drawbacks of existing methods. For estimation of parameters, we propose using the quadratic inference function method (Qu (1998), Park (2000), and Lindsay and Qu (2003)) as an alternative to maximum likelihood estimation in finite mixture models. For labelling of parameter estimators, we develop two methods: a likelihood-based labelling confidence assessment using a new simulation-based visualization, and a labelling method for parametric bootstrap analyses that is based on clustering the bootstrap estimates using a permutation mixture model. Note that the simulation-based visualization method is a generally useful tool for picturing any inference function and its confidence sets in a frequentist framework. For testing for the number of components, we propose two new tests, an eigenvalue ratio test based on the information matrix and a quadratic inference function test, and investigate their potential.
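The two nonidentifiabilities named in the abstract can be illustrated numerically. The following is a minimal sketch and is not taken from the thesis: it assumes a univariate normal mixture, and the function name mixture_loglik, the simulated data, and all parameter values are hypothetical choices made only for illustration. It checks that relabelling the components leaves the log-likelihood unchanged (label nonidentifiability) and that a two-component mixture with a zero weight or with identical components gives the same log-likelihood as a single component (boundary nonidentifiability, as that term is commonly understood).

```python
import itertools
import numpy as np

def mixture_loglik(x, weights, means, sds):
    """Log-likelihood of a univariate normal mixture (illustrative helper)."""
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    w = np.asarray(weights, dtype=float)[None, :]    # shape (1, k)
    mu = np.asarray(means, dtype=float)[None, :]
    sd = np.asarray(sds, dtype=float)[None, :]
    # Normal density of every observation under every component, shape (n, k)
    dens = np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    # Mixture density is the weighted sum over components
    return float(np.sum(np.log((w * dens).sum(axis=1))))

# Hypothetical data: 150 draws from three assumed normal components
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 50),
                    rng.normal(0.0, 1.0, 50),
                    rng.normal(4.0, 1.5, 50)])

weights = np.array([0.3, 0.3, 0.4])
means = np.array([-3.0, 0.0, 4.0])
sds = np.array([1.0, 1.0, 1.5])

# Label nonidentifiability: all 3! = 6 relabellings give the same value
label_logliks = []
for perm in itertools.permutations(range(3)):
    idx = list(perm)
    label_logliks.append(mixture_loglik(x, weights[idx], means[idx], sds[idx]))
labels_equivalent = np.allclose(label_logliks, label_logliks[0])

# Boundary nonidentifiability: two "different" two-component
# parameter points that both collapse to one N(0, 1) component
one_component = mixture_loglik(x, [1.0], [0.0], [1.0])
zero_weight = mixture_loglik(x, [1.0, 0.0], [0.0, 7.0], [1.0, 2.0])
equal_components = mixture_loglik(x, [0.25, 0.75], [0.0, 0.0], [1.0, 1.0])
boundary_equivalent = np.allclose([zero_weight, equal_components], one_component)

print("relabellings equivalent:", labels_equivalent)
print("boundary points equivalent:", boundary_equivalent)
```

In the zero-weight case the second component's mean and standard deviation can take any values without changing the likelihood, and in the equal-components case the mixing weight can, which is why the data cannot pin down those parameters at such points.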