An Extension to the Parsimonious Topic Modeling

Open Access
Author:
Chen, Yezhou
Graduate Program:
Electrical Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
October 16, 2015
Committee Members:
  • David Jonathan Miller, Thesis Advisor
Keywords:
  • Bayesian Information Criterion (BIC)
  • Model selection
  • Parsimonious topic models
  • Generalized EM algorithm
Abstract:
In this thesis we develop a new model for estimating topics, based on the parsimonious topic model and Latent Dirichlet Allocation. In parsimonious models, a word's probability of occurrence under each topic is either shared across topics or specific to that topic, and a switch for each word-topic pair selects between the two. Our model adds a second set of switches that labels each word's switch subset (all of the switches for that word across all topics) as one of three cases: the word has a topic-shared occurrence probability in all topics; the word has a topic-specific occurrence probability in all topics; or the word has a topic-shared occurrence probability in some topics and a topic-specific one in the others. We use a generalized Expectation-Maximization algorithm to learn the parameters by minimizing the objective function. Numerical results are presented to evaluate the model's performance.
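
The switch structure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `classify_word` and `word_probability`, the data layout, and the toy numbers are all assumptions made for illustration, and the toy probabilities are not normalized. The sketch only shows how a per-word switch vector maps to the three cases and how a switch selects between a shared and a topic-specific probability.

```python
from typing import List

# Labels for the three word-level cases (names are hypothetical).
ALL_SHARED = "all_shared"
ALL_SPECIFIC = "all_specific"
MIXED = "mixed"


def classify_word(word_switches: List[bool]) -> str:
    """Label one word's switch vector.

    word_switches[j] is True when the word uses a topic-specific
    probability under topic j, and False when it uses the shared one.
    """
    if not any(word_switches):
        return ALL_SHARED
    if all(word_switches):
        return ALL_SPECIFIC
    return MIXED


def word_probability(word: int, topic: int,
                     switches: List[List[bool]],
                     shared: List[float],
                     specific: List[List[float]]) -> float:
    """Return the probability selected by the (word, topic) switch.

    shared[word] is the topic-shared value; specific[topic][word]
    is the topic-specific value.
    """
    if switches[word][topic]:
        return specific[topic][word]
    return shared[word]


# Toy example: 3 words, 3 topics (illustrative values only).
switches = [
    [False, False, False],  # word 0: shared in every topic
    [True, True, True],     # word 1: specific in every topic
    [True, False, True],    # word 2: specific in topics 0 and 2 only
]
shared = [0.5, 0.3, 0.2]
specific = [
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.3, 0.2, 0.5],
]

cases = [classify_word(row) for row in switches]
```

In this sketch the word-level case is derived from the word's switches. The model in the thesis treats the case indicator as its own set of switches, so the derivation here is a simplification.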