Statistical Modeling of Image Semantics from Imperfectly Labeled Data Sets

Open Access
Sawant, Neela Kamlakar
Graduate Program: Information Sciences and Technology
Degree: Doctor of Philosophy
Document Type:
Date of Defense: April 24, 2013
Committee Members:
  • James Z Wang, Dissertation Advisor
  • Jia Li, Dissertation Advisor
  • John Millar Carroll, Committee Member
  • C Lee Giles, Committee Member
  • Reka Z Albert, Committee Member
Keywords:
  • statistical modeling
  • image annotation
  • instance-weighting
  • mixture models
  • transfer learning
  • domain adaptation
Computer vision is an integral aspect of cognitive computing, with diverse applications in medical diagnostics and health care, communication, transportation, entertainment, and data management. By autonomously processing visual inputs in the form of images and videos, it enables actionable, semantics-sensitive inferences in related problems such as object recognition, object detection, scene analysis, image annotation, and automatic image tagging. Supervised learning is a common paradigm of semantic inference: the association between words and visual features is learned from a pre-selected labeled training data set. Collecting labeled training data is a laborious process that is arguably the biggest bottleneck in scalable machine learning. To broaden the reach of these methods, it is imperative to devise self-learning machines that substitute for human input in seeking quality training data, inferring the linguistic and semantic relationships between different concepts, and extending to real-world applications.

This dissertation offers new algorithms that extend the scope of image annotation with minimal manual supervision. The first major contribution is a mixture modeling technique for clustering noisy data. Using this technique, I developed the ARTEMIS system, which selects training data from weakly labeled images on social media sharing Websites. An annotation system trained with ARTEMIS performs comparably to one trained on manually collected data. Secondly, I consider the problem of generating personalized annotations, that is, learning user-specific mappings from visual features to semantic labels. A light-weight personalization model is developed using a transfer learning framework and the analysis of local social networks. Finally, I present a domain adaptation technique in which training data from a labeled source domain is leveraged to annotate images from an unlabeled target domain. This technique is used to recognize emotions evoked by paintings, using the recorded emotional responses of human subjects to natural photographs.

The research presented in this dissertation advances the technology for making sense of image semantics in challenging real-world scenarios by leveraging user-contributed data on photo sharing Websites.
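To make the instance-weighting idea behind the keywords concrete: clustering weakly labeled data can be done with an expectation-maximization (EM) fit of a Gaussian mixture in which each data point carries a weight reflecting confidence in its label, so noisy points influence the components less. The sketch below is an illustrative, simplified version of this generic technique, not the ARTEMIS algorithm itself; the function name `weighted_gmm` and the spherical-covariance simplification are assumptions for the example.

```python
import numpy as np

def weighted_gmm(X, w, k=2, iters=50, seed=0):
    """Illustrative EM for a Gaussian mixture with per-instance weights.

    X : (n, d) data matrix; w : (n,) nonnegative instance weights.
    Uses spherical covariances for simplicity (a sketch, not ARTEMIS).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init means from the data
    var = np.full(k, X.var())                      # one spherical variance per component
    pi = np.full(k, 1.0 / k)                       # mixing proportions

    for _ in range(iters):
        # E-step: responsibilities under the current parameters
        dist2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)      # (n, k)
        logp = -0.5 * dist2 / var - 0.5 * d * np.log(var) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)                      # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: every sufficient statistic is scaled by w_i, so
        # low-confidence (noisy) points pull the components less
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12
        mu = (rw.T @ X) / nk[:, None]
        dist2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (rw * dist2).sum(axis=0) / (d * nk) + 1e-6
        pi = nk / nk.sum()
    return mu, var, pi
```

Down-weighting suspect instances rather than discarding them outright lets borderline data still contribute to the density estimate, which matters when labels from photo sharing Websites are noisy but not uniformly wrong.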