Exploiting user-generated data for knowledge discovery and recommendation

Liu, Haibin

Open Access

Author:: Liu, Haibin
Graduate Program:: Information Sciences and Technology
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: June 05, 2014
Committee Members:: Dongwon Lee, Dissertation Advisor/Co-Advisor
John Yen, Committee Member
Guoray Cai, Committee Member
Fuqing Zhang, Committee Member
Keywords:: data mining
machine learning
recommender system
social media
user-generated data
Abstract:: The Internet and Web 2.0 have grown rapidly and become ubiquitous in recent years. Advances in information technology have also enabled users to generate data, implicitly or explicitly, on an unprecedented scale. Consequently, the need to discover and exploit new and useful knowledge from such data has increased considerably. In this thesis, we investigate user-generated data to discover interesting knowledge and enable better recommendation services. First, we tackle the problem of location type classification using individual Twitter messages. We extend probabilistic text classification models to incorporate temporal features and user history information as probabilistic priors, and show that the proposed models effectively improve classification accuracy. Second, we study the problem of quantifying the notion of political legitimacy for specific populaces using collective Twitter messages. We design a framework that aggregates a large number of tweets into a final legitimacy score for a populace by leveraging probabilistic topic modeling and sentiment analysis techniques. Our empirical evaluation on eight sample countries using related public tweets shows that the scores produced by our framework correlate strongly with results reported in the political science literature. We also apply this framework to a traditional news media data set and compare the results with those obtained from Twitter data, discovering several interesting differences between the two media for this quantification task. Third, we study the problem of mining implicit user feedback in recommendation systems. In particular, we tackle the cold-start problem of video recommendation using users' co-view information.
We propose a classification framework that incorporates co-view information from previously seen video pairs and learns the weights of video attributes for ranking candidate videos to recommend, yielding encouraging recommendation results. Finally, as a way to exploit social networks for recommendation, we study the problem of recommending the best team for a given set of roles or skills, considering both individual and team characteristics. To capture team-level features quantitatively, we take into account various social networks among people derived from project history and many other online activities. Moreover, we learn feature weights from the training dataset based on the correlation between features and project outcomes, and apply a combinatorial optimization algorithm to search for an approximately best team. We validate our approach experimentally in a real business scenario and also compare it with other state-of-the-art methods on the public DBLP dataset. The results demonstrate the effectiveness of our approach.
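The first contribution folds temporal features and user history into a probabilistic text classifier as priors. The dissertation's exact models are not given here, so the following is only a rough sketch under stated assumptions. It assumes a multinomial Naive Bayes classifier in which a smoothed blend of the user's past location-type labels replaces the global class prior, and a per-class hour-of-day distribution is multiplied in as an extra factor. The class name `LocationTypeNB`, the `history_weight` parameter, and the toy tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class LocationTypeNB:
    """Hypothetical multinomial Naive Bayes with temporal and user-history factors.

    score(c) = log P(c | user history) + log P(hour | c) + sum_w log P(w | c)
    """

    def __init__(self, alpha=1.0, history_weight=1.0):
        self.alpha = alpha                    # Laplace smoothing for word and hour counts
        self.history_weight = history_weight  # pseudo-count weight of the global prior (must be > 0)
        self.word_counts = defaultdict(Counter)
        self.hour_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def fit(self, tweets):
        # tweets: iterable of (tokens, hour in 0..23, location-type label)
        for tokens, hour, label in tweets:
            self.class_counts[label] += 1
            self.hour_counts[label][hour] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_prior(self, label, user_history):
        # Assumed prior: the user's own label counts, smoothed toward the global
        # class distribution with history_weight pseudo-counts.
        total = sum(self.class_counts.values())
        global_p = self.class_counts[label] / total
        n_user = sum(user_history.values())
        p = (user_history.get(label, 0) + self.history_weight * global_p) / (
            n_user + self.history_weight
        )
        return math.log(p)

    def _log_hour(self, label, hour):
        # Smoothed P(hour | class) over 24 hour-of-day bins.
        n = self.class_counts[label]
        return math.log((self.hour_counts[label][hour] + self.alpha) / (n + 24 * self.alpha))

    def _log_words(self, label, tokens):
        # Smoothed P(word | class); out-of-vocabulary tokens are ignored.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        return sum(math.log((counts[w] + self.alpha) / denom) for w in tokens if w in self.vocab)

    def predict(self, tokens, hour, user_history=None):
        user_history = user_history or {}
        scores = {
            c: self._log_prior(c, user_history) + self._log_hour(c, hour) + self._log_words(c, tokens)
            for c in self.class_counts
        }
        return max(scores, key=scores.get)


# Toy usage with made-up data.
train = [
    (["coffee", "latte"], 8, "cafe"),
    (["espresso", "morning"], 9, "cafe"),
    (["beer", "friends"], 22, "bar"),
    (["cocktail", "night"], 23, "bar"),
]
model = LocationTypeNB().fit(train)
print(model.predict(["coffee"], 8))               # cafe
print(model.predict(["hello"], 12, {"bar": 3}))   # bar: only the user-history prior separates the classes
```

In the second call, the token is unseen and the hour is uninformative, so the user's history alone decides the label. This is the kind of effect the priors are meant to capture when a single tweet carries little signal.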
