Machine Learning for Text Mining: Classification, Retrieval and Recommendation

Song, Yang

Open Access

Author:: Song, Yang
Graduate Program:: Computer Science and Engineering
Degree:: Doctor of Philosophy
Document Type:: Dissertation
Date of Defense:: October 17, 2008
Committee Members:: C Lee Giles, Dissertation Advisor/Co-Advisor
C Lee Giles, Committee Chair/Co-Chair
Wang Chien Lee, Committee Member
Jia Li, Committee Member
Jesse Louis Barlow, Committee Member
Bing Li, Committee Member
Keywords:: classification
machine learning
text classification
information retrieval
clustering
recommendation
Abstract:: The information explosion of the World Wide Web has brought continuous, rapid growth in information and data. As the amount of data grows, the need for efficient and effective management of information has also increased dramatically. As a result, using intelligent algorithms to discover new and useful information from existing data has become an active pursuit in recent computer and information science research. This thesis addresses the discovery of useful information from the textual content of data, as well as the efficient management and organization of that data. These research issues are usually referred to as text mining, a branch of the broad area of information retrieval research that contains many interesting and challenging problems and applications. In this thesis, we focus mainly on four issues in text mining: text classification (Chapters 2 and 3), text retrieval (Chapter 4), text recommendation (Chapter 5) and topic discovery (Chapter 6). Specifically, Chapter 2 proposes dimension reduction and collaborative filtering techniques to improve the scalability of text classification; Chapter 3 further addresses the performance of text classification by introducing a new nearest neighbor classification method; Chapter 4 deals with retrieving the correct named entities from the web and textual documents where the names are ambiguous; Chapter 5 deals with text recommendation for scientific documents and webpages; Chapter 6 aims at discovering dynamic topic trends and correlations in scientific documents; and Chapter 7 concludes the thesis. We also try to answer some difficult research questions based on our study.
