IMPROVING RELEVANCE RANKING FOR UNDERSPECIFIED QUERIES

Zhuang, Ziming

Open Access

Author:: Zhuang, Ziming
Graduate Program:: Information Sciences and Technology
Degree:: Master of Science
Document Type:: Master Thesis
Date of Defense:: March 04, 2008
Committee Members:: C Lee Giles, Thesis Advisor/Co-Advisor
Lee Giles And Prasenjit Mitra, Thesis Advisor/Co-Advisor
Prasenjit Mitra, Thesis Advisor/Co-Advisor
Keywords:: search query
relevance ranking
information retrieval
search engine
data mining
Abstract:: Search engines have become indispensable gateways to the vast amount of information on the Web. Because a large number of webpages is available on any given topic, search results are usually displayed in descending order of their relevance to the query. Because users typically browse only the first few pages of search results, the quality of relevance ranking is critical to the search experience. In this thesis, we address a challenging issue for relevance ranking in Web search: underspecified queries. To improve the quality of relevance ranking for underspecified queries, we exploit user feedback from two different perspectives. In the first part of this thesis, we address two common problems of underspecified queries. The first problem is that the top-ranked results for underspecified queries may not contain information that is truly relevant to the user's search intent, because of the large number of pages that could match the query. The second problem is that new webpages, even when relevant, may not be ranked highly for an underspecified query because of their freshness. We propose to investigate what we call the query context, i.e., the distributional information of past queries in search engine query logs, to refine the relevance ranking of the search results. Empirical evaluation shows that our approach improves on the current ranking system of a large-scale commercial Web search engine for 82% of the queries. In the second part of the thesis, we study the modeling of collective expertise. We present a novel collaborative ranking model inspired by network flow theory, which constructs a network based on search engine logs to describe the relationships among the entities in collaborative search: collaborators, queries, and documents. This formal model permits the theoretical investigation of the nature of collaborative ranking in more concrete terms, and the learning of the dependence relations among these heterogeneous entities. We then propose FlowRank, a collaborative ranking algorithm derived from this model through an analysis of empirical usage patterns. We also discuss the implementation and evaluation of FlowRank, and report improvements over two baseline ranking algorithms.
