Route Extraction, Road Name Disambiguation and Efficient Spatial Query Processing under Location Constraints

Open Access
Author:
Zhang, Xiao
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
July 09, 2012
Committee Members:
  • Prasenjit Mitra, Dissertation Advisor
  • Prasenjit Mitra, Committee Chair
  • Daniel Kifer, Committee Member
  • Robert Collins, Committee Member
  • Alan Maceachren, Committee Member
Keywords:
  • human generated route descriptions
  • route information extraction
  • road name disambiguation
  • spatial query processing
Abstract:
Geospatial information has drawn more and more attention from both academia and industry nowadays. As web and mobile technology thrives, publishing, sharing and retrieving of geospatial information have become much easier and more common than before. People generate large amount of text contents containing geospatial information every day, such as travel blogs, reviews of interesting places to visit in a new city, human-generated route directions, etc. Such documents are rich in mentions of geospatial objects and accounts of movements. Meanwhile, the demand for efficient geo-spatial object retrieval under location-based constraints is growing fast. Digital maps, GPS on mobile devices are products under such in- creasing demand. However, even if geospatial information sources are plentiful and available, fulfilling the information need remains difficult because: (1) it remains a hard problem to identify and extract different types of geospatial information from text, and (2) building an accurate mapping between the textual mentions of landmarks, places to their unique latitudes and longitudes and recovering them on maps are very challenging due to the ambiguity in text. The absence of good solutions to the above-mentioned problems makes such geospatial information in text hard to use. In addition, there lacks efficient algorithms for geospatial infor- mation retrieval given various location-based constraints. Such retrieval tasks are even more challenging on mobile devices since they suffer from limited storage, battery, computational power and update issues. A system which automatically extracts, disambiguates and geo-tags geospatial information in text and provides efficient solutions to location-based services will greatly satisfy people’s demand of sharing and retrieving of geospatial information. In this dissertation, we intro- duce our efforts towards this goal. We developed: (1) a solution for automatic route components and destination name extraction from text documents contain- ing human-generated route directions. We explored and designed a variety of machine learning features and utilized sequence labeling machine learning models to identify meaningful route information from text. (2) a noval road name disam- biguation algorithm specifically designed to work well in noisy environment, i.e., in the presence of inaccurate data and incomplete gazetteer. Although toponymn (place name) disambiguation has been studied extensively, existing methods fail to solve our problem. We creatively model the problem to be an exact-all-hop shortest path problem on a special type of graph, namely semi-complete directed k-partite graph, and developed a computationally-efficient algorithm to solve it. and (3) an efficient algorithm for a particular type of geospatial object retrieval task on mobile devices. The mobile device has the ability to access multiple broadcast channels simultaneously. Under this setting, we solved the transitive nearest neigh- bor query efficiently, which means to find two objects of different types so that the total traveling distance from the query point to these two objects is minimum. An optimization algorithm was also developed to optimize the performance. We show these approaches in details and demonstrate their effectiveness in experiments.