Artificial Intelligence and Human Intelligence to Derive Meaningful Information on Geographic Movement Described in Text
Open Access
- Author:
- Pezanowski, Scott
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- January 15, 2021
- Committee Members:
- Prasenjit Mitra, Dissertation Advisor/Co-Advisor
Prasenjit Mitra, Committee Chair/Co-Chair
C Lee Giles, Committee Member
Nick Giacobe, Committee Member
Alan M MacEachren, Outside Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- geographic movement
geovisual analytics
machine learning
natural language processing
big data analytics
artificial intelligence
deep learning
visualization
GIScience
Information Retrieval
- Abstract:
- Much successful research has analyzed the movement of people, wildlife, goods, and more using precise movement trajectories as data. This research has mostly ignored geographic movement that comes in the form of text descriptions. Descriptions of things moving have an advantage over trajectory data: they often include rich contextual information that describes what is moving, when it moves, why it moves, and how it moves. Challenges exist in utilizing this source, such as ambiguities in the text's meaning and the author's and reader's knowledge and background. Still, computational advances like Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning are well suited to uncover essential features in the movement descriptions. This research makes several contributions to improve the understanding of geographic movement described in text documents. It also guides further research toward improved methods for utilizing this underused resource. First, I created a corpus of sentences labeled as describing geographic movement or not. Creating this corpus proved difficult because no comparable corpora existed to start from, human labeling costs are high, and the text is often ambiguous. To overcome these challenges, I developed an iterative process employing hand labeling, crowd voting for agreement, and machine learning to predict more labels. By merging advances in word embeddings with traditional machine learning models and model ensembling, prediction accuracy becomes acceptable for producing a larger silver-standard corpus in which a small amount of prediction error is accepted. In addition to the detection of movement, my corpus will likely benefit computational processing of geography in text and spatial cognition. The corpus contains many geographic place mentions along with contextual information from which computational processing can learn important linguistic features associated with the place mentions.
Spatial cognition research can analyze differences in geographic understanding of the text from both the author's side and any reader's side. My process also provides a baseline method to detect statements that describe geographic movement. Second, I show how interpreting geographic movement described in text documents is challenging because of general spatial terms, linguistic constructions that make the movement of the thing(s) unclear, and many temporal references and groupings. To overcome these challenges, I identified multiple essential characteristics of the movement described that humans use to differentiate descriptions. I also explore current computational text processing techniques to analyze the movement characteristics described in the text and show how these characteristics help people understand patterns in larger bodies of text describing movement. My findings contribute to an improved understanding of the critical characteristics of geographic movement in text descriptions. The third contribution of my research is an initial effort to derive meaningful information from geographic movement descriptions at a large and general scale. Geographic Information Retrieval (GIR) is a sub-domain of both IR and GIScience that has emphasized retrieval of documents that mention or are about places, along with some focus on geographic feature extraction. GIR advances have created an as-yet largely unrealized potential to leverage text documents as sources for geographic-scale movement information. The geographic movement described in text documents can complement detailed movement data, provide an alternative when precise data does not exist, and provide the added benefit of rich context about the movement. As an initial large-scale effort to derive meaningful information from textual data describing geographic movement, we applied multiple computational techniques to hundreds of millions of statements. First, we identify and geolocate the geographic places mentioned.
Next, we predict which statements describe geographic movement. Finally, because the COVID-19 pandemic highlighted the importance of global movement disruptions, we predict whether each statement describes a lack of movement or restricted movement. Since the data is messy and complicated and the prediction techniques are imperfect, we designed and implemented a geovisual analytics system. Its visual interface enables humans to explore initial statement classifications, the places mentioned in them, and co-occurring place mentions, to assess the validity of the computational methods, and to provide direct feedback toward improving results. We include two user scenarios that show how a human can derive meaningful information about geographic movement through the geovisual analytics system. The user scenarios constitute systematic case studies to demonstrate the utility of the approach. Existing geographic movement research has improved analysis methods and shown how these methods enhance understanding of human movement, wildlife movement, and much more. This research primarily uses precise movement data acquired through sensors like GPS but ignores movement described in text documents. The geographic movement described in text documents can complement detailed movement data, provide an alternative when precise data does not exist, and provide the added benefit of rich context about the movement. My research has shown why this information is challenging to use, how people differentiate the movement described, and how computational methods can use these differences to improve sense-making with this underused resource.
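- Illustrative sketch (not part of the dissertation):
- The iterative labeling process summarized in the abstract (crowd voting for agreement, then ensemble predictions kept only when confident enough for a silver-standard corpus) might look roughly like the following. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the 0.8 agreement level, the 0.9 confidence threshold, and the two keyword-based scorers are all hypothetical stand-ins for trained classifiers built on word embeddings.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A "model" here is any callable returning P(sentence describes movement).
# Hypothetical stand-in for a trained classifier.
Model = Callable[[str], float]


def crowd_label(votes: Sequence[bool], min_agreement: float = 0.8) -> Optional[bool]:
    """Return the majority label if enough voters agree, otherwise None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def silver_label(sentence: str, models: Sequence[Model],
                 threshold: float = 0.9) -> Optional[bool]:
    """Average the ensemble's probabilities; keep only confident predictions."""
    mean_p = sum(model(sentence) for model in models) / len(models)
    if mean_p >= threshold:
        return True           # confidently "movement"
    if mean_p <= 1 - threshold:
        return False          # confidently "not movement"
    return None               # uncertain: send back for human labeling


# Toy scorers (assumptions for illustration only).
def verb_model(sentence: str) -> float:
    verbs = ("migrated", "traveled", "shipped")
    return 0.95 if any(v in sentence.lower() for v in verbs) else 0.05


def preposition_model(sentence: str) -> float:
    s = sentence.lower()
    return 0.9 if " from " in s and " to " in s else 0.1


MODELS = [verb_model, preposition_model]

if __name__ == "__main__":
    for text in ("The herd migrated from Kenya to Tanzania.",
                 "The museum is in Paris.",
                 "She traveled widely."):
        print(text, "->", silver_label(text, MODELS))
```

Sentences for which `silver_label` returns `None` are the ones an iterative process would route back to hand labeling or crowd voting, which is one way to keep the error rate of the silver-standard corpus small.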