GRAPH AND TRAJECTORY MINING: FRAMEWORKS, ALGORITHMS AND APPLICATIONS
Open Access
- Author:
- Fu, Tao Yang
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 24, 2020
- Committee Members:
- Wang-Chien Lee, Dissertation Advisor/Co-Advisor
Wang-Chien Lee, Committee Chair/Co-Chair
Sencun Zhu, Committee Member
Kamesh Madduri, Committee Member
Jia Li, Outside Member
Zhen Lei, Dissertation Advisor/Co-Advisor
Paul Medvedev, Committee Member
Chitaranjan Das, Program Head/Chair
Zhen Lei, Committee Chair/Co-Chair
- Keywords:
- machine learning
representation learning
neural network
network data
trajectory data
spatio-temporal data
node classification
link prediction
travel time estimation
route planning
- Abstract:
- Network and trajectory data analysis are two important fields of data mining and knowledge discovery. Network data is ubiquitous in the real world and supports applications such as node classification, node clustering and link prediction, while trajectory data can be used for prediction tasks such as travel time estimation, destination prediction and trajectory outlier detection. Recent developments in deep learning and representation learning have shed light on extracting complex features from raw data via deep neural network models, alleviating the dependence of feature engineering on human knowledge and labor. In this thesis, we explore various applications and representation learning frameworks for network and trajectory data. First, we study the problem of patent citation recommendation for patent examiners, which is modeled as link recommendation in a citation network. Our proposal considers three important pieces of information from patents, namely content, bibliographic information and applicant citations, which are modeled as a heterogeneous citation-bibliographic network. We then propose a two-phase ranking framework: the first phase selects a candidate subset from the whole U.S. patent data, and the second phase uses supervised learning models to rank prior patents in the candidate subset based on meta-path-based relationships between a query patent application and a candidate prior patent. Second, we study the problems of feature engineering and representation learning in networks, which aim to extract features of each node in a network. On the one hand, we highlight the time dimension and the different time lags associated with knowledge diffusion in citation networks. We model and exploit time lags on citation edges in paper and patent citation networks, and propose to model two types of time lags in a citation network: deterministic lags and probabilistic lags.
On the other hand, we propose a novel neural network model for representation learning of nodes in heterogeneous information networks (HINs), which encodes the structural information of an HIN by exploiting various types of relationships among nodes. To achieve this goal, we design a new learning framework which, given an HIN and a set of targeted relationships (i.e., meta-paths), learns latent vectors of nodes in the HIN by predicting relationships between nodes. Third, we study two fundamental applications of trajectory data, travel time estimation for a path and route planning, by exploring different deep learning techniques. Specifically, for travel time estimation, we propose to represent the movement paths of mobile users as generalized images in order to harness the proven power of convolutional neural networks (CNNs) to capture the complex moving patterns along paths, including spatial and temporal patterns. For route planning, we explore generative adversarial networks (GANs). More specifically, we propose to generate the path progressively, starting with a "low-resolution" path consisting of a sequence of coarsely partitioned grid cells, which gradually grows into "higher-resolution" paths consisting of sequences of more finely partitioned cells, and eventually produces a realistic path on the road network, i.e., a sequence of intersections. Finally, we propose a framework for trajectory representation learning. Based on map matching results (i.e., transforming a trajectory into a sequence of road segments), the proposed framework first learns a latent vector for each road segment and then encodes each transformed trajectory as a trajectory embedding that captures inherent spatial and temporal properties of trajectories and the underlying road network, for use in various trajectory mining applications.
We conduct extensive empirical studies to evaluate the performance of our proposed approaches. The experimental results demonstrate the effectiveness of our approaches and their superiority over state-of-the-art approaches in the corresponding problem domains.
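
The coarse-to-fine path representation mentioned in the abstract can be illustrated with a small sketch. This is not the dissertation's method: it only shows how one path might be written as grid-cell sequences at several resolutions, and it leaves out the GAN-based generation entirely. The function names, the square-grid partitioning, the cell sizes and the sample points are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in arbitrary planar units (assumed)
Cell = Tuple[int, int]       # (column, row) index of a square grid cell


def to_cells(path: List[Point], cell_size: float) -> List[Cell]:
    """Map each point to its grid cell and collapse consecutive repeats."""
    cells: List[Cell] = []
    for x, y in path:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def multi_resolution(path: List[Point], cell_sizes: List[float]) -> List[List[Cell]]:
    """Return one cell sequence per cell size, ordered from coarse to fine."""
    return [to_cells(path, size) for size in sorted(cell_sizes, reverse=True)]


# Hypothetical sample path; the coordinates are made up for this example.
path = [(0.5, 0.5), (1.5, 0.7), (2.5, 1.2), (3.5, 2.6), (3.8, 3.9)]
levels = multi_resolution(path, [1.0, 2.0, 4.0])
for cells in levels:
    print(cells)
```

With this sample path, the coarsest level collapses to a single cell, and each finer level yields a longer cell sequence that follows the path more closely.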