Mining Human Interaction Data

Open Access
Yip, Kelly
Graduate Program:
Industrial Engineering
Doctor of Philosophy
Document Type:
Date of Defense:
January 08, 2015
Committee Members:
  • David Arthur Nembhard, Dissertation Advisor
  • David Arthur Nembhard, Committee Chair
  • Ling Rothrock, Committee Member
  • Timothy William Simpson, Committee Member
  • Michael J Rovine, Committee Member
Keywords:
  • data mining
  • multivariate time series
  • categorical data
  • association rule
  • sequential pattern
  • interactive data
  • approximated pattern
Abstract:
Multivariate time series data are collected to answer many important questions in developmental psychology research. This project was inspired by a mother-infant interaction dataset. Mothers' behaviors and infants' reactivity were recorded over time for one hundred fifty mother-infant pairs and coded into a categorical multivariate time series database. Common analytical methodologies used in this area include statistical analyses such as ANOVA, MANOVA, regression models, state space grids, Yule's Q, and time series analyses. However, these methods fail to address either the temporal characteristics of the data or the dynamics of multiple subjects as a group. Our project aims to mine these databases using three approaches in an attempt to address these limitations: 1. Association Rule Mining, 2. MWASP (Multiple-Width Approximate Sequential Pattern), and 3. TRR-MTSD (Temporal Relation Rules for Multivariate Time Series Databases).
Association rule mining was motivated by market basket analysis and has been shown to be a useful method for extracting information and uncovering hidden correlations in data. However, the approach is new to the field of developmental psychology. Moreover, conventional association rule mining does not consider the temporal nature of some data. Each data point taken across time is usually treated as an individual record in the data set, and the records are assumed to be independent. A goal of association rule mining is to find associations between different attributes across these records. Hence, the correlation of the data with respect to time is lost. In this project, we augment the dataset with additional time-related variables so that conventional association rule mining methods can address the temporal characteristics of the data.
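The idea of augmenting records with time-related variables can be illustrated with a minimal sketch. It assumes one simple form of augmentation, appending the previous time step's codes to each record, which may differ from the variables actually used in the dissertation. The behavior codes (`soothe`, `cry`, `calm`, `play`) and the function names are hypothetical.

```python
def add_lag_features(series, lag=1):
    """Build one record per time step, carrying the codes observed `lag` steps earlier.

    series: list of dicts, one per time step, e.g. {"mother": "soothe", "infant": "cry"}.
    Returns a list of frozensets of "variable=value" items; lagged items get a "prev_" prefix.
    """
    records = []
    for t in range(lag, len(series)):
        items = {f"{k}={v}" for k, v in series[t].items()}
        items |= {f"prev_{k}={v}" for k, v in series[t - lag].items()}
        records.append(frozenset(items))
    return records


def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for r in records if a <= r)          # records containing the antecedent
    n_ac = sum(1 for r in records if (a | c) <= r)   # records containing both sides
    support = n_ac / len(records) if records else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


# Hypothetical coded observations for a single mother-infant pair.
series = [
    {"mother": "soothe", "infant": "cry"},
    {"mother": "soothe", "infant": "calm"},
    {"mother": "play", "infant": "calm"},
    {"mother": "soothe", "infant": "cry"},
    {"mother": "soothe", "infant": "calm"},
]
records = add_lag_features(series, lag=1)
support, confidence = rule_stats(
    records, {"prev_infant=cry", "mother=soothe"}, {"infant=calm"}
)
print(support, confidence)
```

Because each record now carries its predecessor's codes, a conventional rule miner that treats records as independent can still surface rules that span adjacent time steps.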
Sequential pattern mining involves finding relationships among occurrences of sequential events in order to discover whether specific orderings of items, or sequential patterns across different items, exist. The sequential pattern models that can be found include frequent patterns, periodic patterns, statistically significant patterns, and approximated patterns. However, most commonly collected data are afflicted with noise. Conventional sequential pattern mining methods that use exact matching may have difficulty mining databases with long sequences and noise. Two general approaches used in previous studies to mine sequential patterns in noisy data are distance-based clustering and hidden Markov models. While these approaches are useful for mining frequent sequential patterns in noisy data, we further propose a framework (MWASP: Multiple-Width Approximate Sequential Pattern) that uncovers frequent approximate sequential patterns with multiple widths. A mined pattern in this framework is representative of a group of sequences (with various widths) that follow the pattern’s event flow order. This gives insight into the occurrence of the pattern longitudinally as well as across the population. The pattern can be recognized as common across the multiple time series, across time, or both.
Multivariate processes arise when several related time series processes are observed simultaneously over time, rather than a single series. The two approaches above, association rule mining and sequential pattern mining, collapse the temporal nature of the data and the interactions between different processes within a series. In the study of multivariate processes, a framework is needed for describing not only the properties of the individual series but also the possible cross-relationships among the series.
For most existing methods, the purposes of analyzing and modeling the series jointly are to understand the dynamic relationships among the series over time and to improve the accuracy of forecasts for each individual series by using the additional information available from the related series. Motivated by the mother-infant interaction dataset, we are particularly interested in discovering frequent behavior patterns, as well as their interaction and temporal features. A multivariate time series can consist of numeric values, symbolic (or categorical) values, or mixed data types. For numeric multivariate time series, such tasks are often studied using methods such as hidden Markov models (HMM), ARMA, and regression. For symbolic (or categorical) multivariate time series, methodologies from sequential pattern mining can be extended to address these tasks. Such extensions are largely driven by the data model and the users’ specifications. In this project, we extend the MWASP framework and develop a novel approach, TRR-MTSD (Temporal Relation Rules for Multivariate Time Series Databases), to tackle this kind of database. The purpose of TRR-MTSD is to locate common cross-relationships between identified sequential patterns in a multivariate categorical time series database.
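The difference between exact and approximate matching of sequential patterns in noisy categorical sequences can be sketched as follows. This is not the MWASP or TRR-MTSD algorithm. It is a minimal illustration that assumes edit distance as the similarity measure and lets the matching window vary in width around the pattern length. The function names, parameters, and behavior codes are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # drop x from a
                            curr[j - 1] + 1,           # insert y into a
                            prev[j - 1] + (x != y)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def matches_approximately(sequence, pattern, max_dist=1, extra_width=1):
    """True if some window of `sequence` is within `max_dist` edits of `pattern`.

    Window widths range from len(pattern) - extra_width to len(pattern) + extra_width.
    """
    m = len(pattern)
    for width in range(max(1, m - extra_width), m + extra_width + 1):
        for start in range(len(sequence) - width + 1):
            if edit_distance(sequence[start:start + width], pattern) <= max_dist:
                return True
    return False


def approximate_support(sequences, pattern, **kwargs):
    """Fraction of sequences that contain an approximate occurrence of `pattern`."""
    if not sequences:
        return 0.0
    hits = sum(matches_approximately(s, pattern, **kwargs) for s in sequences)
    return hits / len(sequences)


# Hypothetical coded sequences for three subjects.
pattern = ["cry", "soothe", "calm"]
sequences = [
    ["play", "cry", "soothe", "calm", "play"],   # exact occurrence
    ["cry", "soothe", "fuss", "calm"],           # one noisy symbol inside the pattern
    ["play", "play", "sleep", "sleep"],          # no occurrence
]
exact = approximate_support(sequences, pattern, max_dist=0, extra_width=0)
approx = approximate_support(sequences, pattern, max_dist=1, extra_width=1)
print(exact, approx)
```

With exact matching only the first sequence supports the pattern. Allowing one edit and a window one symbol wider also counts the second sequence, whose occurrence is interrupted by a single noisy code.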