Contrastive Visual Learning from Spatiotemporal Information
Restricted (Penn State Only)
- Author:
- Zhu, Lizhen
- Graduate Program:
- Informatics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- March 14, 2022
- Committee Members:
- James Z Wang, Thesis Advisor/Co-Advisor
- Sharon Xiaolei Huang, Committee Member
- Mary Beth Rosson, Program Head/Chair
- Bradley Paul Wyble, Thesis Advisor/Co-Advisor
- Keywords:
- Contrastive learning
- computer vision
- perception
- Abstract:
- Infants can efficiently build and learn visual representations without supervision. They also receive other perceptual input that aids visual processing and supports complex tasks such as navigation. Inspired by this, we propose introducing temporal and spatial information into the contrastive learning framework to improve the efficiency and performance of the model. First, we provide a data generation tool based on a photorealistic 3D simulation platform: a virtual house is filled with furniture, and an avatar is controlled to move along a predesigned trajectory, collecting images together with contextual information. Second, in the pretraining stage, we introduce spatial and temporal information into training instead of relying on instance discrimination, which traditional contrastive learning uses to tell two augmentations of the same image apart from two different images. Contextual information is used to evaluate the similarity of images. Depending on the type of contextual information and the number of positive examples used, we propose and compare three variants of the standard MoCo: One-hot Time MoCo, One-hot Space MoCo, and Multi-label Space MoCo. The proposed models are trained on the dataset generated by the tool, and the learned high-level representations are evaluated on a downstream classification task. Results show that introducing contextual information in the pretraining stage can improve the performance of the model on downstream classification tasks. In addition, avoiding misleading images generated at the same place at different times allows Space MoCo to outperform Time MoCo on the downstream tasks. This work explores the factors that influence the learning process and validates the effectiveness of introducing temporal and spatial information into training. It may further promote interdisciplinary study at the intersection of computer and information science, biology, psychology, and neuroscience.
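The abstract describes the three MoCo variants only at a high level. The following is a minimal sketch of one possible reading, not the thesis implementation: it assumes each captured frame carries a time step and a 2D position, that "Time" positives are frames within a few time steps, that "Space" positives are frames within a distance radius, and that the multi-label variant averages the InfoNCE term over all positives. Every name here (`Frame`, `time_positive_mask`, `space_positive_mask`, `keep_single_positive`, `multi_positive_infonce`) and the `window`, `radius`, and `temperature` defaults are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    """Context for one captured image (assumed fields: time step and 2D position)."""
    t: int
    x: float
    y: float


def time_positive_mask(frames: Sequence[Frame], i: int, window: int = 1) -> List[bool]:
    """Hypothetical Time positives for frame i: other frames within `window` time steps."""
    return [j != i and abs(f.t - frames[i].t) <= window for j, f in enumerate(frames)]


def space_positive_mask(frames: Sequence[Frame], i: int, radius: float = 1.0) -> List[bool]:
    """Hypothetical Space positives for frame i: other frames captured within
    `radius` of its position, regardless of when they were captured."""
    fi = frames[i]
    return [j != i and math.hypot(f.x - fi.x, f.y - fi.y) <= radius
            for j, f in enumerate(frames)]


def keep_single_positive(mask: Sequence[bool]) -> List[bool]:
    """One-hot variant: keep only the first positive (an arbitrary tie-break)."""
    out = [False] * len(mask)
    for j, is_pos in enumerate(mask):
        if is_pos:
            out[j] = True
            break
    return out


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def multi_positive_infonce(query: Sequence[float],
                           keys: Sequence[Sequence[float]],
                           positive_mask: Sequence[bool],
                           temperature: float = 0.07) -> float:
    """InfoNCE over cosine similarities, averaged over every key marked positive.

    With exactly one positive this is the usual one-hot contrastive
    cross-entropy; with several it is a multi-label generalization.
    """
    if not any(positive_mask):
        raise ValueError("at least one positive key is required")
    q = _normalize(query)
    logits = [_dot(q, _normalize(k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all keys (positives and negatives).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    pos_logits = [l for l, p in zip(logits, positive_mask) if p]
    return -sum(l - log_z for l in pos_logits) / len(pos_logits)
```

For example, with frames `Frame(0, 0.0, 0.0)`, `Frame(1, 0.5, 0.0)`, `Frame(2, 5.0, 0.0)`, and `Frame(3, 0.2, 0.0)`, the last frame revisits the starting location later in the trajectory. Under these assumptions it is a negative for the time-based mask of frame 0 but a positive for the space-based mask, which is one plausible reading of the abstract's point about images taken at the same place at different times.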