Towards Designing Deep Learning Architectures for Improving Semantic Segmentation Performance

Open Access
- Author:
- Nagendra, Savinay
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 07, 2024
- Committee Members:
- Chitaranjan Das, Program Head/Chair
Vijaykrishnan Narayanan, Major Field Member
Chaopeng Shen, Outside Field Member
C Lee Giles, Outside Unit & Field Member
Daniel Kifer, Chair & Dissertation Advisor
- Keywords:
- computer vision
deep learning
semantic segmentation
machine learning
artificial intelligence
vision foundation models
segment anything model
pytorch
- Abstract:
- Semantic segmentation is a key component of many visual understanding systems, enabling the precise partitioning of images (or video frames) into meaningful segments by assigning a label or category to each pixel. It plays a pivotal role in applications such as autonomous driving, medical image analysis, remote sensing, video surveillance, robotic perception, image compression, and augmented reality. Recent advances in deep learning have transformed semantic segmentation, producing models that significantly outperform classical techniques on both standard computer vision and domain-specific benchmarks. The goal of this research is to propose generalized deep learning-based techniques that address three key challenges faced by current semantic segmentation methods: (i) ensuring models can continually adapt to dynamically changing real-world environments (domain shifts) while preserving performance on previously encountered data; (ii) correcting spatial bias, the tendency of models to rely on spatial position or pixel patterns rather than genuine object features; and (iii) improving the transferability of the zero-shot segmentation capabilities of large-scale promptable vision foundation models to a broad range of downstream segmentation tasks. We propose (i) Task-Specific Model Updates, an incremental learning mechanism that lets deployed semantic segmentation models handle domain shifts in data encountered at inference time; (ii) PatchRefineNet, an auxiliary lightweight network cascaded with a base segmentation model to correct the base model's spatial bias as a post-processing refinement; and (iii) SAMIC: In-Context Segmentation using Meta's Segment Anything Model, a few-shot spatial prompt engineering technique that leverages the zero-shot segmentation capabilities of SAM for downstream tasks such as semantic, instance, panoptic, and few-shot segmentation, saliency and co-saliency detection, video segmentation, and text-to-image generation with Stable Diffusion. By overcoming the limitations of existing segmentation techniques, we show that the proposed methods enhance the performance and robustness of semantic segmentation across a wide range of domain-specific tasks and computer vision benchmarks.
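
The first contribution is easiest to picture as a continual-adaptation loop. Below is a minimal PyTorch sketch of a generic rehearsal-style update: the deployed model is fine-tuned on newly encountered domain data mixed with a small replay buffer of earlier data so that performance on previously seen domains is not lost. The function name, hyperparameters, and the rehearsal strategy itself are illustrative assumptions, not the dissertation's actual Task-Specific Model Update mechanism.

```python
import torch
from torch.utils.data import DataLoader, ConcatDataset

def incremental_update(model, new_domain_ds, replay_ds, epochs=1, lr=1e-4):
    """One incremental update step (hypothetical): fine-tune on new-domain
    data mixed with replayed samples from earlier domains, a simple
    rehearsal pattern for preserving prior performance under domain shift."""
    loader = DataLoader(ConcatDataset([new_domain_ds, replay_ds]),
                        batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=255)
    model.train()
    for _ in range(epochs):
        for images, masks in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), masks)  # per-pixel classification loss
            loss.backward()
            opt.step()
    return model
```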
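
The second contribution follows a cascade pattern: a small auxiliary network consumes the base segmentation model's predictions and corrects them as a post-processing step. The PyTorch sketch below shows that pattern; the refiner's layer sizes and residual design are assumptions for illustration, not the actual PatchRefineNet architecture.

```python
import torch
import torch.nn as nn

class RefinementHead(nn.Module):
    """Lightweight refiner (illustrative): takes the input image and the base
    model's logits, and predicts a residual correction to those logits."""
    def __init__(self, num_classes: int, in_channels: int = 3, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + num_classes, width, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, num_classes, 1),
        )

    def forward(self, image, base_logits):
        x = torch.cat([image, base_logits], dim=1)
        return base_logits + self.net(x)  # residual refinement of base logits

# Usage with a frozen, pretrained base model (hypothetical `base_model`):
# refiner = RefinementHead(num_classes=21)
# with torch.no_grad():
#     base_logits = base_model(image)       # (N, 21, H, W)
# refined_logits = refiner(image, base_logits)
```

Keeping the base model frozen and training only the small refiner is what makes this a cheap post-processing correction rather than a retraining of the segmentation backbone.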
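
The third contribution builds on prompting SAM with spatial cues. The sketch below uses the real `segment_anything` API (`sam_model_registry`, `SamPredictor`) to prompt with foreground/background points; how SAMIC derives those points from a few support image/mask pairs is the method-specific part and is elided here, and the checkpoint path is illustrative.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM and wrap it in a predictor (checkpoint path is illustrative).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def segment_with_points(image: np.ndarray, fg_points: np.ndarray,
                        bg_points: np.ndarray) -> np.ndarray:
    """Prompt SAM with foreground (label 1) and background (label 0) points.
    In an in-context setting, these points would come from support
    image/mask pairs; that derivation step is omitted here."""
    predictor.set_image(image)  # HWC uint8 RGB array
    coords = np.concatenate([fg_points, bg_points], axis=0)  # (N, 2) in (x, y)
    labels = np.concatenate([np.ones(len(fg_points), dtype=int),
                             np.zeros(len(bg_points), dtype=int)])
    masks, scores, _ = predictor.predict(
        point_coords=coords, point_labels=labels, multimask_output=True
    )
    return masks[scores.argmax()]  # keep the highest-scoring candidate mask
```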