Applications of Attention Networks in Healthcare and Manufacturing

Open Access
- Author:
- Zhou, Chen
- Graduate Program:
- Industrial Engineering
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- April 01, 2023
- Committee Members:
- Soundar Kumara, Thesis Advisor/Co-Advisor
- Saurabh Basu, Committee Member
- Ling Rothrock, Professor in Charge/Director of Graduate Studies
- Keywords:
- Deep learning
- Transformer
- Attention mechanism
- Healthcare
- Manufacturing
- Abstract:
- Recent years have seen attention networks become the dominant architecture for most deep learning tasks, including computer vision and natural language processing. This study aims to enhance the performance of computer vision algorithms in the healthcare and manufacturing domains by incorporating attention mechanisms. In the healthcare area, using lung X-ray images, we investigate the performance (separability) of an attention-based network, the Vision Transformer (ViT), in comparison with conventional convolutional neural networks (CNNs). We find that ViT is a robust tool for diagnosing diseases based solely on lung X-rays. Additionally, the separability and interpretability of ViT can be further improved by introducing self-supervised training as a pre-training strategy and by using lung masks as extra attention. In the manufacturing domain, we investigate the capability of deep learning to detect welding quality directly from images of welded workpieces. We propose a new framework based on the attention mechanism and Multiple Instance Learning (MIL) that reveals the attention distribution over the small patches of each weld image and serves as an effective guide for weld-quality discrimination. Results show that, compared to state-of-the-art CNNs, our proposed framework is more effective at classifying weld images and even at identifying weld features in each small section of the welding area in workpiece images. Both the healthcare and manufacturing results demonstrate that attention-based neural networks are scalable vision learners for downstream tasks.
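
The abstract's weld-inspection framework combines an attention mechanism with Multiple Instance Learning, where each weld image is treated as a bag of small patches and attention weights indicate how much each patch contributes to the quality decision. The thesis itself does not specify the pooling equations here, so the following is only a minimal illustrative sketch of the standard attention-based MIL pooling idea (in the style of Ilse et al., 2018); all dimensions, weight matrices, and the random patch embeddings are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(bag, W_v, w_a):
    """Attention-based MIL pooling over a bag of instance embeddings.

    Each instance (image patch) gets a scalar attention score; a softmax
    over the bag turns the scores into weights, and the bag embedding is
    the attention-weighted sum of instance embeddings. The weights also
    act as a per-patch relevance map for interpretation.
    """
    scores = np.tanh(bag @ W_v) @ w_a          # (n,) one score per patch
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ bag                     # (d,) bag-level embedding
    return pooled, weights

# Hypothetical sizes: 16 patches per weld image, 64-dim patch embeddings.
d, h, n = 64, 32, 16
bag = rng.standard_normal((n, d))              # stand-in patch embeddings
W_v = rng.standard_normal((d, h)) * 0.1        # illustrative parameters
w_a = rng.standard_normal(h) * 0.1
pooled, weights = attention_mil_pool(bag, W_v, w_a)
# `pooled` would feed a bag-level classifier; `weights` highlights which
# patches of the welding area drove the prediction.
```

In a trained model, `W_v` and `w_a` would be learned jointly with the patch encoder and the bag-level classifier; here they are random only to keep the sketch self-contained.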