Training-Robust Design of Deep Neural Networks for Imaging and Vision

Restricted (Penn State Only)
- Author:
- Yazdani, Amirsaeed
- Graduate Program:
- Electrical Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 01, 2023
- Committee Members:
- Madhavan Swaminathan, Program Head/Chair
Robert Collins, Major Field Member
William Higgins, Major Field Member
Houtan Jebelli, Outside Unit & Field Member
Vishal Monga, Chair & Dissertation Advisor
- Keywords:
- Computer Vision
Image Processing
Artificial Intelligence
Deep Learning
Neural Networks
Photoacoustic Target Localization
Image Relighting
Active Learning
Semantic Segmentation
- Abstract:
- Recent developments in deep learning models, together with advances in computational resources such as graphics processing units (GPUs) that make these models practical to implement and apply to a wide variety of problems, have ushered in a new era of research in fields ranging from engineering to medical science. While neural network (NN) models are applicable to diverse problems such as image processing, natural language processing, and robotic control, their black-box design makes them prone to performance degradation when sufficient training data is lacking. To enhance the generalization ability of NNs, it is common to train large models with many layers and modules on massive datasets (e.g., ImageNet); exposed to a broader view of the data distribution, the model captures greater variance while keeping bias low. Although training a very deep NN on huge amounts of data may improve generalization, acquiring such data is not always feasible, as it may be scarce or too costly to annotate. This is especially the case in real-world problem domains such as medical imaging, remotely sensed data, and biological/genetic data. Furthermore, the complexity of the target problem often causes the model to fail to produce a meaningful response on certain samples despite a sufficiently large training set. Two approaches can address these problems: 1) the NN model can be designed and optimized to capture as much critical information as possible from the data by borrowing knowledge from the underlying physics, thereby delivering the best performance with an optimal amount of data even on complicated tasks; 2) since not all training samples have the same impact on the performance of the NN model, annotation costs can be reduced by carefully selecting the most impactful samples to label and add to the training set (active learning).

The first part of this dissertation addresses data challenges via the former approach, focusing on localizing point-shaped targets in photoacoustic (PA) images in the presence of dense noise. This is done by designing an NN model equipped with carefully designed components that make the most of the input data. The proposed model uses an autoencoder structure in which one encoder is shared between two decoders: the first decoder handles the localization task, while the second acts as an auxiliary component that generates denoised PA images from noisy inputs. Joint optimization allows features to be shared between the components, and because these shared features also serve the denoising task, they lead to noise-robust localization with superior accuracy. This is demonstrated by experiments on challenging simulated samples as well as experimentally captured samples with varying noise levels and numbers of targets.
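As a rough illustration of this shared-encoder, dual-decoder idea, the PyTorch sketch below pairs a localization decoder with an auxiliary denoising decoder on top of a single encoder. The layer sizes, loss weights, and toy data are assumptions for illustration only and do not reflect the dissertation's actual architecture.

```python
# Minimal sketch (not the dissertation's exact architecture): a convolutional
# autoencoder whose single encoder feeds two decoders -- one for target
# localization (a per-pixel heatmap) and one for denoising the PA image.
# Channel counts, depths, and loss weights below are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SharedEncoderDualDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder: features used by both tasks.
        self.encoder = nn.Sequential(
            conv_block(1, 32), nn.MaxPool2d(2),
            conv_block(32, 64), nn.MaxPool2d(2),
            conv_block(64, 128),
        )
        # Decoder 1: localization heatmap (one channel, values in [0, 1]).
        self.loc_decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), conv_block(128, 64),
            nn.Upsample(scale_factor=2), conv_block(64, 32),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )
        # Decoder 2: auxiliary denoising branch reconstructing a clean PA image.
        self.den_decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), conv_block(128, 64),
            nn.Upsample(scale_factor=2), conv_block(64, 32),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, x):
        z = self.encoder(x)  # shared features
        return self.loc_decoder(z), self.den_decoder(z)

# Joint objective: localization loss plus an auxiliary denoising loss, so the
# shared features are pushed toward noise-robust representations.
model = SharedEncoderDualDecoder()
noisy = torch.rand(4, 1, 64, 64)                          # toy noisy PA images
clean = torch.rand(4, 1, 64, 64)                          # toy clean references
target_map = (torch.rand(4, 1, 64, 64) > 0.99).float()    # toy point-target map
heatmap, denoised = model(noisy)
loss = nn.functional.binary_cross_entropy(heatmap, target_map) \
     + 0.5 * nn.functional.mse_loss(denoised, clean)      # 0.5 is an assumed weight
loss.backward()
```

Because both losses backpropagate through the same encoder, gradients from the denoising branch regularize the features used for localization, which is the intuition behind the noise robustness claimed above.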
The second part of this dissertation develops physically inspired dense fusion neural networks for image relighting, defined as changing a scene's light setting (including the light direction and color temperature) to a different target light setting. Although state-of-the-art NN relighting models perform well on typical samples, they fail on samples in which dense shadows must be removed or added. We advance the state of the art in image relighting by developing new hybrid approaches that enrich purely data-driven deep networks with physical insight from the problem domain. More specifically, we deploy a fusion strategy that combines two different approaches to the relighting problem: 1) a physics-based approach in which the image is decomposed into its albedo and shading components, and 2) a black-box approach relying solely on the representational power of the NN model. The estimates of the two approaches are then fused using a spatially varying weight map to generate the final relit output. Experiments on challenging benchmark datasets show that the proposed physically inspired fusion strategy outperforms the state of the art.

In the last part, we follow the second approach (i.e., active learning) to address data annotation challenges in semantic segmentation. Our proposed method integrates two key prerequisites for active learning: 1) maturity awareness: a carefully devised network paired with a novel uncertainty formulation lets us gauge the model's maturity on different samples and select those for which the model shows the lowest maturity; 2) pyramidal distribution breakdown: to keep the distribution of the training set as close as possible to the empirical data distribution, we break the data distribution down over different fields of view, hierarchically decreasing the field of view to the lowest level while assessing model uncertainty at each level. By developing an efficient understanding of data diversity and of the model's uncertainty on unseen samples, we achieve performance comparable to fully supervised models while shrinking the training set size beyond the state of the art.
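To make the fusion strategy of the second part concrete, the sketch below blends a physics-based estimate (albedo times shading) with a black-box estimate using a predicted per-pixel weight map. The single-convolution branches and their sizes are placeholders for illustration, not the dissertation's networks.

```python
# Minimal sketch of spatially varying fusion for relighting: one branch forms a
# physics-based estimate as albedo * shading, another is a black-box relighting
# network, and a predicted per-pixel weight map blends the two estimates.
# All sub-networks here are placeholder assumptions.
import torch
import torch.nn as nn

class FusionRelighting(nn.Module):
    def __init__(self, ch=3):
        super().__init__()
        # Placeholder intrinsic-decomposition branch: predicts albedo and shading.
        self.intrinsic = nn.Conv2d(ch, 2 * ch, 3, padding=1)
        # Placeholder black-box branch: directly predicts the relit image.
        self.blackbox = nn.Conv2d(ch, ch, 3, padding=1)
        # Weight branch: per-pixel fusion weights in [0, 1].
        self.weight = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        albedo, shading = self.intrinsic(x).chunk(2, dim=1)
        physics_est = albedo * shading        # physics-based relit estimate
        blackbox_est = self.blackbox(x)       # purely data-driven estimate
        w = self.weight(x)                    # spatially varying weight map
        return w * physics_est + (1.0 - w) * blackbox_est

relit = FusionRelighting()(torch.rand(1, 3, 128, 128))  # toy input image
print(relit.shape)  # torch.Size([1, 3, 128, 128])
```

The per-pixel weight lets the model lean on the physics-based estimate in regions where intrinsic decomposition is reliable (e.g., shadowed areas) and on the black-box estimate elsewhere.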
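For the last part, the following sketch shows one plausible way to score unlabeled images by aggregating per-pixel uncertainty over a pyramid of fields of view and selecting the highest-scoring samples for annotation. The pooling grid sizes, entropy-based uncertainty, and scoring rule are illustrative assumptions rather than the dissertation's exact formulation.

```python
# Rough sketch (with assumed details) of the active-learning selection step:
# per-pixel uncertainty maps from the segmentation model are aggregated over a
# pyramid of decreasing fields of view, and the unlabeled samples on which the
# model appears least "mature" (highest aggregated uncertainty) are chosen.
import torch
import torch.nn.functional as F

def pyramid_uncertainty_score(unc_map, levels=(1, 2, 4, 8)):
    """Aggregate a (H, W) uncertainty map over several fields of view.

    Each level splits the image into level x level regions (coarse to fine)
    and records the most uncertain region, so both global and local weaknesses
    of the model contribute to the sample's score.
    """
    x = unc_map.unsqueeze(0).unsqueeze(0)      # (1, 1, H, W)
    score = 0.0
    for g in levels:
        pooled = F.adaptive_avg_pool2d(x, g)   # mean uncertainty per region
        score += pooled.max().item()           # worst region at this field of view
    return score / len(levels)

def select_for_labeling(uncertainty_maps, budget):
    """Pick the `budget` unlabeled samples with the highest pyramid scores."""
    scores = [pyramid_uncertainty_score(u) for u in uncertainty_maps]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]

# Toy usage: entropy of softmax predictions as the per-pixel uncertainty.
logits = torch.randn(10, 5, 64, 64)            # 10 unlabeled images, 5 classes
probs = logits.softmax(dim=1)
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (10, 64, 64)
print(select_for_labeling(list(entropy), budget=3))           # chosen sample indices
```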