Explainable, Informed Deep Learning For Signal And Image Estimation

Open Access
- Author:
- Metwaly, Kareem
- Graduate Program:
- Electrical Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 10, 2022
- Committee Members:
- Abhronil Sengupta, Major Field Member
- William Higgins, Major Field Member
- Vikash Gayah, Outside Unit & Field Member
- Vishal Monga, Chair & Dissertation Advisor
- Thomas La Porta, Program Head/Chair
- Keywords:
- Deep Learning, Neural Network, Image Processing, Signal Processing, Algorithm Unrolling, Dehazing, Beampattern Design, Classification, Computer Vision, Attribute Prediction, Prior Guided, Informed Learning, Estimation
- Abstract:
- Deep Learning (DL) has gained significant attention over the past decade due to its expressive power in solving problems across many fields. In image and signal processing, for instance, DL has been extensively leveraged for both analysis and synthesis problems. One example of image analysis is in self-driving vehicles, where images are analyzed to recognize the objects in a scene. However, these images often suffer from artifacts, such as haze, that hinder a good understanding of the scene, so it is crucial to enhance them by synthesizing clearer images as a preprocessing step. Current methods for solving estimation problems can be categorized as either model-based or learning-based. Model-based methods use physically inspired formulations that cast the task as an optimization problem; because such problems are commonly difficult to solve analytically, a numerical solution is obtained with iterative algorithms. However, these methods do not exploit widely available datasets and usually rely on simplifying assumptions to find a solution. Learning-based techniques, on the other hand, learn the structure of the provided dataset and extract patterns from it; that is, they learn to find cues by utilizing enormous amounts of data. Rather than modeling the underlying physics, they seek the best mapping between input-output pairs, and their black-box nature makes them hard to interpret. In this dissertation, we combine model- and learning-based approaches to address the drawbacks of each, adopting learning-based algorithms guided by informed knowledge of the physical model. By doing so, we achieve improved results and obtain interpretable models, together with an understanding of their shortcomings and edge cases. First, we present two DL-based approaches for image analysis that aim to understand image content. Second, we present two DL networks for image and signal synthesis, where the objective is to synthesize signals that satisfy specific criteria.
The first part of this dissertation is concerned with analyzing image content, either for attribute prediction or for marine vessel defect detection. Identifying objects and their attributes is vital for autonomous vehicles, which must understand the type of each object (vehicle, pedestrian, etc.) and its attributes (moving, parked, etc.). We propose ‘GlideNet’, a DL approach that employs global, local, and intrinsic information to predict the attributes of objects from different categories. It uses a novel self-attention scheme that leverages both the object's category and its geometric shape to learn where to focus. In addition, GlideNet can be applied to datasets with different taxonomies, since only its last stage needs to be changed. We test GlideNet on two attribute-prediction datasets, VAW and CAR, and demonstrate its effectiveness. Marine vessel defect detection, in turn, is crucial for the safety and maintenance of vessels. We propose ‘DFE-ET’, another DL framework that takes challenging low-resolution images of marine vessels and detects different types of paint defects (corrosion, delamination, and fouling). It uses a Spatial Transformer Network to recognize small defective regions. It also uses two feature extractors: the first extracts general features, while the second extracts delamination-related features to boost performance. A customized loss function promotes diversity between the outputs of the two feature extractors. We validate DFE-ET on a dataset provided by PPG Industries, where it obtains better results than human experts.

The second part of this dissertation presents methods for signal and image synthesis, focusing on dehazing and radar beampattern design. Many recent dehazing methods either estimate the haze-free image directly or estimate it indirectly through the physical parameters of the haze model.
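To make the indirect route concrete, the sketch below assumes the widely used atmospheric scattering model, I = J·t + A·(1 − t), where I is the hazy image, J the haze-free image, t the transmission map, and A the global atmospheric light. The abstract does not name its haze model, so this choice is an assumption, and the function names are hypothetical. The direct route would instead replace `invert_haze_model` with a learned network mapping I straight to J.

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Assumed atmospheric scattering model: I = J*t + A*(1 - t).

    J: haze-free image, shape (H, W, 3), values in [0, 1]
    t: transmission map, shape (H, W), values in (0, 1]
    A: global atmospheric light, shape (3,)
    """
    t3 = t[..., None]  # broadcast transmission over the color channels
    return J * t3 + A * (1.0 - t3)

def invert_haze_model(I, t, A, t_min=0.1):
    """Indirect (model-based) route: recover J from estimated t and A.

    Clamping t from below avoids amplifying noise where haze is dense,
    which is exactly where this route becomes unreliable.
    """
    t3 = np.maximum(t, t_min)[..., None]
    return (I - A * (1.0 - t3)) / t3

# Toy check: with the true t and A (and t above t_min), inversion is exact.
rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, size=(4, 4, 3))
t = rng.uniform(0.3, 1.0, size=(4, 4))
A = np.array([0.9, 0.9, 0.9])
I = synthesize_haze(J, t, A)
J_model = invert_haze_model(I, t, A)
print(np.allclose(J_model, J))
```

In practice t and A must themselves be estimated, and errors in t are amplified by the division wherever t is small, i.e. in densely hazed regions.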
Both approaches fail to handle non-homogeneous haze, in which some regions are densely hazed and others lightly hazed. We propose ‘AtJwD’, a DL architecture that benefits from both approaches simultaneously by estimating a spatially varying weight map that combines a direct estimate with a physical-model-based estimate. In addition, a channel attention structure facilitates the generation of distinct feature maps, and a novel dilation inception module utilizes non-local features to compensate for missing information in densely hazed regions. Experiments on challenging benchmark datasets demonstrate that AtJwD can outperform many state-of-the-art alternatives. We also present ‘FLED’, a DL approach that uses algorithm unrolling to design better radar waveforms. Algorithm unrolling, in which each iteration of an iterative algorithm is manifested as a layer of a neural network, has increased the interpretability of DL methods. We start from an iterative algorithm (PDR), reformulate the optimization problem, modify the algorithm, and unroll it into a neural network. Key algorithmic parameters are learned from extensive training data using backpropagation. Because the problem is non-convex, with multiple minima, the gradient descent directions used by PDR are not necessarily the best choice; we show that FLED can learn better search directions, which speeds up convergence to the global minimum. In summary, FLED achieves practical performance gains while also enjoying interpretability.
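As a rough illustration of algorithm unrolling, the sketch below unrolls a fixed number of iterations of a unit-modulus (constant-envelope) waveform update into "layers", each with its own step size. It assumes a toy least-squares objective ||Bx − d||² under the constraint |x_i| = 1 as a stand-in for the beampattern-matching problem, and each update follows a projection, descent, and retraction pattern that we assume resembles PDR; the exact PDR and FLED objectives and updates are not given in this abstract. The names `B`, `d`, and `unrolled_forward` are hypothetical. In an unrolled network the per-layer step sizes (or richer search directions) would be trained by backpropagation; training is omitted here and the step sizes are fixed.

```python
import numpy as np

def euclidean_grad(x, B, d):
    """Wirtinger gradient of f(x) = ||B x - d||^2 w.r.t. conj(x), up to a factor of 2."""
    return B.conj().T @ (B @ x - d)

def tangent_project(g, x):
    """Projection: keep only the component of g tangent to |x_i| = 1 at x."""
    return g - np.real(g * x.conj()) * x

def retract(x):
    """Retraction: map each entry back onto the unit circle."""
    return x / np.abs(x)

def unrolled_forward(x0, B, d, step_sizes):
    """Forward pass of the unrolled network: one layer per entry of step_sizes.

    In a trained unrolled model these per-layer parameters would be learned;
    here they are fixed inputs (hypothetical stand-ins for learned values).
    """
    x = retract(x0)
    for mu in step_sizes:
        g = tangent_project(euclidean_grad(x, B, d), x)  # projection
        x = retract(x - mu * g)                           # descent, then retraction
    return x

def objective(x, B, d):
    return float(np.sum(np.abs(B @ x - d) ** 2))

# Toy usage: 20 "layers" with a small fixed step size each.
rng = np.random.default_rng(1)
B = (rng.standard_normal((16, 8)) + 1j * rng.standard_normal((16, 8))) / np.sqrt(2)
target = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, 8))
d = B @ target
x0 = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, 8))
x = unrolled_forward(x0, B, d, np.full(20, 0.005))
print(objective(x0, B, d), objective(x, B, d))
```

The design point is that depth is fixed and every layer is a readable optimization step, so the learned parameters keep an algorithmic meaning (here, step sizes), which is the source of the interpretability the text attributes to unrolling.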