Information Retrieval From Images and Videos in Mobile Networks

Open Access
- Author:
- Felemban, Noor
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- March 17, 2022
- Committee Members:
- Thomas La Porta, Chair & Dissertation Advisor
- Soundar Kumara, Outside Unit & Field Member
- Ting He, Major Field Member
- C. Lee Giles, Major Field Member
- Vishal Monga, Major Field Member
- Chitaranjan Das, Program Head/Chair
- Keywords:
- mobile networks
- deep learning
- neural networks
- image retrieval
- video retrieval
- cloud computing
- distributed systems
- Abstract:
- The widespread use of mobile devices has led to the generation and storage of large amounts of images and videos. These images and videos are a rich source of information, and searching them for specific occurrences of actions and objects aids in various applications. Mobile devices currently have only enough computational power to run some simple machine learning (ML) algorithms; cellular and WiFi networking, however, gives these resource-constrained devices access to powerful servers equipped with graphics processing units (GPUs) and permits offloading videos and images to be processed in the cloud. Various queries can be issued in mobile networks, and responding to them requires consideration of the limited network bandwidth as well as the computational and energy constraints of the devices. This thesis investigates techniques and mechanisms for addressing such queries in mobile networks.

  First, we address the problem of collecting images, stored on mobile devices, that contain a queried object. The goal is to minimize the query response time subject to an energy constraint. To achieve this, we divide the process into a pipeline of stages, each of which makes offloading decisions based on the network conditions, the cloud backlog, and the hit rate. A filtering stage, designed to run on the mobile device, uses a smaller, less accurate, but faster CNN to discard images that do not contain the object of interest. Images classified as positive by the filtering stage then move to a selection stage, where a more accurate CNN classifies each image and searches for the queried object. At any stage, if the mobile device decides to offload, the image is transmitted to the cloud and processed there. In the end, all positively classified images are uploaded to the server, where the requester can access them.

  Next, we investigate a different problem in which a requester broadcasts a query, over a mobile network with limited bandwidth, to a set of mobile devices, searching for images containing novel objects that the CNNs loaded in the network are not trained to classify. By reusing the previously loaded, pre-trained models and comparing the features extracted by the CNNs with a distance measure, we respond to these queries in a way that saves both network bandwidth and mobile device energy.

  Finally, we explore a more complex medium and study video retrieval in mobile networks. The goal is to identify video clips containing a queried action quickly. To achieve this, we use both audio (sound and speech) and visual (frames and video clips) data, and again divide the processing task into stages. We design a pipeline that applies a different machine learning technique (audio classification, speech-to-text, object classification, and action recognition) at each stage so that false positives are discarded as early as possible. By combining these algorithms and models and properly distributing the processing between the mobile device and the server GPUs, we minimize the query response time.
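The following is a minimal sketch of the two-stage filter-then-select pipeline described in the abstract. The specific models (MobileNetV3-Small as the fast on-device filter, ResNet-50 as the accurate selector) and the offload thresholds are illustrative assumptions, not the dissertation's actual design.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Fast, less accurate filter CNN and slower, more accurate selection CNN;
# both model choices are assumptions for illustration.
filter_net = models.mobilenet_v3_small(weights="IMAGENET1K_V1").eval()
select_net = models.resnet50(weights="IMAGENET1K_V2").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def should_offload(bandwidth_mbps, cloud_backlog, hit_rate):
    # Hypothetical offload rule: offload when the link is fast, the server
    # queue is short, and a positive result is likely.
    return bandwidth_mbps > 10 and cloud_backlog < 50 and hit_rate > 0.3

def upload_to_cloud(image_tensor):
    # Placeholder for transmitting the image to the GPU server.
    raise NotImplementedError

@torch.no_grad()
def query_image(path, target_class, bandwidth_mbps, cloud_backlog, hit_rate):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    if should_offload(bandwidth_mbps, cloud_backlog, hit_rate):
        return upload_to_cloud(x)
    # Filtering stage: the cheap CNN discards images without the queried object.
    if filter_net(x).argmax(dim=1).item() != target_class:
        return False
    # Selection stage: the accurate CNN confirms the object is present.
    return select_net(x).argmax(dim=1).item() == target_class
```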
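For the second problem, here is a sketch of answering novel-object queries with features from an already-deployed, pre-trained CNN and a distance measure. ResNet-18 penultimate features, cosine similarity, and the 0.7 threshold are assumptions for illustration.

```python
import torch
from torchvision import models

# A backbone assumed to be already loaded in the network; it is never
# retrained for the novel object.
backbone = models.resnet18(weights="IMAGENET1K_V1").eval()
backbone.fc = torch.nn.Identity()  # expose the 512-d penultimate features

@torch.no_grad()
def embed(batch):
    # batch: (N, 3, 224, 224) images, already ImageNet-normalized.
    return torch.nn.functional.normalize(backbone(batch), dim=1)

def novel_object_matches(query_img, candidates, threshold=0.7):
    # Keep candidates whose cosine similarity to the query exemplar
    # exceeds a (hypothetical) threshold.
    q = embed(query_img)          # (1, 512)
    c = embed(candidates)         # (N, 512)
    sims = (c @ q.T).squeeze(1)   # dot product of unit vectors = cosine similarity
    return sims > threshold
```

Under this reading, only compact feature vectors or match decisions would need to leave each device, which is one way the bandwidth and energy savings could arise.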
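For the video-retrieval pipeline, a sketch of the staged cascade in which cheap audio and speech checks run first so the expensive visual stages only see surviving clips. The stage functions here are placeholders, not the dissertation's models.

```python
from typing import Callable, Iterable, List

def cascade(clips: Iterable[str],
            stages: List[Callable[[str], bool]]) -> List[str]:
    # Run every clip through the stages in order, dropping a clip at the
    # first stage that rejects it, so later (costlier) stages see fewer clips.
    survivors = list(clips)
    for stage in stages:
        survivors = [c for c in survivors if stage(c)]
        if not survivors:
            break
    return survivors

# Hypothetical usage, with stages ordered from cheapest to most expensive:
# hits = cascade(all_clips, [audio_classifier, speech_keyword_match,
#                            object_classifier, action_recognizer])
```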