Signal Processing Augmentations to Spectrum-Based Modeling for Speaker Recognition

Open Access
Author:
Metzger, Richard Anthony
Graduate Program:
Electrical Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
October 10, 2018
Committee Members:
  • John F Doherty, Dissertation Advisor
  • John F Doherty, Committee Chair
  • David Marion Jenkins Jr., Committee Member
  • Ram Mohan Narayanan, Committee Member
  • Michelle Celine Vigeant, Outside Member
Keywords:
  • Speaker Recognition
  • ApEn
  • EMD
  • Speaker Modeling
Abstract:
When processing real-world recordings of speech, it is highly probable that noise will be present at some point in the signal. The problem is compounded when the noise occurs in short, impulsive bursts at random intervals. Traditional voice activity detectors (VADs) rely on an energy threshold in the spectrum of the incoming signal to make a decision, and can therefore erroneously flag noise segments as speech. This noise is then propagated through the speaker recognition system, increasing the system error rate. An approach is therefore needed that removes the noise before feature modeling takes place while preserving the spectral features that were not corrupted by the noise. Motivated by principles in both information theory and signal processing, this dissertation explores a novel processing algorithm that mitigates both high- and low-entropy noise. The following topics are investigated: (1) a speech and noise detection algorithm is constructed from the approximate entropy (ApEn) statistic, (2) the ApEn algorithm is tested on various noise cases and its resulting model is compared to models produced by an energy-based VAD, and (3) ApEn is improved by adding empirical mode decomposition (EMD) to the processing chain. The results presented in this dissertation point to a promising technique for noise mitigation and represent a novel approach to spectrum-based modeling in noisy environments.
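For readers unfamiliar with the ApEn statistic named in the abstract, the following is a minimal sketch of the standard approximate entropy calculation on a one-dimensional signal. It is not the dissertation's detection algorithm. The function name `approximate_entropy`, the embedding dimension `m=2`, and the tolerance of 0.2 times the signal's standard deviation are illustrative assumptions that do not come from this record.

```python
import numpy as np


def approximate_entropy(signal, m=2, r_factor=0.2):
    """Approximate entropy (ApEn) of a 1-D signal.

    Hypothetical helper for illustration only. The defaults m=2 and
    r_factor=0.2 are common textbook choices, assumed here; they are
    not taken from the dissertation.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n <= m + 1:
        raise ValueError("signal is too short for the chosen m")

    # Tolerance: a fraction of the signal's standard deviation.
    r = r_factor * np.std(x)

    def phi(dim):
        # All overlapping templates of length `dim`.
        count = n - dim + 1
        templates = np.array([x[i:i + dim] for i in range(count)])
        # Chebyshev (max-norm) distance between every pair of templates.
        diffs = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = diffs.max(axis=2)
        # Fraction of templates within r of each template. Self-matches
        # are included, so every fraction is strictly positive.
        c = np.sum(dist <= r, axis=1) / count
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)
```

Under these assumptions, a regular signal such as a sinusoid yields a lower ApEn value than white noise of the same length, which is the kind of contrast an entropy-based detector can exploit. The pairwise distance matrix uses memory quadratic in the frame length, so this sketch suits short analysis frames only.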