The Applications of Similarity Metrics in the Source Separation of Percussive Sounds

Open Access
- Author:
- Grabow, Christopher
- Graduate Program:
- Acoustics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- September 05, 2024
- Committee Members:
- Tyler Patrick Dare, Thesis Advisor/Co-Advisor
Karl Martin Reichard, Committee Member
Julianna Simon, Program Head/Chair
Daniel C. Brown, Committee Member
Mark Andrew Fanton, Committee Member
- Keywords:
- Music Source Separation
Dynamic Time Warping
Normalized Cross Correlation
Percussive Sounds
Machine Learning
- Abstract:
- In recent years, there has been growing interest in the development of music source separation (MSS) models for problems related to music information retrieval. The techniques used for source separation can often be applied to other acoustical signals as well. While existing MSS models achieve high-quality separation, they are limited in specificity: they are designed to separate a piece of music into four categories (vocals, bass, drums, and other instrumentation). The goal of this thesis is to develop a new MSS model that focuses on the separation of transient, percussive sounds. The main technique used in the proposed source separation algorithm is the similarity metric, which compares two time series and assigns a numerical value based on their level of similarity. The algorithm utilizes two similarity metrics: dynamic time warping (DTW) and normalized cross-correlation (NCC). Each metric is also capable of time-aligning the two time series, which is a necessary component of the model. The two metrics perform very similar tasks on the surface, but their different computations give each its own advantages and drawbacks for source separation tasks. Once the architecture of the proposed MSS model is developed, a series of tests is run to train and fine-tune it. Both a “digital” and a “live” drum dataset were curated for this testing, forming a database of sounds for the algorithm to draw upon, along with non-database sounds to test its limits. First, a series of polyphonic drum signals and the non-database sounds are used to define the correlation limits and noise thresholds for each similarity metric. Then, a collection of 30 drum tracks is run through the algorithm, and the results are compared to ground-truth sources. Overall, the new model shows promise for the source separation of drum signals on a limited dataset. The NCC metric outperforms DTW because of its ability to extract individual drums from a polyphonic signal. Many improvements can be made to the algorithm, including the addition of onset detection functions, neural networks for speed and accuracy, and larger drum databases.
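
To illustrate the two similarity metrics named in the abstract, the following is a minimal sketch of their textbook forms, not the implementation used in the thesis. The function names `ncc_align` and `dtw_distance`, the sliding-window formulation of NCC, and the absolute-difference local cost in DTW are all assumptions made for this example.

```python
import numpy as np


def ncc_align(template, signal):
    """Slide `template` over a longer `signal` and return (best_score, best_lag).

    The score at each lag is the zero-mean normalized cross-correlation,
    which lies in [-1, 1]. Hypothetical helper, not taken from the thesis.
    """
    template = np.asarray(template, dtype=float)
    signal = np.asarray(signal, dtype=float)
    n = len(template)
    t = template - template.mean()
    t_norm = np.linalg.norm(t)

    best_score, best_lag = -1.0, 0
    for lag in range(len(signal) - n + 1):
        window = signal[lag:lag + n]
        w = window - window.mean()
        denom = t_norm * np.linalg.norm(w)
        # A flat (silent) window has zero energy; treat it as uncorrelated.
        score = float(np.dot(t, w) / denom) if denom > 0 else 0.0
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


def dtw_distance(a, b):
    """Return the classic dynamic time warping cost between two 1-D series.

    Uses the absolute difference as the local cost and allows the three
    standard steps (match, insertion, deletion). Hypothetical helper.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j] is the cheapest alignment of a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # a advances
                                     cost[i, j - 1],      # b advances
                                     cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])
```

In this sketch, NCC yields a bounded score and a single lag at which a stored sound best matches the mixture, while DTW yields an unbounded cost in which lower values mean greater similarity and in which samples may be stretched or compressed in time. Both loops are written for clarity rather than speed.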