Evaluating the Impact of AI-Generated Data on Training Keyword Spotting Models

Gabriele Cirotto

Supervisors: Andrea Calimera, Valentino Peluso. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Keyword Spotting (KWS) systems have become common in everyday applications such as virtual assistants (e.g., Amazon’s Alexa) and voice-controlled devices. These systems are typically part of more complex architectures designed to simplify daily tasks. They continuously monitor audio input for specific "wake words", such as "Hey Siri" or "Alexa", and trigger an action or response when a recognition model detects one. A key challenge in designing KWS systems is data collection, which is often resource-intensive, especially for deep learning models, which require many high-quality recordings to train effectively. This thesis examined the impact of blending well-known KWS datasets with synthetic samples generated by modern Text-To-Speech (TTS) systems.

The objective was to determine whether integrating synthetic data could reduce the resources required for dataset construction while maintaining model performance.
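
The abstract does not describe how the real and synthetic samples were combined. As a minimal sketch of the general idea only, the blending step could look like the following. It assumes each sample is a simple record such as a (file path, label) tuple. The function name `blend_datasets` and the parameters `synthetic_fraction`, `total`, and `seed` are hypothetical and are not taken from the thesis.

```python
import random


def blend_datasets(real, synthetic, synthetic_fraction, total, seed=0):
    """Draw `total` training samples, a given fraction of them synthetic.

    Hypothetical helper for illustration; not the procedure used in the thesis.
    `real` and `synthetic` are lists of sample records (for example,
    (path, label) tuples) and are left unmodified.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")

    # Number of samples to take from each source.
    n_synthetic = round(total * synthetic_fraction)
    n_real = total - n_synthetic

    if n_real > len(real) or n_synthetic > len(synthetic):
        raise ValueError("not enough samples to build the requested blend")

    # A seeded generator keeps the blend reproducible across runs.
    rng = random.Random(seed)
    blended = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(blended)
    return blended
```

Sweeping `synthetic_fraction` from 0 to 1 while keeping `total` fixed would give a series of training sets of equal size, so any change in model accuracy could be attributed to the share of synthetic data rather than to the amount of data.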