Efficient State Space Models for Edge-Based Spoken Language Understanding

Seyed Emadodin Mousavi

Supervisor: Claudio Passerone. Politecnico di Torino, Master's degree programme in Electronic Engineering (Ingegneria Elettronica), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis presents technical contributions aimed at advancing Spoken Language Understanding (SLU) through modern State Space Models (SSMs). The core objective was to investigate, implement, and validate SSM-based architectures, specifically S4 and Mamba, as an efficient alternative to mainstream models for intent classification on edge devices. The work spans foundational experimentation, integration into major open-source toolkits, and the development of a novel end-to-end SLU system. Contributions were made across three codebases: S4, for initial prototyping and validation on the Fluent Speech Commands (FSC) dataset; ESPnet, for community-facing integration and bug resolution; and SpeechBrain, which houses the final, optimized end-to-end Mamba-based SLU models. The resulting architectures combine SSM encoders with several acoustic front-ends (Mel-scale filterbank preprocessing, strided convolution feature extraction, and direct waveform embedding), paired with attention-based sequence decoding and beam-search inference. Together they establish a baseline for future SSM research in speech processing.
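
As a rough illustration of what an SSM encoder layer computes, the sketch below runs a discrete linear state space recurrence with a diagonal state matrix over a short input sequence. It is not the S4 or Mamba code from the thesis: S4 uses a structured parameterization of the state matrix and Mamba makes its parameters depend on the input, and both are left out here. The function name `ssm_scan` and all parameter values are assumptions chosen for illustration.

```python
from typing import List


def ssm_scan(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run a diagonal linear state space recurrence over a scalar input sequence.

    For each time step k:
        x_k = a * x_{k-1} + b * u_k   (element-wise, one entry per state dimension)
        y_k = sum(c * x_k)
    The lists a, b, c are hypothetical parameters, one entry per state dimension.
    """
    state = [0.0] * len(a)  # x_0 = 0
    outputs = []
    for u_k in u:
        # Update every state dimension independently (diagonal state matrix).
        state = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, state, b)]
        # Project the state back to a scalar output.
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


# A unit impulse through a single state with decay 0.5.
print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))  # [2.0, 1.0, 0.5]
```

Each output depends on the whole input history, yet the layer only carries a fixed-size state from one step to the next; this is one commonly cited reason such layers are considered for memory-constrained devices.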

Supervisors: Claudio Passerone
Academic year: 2025/26
Publication type: Electronic
Number of pages: 55
Subjects:
Degree course: Master's degree programme in Electronic Engineering (Ingegneria Elettronica)
Degree class: New regulations > Master's degree > LM-29 - Electronic Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/38750