Federico Piovesan
EXPLORING TRANSFORMER MODEL FOR ACOUSTIC SCENE CLASSIFICATION.
Rel. Marcello Chiaberge, Luis Conde Bento, Mónica Jorge Carvalho De Figueiredo. Politecnico di Torino, Master of science program in Mechatronic Engineering, 2024
PDF (Tesi_di_laurea) - Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Sounds carry a large amount of information about the environment and the events that take place in it. Deep learning architectures can automatically extract and interpret these acoustic signals, an ability that is pivotal in numerous applications, such as multimedia retrieval, context-aware devices, robotics, and intelligent monitoring systems. Since its introduction, the Vision Transformer (ViT) architecture has shown remarkable results in a diverse array of AI tasks, including those related to acoustics. The DCASE competition, with its acoustic challenges, has pushed research in the field. Moreover, over the last two years, DCASE Task 1 has served as an effective benchmark for showcasing the performance of ViT models on Acoustic Scene Classification (ASC) problems.
This thesis aims to explore the capabilities of the ViT model in this context, using the TAU Urban Acoustic Scenes Dataset.
