Efficient deploy of Streamwise‐StyleMelGAN on edge CPUs

Paolo Volpe

Efficient deploy of Streamwise‐StyleMelGAN on edge CPUs.

Supervisor: Enrico Magli. Politecnico di Torino, Master's degree programme in Communications And Computer Networks Engineering (Ingegneria Telematica E Delle Comunicazioni), 2022

Abstract

Deep learning has opened new opportunities in speech processing and speech coding. Neural vocoders outperform conventional approaches in the perceptual quality of the reconstructed speech. However, approaches based on deep learning remain computationally complex, which makes deployment on edge devices challenging. In this thesis we present the analysis and optimization of Streamwise-StyleMelGAN (SSMGAN), a neural vocoder able to synthesize high-quality wideband speech at 1.6 kbps. First, we present the ML compiler Apache TVM and assess the performance of the standard (non-optimized) SSMGAN on ARM CPUs, showing that the baseline model is significantly slower than real-time on edge devices.

Then, we discuss quantization techniques, which can significantly reduce the memory footprint of the model