
Machine learning for big sequence data: Wavelet-compressed Hidden Markov Models

Luca Bello

Machine learning for big sequence data: Wavelet-compressed Hidden Markov Models.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2020

License: Creative Commons Attribution Non-commercial Share Alike.

Abstract:

Hidden Markov models (HMMs) are among the most important machine learning methods for the statistical analysis of sequential data, but they struggle when applied to big data. Their relative inefficiency has been addressed several times through compression techniques, applied either to the computation or to the data. This thesis explores the latter: it applies a wavelet-based data compression technique and then adapts the main HMM algorithms from the literature, namely the forward, Viterbi and Baum-Welch algorithms, which solve the evaluation, decoding and training problems, respectively. Testing shows that the new technique generally yields equal or better results and achieves extremely high speedups on the training problem, in some cases making it thousands of times faster; this enables training an HMM on big data on a commodity laptop.
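For readers unfamiliar with the evaluation problem named in the abstract, the following is a minimal sketch of the standard (uncompressed) forward algorithm for a discrete HMM. It is not the thesis's wavelet-compressed variant; the function name `forward_log_likelihood` and the parameter names `start`, `trans` and `emit` are illustrative assumptions, and per-step rescaling is used here only to avoid numerical underflow on long sequences.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs) for a discrete HMM via the scaled forward algorithm.

    Hypothetical parameter layout (an assumption of this sketch):
      obs   -- list of observation indices
      start -- start[i]    = P(first state is i)
      trans -- trans[i][j] = P(next state is j | current state is i)
      emit  -- emit[i][k]  = P(observing symbol k | state i)
    """
    n_states = len(start)
    # Initialisation: alpha_1(i) = start_i * emit_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * trans_ij) * emit_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale so alpha sums to 1; the log of the scales accumulates log P(obs).
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

Each observation costs a number of operations quadratic in the number of states, so the total cost grows linearly with the sequence length. That linear growth is the cost that compressing the data, as the abstract describes, aims to reduce.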

Supervisors: Paolo Garza
Academic year: 2019/20
Publication type: Electronic
Number of pages: 60
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Co-supervising institution: CTH - Chalmers Tekniska Hogskola AB (Sweden)
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/15240