Energy-Efficient Quality Adaptation for Recurrent Neural Networks

Francesco Panini. Energy-Efficient Quality Adaptation for Recurrent Neural Networks. Supervisor: Massimo Poncino. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2018

Document access: Anyone
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Recurrent Neural Networks (RNNs) are state-of-the-art models that deliver very high accuracy in sequence modeling and machine translation tasks. In particular, the Encoder-Decoder architecture excels in sequence-to-sequence tasks in which the input and output sequences may not have the same length. These networks work in two stages: first the input sequence is encoded into a fixed-length representation, which is then decoded to produce a new target sequence. Because of the large number of network parameters, inference with these models requires high computing power and results in large energy consumption, typically unsustainable for an embedded device. While executing inference on edge nodes is beneficial in terms of latency and system responsiveness, such nodes generally do not have the hardware resources needed to sustain the heavy computations involved. To address this, this work proposes an algorithm to improve the energy efficiency of Encoder-Decoder RNNs. In particular, a novel dynamic Beam Search algorithm is introduced, in which the Beam Width (BW) is varied according to the evolution of a translation. The method dynamically adapts the BW, one of the parameters that most strongly affect inference complexity, according to the input currently being processed and the network's corresponding confidence. Results on two different machine translation models show that the proposed methodology reduces the average BW by up to 33%, thus significantly reducing inference execution time and energy consumption, while maintaining the same translation performance.
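
To make the idea of a confidence-driven Beam Width concrete, the following is a minimal Python sketch. It is not the algorithm from the thesis, whose adaptation rule is not given in this abstract. The rule in `choose_width` (linear interpolation on the top next-token probability of the best hypothesis), the `StepFn` interface, the `min_bw`/`max_bw` parameters and the three-token `toy_step` model are all assumptions introduced only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical decoder interface: token prefix -> next-token probabilities.
StepFn = Callable[[Sequence[int]], List[float]]


def choose_width(probs: List[float], min_bw: int, max_bw: int) -> int:
    """Assumed rule: the more peaked the distribution, the narrower the beam."""
    confidence = max(probs)
    # Linear interpolation: confidence 1.0 -> min_bw, confidence 0.0 -> max_bw.
    width = round(max_bw - confidence * (max_bw - min_bw))
    return max(min_bw, min(max_bw, width))


def dynamic_beam_search(step: StepFn, eos: int, max_len: int,
                        min_bw: int = 1, max_bw: int = 4
                        ) -> Tuple[List[int], float, float]:
    """Return (best sequence, its log-probability, average beam width)."""
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    finished: List[Tuple[List[int], float]] = []
    widths: List[int] = []
    for _ in range(max_len):
        candidates = []
        best_probs = None
        for i, (prefix, score) in enumerate(beams):
            probs = step(prefix)
            if i == 0:
                # beams is kept sorted, so index 0 is the best live hypothesis.
                best_probs = probs
            for tok, p in enumerate(probs):
                if p > 0.0:
                    candidates.append((prefix + [tok], score + math.log(p)))
        # Pick this step's width from the confidence of the best hypothesis.
        width = choose_width(best_probs, min_bw, max_bw)
        widths.append(width)
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    best_seq, best_score = max(finished or beams, key=lambda c: c[1])
    return best_seq, best_score, sum(widths) / len(widths)


def toy_step(prefix: Sequence[int]) -> List[float]:
    """Toy model over the vocabulary {0: EOS, 1, 2}; purely illustrative."""
    if len(prefix) == 0:
        return [0.0, 0.5, 0.5]    # uncertain first step
    if len(prefix) == 1:
        return [0.05, 0.9, 0.05]  # confident continuation
    return [0.9, 0.05, 0.05]      # confident end of sequence


seq, score, avg_bw = dynamic_beam_search(toy_step, eos=0, max_len=5)
print(seq, round(score, 4), round(avg_bw, 2))
```

In this toy run the beam is wider on the uncertain first step and narrows once the model becomes confident, so the average width stays well below `max_bw`. Since each live hypothesis costs one decoder step, a lower average width translates directly into fewer decoder evaluations, which is the source of the time and energy savings the abstract refers to.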

Supervisors: Massimo Poncino
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 88
Subjects:
Degree programme: Master's degree programme in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/8990