
Advanced Hardware Solutions for Neural Networks Inference in Autonomous Vehicles

Mecca, Pierre

Rel. Guido Masera, Maurizio Martina. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2018


Perception of the environment is a fundamental and challenging step in autonomous driving systems. Nowadays, the increased accuracy of deep learning algorithms is overtaking traditional computer vision methods for processing sensor data, such as camera images. However, deep neural network models require an enormous amount of computation and a large memory footprint, and their execution is still mainly performed on GP-GPUs. This thesis, developed in partnership with Magneti Marelli, discusses possible hardware solutions and optimization techniques to achieve fast and energy-efficient neural network inference in automotive embedded systems. First, a general overview of neural networks is given, with particular focus on Convolutional Neural Networks (CNNs), their layer organization, the Caffe framework, the most popular datasets, and state-of-the-art topologies for autonomous driving tasks such as image classification, object detection and semantic segmentation. Different hardware architectures for neural network inference (CPUs, GPUs, FPGAs, ASICs, many-core and neuromorphic processors) are analyzed and compared in terms of performance and power efficiency. In addition, techniques to compress neural network models are explored, with particular attention to pruning, knowledge distillation and quantization. Channel pruning is used to reduce a model's size by removing less relevant channels from certain layers, at the cost of reduced accuracy. Knowledge distillation allows training small but efficient networks that mimic the behavior of larger, more accurate models, reaching a higher accuracy than training the small model from scratch.
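The channel pruning idea can be sketched as follows. A common selection criterion (assumed here for illustration; the thesis does not state which criterion it uses) is to rank a convolutional layer's output channels by the L1 norm of their filters and drop the least relevant ones:

```python
import numpy as np

def prune_channels(weights, keep_ratio=0.75):
    """Rank the output channels of a conv layer by the L1 norm of
    their filters and keep only the most relevant ones.

    weights: array of shape (out_channels, in_channels, kh, kw)
    Returns the pruned weight tensor and the kept channel indices.
    """
    # The L1 norm of each output channel's filter serves as an
    # importance score: channels with small norms contribute
    # little to the layer's output.
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    # Indices of the top-scoring channels, in their original order.
    kept = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weights[kept], kept

# Example: a toy 8-channel conv layer pruned to 6 channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
pruned, kept = prune_channels(w, keep_ratio=0.75)
print(pruned.shape)  # (6, 3, 3, 3)
```

In practice the same indices must also be removed from the next layer's input channels, and the network is usually fine-tuned afterwards to recover part of the lost accuracy.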
A quantization technique named Incremental Network Quantization (INQ) was also explored; it turns floating-point network models into a low-precision power-of-two format, exploiting the reduced precision to decrease memory occupancy and speed up inference, with a very limited accuracy loss on image classification models. The INQ Caffe implementation was then extended to object detection, and tests performed on the SSD network produced a 5-bit quantized model with only a 0.18% reduction in mAP. Inspired by power-of-two quantization, the last chapter of the thesis evaluates the impact of replacing multiplications with shift-add operations in a convolutional layer of an FPGA-implemented neural network.
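The core of power-of-two quantization, and the reason it maps multiplications to shifts in hardware, can be illustrated with a minimal sketch (this rounds each weight independently and is not the full INQ algorithm, which also restricts the exponent range and re-trains the remaining weights incrementally):

```python
import numpy as np

def quantize_pow2(w):
    """Round a weight to the nearest power of two, preserving sign.

    Sketch of the idea behind power-of-two formats such as INQ;
    the real INQ quantizes weights in groups and retrains the
    still-unquantized weights between steps.
    """
    if w == 0:
        return 0.0
    sign = np.sign(w)
    # Nearest integer exponent for the weight's magnitude.
    e = int(np.round(np.log2(abs(w))))
    return sign * 2.0 ** e

print(quantize_pow2(0.11))  # 0.125, i.e. 2**-3

# With power-of-two weights, a multiplication in the datapath
# reduces to a bit shift (here for an integer x and exponent e >= 0):
x, e = 12, 3
assert x * (2 ** e) == x << e
```

A negative exponent corresponds to a right shift of the accumulator (or an adjusted fixed-point alignment), which is what makes a multiplier-free convolutional layer feasible on an FPGA.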

Relators: Guido Masera, Maurizio Martina
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 90
Additional Information: Tesi secretata. Fulltext non presente
Corso di laurea: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Classe di laurea: New organization > Master science > LM-29 - ELECTRONIC ENGINEERING
Aziende collaboratrici: Magneti Marelli spa
URI: http://webthesis.biblio.polito.it/id/eprint/9823