Giacomo Zema
Assessing Feasibility and Performance of Real-Time Semantic Segmentation in an Industrial IoT use case.
Supervisor: Andrea Calimera. Politecnico di Torino, Corso di laurea magistrale in Data Science And Engineering, 2022
PDF (Tesi_di_laurea), 19 MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:
Semantic Segmentation is a computer vision task that consists of assigning a label to every pixel in an input image. It has many applications, such as scene understanding in autonomous driving or robot vision, land cover classification of satellite images, segmentation of medical images, etc. Semantic Segmentation models are often very complex and require powerful hardware, which clashes with their use in resource-constrained environments such as edge devices in an IoT network. For this reason, the standard approach when deploying these models in an IoT application is to offload both training and inference to a remote server. While training on a server equipped with a GPU is a sensible choice, offloading the inference phase can be rather inefficient. The problem lies in transferring data from the edge to the server and back, which is very expensive in terms of energy and raises critical concerns regarding scalability, inference time, and data security. Many applications of semantic segmentation on edge devices rely on collecting image data from cameras and processing it immediately; the inference phase therefore has real-time requirements that cannot be fulfilled by offloading it to a cloud server. For such applications, the clear solution is to perform inference on the edge devices themselves. The literature offers many real-time semantic segmentation models, but virtually none of them achieve real-time latency (<33 ms) on less powerful hardware. The reason is that these models are evaluated on very complex datasets, which favor larger and more elaborate networks. In this thesis, we explore several optimizations that can be applied to a network to improve its inference latency while keeping an acceptable IoU score. The goal is to define an optimization pipeline that can be used to adapt the performance of a model to the requirements (IoU score and latency) of a specific task. We structured this pipeline according to the effort required by each step, where by effort we mean both the time required by the operation and the difficulty of its implementation. The proposed optimizations are grouped into two main branches: Input Data and Model Topology. The first consists of changing the size of the input images to speed up inference; it requires no knowledge of the model and therefore demands the least effort. The second branch requires a deeper understanding of the underlying network to identify important tuning knobs related to its different layers and functional blocks. In the experimental part, we selected a specific industrial IoT use case and a baseline network among the several models reviewed in the background chapter. We ran the experiments on a Raspberry Pi 4 and studied every optimization approach and their combinations in detail, gaining meaningful insights that allowed us to propose an optimization framework that accounts for different levels of effort.
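As a minimal illustration of the "Input Data" branch described above (this is a hedged sketch, not the thesis code: the model, torchvision's LR-ASPP MobileNetV3, and the chosen resolutions are assumptions made for the example), one can time a lightweight segmentation network at progressively smaller input sizes and compare the measured latency against the real-time target of 33 ms:

```python
# Illustrative sketch only (not the thesis implementation): timing a lightweight
# segmentation model at several input resolutions, mirroring the "Input Data"
# optimization branch. Model choice and resolutions are assumptions.
import time
import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

model = lraspp_mobilenet_v3_large(num_classes=21).eval()  # randomly initialised

def mean_latency_ms(height: int, width: int, warmup: int = 3, runs: int = 10) -> float:
    """Average single-image CPU inference time in milliseconds."""
    x = torch.rand(1, 3, height, width)  # dummy RGB input of the given size
    with torch.no_grad():
        for _ in range(warmup):          # discard warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1e3

# Shrinking the input trades IoU for latency; the real-time target is <33 ms.
for h, w in [(512, 1024), (256, 512), (128, 256)]:
    print(f"{h}x{w}: {mean_latency_ms(h, w):.1f} ms")
```

Run on an edge board such as a Raspberry Pi 4, a sweep like this shows how quickly latency drops with input size, which is the kind of low-effort tuning knob the pipeline exploits before touching the model topology.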
| Field | Value |
|---|---|
| Relators | Andrea Calimera |
| Academic year | 2021/22 |
| Publication type | Electronic |
| Number of pages | 102 |
| Subjects | |
| Degree programme | Corso di laurea magistrale in Data Science And Engineering |
| Degree class | New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING |
| Collaborating companies | UNSPECIFIED |
| URI | http://webthesis.biblio.polito.it/id/eprint/23584 |