Leap: a Model-Based Reinforcement Learning Framework for Fast Object Detection

Edoardo Roba

Leap: a Model-Based Reinforcement Learning Framework for Fast Object Detection.

Supervisor: Andrea Giuseppe Bottino. Politecnico di Torino, Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica), 2020

PDF (Tesi_di_laurea) - Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.

The goal of this project was to develop a new algorithm for object detection (OD). The starting point is the method presented in "Active Object Localization with Deep Reinforcement Learning", which describes an OD algorithm based on deep reinforcement learning and a Markov decision process. After every action taken by the agent, the algorithm compares the current state with the environment. This takes a long time, because every state is encoded by a CNN with over 58 million parameters (VGG16-like). The purpose of this project was to design a model-based algorithm that bypasses the CNN. During Q-training, many transitions (state, action, next state) are recorded and used as the training dataset for a predictive neural network. This predictive model is a fully connected network with 2 hidden layers and 9 million parameters, so it is much lighter than the CNN. The network is trained with its output fed back to its input, so that it can learn from a sequence of data representing the sequence of states. In addition, an epsilon-greedy policy is adopted to prevent the algorithm from getting stuck in a state during detection. The results are as follows: the more predictions ("leaps") the algorithm performs, the faster it runs (with no predictions the average processing time is 1.88 seconds per image, versus 0.98 seconds per image with 4 or 5 leaps). However, the more leaps are performed, the lower the accuracy becomes (an average accuracy loss of 4% for every leap performed).
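The leap mechanism described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the feature size, the number of actions, the layer widths, the untrained random weights, and the names `CountingEncoder`, `predict_next`, `run_episode`, and `observe` are all assumptions made for illustration. It only shows the control flow: encode once with the expensive encoder, then let a light fully connected predictor produce the next states for a given number of leaps, feeding each prediction back as the next input, while actions are chosen epsilon-greedily.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 8   # hypothetical feature size (real CNN features are far larger)
N_ACTIONS = 9   # hypothetical number of actions available to the agent


def make_mlp(sizes):
    """Random, untrained weights for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass with ReLU on the hidden layers only."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


# Predictive model: (state, one-hot action) -> next state, 2 hidden layers.
predictor = make_mlp([STATE_DIM + N_ACTIONS, 32, 32, STATE_DIM])
# Stand-in Q-network: state -> one value per action.
q_net = make_mlp([STATE_DIM, 32, N_ACTIONS])


class CountingEncoder:
    """Stand-in for the expensive CNN encoder; counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, observation):
        self.calls += 1
        return np.tanh(observation)


def epsilon_greedy(q_values, epsilon):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def predict_next(state, action):
    """Predict the next state from the current state and chosen action."""
    one_hot = np.eye(N_ACTIONS)[action]
    return mlp(predictor, np.concatenate([state, one_hot]))


def run_episode(observe, encoder, n_steps, leaps, epsilon=0.1):
    """Take `leaps` predicted steps after each real encoding."""
    actions = []
    state = encoder(observe(actions))
    since_encode = 0
    for _ in range(n_steps):
        action = epsilon_greedy(mlp(q_net, state), epsilon)
        actions.append(action)
        if since_encode < leaps:
            # Leap: the predictor's output becomes the next input state.
            state = predict_next(state, action)
            since_encode += 1
        else:
            # Fall back to the expensive encoder and reset the leap counter.
            state = encoder(observe(actions))
            since_encode = 0
    return actions
```

In this sketch, `observe` is a hypothetical callback that returns the current observation given the actions taken so far. Over 10 steps, `leaps=0` invokes the encoder 11 times, while `leaps=4` invokes it only 3 times, which is the source of the speed-up. For reference, the timings reported in the abstract (1.88 versus 0.98 seconds per image) correspond to a speed-up of roughly 1.9 times.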

Supervisors: Andrea Giuseppe Bottino
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 56
Degree programme: Master's degree programme in Mechatronic Engineering (Ingegneria Meccatronica)
Degree class: New organization > Master science > LM-25 - AUTOMATION ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/15384