Riccardo Cardona
Horus: A real-time object detection application with Boston Dynamics' Spot robot.
Supervisor: Fabrizio Lamberti. Politecnico di Torino, Master's degree in Ingegneria Informatica (Computer Engineering), 2024
Abstract:
Nowadays, robotic technologies are gaining popularity and becoming indispensable in many industries. They are employed for a multitude of purposes, as they provide significant advantages to both companies and workers; in industrial settings, for instance, the automation of inspection tasks has become increasingly prevalent. This thesis investigates the potential of robotics and computer vision, with a particular focus on their application in industrial contexts.

In particular, the thesis presents Horus, a real-time object detection system designed for Boston Dynamics' Spot robot. Spot is a quadrupedal robot outfitted with a suite of cameras and sensors positioned around its body, and it can be fitted with additional payloads from the manufacturer or third-party providers, thus extending its functional capabilities. The robot exhibits excellent stability and automatically adapts to surfaces with varying degrees of inclination. Together with Horus, these features give Spot the capacity to identify and recognise particular objects within any area of interest. Horus extends Spot's capabilities by enabling it to perform computer vision tasks with its onboard cameras, such as detecting fire extinguishers and exit signs.

The Horus system consists of two principal macro sections: the Streaming section and the Processing section. The Streaming section allows computer vision tasks to be executed in real time on the video stream from Spot's cameras or from external sources. The user can select the desired video source and choose from a range of object detection models, including models for fire extinguisher detection and exit sign detection (with on/off status recognition). The Processing section, in turn, enables computer vision tasks to be performed on user-uploaded images and videos. Furthermore, this section integrates multimodal large language models (LLMs) to generate detailed reports on different types of incidents, analysing both visual and textual content. Users may choose between OpenAI models (GPT-4o and GPT-4 Turbo) or, if they prefer to run an LLM locally, LLaVA (Large Language and Vision Assistant).

The thesis provides a comprehensive overview of the backend structure of Horus, including a detailed account of the functions that manage video streams from the possible sources, the use of the Ultralytics library to implement the object detection models, the steps taken to create the datasets and, finally, the integration of LLMs for report generation. Additionally, the thesis presents Horus Light, a simplified version of Horus developed for a customer who requested a Spot-based security system. Horus Light was designed to give the end customer a condensed tool that includes only the features they require. This thesis was developed through a collaboration between Politecnico di Torino and Sprint Reply. The research activities were conducted at Reply's laboratory facility, "Area42", located in the Lingotto district of Turin.
Supervisors: Fabrizio Lamberti
Academic year: 2024/25
Publication type: Electronic
Number of pages: 87
Additional information: Confidential thesis; full text not available
Degree programme: Master's degree in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: SPRINT REPLY S.R.L. CON UNICO SOCIO
URI: http://webthesis.biblio.polito.it/id/eprint/33883