
Real-time object recognition in industrial automation processes

Michele Montatore

Rel. Maurizio Morisio. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2022

Licenza: Creative Commons Attribution Non-commercial No Derivatives.


Computer Vision (CV) is a branch of Artificial Intelligence that aims to enable machines to simulate the human visual system by extracting features of the physical world from images and videos, usually captured by a camera. It comprises several tasks with different goals: image classification, image segmentation, object detection, etc. To perform them, today's CV applications exploit a wide variety of algorithms and tools, involving geometric transformations, filtering operations, and modern Deep Learning technologies such as Convolutional Neural Networks. This work explores their use, as a proof of concept, for an automatic a posteriori check on the assembly of automotive seat frame parts, specifically colored motors, in an industrial environment where a bench translates the frames back and forth while a camera captures the scene from above. For this purpose, a PyTorch model called YOLOv5 has been adopted for the real-time recognition of motors, combined with a color detection algorithm that associates each detected motor with one color from a predefined set. The two tools work in sequence on each frame of a video stream: YOLO first locates the motors through bounding boxes, and the color detection algorithm is then run inside those boxes. Owing to the difficulty of collecting a large number of varied real photos for network training, a dataset of labeled synthetic images has been created starting from a 3D model of the frame. In particular, the joint use of the Unity engine and C# scripts has enabled the generation of simulated videos from which screenshots under different conditions have been automatically taken and annotated. The knowledge acquired on such virtual data has then served as the basis for a second training phase on the few available real images, following the widespread approach of Transfer Learning.
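The per-frame pipeline described above (YOLOv5 localization followed by color assignment inside each bounding box) might be sketched as follows. This is a minimal illustration, not the thesis code: the palette values, the nearest-mean-RGB color rule, and the model path are all assumptions for the sketch; the actual thesis uses its own predefined color set and detection logic.

```python
import numpy as np

# Hypothetical predefined color set (RGB); the real palette is project-specific.
PALETTE = {
    "red":   (200, 40, 40),
    "green": (40, 180, 60),
    "blue":  (40, 60, 200),
}

def dominant_palette_color(crop: np.ndarray) -> str:
    """Assign a crop to the nearest palette entry by mean RGB distance."""
    mean_rgb = crop.reshape(-1, 3).mean(axis=0)
    dists = {name: float(np.linalg.norm(mean_rgb - np.array(ref, dtype=float)))
             for name, ref in PALETTE.items()}
    return min(dists, key=dists.get)

def process_frame(model, frame: np.ndarray):
    """Run a YOLOv5 model on one frame, then tag each motor with a color.

    `model` is assumed to be loaded via torch.hub, e.g.:
        model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
    """
    results = model(frame)
    tagged = []
    # results.xyxy[0]: one row per detection -> (x1, y1, x2, y2, conf, class)
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        crop = frame[int(y1):int(y2), int(x1):int(x2)]
        tagged.append(((x1, y1, x2, y2), conf, dominant_palette_color(crop)))
    return tagged
```

In this sketch the color step runs only on the pixels YOLO has already localized, which matches the two-stage behavior described in the abstract and keeps the color check cheap relative to the detector.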
Various training trials have been run in Google Colab to obtain a final tuned model providing decent results in terms of precision, recall, and mAP for the object detection part. In addition, a statistical analysis has been performed at inference time on some real video footage to evaluate both the recognition quality, in terms of confidence and reduction of false positives, and the color detection accuracy, which is not measurable at training time. The entire training architecture complies with the ETL paradigm: data are extracted from Dropbox for import into Colab, annotation files are filtered, and significant per-epoch values (losses, metrics, etc.) are uploaded to the non-relational cloud database MongoDB Atlas, making them accessible for further reporting activities. Finally, a user-friendly Python-based dashboard has been implemented for the real-time visualization of detections and other relevant information at inference time. Despite the encouraging results in object and color recognition, some limitations have emerged in this project: first, the sensitivity of the color detection algorithm to environmental conditions, and, perhaps more importantly, the difficulty of guaranteeing a high execution speed in real-time video processing, especially when YOLO is integrated with the dashboard.
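The ETL flow above (extract from Dropbox, filter per-epoch values, load into MongoDB Atlas) might look roughly like this. The field names, collection names, and connection string are placeholders assumed for illustration, not the thesis's actual schema; the load step uses the standard `pymongo` client.

```python
from typing import Dict, List

def extract_epoch_records(log_rows: List[Dict]) -> List[Dict]:
    """Transform step: keep only the per-epoch values relevant for
    reporting (losses and detection metrics), dropping other fields.
    The field names below are assumed examples."""
    keep = ("epoch", "box_loss", "obj_loss", "cls_loss",
            "precision", "recall", "mAP_0.5")
    return [{k: row[k] for k in keep if k in row} for row in log_rows]

def upload_to_atlas(records: List[Dict],
                    uri: str = "mongodb+srv://<user>:<password>@<cluster>/") -> None:
    """Load step: push the filtered records to MongoDB Atlas.
    Database and collection names here are placeholders."""
    from pymongo import MongoClient  # pip install pymongo
    client = MongoClient(uri)
    client["training"]["epoch_metrics"].insert_many(records)
```

Storing one document per epoch keeps the data schema-free, which suits a non-relational store like Atlas and lets later reporting queries filter by run or metric without migrations.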

Relators: Maurizio Morisio
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 101
Corso di laurea: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Classe di laurea: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Aziende collaboratrici: Orbyta Tech srl.
URI: http://webthesis.biblio.polito.it/id/eprint/25507