Politecnico di Torino

AI-driven Picking Solutions for Industrial Feeding Machines and Applications

Khashayar Mostmand

AI-driven Picking Solutions for Industrial Feeding Machines and Applications.

Rel. Valentino Peluso, Andrea Calimera, Enrico Macii, Alberto Dalmasso. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering), 2025

Abstract:

This thesis presents a vision-based algorithmic framework that improves the accuracy and reliability of industrial feeding systems, a core component of modern automated manufacturing. To address the challenges of real-world industrial settings, including uneven illumination, occlusions, and varying object properties, the work integrates classical computer vision with recent deep learning methods, exploiting their complementary strengths in object detection, orientation estimation, and picking-point identification. The methodology follows a two-step procedure: data are gathered by synthesizing realistic scenes with a rendering program, and real industrial images are used to validate the system. Algorithmic refinements, including custom preprocessing pipelines, provide effective solutions to problems such as false detections and perspective distortion. Rigorous experimental analysis shows that classical algorithms, namely the Watershed Algorithm and Connected Components Analysis, deliver strong results, with IoU and Dice scores of 0.802 and 0.890, respectively, a mean Hausdorff distance of 6.08 pixels, and an accuracy of 99.8%. Deep learning models such as Mask R-CNN and LRASPP MobileNetV3 also perform well, with Dice scores of 0.802 and 0.881, respectively, confirming that these methods can produce accurate segmentations, the critical foundation for the downstream tasks. A thorough examination of computational efficiency across image resolutions reveals the key deployment trade-offs. On low-resolution (640×640) images, classical algorithms take 0.09–0.14 s per object (up to approximately 11 FPS), while even recent deep models such as Mask R-CNN reach at most 1.25 FPS. At HD resolution (1280×720), connected components and watershed achieve up to 5 FPS, and Mask R-CNN slightly over 0.3 FPS. Execution times grow at the high resolutions typical of industrial imaging (2448×2048): classical methods take roughly 1.2–2.0 s per object (0.5–0.8 FPS), and deep networks such as Mask R-CNN can take 6–15 s per object (0.07–0.16 FPS), which highlights how difficult real-time performance becomes at full resolution. Classical methods therefore offer a resource-efficient path to real-time or near-real-time deployment, especially on embedded and edge devices, whereas deep learning approaches cope better with large and highly variable data. The thesis further suggests adopting embedded hardware as well as semi-supervised and self-supervised learning to reduce annotation costs and improve model generalization. Altogether, this work provides a robust and scalable solution for industrial automation and offers valuable insights into designing high-performance vision systems that balance accuracy, speed, and data efficiency in intelligent manufacturing.
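Since the full text is not available, the classical pipeline named in the abstract (connected components plus marker-based watershed, scored with IoU and Dice) can only be illustrated with a minimal sketch. The code below is not the thesis implementation: the file names (feeder_image.png, feeder_mask.png), the Otsu threshold, the distance-transform marker extraction, and the 0.5 peak ratio are all illustrative assumptions chosen to show the general technique with standard OpenCV calls.

```python
# Minimal sketch (not the thesis code): Otsu binarization, connected components
# as watershed markers, marker-based watershed, then IoU/Dice against a
# ground-truth mask. File names and thresholds are hypothetical.
import cv2
import numpy as np

def segment_parts(gray: np.ndarray) -> np.ndarray:
    """Return a label image in which each candidate part has its own label."""
    # Binarize with Otsu; assumes bright parts on a darker feeder surface.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))

    # Sure background (dilation) and sure foreground (distance-transform peaks).
    sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)

    # Connected components on the sure foreground provide the watershed markers.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1            # reserve 0 for the "unknown" band
    markers[unknown == 255] = 0
    color = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    markers = cv2.watershed(color, markers)   # boundaries are marked with -1
    return markers

def iou_dice(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """IoU and Dice for binary masks (True = object pixels)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

if __name__ == "__main__":
    gray = cv2.imread("feeder_image.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
    gt = cv2.imread("feeder_mask.png", cv2.IMREAD_GRAYSCALE) > 0  # hypothetical ground truth
    labels = segment_parts(gray)
    pred = labels > 1    # label 1 is background; labels 2..N are object regions
    print("IoU=%.3f  Dice=%.3f" % iou_dice(pred, gt))
```

Timing such a pipeline end to end (segmentation plus metric computation) at 640×640, 1280×720, and 2448×2048 is what would yield per-object latencies and FPS figures of the kind reported in the abstract.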

Supervisors: Valentino Peluso, Andrea Calimera, Enrico Macii, Alberto Dalmasso
Academic year: 2025/26
Publication type: Electronic
Number of pages: 76
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Elettronica (Electronic Engineering)
Degree class: Nuovo ordinamento > Laurea magistrale > LM-29 - INGEGNERIA ELETTRONICA
Collaborating companies: E.P.F. Elettrotecnica Srl
URI: http://webthesis.biblio.polito.it/id/eprint/38746