
Reinforcement Learning for Dynamic Scheduling

Lucca Gamballi

Reinforcement Learning for Dynamic Scheduling.

Supervisors: Edoardo Fadda, Leonardo Kanashiro Felizardo. Politecnico di Torino, Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis investigates Reinforcement Learning (RL) for the Dynamic Job Shop Scheduling Problem (DJSSP), where agents make sequencing decisions under random job arrivals and tardiness is realized only upon job completion. It argues that asynchronous per-machine decisions mitigate the credit-assignment challenge and support training stability, motivating designs that explicitly align rewards with the causality of shop-floor events. The scheduler adopts a Centralized-Training and Decentralized-Execution (CTDE) scheme with parameter sharing and an event-driven policy that acts only at irregular decision epochs. This preserves local detail while remaining size-agnostic as queues fluctuate. The state is constructed with a “Minimal Repetition” encoder that packs the top job candidates of each machine into fixed slots with job-specific features, enabling direct job selection without fixing the problem size. The delayed reward is handled via a chronological joint-action pipeline: transitions are buffered without reward and completed only when a job finishes, allocating a joint signal to the responsible agents in proportion to the queueing they induced. Finally, this thesis proposes a hierarchical learning extension to the multi-agent scheduler, introducing a High-Level Agent that selects operating modes for the Low-Level (per-machine) agents so that the system can adapt to the current state of the shop floor. Simulation results indicate that the proposed hierarchical RL framework reduces overall shop-floor tardiness compared with standard or learning-based sequencing rules.
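
The delayed-reward idea in the abstract (buffer transitions without a reward, then fill them in when the job finishes) can be illustrated with a small sketch. This is not the thesis implementation: the names `Transition`, `DelayedRewardBuffer`, and `induced_wait` are hypothetical, and two further assumptions are made for illustration only, namely that the joint signal is the negative tardiness of the finished job and that it is split in proportion to the waiting time each decision imposed on that job.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transition:
    """One per-machine decision that affected a given job (hypothetical structure)."""
    agent_id: int                   # machine whose agent made the decision
    state: tuple                    # observation at the decision epoch
    action: int                     # index of the selected job slot
    induced_wait: float = 0.0       # assumed measure: queueing time imposed on the job
    reward: Optional[float] = None  # unknown until the job finishes


class DelayedRewardBuffer:
    """Holds reward-less transitions per job and completes them on job completion."""

    def __init__(self) -> None:
        self._pending = defaultdict(list)  # job_id -> transitions, in chronological order
        self.completed = []                # transitions ready for training

    def record(self, job_id: int, transition: Transition) -> None:
        # Store the decision without a reward; tardiness is not yet known.
        self._pending[job_id].append(transition)

    def on_job_finished(self, job_id: int, tardiness: float) -> list:
        # Assumption: the joint signal is the negative tardiness of the finished job.
        transitions = self._pending.pop(job_id, [])
        joint_reward = -tardiness
        total_wait = sum(t.induced_wait for t in transitions)
        for t in transitions:
            if total_wait > 0:
                share = t.induced_wait / total_wait  # proportional to induced queueing
            else:
                share = 1.0 / len(transitions)       # fall back to an equal split
            t.reward = joint_reward * share
        self.completed.extend(transitions)
        return transitions
```

In this sketch, a job that waited 3 time units because of one machine's decision and 1 time unit because of another's would assign three quarters of its tardiness penalty to the first agent and one quarter to the second. How the thesis actually measures induced queueing and shapes the signal is not stated in the abstract.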

Supervisors: Edoardo Fadda, Leonardo Kanashiro Felizardo
Academic year: 2025/26
Publication type: Electronic
Number of pages: 93
Subjects:
Degree programme: Master's degree programme in Ict For Smart Societies (Ict Per La Società Del Futuro)
Degree class: New system > Master's degree > LM-27 - Telecommunications Engineering
Collaborating companies: Not specified
URI: http://webthesis.biblio.polito.it/id/eprint/37762