Lucca Gamballi
Reinforcement Learning for Dynamic Scheduling.
Rel. Edoardo Fadda, Leonardo Kanashiro Felizardo. Politecnico di Torino, Master of Science program in ICT for Smart Societies, 2025
PDF (Tesi_di_laurea)
- Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
This thesis investigates Reinforcement Learning (RL) for the Dynamic Job Shop Scheduling Problem (DJSSP), where agents make sequencing decisions under random job arrivals and tardiness is realized only upon job completion. The work argues that asynchronous per-machine decisions mitigate the credit-assignment challenge and improve training stability, motivating designs that explicitly align rewards with the causality of shop-floor events. The scheduler adopts a Centralized-Training, Decentralized-Execution (CTDE) scheme with parameter sharing and an event-driven policy that acts only at irregular decision epochs; this preserves local detail while remaining size-agnostic as queues fluctuate. The state is constructed with a "Minimal Repetition" encoder that packs each machine's top job candidates into fixed slots of job-specific features, enabling direct job selection without fixing the problem size.
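The fixed-slot idea behind the "Minimal Repetition" encoder can be sketched as follows. This is an illustrative assumption, not the thesis implementation: the slot count, the three features, and the slack-based candidate ordering are all hypothetical choices used only to show how a variable-length queue maps to a fixed-size observation.

```python
import numpy as np

K_SLOTS = 4      # candidate slots per machine (assumed)
N_FEATURES = 3   # e.g. processing time, due-date slack, waiting time (assumed)

def encode_machine_state(queue):
    """Pack a variable-length job queue into a fixed (K_SLOTS, N_FEATURES) array.

    queue: list of dicts with hypothetical keys 'proc_time', 'slack', 'wait'.
    Empty slots stay zero-padded; the mask marks which slots hold real jobs,
    so the policy's input size is independent of queue length.
    """
    obs = np.zeros((K_SLOTS, N_FEATURES), dtype=np.float32)
    mask = np.zeros(K_SLOTS, dtype=bool)
    # Keep the top-K candidates, here ordered by tightest due-date slack.
    top = sorted(queue, key=lambda j: j["slack"])[:K_SLOTS]
    for i, job in enumerate(top):
        obs[i] = (job["proc_time"], job["slack"], job["wait"])
        mask[i] = True
    return obs, mask

queue = [
    {"proc_time": 5.0, "slack": 2.0, "wait": 1.0},
    {"proc_time": 3.0, "slack": -1.0, "wait": 4.0},
]
obs, mask = encode_machine_state(queue)
print(obs.shape)      # (4, 3) regardless of how many jobs are queued
print(mask.tolist())  # [True, True, False, False]
```

Because every machine emits the same fixed shape, one shared policy network can score the slots directly and "select a job" by picking a masked slot index.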
The delayed reward is handled via a chronological joint-action pipeline: transitions are buffered without a reward and completed only when a job finishes, at which point a joint signal is allocated to the responsible agents in proportion to the queueing delay they induced.
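A minimal sketch of such a buffer, under stated assumptions: the class name, the per-transition `induced_wait` bookkeeping, and the proportional split rule are illustrative stand-ins for whatever credit-allocation rule the thesis actually uses.

```python
from collections import defaultdict

class DelayedRewardBuffer:
    """Hold rewardless transitions until their job completes, then assign credit."""

    def __init__(self):
        # job_id -> list of (agent_id, transition, induced_wait)
        self.pending = defaultdict(list)

    def store(self, job_id, agent_id, transition, induced_wait):
        """Buffer a transition taken on behalf of `job_id`, with no reward yet."""
        self.pending[job_id].append((agent_id, transition, induced_wait))

    def complete(self, job_id, joint_reward):
        """Job finished: split `joint_reward` in proportion to induced waits."""
        entries = self.pending.pop(job_id, [])
        total_wait = sum(w for _, _, w in entries) or 1.0  # guard division by zero
        return [
            (agent_id, transition, joint_reward * wait / total_wait)
            for agent_id, transition, wait in entries
        ]

buf = DelayedRewardBuffer()
buf.store(job_id=7, agent_id="m1", transition=("s0", "a0", "s1"), induced_wait=3.0)
buf.store(job_id=7, agent_id="m2", transition=("s1", "a1", "s2"), induced_wait=1.0)
done = buf.complete(job_id=7, joint_reward=-4.0)
print([(a, r) for a, _, r in done])  # [('m1', -3.0), ('m2', -1.0)]
```

The completed transitions, now carrying their share of the tardiness signal, can be pushed to the shared replay buffer in chronological order, which is what makes the delayed, completion-time reward compatible with standard off-policy training.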