Reinforcement Learning for Dynamic Stochastic Scheduling

Alessia De Crescenzo


Supervisors: Paolo Brandimarte, Edoardo Fadda. Politecnico di Torino, Master's degree programme in Mathematical Engineering, 2024


PDF (Tesi_di_laurea) - Thesis, 1MB
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	With the sharp increase in uncertainty and complexity of production processes, dynamic scheduling now plays a key role in making enterprises competitive: it is needed to handle real-time events such as machine breakdowns, job arrivals and stochastic processing times. The static job shop scheduling problem (JSP) is one of the most practically relevant yet complex scheduling problems; it has been proved NP-hard and has been the subject of a substantial body of operations research literature. However, the static approach is unrealistic in real-world contexts, where dynamic events such as insertions, cancellations or modifications of orders, machine breakdowns, and changes in due dates and processing times are inevitable; these events drive the realized execution of a static schedule far from its expected outcome and seriously degrade production efficiency. This work focuses on the dynamic scheduling problem in job shops with new jobs arriving at stochastic times, with the aim of minimizing earliness, tardiness and flowtime penalties according to the just-in-time (JIT) policy, which holds that early as well as late delivery must be discouraged. A Reinforcement Learning agent-based method for developing a predictive-reactive scheduling strategy is investigated. The approach generates an initial schedule and then revises it in response to the arrival of new jobs. Specifically, the proposed method implements an event-driven rescheduling policy, in which the arrival of a new job triggers rescheduling of the entire timeline from the arrival time onwards. An agent is designed to simulate time according to the current schedule and to schedule the operations of the new job. Two agents were trained, using Sarsa and Q-learning respectively, both with eligibility traces and function approximation.
Their performance was then tested on a wide range of instances and compared with that of a FIFO agent.
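The abstract names two learning rules, Sarsa and Q-learning, each with eligibility traces and function approximation, and a JIT objective built from earliness, tardiness and flowtime. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a linear approximator over a hypothetical (state, action) feature vector, accumulating traces, Watkins's Q(λ) as the traced variant of Q-learning, and a reward equal to the negative weighted JIT penalty of a job. The names `jit_penalty` and `LinearTDLambda`, the weights `w_early`, `w_tardy`, `w_flow`, and the values of `alpha`, `gamma`, `lam` are all illustrative.

```python
def jit_penalty(completion, due, release, w_early=1.0, w_tardy=1.0, w_flow=1.0):
    """Hypothetical JIT cost of one job: weighted earliness + tardiness + flowtime.
    The unit weights are an assumption, not taken from the thesis."""
    earliness = max(0.0, due - completion)
    tardiness = max(0.0, completion - due)
    flowtime = completion - release
    return w_early * earliness + w_tardy * tardiness + w_flow * flowtime


class LinearTDLambda:
    """Linear action-value model q(s, a) = w . x(s, a) with accumulating traces.
    x is a hypothetical feature vector for a (state, action) pair."""

    def __init__(self, n_features, alpha=0.1, gamma=0.95, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.w = [0.0] * n_features  # weights of the linear approximator
        self.z = [0.0] * n_features  # eligibility trace vector

    def q(self, x):
        """Approximate action value: dot product of weights and features."""
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def reset_traces(self):
        self.z = [0.0] * len(self.z)

    def _td_step(self, x, reward, target_q):
        # TD error of the transition just observed.
        delta = reward + self.gamma * target_q - self.q(x)
        # Accumulating trace: decay by gamma*lambda, then add the gradient of q,
        # which for a linear model is simply the feature vector x.
        self.z = [self.gamma * self.lam * zi + xi for zi, xi in zip(self.z, x)]
        # Update every weight in proportion to its eligibility.
        self.w = [wi + self.alpha * delta * zi for wi, zi in zip(self.w, self.z)]
        return delta

    def sarsa_step(self, x, reward, x_next, done=False):
        """Sarsa(lambda): bootstrap on the action actually chosen next (on-policy)."""
        target = 0.0 if done else self.q(x_next)
        delta = self._td_step(x, reward, target)
        if done:
            self.reset_traces()  # traces do not carry over between episodes
        return delta

    def q_step(self, x, reward, next_candidates, next_is_greedy, done=False):
        """Watkins's Q(lambda): bootstrap on the best next action (off-policy)
        and cut the traces when the next action chosen is exploratory."""
        target = 0.0 if done else max(self.q(xn) for xn in next_candidates)
        delta = self._td_step(x, reward, target)
        if done or not next_is_greedy:
            self.reset_traces()
        return delta


# Usage: one terminal transition with made-up features and a tardy job
# (completion 12, due 10, release 2 -> tardiness 2, flowtime 10, penalty 12).
agent = LinearTDLambda(n_features=3, alpha=0.5, gamma=0.9, lam=0.5)
reward = -jit_penalty(completion=12.0, due=10.0, release=2.0)
agent.sarsa_step([1.0, 0.0, 1.0], reward, x_next=None, done=True)
print(agent.w)  # [-6.0, 0.0, -6.0]
```

The only difference between the two agents in this sketch is the bootstrap target and the trace cut: Sarsa(λ) evaluates the policy it follows, while Watkins's Q(λ) learns the greedy policy and therefore discards credit assigned along an exploratory branch.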
Supervisors:	Paolo Brandimarte, Edoardo Fadda
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	57
Subjects:
Degree programme:	Master's degree programme in Mathematical Engineering
Degree class:	New regulations > Master's degree > LM-44 - Mathematical-Physical Modelling for Engineering
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/32508
