Benchmarking Large Language Models for Decision-Making in Supply Chain

Alberto Bersano

Benchmarking Large Language Models for Decision-Making in Supply Chain.

Supervisors: Giovanni Zenezini, Filippo Maria Ottaviani. Politecnico di Torino, Master's degree program in Management Engineering (Ingegneria Gestionale), 2025

PDF (Tesi_di_laurea) - Thesis (4MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Archive (ZIP) (Documenti_allegati) - Other (1MB)
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	The rapid spread of Large Language Models (LLMs) has stimulated growing interest in their application to supply chain management, a field where managerial decisions require precision, efficiency, and adaptability. Despite the widespread use of general-purpose benchmarks such as MMLU and HELM, the literature highlights the absence of systematic evaluation frameworks designed specifically for supply chain contexts. This thesis addresses that gap by developing a set of benchmarks to assess the reliability, efficiency, and managerial usefulness of LLMs. The research is guided by two central questions: (i) which combinations of datasets, evaluation metrics, and prompting strategies enable the construction of meaningful benchmarks for supply chain tasks; and (ii) which language model currently offers the best balance among accuracy, speed, and cost. The overall objective is to verify whether LLMs can serve as valid tools to support managerial decision-making. To answer these questions, a multi-layered methodology was designed around a “pyramid of difficulty” dataset, progressing from single-choice questions to numerical problems with exact answers, and finally to complex tasks requiring explicit reasoning. The benchmarks combine different prompting strategies (Zero-Shot, Role Prompting, Chain-of-Thought) and evaluate multiple dimensions: accuracy, cost, latency, token usage, and reasoning quality. The Analytic Hierarchy Process (AHP) was employed to synthesize these metrics into a single comparative index, while acknowledging the subjectivity of the survey-based weights. The experimental analysis of eight state-of-the-art models revealed systematic differences. GPT-5 achieved the highest and most stable accuracy, but at significantly higher computational cost and latency. Gemini 2.5 Flash reached similar accuracy while proving more efficient, whereas GPT-5 mini offered a balanced trade-off. By contrast, DeepSeek-V3.1, the Claude series, and Gemini Flash-Lite delivered less consistent outcomes, though they remained competitive in speed and cost. A key insight concerns prompting. The implicit use of Chain-of-Thought, that is, adding “Let’s think step by step” without requiring explicit reasoning, did not improve accuracy and sometimes reduced it, especially in complex tasks. Conversely, explicit reasoning (Benchmark 5) produced clear improvements, confirming that transparency in reasoning enhances reliability. The comparison of question formats further showed that LLMs perform better on single-choice tasks, where predefined options act as anchors, and struggle with numerical problems that require generating the correct value independently. Overall, the thesis demonstrates that LLMs can support managerial decision-making in supply chain contexts, provided that their adoption is guided by structured benchmarking capable of balancing accuracy with efficiency. The work contributes theoretically by proposing a replicable, domain-specific evaluation framework and by introducing a qualitative method for analyzing reasoning errors that distinguishes between interpretation failures and planning failures. On the practical side, it offers guidelines: avoid Chain-of-Thought for simple tasks, apply it to complex problems, and select models by weighing accuracy, cost, and latency. Future research should extend the framework to multi-turn interactions, integrate self-consistency techniques, and test robustness in uncertain, dynamic supply chain environments.
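The AHP aggregation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the pairwise judgments, the three criteria (accuracy, cost, latency), the normalized scores, and the names `ahp_weights` and `composite_index` are all hypothetical, and the row geometric-mean method is used here as a common approximation of AHP priority weights rather than as the method the author necessarily applied.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; the matrix is assumed reciprocal (pairwise[j][i] == 1 / pairwise[i][j]).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def composite_index(scores, weights):
    """Weighted sum of normalized criterion scores (higher is better)."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical judgments for (accuracy, cost, latency); NOT the survey data
# used in the thesis.
PAIRWISE = [
    [1.0, 3.0, 5.0],   # accuracy vs. accuracy, cost, latency
    [1 / 3, 1.0, 2.0], # cost
    [1 / 5, 1 / 2, 1.0],  # latency
]

weights = ahp_weights(PAIRWISE)

# Hypothetical scores already normalized to [0, 1], where 1 is best
# (so a cheap, fast model scores high on cost and latency).
model_a = [0.90, 0.40, 0.50]  # accurate but expensive and slow
model_b = [0.80, 0.90, 0.90]  # slightly less accurate, cheaper and faster

index_a = composite_index(model_a, weights)
index_b = composite_index(model_b, weights)

print("weights:", [round(w, 3) for w in weights])
print("index A:", round(index_a, 3), "index B:", round(index_b, 3))
```

Because the weights come from subjective pairwise judgments, changing `PAIRWISE` can change the ranking of the two illustrative models, which is the sensitivity the abstract acknowledges for the survey-based weights.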
Supervisors:	Giovanni Zenezini, Filippo Maria Ottaviani
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	149
Subjects:
Degree program:	Master's degree program in Management Engineering (Ingegneria Gestionale)
Degree class:	New system > Master's degree > LM-31 - INGEGNERIA GESTIONALE (Management Engineering)
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/37210
