TIGER: Testing and Improving Generated Code with LLMs

Lorenzo Gallone

Supervisors: Stefano Scanzio, Gianluca Cena. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025

License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This thesis presents TIGER, a test-driven and documentation-guided framework designed to improve the functional reliability of Python code generated by Large Language Models (LLMs) in realistic software development settings. Unlike conventional approaches that often expose models to ground-truth implementations or rely on synthetic benchmarks, TIGER operates in a constraint-rich environment that better simulates the experience of human developers: the model is given access only to high-level natural language artifacts—such as README files, reStructuredText documentation, and function docstrings—and is guided solely by the results of automated test execution. The LLM is never allowed to see any reference implementation of the target functions.
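The setting described above can be illustrated with a minimal sketch. It is not taken from the thesis: the helper names `build_prompt` and `run_tests`, the prompt layout, and the use of plain `assert` statements as tests are all assumptions made for illustration. The point it shows is that the only inputs available to the model are documentation artifacts and the outcome of test execution, never a reference implementation.

```python
def build_prompt(docs: dict[str, str], test_report: str = "") -> str:
    """Assemble a prompt from documentation artifacts and optional test feedback.

    Hypothetical helper: the section layout is an assumption, not TIGER's format.
    No reference implementation is ever included.
    """
    parts = [f"## {name}\n{text}" for name, text in docs.items()]
    if test_report:
        parts.append("## Test feedback\n" + test_report)
    return "\n\n".join(parts)


def run_tests(candidate_source: str, test_source: str) -> tuple[bool, str]:
    """Execute candidate code, then the tests, in one fresh namespace.

    Returns (passed, report). Running in-process with exec() keeps the sketch
    short; a real harness would isolate untrusted generated code.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the generated functions
        exec(test_source, namespace)       # run assert-based tests against them
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


# Documentation is the only description of the target function.
docs = {"docstring": "add(a, b): return the sum of a and b."}
tests = "assert add(2, 3) == 5\n"

# Stand-in for a model's first (incorrect) attempt.
candidate = "def add(a, b):\n    return a - b\n"

passed, report = run_tests(candidate, tests)
next_prompt = build_prompt(docs, "" if passed else report)
```

Here `next_prompt` carries the docstring plus the failure report, which is all a model would receive before producing another attempt under these assumptions.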

At the core of TIGER lies an iterative refinement process