TIGER: Testing and Improving Generated Code with LLMs

Lorenzo Gallone

TIGER: Testing and Improving Generated Code with LLMs.

Supervisors: Stefano Scanzio, Gianluca Cena. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:	This thesis presents TIGER, a test-driven and documentation-guided framework designed to improve the functional reliability of Python code generated by Large Language Models (LLMs) in realistic software development settings. Unlike conventional approaches that often expose models to ground-truth implementations or rely on synthetic benchmarks, TIGER operates in a constraint-rich environment that better simulates the experience of human developers: the model is given access only to high-level natural language artifacts (README files, reStructuredText documentation, and function docstrings) and is guided solely by the results of automated test execution. The LLM never sees a reference implementation of the target functions. At the core of TIGER lies an iterative refinement process. Given a natural language prompt built from documentation and structural metadata extracted from a repository, the LLM generates an initial implementation of a target function. This function is then dynamically inserted into the codebase and evaluated against its associated test suite. If tests fail, the system automatically extracts a summary of the errors, using either a rule-based or a model-based method, and incorporates this feedback into a revised prompt. The LLM is invoked again to produce an improved version of the function. This cycle continues until the function passes all tests or a maximum number of iterations is reached. A single-pass version of the system is also implemented as a baseline for evaluating the added value of the iterative strategy. The system has been evaluated on ten real-world Python repositories of varying size and complexity, selected for their rich documentation and comprehensive test suites.
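The refinement cycle described above can be illustrated with a minimal sketch. It is not the thesis implementation: `refine`, `RunReport`, the two callables, the prompt layout, and the default iteration budget are all hypothetical names and values chosen for illustration, and the stubs at the bottom stand in for a real LLM and a real test suite.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunReport:
    """Outcome of one test run (hypothetical structure)."""
    passed: bool
    error_summary: str = ""  # condensed failure text fed back to the model


def refine(
    base_prompt: str,
    generate: Callable[[str], str],         # assumed LLM call: prompt -> source code
    run_tests: Callable[[str], RunReport],  # assumed: insert the code, run its tests
    max_iterations: int = 5,                # assumed budget, not taken from the thesis
) -> Tuple[Optional[str], int]:
    """Regenerate a function until its tests pass or the budget is exhausted."""
    prompt = base_prompt
    for iteration in range(1, max_iterations + 1):
        candidate = generate(prompt)
        report = run_tests(candidate)
        if report.passed:
            return candidate, iteration
        # Build the revised prompt from the original one plus the failure summary.
        prompt = (
            f"{base_prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            f"Test failures:\n{report.error_summary}"
        )
    return None, max_iterations


# Stubs for demonstration only: the "model" fixes its bug once it sees feedback.
def fake_generate(prompt: str) -> str:
    if "Test failures" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def fake_run_tests(source: str) -> RunReport:
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy stub, unsafe for untrusted code
    if namespace["add"](2, 3) == 5:
        return RunReport(passed=True)
    return RunReport(passed=False, error_summary="add(2, 3) returned the wrong value")
```

Calling `refine("Implement add(a, b).", fake_generate, fake_run_tests)` converges on the second iteration with these stubs, while `max_iterations=1` reproduces the single-pass baseline behaviour of giving up after one attempt.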
The experiments compare the effectiveness of two LLMs, Gemini 2.0 (a general-purpose model) and QwenCoder 2.5 Instruct (optimized for code generation), in both single-pass and iterative configurations. Results demonstrate that iterative refinement consistently improves the success rate of code generation, particularly in repositories involving structural complexity, indirect test coverage, or domain-specific logic. On average, functions required fewer than two iterations to converge, and performance gains were achieved without modifying the underlying LLM weights or architectures. Furthermore, TIGER incorporates a modular architecture that enables scalable analysis of large repositories, mapping functions to test cases through static code analysis and call graph propagation. Prompt generation is structured to preserve semantic consistency and contextual relevance, with special handling for class-based functions and internal dependencies. The system also includes mechanisms for error summarization and logging, enabling transparent and reproducible evaluation. Overall, this work demonstrates that LLMs can evolve from static, one-shot generators into adaptive, test-driven agents capable of producing functionally correct and maintainable code. By combining structured prompting, automated testing, and iterative feedback, without relying on fine-tuning or external supervision, TIGER offers a lightweight yet powerful framework for integrating LLMs into practical software engineering workflows. It also lays the groundwork for future research on hybrid human-AI development pipelines, robust error handling strategies, and context-aware prompt engineering for code synthesis at scale.
Supervisors:	Stefano Scanzio, Gianluca Cena
Academic year:	2024/25
Publication type:	Electronic
Number of pages:	99
Subjects:
Degree programme:	Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class:	New regulations > Master's degree > LM-32 - Computer Engineering (Ingegneria Informatica)
Joint supervision institution:	UNIVERSITY OF ILLINOIS AT CHICAGO (UNITED STATES OF AMERICA)
Collaborating companies:	University of Illinois at Chicago
URI:	http://webthesis.biblio.polito.it/id/eprint/36426
