Automatic Generation of Tool-Use Traces for Evaluating LLM Agents

Francesco Giannuzzo

Supervisors: Paolo Garza, Paolo Papotti. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2026

Abstract

Autonomous agents powered by large language models (LLMs) are increasingly expected to operate over structured data environments via external tools. Yet evaluating their capabilities, such as planning, tool selection, ambiguity handling, and multi-step execution, remains challenging. Existing benchmarks often depend on manually crafted scenarios, which are costly to create and hard to scale. This thesis introduces SyntheticAgentTraceQA, a pipeline that automatically generates, executes, and validates synthetic task–tool interaction traces grounded in real data. The pipeline follows a top-down strategy: instead of starting from a natural language query, it first constructs abstract operational templates using a taxonomy of tool roles (e.g., entity resolution, data retrieval, analysis, aggregation).
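To make the top-down idea concrete, the following is a minimal sketch of how an abstract template built from tool roles might be bound to concrete tools. It is not the thesis implementation: the names ToolRole, Template, instantiate, and the tool names in the registry are hypothetical, and only the four roles listed above come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class ToolRole(Enum):
    # The four roles named in the abstract; the actual taxonomy may contain more.
    ENTITY_RESOLUTION = "entity_resolution"
    DATA_RETRIEVAL = "data_retrieval"
    ANALYSIS = "analysis"
    AGGREGATION = "aggregation"


@dataclass(frozen=True)
class Template:
    """An abstract plan: an ordered sequence of tool roles, no concrete tools yet."""
    roles: tuple


def instantiate(template, registry):
    """Bind each role to a concrete tool name.

    `registry` is a hypothetical mapping from ToolRole to a list of tool names;
    this sketch simply picks the first candidate for each role.
    """
    steps = []
    for role in template.roles:
        candidates = registry.get(role, [])
        if not candidates:
            raise ValueError(f"no tool registered for role {role.value}")
        steps.append({"role": role.value, "tool": candidates[0]})
    return steps


# Hypothetical tool names, used for illustration only.
registry = {
    ToolRole.ENTITY_RESOLUTION: ["resolve_entity"],
    ToolRole.DATA_RETRIEVAL: ["fetch_rows"],
    ToolRole.AGGREGATION: ["sum_column"],
}
template = Template(
    roles=(
        ToolRole.ENTITY_RESOLUTION,
        ToolRole.DATA_RETRIEVAL,
        ToolRole.AGGREGATION,
    )
)
trace_plan = instantiate(template, registry)
```

In this sketch the template is fixed before any natural language query exists, which mirrors the top-down ordering described above; a query would be derived afterwards from the instantiated steps.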

These templates are instantiated and executed, and the resulting traces are validated for correctness