CIM Wizard and CIM Assist: An Object-Oriented Platform for Extensible City Information Modeling and Autonomous LLM Agent for Spatial Querying

Ali Taherdoustmohammadi

CIM Wizard and CIM Assist: An Object-Oriented Platform for Extensible City Information Modeling and Autonomous LLM Agent for Spatial Querying.

Supervisors: Lorenzo Bottaccioli, Edoardo Patti, Pietro Rando Mazzarino. Politecnico di Torino, Master's degree programme in Digital Skills For Sustainable Societal Transitions, 2025

Abstract:	Urban energy modeling currently faces two obstacles: (1) the lack of an extensible, standardized platform that lets researchers define and integrate custom building-feature calculation methods beyond a fixed set of functions, which is needed to cope with sparse and heterogeneous data sources; and (2) the inaccessibility of complex PostGIS spatial SQL to non-technical users, which limits efficient data querying and analysis. This thesis presents a framework that addresses both issues through two integrated, open-source systems: CIM Wizard Integrated, an object-oriented platform for city information modeling, and CIM Assist, an LLM-powered natural language interface for spatial data access. CIM Wizard addresses the extensibility and standardization gap with an object-oriented framework built on FastAPI. The platform ships with 17 predefined calculators/methods as sample prototypes, but it is designed so that energy researchers can define and integrate their own methods and logic for diverse urban energy studies. It integrates multi-schema PostGIS databases (vector, census, network, raster) with automatic dependency resolution and fallback strategies. CIM Assist addresses the complexity of database interaction through a three-stage approach: (1) a dataset generation pipeline combining rule-based templates, CTGAN synthetic SQL generation, and LLM augmentation, producing over 400K training samples with a 99.57% Stage 2 NoErr rate at minimal cost; (2) two-stage fine-tuning using QLoRA on 14B-parameter models, involving a task- and domain-specific model (question -> SQL) and a sequential instruction-following model (question -> instruction, followed by question + instruction -> SQL), a two-pronged strategy that serves as a step toward a next-generation task-specific but domain-agnostic agent architecture; and (3) LangGraph-based agent integration with database introspection tools, enabling self-correction through execution feedback. The systematic evaluation framework employs four complementary metrics (NoErr, EM, EX, EA) and shows 80-92% first-shot accuracy (EX) and 90-96% eventual accuracy (EA) in agent mode, a 5-10% improvement over baseline models. The complete pipeline achieves production-ready performance for real-world spatial SQL applications, makes complex urban energy databases accessible to non-technical users, and provides an extensible, open-source foundation for future urban energy research.
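The difference between first-shot accuracy (EX) and eventual accuracy (EA) under execution feedback can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: SQLite stands in for PostGIS, `stub_generator` stands in for the fine-tuned model, the `building` table and all function names are hypothetical, and the four metric definitions are one plausible reading of NoErr, EM, EX, and EA.

```python
import sqlite3

def run_sql(conn, sql):
    """Execute a query; return (rows, None) on success or (None, error text)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def answer_with_retries(conn, question, generate, max_attempts=3):
    """Ask the generator for SQL, feeding back the last error until it executes."""
    feedback, attempts = None, []
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        rows, error = run_sql(conn, sql)
        attempts.append((sql, rows, error))
        if error is None:
            break
        feedback = error  # execution feedback for the next attempt
    return attempts

def normalize(sql):
    """Lowercase, collapse whitespace, drop a trailing semicolon."""
    return " ".join(sql.lower().split()).rstrip(";")

def score(conn, attempts, gold_sql):
    """Assumed metric readings: NoErr, EM, EX use the first attempt; EA uses any."""
    gold_rows, _ = run_sql(conn, gold_sql)
    def matches(rows):
        return rows is not None and sorted(rows) == sorted(gold_rows)
    first_sql, first_rows, first_error = attempts[0]
    return {
        "NoErr": first_error is None,
        "EM": normalize(first_sql) == normalize(gold_sql),
        "EX": matches(first_rows),
        "EA": any(matches(rows) for _, rows, _ in attempts),
    }

def stub_generator(question, feedback):
    """Hypothetical model: misspells a column first, fixes it after feedback."""
    if feedback is None:
        return "SELECT heigth FROM building"
    return "SELECT height FROM building"

# Hypothetical toy schema standing in for a building table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE building (id INTEGER, height REAL)")
conn.executemany("INSERT INTO building VALUES (?, ?)", [(1, 12.5), (2, 30.0)])

attempts = answer_with_retries(conn, "List building heights", stub_generator)
result = score(conn, attempts, "SELECT height FROM building;")
print(result)
```

In this toy run the first query fails on the misspelled column, so it counts against NoErr and EX, while the corrected second attempt counts toward EA.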
Supervisors:	Lorenzo Bottaccioli, Edoardo Patti, Pietro Rando Mazzarino
Academic year:	2025/26
Publication type:	Electronic
Number of pages:	170
Additional information:	Restricted thesis. Full text not available
Subjects:
Degree programme:	Master's degree programme in Digital Skills For Sustainable Societal Transitions
Degree class:	New system > Master's degree > LM-91 - Techniques and Methods for the Information Society
Collaborating companies:	Not specified
URI:	http://webthesis.biblio.polito.it/id/eprint/38861
