Luca Bergamini
GenAI-NewsScraper: Automated News Scraping, Summarization, Enrichment, and Multimodal Content Generation.
Supervisor: Riccardo Coppola. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2025
Full text: PDF (Tesi_di_laurea), 7MB
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The rapid growth of digital media has created a need for automated systems that can efficiently retrieve, process, and deliver news content. This thesis presents the design and implementation of a generative AI (GenAI) system for automated news scraping, content enrichment and summarization, and multimodal output generation, aimed at supporting scalable media workflows and interactive user experiences. The main objective is to develop an agent-based architecture that autonomously collects news from different sources, enriches it with related material, summarizes key information, and delivers results via text and audio formats. The system relies on a Model Context Protocol (MCP) server for orchestration, with modular tools for vector-based data storage, LLM-driven web search, and Text-to-Speech (TTS) synthesis.
Structured web scraping, multi-document summarization, and vector embeddings (using PostgreSQL with pgvector) enable efficient data processing, while TTS supports automated podcast generation and interactive newsletters.
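To illustrate the vector-based retrieval mentioned in the abstract, the following is a minimal sketch, not the thesis implementation. It ranks stored articles by cosine distance to a query embedding in plain Python, which is the ordering that pgvector's cosine distance operator (`<=>`) produces inside PostgreSQL. The article titles, the three-dimensional vectors, and the function names are hypothetical and chosen only for illustration; a real system would obtain its embeddings from an embedding model.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_articles(query, store, k=2):
    """Return the titles of the k stored articles closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_distance(query, item[1]))
    return [title for title, _ in ranked[:k]]


# Hypothetical toy embeddings (assumption: real embeddings have far more dimensions).
ARTICLES = [
    ("Central bank raises rates", [0.9, 0.1, 0.0]),
    ("Local team wins final", [0.0, 0.2, 0.9]),
    ("Inflation slows in March", [0.8, 0.3, 0.1]),
]

if __name__ == "__main__":
    # Query vector standing in for the embedding of an economy-related question.
    print(nearest_articles([1.0, 0.2, 0.0], ARTICLES))
```

With pgvector, the same ranking would typically be expressed in SQL by ordering rows on `embedding <=> query_vector` and applying a `LIMIT`; the column name here is an assumption, not taken from the thesis.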
