Building a Distributed Query Engine at Scale

Filippo Rossi

Supervisor: Gianpiero Cabodi. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025

Abstract

Modern distributed systems generate massive volumes of telemetry data requiring fast query processing for real-time monitoring and analytics. As data volumes reach petabyte scale, traditional query architectures struggle to balance execution flexibility, resource utilization, modularity, cost efficiency, and performance. This thesis addresses these challenges through the design and implementation of Bolt, a query engine for large-scale event processing systems. The research demonstrates how to extend existing production infrastructure without disrupting operational stability, using a composable approach and standardized interfaces while maintaining compatibility with established distributed architectures.

Chapter 1 establishes the problem statement: extending query capabilities for high-demand analytical workloads while addressing challenges in execution flexibility, resource utilization, modularity, and cost efficiency.

Chapter 2 describes Datadog's Event Platform (EVP) architecture and its role in processing petabytes of telemetry data monthly, examining the multi-layered distributed query architecture that forms the integration environment for the new query engine.

Publication type

Electronic

URI

https://webthesis.biblio.polito.it/id/eprint/37748
