
Razieh Meidanshahi
Proactive Database Size Forecasting for SQL Server instances: A Machine Learning Approach.
Supervisor: Andrea Bottino. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2025
Abstract: |
Efficient database size management is crucial for organizations relying on on-premise SQL Server databases. Unexpected database growth can lead to server space saturation, causing performance degradation, system downtime, and increased operational costs. This thesis presents a forecasting approach to predict database growth, enabling proactive resource allocation and preventing storage-related failures. The study begins by listing the databases hosted on a given SQL Server instance and retrieving their historical row byte size data for the past year from the InfluxDB time-series database. The extracted data is cleaned and transformed to prepare it for time-series forecasting with the SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous factors) model. SARIMAX is chosen over other time-series forecasting models for its ability to handle seasonality, trends, and external influencing factors, which makes it well suited to database growth prediction. Unlike plain ARIMA, which models trend through differencing but has no seasonal terms or exogenous regressors, SARIMAX accounts for seasonal fluctuations and the impact of external variables, which supports more accurate long-term predictions. Compared with neural-network models such as LSTMs (Long Short-Term Memory networks), SARIMAX requires less training data, is computationally cheaper, and provides interpretable results, making it a good fit for IT administrators who require transparency in decision-making. The ability to incorporate exogenous variables allows SARIMAX to adapt to external factors that can influence database growth, such as workload variations and policy changes. The forecasting model generates monthly growth estimates and predicts the size of each database for the next several months based on parametric inputs. The forecast results are stored in a MariaDB database and visualized through Grafana dashboards.
These dashboards provide insights into database growth trends and issue warnings when projections indicate that a server’s disk usage will reach a given threshold of available capacity within the next three months. The total disk size is dynamically loaded from Zabbix data to ensure accurate monitoring. For database administrators, this solution offers several key benefits. First, it provides visibility into database growth trends, enabling proactive decision-making regarding disk allocation. By anticipating storage needs, administrators can optimize resource utilization, avoiding unnecessary storage expansion while ensuring sufficient capacity for growing workloads. Second, automated alerts reduce the risk of unexpected database-related failures, minimizing downtime and improving system reliability. Finally, the integration with existing monitoring tools such as Zabbix and Grafana ensures seamless adoption without disrupting existing workflows. This project delivers a practical, data-driven approach to database growth forecasting, helping organizations enhance operational efficiency, reduce costs, and maintain high system availability. |
---|---|
Supervisors: | Andrea Bottino |
Academic year: | 2024/25 |
Publication type: | Electronic |
Number of pages: | 43 |
Additional information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New degree system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Collaborating companies: | Evo Development srl |
URI: | http://webthesis.biblio.polito.it/id/eprint/35308 |
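
The warning rule described in the abstract (flag a server when projected disk usage reaches a given threshold within the next three months) can be illustrated with a minimal sketch. The full text is not available, so this is not the thesis implementation: the function name `first_breach_month`, the input layout (one list of forecast sizes in bytes per database), and the 75% threshold in the usage example are all assumptions made for illustration. In the described system, the per-database forecasts would come from the SARIMAX model and the total disk size from Zabbix.

```python
from typing import Dict, Optional, Sequence


def first_breach_month(
    forecasts: Dict[str, Sequence[float]],
    disk_total_bytes: float,
    threshold: float = 0.9,      # assumed default; the abstract only says "a given threshold"
    horizon_months: int = 3,     # the abstract's three-month warning window
) -> Optional[int]:
    """Return the first month (1-based) whose summed forecast reaches the threshold.

    `forecasts` maps a database name to its forecast sizes in bytes, one value
    per future month; every list must hold at least `horizon_months` values.
    Returns None when no month inside the horizon reaches the threshold.
    """
    if disk_total_bytes <= 0:
        raise ValueError("disk_total_bytes must be positive")
    limit = threshold * disk_total_bytes
    for month in range(1, horizon_months + 1):
        # Projected usage for the server is the sum over all hosted databases.
        projected = sum(sizes[month - 1] for sizes in forecasts.values())
        if projected >= limit:
            return month
    return None


# Hypothetical figures, for illustration only.
GIB = 1024 ** 3
example_forecasts = {
    "sales": [40 * GIB, 45 * GIB, 50 * GIB],
    "hr": [30 * GIB, 31 * GIB, 32 * GIB],
}
breach = first_breach_month(example_forecasts, 100 * GIB, threshold=0.75)
```

With these made-up numbers the projected totals are 70, 76, and 82 GiB against a 75 GiB limit, so the function reports month 2. A dashboard could turn a non-`None` result into a warning and leave servers that return `None` unflagged.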