Politecnico di Torino

Predicting the pending time of jobs submitted on a High-Performance Computing cluster through a hierarchical classification strategy

Fabio Carfi'


Supervisor: Tania Cerquitelli. Politecnico di Torino, Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2020

Full text: PDF (thesis), 6 MB.
License: Creative Commons Attribution Non-commercial No Derivatives.

Heavy simulations must run on HPC architectures because they need more computing power than a normal PC can provide, yet the resources these systems offer are also limited. It is therefore useful to understand how much time passes between the moment a job is submitted and the moment it starts executing. With this knowledge it would be possible, in the future, to design processes that determine when simulations will actually run, and to use this information to build procedures that optimize resource requests, drastically reducing the time needed to obtain results. The objective of this thesis project, carried out at the Iveco headquarters (Italy) under the supervision of the Politecnico di Torino, is therefore to implement a prediction system able to estimate how long a job will wait before the hardware resources needed for its execution are allocated.

To do this, the available data were first analysed to increase their descriptive capacity and to guide the subsequent steps. Feature engineering techniques were then applied to extract additional information describing the situation under analysis: in particular, variables were created to characterize the state of the HPC cluster at the moment each job was submitted, information that was not contained in the available dataset. Once the necessary information and variables had been obtained, an experimental phase followed in which supervised machine learning algorithms were used to perform various predictions, varying the classes, their number, and the number of models combined to obtain a single result. Initially, multi-class classification experiments were carried out both with equal-frequency intervals and with custom ranges.
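The abstract does not list the engineered variables or the interval boundaries. As a minimal sketch, assuming a job log with submission, start, and end timestamps plus a requested-core count (the `Job` fields, the `cluster_context` features, and the custom thresholds below are hypothetical, not taken from the thesis), the cluster-context features and the two ways of building classes could look like this:

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job-log record; real scheduler logs have many more fields.
    submit: float  # submission timestamp (s)
    start: float   # execution start timestamp (s)
    end: float     # completion timestamp (s)
    cores: int     # requested cores


def cluster_context(jobs, t):
    """Describe the cluster at time t using only the job log (hypothetical features)."""
    pending = [j for j in jobs if j.submit < t < j.start]  # queued, not yet started
    running = [j for j in jobs if j.start <= t < j.end]    # executing at time t
    return {
        "pending_jobs": len(pending),
        "running_jobs": len(running),
        "busy_cores": sum(j.cores for j in running),
    }


def equal_frequency_edges(values, k):
    """Inner boundaries that split the values into k classes of roughly equal size."""
    s = sorted(values)
    return [s[i * len(s) // k] for i in range(1, k)]


def to_class(value, edges):
    """Index of the interval containing value (edges sorted ascending)."""
    return bisect_right(edges, value)


# Target: pending time = start - submit, in seconds.
jobs = [Job(0, 10, 100, 4), Job(5, 50, 60, 8), Job(20, 20, 30, 2)]
waits = [j.start - j.submit for j in jobs]

quantile_edges = equal_frequency_edges(waits, 2)  # data-driven boundaries
custom_edges = [60, 600, 3600]                    # hypothetical: 1 min, 10 min, 1 h
print(cluster_context(jobs, 25))  # {'pending_jobs': 1, 'running_jobs': 2, 'busy_cores': 6}
print([to_class(w, quantile_edges) for w in waits])
print([to_class(w, custom_edges) for w in waits])
```

Each context feature relies only on facts observable at time `t` (whether a job has started or finished yet), and the strict `j.submit < t` keeps the job being predicted out of its own context when `t` is its submission time. Equal-frequency edges balance the classes but follow the data, while custom edges fix boundaries that are meaningful to users at the cost of possibly unbalanced classes.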
When these multi-class tests yielded poor results, binary classifications were analysed, i.e. predictions with only two intervals, to study the behaviour of the algorithms and to find the thresholds that allow appreciable results to be obtained. Building on the knowledge acquired from these two types of experiments, a hierarchical approach was finally tried, in which several models are used in cascade and each model is selected according to the predictions obtained at the previous levels. This solution gave better results than the other tests performed. However, the evaluation metrics obtained for the class predictions show that the system can still be improved. The work will therefore continue with the aim of further improving this system, modifying it and possibly integrating it with other techniques to be analysed in the future.
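One way to picture the cascade is as a tree of binary classifiers, where the class predicted at one level decides which model is consulted next. The sketch below assumes this reading of the abstract; the thresholds, feature names, and the lambda "models" are hypothetical stand-ins for trained binary classifiers, not the models used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Union

Features = Mapping[str, float]


@dataclass
class Node:
    """One level of the cascade: a binary model choosing between two branches."""
    is_below: Callable[[Features], bool]  # True if the wait is predicted below this level's threshold
    below: Union["Node", str]             # next level, or final class label
    above: Union["Node", str]


def predict(node: Union[Node, str], x: Features) -> str:
    # Each level's binary prediction selects the model used at the next level,
    # until a leaf (a class label) is reached.
    while isinstance(node, Node):
        node = node.below if node.is_below(x) else node.above
    return node


# Hypothetical stand-ins for trained binary classifiers.
cascade = Node(
    is_below=lambda x: x["pending_jobs"] < 10,     # level 1: wait < 10 min?
    below=Node(
        is_below=lambda x: x["busy_cores"] < 500,  # level 2: wait < 1 min?
        below="< 1 min",
        above="1-10 min",
    ),
    above=">= 10 min",
)

print(predict(cascade, {"pending_jobs": 3, "busy_cores": 800}))  # 1-10 min
```

In this design each level only has to solve a binary problem at one threshold, which is the setting the binary experiments explored; the trade-off is that a misclassification at an upper level propagates to every level below it.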

Supervisor: Tania Cerquitelli
Academic year: 2020/21
Publication type: Electronic
Number of Pages: 122
Degree programme: Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New organization > Master of Science > LM-32 - Computer Systems Engineering
Collaborating company: IVECO SPA
URI: http://webthesis.biblio.polito.it/id/eprint/15969