
Tracing methodologies and tools for Artificial Intelligence and Data Mining Java applications

Roberto Stagi

Tracing methodologies and tools for Artificial Intelligence and Data Mining Java applications.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2020

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.


Supercomputing and Artificial Intelligence are among the most important developments of recent decades. Both have been behind many recent discoveries and, like most applications in general, they have been moving from a sequential paradigm to parallel and distributed approaches that better fit modern hardware. The High Performance Computing (HPC) discipline is at the heart of these developments. In this context, the Java programming language plays a marginal role. However, Java is still in high demand, is employed in AI, and runs effectively on supercomputers. Even though fewer programmers use it for HPC applications, its influence in the AI world is not negligible, and the tools that support its development in such environments deserve more attention.

Parallel program performance analysis is concerned with achieving efficient utilisation of system resources. One common technique is to collect trace data and then analyse it for possible causes of poor performance. The Performance Tools department of the Barcelona Supercomputing Center (BSC) is in charge of developing tools of this kind. This thesis was developed during an internship in that department, and for this reason the work builds on the two main tools developed there: Extrae and Paraver. The former extracts the trace information, while the latter displays it. The main focus of this thesis is on Extrae.

Extrae's existing instrumentation for Java is limited. Apart from some basic features that trace basic thread events through the instrumentation of pthreads (onto which all Java threads are mapped), it does not give much valuable information. A study of the state of the art is covered in chapter 2. Since Extrae is implemented in C, generating probes and wrappers would not be an issue for other programs implemented in C; for Java programs, which run on the JVM, a different approach is needed. Chapter 3 gives an overview of the approaches that can be used to generate traces for a Java program. The approach developed here is based on an event-driven platform offered by the JVM (the JVM TI), combined with AspectJ, an extension of the Java language that implements the aspect-oriented programming paradigm. The development of this platform follows in chapters 4 and 5, and it is applied to a real Java framework: Hadoop. This study is carried out in chapter 6, which also discusses the work of the thesis as a whole.

Supervisors: Paolo Garza
Academic year: 2019/20
Publication type: Electronic
Number of Pages: 105
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
URI: http://webthesis.biblio.polito.it/id/eprint/15276