AI-based Code Understanding using Large Language Models and a Conversational Chatbot

Mohammadreza Mohammadi

AI-based Code Understanding using Large Language Models and a Conversational Chatbot.

Supervisor: Giuseppe Rizzo. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2023

Abstract:

In the ever-evolving landscape of software development, companies maintain vast code repositories that store the collective knowledge of their projects. These repositories hold the answers to many questions posed by programmers, which are often framed in plain English. This thesis addresses the difficulty of reviewing and understanding code written by peers. The difficulty is most evident when new team members join or when an employee leaves behind code that is hard to understand. A key motivation is that not all codebases have thorough documentation and comprehensive test coverage. In fast-paced, dynamic settings, particularly in smaller companies, delivery is the main concern; consequently, codebases often evolve rapidly without the documentation that would make them easier to understand. The project involves designing a chatbot that can access the code repositories within a company and give clear, human-readable answers to questions about the codebase. The objective is to give programmers a conversational interface that helps them understand complex code structures, functions, and algorithms. To this end, the chatbot uses retrieval-augmented generation, a novel technique that combines information retrieval with natural language generation: it retrieves relevant code snippets and explanations from the repositories and generates comprehensible responses in plain English. The importance of this work lies in its potential to improve the productivity and efficiency of programmers within organizations by reducing the time and effort required to navigate and comprehend complex codebases.
The thesis examines the technical challenges and steps involved in building a chatbot that can handle the diverse queries posed by programmers. The project also includes the development of a graphical interface for accessing and querying company code repositories.
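
The retrieval-augmented generation flow named in the abstract can be illustrated with a minimal sketch. The full text of the thesis is not available, so this is not the author's implementation. The snippet store, file paths, function names, and the bag-of-words cosine scoring are all assumptions chosen for illustration, and the language model call is left out. Only the retrieval step and the prompt assembly are shown.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "repository": path -> source text (illustrative only).
SNIPPETS = {
    "billing/invoice.py": "def compute_invoice_total(items, tax_rate): "
                          "return sum(i.price for i in items) * (1 + tax_rate)",
    "auth/session.py": "def refresh_session_token(user): "
                       "token = issue_token(user); return token",
    "utils/dates.py": "def parse_iso_date(text): return datetime.fromisoformat(text)",
}


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so compute_invoice_total becomes compute, invoice, total.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counter objects).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, snippets, k=1):
    # Rank snippets by similarity between the question and "path + code".
    q = Counter(tokenize(question))
    ranked = sorted(
        snippets.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[0] + " " + item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    # Assemble the text that would be sent to a language model (call omitted).
    context = "\n\n".join(f"# {path}\n{code}" for path, code in retrieved)
    return (
        "Answer using only the code below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "How is the invoice total computed?"
    print(build_prompt(question, retrieve(question, SNIPPETS)))
```

A production system would more likely use learned embeddings and a vector index than raw term counts. The term-count version is used here only because it runs with the standard library alone.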

Supervisors: Giuseppe Rizzo
Academic year: 2023/24
Publication type: Electronic
Number of pages: 44
Additional information: Restricted thesis. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science and Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: KWANTIS SRL
URI: http://webthesis.biblio.polito.it/id/eprint/29589