Roya Esmaeilikorani
Comparative Analysis of Vector Databases for Real-Time Similarity Retrieval: Enhancing Large Language Model Performance.
Supervisor: Giuseppe Rizzo. Politecnico di Torino, Master's degree programme in Data Science and Engineering, 2024
Abstract
With the emergence of AI and big data, the dimensionality of data has increased significantly, necessitating more efficient methods for data storage and retrieval. Traditional databases are not well suited to the high-dimensional, unstructured data typical of modern applications. Vector databases (VecDBs) have become essential for managing vector representations of such data, and they provide an effective solution for tasks that require fast and accurate similarity search. This thesis analyzes VecDBs and their role in improving the performance of Large Language Models (LLMs) through real-time similarity retrieval. We developed a Retrieval-Augmented Generation (RAG) pipeline that integrates LLMs with VecDBs to answer queries about external documents, specifically PDFs.
Our RAG pipeline augments LLMs with external knowledge stored in VecDBs, addressing issues such as outdated knowledge and hallucinations that are often found in LLMs.
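The retrieval step of a RAG pipeline like the one described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, the in-memory list stands in for a vector database, and all function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy bag-of-words embedding (assumption: a real pipeline would use a learned model)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, vocabulary, k=2):
    """Return the k chunks most similar to the query (brute-force scan, no index)."""
    query_vec = embed(query, vocabulary)
    scored = [(cosine_similarity(query_vec, embed(c, vocabulary)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query, context_chunks):
    """Assemble the augmented prompt that would be sent to an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical document chunks, e.g. passages extracted from a PDF.
chunks = [
    "vector databases index embeddings for similarity search",
    "relational databases store rows in tables",
    "language models can hallucinate facts",
]
vocabulary = sorted({word for c in chunks for word in c.lower().split()})

query = "similarity search over embeddings"
top_chunks = retrieve(query, chunks, vocabulary, k=1)
prompt = build_prompt(query, top_chunks)
```

A dedicated vector database would replace the brute-force scan in `retrieve` with an index built for fast nearest-neighbor search, which is what makes real-time retrieval feasible over large collections.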
