Data Security and Privacy Concerns for Generative AI Platforms

Aurora Tomassi

Supervisor: Fulvio Valenza. Politecnico di Torino, Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2024

PDF (Tesi_di_laurea) - Thesis
Licence: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

This thesis examines data security and privacy in Generative AI (GenAI) platforms, a landscape that is changing quickly yet is already mature enough to raise many privacy and security challenges. While advanced GenAI technologies allow increasingly sophisticated generation of text, images, music, and practically every other type of data, there is growing concern about how to protect personal or confidential information. As these platforms are adopted more broadly, it becomes increasingly important to consider how data are used, stored, and protected, especially given the size of the datasets needed to train these models. This research aims to provide an in-depth analysis of the intersection between data protection and the deployment of Generative AI technology. The thesis is organized into several chapters, each focusing on a different aspect of this relationship. The first chapters take a “macro” view of how Generative AI has evolved into what we see today, covering the key players and the factors contributing to GenAI's rapid growth. After this contextual introduction, the thesis focuses on the security and privacy threats of Generative AI. These include AI model biases, the spread of deepfakes, potential cyber-attacks, and the complex tangle of intellectual property rights. Each threat is analyzed in turn, with an emphasis on how it affects the individuals and organizations that depend on GenAI platforms. The thesis also discusses the wider societal implications of these risks and the ethical dilemmas they raise. Against this background, the thesis sets out various ways in which the risks posed by Generative AI could be minimized.
Among these, it takes an in-depth look at techniques such as data anonymization, tokenization, and encryption, which can help ensure the security and privacy of the data handled by AI systems. These techniques are critically assessed for effectiveness and practicability, to offer stakeholders a balanced view of the possible solutions. A significant part of the thesis describes a case study that illustrates how these strategies are applied: an experimental project on data anonymization as a practical way of enhancing the privacy of Generative AI applications. The case study puts into practice the theoretical notions presented in the earlier parts of the thesis and provides empirical evidence of the advantages and issues involved in adopting these privacy-enhancing techniques in real-world applications. The open-source tool adopted in this work is Microsoft Presidio, applied to an open-source LLM to test its capabilities and performance in anonymizing a set of input texts. All the presented studies were performed by implementing LangChain pipelines. Finally, the thesis summarises its findings and indicates that more research is needed on how private data can be protected in Generative AI development; how Generative AI itself might help in developing such privacy algorithms is not yet clear. It highlights the need for continued diligence and creativity in combating increasingly dangerous uses of this powerful technology.
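
As a rough illustration of the anonymization idea mentioned in the abstract, the Python sketch below replaces detected personal data with typed placeholders before a text would be passed to an LLM. It is not the thesis's implementation and does not use Microsoft Presidio or LangChain: the regular expressions, the `anonymize` function, the placeholder labels, and the sample text are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs far broader coverage than two regular expressions.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> str:
    """Replace each match of a known pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Made-up sample input; the anonymized text is what would reach the model.
sample = "Contact Jane at jane.doe@example.com or +39 011 555 0123."
print(anonymize(sample))
# prints: Contact Jane at <EMAIL_ADDRESS> or <PHONE_NUMBER>.
```

Note that the name "Jane" passes through unchanged: pattern matching alone cannot recognise personal names, which is one reason a dedicated tool with named-entity recognition is typically preferred over hand-written rules.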

Supervisors: Fulvio Valenza
Academic year: 2024/25
Publication type: Electronic
Number of pages: 140
Subjects:
Degree programme: Master's degree programme in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Accenture
URI: http://webthesis.biblio.polito.it/id/eprint/33202