
Cluster Analysis of Financial Transaction Data

Jacopo De Cristofaro

Cluster Analysis of Financial Transaction Data.

Supervisors: Elena Maria Baralis, Flavio Giobergia. Politecnico di Torino, Master's degree programme in Ingegneria Informatica (Computer Engineering), 2023

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Banks, as pillars of every international financial system, have a duty to detect suspicious money movements related to criminal activities such as fraud, terrorism financing, or money laundering, in order to protect their clients and countries from significant losses. Past techniques based on hand-written rules are no longer effective: increasing digitization has made it easier for criminals to evade rule-based systems and has significantly increased the speed and volume of transactions to be analyzed, rendering those systems obsolete. More reliable outlier detection systems must be built, based on big data and artificial intelligence techniques, to label suspicious transactions and provide useful insights to the human operator, who is responsible for conducting the investigations needed to determine the real nature of the flagged money exchange. The goal of this thesis is to design and implement a clustering-based pipeline that is part of a larger architecture for detecting anomalies in a dataset of pass-through transactions. Specifically, while the other, already implemented pipelines directly detect anomalies at different levels of granularity (transaction level or user level) using unsupervised algorithms, the pipeline presented here focuses primarily on enriching the information in the final user reports. For this purpose, the transactions are first aggregated at various levels of granularity (users, banks, or countries) through a feature engineering process guided by domain experts. Subsequently, clustering algorithms are used to detect actors with similar behavior in the chosen feature space.
A multidimensional feature space does not allow for easy interpretation of the clustering result, so a continuous feature quantization step followed by a frequent itemset extraction step generates descriptors that better highlight the common structures of the entities within each cluster. Finally, an aggregate result analysis step produces a small number of clusters that can be studied and labeled by a domain expert to facilitate further investigation of the reported anomalous users. In addition, to enhance the second pipeline, a new outlier detection model based on the clustering output is presented. The methodologies used to implement the entire pipeline are described in detail, along with the experimental results obtained.
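
As a rough illustration of the descriptor-generation idea summarized in the abstract (aggregate transactions per user, quantize the continuous features, then extract frequent itemsets within each cluster), the following minimal Python sketch may help. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy transactions, the feature names `avg_amount` and `cross_border_share`, the bin edges, the support threshold, and the cluster labels, which are simply given here instead of being computed by a clustering algorithm. The itemset search is a brute-force enumeration, not the specific algorithm the thesis uses.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy transactions: (sender, amount, is_cross_border).
TRANSACTIONS = [
    ("u1", 100.0, False), ("u1", 120.0, False),
    ("u2", 90.0, False), ("u2", 110.0, False),
    ("u3", 5000.0, True), ("u3", 7000.0, True),
    ("u4", 6500.0, True),
]

# Assumed cluster labels; in a real pipeline a clustering step would produce these.
CLUSTER_LABELS = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}

# Assumed bin edges per feature: values <= first edge are "low",
# values <= second edge are "mid", everything else is "high".
BIN_EDGES = {"avg_amount": (500.0, 2000.0), "cross_border_share": (0.25, 0.75)}


def aggregate_by_user(transactions):
    """Aggregate raw transactions into per-user features (user-level granularity)."""
    grouped = defaultdict(list)
    for sender, amount, cross_border in transactions:
        grouped[sender].append((amount, cross_border))
    features = {}
    for user, rows in grouped.items():
        amounts = [amount for amount, _ in rows]
        features[user] = {
            "avg_amount": sum(amounts) / len(amounts),
            "cross_border_share": sum(cross for _, cross in rows) / len(rows),
        }
    return features


def quantize(value, edges, labels=("low", "mid", "high")):
    """Map a continuous value to a categorical bin label."""
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def to_items(feature_row, bin_edges):
    """Turn one entity's features into a set of 'feature=bin' items."""
    return frozenset(
        f"{name}={quantize(value, bin_edges[name])}"
        for name, value in feature_row.items()
    )


def frequent_itemsets(item_rows, min_support, max_size=2):
    """Brute-force search for itemsets whose support is at least min_support."""
    counts = Counter()
    for items in item_rows:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    total = len(item_rows)
    return {c: n / total for c, n in counts.items() if n / total >= min_support}


def describe_clusters(features, labels, bin_edges, min_support=0.8):
    """Return, for each cluster, the frequent itemsets that describe its members."""
    by_cluster = defaultdict(list)
    for user, row in features.items():
        by_cluster[labels[user]].append(to_items(row, bin_edges))
    return {cid: frequent_itemsets(rows, min_support) for cid, rows in by_cluster.items()}


features = aggregate_by_user(TRANSACTIONS)
descriptors = describe_clusters(features, CLUSTER_LABELS, BIN_EDGES)
for cluster_id, itemsets in sorted(descriptors.items()):
    print(cluster_id, itemsets)
```

In this toy setup each cluster ends up described by a short list of categorical items with their support, which is the kind of human-readable summary a domain expert could inspect and label. A production pipeline would likely replace the brute-force enumeration with a dedicated frequent itemset algorithm and derive the bin edges from the data.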

Supervisors: Elena Maria Baralis, Flavio Giobergia
Academic year: 2023/24
Publication type: Electronic
Number of pages: 85
Subjects:
Degree course: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New organization > Master's degree > LM-32 - COMPUTER ENGINEERING
Collaborating companies: NOT SPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/29504