Alessandro Rigoli
Legal Entity Disambiguation For Financial Crime Forensics.
Supervisors: Danilo Giordano, Elena Maria Baralis, Jacopo Fior. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2022
Abstract: |
One of the most time-consuming tasks in financial crime forensics is transaction monitoring, which involves reviewing millions of transactions each month for signs of criminal activity. A transaction in this context is a movement of money between an originator, i.e., an individual or a company (in both cases referred to here as a legal entity) that sends the money, and a beneficiary, i.e., a legal entity that receives the money, both of which are associated with an account at the respective financial institution. However, the information collected by financial institutions is often noisy due to the loose syntax of the protocols and the free-text fields (e.g., payment description, name of the originator and beneficiary, etc.). In addition, the account identifier, which is often an International Bank Account Number (IBAN), cannot be uniquely linked to an entity or act as its identifier, since many entities may use the same account and the same entity may have different accounts. These data quality issues prevent easy linkage between transactions and the real-world entities that performed them, which is essential for transaction monitoring algorithms to detect suspicious activities. The process therefore needs a step that disambiguates the different representations of the same entity. In this thesis, after presenting a review of the main solutions in the literature, I propose a data-driven solution, formulated first as an unsupervised algorithm and then as a semi-supervised algorithm. This solution aims to identify all entities associated with the same account, a process called intra-account entity disambiguation, and then to identify entities associated with different accounts, a process called inter-account entity disambiguation.
Since solutions of this type depend heavily on the type of data being handled, I formalize a new domain-specific metric for measuring the similarity between different names of the same entity and test it against the main state-of-the-art metrics. Considering performance and computation time, I determine the feasibility of using these methods in real-world scenarios. I also analyze the behavior of the proposed algorithms in depth using various tests on real-world datasets. As a result, I obtain two highly parallelizable algorithms tuned for real-world data that can be used as preprocessing tasks for transaction monitoring, with encouraging accuracy and the ability to scale to very large datasets. |
---|---|
Supervisors: | Danilo Giordano, Elena Maria Baralis, Jacopo Fior |
Academic year: | 2022/23 |
Publication type: | Electronic |
Number of pages: | 86 |
Additional information: | Confidential thesis. Full text not available |
Subjects: | |
Degree programme: | Master's degree programme in Computer Engineering (Ingegneria Informatica) |
Degree class: | New system > Master's degree > LM-32 - COMPUTER ENGINEERING |
Collaborating companies: | NOT SPECIFIED |
URI: | http://webthesis.biblio.polito.it/id/eprint/25596 |
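
The two stages named in the abstract (intra-account, then inter-account disambiguation) can be illustrated with a minimal sketch. The thesis text is not available, so nothing below reproduces its algorithms or its domain-specific metric: the generic `difflib` string ratio, the greedy clustering, the `0.8` threshold, the helper names, and the sample records are all assumptions chosen only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace (a stand-in for real name cleaning)."""
    return " ".join(name.lower().split())


def similar(a, b, threshold=0.8):
    """Generic string similarity from difflib, NOT the thesis's metric."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def cluster_names(names, threshold=0.8):
    """Greedy clustering: join the first cluster whose first name matches."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def intra_account(records, threshold=0.8):
    """Stage 1: group name variants seen on the same account."""
    by_account = defaultdict(list)
    for account, name in records:
        by_account[account].append(name)
    return {acc: cluster_names(names, threshold)
            for acc, names in by_account.items()}


def inter_account(per_account, threshold=0.8):
    """Stage 2: merge per-account clusters that look like the same entity."""
    entities = []
    for account, clusters in per_account.items():
        for cluster in clusters:
            for entity in entities:
                if similar(cluster[0], entity["names"][0], threshold):
                    entity["accounts"].add(account)
                    entity["names"].extend(cluster)
                    break
            else:
                entities.append({"accounts": {account}, "names": list(cluster)})
    return entities


# Hypothetical (account, free-text name) pairs; the account IDs are placeholders.
records = [
    ("IT01", "Mario Rossi"),
    ("IT01", "MARIO  ROSSI"),
    ("IT01", "Acme Srl"),
    ("IT02", "mario rossi"),
    ("IT02", "Globex SpA"),
]
per_account = intra_account(records)
entities = inter_account(per_account)
```

In this toy input, account `IT01` is shared by two entities and one entity appears on both accounts, which is the situation the abstract gives as the reason an account identifier cannot serve as an entity identifier. Because stage 1 treats each account independently, it can be run in parallel per account.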