Alessandro Nori
Company entities matching framework powered by machine learning.
Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2020
Abstract
Data matching is an essential process for any enterprise that constantly acquires new data from different systems, both structured and unstructured. It is typically used to remove duplicates from a database, or to avoid creating accounts that already exist when the two databases share no common key. Because the data comes from different sources, an extensive data cleaning and standardization step is needed so that similarity measures between records better reflect reality. It is also important to apply input reduction techniques, such as blocking predicates, to limit the number of record comparisons, which would otherwise be extremely large.
The total number of record pairs in a database grows with the square of its size: a source of 100 thousand records yields on the order of 10 billion possible pairs.
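To make the quadratic growth and the effect of blocking concrete, the following is a minimal Python sketch, not the framework described in the thesis. The suffix list, the three-character blocking key, the function names, and the sample company names are all hypothetical choices made for illustration. Note that the 10 billion figure corresponds to n² for n = 100,000; counting each unordered pair of distinct records once gives n(n-1)/2, roughly half of that.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative (assumed) list of legal-form suffixes to drop during cleaning.
LEGAL_SUFFIXES = {"inc", "ltd", "spa", "srl", "corp"}


def total_pairs(n: int) -> int:
    """Number of unordered pairs of distinct records among n records."""
    return n * (n - 1) // 2


def standardize(name: str) -> str:
    """Lowercase, remove punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def blocking_key(name: str) -> str:
    """Hypothetical blocking predicate: first three characters of the cleaned name."""
    return standardize(name)[:3]


def candidate_pairs(records: list[str]) -> set[tuple[int, int]]:
    """Return index pairs of records that share a blocking key."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for idx, name in enumerate(records):
        blocks[blocking_key(name)].append(idx)
    pairs: set[tuple[int, int]] = set()
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.update(combinations(ids, 2))
    return pairs


# Made-up sample data for demonstration only.
records = ["Acme S.p.A.", "ACME spa", "Acme Corp.", "Globex Ltd", "Globex Inc.", "Initech"]

if __name__ == "__main__":
    print("n^2 for 100,000 records:", 100_000 ** 2)
    print("unordered pairs for 100,000 records:", total_pairs(100_000))
    print("all pairs in sample:", total_pairs(len(records)))
    print("pairs after blocking:", len(candidate_pairs(records)))
```

On the six sample names, blocking leaves 4 candidate pairs out of 15 possible ones. A real blocking predicate trades this reduction against the risk of placing true matches in different blocks, so the key would need to be chosen and evaluated on actual data.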
