Correcting Geographic Data in Amazon’s EU and NA Networks: A Scalable Ad-hoc Regression-Based Solution

Marcelo Bastos Lopes Ferreira

Supervisors: Guido Perboli, Filippo Velardocchia. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2024

Abstract:

Amazon's logistics network in Europe (EU) and North America (NA) suffers from geographic data inaccuracies that affect operations. This thesis presents a new algorithm to correct these errors, improving the internal graph representation of Amazon's network, which is crucial for internal optimization models. Amazon started as an online bookstore and, through innovation and reinvestment, evolved into a global leader in e-commerce, cloud services, and AI. Its efficient supply chain ensures quick deliveries.
The EU Supply Chain Science (EU SCS) team optimizes Amazon's supply chain, using machine learning and optimization models to enhance operational efficiency. Within EU SCS, the True North (TN) project creates an end-to-end Sales and Operations Planning plan that optimizes the speed and cost of delivery, using forecasted demand at ZIP code granularity to plan workforce allocation and transportation connectivity. Within TN, Polaris solves the multicommodity network flow problem, minimizing costs and delivery times given the demand at sink nodes and the available inventory at source nodes. Polaris needs accurate data for reliable decision-making and for optimizing the logistics network: errors in geographic data lead to inefficient routing and higher costs.
Amazon Location Service offers features similar to Google Maps but is too slow for large-scale operations like True North, whereas machine learning provides the needed scalability and accuracy. This thesis therefore uses machine learning to identify and correct the geographic data errors that affect Polaris's final solution. The Great Circle Distance computed between nodes is used to estimate routing distances, and location coordinates are corrected by comparing them with public postal code data in the EU and NA. In the EU, a probabilistic analysis further refines the coordinate correction.
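As an illustration of the Great Circle Distance mentioned above, the following is a minimal sketch using the haversine formula. The abstract does not say which formula or Earth radius the thesis uses, so both are assumptions here. The helper `is_suspect` and its 25 km threshold are hypothetical and only show how a stored node coordinate might be compared against a public postal-code centroid.

```python
import math

# Mean Earth radius in km (assumed; the thesis does not state the value it uses).
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding pushing the value out of asin's domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_suspect(node_latlon, postal_latlon, threshold_km=25.0):
    """Hypothetical check: flag a node whose stored coordinates lie farther
    than threshold_km from the public centroid of its postal code."""
    return great_circle_km(*node_latlon, *postal_latlon) > threshold_km
```

For example, `is_suspect((45.0703, 7.6869), (49.6116, 6.1319))` flags a node recorded near Turin whose postal code resolves to Luxembourg. How the thesis actually decides which coordinates to correct is not described in the abstract.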
The Residual Augmented Regression (RAR) algorithm developed to estimate routing distances operates in two steps: an initial prediction with a basic linear regression model, followed by residual modeling with a continuous function. The residuals are expressed through a purpose-built metric, the Signed Inference Relative Error (SIRE), and the continuous function is fitted to the moving average of SIRE. RAR consistently outperformed the other models in accuracy for both EU and NA, while training as fast as Ridge regression, which has a closed-form solution. Because the augmentation yields continuous and interpretable equations for inference, it produces an equation that accurately estimates the distance between two coordinates in both EU and NA. RAR may also be adaptable to broader applications as an augmentation for other regression algorithms. After the errors were corrected, Polaris's final solution showed improved accuracy and cost savings. Compared against historical data on actual delivery flows, the error metrics for fulfillment center (FC) to delivery station flows improved by 6.39% for MSE and 2.41% for MAE, and for FC to country flows by 13.18% for MSE and 5.82% for MAE. In the EU, in the recommendation generated by True North, the correction algorithm allowed 14,916 more units of demand to be delivered, reduced weekly costs by around $60,763.33, and corrected a cost underestimation of $132,270.95. In NA, it reduced weekly costs by approximately $492,412.59 and corrected a cost overestimation of $7,531,460.
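The two-step structure of RAR can be sketched as follows. This is an illustrative reading of the abstract, not the thesis implementation, and it rests on several assumptions: the only feature is the great-circle distance, SIRE is taken to be (prediction - actual) / actual, the continuous function is a cubic polynomial in the rescaled distance, and the moving-average window is 25 points. All function and parameter names are hypothetical.

```python
from statistics import fmean


def _ols_line(x, y):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def _moving_average(values, window):
    """Simple moving average over every full window of consecutive values."""
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def _polyfit(t, s, degree):
    """Least-squares polynomial fit via the normal equations, solved by
    Gaussian elimination; coefficients are returned lowest power first."""
    n = degree + 1
    m = [[sum(ti ** (i + j) for ti in t) for j in range(n)]
         + [sum(si * ti ** i for ti, si in zip(t, s))] for i in range(n)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * coef[c] for c in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def _rescale(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1] to keep the polynomial fit well conditioned."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def fit_rar(gcd_km, route_km, window=25, degree=3):
    """Step 1: linear fit of routing distance on great-circle distance.
    Step 2: fit a polynomial to the moving average of SIRE (assumed definition)."""
    if len(gcd_km) < window:
        raise ValueError("need at least `window` samples")
    a, b = _ols_line(gcd_km, route_km)
    pairs = sorted(zip(gcd_km, route_km))
    xs = [x for x, _ in pairs]
    sire = [((a * x + b) - y) / y for x, y in pairs]  # signed relative error
    lo, hi = xs[0], xs[-1]
    t_ma = [_rescale(x, lo, hi) for x in _moving_average(xs, window)]
    coef = _polyfit(t_ma, _moving_average(sire, window), degree)
    return {"a": a, "b": b, "lo": lo, "hi": hi, "coef": coef}


def predict_rar(model, gcd_km):
    """Undo the modeled relative error: actual ~= base / (1 + SIRE)."""
    base = model["a"] * gcd_km + model["b"]
    t = _rescale(gcd_km, model["lo"], model["hi"])
    sire_hat = sum(c * t ** k for k, c in enumerate(model["coef"]))
    return base / (1.0 + sire_hat)
```

Under these assumptions the fitted model reduces to a single closed-form expression, the linear prediction divided by one plus a polynomial, which is consistent with the abstract's description of a continuous and interpretable equation. The functional form actually chosen in the thesis may differ.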

Supervisors: Guido Perboli, Filippo Velardocchia
Academic year: 2023/24
Publication type: Electronic
Number of pages: 142
Additional information: Thesis under embargo. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - Computer Engineering
Collaborating companies: Amazon (Luxembourg)
URI: http://webthesis.biblio.polito.it/id/eprint/31771