Politecnico di Torino (logo)

Scaling Approximate Linear Programming: Langevin Based High-Dimensional Sampling for Constraint Violation Learning

Virginia Marcante

Scaling Approximate Linear Programming: Langevin Based High-Dimensional Sampling for Constraint Violation Learning.

Rel. Sandra Pieraccini. Politecnico di Torino, Corso di laurea magistrale in Data Science And Engineering, 2023

PDF (Tesi_di_laurea) - Tesi
Licenza: Creative Commons Attribution Non-commercial No Derivatives.

Download (1MB) | Preview

Approximate Linear Programs (ALPs) are fundamental models for computing value function approximations and bounds for high-dimensional Markov decision processes (MDPs) arising in a wide range of applications. ALP has a manageable number of variables and a large number of constraints. Constraint generation and sampling are two traditional approaches to handle the numerous constraints. The former approach has limited applicability. The latter approach, while broadly applicable, does not guarantee a valid bound on the optimal policy value. Constraint violation learning is a recent approach for solving ALP that combines first-order methods with Metropolis Hastings sampling to provide a general purpose approach that retains the bounding property of ALP and has convergence guarantees. Its use of Metropolis Hastings however limits its scalability. This thesis introduces a novel adaptation of constraint violation learning based on Langevin Dynamics (LD) that allows us to scale the solution of ALPs to higher dimensional applications. The primary contribution of the thesis is a comprehensive examination of LD for constraint violation learning. This is a challenging setting for LD because sampling distributions are not log-concave in general, which is when LD has well understood theoretical properties. Nevertheless, we find that a log-concave regularization term used within constraint violation learning plays an important role in ensuring the effectiveness of LD. We argue that the core computational cost of LD is dimension independent. Through a comprehensive numerical study on a perishable inventory control application, this thesis demonstrates that LD maintains consistent running times even as MDP dimension increase, making it a scalable option for solving ALP. In contrast, Metropolis Hastings does not scale as effectively, as expected. This research extends the scalability of ALP using an LD based sampling approach in constraint violation learning. The exploration of LD for solving ALP is novel, thus bringing together tools from machine learning and operations research. We are also not aware of the use of LD to address the large scale nature of mathematical programs (e.g., linear programs). This thesis takes a first step at exploring this interface and its findings motivate the exploration of LD for large-scale optimization beyond ALP.

Relators: Sandra Pieraccini
Academic year: 2023/24
Publication type: Electronic
Number of Pages: 73
Corso di laurea: Corso di laurea magistrale in Data Science And Engineering
Classe di laurea: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Aziende collaboratrici: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/29602
Modify record (reserved for operators) Modify record (reserved for operators)