
Simran Singh
Model Playground: An Automated, On-Demand Platform for Interactive LLM Experimentation on AWS.
Rel. Marco Torchiano. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2025
Abstract:

The evolution of Large Language Models (LLMs) has driven significant progress in natural language understanding and generation. However, deploying and experimenting with these models in production-like environments, such as those supporting Amazon Alexa, remains operationally complex. Machine learning (ML) scientists need isolated, production-grade “sandbox” environments in which they can safely experiment with LLMs and adjust inference parameters (e.g., temperature, top-k) without impacting end users. These parameters are critical controls that influence output quality, creativity, and contextual relevance. Traditional approaches often rely on manual provisioning of GPU instances and static configuration of generation parameters, leading to high engineering overhead, slow iteration cycles, limited scalability, inefficient resource utilization, and restricted flexibility in real-time parameter tuning.

This thesis introduces Model Playground, an automated, self-service sandboxing system designed to address these challenges and streamline LLM experimentation on Amazon Web Services (AWS) infrastructure. The system combines AWS Batch for dynamic orchestration, Neuron-optimized EC2 instances for cost-efficient inference, NVIDIA Triton Inference Server for high-performance model serving within Docker containers, and AWS Lambda as a lightweight, serverless interface through which users interact with the model. Together, these components create an on-demand, resource-efficient platform that supports rapid iteration, reduces dependence on engineering support, and improves research agility. Model Playground empowers ML scientists to independently test and iterate on large language models: an AWS Lambda function connects via gRPC directly to the inference service exposed on the model-hosting EC2 instance, enabling real-time adjustment of generation parameters without service restarts or manual intervention.

Evaluation demonstrates that Model Playground significantly reduces operational burden and resource waste through automated provisioning and de-provisioning of compute resources via AWS Batch. By decoupling parameter tuning from deployment workflows, the platform accelerates LLM innovation cycles and enhances experimentation flexibility, ultimately removing engineering bottlenecks and giving scientists direct control over the deployment process.
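The per-request parameter tuning described above can be sketched as a Lambda-style handler that merges user-supplied overrides (temperature, top-k) with safe defaults before forwarding them to the inference backend. This is a minimal illustration, not the thesis's actual implementation: the handler shape, default values, and bounds are assumptions, and the gRPC call to the Triton endpoint on the model-hosting EC2 instance is deliberately left out so the merging logic stands alone.

```python
import json

# Illustrative default generation parameters (not taken from the thesis).
DEFAULTS = {"temperature": 0.7, "top_k": 50, "max_tokens": 256}

# Illustrative safe ranges, so a bad override cannot destabilize the sandbox.
BOUNDS = {"temperature": (0.0, 2.0), "top_k": (1, 500), "max_tokens": (1, 4096)}

def merge_params(overrides):
    """Merge user overrides into the defaults, clamping each to its bounds."""
    params = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            continue  # silently ignore unknown knobs
        lo, hi = BOUNDS[key]
        params[key] = min(max(value, lo), hi)
    return params

def handler(event, context=None):
    """Hypothetical AWS Lambda entry point.

    In the deployed system a gRPC client would forward the resolved
    parameters to the Triton Inference Server on the model-hosting EC2
    instance; here we simply return the resolved request so the
    parameter-merging step is self-contained and testable.
    """
    body = json.loads(event.get("body", "{}"))
    params = merge_params(body.get("parameters"))
    return {
        "statusCode": 200,
        "body": json.dumps({"prompt": body.get("prompt", ""),
                            "parameters": params}),
    }
```

Because the parameters travel with each request rather than living in the server's static configuration, a scientist can change temperature or top-k between calls without redeploying or restarting the serving stack, which is the decoupling the evaluation highlights.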
Supervisor: Marco Torchiano
Academic year: 2024/25
Publication type: Electronic
Number of pages: 161
Additional information: Confidential thesis. Full text not available
Subjects:
Degree programme: Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering)
Degree class: New system > Master's degree > LM-32 - INGEGNERIA INFORMATICA (Computer Engineering)
Collaborating companies: Politecnico di Torino
URI: http://webthesis.biblio.polito.it/id/eprint/36447