
Business Continuity in Kubernetes Multi-Cluster Environments

Francesco Torta

Business Continuity in Kubernetes Multi-Cluster Environments.

Supervisor: Fulvio Giovanni Ottavio Risso. Politecnico di Torino, Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica), 2023

License: Creative Commons Attribution Non-commercial No Derivatives.


Over the past two decades, cloud computing has become a disruptive technology that has revolutionized the way businesses and individuals access and use computing resources. Its growth has been driven by the increasing demand for low-cost, scalable, and easily accessible computing resources. Cloud-native applications are software applications specifically designed to run on a cloud computing infrastructure. They are built using cloud computing principles and technologies such as microservices architecture, containerization, and orchestration. "Liquid computing" is a term used to describe the ability to dynamically allocate resources as needed, allowing for quick and easy scaling up or down. This "liquid" nature of cloud computing allows organizations to become more agile and respond quickly to changes in their business needs. Liqo is an open source project started at Politecnico di Torino that supports this concept and enables the creation of multi-cluster topologies within Kubernetes. Multiple independent clusters can be interconnected to share resources and workloads while being managed as a single entity. The goal of this thesis is to investigate how to ensure business continuity in a multi-cluster environment powered by Liqo, enabling an organization to maintain its critical business functions and processes in the event of a disruption. Given the increased complexity of a multi-cluster topology, various failure scenarios are analyzed, taking into account all key elements of the architecture of a Kubernetes cluster. The thesis presents the design and implementation of the ShadowEndpointSlice, an abstraction that transparently guarantees service continuity even if some parts of the multi-cluster infrastructure are out of service. It also presents the implementation of a custom controller that ensures that the expected workload is running on the "big" cluster when some worker nodes are not functioning properly. Lastly, it describes a possible disaster recovery solution that leverages Liqo to easily use peered clusters as failover sites. Part of the work of this thesis has been integrated into the Liqo project and is available on the official GitHub repository.

Supervisor: Fulvio Giovanni Ottavio Risso
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 86
Degree programme: Master's degree in Computer Engineering (Corso di laurea magistrale in Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Aruba Software Factory SRL
URI: http://webthesis.biblio.polito.it/id/eprint/26641