Mirco Barone
Toward High-Speed Tunneling Technologies: A New WireGuard Parallel Architecture for Linear Throughput Scaling.
Rel. Fulvio Giovanni Ottavio Risso. Politecnico di Torino, Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering), 2024
PDF (Tesi_di_laurea): thesis full text, 2 MB. License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:
Multi-cloud solutions offer a potential avenue for reducing the dominance of current hyperscalers. In this context, multiple clusters can be established across different providers or, alternatively, in various geographical locations, and appropriately interconnected to enable interoperability. However, existing solutions aimed at achieving this goal (such as Submariner.io and Liqo.io) yield unsatisfactory results in terms of network bandwidth. For instance, when technologies such as WireGuard are employed, the maximum speed between two clusters is on the order of a few gigabits per second. WireGuard, one of the most commonly used tunneling technologies in Linux, is known for its simplicity and excellent integration with the Linux kernel. Despite its widespread adoption, it struggles to provide high-speed connectivity between two sites when using a standard single-tunnel configuration. This poses a significant limitation when a secure, high-speed interconnection is required. Indeed, the ability to scale WireGuard performance with the number of available CPU cores is somewhat limited, even though its software architecture is intrinsically parallel.

The primary aim of this thesis is to investigate the main features of WireGuard and the state of the art of the solutions developed to enhance its throughput. Secondly, it aims to identify existing limitations and propose an improved architecture that scales effectively, achieving a nearly linear throughput increase with the number of involved CPU cores. The thesis examines the architecture of a single-tunnel setup, highlighting how, despite its ability to parallelize the encryption and decryption stages, the presence of serial per-tunnel stages still imposes a limit on the use of additional resources. It then shifts focus to a multi-tunnel architecture. However, the analysis reveals that merely leveraging multiple tunnels can result in no scaling at all, due to a subtle “black hole” condition related to the NAPI poll functions when using the standard softirq-based NAPI. This limitation can be overcome by enabling threaded NAPI on the WireGuard interfaces. However, despite being able to leverage all the resources of our nodes, this approach still shows a far-from-linear performance improvement as the number of allocated cores increases.

To further enhance performance, a modified architecture is proposed. This architecture handles all WireGuard stages inline for each flow, within a single processing context on a single core, eliminating the costs of task and cache synchronization. This improved architecture, tailored for multi-tunnel support, demonstrates almost a 2x performance improvement over a multi-tunnel deployment based on the vanilla WireGuard implementation, and is capable of supporting 18x the throughput of a single-tunnel setup on our machines. This approach is not a one-size-fits-all solution: its main limitation currently lies in the inability to parallelize the encryption/decryption stages of a single flow, which could penalize elephant flows. However, it provides an interesting starting point for further discussion and represents a first step towards a more scalable WireGuard architecture.
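The abstract notes that the multi-tunnel “black hole” can be avoided by enabling threaded NAPI on the WireGuard interfaces. As a minimal illustration (not code from the thesis), the sketch below switches a set of interfaces to threaded NAPI through the standard per-device sysfs attribute /sys/class/net/&lt;iface&gt;/threaded, available in recent Linux kernels (added around 5.12); the wg0, wg1, ... interface names passed on the command line are hypothetical examples.

```python
# Sketch: enable threaded NAPI on one or more network interfaces by writing "1"
# to /sys/class/net/<iface>/threaded (requires root and a recent kernel).
# Interface names are hypothetical; writing "0" restores softirq-based NAPI.
import sys
from pathlib import Path


def enable_threaded_napi(iface: str) -> None:
    """Switch the given interface from softirq-based NAPI to threaded NAPI."""
    attr = Path("/sys/class/net") / iface / "threaded"
    if not attr.exists():
        raise FileNotFoundError(
            f"{attr} not found: interface missing or kernel lacks threaded NAPI"
        )
    attr.write_text("1")


if __name__ == "__main__":
    # Example usage: python3 enable_threaded_napi.py wg0 wg1 wg2 wg3
    for iface in sys.argv[1:]:
        enable_threaded_napi(iface)
        print(f"threaded NAPI enabled on {iface}")
```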
| Relators | Fulvio Giovanni Ottavio Risso |
| --- | --- |
| Academic year | 2023/24 |
| Publication type | Electronic |
| Number of pages | 76 |
| Subjects | |
| Degree programme | Corso di laurea magistrale in Ingegneria Informatica (Computer Engineering) |
| Degree class | New organization > Master of Science > LM-32 - Computer Systems Engineering |
| Collaborating companies | UNSPECIFIED |
| URI | http://webthesis.biblio.polito.it/id/eprint/32364 |