The Shift to Rust in Enterprise Kernels: Progress and Challenges
As of October 10, 2024, following the recent Maintainers Summit, it had been agreed that the experiment of incorporating Rust into kernel development would continue, clearing the way for more Rust code to be merged. That high-level agreement leaves practical problems unresolved, as a recent discussion on the nouveau mailing list showed. The thread matters because it concerns replacing an established driver with a more modern one.
The Players
The nouveau driver has long been the main open-source driver for NVIDIA GPUs. It was originally developed through reverse engineering because NVIDIA offered no support, and it was merged into the mainline kernel in 2009 for the 2.6.33 release. The situation has since changed: NVIDIA has taken a more open stance toward free software and now actively supports nouveau's development. That cooperation coincides with Ben Skeggs, a longtime nouveau developer, joining NVIDIA.
Although nouveau has served well, it faces difficulties with newer GPUs, which has led to the Nova project. Nova aims to develop a new driver for NVIDIA GPUs, written in Rust. The case for Rust rests on its ability to interface safely with GPU firmware whose interface changes often, although Nova is acknowledged not to be ready for mainstream use yet.
A New Subsystem and Its Implications
Meanwhile, NVIDIA wants its GPUs to be usable in cloud environments through a new 'vGPU' feature. Announced recently by Zhi Wang, it gives virtual machines access to NVIDIA GPUs, which matters given the rising demand for cloud-based services that run large language models. vGPU builds on the existing Virtual Function I/O (VFIO) subsystem, which uses the IOMMU to provide secure device access, but it also relies on nouveau. That dependency drew criticism from Nova developers, who see it as duplicated effort given the long-term shift to Nova.
Backporting Concerns
The debate also covers backporting. Wider vGPU adoption depends on getting the code into older enterprise kernels, especially those predating version 6.0, and those kernels lack Rust support. New development and legacy compatibility are therefore in tension: designing for longevity argues for Nova, while short-term practicality favors continuing with nouveau.
Disagreement and Future Options
A spirited debate unfolded over how to balance newly written Rust code against the C implementations needed for backporting. One suggestion was a core driver written in C that could be adapted to old kernels. Others dissented; Dave Airlie, for one, stressed that Rust is important for limiting the instability that changing firmware can cause. In the longer term, an agreed transition plan might better align the efforts of the communities involved.
Despite the complexity and the heated exchanges, the participants remain committed to finding a cooperative solution. They still agree on the strategic goals, and there is optimism that an agreement can be reached. Backporting poses real challenges, but the introduction of Rust into kernel development appears inevitable, so the discussion is likely to continue to center on timing and implementation.
For more insights, the original article can be found on LWN.net.