Sanitized Linux

Vision for a Memory-Safe Linux Distro

Open Proposal

Let's create a new Linux distro whose binaries are built with the -fsanitize=address flag by default, so that:

  1. C/C++ memory bugs cause programs to crash rather than enable the system to be compromised.
  2. We can identify new security bugs and patch them, thereby making Linux safer for everyone.

Background

Please read AddressSanitizer: A Fast Address Sanity Checker. ASAN is enabled by a compiler flag (in both GCC and Clang) that makes existing software written in C / C++ / FORTRAN / etc. nearly memory safe. When ASAN was first invented by Google, it was able to identify 300 previously unknown bugs in Chromium, e.g. buffer overruns and stack overflows. It's since been brought to kernelspace by a project called KASAN. It can even be used with memory safe languages like Rust to check the code that uses the unsafe keyword. Therefore ASAN is in some respects the root of memory safety for modern software.

The tradeoff is an average slowdown of 73% for CPU-bound workloads. Our intended audience is the operators of backend systems doing I/O-bound workloads on untrusted network data. For them, we feel that the benefits of not getting hacked are worth the negligible cost we see the ASAN runtime imposing on network latencies, and that it's worthwhile to have the freedom to choose to apply the benefits of ASAN systemically.

Overview

We aim to build Sanitized Linux by forking Alpine Linux and reconfiguring it to build the kernel and entire userspace using the -fsanitize=address flag. We will then make installer ISOs and prebuilt binaries available via a website and an APK-based package service.

We like the idea of doing ASAN as a Linux distro, because it brings a community mandate for developers to more broadly engage from the bottom-up in finding security bugs and ensuring they get fixed. Right now developers mostly only use ASAN to spot bugs in their own projects. We need a way to keep ASAN plugged in to the broader ecosystem. As Linus's Law (coined by Eric S. Raymond and named after Linus Torvalds) puts it, given enough eyeballs, all bugs are shallow. Google gave us the microscope by inventing ASAN. Sanitized Linux will bring us the eyeballs.

We like Alpine because (1) it's popular for production containers and (2) the Alpine authors successfully managed to replace glibc with musl, an alternative, more permissively licensed libc, which should be an encouraging sign that the further tuning we'd need to do at the build system level is feasible. Alternatively, a Debian- or RPM-based distro could be used, since the method of packaging is orthogonal to the runtime hardening benefits that ASAN offers.

One criticism we've encountered while floating this idea is that the libsanitizer runtime (which ships with GCC) is 54,000 lines of code that are highly tunable via environment variables intended for developers, which could be abused if productionized for things like setuid binaries. We intend to address any such concerns by using a trimmed-down runtime without the bells and whistles for release builds. For an example of a freestanding ASAN runtime needing fewer than a thousand lines of code, see //libc/intrin:asan.c from the Cosmopolitan Libc codebase.

Lastly, to safeguard against scenarios where binaries are crashing so often that the system isn't functional, we intend to add a feature to htop that lets the system administrator flip a bit in a process that causes it to go into log mode rather than crash mode. This bit could be inheritable by child processes and perhaps also require superuser privileges to set, depending on the user's choices. It could also be set on processes spawned from the command line in a manner similar to the nice command. This would enable systems to fall back to a less secure but more functional state should the need arise.

Timeline

We believe that we can ship a polished Linux distro for backend serving that's fully hardened by ASAN in less than six months.

Please note that the intended audience for Sanitized Linux is the people who operate production services on the backend. We haven't investigated what it would take to have a fully memory safe Linux Desktop, but we potentially could, if the interest is there.

Technical Details

The way ASAN works for x86-64 userspace in PIE mode is that each time the compiler generates an instruction that accesses a byte of memory at address x, a few additional asm opcodes are generated too, in order to check the "shadow" byte at the corresponding address x>>3, which records whether that memory may be accessed.

+---------------------------------------+------------------------------+-------+
| pointer address range                 | description                  |  size |
+=======================================+==============================+=======+
| 0x0000000000000000-0x00000000001fffff | traditional NULL guard pages |   2MB |
| 0x0000000000200000-0x000001ffffffffff | unused in -fpie pml4t model  |   2TB |
| 0x0000020000000000-0x00000fffffffffff | shadow virtual memory bits   |  14TB |
| 0x0000100000000000-0x00007fffffffffff | program virtual memory bytes | 112TB |
+---------------------------------------+------------------------------+-------+

The shadow memory costs an additional 1/8th of whatever memory a program maps. Please note that the terabyte sizes above are for the entire x86 user virtual memory space, and that programs usually only map a small portion of that at 4096-byte granularity.

For example, where the compiler would normally generate:

	mov	(%rdi),%rax

GCC and Clang with -fsanitize=address will generate code that looks like this instead:

	mov	%rdi,%rsi
	shr	$3,%rsi
	cmpb	$0,(%rsi)
	jnz	abort
	mov	(%rdi),%rax

By having a Linux distro we're also able to choose which compiler flags the gcc and g++ commands use by default. This way we can ensure that ASAN not only applies to the binaries we distribute, but also to the binaries that the conventional ./configure && make && make install workflow generates too!

About

This proposal was written by Justine Alexandra Roberts Tunney on February 19th, 2021.

She's the author of Cosmopolitan Libc, which makes C a build-once run-anywhere language. Before that she spent six years working at Google on prominent open source projects like TensorFlow and Nomulus. She also spearheaded one of Google's notable open source security improvement initiatives, Operation Rosehub. Before Google, she created the OccupyWallSt.org website and Twitter handle, which gave a loudspeaker to an international grassroots movement.

Justine can bottom-line Sanitized Linux if tech industry leaders are open to funding its development. She resides in the Bay Area and can be contacted via email:
