Aug 7th, 2022 @ justine's web page
I've modified GNU Make to support strict dependency checking. This is all thanks to the Landlock LSM system calls which were introduced in Linux Kernel 5.13 twelve months ago. What it means is that Make can now solve the cache invalidation problem similar to Bazel except with 5x better performance.
I blogged last month about our work porting OpenBSD pledge() and unveil() to Linux as part of the Cosmopolitan Libc project. The thought occurred to me that sandboxes aren't just good for security: they have applications in build systems too. So I used unveil() to patch GNU Make so it can function like a zero-configuration sandbox, and I'm making this work available to the community using the Actually Portable Executable format.
The basic idea is when Make runs a command, that command should only have access to a limited number of files:
That way, if some rogue unit test accidentally tries to rm -rf /, the kernel will simply reject it using an EACCES error, because your root directory wasn't declared as a dependency in your Makefile config.
For convenience, I've also chosen to implicitly whitelist a few other hard-coded paths. The following files are always unveiled by Make:
o/tmp (rwcx perm) and /tmp (rwc perm) for temporary files
o/third_party/gcc (rx permission) for static toolchain binaries
build/bootstrap (rx permission) for chicken-and-egg build tools
/dev/stdout, /dev/stderr, and other harmless well-known paths
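To make the allow-list idea concrete, here is a minimal user-space model of how such a policy could be evaluated. It is only a sketch: the real enforcement happens inside the kernel, and the names `IMPLICIT_UNVEILS` and `check_access` are hypothetical. The table holds the four paths listed above with their stated permissions (the unnamed "other harmless well-known paths" are left out), and it assumes a rule on a directory covers everything beneath it.

```python
import errno
import posixpath

# Implicitly unveiled paths and permissions, copied from the list above.
IMPLICIT_UNVEILS = {
    "o/tmp": "rwcx",
    "/tmp": "rwc",
    "o/third_party/gcc": "rx",
    "build/bootstrap": "rx",
}


def check_access(policy, path, want):
    """Return 0 if `path` may be used with every permission letter in `want`,
    otherwise errno.EACCES. A rule on a directory is assumed to cover all
    paths beneath that directory."""
    path = posixpath.normpath(path)
    for root, perms in policy.items():
        root = posixpath.normpath(root)
        covered = path == root or path.startswith(root.rstrip("/") + "/")
        if covered and set(want) <= set(perms):
            return 0
    return errno.EACCES


# Hypothetical rule: foo.c is a declared prerequisite, so it is readable.
policy = dict(IMPLICIT_UNVEILS)
policy["foo.c"] = "r"
print(check_access(policy, "o/tmp/scratch.txt", "rw") == 0)
print(check_access(policy, "/", "w") == errno.EACCES)
```

Anything not covered by some entry falls through to EACCES, which is the behavior that stops the rogue rm -rf / described earlier.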
Landlock Make is configured simply by writing a normal Makefile. For example, you can read the landlock-make/Makefile template to get the basic idea. However, there are sometimes cases where you want to do something special. Special variables have been introduced for this purpose, which can be specified on a per-target basis:
TARGET: private .UNSANDBOXED = 1 to disable sandboxing on a build target
TARGET: private .UNVEIL = [rwcx:]PATH... to unveil without using prerequisites
.UNVEIL works basically the same way as the new .EXTRA_PREREQS variable that was added to GNU Make last year. You can specify as many paths as you want. The permission defaults to read-only, but you can override that by putting the appropriate letters with a colon in front of the file path. The permissions take effect recursively too.
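The [rwcx:]PATH syntax can be illustrated with a small parser. This is a sketch, not the code in Landlock Make: the function names are hypothetical, and it assumes a prefix counts as a permission string only when it consists solely of the letters r, w, c and x followed by a colon.

```python
VALID_PERMS = set("rwcx")


def parse_unveil_spec(spec):
    """Split one '[rwcx:]PATH' word into (perms, path).

    Permissions default to read-only ('r') when no prefix is given."""
    prefix, sep, rest = spec.partition(":")
    if sep and prefix and set(prefix) <= VALID_PERMS:
        return prefix, rest
    return "r", spec


def parse_unveil_value(value):
    """Parse a whitespace-separated .UNVEIL value into (perms, path) pairs."""
    return [parse_unveil_spec(word) for word in value.split()]


print(parse_unveil_value("rwcx:o/tmp libc/integral rx:build/bootstrap"))
```

A path with no prefix, such as libc/integral, comes back with the default "r" permission, matching the read-only default described above.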
It's important to use the private keyword, because GNU Make variable inheritance makes it far too easy to accidentally remove safety from everything. For example, if you define a variable on a rule that generates an executable without using private, then the variable definition will also apply to all the object files going into that executable.
Another configuration option is .STRICT mode (updated: now mandatory as of landlockmake v1.5), which turns off all the implicitly unveiled stuff, including $PATH resolution, which means you can explicitly define the perfect hermetic environment. Here's how Cosmopolitan uses it alongside a global .UNVEIL variable:
.STRICT = 1
.UNVEIL = \
	rwcx:o/tmp \
	libc/integral \
	libc/disclaimer.inc \
	rx:build/bootstrap \
	rx:o/third_party/gcc \
	/proc/self/status \
	rw:/dev/null \
	w:o/stack.log \
	/etc/hosts \
	~/.runit.psk
Landlock Make can build code five times faster than Bazel, while offering the same advantages in terms of safety. In other words, you get all the benefits of a big corporation build system, in a tiny lightweight binary that any indie developer can love.
To demonstrate this, I've configured this repository to compile 448 .c files which are linked into 40 executables. Building 448 files in 448 different sandboxes takes:
Landlock Make is the winner here and Bazel is wrekt. The benchmark was performed on a 2 core Ubuntu 22.04 VM with 4gb of RAM running Linux 5.15. Landlock requires Linux 5.13+. If you don't have Landlock in your kernel, then GNU Make will silently continue along without sandboxing.
Here's a patched prebuilt fat binary of Landlock Make for x86-64 and Arm64 operating systems. Sandboxing is only supported on Linux and OpenBSD. For the other platforms, you've just got a nice portable drop-in replacement for the GNU make command.
landlockmake-2.0.com (2023-07-27)
1.2mb - PE+ELF+MachO+ZIP+SH (source code)
a21d326f52e8bf70783361cb750c7ba458fda0d618e866c6e8791f804167619e
mkdeps-2.0.com (2023-07-27)
526kb - PE+ELF+MachO+ZIP+SH (source code)
da79878af2fa430cd4d296a2c1f7779123b817305822fd12414df46382e72006
ar-2.0.com (2023-07-27)
528kb - PE+ELF+MachO+ZIP+SH (source code)
df7583f353447ffcd3d63fc85683b5a2dd9aa21584aa1fe3054c4fe0506291a7
Here's a template project for getting started.
https://github.com/jart/landlock-make
The example repository explains how to write a best practices Makefile configuration that utilizes Landlock Make features. It also contains a Bazel configuration so you can reproduce our benchmarks.
git clone https://github.com/jart/landlock-make
cd landlock-make
build/bootstrap/make.com
You can build Landlock Make from source here:
git clone https://github.com/jart/cosmopolitan
cd cosmopolitan
make -j8 o//third_party/make/make.com
GNU Make already has a file dependency graph. It's a rich data structure you define when you write your Makefile. It's a no-brainer to leverage that data to implement a zero-configuration sandbox. That's the only way to automatically prove a build configuration is correct. This technique is commonly known as strict dependency checking. What it means is that each target must declare all its dependencies. This must happen, since otherwise GNU Make can't solve the second hardest problem in computer science, which is cache invalidation.
Without strict dependency checking, your Makefile is going to behave in strange and mysterious ways. You'll be constantly frustrated and running make clean whenever something goes wrong, which slows things down by forcing everything to start over. In the traditional world of Make, even if you take great care in writing your makefile, there's simply no way to prove it's correct without sandboxing. It's the missing link we've been wanting for decades. It's a surprise no one's done it sooner.
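The link between undeclared dependencies and cache invalidation can be shown with a toy model of a Make-style staleness check. This is a simplified sketch with hypothetical file names and timestamps, not GNU Make's actual algorithm: a target is rebuilt only when it is missing or a declared prerequisite is newer than it.

```python
def needs_rebuild(mtimes, target, declared_prereqs):
    """Make-style staleness check: rebuild when the target is missing or any
    *declared* prerequisite is newer than the target. Assumes every declared
    prerequisite has an entry in `mtimes`."""
    if target not in mtimes:
        return True
    return any(mtimes[p] > mtimes[target] for p in declared_prereqs)


# Hypothetical timestamps: foo.h was edited after o/foo.o was built.
mtimes = {"foo.c": 100, "foo.h": 300, "o/foo.o": 200}

# foo.c includes foo.h, but the rule only declares foo.c, so the edit to
# foo.h goes unnoticed and the stale object file is kept.
missed = needs_rebuild(mtimes, "o/foo.o", ["foo.c"])

# With every dependency declared, the edit triggers a rebuild.
caught = needs_rebuild(mtimes, "o/foo.o", ["foo.c", "foo.h"])

print(missed, caught)  # prints: False True
```

A sandbox turns the first case into a hard error at build time: the compiler would be denied access to the undeclared foo.h, so the missing prerequisite is found immediately instead of surfacing later as a stale build.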
Google came to a similar conclusion back in the 2000s. They solved this by ditching GNU Make and inventing a new build system called Blaze. A blog post was published back in 2011 announcing their work. Google said strict dependency checking was the key motivator for reinventing things. Blaze was later open sourced to the public as Bazel in 2015, but it wasn't until 2021 that it was able to do strict dependency checking.
Because Bazel was written a long time ago, it implements sandboxing in a clumsy way. Bazel creates a giant hierarchy of symbolic links. Then it mounts and unmounts a ton of folders to create a fake filesystem which is how they limit access. It's all written in Java, which isn't very popular in the open source community. Bazel does however deserve credit for all the work they put into making Java as tiny as possible. Bazel is shipped as a 40mb single-file binary that extracts itself on the fly. That's pretty impressive by Java standards, but it's still a monster compared to my slim and sexy 519kb make.com binary which runs on six operating systems and doesn't require extraction. It's only got a few microseconds of startup latency too.
Mega-corporations love Bazel because its safety benefits enable them to scale their eng efforts into monolithic repositories with petabytes of code. So naturally they don't care that much if Bazel is fifty megs. I however refuse to believe that safety and professionalism go hand in hand with bloat. Not at any scale. I believe we can have our cake and eat it too. That's why I view Landlock as being such a game changer. It lets us have 85% of the benefits of Blaze in a tiny lightweight package. Because all the complexity of sandboxing is now abstracted by the Linux Kernel, all I needed to do was add about 200 lines of code to the GNU Make codebase. No root, no mounts, no chroot, no cgroups, and especially no Docker required! All you have to do is issue a system call that tells the kernel which paths should be accessible.
Here are some basic troubleshooting commands you can try, should you encounter any problems:
./make.com --strace  # system call logging
./make.com -pn       # dump build graph
./make.com --ftrace  # very verbose!
Landlock Make offers the strongest sandboxing when you:
If your build rule launches a dynamic or interpreted executable that relies on distro-installed files which are outside your project folder (e.g. /usr/bin/cc) then Make will react by unveiling a very broad list of paths:
/bin with rx permissions
/lib with rx permissions
/lib64 with rx permissions
/usr/bin with rx permissions
/usr/lib with rx permissions
/usr/lib64 with rx permissions
/usr/local/lib with rx permissions
/usr/local/lib64 with rx permissions
/etc/ld-musl-x86_64.path with r permissions
/etc/ld.so.conf with r permissions
/etc/ld.so.cache with r permissions
/etc/ld.so.conf.d with r permissions
/etc/ld.so.preload with r permissions
/usr/include with r permissions
/usr/share/locale with r permissions
/usr/share/locale-langpack with r permissions
So basically, depending on any system-provided functionality will schlep in nearly all system-provided functionality. This isn't a great situation to be in, since at that point, you're a hair's width away from needing Docker. If you're not sure if you're being impacted, then you can use make.com --strace to see what it does. The landlock-make GitHub template repository takes a more conservative approach, of vendoring a custom-built musl-cross-make gcc toolchain. It only relies on the system for very trivial commands, e.g. mkdir.
Yes, the Makefile config in the landlock-make GitHub template repo is very verbose. Cosmopolitan Libc has tools for solving that. The mkdeps program is able to crawl 1.5 million lines of code in 100ms on my PC to generate a 175,712 line o/depend file. It's so much faster than using gcc -M and it totally automates the arduous task of explicitly declaring header file dependencies. Give it a try. The download link is above.
The mkdeps.com program is usually invoked as follows:
./mkdeps.com -o o//depend -r o// @o//srcs.txt @o//hdrs.txt @o//incs.txt
The @ symbol lets you pass arguments in a file instead, which is useful when you have so many source files that they'd otherwise exceed ARG_MAX. Modern Make is really good at quickly generating argument files. For example, you might configure mkdeps in your Makefile as follows:
uniq = $(if $1,$(firstword $1) $(call uniq,$(filter-out $(firstword $1),$1)))
o//srcs.txt: $(call uniq,$(foreach x,$(SRCS),$(dir $(x))))
	$(file >$@,$(SRCS))
o//hdrs.txt: $(call uniq,$(foreach x,$(HDRS) $(INCS),$(dir $(x))))
	$(file >$@,$(HDRS) $(INCS))
o//incs.txt: $(call uniq,$(foreach x,$(INCS) $(INCS),$(dir $(x))))
	$(file >$@,$(INCS))
o//depend: o//srcs.txt o//hdrs.txt o//incs.txt $(SRCS) $(HDRS) $(INCS)
	./mkdeps.com -o $@ -r o// @o//srcs.txt @o//hdrs.txt @o//incs.txt
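For readers unfamiliar with the @file convention, here is a sketch of how a tool can expand argument files, using Python's standard argparse module. It only mirrors the shape of the mkdeps command line shown above; how mkdeps itself parses these files is not described here, so the whitespace splitting and the `ArgsFileParser` and `demo` names are assumptions made for illustration.

```python
import argparse
import os
import tempfile


class ArgsFileParser(argparse.ArgumentParser):
    """ArgumentParser that splits each line of an @file on whitespace, since
    $(file >$@,$(SRCS)) writes the names separated by spaces."""

    def convert_arg_line_to_args(self, arg_line):
        return arg_line.split()


def build_parser():
    # fromfile_prefix_chars makes argparse replace '@path' with the
    # arguments read from that file.
    parser = ArgsFileParser(fromfile_prefix_chars="@")
    parser.add_argument("-o", dest="output")
    parser.add_argument("-r", dest="root")
    parser.add_argument("paths", nargs="*")
    return parser


def demo():
    """Write a small argument file and parse a command line that uses it."""
    with tempfile.TemporaryDirectory() as tmp:
        srcs = os.path.join(tmp, "srcs.txt")
        with open(srcs, "w") as f:
            f.write("a.c b.c c.c\n")
        argv = ["-o", "o/depend", "-r", "o/", "@" + srcs]
        args = build_parser().parse_args(argv)
        return args.output, args.root, args.paths


print(demo())
```

Because the file contents never appear on the command line, the number of source files is no longer bounded by ARG_MAX.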
Another thing to keep in mind is that it's best to refrain from using shell script syntax in your build commands. If you don't use any special characters, then GNU Make has an optimization where it'll pass your command and arguments directly to execve(). That way Landlock Make knows exactly which executable should be whitelisted. If you use special shell syntax, then the files in your shell script might not be whitelisted automatically, since we currently aren't parsing that.
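The decision between direct execution and going through a shell can be sketched as follows. This is a simplified approximation: the character set below is an assumption rather than the exact list GNU Make checks, the real logic also accounts for things like shell builtins, and `plan_command` is a hypothetical name.

```python
# Assumed approximation of the characters that force a shell; the exact set
# GNU Make checks may differ.
SHELL_SPECIAL = set("#;\"'*?[]&|<>(){}$`^~!\\\n")


def plan_command(command):
    """Return ('exec', argv) when the command has no shell syntax and could be
    handed straight to execve(), else ('shell', ['/bin/sh', '-c', command])."""
    if any(ch in SHELL_SPECIAL for ch in command):
        return "shell", ["/bin/sh", "-c", command]
    # No quoting or escaping is present, so whitespace splitting is enough.
    return "exec", command.split()


print(plan_command("gcc -c foo.c -o o/foo.o"))
print(plan_command("cat a.txt | sort > b.txt")[0])
```

In the "exec" case argv[0] names the program that will actually run, so it is clear which executable to unveil. In the "shell" case only /bin/sh is visible up front, and the programs named inside the script stay hidden unless something parses it.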
Since Landlock is still very new, there are a few peculiar kinks about it right now that some folks might find surprising. While we've generally been able to make it consistent on Linux with the OpenBSD behaviors, there are still a few places where it differs slightly.
For example, unlike OpenBSD, Linux does nothing to conceal the existence of paths. Even with an unveil() policy in place, it's still possible to access the metadata of all files using functions like stat() and open(O_PATH), provided you know the full path ahead of time. This means a sandboxed process can always, for example, determine how many bytes of data are in /etc/passwd, even though the contents of the file can't actually be read. The good news is it's still not possible to use opendir() and go fishing for paths which weren't previously known. So if you want to play up your secrecy in addition to security, consider OpenBSD instead of Linux.
Another truly weird behavior of Linux is that Landlock currently isn't able to restrict file truncation. For example, did you know that opening a file on Linux using open(O_RDONLY | O_TRUNC) will actually delete the contents of the file? The same is also the case with the truncate() system call, which is a blind spot with Landlock. Right now Cosmopolitan Libc addresses this by blocking those corner cases using the SECCOMP BPF security policies we've programmed into our pledge() polyfill. However we're not currently using pledge() in make.com, since the emphasis is on preventing accidental misuse rather than preventing malicious misuse. Please note, this may change in the future, should we decide to beef up the security of make.com. If this topic interests you, then please reach out and contact us, to let us know what use cases and dreams you have in mind!
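The O_RDONLY | O_TRUNC behavior mentioned above can be observed outside any sandbox with a few lines of code. This sketch writes a scratch file, reopens it read-only with O_TRUNC, and reports the resulting size. The helper name is hypothetical. Going by the description above, the size should come back as zero on Linux, while other systems may behave differently or reject the flag combination, which is why the open is wrapped in a try block.

```python
import os
import tempfile


def size_after_rdonly_trunc(payload=b"hello world"):
    """Write `payload` to a temp file, reopen it with O_RDONLY | O_TRUNC, and
    return the file's size afterwards."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)
        try:
            # Nominally a read-only open, yet O_TRUNC is still honored on Linux.
            os.close(os.open(path, os.O_RDONLY | os.O_TRUNC))
        except OSError:
            pass  # some systems may reject this flag combination
        return os.stat(path).st_size
    finally:
        os.unlink(path)


print(size_after_rdonly_trunc())
```

This is why a filesystem policy that only reasons about read versus write opens is not enough on its own to protect file contents.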
Finally please note that we haven't incorporated the GNU Make tests into the Cosmopolitan Libc continuous integration system yet. Our C library is still a relative newcomer that has gaps in terms of things like locale support. The last time we checked the GNU Make test suite, our port was 80% conformant. That hasn't stopped us from eating our own dogfood though, since we use make.com every single day to maintain all our repositories. If you encounter any issues with it, or are willing to help us expand our C library implementation, then once again please don't hesitate to reach out.
Since my GNU Make fork is an Actually Portable Executable that runs on six operating systems, it'd be great to polyfill unveil() on other operating systems too. The next fun project on my list will probably be looking into FreeBSD jails, since I've heard so many good things about them on online forums.
I'd like to thank Mickaël Salaün for his work on bringing Landlock to the Linux Kernel, as well as being a big help on Twitter. Stephen Gregoratto contributed the Linux unveil() implementation to Cosmopolitan Libc in #490. Gautham Venkatasubramanian contributed the initial port of GNU Make to Cosmopolitan Libc in PR #305. I'd also like to thank Günther Noack for offering superb code reviews and feedback.
Funding for the development of this project was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like Landlocked Make possible. Thank you.