June 11th, 2022 @ justine's web page

APE Loader

Actually Portable Executable is a binary format that runs on seven operating systems. I'm pleased to announce your executables will no longer need to modify themselves on the fly in order to do that. We're now using the ape-no-modify-self.o bootloader for most programs in the Cosmopolitan repository. It works by embedding a 4kb Linux+MacOS+BSD loader command, named ape, inside each executable.

$ ape
usage: ape   PROG [ARGV1,ARGV2,...]
       ape - PROG [ARGV0,ARGV1,...]
αcτµαlly pδrταblε εxεcµταblε loader v1.o
copyright 2022 justine alexandra roberts tunney
https://justine.lol/ape.html

The ape loader works by calling mmap() to load your executable into memory the same way the operating system would. That makes it scalable, since a memory-mapped executable doesn't need to be duplicated with an expensive cp command first. Pages aren't loaded off disk until they're actually used by your program.

Download   [Linux] [MacOS] [FreeBSD] [OpenBSD] [NetBSD]

# for linux+freebsd+netbsd+openbsd
# please install this to /usr/bin/ape
-rwxr-xr-x 1 501 jart 4.2K Jun 11 05:46 ape.elf

# for mac os x
# please install this to /usr/bin/ape
-rwxr-xr-x 1 501 jart 4.0K May 22 05:06 ape.macho

How It Works

What makes the loading process possible is that the APE loader does a quick scan of the first 4096 bytes of the file to look for the classic APE printf shell command that would otherwise mutate the binary. For example, Actually Portable Executables normally look something like this:

MZqFpD='
^@^@^P^@<F8>^@^@^@^@...
^@^@<B2>@<EB>^@<EB>^T<...
#'"
exec 7<> "$o" || exit 121
printf '\177ELF\2\1\1\011\0\0\0\0\0...' >&7
exec 7<&-
exec "$0" "$@"

So your ape loader only needs to parse out those octal printf codes and apply them to the MAP_PRIVATE copy of your executable that's loaded into memory. No disk changes are needed. It can then read the ELF program headers and call mmap() as many additional times as needed to load the various segments of your program off disk.
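The scan-and-decode step described above can be imitated in a few lines of shell. This is only a sketch: the real loader is a native executable that patches the decoded bytes into its private in-memory copy, whereas this version just prints them as hex. The function names ape_printf_codes and ape_header_hex are made up for illustration, and the demo file is a tiny stand-in rather than a real APE binary.

```shell
#!/bin/sh
# Hypothetical helper: print the escape string from the first
# printf '...' >&7 line found in the first 4096 bytes of file $1.
ape_printf_codes() {
  head -c 4096 "$1" | LC_ALL=C tr -d '\000' |
    LC_ALL=C sed -n "s/^printf '\\(.*\\)' >&7\$/\\1/p" | sed -n 1p
}

# Hypothetical helper: expand the escapes into raw bytes and show them
# as one line of hex. Nothing is written back to the file.
ape_header_hex() {
  # The codes are used as the printf format, as in the APE script itself.
  printf "$(ape_printf_codes "$1")" | od -An -v -tx1 | tr -d ' \n'
  echo
}

# Demo on a stand-in file that only mimics the layout shown above.
demo=$(mktemp)
cat >"$demo" <<'EOF'
MZqFpD='
#'"
printf '\177ELF\2\1\1\011' >&7
exec 7<&-
EOF
ape_header_hex "$demo"
rm -f "$demo"
```

The demo prints 7f454c4602010109, which is the ELF magic followed by the first few identification bytes from the example header.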

Now you may be asking, what happens if the APE loader isn't installed on my system? In that case, it'll try to dd the 4kb copy of the APE loader that's embedded within the host executable, out to the safest folder that's guaranteed to work, namely ${TMPDIR:-${HOME:-.}}/.ape. If your operating system defines the POSIX-specified $TMPDIR variable, then the ape loader will become $TMPDIR/.ape. Otherwise, if $HOME is defined, it's dropped in $HOME/.ape. If neither is defined, ./.ape is created in the current directory.
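Here is a small sketch of how that fallback expression resolves in each of the three cases. The function name ape_fallback_path is hypothetical and exists only for this illustration; the parameter expansion itself is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of APE): print where the embedded loader
# would be extracted, following the ${TMPDIR:-${HOME:-.}}/.ape rule.
ape_fallback_path() {
  printf '%s\n' "${TMPDIR:-${HOME:-.}}/.ape"
}

# Each case runs in a subshell so the caller's environment is untouched.
( TMPDIR=/var/tmp; HOME=/home/user; ape_fallback_path )  # /var/tmp/.ape
( unset TMPDIR;    HOME=/home/user; ape_fallback_path )  # /home/user/.ape
( unset TMPDIR HOME;                ape_fallback_path )  # ./.ape
```

Note that the :- form treats an empty variable the same as an unset one, so TMPDIR= with a valid $HOME still resolves to $HOME/.ape.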

Performance

If we microbenchmark the time it takes to fork, spawn your APE program, and then wait for it to terminate, then this new solution takes about 40 microseconds, just like the old APE format.

Amongst programs linked with Cosmopolitan Libc, APE binaries will always have optimal performance. That's because Cosmopolitan's execve() wrapper is smart enough to know when it's spawning an APE binary, in which case that binary can be handed off directly to a program like /usr/bin/ape instead of the kernel. It all happens seamlessly.
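The hand-off idea can be sketched in shell, even though the real logic lives inside Cosmopolitan's execve() wrapper. Everything here is an assumption made for illustration: the function names, the APE_LOADER override variable, and the choice to recognize APE files by the MZqFpD prefix seen in the example header. The invocation form follows the usage line printed by ape (ape PROG followed by its arguments).

```shell
#!/bin/sh
# Assumption: an APE binary is recognized by the MZqFpD prefix shown in
# the example header earlier in this post.
is_ape() {
  [ "$(head -c 6 "$1" 2>/dev/null | LC_ALL=C tr -d '\000')" = "MZqFpD" ]
}

# Hypothetical wrapper: run PROG through the loader when it is an APE
# binary and the loader is installed; otherwise execute it directly.
# APE_LOADER is an illustrative override, not a variable APE defines.
run_program() {
  prog=$1
  shift
  if is_ape "$prog" && [ -x "${APE_LOADER:-/usr/bin/ape}" ]; then
    exec "${APE_LOADER:-/usr/bin/ape}" "$prog" "$@"
  fi
  exec "$prog" "$@"
}
```

Calling run_program ./redbean.com -h would replace the current shell process with either the loader or the program itself, so it's best used at the end of a script or inside a subshell.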

# fork()+execve()+waitpid() / system() latency for 100 byte ELF executable (control data)
ForkElf             l:   207,487c    67,017ns
VforkElf            l:   102,512c    33,111ns
SystemElf           l:   658,085c   212,558ns

# fork()+execve()+waitpid() / system() latency for ape.o bootloader
ForkApeClassic      l:   292,055c    94,332ns
VforkApeClassic     l:   140,869c    45,500ns
SystemApeClassic    l:   632,667c   204,348ns  # amortized system() /bin/sh

# fork()+execve()+waitpid() / system() latency for ape-no-modify-self.o
ForkApeNoMod        l:   263,056c    84,966ns
VforkApeNoMod       l:   134,124c    43,321ns
SystemApeNoMod      l: 1,594,151c   514,902ns  # system() is slower due to /bin/sh

# what happens if the executable is 3mb in size (scalability)
ForkNoMod3mb        l:   277,114c    89,506ns
VforkNoMod3mb       l:   143,678c    46,407ns
SystemNoMod3mb      l: 1,609,345c   519,809ns

The only disadvantage to ape-no-modify-self.o (compared to Classic APE) is what happens when your program is interpreted by /bin/sh or bash. Your APE loader can't amortize the cost of running the shell script builtins, since it doesn't automatically assimilate into the local host format (e.g. ELF, Mach-O). But that doesn't mean binaries can't assimilate, because we've introduced a new flag to APE programs which lets you do just that.

$ file redbean.com
redbean.com: DOS/MBR boot sector
$ ./redbean.com --assimilate
$ file redbean.com
redbean.com: ELF 64-bit LSB executable

Assimilating shouldn't be necessary, but it does have a few use cases where it's needed to unblock things. For instance, one must assimilate a binary in order for it to do security-sensitive things such as being a setuid + setgid binary. Assimilating is also needed if you want your program to be a system-wide shebang interpreter installed to a folder.

#!/usr/bin/python.com
print("hello world")

So if you package programs for distros, then please help us, since all the blockers to doing that have been cleared.

binfmt_misc

Linux users who want even more of a performance edge with APE will be pleased to hear that the ape loader now supports the Linux kernel's binfmt_misc interface. We provide an install script in the Cosmopolitan Repository which can be used to set it up.

$ git clone https://github.com/jart/cosmopolitan
$ cd cosmopolitan
$ ape/apeinstall.sh

Once that's done, we can see some noticeable improvements in the microbenchmarks above:

ForkApeClassic      l:   251,039c    81,084ns
VforkApeClassic     l:   106,924c    34,536ns
SystemApeClassic    l:   617,189c   199,349ns

ForkApeNoMod        l:   220,373c    71,179ns
VforkApeNoMod       l:   105,982c    34,232ns
SystemApeNoMod      l:   643,594c   207,877ns  # woot

ForkNoMod3mb        l:   246,056c    79,475ns
VforkNoMod3mb       l:   116,121c    37,506ns
SystemNoMod3mb      l:   646,948c   208,961ns  # woot
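If you want to poke at the registration by hand, the following sketch builds a rule string in the kernel's binfmt_misc format and reports whether the interface is enabled. The entry name APE and the MZqFpD magic are assumptions based on the header shown earlier; ape/apeinstall.sh is the authority on what actually gets registered. Both function names are hypothetical.

```shell
#!/bin/sh
# Build a binfmt_misc rule in the kernel's
# :name:type:offset:magic:mask:interpreter:flags format.
# Type M means "match by magic bytes"; offset, mask, and flags stay empty.
binfmt_rule() {
  printf ':%s:M::%s::%s:\n' "$1" "$2" "$3"
}

# Report the binfmt_misc state. The directory is a parameter so the
# function can be pointed somewhere other than the real /proc mount.
binfmt_state() {
  dir=${1:-/proc/sys/fs/binfmt_misc}
  if [ -r "$dir/status" ]; then
    cat "$dir/status"
  else
    echo unavailable
  fi
}

binfmt_rule APE MZqFpD /usr/bin/ape   # :APE:M::MZqFpD::/usr/bin/ape:
binfmt_state
```

This only prints the rule. Activating it would mean writing that string to the register file under /proc/sys/fs/binfmt_misc as root, which is the kind of thing the install script is there to handle.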

As a project, APE has always had a love-hate relationship with binfmt_misc, since I personally think the Thompson Shell hack works well enough that we don't need it. binfmt_misc also has a history of breaking things more than it helps, which we'll go over presently.

WSL and WINE

It's possible to use APE programs (and even build the Cosmopolitan repo too!) on Microsoft's Windows Subsystem for Linux, which they call WSL. However, that might not work out of the box.

It turns out that Microsoft configures WSL by default to be able to run Windows portable executables. They do this in some kind of non-standard way using binfmt_misc. Why they do this, we can only guess, because it creates a completely unholy environment where a WIN32 hosted executable is running on a Linux file system, whose properties can't be accessed without resorting to Microsoft's cryptic NTDLL internals.

So if you're getting puzzling errors on WSL or WINE, and the ape/apeinstall.sh shell script didn't work, then the thing we recommend is just disabling binfmt_misc entirely.

sudo sh -c 'echo -1 >/proc/sys/fs/binfmt_misc/status'

Doing that makes WSL work like a charm! Aside from that one blemish, it really is an outstanding piece of software that Microsoft made, since they really made things go fast. WIN32 is notorious for going slow whenever we try to do anything in a UNIX-ey kind of way, due to all its virus scanning, content indexing, and layered service providers. But if you opt in to textmode bash, then the yoke of tyranny is lifted and you can compile code faster than ever. So great work on Microsoft's part.

Funding

[United States of Lemuria - two dollar bill - all debts public and primate]

Funding for this blog post was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like Actually Portable Executable possible. Thank you.