June 11th, 2022 @ justine's web page
Actually Portable Executable is a binary
format that runs on seven operating systems. I'm pleased to announce
your executables will no longer need to modify themselves on the fly in
order to do that. We're now using the ape-no-modify-self.o
bootloader for most programs in the
Cosmopolitan
repository. It works by embedding a 4kb Linux+MacOS+BSD executable
command inside each executable that's named ape.
$ ape
usage: ape   PROG [ARGV1,ARGV2,...]
       ape - PROG [ARGV0,ARGV1,...]
αcτµαlly pδrταblε εxεcµταblε loader v1.o
copyright 2022 justine alexandra roberts tunney
https://justine.lol/ape.html
The ape loader works by calling mmap() to load
your executable into memory the same way the operating system would.
That makes it scalable: because the executable is mapped with mmap(), it
doesn't need to be copied with the expensive cp command. Pages aren't
loaded off disk until they're actually used by your program.
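To make the lazy-loading point concrete, here is a minimal Python sketch (not the loader's actual C code). The helper names `peek` and `make_demo_file` are made up for illustration. It maps a 3 MB file and reads four bytes near the end; only the pages that hold those bytes need to be faulted in.

```python
import mmap
import os
import tempfile

def make_demo_file(size=3 * 1024 * 1024):
    """Create a hypothetical 3 MB file with a marker in its last four bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)      # extend the file to `size` bytes of zeros
        f.seek(size - 4)
        f.write(b"tail")      # overwrite the final four bytes
    return path

def peek(path, offset, length=4):
    """Return `length` bytes at `offset` without reading the rest of the file."""
    with open(path, "rb") as f:
        # Creating the mapping does not read the file's contents.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches only the page(s) that contain this range.
            return m[offset:offset + length]
```

Calling `peek(path, 3 * 1024 * 1024 - 4)` on the demo file returns the marker bytes without a full read of the file.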
![Linux [Linux]](http://worker.jart.workers.dev/redbean/linux.png) 
    ![MacOS [MacOS]](http://worker.jart.workers.dev/redbean/macos.png) 
    ![FreeBSD [FreeBSD]](http://worker.jart.workers.dev/redbean/freebsd64.png) 
    ![OpenBSD [OpenBSD]](http://worker.jart.workers.dev/redbean/openbsd.png) 
    ![NetBSD [NetBSD]](http://worker.jart.workers.dev/redbean/netbsd2.png) 
  
# for linux+freebsd+netbsd+openbsd
# please install this to /usr/bin/ape
-rwxr-xr-x 1 501 jart 4.2K Jun 11 05:46 ape.elf

# for mac os x
# please install this to /usr/bin/ape
-rwxr-xr-x 1 501 jart 4.0K May 22 05:06 ape.macho
What makes the loading process possible is that the APE loader does a quick scan of the first 4096 bytes, looking for the classic APE printf shell command that would otherwise mutate the binary. For example, Actually Portable Executables normally look something like this:
MZqFpD='
^@^@^P^@<F8>^@^@^@^@...
^@^@<B2>@<EB>^@<EB>^T<...
#'"
exec 7<> "$o" || exit 121
printf '\177ELF\2\1\1\011\0\0\0\0\0...' >&7
exec 7<&-
exec "$0" "$@"
So your ape loader only needs to parse out those octal
printf codes and apply them to the MAP_PRIVATE copy of your
executable that's loaded into memory. No disk changes are needed. It can
then read the ELF program headers and call mmap() as many
additional times as needed to load the various segments of your program
off disk.
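The parse-and-patch step can be sketched in a few lines of Python. This is an illustration only: the real loader is written in C, and the function names `decode_printf_octal` and `patch_private_copy` are invented here. Python's `mmap.ACCESS_COPY` stands in for `MAP_PRIVATE`, so writes land in memory and never reach the file.

```python
import mmap

OCTAL = "01234567"

def decode_printf_octal(text):
    """Decode printf-style \\NNN escapes (one to three octal digits).

    Assumes `text` is plain ASCII, as in the shell header of an APE file.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in OCTAL:
            j = i + 1
            # Consume at most three octal digits after the backslash.
            while j < len(text) and j < i + 4 and text[j] in OCTAL:
                j += 1
            out.append(int(text[i + 1:j], 8) & 0xFF)
            i = j
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

def patch_private_copy(path, header):
    """Overlay `header` on a copy-on-write mapping of the file at `path`."""
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    m[:len(header)] = header  # modifies memory only, never the file on disk
    return m
```

For instance, `decode_printf_octal(r"\177ELF\2\1\1\011")` yields the first eight bytes of an ELF header, which `patch_private_copy` can then lay over the start of the mapping.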
Now you may be asking, what happens if the APE loader isn't installed on
my system? In that case, it'll try to dd the 4kb copy of
the APE loader that's embedded within the host executable, out to the
safest folder that's guaranteed to work, namely
${TMPDIR:-${HOME:-.}}/.ape. If your operating system
defines the POSIX-specified $TMPDIR variable, then the ape
loader will become $TMPDIR/.ape. Otherwise,
if $HOME is defined, it's dropped
in $HOME/.ape. Finally, if neither is
defined, ./.ape is created in the current directory.
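The fallback order above is easy to get wrong when porting it out of shell, so here is a small Python sketch of the same `${TMPDIR:-${HOME:-.}}/.ape` expansion. The function name `ape_fallback_path` is made up for this example. Note that the shell's `:-` form treats an empty variable the same as an unset one, which the `or` chain below mirrors.

```python
import os

def ape_fallback_path(env=None):
    """Mirror the shell expansion ${TMPDIR:-${HOME:-.}}/.ape."""
    env = os.environ if env is None else env
    # An empty string is falsy, so empty variables fall through like unset ones.
    base = env.get("TMPDIR") or env.get("HOME") or "."
    return base + "/.ape"
```

Passing a plain dict makes the three cases easy to check without touching the real environment, e.g. `ape_fallback_path({"HOME": "/home/u"})`.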
If we microbenchmark the time it takes to fork, spawn your APE program, and then wait for it to terminate, then this new solution takes about 40 microseconds, just like the old APE format.
Amongst programs linked with Cosmopolitan Libc, APE binaries will always
have optimal performance, because Cosmopolitan's execve()
wrapper is smart enough to know when it's spawning an APE binary, in
which case that binary can be handed off directly to a program
like /usr/bin/ape instead of the kernel. It all happens
seamlessly.
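As a rough picture of that hand-off, here is a Python sketch. It is not Cosmopolitan's implementation, which lives in C and may differ in detail. The names `has_ape_magic`, `loader_argv`, and `execve_ape_aware` are hypothetical. The sketch assumes the `MZqFpD` prefix from the shell header shown earlier is enough to recognize an APE file, and it uses the `ape - PROG [ARGV0,ARGV1,...]` form from the usage text so the program keeps its original argv[0].

```python
import os

APE_MAGIC = b"MZqFpD"        # leading bytes of the classic APE shell header
APE_LOADER = "/usr/bin/ape"  # install location named in this post

def has_ape_magic(head):
    """True if the given leading bytes look like an APE executable."""
    return head.startswith(APE_MAGIC)

def loader_argv(prog, argv):
    """Build the loader command line: ape - PROG ARGV0 ARGV1 ..."""
    return [APE_LOADER, "-", prog, *argv]

def execve_ape_aware(prog, argv, env):
    """Exec APE programs through the loader, everything else directly."""
    with open(prog, "rb") as f:
        head = f.read(len(APE_MAGIC))
    if has_ape_magic(head) and os.access(APE_LOADER, os.X_OK):
        os.execve(APE_LOADER, loader_argv(prog, argv), env)
    os.execve(prog, argv, env)  # not an APE file, or no loader installed
```

The detection costs one small read before the exec, which is why the caller never has to fall back to the shell script path.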
# fork()+execve()+waitpid() / system() latency for 100 byte ELF executable (control data)
ForkElf          l:   207,487c    67,017ns
VforkElf         l:   102,512c    33,111ns
SystemElf        l:   658,085c   212,558ns

# fork()+execve()+waitpid() / system() latency for ape.o bootloader
ForkApeClassic   l:   292,055c    94,332ns
VforkApeClassic  l:   140,869c    45,500ns
SystemApeClassic l:   632,667c   204,348ns  # amortized system() /bin/sh

# fork()+execve()+waitpid() / system() latency for ape-no-modify-self.o
ForkApeNoMod     l:   263,056c    84,966ns
VforkApeNoMod    l:   134,124c    43,321ns
SystemApeNoMod   l: 1,594,151c   514,902ns  # system() is slower due to /bin/sh

# what happens if the executable is 3mb in size (scalability)
ForkNoMod3mb     l:   277,114c    89,506ns
VforkNoMod3mb    l:   143,678c    46,407ns
SystemNoMod3mb   l: 1,609,345c   519,809ns
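For readers who want to try a similar measurement, here is a sketch of the fork, exec, wait pattern in Python. It is not the benchmark harness used for the numbers above, and the function name `time_fork_exec_wait` is invented. Python's own overhead inflates the absolute figures, so the output is only useful for comparing one binary against another on the same machine.

```python
import os
import time

def time_fork_exec_wait(argv, runs=20):
    """Mean wall-clock nanoseconds for fork() + execv() + waitpid()."""
    total = 0
    for _ in range(runs):
        start = time.perf_counter_ns()
        pid = os.fork()
        if pid == 0:
            # Child: replace this process with the target program.
            try:
                os.execv(argv[0], argv)
            finally:
                os._exit(127)  # reached only if execv() failed
        _, status = os.waitpid(pid, 0)
        total += time.perf_counter_ns() - start
        if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
            raise RuntimeError("child did not exit cleanly: %r" % (argv,))
    return total // runs
```

The first element of `argv` must be a full path, since `os.execv` does not search `PATH`.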
The only disadvantage to ape-no-modify-self.o (compared to
Classic APE) is what happens when your program is interpreted by /bin/sh
or bash. Your APE loader can't amortize the cost of running the shell
script builtins, since it doesn't automatically assimilate into the
local host format (e.g. ELF, Mach-O). But that doesn't mean binaries
can't assimilate: we've introduced a new flag to APE programs
which lets you do just that.
$ file redbean.com
redbean.com: DOS/MBR boot sector
$ ./redbean.com --assimilate
$ file redbean.com
redbean.com: ELF 64-bit LSB executable
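The change that `file` reports comes down to the first few bytes of the binary. As a hedged illustration, the Python sketch below tells the formats apart by their well-known magic numbers. The function name `classify_magic` and the returned labels are made up, and a real tool like `file` checks far more than this.

```python
def classify_magic(head):
    """Guess an executable format from its leading bytes."""
    if head.startswith(b"\x7fELF"):
        return "ELF"
    if head.startswith(b"\xcf\xfa\xed\xfe"):
        return "Mach-O 64-bit"  # 0xfeedfacf stored little-endian
    if head.startswith(b"MZ"):
        return "MZ"             # unassimilated APE, or a DOS/PE file
    return "unknown"
```

An unassimilated APE program starts with `MZ`, which is why `file` labels it as a DOS/MBR boot sector. After assimilation on Linux the same check reports ELF.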
Assimilating shouldn't be necessary, but there are a few use cases where it's needed to unblock things. For instance, one must assimilate a binary in order for it to do security-sensitive things such as being a setuid + setgid binary. Assimilating is also needed if you want your program to be a system-wide shebang interpreter installed to a folder.
#!/usr/bin/python.com
print("hello world")
So if you package programs for distros, then please help us, since all the blockers to doing that have been cleared.
Linux users who want even more of a performance edge with APE will be
pleased to hear that the ape loader now supports the Linux
kernel's binfmt_misc interface. We provide
an install
script in the Cosmopolitan Repository which can be used to set it
up.
$ git clone https://github.com/jart/cosmopolitan
$ cd cosmopolitan
$ ape/apeinstall.sh
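For context on what a binfmt_misc setup involves, the kernel accepts registration strings of the form `:name:type:offset:magic:mask:interpreter:flags` written to `/proc/sys/fs/binfmt_misc/register`. The Python sketch below only builds such a string. The rule name `APE`, the `MZqFpD` magic, and the `/usr/bin/ape` interpreter are assumptions based on this post, and the function name `binfmt_rule` is invented, so check the install script for what it actually registers.

```python
def binfmt_rule(name, magic, interpreter, offset="", mask="", flags=""):
    """Build a magic-based ('M') binfmt_misc registration string."""
    fields = [name, "M", offset, magic, mask, interpreter, flags]
    # ':' is used as the field delimiter here, so no field may contain it.
    if any(":" in field for field in fields):
        raise ValueError("fields must not contain ':'")
    return ":" + ":".join(fields)

# Hypothetical rule: hand files starting with MZqFpD to /usr/bin/ape.
APE_RULE = binfmt_rule("APE", "MZqFpD", "/usr/bin/ape")
```

Writing the string to the register file requires root, which is one reason an install script is the more convenient route.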
Once that's done, we can see some noticeable improvements in the microbenchmarks above:
ForkApeClassic   l:   251,039c    81,084ns
VforkApeClassic  l:   106,924c    34,536ns
SystemApeClassic l:   617,189c   199,349ns
ForkApeNoMod     l:   220,373c    71,179ns
VforkApeNoMod    l:   105,982c    34,232ns
SystemApeNoMod   l:   643,594c   207,877ns  # woot
ForkNoMod3mb     l:   246,056c    79,475ns
VforkNoMod3mb    l:   116,121c    37,506ns
SystemNoMod3mb   l:   646,948c   208,961ns  # woot
As a project, APE has always had a love-hate relationship with binfmt_misc, since I personally think the Thompson Shell hack works well enough that we don't need it. binfmt_misc also has a history of breaking things more than it helps, which we'll go over presently.
It's possible to use APE programs (and even build the Cosmopolitan repo too!) on Microsoft's Windows Subsystem for Linux, which they call WSL. However, that might not work out of the box.
It turns out that Microsoft configures WSL by default to be able to run Windows portable executables. They do this in some kind of non-standard way using binfmt_misc. Why they do this, we can only guess, because it creates a completely unholy environment where a WIN32 hosted executable is running on a Linux file system, whose properties can't be accessed without resorting to Microsoft's cryptic NTDLL internals.
So if you're getting puzzling errors on WSL or WINE, and
the ape/apeinstall.sh shell script didn't work, then the
thing we recommend is just disabling binfmt_misc entirely.
sudo sh -c 'echo -1 >/proc/sys/fs/binfmt_misc/status'
Doing that makes WSL work like a charm! Aside from that one blemish, it really is an outstanding piece of software that Microsoft made, since they really made things go fast. WIN32 is notorious for going slow whenever we try to do anything in a UNIX-ey kind of way, due to all its virus scanning, content indexing, and layered service providers. But if you opt in to textmode bash, then the yoke of tyranny is lifted and you can compile code faster than ever. So great work on Microsoft's part.
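If you'd like to see what state binfmt_misc is in before or after running the command above, the Python sketch below reads it back. The function name `binfmt_misc_state` is made up. It relies on the kernel's documented layout: the `status` file reads `enabled` or `disabled`, and every other file in the directory besides `register` is a registered format.

```python
import os

def binfmt_misc_state(root="/proc/sys/fs/binfmt_misc"):
    """Return (status, entries), or (None, []) if binfmt_misc is unavailable."""
    try:
        with open(os.path.join(root, "status")) as f:
            status = f.read().strip()
        names = os.listdir(root)
    except OSError:
        # Not mounted, not Linux, or not readable by this user.
        return None, []
    entries = sorted(n for n in names if n not in ("status", "register"))
    return status, entries
```

An empty entry list after the `echo -1` command indicates that the registered handlers, including the WSL one, have been cleared.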
Funding for this blog post was crowdsourced from Justine Tunney's
GitHub sponsors
and Patreon subscribers. Your
support is what makes projects like Actually Portable Executable
possible. Thank you.