December 18th, 2021 @ justine's web page

LISP with GC in 436 bytes

SectorLISP now supports garbage collection. This is the first time that a high-level garbage collected programming language has been optimized to fit inside the 512-byte boot sector of a floppy disk. Since we only needed 436 bytes, that means LISP has now outdistanced FORTH and BASIC to be the tiniest programming language in the world.

[Binary Footprint Comparison]

SectorLISP consists of 223 lines of assembly. It provides a LISP system that's powerful enough to let you write your own LISP interpreter in just 40 lines of LISP. It's compatible with all PC models dating back to 1981 which have at least 64kb of RAM. This isn't a toy because SectorLISP can run the proof assistant that was included in LISP 1.5. We achieved the small file size thanks to 20/20 hindsight and an unbiased approach of maximum austerity. The goal of this project has been to have fun building a kit that optimizes for file size at the expense of everything else, which means SectorLISP has more in common with a game like Universal Paperclips than a talking paperclip like Clippy.

This is a follow-up to a previous announcement made in October that SectorLISP now fits in one sector. There's been many changes over the past few months that made it possible to shave away another hundred bytes from the i8086 assembly implementation. It left plenty of room to add a 40 byte garbage collector. This blog post will tell the story of how our low-level assembly listing evolved, using a plain and simple C / JavaScript / Thompson Shell polyglot.

Binaries   [Linux] [Windows] [DOS] [MacOS] [FreeBSD] [OpenBSD] [NetBSD]

sectorlisp.bin see also sectorlisp.S
436 bytes — BIOS only w/ i8086 architecture

sectorlisp-friendly.bin see also sectorlisp.S
509 bytes — BIOS only w/ i8086 architecture
2b71dffae9900f3aa280ad6eb3bb2752dffb589a8d9a5463515683d03e7021d8 see also lisp.js
20kb — Linux/Mac/Windows/FreeBSD/OpenBSD/NetBSD/BIOS
869abd2ebd9a31257b768398ac340967f888911dfa9152583fad79575ca11411 -rt sectorlisp.bin # runs emulator
319kb — Linux/Mac/Windows/FreeBSD/OpenBSD/NetBSD/BIOS

7.4kb — ELF debug symbols and DWARF data (optional)

7.6kb — ELF debug symbols and DWARF data (optional)
252kb — ELF debug symbols and DWARF data (optional)
4.4mb — ELF debug symbols and DWARF data (optional)

The .bin floppy disk boot sectors can be emulated using Blinkenlights or QEMU, or you can watch the video below.

curl --compressed >
curl >sectorlisp-friendly.bin
chmod +x
./ -rt sectorlisp-friendly.bin  # then press 'c'
qemu-system-i386 -nographic -fda sectorlisp-friendly.bin

The .com file is slightly larger but it runs on seven operating systems. It's the same as the JS simulator below.

curl >
chmod +x

Sources   [Linux] [MacOS] [FreeBSD] [OpenBSD] [NetBSD]

See the assembly listing listing section. You can build the simulator on your UNIX system if cc is installed:

curl >lisp.js
chmod +x lisp.js

Building from source means you get a better command line interface. The shell script above will curl a GNU Readline replacement we wrote called Bestline from this same domain ( to make sure you have the latest copy, because we've been slowly but steadily recreating support for Emacs' famous Paredit-style editing features.


You can use SectorLISP from the comfort of your browser. The C / JavaScript source code is intended to roughly model the behavior of SectorLISP on bare metal. Since maximum austerity leaves a bad impression for developers getting started, this simulator is from the friendly branch which uses an extended 509 byte implementation that'll be a much better friend for you, since it prints errors on undefined behavior and lets you DEFINE persistent bindings.

0 0
0 0
0 0
0 0

Original Hardware

Jim Leonard was able to confirm for us that SectorLISP does in fact run on the original hardware, or more specifically, the IBM PC model 5150. That was quite thrilling to learn, since SectorLISP was largely written a priori using a emulator on Linux that we created.

[YouTube Video of SectorLISP running on IBM PC 5150]

His video provides more background on what boot sectors are. He talks about similar projects that have been created in the past, such as Oscar Toledo's pioneering work. Then, towards the end of the video, you get to watch SectorLISP actually running on one of the finest computers ever sold, which has been preserved in perfect quality.

For example, you can hear the thunk of the power switch on the IBM PC. It's about as pleasing to hear as the thud sound an expensive German car like a Mercedes makes when you close the door. But the most pleasing of all is the vintage Model F mechanical keyboard, which to this day remains quite possibly the greatest and most respected mechanical keyboard of all time. No keyboard exists in the world that's superior in quality to those sold with the original IBM PC, except for the tenkeyless version of the Model F with APL legends on the keycaps, or better yet, the IBM 3278 Beam Spring keyboard for which the Model F was intended to be a more economical replacement.


Here's a demo of SectorLISP v2 booting from BIOS in Blinkenlights and running the metacircular evaluator, i.e. LISP written in LISP. If you compare this video to the one from the previous blog post you'll see how the new garbage collector has dramatically changed the personality of the software, in terms of how it uses physical memory. The WRITE memory panel in particular dances more, and shows how its heap allocations behave very similar to a stack. You can also use SectorLISP v1.o online in a PC browser emulator by visiting

If you're looking for something more modern, the multiplatform executable also runs on bare metal. Here's a Blinkenlights demo of it booting from BIOS in 16-bit real mode. It then bootstraps itself into 32-bit mode so it can load itself off disk into memory. The final stage of bootstrapping inverts the physical memory, in a similar manner to the Linux Kernel, which enables it to run in 64-bit long mode. Once it's reached the zenith of computing modernity, it displays a LISP REPL via the serial port.

Keep in mind that the LISP operating system above that's fast-forwarding its way through history, is actually just the JavaScript source code built with Cosmopolitan Libc. Although we tuned it in lisp.c for a slight performance nudge when compiled with GCC.

Memory Model

NULL +0 +2 +4 'N' 2 'I' 4 0 'L' -2 +2 is IL +4 is L +0 is NIL -0 is () +2 -2 -4 +4 -0 -2 is (L) or (cons 'L ()) -4 is (IL L) or (cons 'IL (cons 'L ()))

The most important trick to implementing LISP is to redefine NULL.

function Set(i, x) {
  M[Null + i] = x;

function Get(i) {
  return M[Null + i];

LISP has two types of memory: atoms and cons cells. We store them in two adjacent stacks that grow outward from NULL.

Positive addresses are used to intern strings. Interning is good for storing symbols (which LISP calls atoms) since it lets us test string equality by comparing unique addresses. First among atoms is NIL which the assembly encodes as a NUL-terminated string residing at the NULL address.

Negative addresses are used to store Pair<int> tuples, which LISP calls cons cells. These are used to chain atoms together into data structures such as lists and binary trees. First among cons cells is the empty list () which is stored at negative NULL and therefore equal to NIL. This multifacted nature of NIL serves as evidence of how LISP's design lends itself to symmetrical partitioning of memory, even if the underlying machine arithmetic doesn't support signed zero. Negative memory then grows down as CONS is called, which returns tuples indexed by CAR and CDR.

function Cons(car, cdr) {
  Set(--cx, cdr);
  Set(--cx, car);
  return cx;
function Car(x) {
  return Get(x);

function Cdr(x) {
  return Get(x + 1);

The assembly version does things the same way with the slight difference that cons cells grow up from INT_MIN rather than growing down from zero. Doing that made the code smaller, but it also lets us make the claim that it runs on the stock configuration of the first IBM PC, which shipped with 64kb of RAM. It's because rebasing NULL on the boot address 0x7c00 gives negative memory a legal range of -0x8000 to -0x7c00 since only 0x10000 bytes of linear memory exist. Working within the constraints of an old computer like i8086 that required us to confront unfamiliar concepts like negative memory is what helped the most elegant approach for C and JavaScript to become clear.

function ReadList() {
  var x = Read();
  if (x > 0 && Get(t) == Ord(')')) {
    return -0;
  } else {
    return Cons(x, ReadList(t));
function Print(x) {
  if (1./x < 0) {
  } else {

One advantage of JavaScript is that it uses a sign-magnitude encoding similar to the IBM 704 computer for which LISP was designed. That means we can tell () apart from NIL in our Read and Print code, even though Eval doesn't care about signed zeroes. The divide by zero hack can test for cons cells including () because 1/-0 = -Infinity and -Infinity < 0. C environments with integral two's complement arithmetic will simply normalize () to NIL and the floating point operations can be optimized away automatically using -ffast-math.


f = (lambda (X Y) Y) (car (cdr f)) = (X Y) (car (cdr (cdr f))) = Y

LISP evaluation works by recursively calling Apply to transform the first element of a list into a lambda. It zips the arguments with their parameters into the environment variables, and runs the code contained in the function.

Eval(code=(SECOND ARG1 ARG2),
     vars={SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) →
  Apply(code=((LAMBDA (X Y) Y) FOO BAR),
        vars={SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) →
         vars={X=FOO, Y=BAR, SECOND=(LAMBDA (X Y) Y), ARG1=FOO, ARG2=BAR}) →

John McCarthy discovered an elegant self-defining way to compute the above steps, more commonly known as the metacircular evaluator. Alan Kay once described this code as the "Maxwell's equations of software". Here are those equations as implemented by SectorLISP:

#define var int
#define function

function Evcon(c, a) {
  if (Eval(Car(Car(c)), a)) {
    return Eval(Car(Cdr(Car(c))), a);
  } else {
    return Evcon(Cdr(c), a);

function Evlis(m, a) {
  return m ? Cons(Eval(Car(m), a),
                  Evlis(Cdr(m), a)) : m;

function Assoc(x, y) {
  if (x == Car(Car(y))) return Cdr(Car(y));
  return Assoc(x, Cdr(y));

function Pairlis(x, y, a) {
  return x ? Cons(Cons(Car(x), Car(y)),
                  Pairlis(Cdr(x), Cdr(y), a)) : a;

function Eval(e, a) {
  var A = cx;
  if (!e) return e;
  if (e > 0) return Assoc(e, a);
  if (Car(e) == kQuote) return Car(Cdr(e));
  if (Car(e) == kCond) return Evcon(Cdr(e), a);
  return Gc(A, Apply(Car(e), Evlis(Cdr(e), a), a));

function Apply(f, x, a) {
  if (f < 0)      return Eval(Car(Cdr(Cdr(f))), Pairlis(Car(Cdr(f)), x, a));
  if (f == kEq)   return Car(x) == Car(Cdr(x));
  if (f == kCons) return Cons(Car(x), Car(Cdr(x)));
  if (f == kAtom) return Car(x) >= 0;
  if (f == kCar)  return Car(Car(x));
  if (f == kCdr)  return Cdr(Car(x));
  return Apply(Assoc(f, a), x, a);

The code above is from lisp.js. What makes our Rosetta Stone possible is that C was designed to use int as an implicit default type, and compilers maintained backwards compatibility ever since. Traditional C may as well be Sanskrit since it can't be a coincidence that the languages with the most gravitas seem to share this as their common subset. It's unfortunate the C standards committee intends to remove support for K&R syntax, because it implements LISP so well. Let's compare the code above to John McCarthy's original 1950's paper which used M-expression notation:

eval[e; a] = [
  atom[e] → assoc[e; a];
  atom[car[e]] → [
    eq[car[e]; QUOTE] → cadr[e];
    eq[car[e]; ATOM]  → atom[eval[cadr[e]; a]];
    eq[car[e]; EQ]    → [eval[cadr[e]; a] = eval[caddr[e]; a]];
    eq[car[e]; COND]  → evcon[cdr[e]; a];
    eq[car[e]; CAR]   → car[eval[cadr[e]; a]];
    eq[car[e]; CDR]   → cdr[eval[cadr[e]; a]];
    eq[car[e]; CONS]  → cons[eval[cadr[e]; a]; eval[caddr[e]; a]];
    T                 → eval[cons[assoc[car[e]; a]; evlis[cdr[e]; a]]; a]
  eq[caar[e]; LAMBDA] →
    eval[caddar[e]; append[pair[cadar[e]; evlis[cdr[e]; a]; a]]]

It's a good thing that C wasn't designed until the 1970's since otherwise JMC might never have discovered LISP. Or maybe the semicolons, equals sign, and brackets that behave like switch, PROGN, and COND suggest the existence of a lost language that both Ritchie and JMC were fortunate enough to use. It's unlikely, but it'd explain why he felt so unhappy working with IBM on FORTRAN since it would be like asking a Rust developer to fix Visual Basic 6. IBM likely knew he was uniquely qualified to help them, but they didn't accept any of his proposals, rejecting his asks for features like recursion as unnecessary. LISP happened as a result. It was a big opportunity loss for both the IBM PC and JMC himself, since he could have been Bill Gates if the two had found a way to work together.

Garbage Collection

SectorLISP uses what we call an ABC garbage collector and it took only 40 bytes of assembly. It works by saving the position of the cons stack before and after evaluation. Those values are called A and B. It then decreases the cx cons stack pointer further by recursively copying the Eval result. The new stack position is called C. The memory between B and C is then copied up to A. Once that happens, the new cons stack position becomes A - B + C. The purpose of this operation is to discard all the cons cells that got created which aren't part of the result, because we know for certain they can't be accessed anymore (assuming functions aren't added which mutate cells).

function Copy(x, m, k) {
  return x < m ? Cons(Copy(Car(x), m, k),
                      Copy(Cdr(x), m, k)) + k : x;

function Gc(A, x) {
  var C, B = cx;
  x = Copy(x, A, A - B), C = cx;
  while (C < B) Set(--A, Get(--B));
  return cx = A, x;
Copy:	cmp	%dx,%di
	jb	1f
	push	(%bx,%di)
	mov	(%di),%di
	call	Copy
	pop	%di
	push	%ax
	call	Copy
	pop	%di
	call	Cons
	sub	%si,%ax
	add	%dx,%ax
1:	xchg	%di,%ax

Fast immediate garbage collection with zero memory overhead and perfect heap defragmentation is as easy as ABC when your language guarantees data structures are acyclic. The trick is to not copy anything above A, since that memory space consists of cons cells owned by calling functions as well as interned atoms, which should be ignored. That way overlaps aren't possible. Thus a GC cycle can't increase memory usage.


In order to bootstrap LISP one must first bootstrap its builtin atoms. In C you might do that using a for loop which copies chars from a string to the memory array. However the assembly implementation doesn't need to do anything.

ΦA☺ûΦC ░♪Φr δΩë╧ê╨< v☻¬ûΦ_ < v±<)v♣Ç·)wΦê=û├░(λ0ï4Φ↕ ░ ^à÷x≥t♣░∙
Φ♦ ░)δ7Φ4 à÷x▐¼ä└u⌠├<(tPQë²)═E1λ^VëΘë°8=t♀≤ªt◙O1└«u²δΩ≤ñY├1└═▬┤♫
═►<♪u♦░◙δ⌠Æ├àλt▬λ1ï♣Φ¡ _PΦ≡λ_ç∙ë♪ë☺ìM♦ù├Φcλ<)t_ΦóλPΦ≥λδΣ9╫rΩλ1ï=
Φ⌡λ_PΦ≡λ_Φ╤λ)≡☺╨├VΦo ^à└y↔ùï9Wï=àλt╲¡λ1ï=ï4Φ»λùÆΦ¬λÆ_δΘ=) w╒ï<<∟
t+< t&<↨u•àλy♫1└├<$ï0¡tà1°u≥░♦├ë╓ï<ï0»u∙÷ï9<»ï♣├ï9Wï5¡Φ♂ _à└t≥λ5
_Φσλà└t+y╒û¡=♀ ï<t┌=↕ t┌RQPΦ.λûXΦsλZë╬ùΦNλë╫)±≤ñë∙Z├╬╬╬╬╬╬╬╬╬╬╬╬
╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬╬ SECTORLISP v2 U¬

x86 processors have a feature that lets us literally redefine NULL to be the address to which our program is loaded by the BIOS. It's a privilege that's normally reserved only for operating systems. The trick we use here is to treat the program image itself as a set of entries in the interned strings table.

One problem that arises is that the BIOS asks the CPU to execute the ASCII strings at the beginning of the image. By sheer luck we learned NIL and T could be safely decoded without side-effects. Had that not been the case, we'd've needed to choose a different terminator than NUL which would have made the program file larger.

"N"	dec	%si
"I"	dec	%cx
"L"	dec	%sp
"\0T\0" add	%dl,(%si)  # and we know for certain %dl is 0
	ljmp	$0x7c00>>4,$begin

One thing you may be wondering about the above sector, is what's the point of making it smaller than 512 bytes if you have to pad it to 512 bytes anyway to include the U¬ (AA55h) boot signature? The answer is that we simply learned more about LISP by doing it. But it's also because that signature wasn't part of the original PC design. It was actually added by Microsoft to later models, similar to the more recently introduced requirements that the Linux Kernel be distributed as a Windows executable. The IBM PC XT will happily load and run the 436 byte version and it's a nice design.

Overlapping Functions

If high-level programming languages like C are the Ice Hotel and assembly is the tip of the iceberg, then the hidden dimension of complexity lurking beneath would be Intel's variable length encoding. This is where boot sectors get esoteric real fast, since tools can't easily visualize it. for example, consider the following:

/	%ip is 0
	mov	$0xf4,%al
/	%ip is 1
	.byte	0xb0
wut:	hlt # and catch fire

Similar to how a Chess game may unfold very differently if a piece is moved to an unintended adjacent square, an x86 program can take on an entirely different meaning if the instruction pointer becomes off by one. We were able to use this to our advantage, since that lets us code functions in such a way that they overlap with one another.

/	SectorLISP code.
89 D6 Assoc:	mov	%dx,%si
8B 3C 1:	mov	(%si),%di
8B 30 	mov	(%bx,%si),%si
AF    	scasw
75 F9 	jne	1b
F6    	.byte	0xF6
8B 39 Cadr:	mov	(%bx,%di),%di
3C    	.byte	0x3C
AF    Cdr:	scasw
8B 05 Car:	mov	(%di),%ax
C3    	ret
89 D6          Assoc:	mov	%dx,%si
8B 3C          1:	mov	(%si),%di
8B 30          	mov	(%bx,%si),%si
AF             	scasw
75 F9          	jne	1b
F6 8B 39 3C AF 	testw	$0xaf,0x3c39(%bp,%di)
8B 05          	mov	(%di),%ax
C3             	ret

8B 39          Cadr:	mov	(%bx,%di),%di
3C AF          	cmp	$0xaf,%al
8B 05          	mov	(%di),%ax
C3             	ret

AF             Cdr:	scasw
8B 05          	mov	(%di),%ax
C3             	ret

8B 05          Car:	mov	(%di),%ax
C3             	ret
Note that you can hover over the instruction names above to see a tooltip explaining what they do. For further details, please see the Blinkenlights' i8086 ISA encoding rundown and Intel's manual.


It takes about 10 milliseconds on a 4.77mhz IBM PC to run to run a LISP program wrapped inside John McCarthy's metacircular evaluator. However an obvious bottleneck exists with the interning algorithm which takes upwards of 50 milliseconds during the Read operation. Even on a forty year old computer, fifty milliseconds of latency is quite sluggish in terms of performance; so let's fix that.

Tinier solutions are usually better ones, but that's not always the case. The assembly version of our Intern function has an obvious scalability bottleneck since it saves space by using a double-nul-terminated list. The simplest way to improve that is to redefine positive memory to be a hash table containing linked lists of characters.

function Probe(h, p) {
  return (h + p * p) & (Null / 2 - 1);

function Hash(h, x) {
  return (((h + x) * 3083 + 3191) >> 4) & (Null / 2 - 1);

function Intern(x, y, h, p) {
  if (x == Get(h) && y == Get(h + Null / 2)) return h;
  if (Get(h)) return Intern(x, y, Probe(h, p), p + 1);
  Set(h, x);
  Set(h + Null/2, y);
  return h;

function ReadAtom() {
  var x, y;
  ax = y = 0;
  do x = ReadChar();
  while (x <= Ord(' '));
  if (x > Ord(')') && dx > Ord(')')) y = ReadAtom();
  return Intern(x, y, (ax = Hash(x, ax)), 1);
/	this is slow
Intern:	push	%cx
	mov	%di,%bp
	sub	%cx,%bp
	inc	%bp
	xor	%di,%di
1:	pop	%si
	push	%si
	mov	%bp,%cx
	mov	%di,%ax
	cmp	%bh,(%di)
	je	8f
	rep cmpsb
	je	9f
	xor	%ax,%ax
	dec	%di
2:	scasb
	jne	2b
	jmp	1b
8:	rep movsb
9:	pop	%cx

The trick here is to find a hash function so that NIL interns at the NULL position and T interns at 1. That's what lets our Apply code be more elegant. In the above example, the magic numbers (3083, 3191, 4) were chosen under the assumption that Null is 040000. If it were a different two-power like 0400000, then the magnums would be (60611, 20485, 0). If you wish to dive deeper then take a look at lisp.js, hash.c, and

                (CONS (CAR X) Y)))
          ((QUOTE T) Y))))

One of the known issues with recursive functions is that they have suboptimal performance without the aid of a tail call optimizer that can do copy elision. This can have a negative impact on the performance of the ABC Garbage Collector in list building functions such as the one above. Blinkenlights is good at spotting scalability problems early.

The canonical workaround to this kind of problem is to use binary trees instead of lists for big data, since binary trees will reduce the the call stack depth from being linear to logarithmic.

    (COND ((ATOM X) X)
          ((QUOTE T)
           (CONS (REVERSE (CDR X))
                 (REVERSE (CAR X)))))))
  (((((A . B) . C) .
     ((D . E) . F)) .
    (((G . H) . I) .
     ((J . K) .
      (L . M)))) .
   ((((N . O) . P) .
     ((Q . R) . S)) .
    (((T . U) . V) .
     ((W . X) .
      (Y . Z)))))))

One thing profiling reveals is that LISP spends most of its time inside Assoc looking up variables, because recursive functions repeatedly append the same variables to the dynamically-scoped a list. One low-hanging fruit optimization that's much easier than implementing a tail-call optimizer is simply tuning Pairlis to peel away repeated variables.

function Peel(x, a) {
  return a && x == Car(Car(a)) ? Cdr(a) : a;

function Pairlis(x, y, a) {
  return x ? Cons(Cons(Car(x), Car(y)),
                  Pairlis(Cdr(x), Cdr(y),
                          Peel(Car(x), a))) : a;

The above optimization allows SectorLISP to outperform Emacs, based on a benchmark of the "Triple LISP" example available under the Simulator Load button. See eval4.lisp for the Emacs Lisp translation. Emacs takes 2,187µs to run that, whereas SectorLISP takes 1580µs. This hack also helped reduce JavaScript latency from ~30ms to ~15ms.

Logic Model

Now summon cunning soul, frauds, and deceits
With the whole Ulysses; for truth never perishes

— Lucius Annaeus Seneca
The Trojan Women, 613


The meaning of truth is underspecified by John McCarthy's original paper; however, if you read the LISP 1.5 source code, one of the first things you'll see is this charming comment, which attempts to provide a definition:

                              * PROP LISTS FOR ATOMS NIL & VERITAS-NUMQUAM-PERIT
                       77640         ORG     COMMON-18
      77640  0 00137 0 07335  NILSXX         $PNAME,,-*-1
      77641  0 00000 0 00136                 -*-1
      77642 -0 00000 0 00135         MZE     -*-1
      77643 -053143777777            OCT     453143777777      NIL
      77644  0 00000 0 00370  NILLOC         $ZERO
      77645  0 00132 0 10742   STS           $APVAL,,-*-1
      77646 -0 00130 0 00131         MZE     -*-1,,-*-2
      77647  0 00000 0 00001                 1                 IS A CONSTANT ,,1 FOR APPLY
      77650  0 00127 0 07335                 $PNAME,,-*-1
      77651  0 00000 0 00126                 -*-1
      77652 -0 00000 0 00125         MZE     -*-1
      77653  546351642554            BCI     1,*TRUE*

LISP 1.5 is a real trickster compared to modern LISP implementations because it clandestinely maps T to *TRUE* the latter of which is the true truth, since it's what builtin predicates like ATOM and EQ actually return. It figures they would quote someone who was tormenting Hector's widow since that design probably tormented many MIT students. LISP 1.5 also jealously guards its definition of truth, and will throw a constant error if you try to change it.

A Latin phrase that better describes SectorLISP is NIHIL NVMQVAM PERIT or in English what is dead may never die. What it means is that NIL is the only atom SectorLISP won't let you redefine. You can however define T to have any meaning you wish, or use it as a variable name, since it's just an atom, and anything that isn't NIL is considered true by EVCON. From a programming perspective, this means you'd need to use something like (QUOTE T) rather than T in the default clauses of your COND statements, unless you map T to T.

We chose this model simply because it's what made the file size tinier. So it actually can be a good idea to have a less permissive design which prevents T from perishing, because making that a builtin feature of Eval alone can improve the performance of the arithmetic code below by ten percent.


SectorLISP doesn't support numbers; but that's OK, since Arabic numerals are after all just a sequence of digits, and digits are symbols. Since SectorLISP does support sequences and symbols, you can use LISP's preschool math to implement elementary maths like arithmetic. All you need is a few lines of code. The simplest way we could model unsigned integers is by using lists and then switching the Arabic right-to-left ordering to be left-to-right instead (i.e. little endian). We'll then define NIL as 0, and anything that isn't NIL shall be 1. For example, a number such as ten (or 0b1010 in binary) could then be encoded as (NIL T NIL T). Now that we've defined how our LISP numbers look, we can implement recursive functions that operate on them.

(DEFINE OR   . (LAMBDA (X Y) (COND (X T) (T Y))))
(DEFINE TAIL . (LAMBDA (X) (COND (X (CDR X)) (T ()))))

    (ADD A B NIL)))

    (COND ((OR A B)
           (CONS (XOR (XOR (HEAD A) (HEAD B)) C)
                 (ADD (TAIL A) (TAIL B)
                      (OR (AND (XOR (HEAD A) (HEAD B)) C)
                          (AND (HEAD A) (HEAD B))))))
          (C (CONS C ()))
          (T ()))))
    (COND ((ATOM X) (EQ X Y))
          ((ATOM Y) (EQ X Y))
          ((EQUAL (CAR X) (CAR Y))
           (EQUAL (CDR X) (CDR Y)))
          ((QUOTE T) NIL))))

(DEFINE COMMENT 0 + 0 = 0)
(EQUAL (+ ()

(DEFINE COMMENT 1 + 1 = 2)
          (QUOTE (T)))
       (QUOTE (NIL T)))

(DEFINE COMMENT 1 + 2 = 3)
          (QUOTE (NIL T)))
       (QUOTE (T T)))

(DEFINE COMMENT 2 + 2 = 4)
          (QUOTE (NIL T)))
       (QUOTE (NIL NIL T)))

(DEFINE COMMENT 10 + 10 = 20)
          (QUOTE (NIL T NIL T)))
       (QUOTE (NIL NIL T NIL T)))

[Full Adder Circuit Schematic]

The above code iterates over two lists of bits and then applies the full adder schematic. The nice thing about this code is it supports arbitrary precision. The Googlepedia solution to this kind of problem is to use Church numerals; but since Church encoding requires an amount of memory equal to the numbers themselves, that means we wouldn't be able to have numbers larger than 8192 on the IBM PC XT. That might be fine for proving theorems, but how would you like to own a 13-bit computer? SectorLISP instead unleashes your oldskool 16-bit CPU, enabling it to perform 64-bit and even 128-bit computations.

Since microoptimizations are very highly prized in arithmetic, one way we might do that with SectorLISP by using builtins more and taking advantage of friendly branch features, such as dot cons literals which let us remove a CAR call. The friendly branch also defines CAR and CDR the modern way which means we don't need to define HEAD and TAIL plus it obfuscates the need for a wrapper. If we combine those techniques with a builtin tautology atom then this new definition should triple addition performance with a nice raw quality that self-documents the truth table.

    (COND ((COND (A T)
                 (T B))
           ((LAMBDA (S)
              (CONS (CAR S)
                    (+ (CDR A)
                       (CDR B)
                       (CDR S))))
            (COND ((CAR A) (COND ((CAR B) (COND (C (QUOTE (T   . T  )))
                                                (T (QUOTE (NIL . T  )))))
                                       (T (COND (C (QUOTE (NIL . T  )))
                                                (T (QUOTE (T   . NIL)))))))
                        (T (COND ((CAR B) (COND (C (QUOTE (NIL . T  )))
                                                (T (QUOTE (T   . NIL)))))
                                       (T (COND (C (QUOTE (T   . NIL)))
                                                (T (QUOTE (NIL . NIL))))))))))
          (C (CONS C ()))
          (T ()))))

Another interesting thing about our arithmetic model is how the loose typing of truthiness means that S-expressions in general could be seen as having numeric values, almost like Hebrew numerals. We haven't imagined a use case for it yet, but let us know if you do, since that could be very cool.


Arithmetic in SectorLISP is very elegant, but unfortunately the numbers themselves aren't that readable unless you write a radix converter and reverse the lists. So calculus and algebra are usually better use cases for LISP since their inputs and outputs are already symbolic sequences and LISP is a symbolic language. Here's an example of how you can use SectorLISP to perform symbolic differentiation with support for the most common mathematical operators.

    (COND ((EQ E WRT) 1)
          ((ATOM E) 0)
          ((QUOTE T)
           (DIFF3 (CAR E) (CAR (CDR E))
            (COND ((CDR (CDR E))
                   (CAR (CDR (CDR E))))
                  ((QUOTE T) NIL)))))))

     (COND ((EQ OP ADD)
            (L3 OP (DIFF X WRT) (DIFF Y WRT)))
           ((EQ OP SUB)
            (L3 OP (DIFF X WRT) (DIFF Y WRT)))
           ((EQ OP MUL)
            (L3 ADD (L3 MUL X (DIFF Y WRT))
                    (L3 MUL (DIFF X WRT) Y)))
           ((EQ OP DIV)
            (L3 SUB (L3 DIV (DIFF X WRT) Y)
                    (L3 DIV (L3 MUL X (DIFF Y WRT))
                            (L3 POW Y (QUOTE 2)))))
           ((EQ OP POW)
            (L3 ADD (L3 MUL (L3 POW X Y)
                            (L3 MUL (L2 LOG X)
                                    (DIFF Y WRT)))
                    (L3 MUL Y
                      (L3 MUL (L3 POW X
                                 (L3 SUB Y 1))
                          (DIFF X WRT)))))
           ((EQ OP LOG)
            (L3 DIV (DIFF X WRT) X))
           ((EQ OP (QUOTE SIN))
            (L3 MUL (DIFF X WRT)
                    (L2 (QUOTE COS) X)))
           ((EQ OP (QUOTE COS))
            (L3 SUB 0 (L3 MUL (DIFF X WRT)
                              (L2 (QUOTE SIN) X))))
           ((EQ OP (QUOTE ABS))
            (L3 DIV (L3 MUL X (DIFF X WRT))
                    (L2 (QUOTE ABS) X)))
           ((QUOTE T) :HOW))))
(DEFINE 0 . 0)     (DEFINE 1 . 1)
(DEFINE L3 . (LAMBDA (X Y Z) (CONS X (L2 Y Z))))

    (COND ((ATOM X) (EQ X Y))
          ((ATOM Y) (EQ X Y))
          ((EQUAL (CAR X) (CAR Y))
           (EQUAL (CDR X) (CDR Y)))
          ((QUOTE T) NIL))))


       (DIFF (QUOTE (ADD X X)) (QUOTE X)))

       (DIFF (QUOTE (ADD X Y)) (QUOTE X)))

       (DIFF (QUOTE (MUL X Y)) (QUOTE X)))

       (DIFF (QUOTE (SIN X)) (QUOTE X)))

        (ADD (MUL (POW X Y) (MUL (LOG X) 0))
             (MUL Y (MUL (POW X (SUB Y 1)) 1))))
       (DIFF (QUOTE (POW X Y)) (QUOTE X)))

Once again we see LISP's ability to not only compute, but elegantly explain complex topics too, such as derivatives. LISP obviously knows nothing about the meaning of the symbols you choose, such as COS or 0 even though it's able to manipulate them for you at a high level. For that reason, some of the expressions it outputs, e.g. (ADD 1 0) can obviously be simplified. LISP is good at doing that too. As a language, LISP is popular for implementing computer algebra systems as well as compilers, since simplified math is usually faster math. For example, GCC actually uses a dialect of LISP internally for exactly this purpose. So even if your experience has been focused on writing C and C++ in Vim, you may already be a LISP user enjoying the benefits, from a certain point of view.


The strangest thing that became apparent in our quest to demo a language that's based on the lambda calculus is that the lambda keyword itself is superfluous. That's not to imply LISP doesn't have lambdas, but rather that it's such a powerful concept that it needn't be named. Apply is able to check for lambdas using f<0 and then ignores Car(f). We could have saved 2 bytes by dropping the LAMBDA keyword (e.g. (((X) X) (QUOTE ARG)) instead of ((LAMBDA (X) X) (QUOTE ARG))) but we left things alone, since it's an obvious focal point for language revision schemes. It also makes SectorLISP itself more flexible in terms of syntax. For example, you could say ((FUNCTION (X) X) (QUOTE ARG)) or ((LOL (X) X) (QUOTE ARG)) and it would mean the same thing. In the friendly branch, you can also say (DEFINE IDENTITY AS (X) X) rather than (DEFINE IDENTITY . (LAMBDA (X) X)).

The same concept also applies to COND which can be removed if we treat the LAMBDA body as an implicit COND. But it'd only save us 11 bytes. That's too big a tradeoff for the glory of 425 bytes, which can be viewed in the reform branch. Reforming the LISP language has never been our goal, because it should be your goal. For example, here's how LISP could be defined using only six builtins, which we'll call LINK, HEAD, TAIL, EQ, ATOM, and Q.

  (EVAL (Q (FF X))
        (Q ((FF ((X)
             ((ATOM X) X)
             (FF (HEAD X))))
            (X ((A) B C))))))
  ((NULL ((X)
    (EQ X ())))
   (LIST ((X Y)
    (LINK X (LINK Y ()))))
   (ASSOC ((X Y)
    ((NULL Y) (LIST (Q ?) X))
    ((EQ X (HEAD (HEAD Y)))
     (HEAD (TAIL (HEAD Y))))
    (ASSOC X (TAIL Y))))
   (EVLAM ((C A)
    ((NULL (TAIL C))
     (EVAL (HEAD C) A))
    ((EVAL (HEAD (HEAD C)) A)
     (EVLAM (TAIL (HEAD C)) A))
    (EVLAM (TAIL C) A)))
   (ZIP ((X Y A)
    ((NULL X) A)
          (ZIP (TAIL X) (TAIL Y) A))))
   (EVLIS ((M A)
    ((NULL M) M)
          (EVLIS (TAIL M) A))))
   (APPLY ((FN X A)
    ((ATOM FN)
     ((EQ FN (Q HEAD)) (HEAD (HEAD X)))
     ((EQ FN (Q TAIL)) (TAIL (HEAD X)))
     ((EQ FN (Q ATOM)) (ATOM (HEAD X)))
     ((EQ FN (Q LINK))
      (LINK (HEAD X) (HEAD (TAIL X))))
     ((EQ FN (Q EQ))
      (EQ (HEAD X) (HEAD (TAIL X))))
     (APPLY (EVAL FN A) X A))
   (EVAL ((E A)
    ((NULL E) E)
    ((ATOM E) (ASSOC E A))
    ((ATOM (HEAD E))
     ((EQ (HEAD E) (Q Q)) (HEAD (TAIL E)))
    (APPLY (HEAD E) (EVLIS (TAIL E) A) A))))))
    (COND ((EQ X (CAR (CAR Y)))
           (CAR (CDR (CAR Y))))
          ((QUOTE T)
           (ASSOC X (CDR Y))))))

                   (EVLIS (CDR M) A)))
          ((QUOTE T) ()))))

    (COND ((EQ (CDR C) ())
           (EVAL (CAR C) A))
          ((EVAL (CAR (CAR C)) A)
           (EVLAM (CDR (CAR C)) A))
          ((QUOTE T)
           (EVLAM (CDR C) A)))))

    (COND (X (CONS (CONS (CAR X) (CONS (CAR Y) ()))
                   (ZIP (CDR X) (CDR Y) A)))
          ((QUOTE T) A))))

      ((ATOM FN)
       (COND ((EQ FN (QUOTE HEAD))
              (COND ((CAR X) (CAR (CAR X))) (T ())))
             ((EQ FN (QUOTE TAIL))
              (COND ((CAR X) (CDR (CAR X))) (T ())))
             ((EQ FN (QUOTE ATOM))
              (ATOM (CAR X)))
             ((EQ FN (QUOTE LINK))
              (CONS (CAR X) (CAR (CDR X))))
             ((EQ FN (QUOTE EQ))
              (EQ (CAR X) (CAR (CDR X))))
             ((QUOTE T)
              (APPLY (EVAL FN A) X A))))
      ((QUOTE T)
       (EVLAM (CDR FN) (ZIP (CAR FN) X A))))))

    (COND ((EQ E ()) ())
          ((ATOM E) (ASSOC E A))
          ((ATOM (CAR E))
           (COND ((EQ (CAR E) (QUOTE Q)) (CAR (CDR E)))
                 ((QUOTE T)
                  (APPLY (CAR E) (EVLIS (CDR E) A) A))))
          ((QUOTE T)
           (APPLY (CAR E) (EVLIS (CDR E) A) A)))))

The code above is a LISP within a LISP within a LISP: three levels. You can use this technique to implement missing features like macros. It also becomes clear that LISP is a king of kings among programming languages, since if a language can succinctly implement itself, then it can also let you easily build any other domain-specific language too (so long as they also use parenthesis notation). One such DSL we could build is for a Turing machine, which in this case has been configured to compute popcount(r) % 2 == 1.

(DEFINE RIGHT-HAND-TAPE . (1 1 0 B))  ;; we're computing parity of 0b011
(TURING (QUOTE (0 B (0 0 B R 0)   ;; if (state==0 && input==0) print(B), move(Right), state=0
                    (0 1 B R 1)   ;; if (state==0 && input==1) print(B), move(Right), state=1
                    (0 B 0 R 2)   ;; if (state==0 && input==B) print(0), move(Right), state=2
                    (1 0 B R 1)   ;; if (state==1 && input==0) print(B), move(Right), state=1
                    (1 1 B R 0)   ;; if (state==1 && input==1) print(B), move(Right), state=0
                    (1 B 1 R 2))) ;; if (state==1 && input==B) print(1), move(Right), state=2

You can view the full Turing machine example by clicking the simulator's Load button above. For further details on how the DSL is defined, please see AI Memo No. 8.


While knowledge of the technique existed, no one's published software before that polyglots C and JavaScript, so it's worth going into further detail on how that works. Language polyglots are similar in spirit to how a lady or gentleman might choose a communications style that focuses on a subset of rhetoric all groups feel comfortable hearing. The code examples in this blog post operate by that same principle, since they're designed to run in the most popular programming environments. It's the same technique that was used for Actually Portable Executable which lets us build binaries that run on seven operating systems.

There's nothing extraordinary about having a coding style that focuses on what different programmers share in common. It's the initial process of determining what that software consensus is that's difficult, since it requires one to not only understand of the requirements of modern systems, but also undergo an archaeological voyage reading tomes of legacy code like a librarian in order to understand their phylogenesis. The results end up being very easy:

// plain simple executable code that
// helps c / js communities use lisp
function Apply(f, x, a) {
  if (f < 0)      return Eval(Car(Cdr(Cdr(f))), Pairlis(Car(Cdr(f)), x, a));
  if (f == kEq)   return Car(x) == Car(Cdr(x));
  if (f == kCons) return Cons(Car(x), Car(Cdr(x)));
  if (f == kAtom) return Car(x) >= 0;
  if (f == kCar)  return Car(Car(x));
  if (f == kCdr)  return Cdr(Car(x));
  return Apply(Assoc(f, a), x, a);

But not every line of code can be made a good citizen of multiple languages. We need workarounds for code that's platform specific. Ulysses (the person the LISP 1.5 source code quoted earlier) is most famous for his Trojan Horse trick that let Greece defeat Troy. His guile inspired another recent paper called Trojan Source. The trick SectorLISP uses which is similar in spirit (but not intent) is the PARAGRAPH SEPARATOR (u2029) which ECMA-262 2021 §12.3 defines as a line terminator, whereas ANSI C and nearly everything else does not.

javascript syntax highlighting//¶`
... C only code goes here ...

It works by sneaking multiline JavaScript strings into C comments, to serve as a cross-language #ifdef statement. That lets us ask JavaScript to ignore small portions of code that are only meant for C compilers. If we wish to ask C to ignore code that's intended only for JavaScript, then the technique can be used as follows:

c syntax highlighting//¶`
#if 0
... JavaScript only code goes here ...

It should be a requirement for C projects that call themselves portable. Codebases like Zlib / InfoZIP use a notorious number of #ifdef statements to bring the benefits of support for the DEFLATE algorithm to platforms like VAX, QDOS, Amiga, IBM System/360, and possibly even those IBM Series/1 minicomputers. If we go to great lengths using ifdefs to maintain rare platforms, then why shouldn't we use polyglot ifdefs to support the most popular platform too? There's wisdom in a LISP that's fits on floppy disks if we consider that a U.S. nuclear weapons agency got hacked immediately after they stopped using them6. They should have considered LISP instead since it has all the qualities of the old technologies while continuing to be ahead of its time, even after all these years.

Please note the PARAGRAPH SEPARATOR (u2029) is normally invisible so it's been typeset above as PILCROW SIGN (u00b6). Since creative uses of technology have a tendency to reveal bugs, it should be noted that, thanks to the limitless configurability of LISP, you'll always have a safer space for programming since you can tune your Emacs editor as follows:

(or standard-display-table
    (setq standard-display-table
(aset standard-display-table #x2028 [?↵]) ;; LINE SEPARATOR
(aset standard-display-table #x2029 [?¶]) ;; PARAGRAPH SEPARATOR
(aset standard-display-table #x202A [?⟫]) ;; LEFT-TO-RIGHT EMBEDDING
(aset standard-display-table #x202B [?⟪]) ;; RIGHT-TO-LEFT EMBEDDING
(aset standard-display-table #x202D [?❯]) ;; LEFT-TO-RIGHT OVERRIDE
(aset standard-display-table #x202E [?❮]) ;; RIGHT-TO-LEFT OVERRIDE
(aset standard-display-table #x2066 [?⟩]) ;; LEFT-TO-RIGHT ISOLATE
(aset standard-display-table #x2067 [?⟨]) ;; RIGHT-TO-LEFT ISOLATE
(aset standard-display-table #x2068 [?⧽]) ;; FIRST STRONG ISOLATE
(aset standard-display-table #x202C [?⇮]) ;; POP DIRECTIONAL FORMATTING
(aset standard-display-table #x2069 [?⇯]) ;; POP DIRECTIONAL ISOLATE


SectorLISP is non-revisionist because if we'd gone down the path of changing the definition of LISP, then it'd've taken us to a place where lists become tape and then you've got an ASCII Turing machine like Brainfuck, which can be implemented with only 99 bytes. It may be capable of computing everything that's computable, but can we really call gibberish a programming language?

Hello World in Brainf*#k
>>.        >---.   +++++++..
+++.       >>.     <-.
<.         +++.    ------.
--------.  >>+.    >++.

If that were the goal, we'd be better served by simply not having an abstraction layer at all. For example, it's possible in 23 bytes to have a universal language on x86 simply by exposing the x86 machine language. If you compile the program below and load it into Blinkenlights then you can copy and paste the gibberish string for Hello World and it'll print "hello world". If you can chord then you can type the binary into your IBM PC with the Model F keyboard too.

/	twenty three byte loader
_start:	ljmp	$0x600>>4,$_begin
_begin:	push	%cs
	pop	%ds
	push	%cs
	pop	%es
0:	call	1f
1:	push	%di
2:	xor	%ax,%ax
	int	$0x16
	cmp	$206,%al
	jne	2b
/	example program with loader
/	put string in blinkenlights
/	j♪^┤♫¼═►<◙u∙├hello world♪◙╬
	push	$13
	pop	%si
	mov	$0x0e,%ah
0:	lodsb
	int	$0x10
	cmp	$10,%al
	jnz	0b
	.ascii	"hello world\r\n"

Languages like Brainfuck were known in JMC's time, but it was the human quality of his research that made LISP so attractive. Universal languages are common enough that one of the biggest challenges in securing protocols and file formats is demonstrating that they're not accidentally a Turing machine. For example, the UNICODE stack machine characters above flirt dangerously close with Turing's thesis. Even Conway's Game of Life was proven to be Turing complete. On the other hand, LISP is a tool that people want to use. It's always commanded respect and some companies have built veritable empires using it. That's why it's such a great find that, thanks to the simplicity and generality of its original formulas, we were able to implement LISP in nearly the same number of bytes as Brainfuck.

[Curve of tiniest Turing machines based on x states and y symbols]

It might be possible to create something even smaller than Brainfuck. Computer science provides some clues as to what the tiniest esoteric boot sector language might be. For example, Marvin Minsky discovered a 7-state 4-symbol universal Turing machine in 1962. The chart above is from Turlough Neary's PhD thesis which shows a tradeoff between minimizing symbols and states. If we could find some way to get the rules for one of these tiny Turing machines to neatly map onto a bitwise arithmetic hack, similar to the (B11 | r0) & r1 & ~r2 trick that Browne and Tavener discovered for Conway's Game of Life, like a NAND gate, then we might create an esoteric boot sector language so tiny in scale and austere in its definition that its parsimoniousness would never be challenged.

Why It Matters

Simplicity is the sign of truth.

— Herman Boerhaave


Tiny code contests have historically focused on the source code. What makes that problematic is how it incentivizes obfuscation. On the other hand, optimizing for binary size usually results in the source code becoming more elegant. In order to make program files tinier, one must choose data structures and design patterns that are harmonious with the way computers work. Doing that requires us to better understand the nature of the problem.

Is it better to help maintain the largest piece of software in the world when you can create the tiniest one instead? There's more combinations in a 512-byte boot sector than there are positions on a Go board. The fact that people are still publicly developing new boot sector programs for a forty year old computer should speak for itself. Demoscene is a relative newcomer compared to Go and Chess, but it could potentially be a more enjoyable competitive mind sport since Kolmogorov complexity is AI-hard.

While code golfing is fun, the goal of any technology is to help us better understand nature and to improve upon the human condition. The way tinier software conventionally does this is by letting us have more instances of it at scale. The same is true of nature where the tiniest creatures are the most predominant. They're called phages and there's 10,000,000,000,000,000,000,000,000,000,000 of these guys on earth. The tiniest phage in the world that's listed in the European Nucleotide Archive goes by the name Leuconostoc Phage L5 and it has a complete genome size of 631 bytes. So by the standards of our game, we could say that LISP has now outdistanced nature too. This of course isn't meant to imply that smaller is necessarily better. For example, ring tailed lemurs have a binary footprint of 537 megs12 similar to Racket's 500mb install size. They're both quite adorable, but certainly not simple.

> L06183: Leuconostoc Phage L5 complete genome cp437 le-bin acgt
Z6♫τ♫▼≡2É<ÅΣ< 2âÖ⌂δ╧δ♦♫ü◘&☼G~C ≥♥Fî` ÉO²æ⌡f1▬←°ⁿ<7¿√•▼╠♫&μ»↕_░»≡
ä ╗≤↔?☼@v¿ ○☻ r┤ⁿ░≥▼.λ┐¢ⁿ∩&╗↓≡╗╔λo■ⁿ÷∩o╛■Bj♂X»ô┤∙â☺☺`7s╨ƒ•3b♪l§►
→ü♫╓╠‼μ▀X√╛←╝∩♠<┐∩⌂≡≈λc±∩≡£ ‼²∙²•o♥|╧φ/C▀╟♪;╛♥⌂λ╟♀♀═tλî;/┴D4α╬╝3
⌂╚$á┬╠á▒☼ °└,╚å‼A λ├≥τ` ╩╕û├↨F♦○B$34l?└┘‼LÇy♣(Mⁿây┐╧♂↕╝C2NKÇαá╞⌂
╦┴♥?│├Çä☼├Ç£ ån<0⌡╧∙╦■║♀╙Ç|<é╜3☼Ü>♀pé²76+┘▓╧≈▬“∩∩A│3 6 NO║oƒ<n@~
╕┐M)@> 8e☼∙ò£ê≤├♥▒♫╠◙l⌂F╛≥┤λ♦░⌂λ▬╛▲☼↑ ↨♀═‘/≡♠ñ•¶╒λ≈↨╪/Ç√ô♀├ ♂≤∩∟
0N╚ñ ♂αCé╘#┘ú»⌠♫0•♀[9=¢2╬☻♦ï▼δ£ä6 ¼±¡÷n↑♦ìp(ⁿMX≤æ°∩¬á┴╧<+☼↓<ç÷O╦
ÉN<  G≡23â≈⌠├°6∩²┌?│α·α>┘°▲L±@=7░íⁿ0▀╨≤C2>√?▼█≥┴♂♥?ó├?`♀ êyC┴â▬Æ

John McCarthy may have lost out on the opportunity to be the father of the personal computing software revolution, but there's still a chance for his ideas to lead the second. The tools for natural engineering became cheap and open source in recent years. Many people are scrambling to be the next Monsanto, but what we actually want is the next Linux, Apple, and Microsoft. We need a tool that can abstract genetics and proteins in the smallest way possible, similar to how SectorLISP currently offers the service of abstracting gritty assembly in the fewest commands. For example one might edit a tiny symbiotic bacteria like Candidatus Carsonella Ruddii (40,000 bytes) to be a platform for running LISP programs that would be delivered by printed phages. In fact, someone might have even figured it out already. It's been public knowledge for fifty years people can build recombinant organisms, like the gap between iPhone and SIGSALY. Crispr can be programmed in XML. If it can be done with LISP instead then we must use it.


Here's the full i8086 assembly listing for SectorLISP v2.
It's 8 pages and available in TXT, HTML, BIN, or ELF.
You may also view the sources on GitHub as either ASM or C.

GAS LISTING sectorlisp.S 			page 1

 GNU assembler version 2.34 (x86_64-alpine-linux-musl)
	 using BFD version (GNU Binutils) 2.34.
 options passed	: -aghlms=sectorlisp.lst -g 
 input file    	: sectorlisp.S
 output file   	: sectorlisp.o
 target        	: x86_64-alpine-linux-musl
 time stamp    	: 2021-12-11T00:49:38.000-0800

GAS LISTING sectorlisp.S page 2
1 /*-*- mode:unix-assembly; indent-tabs-mode:t; tab-width:8; coding:utf-8 -*-│ 2 │vi: set et ft=asm ts=8 tw=8 fenc=utf-8 :vi│ 3 ╞══════════════════════════════════════════════════════════════════════════════╡ 4 │ Copyright 2020 Justine Alexandra Roberts Tunney │ 5 │ Copyright 2021 Alain Greppin │ 6 │ Some size optimisations by Peter Ferrie │ 6 │ │ 7 │ Permission to use, copy, modify, and/or distribute this software for │ 8 │ any purpose with or without fee is hereby granted, provided that the │ 9 │ above copyright notice and this permission notice appear in all copies. │ 10 │ │ 11 │ THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL │ 12 │ WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED │ 13 │ WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE │ 14 │ AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL │ 15 │ DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR │ 16 │ PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER │ 17 │ TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR │ 18 │ PERFORMANCE OF THIS SOFTWARE. │ 19 ╚─────────────────────────────────────────────────────────────────────────────*/ 21 22 // LISP meta-circular evaluator in a MBR 23 // Compatible with the original hardware 24 25 .code16 26 .globl _start 27 0000 4E494C00 _start: .asciz "NIL" # dec %si ; dec %cx ; dec %sp 28 0004 5400 kT: .asciz "T" # add %dl,(%si) boot A:\ DL=0 29 0006 EA0000C0 start: ljmp $0x7c00>>4,$begin # cs = 0x7c00 is boot address 29 07 30 000b 00 .asciz "" 31 000c 51554F54 kQuote: .asciz "QUOTE" 31 4500 32 0012 434F4E44 kCond: .asciz "COND" 32 00 33 0017 41544F4D kAtom: .asciz "ATOM" # ordering matters 33 00 34 001c 43415200 kCar: .asciz "CAR" # ordering matters 35 0020 43445200 kCdr: .asciz "CDR" # ordering matters 36 0024 434F4E53 kCons: .asciz "CONS" # ordering matters 36 00 37 0029 455100 kEq: .asciz "EQ" # needs to be last 38 39 002c BC0080 begin: mov $0x8000,%sp # uses higher address as stack 40 # and set independently of SS! 41 # 8088 doesn't stop interrupts 42 # after SS is set, and PC BIOS 43 # sets SP to a value that will 44 # damage our code if int fires 45 # between it setting SS and SP 46 002f 0E push %cs # that means ss = ds = es = cs 47 0030 1F pop %ds # noting ljmp set cs to 0x7c00 48 0031 0E push %cs # that's the bios load address 49 0032 07 pop %es # therefore NULL points to NUL 50 0033 0E push %cs # terminated NIL string above! 51 0034 17 pop %ss # errata exists but don't care 52 0035 BB0200 mov $2,%bx
GAS LISTING sectorlisp.S page 3
53 0038 89E1 main: mov %sp,%cx 54 003a E81100 call GetToken 55 003d E85400 call GetObject 56 0040 E84101 call Eval 57 0043 96 xchg %ax,%si 58 0044 E84300 call PrintObject 59 0047 B00D mov $'\r',%al 60 0049 E87200 call PutChar 61 004c EBEA jmp main 62 63 GetToken: # GetToken():al, dl is g_look 64 004e 89CF mov %cx,%di 65 0050 88D0 1: mov %dl,%al 66 0052 3C20 cmp $' ',%al 67 0054 7602 jbe 2f 68 0056 AA stosb 69 0057 96 xchg %ax,%si 70 0058 E85F00 2: call GetChar # exchanges dx and ax 71 005b 3C20 cmp $' ',%al 72 005d 76F1 jbe 1b 73 005f 3C29 cmp $')',%al 74 0061 7605 jbe 3f 75 0063 80FA29 cmp $')',%dl # dl = g_look 76 0066 77E8 ja 1b 77 0068 883D 3: mov %bh,(%di) # bh is zero 78 006a 96 xchg %si,%ax 79 006b C3 ret 80 81 .PrintList: 82 006c B028 mov $'(',%al 83 006e FF30 2: push (%bx,%si) 84 0070 8B34 mov (%si),%si 85 0072 E81200 call .PutObject 86 0075 B020 mov $' ',%al 87 0077 5E pop %si # restore 1 88 0078 85F6 test %si,%si 89 007a 78F2 js 2b # jump if cons 90 007c 7405 jz 4f # jump if nil 91 007e B0F9 mov $249,%al # bullet (A∙B) 92 0080 E80400 call .PutObject 93 0083 B029 4: mov $')',%al 94 0085 EB37 jmp PutChar 95 96 .PutObject: # .PutObject(c:al,x:si) 97 .PrintString: # nul-terminated in si 98 0087 E83400 call PutChar # preserves si 99 PrintObject: # PrintObject(x:si) 100 008a 85F6 test %si,%si # set sf=1 if cons 101 008c 78DE js .PrintList # jump if not cons 102 .PrintAtom: 103 008e AC lodsb 104 008f 84C0 test %al,%al # test for nul terminator 105 0091 75F4 jnz .PrintString # -> ret 106 0093 C3 ret 107 108 GetObject: # called just after GetToken 109 0094 3C28 cmp $'(',%al
GAS LISTING sectorlisp.S page 4
110 0096 7450 je GetList 111 # jmp Intern 112 113 0098 51 Intern: push %cx # Intern(cx,di): ax 114 0099 89FD mov %di,%bp 115 009b 29CD sub %cx,%bp 116 009d 45 inc %bp 117 009e 31FF xor %di,%di 118 00a0 5E 1: pop %si 119 00a1 56 push %si 120 00a2 89E9 mov %bp,%cx 121 00a4 89F8 mov %di,%ax 122 00a6 383D cmp %bh,(%di) 123 00a8 740C je 8f 124 00aa F3A6 rep cmpsb # memcmp(di,si,cx) 125 00ac 740A je 9f 126 00ae 4F dec %di 127 00af 31C0 xor %ax,%ax 128 00b1 AE 2: scasb # memchr(di,al,cx) 129 00b2 75FD jne 2b 130 00b4 EBEA jmp 1b 131 00b6 F3A4 8: rep movsb # memcpy(di,si,cx) 132 00b8 59 9: pop %cx 133 00b9 C3 ret 134 135 00ba 31C0 GetChar:xor %ax,%ax # GetChar→al:dl 136 00bc CD16 int $0x16 # get keystroke 137 00be B40E PutChar:mov $0x0e,%ah # prints CP-437 138 00c0 CD10 int $0x10 # vidya service 139 00c2 3C0D cmp $'\r',%al # don't clobber 140 00c4 7504 jne 1f # look xchg ret 141 00c6 B00A mov $'\n',%al 142 00c8 EBF4 jmp PutChar 143 00ca 92 1: xchg %dx,%ax 144 00cb C3 ret 145 146 //////////////////////////////////////////////////////////////////////////////// 147 148 00cc 85FF Evlis: test %di,%di # Evlis(m:di,a:dx):ax 149 00ce 7416 jz 1f # jump if nil 150 00d0 FF31 push (%bx,%di) # save 1 Cdr(m) 151 00d2 8B05 mov (%di),%ax 152 00d4 E8AD00 call Eval 153 00d7 5F pop %di # restore 1 154 00d8 50 push %ax # save 2 155 00d9 E8F0FF call Evlis 156 # jmp xCons 157 158 00dc 5F xCons: pop %di # restore 2 159 00dd 87F9 Cons: xchg %di,%cx # Cons(m:di,a:ax):ax 160 00df 890D mov %cx,(%di) # must preserve si 161 00e1 8901 mov %ax,(%bx,%di) 162 00e3 8D4D04 lea 4(%di),%cx 163 00e6 97 1: xchg %di,%ax 164 00e7 C3 ret 165 166 00e8 E863FF GetList:call GetToken
GAS LISTING sectorlisp.S page 5
167 00eb 3C29 cmp $')',%al 168 00ed 745F je .retF 169 00ef E8A2FF call GetObject 170 00f2 50 push %ax # popped by xCons 171 00f3 E8F2FF call GetList 172 00f6 EBE4 jmp xCons 173 174 00f8 39D7 Gc: cmp %dx,%di # Gc(x:di,A:dx,B:si):ax 175 00fa 72EA jb 1b # we assume immutable cells 176 00fc FF31 push (%bx,%di) # mark prevents negative gc 177 00fe 8B3D mov (%di),%di 178 0100 E8F5FF call Gc 179 0103 5F pop %di 180 0104 50 push %ax 181 0105 E8F0FF call Gc 182 0108 5F pop %di 183 0109 E8D1FF call Cons 184 010c 29F0 sub %si,%ax # ax -= C - B 185 010e 01D0 add %dx,%ax 186 0110 C3 ret 187 188 0111 56 .dflt1: push %si # save x 189 0112 E86F00 call Eval 190 0115 5E pop %si # restore x 191 # jmp Apply 192 193 0116 85C0 Apply: test %ax,%ax # Apply(fn:ax,x:si:a:dx):ax 194 0118 791D jns .switch # jump if atom 195 011a 97 xchg %ax,%di # di = fn 196 011b 8B39 .lambda:mov (%bx,%di),%di # di = Cdr(fn) 197 011d 57 push %di # for .EvCadr 198 011e 8B3D mov (%di),%di # di = Cadr(fn) 199 0120 85FF Pairlis:test %di,%di # Pairlis(x:di,y:si,a:dx):dx 200 0122 745C jz .EvCadr # return if x is nil 201 0124 AD lodsw # ax = Car(y) 202 0125 FF31 push (%bx,%di) # push Cdr(x) 203 0127 8B3D mov (%di),%di # di = Car(x) 204 0129 8B34 mov (%si),%si # si = Cdr(y) 205 012b E8AFFF call Cons # Cons(Car(x),Car(y)) 206 012e 97 xchg %ax,%di 207 012f 92 xchg %dx,%ax 208 0130 E8AAFF call Cons # Cons(Cons(Car(x),Car(y)),a) 209 0133 92 xchg %ax,%dx # a = new list 210 0134 5F pop %di # grab Cdr(x) 211 0135 EBE9 jmp Pairlis 212 0137 3D0000 .switch:cmp $kEq,%ax # eq is last builtin atom 213 013a 77D5 ja .dflt1 # ah is zero if not above 214 013c 8B3C mov (%si),%di # di = Car(x) 215 013e 3C00 .ifCar: cmp $kCar,%al 216 0140 742B je Car 217 0142 3C00 .ifCdr: cmp $kCdr,%al 218 0144 7426 je Cdr 219 0146 3C00 .ifAtom:cmp $kAtom,%al 220 0148 7507 jne .ifCons 221 014a 85FF test %di,%di # test if atom 222 014c 790E jns .retT 223 014e 31C0 .retF: xor %ax,%ax # ax = nil
GAS LISTING sectorlisp.S page 6
224 0150 C3 ret 225 0151 3C00 .ifCons:cmp $kCons,%al 226 0153 8B30 mov (%bx,%si),%si # si = Cdr(x) 227 0155 AD lodsw # si = Cadr(x) 228 0156 7485 je Cons 229 0158 31F8 .isEq: xor %di,%ax # we know for certain it's eq 230 015a 75F2 jne .retF 231 015c B000 .retT: mov $kT,%al 232 015e C3 ret 233 234 015f 89D6 Assoc: mov %dx,%si # Assoc(x:ax,y:dx):ax 235 0161 8B3C 1: mov (%si),%di 236 0163 8B30 mov (%bx,%si),%si 237 0165 AF scasw 238 0166 75F9 jne 1b 239 0168 F6 .byte 0xF6 # testb §i8,i16(%bp,%di) jmp Car 240 0169 8B39 Cadr: mov (%bx,%di),%di # contents of decrement register 241 016b 3C .byte 0x3C # cmp §scasw,%al (nop next byte) 242 016c AF Cdr: scasw # increments our data index by 2 243 016d 8B05 Car: mov (%di),%ax # contents of address register!! 244 016f C3 2: ret 245 246 0170 8B39 1: mov (%bx,%di),%di # di = Cdr(c) 247 0172 57 Evcon: push %di # save c 248 0173 8B35 mov (%di),%si # di = Car(c) 249 0175 AD lodsw # ax = Caar(c) 250 0176 E80B00 call Eval 251 0179 5F pop %di # restore c 252 017a 85C0 test %ax,%ax # nil test 253 017c 74F2 jz 1b 254 017e FF35 push (%di) # push Car(c) 255 0180 5F .EvCadr:pop %di 256 0181 E8E5FF call Cadr # ax = Cadar(c) 257 # jmp Eval 258 259 0184 85C0 Eval: test %ax,%ax # Eval(e:ax,a:dx):ax 260 0186 742B jz 1f 261 0188 79D5 jns Assoc # lookup val if atom 262 018a 96 xchg %ax,%si # di = e 263 018b AD lodsw # ax = Car(e) 264 018c 3D0000 cmp $kQuote,%ax # maybe CONS 265 018f 8B3C mov (%si),%di # di = Cdr(e) 266 0191 74DA je Car 267 0193 3D0000 cmp $kCond,%ax 268 0196 74DA je Evcon # ABC Garbage Collector 269 0198 52 push %dx # save a 270 0199 51 push %cx # save A 271 019a 50 push %ax 272 019b E82EFF call Evlis 273 019e 96 xchg %ax,%si 274 019f 58 pop %ax 275 01a0 E873FF call Apply 276 01a3 5A pop %dx # restore A 277 01a4 89CE mov %cx,%si # si = B 278 01a6 97 xchg %ax,%di 279 01a7 E84EFF call Gc 280 01aa 89D7 mov %dx,%di # di = A
GAS LISTING sectorlisp.S page 7
281 01ac 29F1 sub %si,%cx # cx = C - B 282 01ae F3A4 rep movsb 283 01b0 89F9 mov %di,%cx # cx = A + (C - B) 284 01b2 5A pop %dx # restore a 285 01b3 C3 1: ret 286 287 01b4 CECECECE .sig: .fill 512 - (2f - 1f) - (. - _start), 1, 0xce 287 CECECECE 287 CECECECE 287 CECECECE 287 CECECECE 288 01ef 20534543 1: .ascii " SECTORLISP v2 " 288 544F524C 288 49535020 288 763220 289 01fe 55AA .word 0xAA55 290 2: .type .sig,@object 291 .type kQuote,@object 292 .type kCond,@object 293 .type kAtom,@object 294 .type kCar,@object 295 .type kCdr,@object 296 .type kCons,@object 297 .type kEq,@object


SectorLISP started as an experiment in the Cosmopolitan repo where Justine Tunney used sed to get code built by a Linux x86_64 compiler to run in 16-bit mode. Its original size was 948 bytes, which was gradually reduced with help from Ilya Kurdyukov at BaseALT and Scott Wolchok at Facebook. Alain Greppin did a rewrite after having the brilliant insight of using Steve Russel's coding techniques from the LISP 1.5 manual which finally let us fit SectorLISP in one sector. Peter Ferrie at Amazon and Justine reduced SectorLISP further, by another 110 bytes. The ABC garbage collector was designed and implemented by Justine. The 99 byte Brainfuck implementation was written by Peter and Justine. The musical track is an instrumental recording of Das Model by Kraftwerk from 1978. Credit for the UNICODE paragraph separator trick for C / JS polyglots goes to a code golfer from Estonia named randomdude999. You're invited to come hang out with the SectorLISP team. Use the following link to join our Discord chatroom:


Marc Feeley and Samuel Yvon recently wrote a paper on a Small Scheme VM, Compiler, and REPL which cites SectorLISP. Rather than targeting the legacy sector size of 512 bytes, they targeted the modern 4096-byte page size. That enabled them to offer a great deal more value in a comparatively tiny size, such as a bytecode compiler, which helps illuminates Gosling's original intentions for Java.


[United States of Lemuria - two dollar bill - all debts public and primate]

Funding for this blog post was crowdsourced from Justine Tunney's GitHub sponsors and Patreon subscribers. Your support is what makes projects like SectorLISP possible. Thank you.

See Also

  1. Artificial Intelligence Memo No. 8 by John McCarthy
  2. A Basis for a Mathematical Theory of Computation by John McCarthy
  3. Recursive Functions of Symbolic Expressions and their Computation by Machine by John McCarthy
  4. History of Lisp by John McCarthy
  5. The Roots of Lisp by Paul Graham
  6. The announcement that the U.S. nuclear arsenal no longer relies on floppy disks was made in the New York Times on October 24th, 2019 (see and The U.S. National Nuclear Security Administration is believed to have been breached by the transitive property because they granted a vendor named SolarWinds the ability to remotely manage their systems and SolarWinds got hacked, according to Natasha Bertrand and Eric Wolff at Politico on December 17th, 2020 (see and The first evidence that artifacts distributed by SolarWinds had been tampered with dates back to October 2019 according to Tomislav Peričin at ЯeversingLabs (see and
  7. The IBM 7090 didn't have the concept of bytes so the 32kB binary footprint was tallied based on instructions in the LISP 1.5 listing file and then rounded up to include possible essential data, which was later reconciled with the 27kB code size estimate provided by the LISP 1.5 manual. See
  8. Troades by Seneca