Dec 4th, 2023 @ justine's web page
I spent the last month working with Mozilla to launch a new project called llamafile which lets you distribute and run LLMs with a single file. We had a successful launch five days ago. The project earned 4.4k stars on GitHub, 1064 upvotes on Hacker News, and received press coverage from Hackaday.
Many people are excited about how the project puts control of chat bots into the hands of everyday people. However, my favorite thing to focus on is how it can be genuinely useful in helping me get work done as an old-school UNIX hacker who would rather write shell scripts than tame wild-eyed condas in containers.
I've never used things like the OpenAI API before. As much as I love browsing the web and using online services like Twitter, I simply wouldn't want to use a text generator that talks to the Internet, any more than I'd choose to use a terminal emulator that's built on Electron so it can integrate with Full Story.
I don't want my daily life working in the terminal to become a permanently recorded artifact in someone's cloud. Tools that don't talk to the network have no business being on the network. What I love about llamafile is that, since it's a locally-running dependency-free command line tool that's fast, I finally feel comfortable enough with LLMs to start learning how to use them.
That's because llamafile behaves just like classic commands, e.g. cat (which is basically what an LLM is if you think about it). What I like about the old technology is that, if you're able to run a command, you can automate it using shell scripts. That ability to automate machines is the foundation of all power in tech, and llamafile gives you that power over AI, in such a way that no one can ever take it away from you. So even if you're fearful of AI, please give this a try anyway, because if you do, then you're going to feel like all those scary new powers you've read about in the news will now be yours to command, using the llamafile command.
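For instance, here's a minimal sketch of the idea, reusing the mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile command and the flags that appear later in this post (the prompt itself is just an illustration):

# pipe an instruction into the model the same way you'd pipe text into cat
echo '[INST]Explain in one sentence what the cat command does.[/INST]' |
  ./mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile \
    --temp 0 -n 1000 -f /dev/stdin --silent-prompt 2>/dev/null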
Lastly, and even more importantly, you have the freedom to use these tools to enrich yourself, on your own terms (so long as you're not a patent troll), since both llamafile and the Mistral weights are licensed Apache 2.0. That means you can fork it, whitelabel it, or build on top of it; and the things you create can be as purely open or strictly confidential as you want them to be.
The easiest way to get a local LLM to be useful is to use it to summarize text. With llamafile, you can turn a web page with a long wall of text into a few short paragraphs. Consider the following URL:
Ed Post; USENET 1983
Real Programmers Don't Use Pascal
https://www.pbm.com/~lindahl/real.programmers.html
It's worth the ~20 minutes to read all 3,774 words, but if you need an executive summary, then your llamafile, powered by Mistral 7B, can read the text from the above URL to generate the following:
The article "Real Programmers Don't Use Pascal" is a humorous essay that argues that Real Programmers use Fortran, while Quiche Eaters use Pascal. The author claims that there are two types of programmers: Real Programmers and Quiche Eaters. Real Programmers are those who can understand and write programs in Fortran, while Quiche Eaters are those who cannot.
The article goes on to describe the characteristics of Real Programmers and Quiche Eaters. Real Programmers are said to be able to write programs quickly and efficiently, while Quiche Eaters are said to be slow and inefficient. Real Programmers are also said to be able to understand complex algorithms and data structures, while Quiche Eaters are not.
The article concludes by arguing that the future of programming is bright for Real Programmers, as Fortran remains a popular language and structured coding constructs have failed to replace it. The author also argues that Unix, despite its popularity among hackers, is not a serious operating system for Real Programmers.
Here we see Mistral provides a faithful overview of Ed Post's essay. Mistral also managed to identify the one point Ed Post made that was both offensive and wrong, which is his belief that UNIX is a toy.
Mistral is probably smart enough to read an article if you just curl the raw HTML into your prompt. What makes links useful for LLMs is that it removes all the unrelated <html> tags which take up space in memory.
The PASCAL essay is 3000+ words. Each word is normally broken up into
multiple tokens. Mistral only has an 8,000 token context window. So if
we include the HTML, we might get an error that we've run out of space.
links -codepage utf-8 -force-html -dump -width 500 \
      https://www.pbm.com/~lindahl/real.programmers.html
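If you're curious how much dead weight the markup adds, a rough sanity check (my own aside, not part of the pipeline) is to compare the size of the raw HTML with the size of the links dump. Bytes aren't tokens, but the ratio gives you a feel for how much of the context window the tags would have eaten:

curl -s https://www.pbm.com/~lindahl/real.programmers.html | wc -c
links -codepage utf-8 -force-html -dump -width 500 https://www.pbm.com/~lindahl/real.programmers.html | wc -c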
One of the links flags that's particularly helpful is -width 500, which turns off line wrapping. With the way LLMs tokenize text, spaces are usually free, since LLM tokens are often chopped up to include spaces, depending on the arbitrary alphabet each model defines. On the other hand, line breaks and long strings of repeating spaces will always be less efficient, and if they're being used purely to reflow paragraphs for human readability, then those added tokens provide no additional value from the LLM's perspective.
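If the dump still contains long runs of repeating spaces (tables and indentation tend to produce them), one optional trick, and this is my own suggestion rather than something links does for you, is to squeeze them with sed before they reach the model. The pattern below collapses any run of two or more spaces into a single space:

links -codepage utf-8 -force-html -dump -width 500 \
      https://www.pbm.com/~lindahl/real.programmers.html | sed 's/   */ /g'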
The links command is pretty common. For example, most macOS users just need to run:
brew install links
If you don't have a package manager, then here's a prebuilt APE binary of links v2.29:
links (7.7mb)
wget https://cosmo.zip/pub/cosmos/bin/links
For AMD64+ARM64 on Linux+Mac+Windows+FreeBSD+NetBSD+OpenBSD
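One thing wget won't do for you is set the executable bit, so before running the downloaded binary for the first time you'll want to:

chmod +x links
./links https://justine.lol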
I built it myself to test summarization with my Nvidia graphics card on Windows. Plus it gives me the ability to browse the web using PowerShell, which is a nice change of pace from Chrome and Firefox.
curl -o links.exe https://cosmo.zip/pub/cosmos/bin/links
.\links.exe https://justine.lol
Regardless of your OS, if you have issues running my links binary, then see the gotchas section below.
The llamafile itself is just another command you can run from your shell:

./mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile

So without further ado, here's how you'd do the above in one piece.
(echo "[INST]Summarize the following article:"; links -codepage utf-8 -force-html -width 500 -dump https://www.pbm.com/~lindahl/real.programmers.html; echo "[/INST]") | ./mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile --temp 0 -c 7000 -n 1000 -f /dev/stdin --silent-prompt 2>/dev/null
Gotchas

On macOS with Apple Silicon you need to have Xcode installed for llamafile to be able to bootstrap itself.
If you use zsh and have trouble running llamafile, try saying sh -c ./llamafile. This is due to a bug that was fixed in zsh 5.9+. The same is the case for Python subprocess, old versions of Fish, etc.
On some Linux systems, you might get errors relating to run-detectors or WINE. This is due to binfmt_misc registrations. You can fix that by adding an additional registration for the APE file format llamafile uses:
sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
sudo chmod +x /usr/bin/ape
sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
As mentioned above, on Windows you may need to rename your llamafile by adding .exe to the filename.
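For example, from cmd or PowerShell (the new name is only an illustration; what matters is that it ends in .exe):

ren mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile.exe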
Also as mentioned above, Windows also has a maximum file size limit of 4GB for executables. The LLaVA server executable above is just 30MB shy of that limit, so it'll work on Windows, but with larger models like WizardCoder 13B, you need to store the weights in a separate file. An example is provided above; see "Using llamafile with external weights."
On WSL, it's recommended that the WIN32 interop feature be disabled:
sudo sh -c "echo -1 >/proc/sys/fs/binfmt_misc/WSLInterop"On any platform, if your llamafile process is immediately killed, check if you have CrowdStrike and then ask to be whitelisted.