If you live on AI Twitter, this probably won’t have any new information; this is primarily meant to record an interesting ongoing story for posterity so we can look back and see how fast things develop at the moment.

Large language models (LLMs) are a form of artificial intelligence (AI) that let computers process and generate human language. They first appeared in 2018 and they’ve gotten really powerful in the years since.


Some examples of what they can do (images not reproduced here): understands humor; understands code; works across languages; can do organic chemistry?

Most popular LLMs (at the time of writing, March 2023) are only partially open: some require a special invitation to use, while others are available to the general public only through a controlled interface that gives no insight into how they work on the inside.

Anyways, on February 24, 2023, Meta (previously known as Facebook) released an LLM called LLaMa. The AI team released the code necessary to use it, but not the weights. (For context, the code is like a car’s metal skeleton and the weights are like the motor or engine, so without the weights you won’t get any useful results.) You might think that this would be the end of the story… were it not for one small detail. They actually did release the weights, but only to those who would use them for research-like exploration. You would fill out a form, request the weight files, and hope they would approve your request. Around March 1, the first LLaMa weight requests got approved.

One day later, an anonymous user leaked all the weights onto the internet via torrent and 4chan.

And suddenly, the internet had its hands on a full-fledged modern LLM that anyone could run entirely on their own equipment, with no intermediary whatsoever.

Within a day of the initial leak, in an act of now-famous performance art, GitHub user ChristopherKing42 publicly proposed adding the torrent links to the code release to help “save bandwidth” on downloads. About a week after that, Georgi Gerganov announced that he’d rewritten the entire LLaMa codebase in C++, making it usable without the end user needing to buy graphics cards costing several thousand dollars. Three days after that (i.e. four days ago), Anish Thite got it to work in the highly battery-constrained and computing-power-constrained environment of a standard Pixel 6 smartphone. Twelve hours ago, Justine Tunney published a way to load the models with no user latency. None of these people are famous in the way that Elvis or Michael Jordan are famous, but they just might be the reason why alien minds can live on the little rectangles of light that sit in our pockets.

So, if you’re keeping score at home, this all happened in about three weeks. This is one story among many, of one model among many, created by one company among many. And, well, we’re only a quarter of the way through the year 2023. I’m dumbfounded, amazed, and worried, but hopeful about how fast things have unfolded. If you are too, then my writing has succeeded.

March 22 (T + 27 days) update: Kevin Kwok announces that Alpaca, a fine-tuned version of LLaMa, now runs on iPhone 14.

Several technology companies appear in this writing. I don’t currently do work for any of them and these are not meant to be endorsements of any company or product. I’ve tried to describe things fully, but sometimes I’ll miss details or leave things out because the AI field is moving so fast it’s hard to keep up :)