A castle on a swamp: part 2, walking on water
This is the 2nd post in “A castle on a swamp.” I recommend reading part 1 if you haven’t already.
Last time we explored how a group of dedicated geeks figured out how to create a competitive game out of the uncompetitive mess that is Pokemon. It turns out that making something stable on unstable foundations happens in lots of places. Many of the things we take for granted are really castles built upon swamps. And yet the castles remain standing anyways.
If you’ve ever used the internet, you’ve interacted with TCP. It’s designed to reliably send data from one computer to another. But here’s where it gets weird: it’s built using IP, which is inherently unreliable. It’s like trying to reliably send a letter but you can only use envelopes that spontaneously combust half the time.
I’ll quote Joel Spolsky, because the wording is incredible and everyone should read it at least once in their life:
Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes.
Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn’t have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country… If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn’t even tell the movie directors in California what happened.
To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.
TCP is the Hollywood Express, and IP is cars. We never find out about the unreliable parts of IP because TCP does a really good job of masking it.
The next time you use Wi-Fi, realize that you’re Hollywood-Expressing your cat pictures through the air with invisible radio waves, and somehow everything usually ends up on the other side of the world with no problem at all.
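If you want to see the Hollywood Express trick in miniature, here is a rough Python sketch. It is not how real TCP works under the hood: the names `unreliable_send` and `hollywood_express`, the 30% loss rate, and the idea that the sender magically knows which actors are still missing (real TCP relies on acknowledgements, which can get lost too) are all simplifying assumptions of mine. The only point is that numbering things, resending whatever is missing, and sorting at the end is enough to turn a lossy, scrambling channel into one that looks reliable.

```python
import random


def unreliable_send(packets, rng, loss_rate):
    """Toy stand-in for IP (an assumption, not real networking):
    randomly drops some packets and scrambles the order of the rest."""
    survivors = [p for p in packets if rng.random() >= loss_rate]
    rng.shuffle(survivors)
    return survivors


def hollywood_express(message, rng, loss_rate=0.3, max_trips=1000):
    """Toy stand-in for TCP: number every item, keep resending the ones
    that have not arrived yet, then put everything back in order."""
    outgoing = dict(enumerate(message))  # sequence number -> item
    received = {}
    trips = 0
    while len(received) < len(outgoing):
        trips += 1
        if trips > max_trips:
            raise RuntimeError("gave up: the road is completely closed")
        # Simplification: the sender is assumed to know what is missing.
        missing = [(seq, item) for seq, item in outgoing.items()
                   if seq not in received]
        for seq, item in unreliable_send(missing, rng, loss_rate):
            received[seq] = item
    # Reassemble in the original order, whatever order things arrived in.
    in_order = [received[seq] for seq in sorted(received)]
    return in_order, trips


actors = ["Ann", "Bob", "Cat", "Dan", "Eve"]
arrived, trips = hollywood_express(actors, random.Random(7))
print(arrived == actors, "after", trips, "trips")
```

The directors at the other end only ever see `arrived`. The crashes and the scrambled order show up as nothing more than a higher trip count, which is the "a little bit more slowly than usual" part of the quote.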
Rails are made of miles and miles of thin strips of metal. We expect them to work reliably, even when it’s raining, even when it’s hot or cold (think of thermal expansion and contraction), even when weeds grow near them, even when there’s ground vibration, metal fatigue, freak accidents, rust buildup, dirt, etc.
Oh, and it’s not like the load is evenly distributed. More like nothing happens for hours, then a train weighing a gajillion pounds thunders across it for a few seconds, and then back to hours of nothing.
The best solution we’ve collectively come up with involves the metal rails being loosely connected to the horizontal pieces of wood (called ties), which in turn sit on a pile of little crushed rocks that are nailed down to nothing, which in turn sit on a pile of rock which is what actually sits on the ground.
You’ll notice that most things are only loosely connected to each other. Nothing is nailed down, nothing is rigid. It turns out that this is a feature, not a bug, because if you try to lock down the tracks too tightly, the metal buckles.
Have you ever noticed that big websites like Google pretty much never seem to go down?
In a lot of cases, the reason why they’re so reliable isn’t because they have higher-quality hardware. In fact, a great deal of these computer systems are built using commodity hardware: off-the-shelf, cheap, easy to get, interchangeable.
The point is to distribute the load across the system rather than letting a single point of failure break everything. A machine, no matter how well-made, breaks down eventually. But there are ways to design systems that survive even when individual computers routinely crash, go down for maintenance, burn in a fire, explode, get taken over by hackers, burn out, trip over their own shoelaces, lose power, get flooded, get overwhelmed by spam, get overwhelmed by fanmail, deal with existential dread, get food poisoning, encounter bugs in the code, or fail in all sorts of other creative ways.
Fun fact: people who design these sorts of systems often just start by assuming failures of all sorts are happening all the time. When you have enough computers in one place, hardware failures all over the place aren’t a catastrophe, they’re just Tuesday. What systems folks have figured out is how to get useful work out of this reality anyways. As long as the failure doesn’t hit too many of your computers at the same time, no one will ever notice.
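Here is a toy version of that idea in Python. Everything in it is made up for illustration: the `Replica` class, the `serve` function, and the failure rates are my own assumptions, and real systems use far more sophisticated machinery. The sketch shows two things: a request succeeds as long as any one copy of the service is up, and if machines fail independently, the chance of all of them being down at once shrinks very quickly as you add copies.

```python
import random


class Replica:
    """One cheap, unreliable machine (hypothetical, for illustration)."""

    def __init__(self, name, failure_rate, rng):
        self.name = name
        self.failure_rate = failure_rate
        self.rng = rng

    def handle(self, request):
        # Each request has some chance of hitting a broken machine.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def serve(request, replicas):
    """Try each replica in turn; the caller only sees a failure
    if every single one of them is down."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # shrug and move on to the next machine
    raise RuntimeError("every replica failed at the same time")


def chance_all_fail(failure_rate, copies):
    """Probability that all copies are down at once, ASSUMING failures
    are independent (a fire in the data center breaks this assumption)."""
    return failure_rate ** copies


rng = random.Random(0)
replicas = [Replica(f"box-{i}", failure_rate=0.1, rng=rng) for i in range(3)]
print(serve("cat-picture", replicas))
print(chance_all_fail(0.1, 3))  # roughly one in a thousand
```

The independence assumption is the same caveat as above in different clothes: three machines that each fail 10% of the time are only about a one-in-a-thousand problem if they don’t all fail for the same reason at the same moment.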
In math, it might seem like we start from rock-solid axioms and then logic our way towards all the interesting complexity. That’s how school usually teaches math: step-by-step, start with your assumptions, deduce your way to the ending, and neatly write down your “two-column proof” for full credit on the homework.
Except in math, the assumptions are just that. Assumptions. Worse yet, for any set of axioms strong enough to do arithmetic, we can’t even prove from those axioms alone that they don’t contradict each other.
There’s nowhere solid to stand on! There’s no solid ground! (You know what solid ground is? That’s the Earth’s crust. Which, by the way, floats on the mantle, which, you guessed it, is hot rock that slowly flows. I hope you’re beginning to see a pattern here.)
One last point: careful not to confuse what we’re seeing here with “emergent complexity”. Emergent phenomena have to do with complex behavior coming out of simple mechanisms. Think about how the interactions between molecules give us cells, how small groups of people form nations, how the simple neurons form the complex brain.
This is the opposite. The foundations are unstable, complicated, chaotic, shifting all the time, and pinning anything down is like trying to walk on water. The real revelation is that we can still make something useful and meaningful out of it anyways.
There’s no standing still on the surface of the sea, but paddle fast enough and you can surf the waves.