Photo by Алесь Усцінаў
No man is an island, entire of itself; every man is a piece of the continent, a part of the main.
– John Donne, Meditation XVII (the epigraph of For Whom the Bell Tolls)
I stood in a house that never should have been built. It was beautiful—a marvel of modern design. The walls were a pristine white, smooth to the touch, and the oak floors gleamed in the midday light. The fireplace stood at the center like a warm, welcoming anchor, and the lights glowed softly from above, casting a perfect shadow on the room’s symmetry.
But what made the house a horror was what it took to keep it standing.
Every beam, every nail, every piece of drywall was connected to the Internet. If the company supplying the nails shut down, the whole place would crumble. If the Web service running the lights crashed, the living room would turn dark and cold. If the supplier for the fireplace lost its network, the chimney might collapse. It wasn’t that each component couldn’t fail, but that any failure would crash everything.
I will never set foot in this place again.
And yet, this fragile house is exactly the system that holds up the world.
The banks, the transportation systems, the utilities, the voting systems: all built like that house.
We’ve designed our world to work brilliantly, until it fails.
Every engineer and manager worth their salt will tell you that a well-built system should have no single points of failure. Yet the reality is that what’s been built, what I’ve helped build, is a mess of tightly coupled services, web integrations, and dependencies that span continents and time zones. An outage at one company like CrowdStrike ripples across industries, causing chaos to unfold in corners we didn’t even know were connected.
There’s no safety in these integrations, only a dangerous trust that the chaos won’t raise its ugly head one day and crash our systems.
Years ago, Ronald Coase wrote that a company is a collection of contracts, each one carrying its own cost.
We outsource parts of our business when the external cost is cheaper than doing it ourselves.
But outsourcing costs, like a landscape at dusk, are not always what they appear.
Those contracts tie you into a spider’s web of suppliers and systems whose names you don’t even know.
What was once a cost-saving maneuver becomes a liability—a monster that bites off your head when the house of cards comes down.
And it will come down. CrowdStrike will happen again. Sooner than you think.
The last decade, with its growth of Web integrations, software-as-a-service, and the rise of a global, networked society, has turned this fragile arrangement into an all-too-common reality. Because it’s so easy to stitch services together—to say, “Let’s have this third party do it, since they’re cheaper”—we end up weaving an ever-tightening noose around our systems. The more contracts, the more unseen dependencies, the greater the fragility.
Every time you hit “accept” on a new service integration, you’re placing a beam in that fragile house, adding a connection that looks solid until the slightest breeze—the wind of a supplier going bankrupt, a software update gone awry, or a malicious actor—turns it to dust.
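The arithmetic behind that fragility is easy to sketch. Suppose, purely as an assumption for illustration, that every service you depend on is up 99.9 percent of the time, that they fail independently, and that your system needs all of them at once. The function name and the figures below are hypothetical, not measurements of any real system.

```python
def composite_availability(per_service: float, count: int) -> float:
    """Availability of a system that needs all `count` services up at once.

    Assumes failures are independent, which real outages often are not.
    """
    return per_service ** count


# Illustrative figures only: 99.9% uptime per dependency.
for count in (1, 10, 50, 100):
    uptime = composite_availability(0.999, count)
    hours_down = (1 - uptime) * 24 * 365
    print(f"{count:3d} dependencies: {uptime:.2%} up, "
          f"about {hours_down:.0f} hours down per year")
```

Under those assumptions, a hundred hard dependencies leave you with roughly 90 percent availability, even though every individual beam looks solid.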
We build systems that are strong only as long as they’re strong.
Our digital world, like a city built on stilts, is full of brittle bridges.
It’s why a minor software glitch at a company you’ve never heard of can bring much of the civil aviation industry to a halt. When one part fails, the ripple spreads far beyond what any one of us imagined.
Nature is different. It’s full of failures. In nature, there’s no single point of failure, no critical contract that can bring everything down. Trees fall in the forest, animals die in the wild, rivers dry up in droughts, but nature goes on. What looks like chaos is, in reality, a complex web of redundancies, each component connected to another in ways that allow failure to happen without catastrophe.
We must learn to mimic nature in how things fail.
It’s a paradox that the strength of a system lies not in its ability to avoid failure, but in its capacity to absorb it.
Resilience is more important than success
Take the example of Netflix’s Chaos Monkey—a tool designed to break things. It wasn’t built to help Netflix succeed; it was created to ensure that when things fail, they fail gracefully. Chaos Monkey intentionally sabotages Netflix’s systems, knocking out servers, disrupting services, and generally causing mayhem. Why? Because forcing systems to fail under controlled conditions reveals where the real weaknesses lie.
It’s a brutal practice, costly in the short term and painful for engineers who have to fix the mess. But what’s the alternative? Building something that works perfectly, until one day it doesn’t?
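The idea can be shown in miniature. What follows is not Netflix’s tool, only a toy sketch of the same habit: kill something at random, then check that the service still answers. The `Replica` class and the helper functions are hypothetical names invented for this example.

```python
import random


class Replica:
    """A hypothetical stand-in for one server instance."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def kill_random_replica(replicas, rng):
    """Chaos step: take down one live replica chosen at random."""
    live = [r for r in replicas if r.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


def serve(replicas, request):
    """Try each replica in turn; fail only if every one is down."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all replicas down")


rng = random.Random(0)  # fixed seed so the experiment is repeatable
replicas = [Replica("a"), Replica("b"), Replica("c")]

victim = kill_random_replica(replicas, rng)
print(f"chaos killed {victim.name}")
print(serve(replicas, "GET /"))  # still answered by a surviving replica
```

The experiment is only worth running because the answer might be no. If `serve` had been written to call a single fixed replica, the same random kill would sometimes take the whole service down, and you would learn that on your own schedule rather than the outage’s.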
The hard truth is that tech companies are rewarded for building features that succeed, not for systems that fail well.
We see it in the way contracts are written, in the budgets allocated to engineering teams, and in the products rolled out with fanfare. If you add a new feature, you get a pat on the back. If you quietly reinforce the system’s resilience, no one notices.
The lesson we’ve learned from nature is that resilience is more important than success. A forest can thrive with the death of a few trees. An ecosystem can absorb the disappearance of a species. The resilience comes not from everything working as planned, but from the ability to adapt, to take a blow and keep moving forward.
This is what we need in our technological infrastructure: deep complexity, not just surface complexity. We need systems that have redundancy, that can bend and not break, that can lose a piece without losing the whole. That means designing systems that expect failure and handle it.
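In code, expecting failure can be as plain as refusing to let one supplier decide whether the lights come on. Here is a minimal sketch, assuming a hypothetical lighting service with a backup and a local switch as the last resort; none of these names refer to a real API.

```python
def first_available(providers, default):
    """Return the first successful result, or `default` if every provider fails."""
    for provider in providers:
        try:
            return provider()
        except Exception:
            # Any failure means "try the next one". A real system would
            # also log the error and narrow the exception types it catches.
            continue
    return default


def primary_lights():
    # Hypothetical remote service, simulated here as being down.
    raise TimeoutError("lighting service unreachable")


def backup_lights():
    # Hypothetical second supplier.
    return "lights on (backup service)"


state = first_available(
    [primary_lights, backup_lights],
    default="lights on (local switch)",
)
print(state)
```

The design choice is the `default`: a degraded answer that needs no network at all. The room may be dimmer than planned, but it is not dark.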
We need chaos as a first principle in design.
It starts with leadership that understands this first principle.
With companies that are willing to invest not just in the UX that pops, but in the back-end systems that quietly hold up the world. It means being willing to build a house that, like nature, can lose a wall, can watch the lights go out, can let the fireplace flicker and die, and still stand tall.
That’s the challenge of our era: to build with failure in mind.
To construct not just for beauty and function, but for the long, slow collapse that’s inevitable—and to ensure that when the bell tolls, it’s not for all of us.
Inspired by Bruce Schneier and dedicated to CrowdStrike