Uptime golf

April 27, 2026

I’ve been noticing a lot of service outages lately, some with few enough nines that you’d think they were going for a low score. My guess is that this will get worse before it gets better, but maybe not for the reasons you’d think.

First off, let’s look at GitHub. It launched in 2008 and was acquired by Microsoft in late 2018. For most of its life, it was boring infrastructure in the best possible way. More recently it’s been up and down like a yo-yo. The official status page currently lists most services at two nines of uptime over the last 90 days… but counting slightly differently makes it look even worse.
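For a sense of scale, here is a rough sketch of what "nines" mean in actual downtime. The function names and the 90-day default window are my own choices for illustration, not anything taken from a status page, and real status pages may count partial outages differently.

```python
import math


def downtime_hours(nines: float, window_days: int = 90) -> float:
    """Hours of downtime implied by `nines` nines of uptime over the window."""
    unavailability = 10 ** -nines  # two nines -> 1% of the window
    return window_days * 24 * unavailability


def nines_from_downtime(hours_down: float, window_days: int = 90) -> float:
    """Nines of uptime implied by a total number of hours down in the window."""
    unavailability = hours_down / (window_days * 24)
    return -math.log10(unavailability)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} nines over 90 days: {downtime_hours(n):.2f} hours down")
```

Two nines over 90 days works out to roughly 21.6 hours of downtime, and one nine is a full nine days.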

Some people have blamed AI for this. Not because everyone at GitHub is vibe coding now (though that might not be helping things), but because Microsoft is prioritizing AI projects over everything else, including at GitHub. GitHub has also had a substantial influx of users as more people build software with AI and use AI agents. On top of all that, Microsoft recently started re-platforming work to move GitHub services to Azure. This is taking resources away from efforts that could help increase stability. ¹ ²

The companies providing AI services aren’t doing much better. Anthropic’s Claude has been between one and two nines of uptime over the past three months. OpenAI’s ChatGPT has done a bit better at around two nines. There might be more of a reason to blame these outages on vibe coding, but I think the simpler answer is that it’s tough to scale. These services are being used by more people every day, and the systems in play are novel.

The other service I’ve seen having issues a lot recently is Bluesky. Over the past month or so, the service has been pretty rough to use. A lot of folks in my feed are blaming these outages on vibe coding. But again, there are better explanations. One outage came down to a bit of an oopsie, though I definitely saw worse before LLMs existed. The other was caused by an extended attack on the platform.

I’m writing about this because I don’t think blaming everything on “vibe coding” is helpful. Here’s the thing: I think those who uncritically use LLMs to generate code are racking up a lot of technical debt. But I also think that using “vibe coding” as a knee-jerk bogeyman is missing the point. There are much simpler explanations for all of these things. Some of them can even reasonably be blamed on AI already!

More than that, I think there are likely more outages coming. Like it or not, LLMs are very good at generating code. They’ve also become great at finding issues with systems. cURL author Daniel Stenberg ended the project’s bug bounty program earlier this year after getting massive amounts of awful AI-generated vulnerability reports. He then started getting a flood of legit issues, followed by more, and is anticipating even more. Bad actors have the same access to these tools as well-meaning security folks.

Some people said Claude’s Mythos was simply hype, and I get it. Obviously major issues can be found with current models. But I don’t think this bodes well for anyone. Even if LLM pricing starts reflecting its actual costs, I worry that there might be more downtime to come.

¹ As I’m writing this, GitHub is currently experiencing another incident.
² Also, NPM.