anderegg.ca

Thoughts on the recent migration from X to Bluesky

October 21, 2024

A flood of users left X and moved to Bluesky over the weekend. I’ve written a bunch of times in the past about Bluesky, and have gone from grumpy frustration to general acceptance of the service. I’m happy that people are moving away from X, and I think Bluesky is a reasonable replacement. That said, I think new users should understand a bit about how the Bluesky system works. Some of the reasons that people are leaving X also exist on Bluesky.

I saw two big reasons for the most recent X exodus. The first is a change to the way blocking works: users you have blocked on X will now be able to see your posts, just not interact with them. The second is that X updated its terms of service to make it clearer that it will train its AI models on user data, and that it may also allow third parties to use that data for AI training.

Paul Frazee, a developer at Bluesky, posted a thread yesterday that gives a high-level overview of the AT Protocol (atproto). The whole thread is great. The gist: lots of servers (PDSs) have their content slurped up by other servers (Relays) and aggregated out to readers (AppViews). That last sentence is a massive simplification; there is a lot of complexity under the hood, but users don’t have to interact with it. Here’s an official docs page with another high-level overview and a nice diagram. I point to Paul’s thread because his RSS analogy is apt. It also gets at one of the main reasons I’m not fond of the tradeoffs atproto makes.
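To make the RSS analogy a little more concrete, here is a toy model of that data flow in Python. The `PDS`, `Relay`, and `AppView` classes and their methods are hypothetical stand-ins invented for illustration, not the real atproto API. The point is only the shape: each PDS just stores posts, the Relay pulls from every PDS it knows about, and the AppView builds a reader's feed out of the Relay's combined stream.

```python
from dataclasses import dataclass, field

# Toy model only: these classes are hypothetical and mimic the data flow
# described in the post, not the real atproto implementation.


@dataclass
class PDS:
    """A data server holding posts for one or more users."""
    posts: dict[str, list[str]] = field(default_factory=dict)

    def publish(self, user: str, text: str) -> None:
        self.posts.setdefault(user, []).append(text)


@dataclass
class Relay:
    """Like an RSS reader: knows every PDS and pulls all of their content."""
    known_servers: list[PDS] = field(default_factory=list)

    def crawl(self) -> list[tuple[str, str]]:
        # Walk every known server and every user on it, in order.
        return [
            (user, text)
            for pds in self.known_servers
            for user, texts in pds.posts.items()
            for text in texts
        ]


@dataclass
class AppView:
    """Turns the Relay's combined stream into what a reader sees."""
    relay: Relay

    def feed_for(self, follows: set[str]) -> list[str]:
        # Keep only posts from accounts the reader follows.
        return [f"{user}: {text}" for user, text in self.relay.crawl() if user in follows]
```

For example, with Alice on one PDS and Bob on a second, self-hosted one, an `AppView` built on a `Relay` that knows both servers shows a reader following both accounts Alice's post and then Bob's, even though the two PDSs never talk to each other or to the AppView directly.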

A Personal Data Server (PDS) is like a Git repository where one or more users might house their Bluesky content. It’s something people can self-host today. Though most users have their content on a PDS hosted by Bluesky, the idea is that you can move your PDS to another host if you’d like. But PDSs are just stores for user data; they don’t communicate directly with other parts of the Bluesky system. Instead, the Relay is responsible for crawling all PDSs and making their data available to AppViews. Because of this, all data¹ on PDSs needs to be public, which means that anyone you’ve blocked on Bluesky can still read all of your posts. This is part of the atproto design.
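Because PDS content is public, reading someone's posts doesn't require logging in at all. The sketch below assumes the public XRPC endpoint `com.atproto.repo.listRecords` (with its `repo`, `collection`, `limit`, and `cursor` parameters) and the `app.bsky.feed.post` record collection. The host `pds.example.com` and handle `alice.example.com` are hypothetical placeholders; a real client would first resolve the user's actual PDS host from their DID document. The request carries no credentials, so the server has no idea who is asking and nothing to check a block list against.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def list_records_url(pds_host: str, repo: str, collection: str = "app.bsky.feed.post",
                     limit: int = 50, cursor: Optional[str] = None) -> str:
    """Build an unauthenticated com.atproto.repo.listRecords URL."""
    params = {"repo": repo, "collection": collection, "limit": str(limit)}
    if cursor is not None:
        params["cursor"] = cursor  # continue from a previous page of results
    query = urllib.parse.urlencode(params)
    return f"{pds_host.rstrip('/')}/xrpc/com.atproto.repo.listRecords?{query}"


def post_texts(response: dict) -> list[str]:
    """Pull the text out of each post record in a listRecords response."""
    return [rec["value"].get("text", "") for rec in response.get("records", [])]


def fetch_posts(pds_host: str, repo: str) -> list[str]:
    """Fetch the first page of a user's posts.

    No Authorization header is sent, so the server cannot tell who is
    reading and has no block list to consult.
    """
    with urllib.request.urlopen(list_records_url(pds_host, repo)) as resp:
        return post_texts(json.load(resp))
```

Calling something like `fetch_posts("https://pds.example.com", "alice.example.com")` against a real host would work the same way whether or not Alice has blocked the person running it.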

Similarly, because the protocol was designed for Relays to be able to read all Bluesky content, any content on the service is available to anyone, including those who would use it for AI training. This is the “RSS” piece of the system. Relays need to know about all the PDSs, which makes it much easier to scrape Bluesky than to scrape the public web: all content from all users needs to exist as a feed, and the Relay needs to know where each of those feeds lives. This design makes it easy to grab all the content on the service, if you have the time, storage, and bandwidth. More than that, it’s trivial to download an export of any user’s Bluesky content, even if they’ve blocked you. I don’t have any proof that people are using Bluesky data for AI training, but I’d bet very heavily that it’s happening.
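Downloading a full export is similarly a single unauthenticated request. This sketch assumes the `com.atproto.sync.getRepo` endpoint, which returns a user's whole repository (every public record in it, such as posts, likes, and follows) as a CAR file. The host and the DID `did:plc:abc123` are hypothetical placeholders, and parsing the CAR file is left out.

```python
import shutil
import urllib.parse
import urllib.request


def get_repo_url(host: str, did: str) -> str:
    """Build a com.atproto.sync.getRepo URL for the given DID."""
    query = urllib.parse.urlencode({"did": did})
    return f"{host.rstrip('/')}/xrpc/com.atproto.sync.getRepo?{query}"


def download_repo(host: str, did: str, path: str) -> int:
    """Stream the repo export (a CAR file) to `path` and return its size in bytes."""
    # No Authorization header is sent: the export is public by design.
    with urllib.request.urlopen(get_repo_url(host, did)) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
        return out.tell()
```

Something like `download_repo("https://pds.example.com", "did:plc:abc123", "alice.car")` would save the export. Nothing in the request identifies the downloader, which is why a block doesn't stop it.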

I’m not writing this to dissuade people from using Bluesky. It’s definitely not a cesspool like X, which is reason enough to switch. It’s also an exciting take on decentralized social media. Heck, my new city councillor campaigned there, as Halifax has turned into a bit of a Bluesky bubble. I’m trying not to be a grump about this platform, but they’ve made several tradeoffs that I’m not so sure about. I’m very happy people are migrating from X, but I think that people should come to Bluesky with their eyes open. If your main concerns about X were blocked users still being able to see your posts, or your data being available to companies training AI, you should know that Bluesky doesn’t solve those specific issues.


  1. There are some things that PDSs keep private — which accounts you have muted, for instance. It’s still not clear to me how direct messages work on the server side, but these seem to be stored outside of the PDS. The main point is that all of your public-facing content on Bluesky needs to be accessible by anyone for the system to work.