Take a minute. I'd like you to design the notification fan-out for a service at Twitter's scale. Roughly a billion users, a mix of celebrities and ordinary accounts. Start wherever you like.
Okay. So before I go anywhere, I want to clarify the read pattern. Are we optimizing for fanout-on-write — push notifications into a per-user inbox at the moment a tweet is posted — or fanout-on-read, where the inbox is materialized when the user opens the app?
Good question. Assume both will be present in the real system. Tell me where you'd draw the line.
Right. So my instinct is to set up Kafka in front of a fanout worker pool. Tweets land in a partitioned topic keyed on the author's user ID, then the worker reads off each partition and writes into a per-follower inbox — probably a Redis-backed inbox cache with a hot tier and a cold tier in Cassandra. The read path is then just an inbox-by-user-id query at the edge.
Before you draw the boxes — what's the read-to-write ratio you're designing against?
Fair. Let me back up. At one billion daily-active users with a power-law follower distribution, the read side dominates — call it twenty reads per write at the median account and a hundred reads per write at the celebrity tier. So the system is read-dominated, and the cost question is whether we materialize the inbox at write time and pay storage, or compute it at read time and pay latency