At bar on and — strong problem decomposition once you got there. Approaching on Framing; you went architecture-first before quantifying load, and the interviewer had to pull the read/write ratio out of you. Pacing held; Evidence is the next thing to work on.
Strong problem decomposition; missed bottleneck framing — went straight to architecture before quantifying load. The interviewer had to ask for the read/write ratio at 1:26, which is the candidate's job to volunteer, not the room's job to extract.
You opened with a clarification — separating fanout-on-write from fanout-on-read — which is a strong move, because it forces the design conversation to commit to a regime before any boxes get drawn. Then you drew the boxes. You sketched Kafka, a fan-out worker pool, a Redis inbox cache, and a Cassandra cold tier before the page had a single number on it.
Strong candidates land a read/write ratio inside the first ninety seconds. The ratio is the anchor — it tells the interviewer (and you) which side of the system is going to dominate cost, and therefore which design choices are load-bearing and which are decoration. Without the anchor, the rest of the answer drifts in the direction of whatever architecture the candidate already had in their head.
"Right. So my instinct is to set up Kafka in front of a fan-out worker pool. Tweets land in a partitioned topic keyed on the author's user ID, then the worker reads off each partition and writes into a per-follower inbox — probably a Redis-backed inbox cache with a hot tier and a cold tier in Cassandra."
State the ratio upfront. "At one billion users with a power-law follower distribution, this is read-dominated — call it twenty reads per write at the median user, a hundred at the celebrity tier." Then clarify fanout-on-write vs fanout-on-read. The number is the anchor that lets the interviewer trust the rest of the answer.
Once the load was quantified, the decomposition was crisp. You named the celebrity-tier problem before the interviewer asked and proposed a threshold-tunable hybrid — push under ten thousand followers, pull above — which is the textbook staff-level answer to this drill.
Your discussion of the partitioning key was the highest-signal moment in the session. You explicitly contrasted keying on author ID (which simplifies the write path but creates hotspots on celebrity accounts) against keying on follower ID (which spreads write load but multiplies the number of writes). That trade-off is the heart of this question, and you found it without prompting.
"The line I'd draw is at follower count. For accounts under, say, ten thousand followers, fanout-on-write into per-user inboxes is fine — that's the ninety-five percent case and the inbox write is bounded. For accounts above that — the celebrity tier — we flip to fanout-on-read and merge their tweets in at read time. The threshold is tunable."
Numbers showed up late, and only one of them was load-bearing. At staff level, the room expects two or three quantitative anchors — peak QPS, storage cost per user-day, p99 latency budget — that the candidate can defend.
You quoted "twenty reads per write" at 01:38 and then never produced another number until 18:10, when you ballparked the inbox cache size at "a few hundred gigabytes per ten million users". That was the only storage figure you offered, and you offered it without the back-of-envelope arithmetic that would let the interviewer audit it. The number is correct, but a staff candidate shows their work.
You also did not name a p99 read latency target. The interviewer noticed and steered the next question toward latency at 22:04; you adapted, but you let them set the bar instead of setting it yourself.
After the read/write ratio, name three numbers and defend them on screen: peak write QPS (Twitter sustained roughly five thousand tweets per second at the relevant scale), median inbox depth for a long-tail user (~2,400 unread items), and a target p99 read latency under 150 ms. Numbers are how the room trusts you.
When the interviewer pushed back on the thundering-herd problem at 11:18, you paused, restated the failure mode in your own words, and proposed a per-partition concurrency cap before the room had to suggest one. That is what recovery looks like.
The earlier slip — going architecture-first before quantifying load — would have hurt a senior candidate. You corrected it inside one turn and explicitly backed up: "Fair. Let me back up." The room reads that as a senior-level habit, not a stumble.
"Right, so if a celebrity tweets and we try to fan out to ten million followers synchronously, we're going to thunder the inbox cache. The fix is to bound concurrency per partition — say, four hundred concurrent fan-out workers per partition, and let the partition itself queue. We trade fan-out latency for stability."
Twenty-eight minutes was inside the room's budget. You held silence on three follow-ups, which is harder than it looks under pressure — most candidates fill the silence and lose signal in the noise.
The only pacing flag: forty-one seconds spent on clarification before the first number. Strong candidates compress that opening to about twenty-five seconds and bank the remainder for the failure-mode discussion at the end. You arrived at the failure-mode discussion with four minutes left, which was enough but not comfortable. Aim to be there with eight.