Publish-Subscribe Pattern Explained: Decoupling Producers from Consumers at Scale

How pub/sub works — topics, subscriptions, message ordering, at-least-once delivery, and real-world patterns with Kafka, SNS, and Google Pub/Sub.

Tags: pub-sub, messaging, event-driven, kafka, distributed-systems

Publish-Subscribe Pattern

The publish-subscribe (pub/sub) pattern is a messaging paradigm where message producers (publishers) send messages to a topic without knowledge of which consumers (subscribers) will receive them, enabling loose coupling between components.

What It Really Means

In a direct communication model, Service A calls Service B's API. Service A must know Service B's address, API contract, and availability. If Service C also needs the same data, Service A must call both. If Service B is down, Service A must handle the failure. Each new consumer requires changes to the producer.

Pub/sub eliminates this coupling. Service A publishes an event — "OrderPlaced" — to a topic. It does not know or care who consumes it. Service B subscribes to that topic and processes new orders. Service C subscribes to the same topic and updates analytics. Service D subscribes and sends confirmation emails. The publisher's code never changes when consumers are added or removed.

This is fundamentally different from a point-to-point queue. In a queue, each message is consumed by exactly one consumer. In pub/sub, each message is delivered to every subscriber. This fan-out behavior is what makes pub/sub powerful for event-driven architectures where multiple systems need to react to the same event.
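The difference between the two delivery models can be sketched in a few lines of in-memory Python. This is only an illustration: `PointToPointQueue`, `Topic`, and the worker and subscriber names are hypothetical, and a real broker adds persistence, networking, and acknowledgements.

```python
class PointToPointQueue:
    """Each message goes to exactly one registered consumer (round-robin here)."""

    def __init__(self):
        self._consumers = []
        self._next = 0

    def register(self, handler):
        self._consumers.append(handler)

    def send(self, message):
        # Pick a single consumer for this message.
        handler = self._consumers[self._next % len(self._consumers)]
        self._next += 1
        handler(message)


class Topic:
    """Each message goes to every subscriber (fan-out)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        # The publisher never learns who, or how many, received the message.
        for handler in self._subscribers:
            handler(message)


queue_log = {"worker-1": [], "worker-2": []}
topic_log = {"orders": [], "analytics": []}

queue = PointToPointQueue()
for received in queue_log.values():
    queue.register(received.append)

topic = Topic()
for received in topic_log.values():
    topic.subscribe(received.append)

for event in ["order-1", "order-2"]:
    queue.send(event)
    topic.publish(event)
```

After the loop, each queue worker holds one of the two events, while each topic subscriber holds both.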

How It Works in Practice

Message Flow
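One way to picture the flow is publish, store, pull, acknowledge, with every subscription tracking its own position. The sketch below assumes a pull-based broker with an append-only log per topic; the `Broker` class and its method names are hypothetical and do not correspond to any particular product's API.

```python
class Broker:
    """Toy broker: an append-only log per topic and one cursor per subscription."""

    def __init__(self):
        self._log = {}      # topic -> list of messages
        self._offsets = {}  # (topic, subscription) -> index of next unacked message

    def subscribe(self, topic, subscription):
        # In this sketch a new subscription starts at the current end of the log.
        self._offsets[(topic, subscription)] = len(self._log.setdefault(topic, []))

    def publish(self, topic, message):
        # The publisher only names the topic, never a subscriber.
        self._log.setdefault(topic, []).append(message)

    def pull(self, topic, subscription):
        offset = self._offsets[(topic, subscription)]
        log = self._log[topic]
        return log[offset] if offset < len(log) else None

    def ack(self, topic, subscription):
        # Acknowledging moves this subscription's cursor; others are unaffected.
        if self._offsets[(topic, subscription)] < len(self._log[topic]):
            self._offsets[(topic, subscription)] += 1


broker = Broker()
broker.subscribe("orders", "email")
broker.subscribe("orders", "analytics")
broker.publish("orders", "OrderPlaced#1")
broker.publish("orders", "OrderPlaced#2")

# The email subscription pulls and acknowledges both messages.
email_first = broker.pull("orders", "email")
broker.ack("orders", "email")
email_second = broker.pull("orders", "email")
broker.ack("orders", "email")
email_third = broker.pull("orders", "email")

# The analytics subscription pulls without acknowledging, so it sees the same message again.
analytics_first = broker.pull("orders", "analytics")
analytics_retry = broker.pull("orders", "analytics")
```

Because each subscription has its own cursor, a slow or failing subscriber falls behind without holding back the others.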

Delivery Semantics

At-most-once: Message delivered zero or one times. Fast but lossy — acceptable for metrics or logging where occasional data loss is tolerable.

At-least-once: Message delivered one or more times. The most common guarantee. Subscribers must be idempotent because they may process the same message more than once.

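Exactly-once: Message delivered exactly one time. Extremely difficult in distributed systems. Kafka approaches this with idempotent, transactional producers that commit consumer offsets inside the same transaction as the output writes, at a performance cost. The guarantee covers read-process-write pipelines within Kafka; side effects in external systems (emails, third-party API calls) still need idempotent handling.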
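In practice, the difference between at-most-once and at-least-once often comes down to whether the message is acknowledged before or after it is processed. The simulation below is a sketch under that assumption; `FlakyHandler` and both delivery functions are hypothetical names, and the "crash" is just a raised exception.

```python
class FlakyHandler:
    """Fails on its first call, then succeeds (simulates a subscriber crash)."""

    def __init__(self):
        self.calls = 0
        self.processed = []

    def __call__(self, message):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated crash")
        self.processed.append(message)


def deliver_at_most_once(messages, handler):
    """Ack before processing: a failure means the message is gone."""
    for message in messages:
        # The message is considered delivered as soon as it is handed over.
        try:
            handler(message)
        except RuntimeError:
            pass  # already acked, so nothing is redelivered


def deliver_at_least_once(messages, handler, max_attempts=3):
    """Ack after processing: a failure triggers redelivery."""
    dead_letters = []
    for message in messages:
        for _attempt in range(max_attempts):
            try:
                handler(message)
                break  # processed, so it is now safe to ack
            except RuntimeError:
                continue  # not acked, so the message is delivered again
        else:
            dead_letters.append(message)  # gave up after max_attempts
    return dead_letters


lossy = FlakyHandler()
deliver_at_most_once(["m1", "m2"], lossy)

reliable = FlakyHandler()
dead = deliver_at_least_once(["m1", "m2"], reliable)
```

Here `lossy.processed` ends up missing `m1`, while `reliable.processed` contains both messages. The cost of the second approach is that a crash after processing but before the ack produces a duplicate, which is why at-least-once consumers need to be idempotent.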

Ordering Guarantees
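Partitioned brokers such as Kafka order messages only within a partition, so a common technique is to route related events by key so that they land on the same partition. The sketch below illustrates the idea with a CRC32 hash; this is a stand-in chosen for simplicity, not the hash any particular broker uses, and `partition_for` and the sample events are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (same key, same partition)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("order-1", "placed"),
    ("order-2", "placed"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
    ("order-1", "shipped"),
]

for key, event in events:
    partitions[partition_for(key)].append((key, event))


def events_for(key: str):
    """Read back one key's events from the single partition that holds them."""
    return [event for k, event in partitions[partition_for(key)] if k == key]
```

Events for one order stay in publish order because they share a partition. Nothing is guaranteed about the relative order of events for different orders that land on different partitions.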

Real System: E-commerce Event Bus

Implementation

Publishing events (Python with Kafka):

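A minimal sketch of publishing an "OrderPlaced" event, assuming the kafka-python client (`pip install kafka-python`) and a broker at `localhost:9092`. The topic name `orders`, the event fields, and the helper names are illustrative assumptions, not a fixed schema.

```python
import json
import uuid
from datetime import datetime, timezone

TOPIC = "orders"  # hypothetical topic name


def build_order_placed(order_id, customer_id, total_cents):
    """Build an event envelope; event_id lets consumers detect duplicates."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": "OrderPlaced",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "order_id": order_id,
        "customer_id": customer_id,
        "total_cents": total_cents,
    }


def make_producer(bootstrap_servers="localhost:9092"):
    """Create a kafka-python producer (imported lazily so the helpers work without it)."""
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        acks="all",  # wait for all in-sync replicas before treating a send as successful
        retries=5,
        key_serializer=lambda key: key.encode("utf-8"),
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
    )


def publish_order_placed(producer, event):
    """Send the event keyed by order_id so one order's events share a partition."""
    return producer.send(TOPIC, key=event["order_id"], value=event)


def main():
    producer = make_producer()
    future = publish_order_placed(producer, build_order_placed("order-123", "cust-9", 4999))
    future.get(timeout=10)  # block until the broker confirms, or raise on failure
    producer.flush()
```

Calling `main()` requires a reachable broker. Note that the publisher names only the topic; nothing in this code refers to the services that will consume the event.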

Consuming events with idempotency:

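A minimal sketch of an idempotent consumer, again assuming the kafka-python client and that each event carries a unique `event_id`. `InMemoryProcessedStore` is a hypothetical stand-in for a durable store (for example, a database table with a unique constraint on the event ID), and the group ID `email-service` is illustrative.

```python
import json


class InMemoryProcessedStore:
    """Hypothetical stand-in for a durable record of processed event IDs."""

    def __init__(self):
        self._seen = set()

    def contains(self, event_id):
        return event_id in self._seen

    def add(self, event_id):
        self._seen.add(event_id)


def handle_once(event, store, side_effect):
    """Apply side_effect only if this event_id has not been processed before."""
    if store.contains(event["event_id"]):
        return False  # duplicate delivery: skip
    side_effect(event)
    # In production, the side effect and this write should be one atomic
    # transaction; otherwise a crash between them still causes a duplicate.
    store.add(event["event_id"])
    return True


def run_consumer(store, side_effect, bootstrap_servers="localhost:9092"):
    """Consume the 'orders' topic with manual commits (at-least-once)."""
    from kafka import KafkaConsumer  # imported lazily; requires kafka-python

    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers=bootstrap_servers,
        group_id="email-service",
        enable_auto_commit=False,  # commit only after successful processing
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for record in consumer:
        handle_once(record.value, store, side_effect)
        consumer.commit()  # a crash before this line means redelivery
```

Committing offsets after processing gives at-least-once delivery, and the `event_id` check turns redelivered messages into no-ops.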

AWS SNS + SQS fan-out pattern:

SNS handles fan-out (one message to many subscribers). SQS provides durability and retry (each subscriber processes at its own pace).
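The wiring for this pattern can be sketched with boto3 as follows. The topic and queue names are hypothetical, and the clients are passed in as parameters so the functions can be exercised without AWS credentials. Each queue needs an access policy that allows the topic to send to it, which the sketch sets explicitly.

```python
import json


def allow_sns_policy(queue_arn, topic_arn):
    """Queue policy letting one specific SNS topic send messages to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def create_fanout(sns, sqs, topic_name, queue_names):
    """Create one topic and subscribe one queue per consuming service."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    for name in queue_names:
        queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]
        queue_arn = sqs.get_queue_attributes(
            QueueUrl=queue_url, AttributeNames=["QueueArn"]
        )["Attributes"]["QueueArn"]
        sqs.set_queue_attributes(
            QueueUrl=queue_url,
            Attributes={"Policy": allow_sns_policy(queue_arn, topic_arn)},
        )
        sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    return topic_arn


def publish_event(sns, topic_arn, event):
    """Publish once; SNS copies the message to every subscribed queue."""
    return sns.publish(TopicArn=topic_arn, Message=json.dumps(event))


def main():
    import boto3  # requires AWS credentials and a configured region

    sns, sqs = boto3.client("sns"), boto3.client("sqs")
    topic_arn = create_fanout(sns, sqs, "order-events", ["email-queue", "analytics-queue"])
    publish_event(sns, topic_arn, {"event_type": "OrderPlaced", "order_id": "order-123"})
```

Unless raw message delivery is enabled on the subscription, each SQS message body is an SNS JSON envelope with the published payload under its `Message` field, so consumers need to unwrap it.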

Trade-offs

Benefits:

  • Publishers and subscribers are fully decoupled — add/remove consumers without changing producers
  • Fan-out: one event triggers multiple independent processing paths
  • Load leveling: subscribers process messages at their own rate
  • Fault isolation: a slow subscriber does not block other subscribers

Costs:

  • Message ordering is hard — most systems provide only partition-level ordering
  • Debugging is harder — tracing a message through multiple subscribers requires distributed tracing
  • Eventual consistency — subscribers process events asynchronously
  • Duplicate messages are common — subscribers must be idempotent

When to use pub/sub:

  • Multiple services need to react to the same event
  • You want to add new consumers without modifying existing services
  • Event-driven architectures with asynchronous processing
  • Fan-out: notifications, analytics, audit logging

When to use point-to-point queues instead:

  • Each message should be processed by exactly one consumer
  • Work distribution (task queue pattern)
  • Load balancing across worker instances

Common Misconceptions

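  • "Pub/sub guarantees message delivery" — The guarantee depends on the system and how it is configured. Under at-most-once delivery, messages can be lost; under at-least-once delivery, which most systems default to, they can be duplicated. Exactly-once is rarely available end to end, so build consumers to be idempotent.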
  • "Pub/sub and message queues are the same thing" — Queues deliver each message to one consumer. Pub/sub delivers each message to every subscriber. They solve different problems.
  • "Kafka is pub/sub" — Kafka supports both patterns, and the consumer group ID decides which one you get. Consumers that share a group ID divide the topic's partitions among themselves, so each message goes to one member of the group (queue behavior). Consumers in different groups each receive every message (pub/sub behavior).
  • "Pub/sub means real-time" — There is always some latency: publishing, broker processing, delivery, subscriber processing. For Kafka, end-to-end latency is typically 5-50ms. For SNS/SQS, it can be 100ms-1s.
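The consumer-group behavior behind the Kafka misconception above can be simulated without a broker. This is a simplified sketch: the round-robin assignment is an assumption made for illustration (Kafka's real partition assignors are more involved), and the group and member names are hypothetical.

```python
def assign_partitions(partitions, members):
    """Spread partitions round-robin across the members of ONE group."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


def deliver(topic_partitions, groups):
    """Return {group_id: {member: [messages]}} for a partitioned topic."""
    result = {}
    for group_id, members in groups.items():
        # Every group gets its own assignment over ALL partitions.
        assignment = assign_partitions(sorted(topic_partitions), members)
        result[group_id] = {
            member: [msg for p in parts for msg in topic_partitions[p]]
            for member, parts in assignment.items()
        }
    return result


topic_partitions = {0: ["a", "c"], 1: ["b", "d"]}
groups = {
    "billing": ["billing-1", "billing-2"],  # two members share the work
    "analytics": ["analytics-1"],           # a separate group sees everything
}

received = deliver(topic_partitions, groups)
```

Within the `billing` group each message reaches exactly one member, while the `analytics` group independently receives all four messages.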

How This Appears in Interviews

  1. "Design a notification system" — Classic pub/sub: events published to topics, subscribers for push notifications, email, SMS. Discuss fan-out, delivery guarantees, and retry.
  2. "How do microservices communicate asynchronously?" — Pub/sub for event-driven communication. Compare with request/response (synchronous) and point-to-point queues.
  3. "How do you ensure a message is processed exactly once?" — Explain why exactly-once is hard, at-least-once with idempotent consumers is the practical solution, and how Kafka's transactional API approaches exactly-once.
  4. "Design a real-time analytics pipeline" — Kafka topics for ingestion, consumer groups for parallel processing, fan-out to multiple analytics consumers.
