Kafka is an open-source distributed event streaming platform adopted by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. At its core it is a publish/subscribe system: producers write messages to topics, and consumers read from the topics they care about. It extends far beyond traditional message queues like ActiveMQ or RabbitMQ.

⚡ Quick Takeaways
  • Log, not a queue — messages persist after being read; each consumer tracks its own offset.
  • Partitions give horizontal scale and per-key ordering.
  • Consumer groups give parallel reads — one partition per member.
  • acks + ISR + idempotence tune durability all the way up to exactly-once.
  • Fast because of sequential writes, page cache, and zero-copy.
tldr

Kafka is a durable, partitioned, replicated commit log. Producers append records to partitions, consumers pull at their own pace, and replication + acks give you tunable durability — from fire-and-forget to exactly-once.

Major Use Cases

Pub/Sub Model

Different categories of messages flow into distinct topics from various producers. Unlike traditional queues, messages persist after consumption — Kafka manages the lifecycle independently of any single consumer, so multiple consumers can read the same topic at different offsets.

Producer Architecture

The producer calls .send() with a ProducerRecord (a message wrapper). The flow:

Consumer Model

Kafka uses a pull model — consumers request data on their own schedule, giving more decoupling than push. Many independent consumers can read the same topic, but consumers within a single consumer group cannot share a partition: the group behaves as one logical subscriber, and each partition is owned by exactly one member.

Partitions & Replicas

Parallelism comes from splitting a topic into partitions spread across servers. The DefaultPartitioner uses hash(key) % num_partitions; custom implementations override partition().

Each partition is an append-only log segmented at 1 GB, with a sparse index record every 4 KB for fast lookups. Retention (log.retention.hours/minutes/ms, default 7 days) deletes or compacts old data based on log.cleanup.policy. Replication guards against single-point failures via a leader/follower scheme.

System Reliability

acks

This producer-side setting controls how many brokers must receive a record before the write is considered successful — the core durability/throughput tradeoff:

SettingBehavior
acks=0Success the moment the record is sent — no broker response awaited. Fastest, least durable.
acks=1Success once the leader receives the record.
acks=all (-1)Success only after all in-sync replicas (ISR) acknowledge. If ISRs drop offline, streaming blocks. min.insync.replicas=n guarantees a minimum number of copies.
reliable transmission

For durable delivery: acks=-1, num_partitions > 1, and min.insync.replicas > 1.

Retries & Delivery Semantics

RETRIES_CONFIG defaults to MAX_INT for Kafka ≥ 2.1. Retries risk duplication unless managed:

Replicas

Multiple followers sync from the leader. Producers only write to the leader; followers fetch from it. The ISR lists in-sync followers; a follower lagging beyond replica.lag.time.max.ms moves to the OSR. ISR + OSR = Assigned Replicas (AR). On leader failure, a new leader is elected from the ISR. Each replica tracks its LEO (Log End Offset) to measure sync progress.

High-Speed Read/Write

ZooKeeper

A coordination service for distributed systems. With Kafka it tracks cluster node status and topic/message metadata. Five primary functions:

CLI Examples

Create a topic with 2 partitions and replication factor 3:

bash
# test.config → bootstrap.servers=localhost:9092
kafka-topics \
  --bootstrap-server localhost:9092 \
  --command-config ./test.config \
  --topic test1 \
  --create \
  --replication-factor 3 \
  --partitions 2

Produce keyed messages, then consume from the beginning:

bash
# producer
kafka-console-producer \
  --topic test1 \
  --broker-list localhost:9092 \
  --property parse.key=true \
  --property key.separator=:
# > key:{"val":0}

# consumer
kafka-console-consumer \
  --topic test1 \
  --bootstrap-server localhost:9092 \
  --property print.key=true \
  --from-beginning

Consumer Groups and Rebalancing

A consumer group is the unit of parallel consumption in Kafka. Every consumer instance in a group is assigned a disjoint subset of partitions, and the group together consumes the full topic. Add a sixth consumer to a five-partition topic and that consumer will be idle — there are no spare partitions to assign. The constraint is deliberately simple: it means each partition has exactly one authoritative offset per group, and progress is never ambiguous.

The Rebalance Protocol

A rebalance is triggered whenever group membership changes: a new consumer joins, an existing one crashes or exceeds session.timeout.ms, a new partition is added to the topic, or an application calls unsubscribe(). During a rebalance, the Group Coordinator (a broker elected for each consumer group) stops all consumption, revokes partition assignments, then redistributes them. The cost is a pause — sometimes several seconds — during which no messages are processed. Minimizing rebalance frequency and duration is therefore a major operational concern.

properties
# Tune these to reduce unnecessary rebalances
session.timeout.ms=45000        # how long before a silent consumer is declared dead
heartbeat.interval.ms=15000     # must be < session.timeout.ms / 3
max.poll.interval.ms=300000    # max time between poll() calls before consumer is dropped
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor

Eager vs. Cooperative Rebalancing

The classic eager rebalance (used by RangeAssignor and RoundRobinAssignor) revokes all partition assignments before any redistribution begins — a global stop-the-world pause. Kafka 2.4+ introduced the cooperative sticky rebalance protocol (CooperativeStickyAssignor): consumers only revoke partitions that need to move, while keeping the ones they retain. This typically eliminates the pause entirely for consumers not involved in the reassignment, and keeps partition assignments "sticky" to minimize unnecessary state migration.

pitfall

max.poll.interval.ms is the silent killer. If your poll() loop takes longer than this (e.g., a slow database write downstream of message processing), the broker assumes the consumer is dead and triggers a rebalance. The fix: either increase the timeout or, better, move slow work to a separate thread and keep the poll loop tight.

Static Group Membership

Kafka 2.3+ added group.instance.id, which gives a consumer a persistent identity across restarts. With static membership, a consumer that crashes and rejoins within session.timeout.ms reclaims its old partitions without a rebalance — an important optimization for container workloads where pods restart frequently during deployments.

Log Compaction

Retention in Kafka comes in two flavors. The default delete policy simply removes segments older than log.retention.hours (or exceeding a size threshold). Log compaction is fundamentally different: instead of discarding messages by time, it retains only the most recent message for each key, creating a compacted log that behaves like an eventually-consistent key-value store.

bash
# Create a compacted topic for a user-profile changelog
kafka-topics \
  --bootstrap-server localhost:9092 \
  --topic user-profiles \
  --create \
  --partitions 12 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1 \
  --config segment.ms=86400000    # roll a new segment daily

How the Log Cleaner Works

The broker's background log cleaner threads scan the "dirty" (recently written) portion of each partition. They build an offset map in memory (key → latest offset), then copy only the records from the dirty portion that are the latest for their key into a new "clean" segment, discarding older duplicates. The result: the compacted log grows only with distinct keys, not with time.

A special tombstone message — a record with a non-null key and a null value — signals deletion. After compaction, even the tombstone is eventually removed (after delete.retention.ms), so consumers that read the compacted log see the key fully erased.

use case

Log compaction powers Kafka Streams' state stores and changelog topics. When a KTable is rebuilt after a crash, the application replays only the compacted log — not the full history — to reconstruct current state. Database CDC topics (Debezium) also use compaction so downstream consumers can bootstrap from the latest row image per primary key.

KRaft — Kafka Without ZooKeeper

ZooKeeper served Kafka for years as its cluster coordination layer: controller election, broker membership, topic metadata. But it introduced a separate operational dependency, a separate scaling concern, and a bottleneck during cluster startup. KRaft (Kafka Raft Metadata mode), introduced in Kafka 2.8 and production-ready from Kafka 3.3, replaces ZooKeeper entirely with Kafka's own built-in Raft consensus implementation.

Architecture Shift

Under KRaft, a subset of brokers are designated controllers. They run a dedicated metadata partition (topic __cluster_metadata) replicated via the Raft protocol. The active controller is the Raft leader. All cluster metadata — broker registrations, topic configurations, partition leadership — lives in this single log rather than spread across ZooKeeper znodes.

properties
# KRaft server.properties (combined broker+controller node)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
log.dirs=/var/lib/kafka/data

KRaft Benefits

migration path

Kafka 3.5+ supports a migration mode that lets you convert an existing ZooKeeper-based cluster to KRaft without downtime: run both coordinators simultaneously during the transition, then decommission ZooKeeper once all metadata is in KRaft. Never skip this dual-write phase in production.

Performance Tuning Parameters

Kafka's defaults are conservative. Real-world tuning targets three dimensions: producer throughput, consumer lag, and durability guarantees. The following parameters are the levers that matter most.

Producer Tuning

properties
# Throughput-optimized producer
acks=all                   # durable writes to all ISR replicas
enable.idempotence=true    # deduplicate retries, exactly-once within producer session
linger.ms=5               # wait up to 5ms to batch records; 0 = send immediately
batch.size=65536          # 64 KB batch; larger = fewer round trips, more latency
compression.type=lz4     # lz4 for best CPU/ratio tradeoff; snappy also common
buffer.memory=67108864   # 64 MB in-memory buffer; increase for high-volume producers
max.in.flight.requests.per.connection=5  # >1 is fine with idempotence enabled

Consumer Tuning

properties
fetch.min.bytes=1024         # wait until 1 KB available; reduces fetch round trips
fetch.max.wait.ms=500        # max wait for fetch.min.bytes to be met
max.poll.records=500         # records per poll(); reduce if processing is slow
auto.offset.reset=earliest  # on first run: start from oldest; latest = tail only
enable.auto.commit=false    # manual commit for exactly-once processing semantics

Broker and Topic Tuning

properties
# broker-level
num.io.threads=8             # number of I/O threads; set to disk count
num.network.threads=3       # network request threads
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
log.flush.interval.messages=10000  # fsync every 10K msgs; leave to OS for best perf

# topic-level overrides via kafka-configs
kafka-configs --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name orders \
  --add-config retention.ms=604800000,min.insync.replicas=2

Kafka vs. RabbitMQ vs. Apache Pulsar

Choosing the right messaging system is one of the highest-leverage architecture decisions you can make. Kafka, RabbitMQ, and Pulsar each occupy a different point in the design space.

DimensionKafkaRabbitMQPulsar
Message modelDurable log — persists after consumption; consumers replay at any offsetQueue — message is deleted after acknowledgementLog + queue unified: topics backed by a durable log, queues as a cursor abstraction
ThroughputMillions of msg/sec per cluster via sequential disk I/O~50K–100K msg/sec per queue (memory-bound)Comparable to Kafka via Apache BookKeeper storage
LatencyLow single-digit ms with tuned batching; higher with lingerSub-millisecond p50 (no batching overhead)5–15 ms typical; geo-replication adds more
RoutingTopic + partition key; no content-based routingRich routing via exchanges (direct, fanout, topic, headers)Namespaces + topic hierarchy; no exchange routing
ReplayNative — seek to any offset within retention windowNot supported — consumed messages are goneNative — backed by BookKeeper ledgers
Multi-tenancyNamespaces (limited)Virtual hostsFirst-class: tenants → namespaces → topics with quotas
Geo-replicationMirrorMaker 2 (async)Shovel / Federation pluginsBuilt-in, synchronous cross-datacenter replication
Ops complexityModerate (KRaft removes ZK dependency)Low — excellent management UI, easy setupHigh — Kafka + ZooKeeper + BookKeeper triad historically
Best forEvent streaming, CDC, audit logs, high-throughput pipelinesTask queues, RPC patterns, complex routing, low latency jobsMulti-tenant SaaS, native geo-replication, tiered storage at petabyte scale
decision guide

Pick Kafka when you need replay, high throughput, CDC, or event sourcing. Pick RabbitMQ for complex routing logic, task queues, or sub-millisecond latency requirements. Pick Pulsar when you need built-in geo-replication across datacenters or true multi-tenancy with per-namespace quota isolation — at the cost of significantly higher operational complexity.

Kafka Streams and Kafka Connect

Kafka is not just a message broker — it is the foundation for an entire streaming processing ecosystem. Two components extend it furthest: Kafka Streams for stream processing inside your JVM application, and Kafka Connect for plug-and-play data pipeline connectors.

Kafka Streams

Kafka Streams is a lightweight client library (not a cluster) for building stream processing applications. Unlike Apache Flink or Spark Streaming, there is no separate processing cluster to manage — the library runs inside your application process, reads from input topics, and writes results to output topics. Fault tolerance is achieved by storing state (windowed aggregations, join tables) in RocksDB on disk, backed by a Kafka changelog topic for recovery after crashes.

java
StreamsBuilder builder = new StreamsBuilder();

// read from "orders" topic, group by userId, count over 5-minute windows
KStream<String, Order> orders = builder.stream("orders");
orders
  .groupByKey()
  .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
  .count()
  .toStream()
  .to("orders-per-user-5min");  // write results back to Kafka

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

Key Kafka Streams concepts:

Kafka Connect

Kafka Connect is a scalable, fault-tolerant framework for moving data between Kafka and external systems — databases, object stores, search engines, SaaS APIs — without writing custom code. Connectors come in two flavors: source connectors read from an external system and write to Kafka (e.g., Debezium PostgreSQL CDC, S3 source), and sink connectors read from Kafka and write to an external system (e.g., Elasticsearch sink, Snowflake sink, JDBC sink).

json
{
  "name": "postgres-orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.prod.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.dbname": "orders",
    "table.include.list": "public.orders,public.inventory",
    "topic.prefix": "cdc",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_orders",
    "publication.name": "dbz_publication"
  }
}

Connect runs as a cluster of workers that distribute connector tasks across themselves. If a worker fails, its tasks are reassigned to surviving workers — exactly the same consumer group rebalance mechanism. This means Connect scales horizontally by adding workers and survives node failures without operator intervention.

Schema Registry and Avro

Raw bytes in Kafka topics become a maintenance nightmare as schemas evolve. The Confluent Schema Registry (or AWS Glue Schema Registry) stores Avro, Protobuf, or JSON Schema definitions and assigns each version a numeric ID. Producers serialize messages by embedding a 4-byte schema ID in the payload prefix; consumers look up the schema by ID to deserialize. This enables schema evolution with compatibility checks: the registry rejects a new schema version that is backward- or forward-incompatible with the registered versions, preventing silent data corruption across independently deployed services.

avro
{
  "type": "record",
  "name": "OrderEvent",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId",  "type": "string" },
    { "name": "userId",   "type": "string" },
    { "name": "totalCents", "type": "long" },
    { "name": "currency", "type": "string", "default": "USD" }
  ]
}

Common Pitfalls and Best Practices

Partition Count — Getting It Right

Partitions are Kafka's unit of parallelism and they cannot be decreased once created (only increased). Choosing too few caps your consumer throughput ceiling; choosing too many wastes memory and increases rebalance time. A rough starting point: aim for 1–3 MB/sec throughput per partition based on your broker disk bandwidth, and target roughly 1 partition per consumer instance you expect to run at peak. A common mistake is creating topics with a single partition to "keep things simple" — this makes every consumer in the group idle except one.

Offset Management

With enable.auto.commit=true, offsets are committed periodically regardless of whether the consumer actually processed the message. A crash between the commit and the downstream write means the message is permanently skipped. Use enable.auto.commit=false and commit offsets only after your processing is complete and durable. For exactly-once, wrap the downstream write and the offset commit in a Kafka transaction.

Consumer Lag Monitoring

Consumer lag — the gap between the latest produced offset and the consumer's committed offset — is the primary operational health metric for a Kafka consumer. A growing lag that doesn't recover signals either a slow consumer or a sudden production spike. Monitor it with Kafka's built-in consumer group describe, Confluent's Kafka Consumer Lag Exporter, or LinkedIn's Burrow.

bash
# Check consumer group lag across all partitions
kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group order-processing-group \
  --describe

# Sample output
# GROUP                   TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
# order-processing-group  orders  0          1023440         1023441         1
# order-processing-group  orders  1          998772          999201          429  ← lag growing!

Avoiding Common Anti-Patterns

Real-World Usage Scenarios

Understanding why organizations reach for Kafka — rather than a simpler queue — solidifies when to recommend it in an interview.

takeaway

Think of Kafka as a replicated log, not a queue. Partitions give you horizontal scale and ordering-per-key; consumer groups give you parallel reads; and acks + ISR + idempotence let you dial durability from fire-and-forget all the way to exactly-once. KRaft removes ZooKeeper from the equation; log compaction turns topics into a durable KV changelog; Kafka Streams and Connect complete a full streaming ecosystem without external dependencies.

🎯 interview hot-takes

How does Kafka guarantee exactly-once? Idempotent producer (PID + sequence number prevents duplicates on retry) plus transactions that wrap beginTransaction()/commitTransaction() around produce + offset-commit atomically.
Why is Kafka so fast? Sequential append-only writes + OS page cache (avoids double buffering) + zero-copy sendfile syscall + batch compression.
What triggers a consumer group rebalance? Member joins/leaves, session timeout exceeded (session.timeout.ms), max poll interval breached (max.poll.interval.ms), or partition count changes. Use CooperativeStickyAssignor to make rebalances incremental instead of stop-the-world.
Log compaction vs. retention? Retention deletes old segments by time/size; compaction retains only the latest record per key — effectively a durable KV changelog. Null-value tombstones signal deletes.
Kafka vs. RabbitMQ? Kafka for high-throughput event streaming, replay, and CDC; RabbitMQ for complex routing patterns, task queues, and sub-millisecond latency requirements.
What is KRaft and why does it matter? Kafka's built-in Raft metadata layer that replaces ZooKeeper, removing the separate dependency, enabling millions of partitions, and cutting cluster startup from minutes to seconds.

← back to
Tech Stacks