A message queue is a durable buffer that decouples the producer of a message from its consumer. The producer sends a message to the queue and moves on; one or more consumers process it asynchronously, at their own pace. This asynchronous handoff is what makes message queues a fundamental building block for resilient, loosely-coupled distributed systems. But message queuing is not a single, monolithic concept — it spans a spectrum from simple task queues (RabbitMQ, SQS) to append-only distributed commit logs (Kafka, Pulsar), each with fundamentally different semantics around message retention, consumer models, ordering, and delivery guarantees.
Choosing the wrong messaging system — or misunderstanding its delivery guarantees — is a common source of data loss, duplicate processing, and hard-to-reproduce production bugs. This guide covers the complete picture: why queues exist, the three delivery guarantees and when each applies, the queue vs commit-log architectural divide, ordering and idempotency, consumer group models, backpressure and dead-letter queues, and a detailed three-way comparison between RabbitMQ, SQS, and Kafka to help you pick the right tool for your system.
- At-least-once is the practical default — design consumers to be idempotent (check a dedup ID or use natural idempotency like upserts).
- Queue semantics (RabbitMQ, SQS): messages are deleted after consumption — one consumer per message. Commit-log semantics (Kafka): messages are retained and multiple independent consumer groups each replay the full stream.
- Pull model (Kafka, SQS) naturally rate-limits to consumer capacity; mitigate polling latency with long polling.
- Dead-letter queue — wire it up from day one; poison messages will happen and you want them surfaced, not silently blocking your pipeline.
- Ordering is guaranteed only within a Kafka partition — partition by the entity key (user_id, order_id) to ensure related events are ordered.
- Exactly-once requires end-to-end coordination between the broker and the consumer's storage system — it is achievable but adds significant complexity and latency.
- Backpressure naturally exists in the pull model; in the push model it requires explicit flow control to avoid overwhelming slow consumers.
Message queues decouple producers from consumers in time and speed. They absorb traffic bursts, enable async work, and improve system resilience. The fundamental fork: use a traditional queue (RabbitMQ, SQS) for task distribution where each message is consumed once and discarded; use a commit log (Kafka) for event streaming where multiple consumers independently replay the same events. Design consumers to be idempotent for at-least-once delivery, the realistic default. Wire up dead-letter queues from day one.
Core Problems Message Queues Solve
Before reaching for a message queue, it helps to be explicit about which problem you are solving. Message queues address four distinct system design problems — and which one you are solving should influence which queue technology and which configuration you choose.
- Traffic spike absorption (the shock absorber) — a queue absorbs burst traffic so downstream services are not overwhelmed. The producer writes at its peak rate; the consumer processes at the rate it can sustain. Without a queue, a 10× traffic spike either requires the downstream to also scale 10× (expensive and slow) or it falls over. With a queue, the downstream simply drains the backlog after the spike subsides. The tradeoff: work is delayed, not dropped — which is correct if the work is not time-critical.
- Service decoupling: the producer does not need to know the consumer's address, its implementation language, or even whether it is currently running. This eliminates temporal coupling (the producer can succeed even when the consumer is down) and spatial coupling (the producer does not need to know where the consumer is deployed). Services can be deployed, restarted, and scaled independently.
- Async offloading of non-critical work — any work that does not need to block the user response (sending a confirmation email, resizing a profile photo, generating a report, updating a search index) can be enqueued and handled asynchronously. The user gets a fast response; the background work proceeds at its own pace. This is one of the most common and highest-ROI uses of message queues.
- Resilience and durability — if a downstream service crashes, messages accumulate in the durable queue. When the service recovers, it processes the backlog without any message loss. This makes the system resilient to partial failures by default — as long as the queue itself is durable.
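The shock-absorber behaviour can be illustrated with a small in-memory simulation. This is a minimal sketch, not a model of any real broker: the function name `simulate_backlog`, the tick-based timing, and the traffic numbers are all assumptions chosen for illustration.

```python
from collections import deque

def simulate_backlog(arrivals_per_tick, consumer_rate):
    """Return the queue depth after each tick for a fixed-rate consumer."""
    queue = deque()
    depths = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # Producer enqueues at whatever rate traffic arrives.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # Consumer drains at most `consumer_rate` messages per tick.
        for _ in range(min(consumer_rate, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths

# Steady 10 msg/tick, a 10x spike for two ticks, then back to steady.
depths = simulate_backlog([10, 100, 100, 10, 10, 10], consumer_rate=50)
print(depths)
```

The backlog grows during the spike and drains afterwards, while the consumer never handles more than 50 messages per tick. The cost is the delay experienced by messages that waited in the backlog.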
Queue Semantics vs. Commit-Log Semantics
The most important architectural distinction in messaging is between a traditional message queue and a distributed commit log (or event stream). These are fundamentally different data structures with different retention semantics, consumer models, and appropriate use cases.
Traditional message queue (RabbitMQ, SQS, ActiveMQ)
A traditional queue stores messages durably until a consumer acknowledges successful processing, at which point the message is deleted. The queue is a work list: every message is a unit of work that must be done exactly once, distributed among competing consumers. Once consumed, the message is gone from the queue — there is no replay, no multiple consumers reading the same message (unless you use publish/subscribe topics, which are logically separate queues per subscriber).
Key characteristics:
- Messages are consumed and deleted — the queue represents pending work, not history.
- Competing consumers — multiple instances of the same consumer each receive different messages; the queue load-balances across them.
- Message ordering is FIFO within the queue (with caveats — SQS standard queues do not guarantee strict ordering; SQS FIFO queues do at the cost of lower throughput).
- Best for task queues: email sending, image processing, order fulfillment, report generation — any work where each item must be done once and completion is the terminal state.
Distributed commit log (Kafka, Pulsar, Kinesis)
A commit log is an append-only, immutable log of events, partitioned across a cluster. Consumers do not delete messages — they maintain a cursor (offset) that tracks how far they have read. Multiple independent consumer groups can each maintain their own offset, reading the same stream of events independently and at their own pace. Messages are retained for a configurable period (hours to forever) regardless of whether anyone has consumed them.
Key characteristics:
- Messages are retained after consumption — the log is the system of record for events that happened.
- Consumer groups — each group reads the full stream independently; adding a new consumer group does not affect existing ones.
- Strong ordering within a partition — a topic is split into N partitions; all messages with the same partition key go to the same partition and are strictly ordered there.
- Best for event streaming: audit logs, Change Data Capture feeds, real-time analytics, data pipeline ingestion, event sourcing — any scenario where multiple systems need to react to the same events, or where replay and reprocessing are requirements.
# Traditional Queue: competing consumers, delete-on-ack
Producer → [msg1, msg2, msg3] → Queue
Consumer A ──reads msg1──▶ ack ──▶ msg1 deleted
Consumer B ──reads msg2──▶ ack ──▶ msg2 deleted
Consumer A ──reads msg3──▶ ack ──▶ msg3 deleted
# Commit Log (Kafka): consumer groups, independent offsets
Producer → [msg1, msg2, msg3, msg4] → Topic (retained)
Group A: offset=4 (processed all msgs, reads new ones)
Group B: offset=2 (behind; still processing older msgs)
Group C: offset=0 (new service; reprocessing from start)
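The two models in the diagram can also be sketched as toy in-memory data structures. The class names `WorkQueue` and `CommitLog` and their methods are hypothetical, and receive and ack are collapsed into one step for brevity; real brokers add durability, partitioning, and acknowledgement tracking.

```python
from collections import deque

class WorkQueue:
    """Delete-on-consume queue: each message goes to one consumer, then is gone."""
    def __init__(self):
        self._pending = deque()

    def publish(self, msg):
        self._pending.append(msg)

    def receive(self):
        # Simplification: receive and ack happen together.
        return self._pending.popleft() if self._pending else None

class CommitLog:
    """Append-only log: messages are retained; each group tracks its own offset."""
    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, msg):
        self._log.append(msg)

    def poll(self, group, max_records=10):
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        self._offsets[group] = offset

queue, log = WorkQueue(), CommitLog()
for m in ["msg1", "msg2", "msg3"]:
    queue.publish(m)
    log.publish(m)

first, second = queue.receive(), queue.receive()  # competing consumers split the work
email_batch = log.poll("email")                   # each group sees the full stream
analytics_batch = log.poll("analytics")
log.seek("email", 0)                              # replay from the start
replayed = log.poll("email")
```

The queue hands each message out once, whereas the log returns the same three messages to both groups and again after the offset is reset.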
Delivery Semantics: The Three Guarantees
Every messaging system makes a promise about how many times it will deliver a message to a consumer. Understanding these guarantees — and their implementation costs — is fundamental to designing a reliable messaging layer.
| Semantic | Guarantee | Risk | Implementation Cost | When to Use |
|---|---|---|---|---|
| At-Most-Once | Delivered zero or one time | Message loss on failure | Lowest — ack before processing | Metrics, heartbeats — loss is tolerable |
| At-Least-Once | Delivered one or more times | Duplicate processing | Low — ack after processing; consumer must be idempotent | Most use cases — the practical default |
| Exactly-Once | Delivered exactly one time | None (if correct) | Highest — requires end-to-end transactional coordination | Financial transactions, inventory deduction |
At-most-once in practice
The message is acknowledged as soon as it is delivered, before the consumer has processed it, and the broker never retries delivery. If the consumer crashes before processing completes, the message is lost. This is the right choice only when the data is so low-value that loss is genuinely acceptable: telemetry counters, approximate metrics, health check heartbeats. Do not use it for anything a user or business cares about.
At-least-once in practice
The broker retains the message until the consumer sends an explicit acknowledgement. If the consumer crashes before acking, the broker redelivers the message — potentially to a different consumer instance. This guarantees no message loss at the cost of possible duplicate deliveries. The design implication: every consumer must be idempotent. The practical default for almost all production messaging systems.
def handle_order_placed(msg):
    msg_id = msg['id']
    order_id = msg['order_id']
    # idempotency check before any side effect
    if db.exists("SELECT 1 FROM processed_messages WHERE id=?", msg_id):
        log.info(f"msg={msg_id} already processed, skipping")
        queue.ack(msg)  # ack to prevent redelivery
        return
    with db.transaction():
        send_confirmation_email(order_id)  # side effect
        db.execute(  # record as processed
            "INSERT INTO processed_messages(id, processed_at) VALUES(?,NOW())",
            msg_id
        )
    queue.ack(msg)  # ack only after successful processing
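The difference between acking before and after processing can be seen in a toy simulation. This is a sketch under stated assumptions: `InMemoryBroker`, `consume`, and the `crash_on` switch are hypothetical names invented here, and a "crash" is modelled simply as the broker returning unacked messages to the ready list.

```python
class InMemoryBroker:
    """Toy broker: a message stays in flight until it is acked."""
    def __init__(self, messages):
        self.ready = list(messages)
        self.in_flight = []

    def deliver(self):
        msg = self.ready.pop(0)
        self.in_flight.append(msg)
        return msg

    def ack(self, msg):
        if msg in self.in_flight:
            self.in_flight.remove(msg)

    def consumer_crashed(self):
        # Unacked messages go back to the front of the ready list for redelivery.
        self.ready = self.in_flight + self.ready
        self.in_flight = []

def consume(broker, processed, ack_first, crash_on=None):
    msg = broker.deliver()
    if ack_first:
        broker.ack(msg)        # at-most-once: acked before the work is done
    if msg == crash_on:
        broker.consumer_crashed()
        return
    processed.append(msg)
    if not ack_first:
        broker.ack(msg)        # at-least-once: acked only after success

# At-most-once: the crash happens after the ack, so "m1" is lost.
amo, amo_done = InMemoryBroker(["m1", "m2"]), []
consume(amo, amo_done, ack_first=True, crash_on="m1")
consume(amo, amo_done, ack_first=True)

# At-least-once: "m1" was never acked, so the broker redelivers it.
alo, alo_done = InMemoryBroker(["m1", "m2"]), []
consume(alo, alo_done, ack_first=False, crash_on="m1")
consume(alo, alo_done, ack_first=False)
consume(alo, alo_done, ack_first=False)
```

In the at-least-once run, a crash after processing but before the ack would cause the same message to be processed twice, which is why the consumer needs the idempotency check.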
Exactly-once in practice: why it's hard
Exactly-once is the most desired and most misunderstood guarantee. Achieving it end-to-end requires coordinating two independent systems: the broker must not redeliver, and the consumer must not re-execute side effects. This requires either:
- Transactional messaging: the broker coordinates reads and writes in a single transaction. Kafka Transactions (introduced in 0.11) enable this within the Kafka ecosystem: a producer can write to multiple topics atomically, and a consumer can read-process-write within a single transaction that spans its offset commit and its writes to Kafka output topics. This is real exactly-once within the Kafka ecosystem, but it requires all participants to be Kafka-aware; a write to an external database is not part of the Kafka transaction.
- Idempotent consumer + deduplication — at-least-once delivery combined with a deduplication mechanism at the consumer that makes re-processing identical to never processing. This is semantically exactly-once from the business outcome perspective, even though the broker may deliver the message twice. This is the more common production approach because it works with any broker.
True end-to-end exactly-once across arbitrary systems (e.g., Kafka → PostgreSQL → third-party API) is generally not achievable because the third-party API call cannot participate in the Kafka transaction. The pragmatic answer is: use at-least-once delivery, make every consumer idempotent, and treat "exactly-once semantics" as a consumer-level property implemented via deduplication — not a broker-level guarantee.
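One way to implement consumer-level deduplication is to record the message ID and apply the business write in the same database transaction. The sketch below assumes SQLite, and the table names, the `handle_inventory_deduction` function, and the message shape are all hypothetical; the same idea applies to any database with unique constraints.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed_messages (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
    db.execute("INSERT INTO inventory VALUES ('widget', 10)")
    db.commit()
    return db

def handle_inventory_deduction(db, msg):
    """Apply the deduction at most once per message id; return True if applied."""
    try:
        with db:  # one transaction: dedup record and business write commit together
            db.execute("INSERT INTO processed_messages (id) VALUES (?)", (msg["id"],))
            db.execute("UPDATE inventory SET qty = qty - ? WHERE sku = ?",
                       (msg["qty"], msg["sku"]))
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this message id was already processed.
        return False

db = make_db()
msg = {"id": "evt-1", "sku": "widget", "qty": 3}
first = handle_inventory_deduction(db, msg)
second = handle_inventory_deduction(db, msg)  # simulated redelivery
qty = db.execute("SELECT qty FROM inventory WHERE sku = 'widget'").fetchone()[0]
```

Relying on the primary-key constraint rather than a separate "check, then insert" step avoids a race between two consumers that receive the same message concurrently.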
Ordering and Partitioning
Ordering is one of the most misunderstood properties in distributed messaging. The truth is nuanced: ordering is not binary — it exists at multiple granularities, and the granularity you get depends on the messaging system and how you configure it.
Global ordering
All messages across all producers arrive at consumers in the exact order they were sent, globally. This requires a single partition and a single consumer. Global ordering eliminates any parallelism and caps throughput at what one machine can handle. Rarely worth the cost except for very low-volume streams where global ordering has genuine business semantics (a strictly ordered audit log).
Partition-level ordering (Kafka's model)
Kafka partitions a topic into N independent, ordered logs. Within each partition, messages are strictly ordered. Messages in different partitions have no ordering relationship. The producer assigns each message to a partition based on a partition key — all messages with the same key go to the same partition and are processed in order. Messages with different keys go to different partitions and may be processed in any order relative to each other.
This is the sweet spot: you get ordering for related messages (all events for the same user, all events for the same order) while achieving parallelism across unrelated message groups.
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='kafka:9092')
order_id = 42                                # example values
event_payload = '{"event": "OrderPlaced"}'
# Partition by order_id: all events for the same order are ordered
producer.send(
    topic='order-events',
    key=str(order_id).encode(),    # partition key
    value=event_payload.encode(),
)
producer.flush()  # block until buffered messages have been sent
# order_id=42: placed → payment_received → shipped → delivered
# guaranteed in order within the same partition
# order_id=43: processes in parallel on a different partition
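To see why a key always lands on the same partition, here is a minimal sketch of hash-based partition selection. It uses CRC32 as a stand-in hash; Kafka's default partitioner uses a different hash function (murmur2), and `partition_for` is a hypothetical helper, not part of any client library.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

events = [
    ("42", "placed"), ("43", "placed"), ("42", "payment_received"),
    ("42", "shipped"), ("43", "payment_received"),
]

# Append each event to the log of the partition its key hashes to.
partitions = {p: [] for p in range(4)}
for key, event in events:
    partitions[partition_for(key, 4)].append((key, event))

# All events for order 42 sit in one partition, in the order they were sent.
order_42 = [e for k, e in partitions[partition_for("42", 4)] if k == "42"]
print(order_42)
```

Because the mapping depends on the partition count, changing the number of partitions can move a key to a different partition.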
Ordering in traditional queues
SQS Standard queues provide best-effort ordering: messages are generally delivered in order, but out-of-order delivery is possible and you must handle it. SQS FIFO queues provide strict ordering within a message group (analogous to a Kafka partition key), but by default they are limited to 300 API calls per second per action, which is about 3,000 messages/sec with batches of 10; a high-throughput mode raises that limit. RabbitMQ provides strict FIFO ordering within a single queue, but multiple consumers on the same queue can finish processing messages out of order if one consumer is slower than another.
Push vs. Pull Consumption Models
Push model
The broker actively delivers messages to consumers as they arrive, without the consumer requesting them. The consumer registers a handler function; the broker calls it when a message is available.
- Advantages: Near-zero latency from message arrival to consumer processing. Simple consumer code — no polling logic needed.
- Disadvantages: The broker must track consumer capacity and implement flow control; if the consumer is slow and the broker pushes too fast, the consumer's queue fills up, causing memory pressure or message drops. RabbitMQ's basic.qos / prefetch_count setting addresses this: it limits the number of unacknowledged messages the broker delivers to one consumer at a time.
- Used by: RabbitMQ (AMQP push delivery), event-driven FaaS platforms (AWS Lambda triggered by SNS/SQS).
Pull model
Consumers actively poll the broker for new messages at their own pace. The consumer controls the consumption rate — it only requests more messages when it is ready to process them. This is the model used by Kafka and SQS.
- Advantages: Natural backpressure — the consumer never receives more messages than it can handle because it controls the rate. Simpler broker design (no need to track consumer capacity). Consumer can batch-pull multiple messages per request, amortizing network overhead.
- Disadvantages: Polling latency. If the consumer polls and finds no messages, it must wait before polling again. Mitigated by long polling (SQS: WaitTimeSeconds=20; Kafka: consumer blocks until messages arrive or timeout fires). Wasted CPU/network on empty polls without long polling.
- Used by: Kafka (consumer manages its offset), SQS (consumer calls ReceiveMessage).
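A pull-based consumer loop with long polling might look like the sketch below. The `receive_message` and `delete_message` calls and their parameters follow the boto3 SQS client; the `drain_once` helper, the `FakeSqs` stand-in, and the queue URL are assumptions added so the example runs without AWS access. In real use you would pass `boto3.client("sqs")` instead of the fake.

```python
def drain_once(sqs, queue_url, handler, wait_seconds=20, batch_size=10):
    """Long-poll once, process each message, delete it only after success."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=batch_size,  # SQS returns at most 10 per call
        WaitTimeSeconds=wait_seconds,    # long polling: wait up to 20 seconds
    )
    handled = 0
    for message in response.get("Messages", []):
        handler(message["Body"])
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message["ReceiptHandle"])
        handled += 1
    return handled

class FakeSqs:
    """In-memory stand-in for the SQS client, used only to run the sketch."""
    def __init__(self, bodies):
        self.messages = [{"Body": b, "ReceiptHandle": f"rh-{i}"}
                         for i, b in enumerate(bodies)]
        self.deleted = []

    def receive_message(self, QueueUrl, MaxNumberOfMessages, WaitTimeSeconds):
        batch = self.messages[:MaxNumberOfMessages]
        return {"Messages": batch} if batch else {}

    def delete_message(self, QueueUrl, ReceiptHandle):
        self.deleted.append(ReceiptHandle)
        self.messages = [m for m in self.messages
                         if m["ReceiptHandle"] != ReceiptHandle]

seen = []
fake = FakeSqs(["a", "b", "c"])
count = drain_once(fake, "https://example.invalid/my-queue", seen.append)
empty = drain_once(fake, "https://example.invalid/my-queue", seen.append)
```

The consumer decides when to call `drain_once` again, so the consumption rate follows its own processing speed.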
Consumer Groups and the Competing Consumer Pattern
Competing consumers (traditional queues)
Multiple instances of the same consumer service all listen on the same queue. The broker distributes messages among them — each message goes to exactly one consumer. This is the standard pattern for horizontal scaling of worker processes: to process twice as many messages per second, run twice as many consumers. Adding consumers to a queue is zero-configuration scaling.
The catch: with multiple consumers on a single queue, strict global ordering is broken. Consumer A might receive message 2 and Consumer B might receive message 3 — and if Consumer A is slower, message 3 is processed before message 2. Design your consumers to handle out-of-order delivery, or use a single consumer if ordering is critical (accepting the throughput cap).
Kafka consumer groups: parallelism with ordering
A Kafka consumer group is a set of consumers that collectively consume a topic. Each partition is assigned to exactly one consumer in the group at a time. Adding consumers to a group increases parallelism up to the partition count: a topic with 16 partitions can be consumed by up to 16 consumers in a single group in parallel. Beyond 16 consumers, extra consumers are idle (they have no partition assigned). This is why Kafka topics are usually over-partitioned when initially created: adding partitions later changes which partition a key maps to, which breaks per-key ordering across the change, and partitions cannot be removed.
# Topic: order-events — 4 partitions
P0: [order:1, order:5, order:9 ...]
P1: [order:2, order:6, order:10 ...]
P2: [order:3, order:7, order:11 ...]
P3: [order:4, order:8, order:12 ...]
# Consumer Group A (email-service): 4 consumers
C0 ─reads─▶ P0 C1 ─reads─▶ P1 C2 ─reads─▶ P2 C3 ─reads─▶ P3
# Consumer Group B (analytics-service): 2 consumers
C0 ─reads─▶ P0, P1 C1 ─reads─▶ P2, P3
# Both groups read all messages independently — adding Group B
# has zero impact on Group A's consumption
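The assignment shown in the diagram can be reproduced with a simplified range-style assignment. This is an illustrative sketch only: `assign_partitions` is a hypothetical function, and Kafka's real assignors also handle rebalancing, multiple topics, and stickiness.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over consumers in contiguous ranges."""
    assignment = {}
    base, extra = divmod(num_partitions, len(consumers))
    next_partition = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(next_partition, next_partition + count))
        next_partition += count
    return assignment

email_group = assign_partitions(4, ["C0", "C1", "C2", "C3"])
analytics_group = assign_partitions(4, ["C0", "C1"])
oversized_group = assign_partitions(4, ["C0", "C1", "C2", "C3", "C4", "C5"])
print(oversized_group)
```

In the oversized group, the last two consumers receive an empty list, which is the idle-consumer case described above.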
Multiple independent consumer groups
The defining advantage of Kafka over traditional queues is that multiple independent consumer groups can each read the full stream of events without interfering with each other. An order event can be consumed by the email service (for confirmation emails), the analytics pipeline (for business metrics), the inventory service (for stock updates), and a fraud detection system — all reading from the same Kafka topic, each at their own pace, each maintaining their own offset. Adding the fraud detection system after the fact requires no changes to any existing consumer or producer; it simply creates a new consumer group and starts reading from the beginning or from a specific timestamp.
This is impossible to replicate cleanly with a traditional queue without either publishing to N separate queues (write amplification at the producer) or using a fan-out mechanism (exchange in RabbitMQ, SNS in AWS) that routes a copy to each downstream queue.
Backpressure and Flow Control
Backpressure is the mechanism by which a slow consumer signals to the system that it cannot process messages fast enough, causing the system to slow down message delivery rather than overwhelm the consumer. Mishandling backpressure causes memory pressure, increased latency, and in the worst case, message loss or consumer crashes.
Backpressure in the pull model (Kafka, SQS)
Pull-based consumers implement backpressure naturally: the consumer only fetches the next batch of messages when it has finished processing the previous batch. If processing slows down, the consumer simply fetches less frequently, and messages accumulate in the queue/partition. The queue's consumer lag (how many messages are unprocessed) is the key monitoring metric — a growing lag indicates a consumer that cannot keep up and needs to be scaled out.
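For a partitioned log, lag is usually computed per partition as the newest offset in the log minus the group's committed offset. The sketch below assumes both sets of offsets have already been fetched; `consumer_lag` and the sample numbers are hypothetical.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: newest offset in the log minus the committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }

lag = consumer_lag(
    log_end_offsets={0: 1500, 1: 1480, 2: 1510},
    committed_offsets={0: 1500, 1: 1200, 2: 1505},
)
total_lag = sum(lag.values())
print(lag, total_lag)
```

A single partition carrying most of the lag, as partition 1 does here, often points to a hot key or a stuck consumer rather than a group that is too small overall.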
Backpressure in the push model (RabbitMQ)
Push-based consumers require explicit configuration to avoid overwhelm. RabbitMQ's basic.qos with prefetch_count limits the number of messages the broker will push to a consumer before waiting for acknowledgements. Setting prefetch_count=10 means the consumer has at most 10 unacknowledged messages at any time — if it is slow to ack, the broker stops delivering new messages. Without this configuration, a slow consumer will have its memory filled with unprocessed messages until it crashes.
import pika
conn = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
ch = conn.channel()
# limit in-flight messages to prevent consumer overwhelm
ch.basic_qos(prefetch_count=10)
def on_message(ch, method, props, body):
    process_message(body)  # potentially slow
    ch.basic_ack(method.delivery_tag)  # ack after success; releases prefetch slot
ch.basic_consume('my-queue', on_message)
ch.start_consuming()
Dead-Letter Queues (DLQ) and Poison Messages
A poison message is a message that causes a consumer to fail every time it tries to process it — due to a malformed payload, an incompatible schema change, a dependency on a resource that no longer exists, or a bug in the consumer's processing logic. Without a DLQ, a poison message blocks its position in the queue (or partition) indefinitely, causing either infinite redelivery loops or blocking all downstream processing.
A dead-letter queue is a secondary queue to which messages are automatically routed after exceeding a maximum delivery attempt count. The main queue keeps flowing; the failed message is preserved for inspection and replay rather than dropped or blocking production traffic.
// AWS SQS: attach a DLQ to a source queue
{
  "RedrivePolicy": {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456:my-dlq",
    "maxReceiveCount": 5
  }
}
// After 5 failed delivery attempts, message is routed to my-dlq
// Main queue continues processing; DLQ preserves the poison msg
// Ops team alerts on DLQ depth > 0 and investigates
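The redrive behaviour can be modelled with a receive counter per message. This is a toy simulation, not how any broker implements it; `run_with_dlq`, the handler, and the `"poison"` payload are hypothetical names.

```python
from collections import deque

def run_with_dlq(messages, handler, max_receive_count=5):
    """Drain a queue; move a message to the DLQ after max_receive_count failures."""
    main = deque((msg, 0) for msg in messages)
    dlq, processed = [], []
    while main:
        msg, receives = main.popleft()
        receives += 1
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            if receives >= max_receive_count:
                dlq.append(msg)               # give up: preserve for inspection
            else:
                main.append((msg, receives))  # redeliver later
    return processed, dlq

calls = []

def handler(msg):
    calls.append(msg)
    if msg == "poison":
        raise ValueError("malformed payload")

processed, dlq = run_with_dlq(["a", "poison", "b"], handler)
print(processed, dlq)
```

The healthy messages are processed despite the poison message, which is retried five times and then set aside.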
DLQ operational playbook
- Alert on DLQ depth — a DLQ message count above zero is always a signal worth investigating. Set an alert for DLQ depth > 0 and treat it as a pager-level event for business-critical queues.
- Preserve the original message — include the original message body and metadata in the DLQ message, along with the failure reason if the consumer can capture it. This context is essential for debugging.
- Replay after the fix: once the root cause (schema bug, missing dependency, consumer logic error) is fixed, messages can be moved from the DLQ back to the main queue for reprocessing. AWS provides StartMessageMoveTask for this; Kafka requires a separate consumer that reads from the error topic and re-publishes to the main topic.
- Do not DLQ indefinitely: set a retention period on the DLQ itself (7–14 days is common) so old unresolved failures do not accumulate forever.
Exponential backoff with jitter before DLQ routing
Before routing a message to the DLQ, retry it with exponential backoff to handle transient failures (a downstream service briefly unavailable, a network timeout). Immediate retry of a failing message is rarely useful; waiting 1s, 2s, 4s, 8s ... gives the transient condition time to resolve. Add random jitter to the backoff to prevent all retrying consumers from hammering the same downstream simultaneously.
import time, random
# handle_message, TransientError, route_to_dlq and log are provided by the application
def process_with_backoff(msg, max_retries=5):
    for attempt in range(max_retries):
        try:
            handle_message(msg)
            return  # success
        except TransientError as e:
            if attempt == max_retries - 1:
                route_to_dlq(msg, error=e)  # exhausted retries → DLQ
                return
            backoff = (2 ** attempt) + random.uniform(0, 1)
            log.info(f"msg={msg.id} attempt={attempt} backoff={backoff:.1f}s")
            time.sleep(backoff)  # 1s, 2s, 4s, 8s + jitter with max_retries=5
RabbitMQ vs. SQS vs. Kafka: A Full Comparison
| Dimension | RabbitMQ | Amazon SQS | Apache Kafka |
|---|---|---|---|
| Architecture | Broker-centric AMQP; exchanges + queues | Managed service; AWS-native | Distributed partitioned log; ZooKeeper/KRaft |
| Message retention | Deleted after ack (default); TTL configurable | Deleted when the consumer deletes it after processing; unconsumed messages kept up to 14 days (default 4) | Retained for configurable period (hours to forever) |
| Delivery model | Push (broker delivers to consumer) | Pull (consumer polls) | Pull (consumer manages offset) |
| Ordering | FIFO per queue with single consumer | Best-effort (Standard); strict per message group (FIFO queues) | Strict ordering within a partition |
| Throughput | ~50K–100K msg/sec per node | ~300 API calls/sec per action, ~3,000 msg/sec with batching (FIFO, default mode); nearly unlimited (Standard) | ~1M+ msg/sec per cluster |
| Multiple consumers | Competing (or fan-out via exchanges) | Competing (or fan-out via SNS) | Independent consumer groups; full replay per group |
| Replay | No (messages deleted after ack) | No | Yes — reset offset to any point in history |
| Operations | Self-hosted; management UI; moderate ops complexity | Fully managed; zero ops | Complex ops (ZooKeeper/KRaft, broker tuning); managed offerings available (Confluent, MSK) |
| Best for | Task queues, RPC patterns, complex routing via exchanges | Simple task queues; serverless/Lambda integration; AWS-native | Event streaming, CDC feeds, audit logs, data pipelines, event sourcing |
Decision framework: which one to pick
- Use RabbitMQ when: you need complex routing logic (topic exchanges, header-based routing), you need the push model with fine-grained flow control, you want a mature self-hosted solution with a management UI, or your team already operates it.
- Use SQS when: you are on AWS and want zero operational burden, your use case is a simple task queue or job distribution, you want seamless integration with Lambda/SNS/ECS, or you need a dead-letter queue with minimal configuration. SQS Standard queues support a nearly unlimited number of API calls per second, so bursts are absorbed without capacity planning.
- Use Kafka when: you need multiple independent consumer groups reading the same event stream, you require message replay and reprocessing capability, throughput is in the hundreds of thousands of messages per second, you are building a data pipeline or event sourcing system, or you need strong ordering within a partition key.
Reliability Patterns
Transactional outbox pattern
A classic distributed systems problem: you want to update your database and publish a message atomically. If you write to the database and then publish to the queue, the service might crash between the two operations — the database write succeeds but the message is never published. The transactional outbox pattern solves this by writing the message to an outbox table in the same database transaction as the business data update. A separate relay process (or CDC connector) reads from the outbox table and publishes to the queue, deleting the outbox row on success.
-- Single transaction: update business data + write outbox entry
BEGIN;
INSERT INTO orders (id, user_id, status, total_cents)
    VALUES (42, 101, 'PENDING', 9999);
INSERT INTO outbox (id, topic, payload, created_at)
    VALUES (gen_random_uuid(), 'order-events',
            '{"event":"OrderPlaced","order_id":42}', NOW());
COMMIT;
-- Relay process reads outbox and publishes to queue
-- If publish succeeds: DELETE FROM outbox WHERE id=...
-- If publish fails: message stays in outbox; relay retries
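The write path and the relay can be sketched end to end with SQLite. This is a minimal sketch under stated assumptions: the simplified schema, `place_order`, `relay_once`, and the `publish` callback are hypothetical, and a production relay would also need batching and concurrency control.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, topic TEXT, payload TEXT)")

def place_order(order_id):
    # Business row and outbox row commit (or roll back) together.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PENDING')", (order_id,))
        db.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "order-events",
             json.dumps({"event": "OrderPlaced", "order_id": order_id})),
        )

def relay_once(publish):
    """Publish each outbox row; delete it only if publish did not raise."""
    sent = 0
    rows = db.execute("SELECT id, topic, payload FROM outbox ORDER BY rowid").fetchall()
    for row_id, topic, payload in rows:
        try:
            publish(topic, payload)
        except Exception:
            break  # leave this and later rows for the next relay run
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        sent += 1
    return sent

def broker_down(topic, payload):
    raise ConnectionError("broker unavailable")

published = []
place_order(42)
first_run = relay_once(broker_down)  # publish fails, row stays in outbox
still_pending = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
second_run = relay_once(lambda topic, payload: published.append((topic, payload)))
```

If the relay crashes after publishing but before deleting the row, the message is published again on the next run, so this pattern gives at-least-once publishing and consumers still need to be idempotent.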
Saga pattern with compensating transactions
For multi-step business processes that span multiple services (checkout: deduct inventory → charge payment → send confirmation), the Saga pattern uses message queues as the coordination mechanism. Each step publishes an event that triggers the next step. On failure, a compensating transaction reverses the completed steps. Message queues are essential here because they decouple the steps temporally and provide durability — if the payment service is down, the inventory-reserved event waits in the queue until payment recovers.
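The compensation logic can be sketched with an in-process orchestrator. In a real saga each step would be triggered by a message on a queue and run in a different service; here the steps are plain functions so the ordering of compensations is easy to see. `run_saga` and the step functions are hypothetical.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"{name}:done")
            completed.append((name, compensate))
        except Exception:
            log.append(f"{name}:failed")
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}:compensated")
            return False, log
    return True, log

state = {"stock": 5}

def reserve_inventory():
    state["stock"] -= 1

def release_inventory():
    state["stock"] += 1

def charge_payment():
    raise RuntimeError("payment declined")

def refund_payment():
    pass

def send_confirmation():
    pass

ok, log = run_saga([
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_payment", charge_payment, refund_payment),
    ("send_confirmation", send_confirmation, lambda: None),
])
print(ok, log)
```

The failed payment triggers the inventory release, and the confirmation step never runs.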
Consumer health and lag monitoring
The key operational metrics for any messaging system:
- Consumer lag: how many messages are unprocessed (SQS: ApproximateNumberOfMessages; Kafka: consumer group lag via kafka-consumer-groups.sh or monitoring tools). Growing lag = consumer cannot keep up with producer rate.
- DLQ depth: messages in the DLQ indicate processing failures. Alert on any DLQ depth > 0 for critical queues.
- Message age: how old is the oldest unprocessed message? SQS tracks ApproximateAgeOfOldestMessage. A growing age indicates a stuck or slow consumer.
- Throughput: messages published vs consumed per second. A diverging gap indicates a growing backlog.
Use message queues to decouple services in time and protect downstream systems from traffic spikes. The fundamental architectural choice is queue semantics (delete-on-ack, competing consumers) vs. commit-log semantics (retain messages, independent consumer groups with replay). Use at-least-once delivery as the default and design every consumer to be idempotent. Wire up dead-letter queues from day one — poison messages will happen and you want them surfaced immediately, not silently blocking your pipeline. For the technology choice: SQS for zero-ops AWS task queues, RabbitMQ for complex routing and push delivery, Kafka for high-throughput event streams where multiple consumers need to independently replay.
Why is exactly-once so hard? It requires coordinating both the broker (no duplicate delivery) and the consumer (idempotent processing) atomically across two systems. Kafka Transactions make this achievable within the Kafka ecosystem; across arbitrary systems (Kafka → third-party API), the practical answer is at-least-once + idempotent consumer design.
What is a dead-letter queue? A secondary queue where messages are automatically routed after exceeding a maximum delivery attempt count. It keeps the main queue flowing while preserving failed messages for inspection, root-cause analysis, and replay after the fix is deployed.
Traditional MQ vs Kafka? MQ (RabbitMQ, SQS) deletes messages after consumption — one consumer per message; use it for task queues. Kafka retains messages as an immutable log; multiple independent consumer groups can each read all events at their own pace and replay from any offset — use it for event streaming, CDC, and data pipelines.
How do you guarantee ordering in Kafka? By partitioning on the entity key (user_id, order_id); all messages with the same key go to the same partition and are consumed in strict order by exactly one consumer in the group at a time.
How do you atomically write to the database and publish a message? The transactional outbox pattern — write both the business record and the message to the same database transaction in an outbox table; a relay process publishes from the outbox asynchronously, retrying on failure.