Skip to content

Saga Pattern: Distributed Transaction Compensation

The Saga pattern is Ruvon's solution to the distributed transaction problem: how do you maintain consistency across multiple independent operations when you can't use a traditional database transaction?

The Problem

Imagine a booking workflow:

1. Reserve flight seat → SUCCESS
2. Reserve hotel room → SUCCESS
3. Charge credit card → FAILURE (card declined)

Now you have a problem: the flight and hotel are reserved, but payment failed. In a traditional database, you'd rollback the transaction. But these are external services—they don't support rollback.

You need to manually undo the flight and hotel reservations.

This is what the Saga pattern does.

What is a Saga?

A Saga is a sequence of local transactions where each transaction has a compensating transaction that semantically undoes its effects.

Forward Transactions         Compensating Transactions
─────────────────────       ──────────────────────────
Reserve Flight      ←──────  Cancel Flight
Reserve Hotel       ←──────  Release Hotel
Charge Payment      ←──────  Refund Payment (if charged)

Key Insight: Compensations run in reverse order (LIFO - Last In, First Out). This is because later steps may depend on earlier steps.

Example: You can't cancel the flight until you've released the hotel, because both might share the same booking reference.

How Ruvon Implements Sagas

1. Define Compensatable Steps

Each step that modifies external state should have a compensation function:

def reserve_flight(state: BookingState, context: StepContext) -> dict:
    """Reserve a flight seat."""
    reservation_id = flight_api.reserve(
        flight_number=state.flight_number,
        passenger=state.passenger_name
    )
    return {"flight_reservation_id": reservation_id}

def cancel_flight(state: BookingState, context: StepContext) -> dict:
    """Compensation: Cancel the flight reservation."""
    if state.flight_reservation_id:
        flight_api.cancel(state.flight_reservation_id)
    return {"flight_reservation_cancelled": True}
steps:
  - name: "Reserve_Flight"
    type: "STANDARD"
    function: "booking.steps.reserve_flight"
    compensate_function: "booking.steps.cancel_flight"

  - name: "Reserve_Hotel"
    type: "STANDARD"
    function: "booking.steps.reserve_hotel"
    compensate_function: "booking.steps.release_hotel"

  - name: "Charge_Payment"
    type: "STANDARD"
    function: "booking.steps.charge_payment"
    compensate_function: "booking.steps.refund_payment"

3. Enable Saga Mode

workflow = await builder.create_workflow("BookingWorkflow", initial_data)

# Enable saga mode
workflow.enable_saga_mode()

# Now execute workflow
await workflow.next_step(user_input={})  # Reserve_Flight
await workflow.next_step(user_input={})  # Reserve_Hotel
await workflow.next_step(user_input={})  # Charge_Payment (fails)

# Saga automatically triggered:
# 1. Refund_Payment (no-op, payment never charged)
# 2. Release_Hotel (cancels hotel)
# 3. Cancel_Flight (cancels flight)

Saga Execution Flow

Normal Execution (No Failure)

Step 1: Reserve_Flight
  - Executes reserve_flight()
  - Records to saga log: ("Reserve_Flight", "cancel_flight")
  - State: ACTIVE

Step 2: Reserve_Hotel
  - Executes reserve_hotel()
  - Records to saga log: ("Reserve_Hotel", "release_hotel")
  - State: ACTIVE

Step 3: Charge_Payment
  - Executes charge_payment()
  - Records to saga log: ("Charge_Payment", "refund_payment")
  - State: COMPLETED

Saga log: [
  ("Reserve_Flight", "cancel_flight"),
  ("Reserve_Hotel", "release_hotel"),
  ("Charge_Payment", "refund_payment")
]

All steps succeeded, compensations not needed.

Failure with Compensation

Step 1: Reserve_Flight
  - Executes reserve_flight()
  - Records to saga log: ("Reserve_Flight", "cancel_flight")
  - State: ACTIVE

Step 2: Reserve_Hotel
  - Executes reserve_hotel()
  - Records to saga log: ("Reserve_Hotel", "release_hotel")
  - State: ACTIVE

Step 3: Charge_Payment
  - Executes charge_payment()
  - FAILS! (card declined)
  - State: Executing compensation...

Compensation (reverse order):
  3. refund_payment() → no-op (payment never charged)
  2. release_hotel() → cancels hotel reservation
  1. cancel_flight() → cancels flight reservation

State: FAILED_ROLLED_BACK

Compensation Guarantees

Idempotency

Compensations must be idempotent—safe to execute multiple times:

def refund_payment(state: BookingState, context: StepContext) -> dict:
    """Compensation: Refund the payment."""
    if not state.transaction_id:
        # No transaction to refund
        return {"refund_status": "not_applicable"}

    # Check if already refunded
    if payment_api.is_refunded(state.transaction_id):
        return {"refund_status": "already_refunded"}

    # Issue refund
    payment_api.refund(state.transaction_id)
    return {"refund_status": "refunded"}

Why: Worker crashes, retries, or network failures may cause compensation to run multiple times.

Best Effort

Compensations are best effort, not guaranteed to succeed:

def cancel_flight(state: BookingState, context: StepContext) -> dict:
    """Compensation: Cancel the flight reservation."""
    try:
        flight_api.cancel(state.flight_reservation_id)
        return {"flight_cancelled": True}
    except FlightAlreadyDeparted:
        # Can't cancel - flight already departed
        logger.error(f"Cannot cancel flight {state.flight_reservation_id}: already departed")
        return {"flight_cancelled": False, "reason": "already_departed"}

What happens if compensation fails? - Ruvon logs the failure to audit log - Continues with remaining compensations - Final state: FAILED_ROLLED_BACK (even if some compensations failed) - Ops team investigates and manually fixes

Semantic Rollback

Compensations provide semantic rollback, not technical rollback:

Technical rollback (database transaction): - Exact state restoration - Guaranteed consistency - All-or-nothing

Semantic rollback (Saga): - Approximate state restoration - Best-effort consistency - May leave side effects

Example: Cancelling a flight reservation doesn't undo the airline's system log entry. But semantically, the customer doesn't have a reserved seat.

Advanced Saga Patterns

Partial Compensation

Not all steps need compensation:

steps:
  - name: "Validate_Input"
    type: "STANDARD"
    function: "steps.validate_input"
    # No compensation - validation has no side effects

  - name: "Reserve_Flight"
    type: "STANDARD"
    function: "steps.reserve_flight"
    compensate_function: "steps.cancel_flight"  # Needs compensation

  - name: "Send_Confirmation_Email"
    type: "STANDARD"
    function: "steps.send_email"
    # No compensation - emails can't be "unsent"

Best Practice: Only define compensations for steps with reversible external side effects.

Conditional Compensation

Compensation logic can check state:

def compensate_inventory(state: OrderState, context: StepContext) -> dict:
    """Compensation: Release allocated inventory."""

    # Check if inventory was actually allocated
    if not state.inventory_allocated:
        return {"inventory_released": False, "reason": "not_allocated"}

    # Check if order was already shipped
    if state.order_status == "shipped":
        # Can't release inventory - order already shipped
        logger.warning(f"Cannot release inventory for shipped order {state.order_id}")
        return {"inventory_released": False, "reason": "already_shipped"}

    # Release inventory
    inventory_api.release(state.order_id, state.items)
    return {"inventory_released": True}

Nested Sagas (Sub-Workflows)

Parent workflow with saga spawns child workflow with saga:

# Parent: Order Processing (saga enabled)
steps:
  - name: "Allocate_Inventory"
    # Spawns InventoryAllocation sub-workflow (also saga-enabled)

  - name: "Charge_Payment"
    compensate_function: "refund_payment"

# If payment fails:
# 1. Parent saga triggers
# 2. Compensation for "Allocate_Inventory" step
# 3. This triggers child workflow's saga
# 4. Child releases inventory locks
# 5. Parent continues with remaining compensations

Behavior: Child saga runs first (unwinding), then parent saga continues.

Saga vs Traditional Transactions

Feature Database Transaction Saga Pattern
Scope Single database Multiple services
Consistency ACID guaranteed Eventual consistency
Isolation Locks prevent concurrent access No isolation (concurrent sagas possible)
Duration Milliseconds (held locks) Minutes to hours
Rollback Exact state restoration Semantic undo
Failure Handling Automatic rollback Manual compensation
Complexity Low (handled by DB) High (manual design)

When to use Sagas: - Distributed systems with multiple services - Long-running workflows (hours/days) - External APIs (payment gateways, booking systems) - Microservices architecture

When to use DB transactions: - Single database operations - Short-lived operations (< 1 second) - Need strong consistency guarantees

Saga Logging and Debugging

Saga Log Structure

workflow.saga_log = [
    {
        "step_name": "Reserve_Flight",
        "compensation_function": "cancel_flight",
        "executed_at": "2026-02-13T10:15:00Z",
        "result": {"flight_reservation_id": "FL-12345"}
    },
    {
        "step_name": "Reserve_Hotel",
        "compensation_function": "release_hotel",
        "executed_at": "2026-02-13T10:16:00Z",
        "result": {"hotel_reservation_id": "HTL-67890"}
    }
]

Audit Trail

PostgreSQL backend logs compensation execution:

SELECT step_name, event_type, event_data, created_at
FROM workflow_audit_log
WHERE workflow_id = '550e8400-...'
  AND event_type IN ('STEP_EXECUTED', 'COMPENSATION_EXECUTED')
ORDER BY created_at ASC;

Example output:

step_name          | event_type            | created_at
-------------------|-----------------------|-------------------------
Reserve_Flight     | STEP_EXECUTED         | 2026-02-13 10:15:00
Reserve_Hotel      | STEP_EXECUTED         | 2026-02-13 10:16:00
Charge_Payment     | STEP_FAILED           | 2026-02-13 10:17:00
Refund_Payment     | COMPENSATION_EXECUTED | 2026-02-13 10:17:01
Release_Hotel      | COMPENSATION_EXECUTED | 2026-02-13 10:17:02
Cancel_Flight      | COMPENSATION_EXECUTED | 2026-02-13 10:17:03

Compensation Failures

If a compensation fails:

SELECT step_name, event_data->'error' AS error_message
FROM workflow_audit_log
WHERE workflow_id = '550e8400-...'
  AND event_type = 'COMPENSATION_FAILED';

Manual Intervention: Ops team reviews failed compensations and fixes manually (e.g., call hotel to cancel reservation).

Saga Design Principles

1. Pivot Transaction

The pivot transaction is the point of no return—after this, compensation is no longer an option:

steps:
  - name: "Reserve_Flight"
    compensate_function: "cancel_flight"

  - name: "Charge_Payment"
    compensate_function: "refund_payment"

  - name: "Issue_Ticket"  # ← Pivot transaction
    # No compensation - tickets can't be unissued

  - name: "Send_Confirmation"
    # After pivot, no compensations

Design Principle: Put the pivot transaction as late as possible, after all compensatable steps.

2. Compensating Actions vs Compensating Transactions

Compensating Action: Undo one step

def cancel_flight(state, context):
    flight_api.cancel(state.flight_reservation_id)

Compensating Transaction: Undo multiple related steps

def cancel_booking(state, context):
    # Undo multiple steps atomically
    flight_api.cancel(state.flight_reservation_id)
    hotel_api.cancel(state.hotel_reservation_id)
    car_api.cancel(state.car_reservation_id)

Recommendation: Use compensating actions (one per step) for granular control.

3. Timeout Handling

Long-running compensations should have timeouts:

def cancel_flight(state: BookingState, context: StepContext) -> dict:
    """Compensation with timeout."""
    try:
        # 30-second timeout
        flight_api.cancel(state.flight_reservation_id, timeout=30)
        return {"flight_cancelled": True}
    except TimeoutError:
        logger.error(f"Compensation timeout for flight {state.flight_reservation_id}")
        # Log for manual retry
        return {"flight_cancelled": False, "reason": "timeout"}

Common Pitfalls

Pitfall 1: Non-Idempotent Compensations

# ❌ Bad: Assumes refund hasn't happened
def refund_payment(state, context):
    payment_api.refund(state.transaction_id)  # May fail if already refunded

# ✅ Good: Checks first
def refund_payment(state, context):
    if payment_api.is_refunded(state.transaction_id):
        return {"already_refunded": True}
    payment_api.refund(state.transaction_id)
    return {"refunded": True}

Pitfall 2: Assuming All Compensations Succeed

# ❌ Bad: Assumes compensation always works
def cancel_hotel(state, context):
    hotel_api.cancel(state.hotel_reservation_id)  # May fail!

# ✅ Good: Handles failure
def cancel_hotel(state, context):
    try:
        hotel_api.cancel(state.hotel_reservation_id)
        return {"hotel_cancelled": True}
    except HotelCancellationFailed as e:
        logger.error(f"Failed to cancel hotel: {e}")
        # Alert ops team for manual intervention
        send_alert("hotel_cancellation_failed", state.hotel_reservation_id)
        return {"hotel_cancelled": False, "error": str(e)}

Pitfall 3: Forgetting to Enable Saga Mode

# ❌ Bad: Define compensations but don't enable saga
workflow = await builder.create_workflow("BookingWorkflow", data)
# If step fails, compensations won't run!

# ✅ Good: Enable saga mode
workflow = await builder.create_workflow("BookingWorkflow", data)
workflow.enable_saga_mode()  # Now compensations will run on failure

What's Next

Now that you understand the Saga pattern: - Workflow Lifecycle - How FAILED_ROLLED_BACK state works - Sub-Workflows - Nested sagas in parent-child workflows - Parallel Execution - Compensating parallel tasks