Saga Pattern: Distributed Transaction Compensation¶
The Saga pattern is Ruvon's solution to the distributed transaction problem: how do you maintain consistency across multiple independent operations when you can't use a traditional database transaction?
The Problem¶
Imagine a booking workflow:
1. Reserve flight seat → SUCCESS
2. Reserve hotel room → SUCCESS
3. Charge credit card → FAILURE (card declined)
Now you have a problem: the flight and hotel are reserved, but payment failed. In a traditional database, you'd rollback the transaction. But these are external services—they don't support rollback.
You need to manually undo the flight and hotel reservations.
This is what the Saga pattern does.
What is a Saga?¶
A Saga is a sequence of local transactions where each transaction has a compensating transaction that semantically undoes its effects.
Forward Transactions Compensating Transactions
───────────────────── ──────────────────────────
Reserve Flight ←────── Cancel Flight
Reserve Hotel ←────── Release Hotel
Charge Payment ←────── Refund Payment (if charged)
Key Insight: Compensations run in reverse order (LIFO - Last In, First Out). This is because later steps may depend on earlier steps.
Example: You can't cancel the flight until you've released the hotel, because both might share the same booking reference.
How Ruvon Implements Sagas¶
1. Define Compensatable Steps¶
Each step that modifies external state should have a compensation function:
def reserve_flight(state: BookingState, context: StepContext) -> dict:
"""Reserve a flight seat."""
reservation_id = flight_api.reserve(
flight_number=state.flight_number,
passenger=state.passenger_name
)
return {"flight_reservation_id": reservation_id}
def cancel_flight(state: BookingState, context: StepContext) -> dict:
"""Compensation: Cancel the flight reservation."""
if state.flight_reservation_id:
flight_api.cancel(state.flight_reservation_id)
return {"flight_reservation_cancelled": True}
2. Link Steps in YAML¶
steps:
- name: "Reserve_Flight"
type: "STANDARD"
function: "booking.steps.reserve_flight"
compensate_function: "booking.steps.cancel_flight"
- name: "Reserve_Hotel"
type: "STANDARD"
function: "booking.steps.reserve_hotel"
compensate_function: "booking.steps.release_hotel"
- name: "Charge_Payment"
type: "STANDARD"
function: "booking.steps.charge_payment"
compensate_function: "booking.steps.refund_payment"
3. Enable Saga Mode¶
workflow = await builder.create_workflow("BookingWorkflow", initial_data)
# Enable saga mode
workflow.enable_saga_mode()
# Now execute workflow
await workflow.next_step(user_input={}) # Reserve_Flight
await workflow.next_step(user_input={}) # Reserve_Hotel
await workflow.next_step(user_input={}) # Charge_Payment (fails)
# Saga automatically triggered:
# 1. Refund_Payment (no-op, payment never charged)
# 2. Release_Hotel (cancels hotel)
# 3. Cancel_Flight (cancels flight)
Saga Execution Flow¶
Normal Execution (No Failure)¶
Step 1: Reserve_Flight
- Executes reserve_flight()
- Records to saga log: ("Reserve_Flight", "cancel_flight")
- State: ACTIVE
Step 2: Reserve_Hotel
- Executes reserve_hotel()
- Records to saga log: ("Reserve_Hotel", "release_hotel")
- State: ACTIVE
Step 3: Charge_Payment
- Executes charge_payment()
- Records to saga log: ("Charge_Payment", "refund_payment")
- State: COMPLETED
Saga log: [
("Reserve_Flight", "cancel_flight"),
("Reserve_Hotel", "release_hotel"),
("Charge_Payment", "refund_payment")
]
All steps succeeded, compensations not needed.
Failure with Compensation¶
Step 1: Reserve_Flight
- Executes reserve_flight()
- Records to saga log: ("Reserve_Flight", "cancel_flight")
- State: ACTIVE
Step 2: Reserve_Hotel
- Executes reserve_hotel()
- Records to saga log: ("Reserve_Hotel", "release_hotel")
- State: ACTIVE
Step 3: Charge_Payment
- Executes charge_payment()
- FAILS! (card declined)
- State: Executing compensation...
Compensation (reverse order):
3. refund_payment() → no-op (payment never charged)
2. release_hotel() → cancels hotel reservation
1. cancel_flight() → cancels flight reservation
State: FAILED_ROLLED_BACK
Compensation Guarantees¶
Idempotency¶
Compensations must be idempotent—safe to execute multiple times:
def refund_payment(state: BookingState, context: StepContext) -> dict:
"""Compensation: Refund the payment."""
if not state.transaction_id:
# No transaction to refund
return {"refund_status": "not_applicable"}
# Check if already refunded
if payment_api.is_refunded(state.transaction_id):
return {"refund_status": "already_refunded"}
# Issue refund
payment_api.refund(state.transaction_id)
return {"refund_status": "refunded"}
Why: Worker crashes, retries, or network failures may cause compensation to run multiple times.
Best Effort¶
Compensations are best effort, not guaranteed to succeed:
def cancel_flight(state: BookingState, context: StepContext) -> dict:
"""Compensation: Cancel the flight reservation."""
try:
flight_api.cancel(state.flight_reservation_id)
return {"flight_cancelled": True}
except FlightAlreadyDeparted:
# Can't cancel - flight already departed
logger.error(f"Cannot cancel flight {state.flight_reservation_id}: already departed")
return {"flight_cancelled": False, "reason": "already_departed"}
What happens if compensation fails?
- Ruvon logs the failure to audit log
- Continues with remaining compensations
- Final state: FAILED_ROLLED_BACK (even if some compensations failed)
- Ops team investigates and manually fixes
Semantic Rollback¶
Compensations provide semantic rollback, not technical rollback:
Technical rollback (database transaction): - Exact state restoration - Guaranteed consistency - All-or-nothing
Semantic rollback (Saga): - Approximate state restoration - Best-effort consistency - May leave side effects
Example: Cancelling a flight reservation doesn't undo the airline's system log entry. But semantically, the customer doesn't have a reserved seat.
Advanced Saga Patterns¶
Partial Compensation¶
Not all steps need compensation:
steps:
- name: "Validate_Input"
type: "STANDARD"
function: "steps.validate_input"
# No compensation - validation has no side effects
- name: "Reserve_Flight"
type: "STANDARD"
function: "steps.reserve_flight"
compensate_function: "steps.cancel_flight" # Needs compensation
- name: "Send_Confirmation_Email"
type: "STANDARD"
function: "steps.send_email"
# No compensation - emails can't be "unsent"
Best Practice: Only define compensations for steps with reversible external side effects.
Conditional Compensation¶
Compensation logic can check state:
def compensate_inventory(state: OrderState, context: StepContext) -> dict:
"""Compensation: Release allocated inventory."""
# Check if inventory was actually allocated
if not state.inventory_allocated:
return {"inventory_released": False, "reason": "not_allocated"}
# Check if order was already shipped
if state.order_status == "shipped":
# Can't release inventory - order already shipped
logger.warning(f"Cannot release inventory for shipped order {state.order_id}")
return {"inventory_released": False, "reason": "already_shipped"}
# Release inventory
inventory_api.release(state.order_id, state.items)
return {"inventory_released": True}
Nested Sagas (Sub-Workflows)¶
Parent workflow with saga spawns child workflow with saga:
# Parent: Order Processing (saga enabled)
steps:
- name: "Allocate_Inventory"
# Spawns InventoryAllocation sub-workflow (also saga-enabled)
- name: "Charge_Payment"
compensate_function: "refund_payment"
# If payment fails:
# 1. Parent saga triggers
# 2. Compensation for "Allocate_Inventory" step
# 3. This triggers child workflow's saga
# 4. Child releases inventory locks
# 5. Parent continues with remaining compensations
Behavior: Child saga runs first (unwinding), then parent saga continues.
Saga vs Traditional Transactions¶
| Feature | Database Transaction | Saga Pattern |
|---|---|---|
| Scope | Single database | Multiple services |
| Consistency | ACID guaranteed | Eventual consistency |
| Isolation | Locks prevent concurrent access | No isolation (concurrent sagas possible) |
| Duration | Milliseconds (held locks) | Minutes to hours |
| Rollback | Exact state restoration | Semantic undo |
| Failure Handling | Automatic rollback | Manual compensation |
| Complexity | Low (handled by DB) | High (manual design) |
When to use Sagas: - Distributed systems with multiple services - Long-running workflows (hours/days) - External APIs (payment gateways, booking systems) - Microservices architecture
When to use DB transactions: - Single database operations - Short-lived operations (< 1 second) - Need strong consistency guarantees
Saga Logging and Debugging¶
Saga Log Structure¶
workflow.saga_log = [
{
"step_name": "Reserve_Flight",
"compensation_function": "cancel_flight",
"executed_at": "2026-02-13T10:15:00Z",
"result": {"flight_reservation_id": "FL-12345"}
},
{
"step_name": "Reserve_Hotel",
"compensation_function": "release_hotel",
"executed_at": "2026-02-13T10:16:00Z",
"result": {"hotel_reservation_id": "HTL-67890"}
}
]
Audit Trail¶
PostgreSQL backend logs compensation execution:
SELECT step_name, event_type, event_data, created_at
FROM workflow_audit_log
WHERE workflow_id = '550e8400-...'
AND event_type IN ('STEP_EXECUTED', 'COMPENSATION_EXECUTED')
ORDER BY created_at ASC;
Example output:
step_name | event_type | created_at
-------------------|-----------------------|-------------------------
Reserve_Flight | STEP_EXECUTED | 2026-02-13 10:15:00
Reserve_Hotel | STEP_EXECUTED | 2026-02-13 10:16:00
Charge_Payment | STEP_FAILED | 2026-02-13 10:17:00
Refund_Payment | COMPENSATION_EXECUTED | 2026-02-13 10:17:01
Release_Hotel | COMPENSATION_EXECUTED | 2026-02-13 10:17:02
Cancel_Flight | COMPENSATION_EXECUTED | 2026-02-13 10:17:03
Compensation Failures¶
If a compensation fails:
SELECT step_name, event_data->'error' AS error_message
FROM workflow_audit_log
WHERE workflow_id = '550e8400-...'
AND event_type = 'COMPENSATION_FAILED';
Manual Intervention: Ops team reviews failed compensations and fixes manually (e.g., call hotel to cancel reservation).
Saga Design Principles¶
1. Pivot Transaction¶
The pivot transaction is the point of no return—after this, compensation is no longer an option:
steps:
- name: "Reserve_Flight"
compensate_function: "cancel_flight"
- name: "Charge_Payment"
compensate_function: "refund_payment"
- name: "Issue_Ticket" # ← Pivot transaction
# No compensation - tickets can't be unissued
- name: "Send_Confirmation"
# After pivot, no compensations
Design Principle: Put the pivot transaction as late as possible, after all compensatable steps.
2. Compensating Actions vs Compensating Transactions¶
Compensating Action: Undo one step
Compensating Transaction: Undo multiple related steps
def cancel_booking(state, context):
# Undo multiple steps atomically
flight_api.cancel(state.flight_reservation_id)
hotel_api.cancel(state.hotel_reservation_id)
car_api.cancel(state.car_reservation_id)
Recommendation: Use compensating actions (one per step) for granular control.
3. Timeout Handling¶
Long-running compensations should have timeouts:
def cancel_flight(state: BookingState, context: StepContext) -> dict:
"""Compensation with timeout."""
try:
# 30-second timeout
flight_api.cancel(state.flight_reservation_id, timeout=30)
return {"flight_cancelled": True}
except TimeoutError:
logger.error(f"Compensation timeout for flight {state.flight_reservation_id}")
# Log for manual retry
return {"flight_cancelled": False, "reason": "timeout"}
Common Pitfalls¶
Pitfall 1: Non-Idempotent Compensations¶
# ❌ Bad: Assumes refund hasn't happened
def refund_payment(state, context):
payment_api.refund(state.transaction_id) # May fail if already refunded
# ✅ Good: Checks first
def refund_payment(state, context):
if payment_api.is_refunded(state.transaction_id):
return {"already_refunded": True}
payment_api.refund(state.transaction_id)
return {"refunded": True}
Pitfall 2: Assuming All Compensations Succeed¶
# ❌ Bad: Assumes compensation always works
def cancel_hotel(state, context):
hotel_api.cancel(state.hotel_reservation_id) # May fail!
# ✅ Good: Handles failure
def cancel_hotel(state, context):
try:
hotel_api.cancel(state.hotel_reservation_id)
return {"hotel_cancelled": True}
except HotelCancellationFailed as e:
logger.error(f"Failed to cancel hotel: {e}")
# Alert ops team for manual intervention
send_alert("hotel_cancellation_failed", state.hotel_reservation_id)
return {"hotel_cancelled": False, "error": str(e)}
Pitfall 3: Forgetting to Enable Saga Mode¶
# ❌ Bad: Define compensations but don't enable saga
workflow = await builder.create_workflow("BookingWorkflow", data)
# If step fails, compensations won't run!
# ✅ Good: Enable saga mode
workflow = await builder.create_workflow("BookingWorkflow", data)
workflow.enable_saga_mode() # Now compensations will run on failure
What's Next¶
Now that you understand the Saga pattern: - Workflow Lifecycle - How FAILED_ROLLED_BACK state works - Sub-Workflows - Nested sagas in parent-child workflows - Parallel Execution - Compensating parallel tasks