Workflow Versioning: Definition Snapshots¶
When you deploy a new version of a workflow YAML file, what happens to the 10,000 workflows already running in production? Workflow versioning ensures that running workflows continue using their original definition, immune to breaking changes.
The Problem¶
Scenario: Breaking Change Deployment¶
# v1.0.0 (deployed Monday, 10,000 workflows running)
workflow_type: "OrderProcessing"
workflow_version: "1.0.0"
steps:
- name: "Validate_Order"
- name: "Human_Approval" # Manual approval required
- name: "Charge_Payment"
- name: "Ship_Order"
Tuesday, you deploy v2.0.0 that removes the manual approval step:
# v2.0.0 (deployed Tuesday)
workflow_type: "OrderProcessing"
workflow_version: "2.0.0"
steps:
- name: "Validate_Order"
# Human_Approval removed (automated now)
- name: "Charge_Payment"
- name: "Ship_Order"
What happens to the 10,000 workflows in WAITING_HUMAN_INPUT at step Human_Approval?
Without versioning: - Workflow resumes, tries to find "Human_Approval" step - Step not found in new YAML - Workflow crashes - 10,000 orders stuck, requiring manual recovery
With versioning: - Workflow uses its original definition snapshot (v1.0.0) - "Human_Approval" step still exists in the snapshot - Workflow completes successfully - New workflows use v2.0.0, old workflows use v1.0.0
How Workflow Versioning Works¶
1. Definition Snapshot on Creation¶
When a workflow is created, Ruvon captures a complete snapshot:
workflow = await builder.create_workflow(
workflow_type="OrderProcessing",
initial_data={"order_id": "ORD-123"}
)
# Ruvon automatically snapshots:
workflow.definition_snapshot = {
"workflow_type": "OrderProcessing",
"workflow_version": "1.0.0",
"initial_state_model": "myapp.models.OrderState",
"steps": [
{"name": "Validate_Order", "type": "STANDARD", ...},
{"name": "Human_Approval", "type": "STANDARD", ...},
{"name": "Charge_Payment", "type": "STANDARD", ...},
{"name": "Ship_Order", "type": "STANDARD", ...}
],
# ... complete workflow definition
}
2. Snapshot Persistence¶
Snapshot stored in database alongside workflow state:
CREATE TABLE workflow_executions (
id UUID PRIMARY KEY,
workflow_type VARCHAR(100) NOT NULL,
workflow_version VARCHAR(50), -- e.g., "1.0.0"
definition_snapshot JSONB, -- Complete YAML as JSON
state JSONB NOT NULL,
status VARCHAR(50) NOT NULL,
...
);
Storage overhead: - ~5-10 KB per workflow (typical) - PostgreSQL: Compressed JSONB (efficient storage) - SQLite: TEXT (JSON string)
3. Snapshot Usage on Resume¶
When workflow is loaded (after pause, async task, or server restart):
# Load workflow from database
workflow_dict = await persistence.load_workflow(workflow_id)
# Use snapshot instead of current YAML
if workflow_dict.get('definition_snapshot'):
# Use snapshot (version at creation time)
workflow_config = workflow_dict['definition_snapshot']
else:
# Fallback: Load from current YAML (backward compatibility)
workflow_config = builder.load_yaml(workflow_dict['workflow_type'])
# Reconstruct workflow with original definition
workflow = Workflow(
id=workflow_dict['id'],
config=workflow_config, # Uses snapshot!
state=state,
...
)
Key Point: Workflow always uses its snapshot, never the latest YAML.
Version Compatibility¶
Explicit Versioning¶
Add workflow_version to YAML for tracking:
workflow_type: "OrderProcessing"
workflow_version: "2.1.0" # Semantic versioning
initial_state_model: "myapp.models.OrderState"
steps:
- name: "Validate_Order"
...
Access version in code:
workflow = await builder.create_workflow("OrderProcessing", data)
print(f"Workflow version: {workflow.workflow_version}") # "2.1.0"
Query workflows by version:
SELECT id, workflow_version, status, created_at
FROM workflow_executions
WHERE workflow_type = 'OrderProcessing'
AND workflow_version = '1.0.0'
AND status = 'ACTIVE';
Compatibility Checking (Optional)¶
Implement version validation when resuming workflows:
def check_version_compatibility(snapshot_version: str, current_version: str) -> bool:
"""Check if workflow snapshot is compatible with current YAML."""
if not snapshot_version or not current_version:
return True # No version specified
# Parse semantic versions
snap_major, snap_minor, _ = snapshot_version.split('.')
curr_major, curr_minor, _ = current_version.split('.')
# Compatible if same major version
if snap_major != curr_major:
return False
# Allow minor version upgrades (backward compatible)
return int(snap_minor) <= int(curr_minor)
# In WorkflowBuilder.load_workflow:
workflow_dict = await persistence.load_workflow(workflow_id)
snapshot_version = workflow_dict.get('workflow_version')
current_version = builder.get_current_version(workflow_dict['workflow_type'])
if not check_version_compatibility(snapshot_version, current_version):
raise IncompatibleVersionError(
f"Workflow version {snapshot_version} incompatible with "
f"current version {current_version}"
)
Breaking Change Strategies¶
Strategy 1: Snapshot-Only (Recommended)¶
Just deploy the new YAML—running workflows use their snapshot:
# Monday: Deploy v1.0.0
workflow_version: "1.0.0"
steps:
- name: "Human_Approval" # Required
# Tuesday: Deploy v2.0.0 (removes Human_Approval)
workflow_version: "2.0.0"
steps:
# Human_Approval removed
Behavior: - Existing workflows (created Monday): Use v1.0.0 snapshot, have "Human_Approval" - New workflows (created Tuesday): Use v2.0.0 YAML, no "Human_Approval"
Pros: - Simple deployment - No coordination needed - Natural migration (old workflows drain, new workflows start)
Cons: - May have 2 versions running for days/weeks (long-running workflows)
Strategy 2: Side-by-Side Versions¶
Keep both YAML files, register as separate workflow types:
# config/order_processing_v1.yaml
workflow_type: "OrderProcessing_v1"
workflow_version: "1.0.0"
steps:
- name: "Human_Approval"
# config/order_processing_v2.yaml
workflow_type: "OrderProcessing_v2"
workflow_version: "2.0.0"
steps:
# Human_Approval removed
Register both:
# config/workflow_registry.yaml
workflows:
- type: "OrderProcessing_v1"
config_file: "order_processing_v1.yaml"
deprecated: true
- type: "OrderProcessing_v2"
config_file: "order_processing_v2.yaml"
Application code:
# Gradually migrate to v2
if feature_flag('use_order_processing_v2'):
workflow_type = "OrderProcessing_v2"
else:
workflow_type = "OrderProcessing_v1"
workflow = await builder.create_workflow(workflow_type, data)
Pros: - Explicit versioning - Easy to toggle between versions - Can delete v1 YAML after all workflows complete
Cons: - File duplication - Registry clutter
Strategy 3: Gradual Migration¶
Deploy new version but keep old version available:
# config/order_processing.yaml (v2.0.0)
workflow_type: "OrderProcessing"
workflow_version: "2.0.0"
steps:
# New definition
# config/order_processing_v1.yaml (kept for running workflows)
workflow_type: "OrderProcessing"
workflow_version: "1.0.0"
steps:
- name: "Human_Approval"
Monitoring:
-- Check if any v1 workflows still running
SELECT COUNT(*) AS active_v1_workflows
FROM workflow_executions
WHERE workflow_type = 'OrderProcessing'
AND workflow_version = '1.0.0'
AND status IN ('ACTIVE', 'PENDING_ASYNC', 'WAITING_HUMAN_INPUT');
-- When count reaches 0, delete order_processing_v1.yaml
Pros: - Safe migration - Can monitor progress
Cons: - Requires manual cleanup
Snapshot Analysis¶
Inspecting Snapshots¶
Check what definition a workflow is using:
workflow = await persistence.load_workflow(workflow_id)
snapshot = workflow['definition_snapshot']
print(f"Workflow type: {snapshot['workflow_type']}")
print(f"Workflow version: {snapshot.get('workflow_version')}")
print(f"Steps in snapshot: {[s['name'] for s in snapshot['steps']]}")
Comparing Snapshot to Current YAML¶
def compare_definitions(snapshot, current_yaml):
"""Compare snapshot to current YAML."""
snapshot_steps = {s['name'] for s in snapshot['steps']}
current_steps = {s['name'] for s in current_yaml['steps']}
added = current_steps - snapshot_steps
removed = snapshot_steps - current_steps
print(f"Steps added: {added}")
print(f"Steps removed: {removed}")
# Usage
snapshot = workflow['definition_snapshot']
current = builder.load_yaml("OrderProcessing")
compare_definitions(snapshot, current)
# Output:
# Steps added: {'Automated_Approval'}
# Steps removed: {'Human_Approval'}
Storage Analysis¶
-- Check snapshot storage usage
SELECT
AVG(LENGTH(definition_snapshot::text)) AS avg_snapshot_size_bytes,
MAX(LENGTH(definition_snapshot::text)) AS max_snapshot_size_bytes,
COUNT(*) AS total_workflows
FROM workflow_executions
WHERE definition_snapshot IS NOT NULL;
-- Find workflows with large snapshots
SELECT id, workflow_type, LENGTH(definition_snapshot::text) AS snapshot_size
FROM workflow_executions
WHERE LENGTH(definition_snapshot::text) > 50000 -- > 50KB
ORDER BY snapshot_size DESC;
Backward Compatibility¶
Workflows Without Snapshots¶
Existing workflows (before versioning feature) don't have snapshots:
Behavior: - Workflow loads current YAML (legacy behavior) - May break if YAML has incompatible changes - Recommend: Migrate to snapshot
Migration:
# Backfill snapshots for running workflows
async def backfill_snapshots(persistence, builder):
workflows = await persistence.list_workflows(status="ACTIVE")
for workflow in workflows:
if not workflow.get('definition_snapshot'):
# Load current YAML as snapshot
current_yaml = builder.load_yaml(workflow['workflow_type'])
# Save snapshot
await persistence.update_workflow(
workflow['id'],
{'definition_snapshot': current_yaml}
)
print(f"Backfilled snapshot for {workflow['id']}")
Common Pitfalls¶
Pitfall 1: Deleting Old YAML Files Immediately¶
# ❌ Bad: Delete old YAML right after deploying new version
rm config/order_processing_v1.yaml
# Existing workflows may need v1 definition for reference/debugging
Best Practice: Keep old YAMLs for 30-90 days, or until all workflows complete.
Pitfall 2: Assuming All Workflows Use Latest Definition¶
# ❌ Bad: Assume workflow has latest field
def my_step(state: OrderState, context: StepContext):
# This field added in v2.0.0
new_field = state.new_v2_field # AttributeError for v1 workflows!
# ✅ Good: Check version or use defaults
def my_step(state: OrderState, context: StepContext):
new_field = getattr(state, 'new_v2_field', default_value)
Pitfall 3: Not Testing Migrations¶
# ✅ Good: Test v1 → v2 migration
def test_v1_workflow_completes_after_v2_deploy():
# Create workflow with v1 YAML
builder_v1 = WorkflowBuilder(config_dir="config/v1/")
workflow = builder_v1.create_workflow("OrderProcessing", data)
# Pause workflow
workflow.next_step(user_input={}) # Validate_Order
workflow.next_step(user_input={}) # Human_Approval (pauses)
# Deploy v2 YAML (removes Human_Approval)
builder_v2 = WorkflowBuilder(config_dir="config/v2/")
# Resume workflow (should use v1 snapshot)
loaded_workflow = await persistence.load_workflow(workflow.id)
assert "Human_Approval" in [s['name'] for s in loaded_workflow['definition_snapshot']['steps']]
# Complete workflow
workflow.resume(input={"approved": True})
assert workflow.status == "COMPLETED"
Versioning Best Practices¶
1. Use Semantic Versioning¶
# Major: Breaking changes (incompatible state model)
workflow_version: "2.0.0"
# Minor: Backward-compatible features (new optional steps)
workflow_version: "1.1.0"
# Patch: Bug fixes (no definition changes)
workflow_version: "1.0.1"
2. Document Version Changes¶
# config/order_processing.yaml
workflow_type: "OrderProcessing"
workflow_version: "2.0.0"
# CHANGELOG:
# v2.0.0 (2026-02-13):
# - Removed Human_Approval step (automated)
# - Added Fraud_Check step
# - Changed state model: removed manual_approver field
#
# v1.0.0 (2026-01-01):
# - Initial version
3. Monitor Version Distribution¶
-- Track version distribution
SELECT workflow_version, status, COUNT(*) AS count
FROM workflow_executions
WHERE workflow_type = 'OrderProcessing'
GROUP BY workflow_version, status
ORDER BY workflow_version DESC, status;
-- Output:
-- workflow_version | status | count
-- -----------------|----------------------|-------
-- 2.0.0 | ACTIVE | 8,500
-- 2.0.0 | COMPLETED | 12,000
-- 1.0.0 | ACTIVE | 1,200 ← Still running!
-- 1.0.0 | WAITING_HUMAN_INPUT | 300 ← Still running!
-- 1.0.0 | COMPLETED | 50,000
4. Set Deprecation Timeline¶
workflows:
- type: "OrderProcessing_v1"
config_file: "order_processing_v1.yaml"
deprecated: true
deprecation_date: "2026-03-01"
remove_after_date: "2026-04-01" # Delete after 30 days
5. Alert on Long-Running Old Versions¶
# Alert if v1 workflows still running after 30 days
old_workflows = await persistence.query("""
SELECT id, workflow_version, created_at, status
FROM workflow_executions
WHERE workflow_type = 'OrderProcessing'
AND workflow_version = '1.0.0'
AND status != 'COMPLETED'
AND created_at < NOW() - INTERVAL '30 days'
""")
if old_workflows:
alert(f"{len(old_workflows)} old version workflows still running")
What's Next¶
Now that you understand workflow versioning: - Zombie Recovery - Another reliability feature - Workflow Lifecycle - How versioning interacts with lifecycle states - State Management - Version compatibility for state models