Skip to content

Troubleshooting guide

This guide covers common issues and solutions when working with Ruvon.

Installation issues

Import error: "No module named 'ruvon'"

Problem: Python can't find the Ruvon module.

Solution: Install in editable mode:

cd /path/to/ruvon-sdk
pip install -e .

Verify:

python -c "from ruvon.builder import WorkflowBuilder; print('✅ Ruvon installed')"

Missing dependencies

Problem: Import errors for aiosqlite, asyncpg, etc.

Solution: Install all dependencies:

pip install aiosqlite orjson asyncpg uvloop

Or install all extras:

pip install 'ruvon-sdk[postgres,performance,cli]'

Module not found: "examples.quickstart.steps"

Problem: Python can't find example modules when running examples.

Solution: Run from project root with PYTHONPATH:

cd /path/to/ruvon-sdk
PYTHONPATH=$PWD:$PYTHONPATH python examples/quickstart/run_quickstart.py

Database issues

PostgreSQL: "database schema missing"

Problem: Tables don't exist in database.

Solution: Apply Alembic migrations:

cd src/ruvon
export DATABASE_URL="postgresql://ruvon:ruvon_secret_2024@localhost:5433/ruvon_cloud"
alembic upgrade head

Verify:

alembic current
# Should show current migration version

SQLite: "no such table"

Problem: SQLite schema not initialized.

Solution: Initialize schema via CLI:

ruvon config set-persistence
# Choose: SQLite
# Database path: workflow.db

ruvon db init

Or initialize programmatically:

from ruvon.implementations.persistence.sqlite import SQLitePersistenceProvider

persistence = SQLitePersistenceProvider(db_path="workflow.db")
await persistence.initialize()

PostgreSQL connection refused

Problem: Can't connect to PostgreSQL.

Solution: Check Docker container is running:

cd docker
docker compose up postgres -d

# Verify
docker compose ps
# Expected: postgres (healthy)

# Check logs
docker compose logs postgres

Test connection:

psql "postgresql://ruvon:ruvon_secret_2024@localhost:5433/ruvon_cloud"

Database lock errors (SQLite)

Problem: "database is locked" error.

Solution: Increase timeout:

persistence = SQLitePersistenceProvider(
    db_path="workflow.db",
    timeout=30.0  # Wait up to 30 seconds
)

Or switch to PostgreSQL for high concurrency.

Workflow execution issues

Workflow stuck in RUNNING state

Problem: Workflow never completes or fails.

Solution 1: Check for missing automate_next:

steps:
  - name: "Step1"
    type: "STANDARD"
    function: "my_app.steps.step1"
    automate_next: true  # Add this

Solution 2: Check for unhandled exceptions in step function:

def my_step(state, context):
    try:
        # Your logic
        return {"result": "success"}
    except Exception as e:
        # Log error
        print(f"Step failed: {e}")
        # Re-raise to fail workflow
        raise

Solution 3: Check zombie workflow scanner for crashed workers:

ruvon scan-zombies --db $DATABASE_URL --fix

Step function not found

Problem: ImportError: No module named 'my_app.steps'

Solution: Verify function path in YAML matches actual module:

# Ensure this path is correct
function: "my_app.steps.process_order"

Verify module exists:

python -c "from my_app.steps import process_order; print('✅ Found')"

State not persisting

Problem: State changes lost between steps.

Solution 1: Modify state object directly:

def my_step(state: MyState, context: StepContext) -> dict:
    # ✅ Correct - modifies state
    state.status = "processed"

    # ✅ Also correct - return dict merges into state
    return {"status": "processed"}

Solution 2: Ensure state is JSON-serializable:

from pydantic import BaseModel
from datetime import datetime

class MyState(BaseModel):
    # ❌ Won't serialize
    created_at: datetime

    # ✅ Will serialize
    created_at: str  # Use ISO format string

Celery issues

Celery worker not starting

Problem: Worker crashes or won't start.

Solution: Check environment variables:

export DATABASE_URL="postgresql://..."
export CELERY_BROKER_URL="redis://localhost:6379/0"
export CELERY_RESULT_BACKEND="redis://localhost:6379/0"

celery -A ruvon.celery_app worker --loglevel=debug

Check Redis connection:

redis-cli -h localhost -p 6379 ping
# Expected: PONG

Tasks not executing

Problem: Tasks queued but not executing.

Solution: Check worker is running:

celery -A ruvon.celery_app inspect active

Check queue status:

celery -A ruvon.celery_app inspect stats

Verify worker sees tasks:

celery -A ruvon.celery_app worker --loglevel=debug
# Watch for task received messages

Workflow stuck in PENDING_SUB_WORKFLOW or PENDING_ASYNC with silent workers

Problem: Workflow never completes a parallel or sub-workflow step; workers appear idle; no errors logged.

Cause: data_region is set on a StartSubWorkflowDirective or step, routing tasks to a named queue (e.g. onsite-london) that no worker is consuming.

Diagnose:

# Check Redis for unexpected queue names
docker exec <redis-container> redis-cli keys "*"
# Look for queue names other than "default", "high_priority", "low_priority"

# Check queue depths
docker exec <redis-container> redis-cli llen onsite-london
# Non-zero = tasks stuck there

Solution: Either remove data_region (routes to default) or start a worker listening to that queue:

celery -A ruvon.celery_app worker -Q onsite-london

Flush the orphaned queue before retrying:

docker exec <redis-container> redis-cli del onsite-london

Parallel task function raises TypeError before queuing

Problem: Celery raises TypeError: missing required argument: 'context' when dispatching a PARALLEL step, before any task executes.

Cause: Parallel task functions passed to Celery must use a plain signature matching func.s(state=..., workflow_id=...). Celery calls check_arguments eagerly when building chord signatures — if context is a required positional parameter, it fails immediately.

Wrong:

async def my_parallel_task(state, context: StepContext, *args, **kwargs):
    ...

Correct:

def my_parallel_task(state: dict, workflow_id: str):
    ...

Verification: Worker logs show "task received" and "task succeeded"; no TypeError before queuing.

Performance issues

Slow workflow execution

Problem: Workflows taking too long to complete.

Solution 1: Enable performance optimizations:

export RUVON_USE_UVLOOP=true
export RUVON_USE_ORJSON=true

Solution 2: Increase PostgreSQL connection pool:

export POSTGRES_POOL_MIN_SIZE=20
export POSTGRES_POOL_MAX_SIZE=100

Solution 3: Use thread pool or Celery for parallel execution:

from ruvon.implementations.execution.thread_pool import ThreadPoolExecutionProvider

execution = ThreadPoolExecutionProvider(max_workers=20)

High memory usage

Problem: Application consuming too much memory.

Solution 1: Reduce connection pool size:

export POSTGRES_POOL_MAX_SIZE=50

Solution 2: Limit workflow state size:

  • Store large data externally (S3, file system)
  • Only store references in state
  • Clean up completed workflows

Solution 3: Restart workers periodically:

# Celery: Restart worker after N tasks
celery -A ruvon.celery_app worker --max-tasks-per-child=1000

Database connection pool exhausted

Problem: "pool exhausted" errors.

Solution: Increase pool size:

persistence = PostgresPersistenceProvider(
    db_url=db_url,
    pool_max_size=200  # Increase from default 50
)

Or reduce concurrent workflows.

Configuration issues

Config file not found

Problem: Workflow YAML not loading.

Solution: Check config_dir path:

builder = WorkflowBuilder(
    config_dir="config/",  # Relative to working directory
    # Or use absolute path
    config_dir="/absolute/path/to/config/",
    ...
)

Verify files exist:

ls config/
# Should show workflow YAML files and workflow_registry.yaml

Environment variables not loaded

Problem: DATABASE_URL not being used.

Solution: Export before running:

export DATABASE_URL="postgresql://..."
python app.py

Or load from .env file:

from dotenv import load_dotenv
load_dotenv()

# Now environment variables available
import os
db_url = os.getenv("DATABASE_URL")

Testing issues

Tests failing with "database locked"

Problem: SQLite concurrent access in tests.

Solution: Use in-memory database:

from ruvon.implementations.persistence.sqlite import SQLitePersistenceProvider

@pytest.fixture
async def persistence():
    provider = SQLitePersistenceProvider(db_path=":memory:")
    await provider.initialize()
    yield provider
    await provider.close()

Celery tests hanging

Problem: Integration tests with Celery never complete.

Solution: Use synchronous executor for tests:

from ruvon.implementations.execution.sync import SyncExecutionProvider

# In tests, use sync executor
execution = SyncExecutionProvider()

Or set test mode for Celery:

export TESTING=true
# Parallel tasks run synchronously in test mode

Deployment issues

Template variables not rendering in FIRE_AND_FORGET steps

Problem: Template like {{ state.recipient }} renders as empty string or raises UndefinedError.

Cause: The Jinja2 template context is the workflow state as a flat dict (state.model_dump()), not wrapped under a state key. There is no state. prefix available inside templates.

Wrong:

message_template: "Hello {{ state.recipient }}, your amount is {{ state.amount }}"

Correct:

message_template: "Hello {{ recipient }}, your amount is {{ amount }}"

Docker image runs wrong version after version bump

Problem: Docker image is tagged 0.6.x but python -c "import ruvon; print(ruvon.__version__)" inside the container prints the previous version.

Cause: Docker reused the cached pip install ruvon-sdk==<old> layer because the Dockerfile was not updated before building.

Solution: (1) Update the version pin in all three Dockerfiles as part of the version bump step. (2) Always build with --no-cache after a version bump:

docker build --no-cache -f docker/Dockerfile.ruvon-server-prod -t ruvondev/ruvon-server:0.1.2 .

Verify:

docker run --rm ruvondev/ruvon-server:0.1.2 python -c "import ruvon; print(ruvon.__version__)"
# Must print the new version

Docker container fails to start

Problem: Container exits immediately.

Solution: Check logs:

docker logs ruvon-server

Common issues:

  1. Missing database migrations:
# Add to docker-entrypoint.sh
cd /app/src/ruvon && alembic upgrade head
  1. Database connection failed:
# Check DATABASE_URL is correct
docker run --rm ruvon:latest env | grep DATABASE_URL
  1. Missing dependencies:
# Install the Ruvon packages you need
RUN pip install 'ruvon-sdk[postgres,performance]'

Kubernetes pod crash loop

Problem: Pods restarting constantly.

Solution: Check pod logs:

kubectl logs -f deployment/ruvon-server

Check resource limits:

resources:
  requests:
    memory: "512Mi"  # Increase if needed
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

Check health probes:

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 60  # Increase if slow startup
  periodSeconds: 10

Common error messages

"UNIQUE constraint failed"

Problem: Attempting to create duplicate workflow.

Solution: Check for unique constraints:

# Workflows have unique IDs (auto-generated)
# Don't reuse workflow IDs

# For idempotent operations, check if exists first
existing = await persistence.load_workflow(workflow_id)
if not existing:
    workflow = await builder.create_workflow(...)

"Foreign key constraint failed"

Problem: Referencing non-existent parent workflow.

Solution: Ensure parent workflow exists:

# When creating sub-workflow
parent_workflow = await persistence.load_workflow(parent_id)
if parent_workflow:
    # Safe to create sub-workflow
    sub_workflow = await builder.create_workflow(...)

"Workflow not found"

Problem: Loading workflow that doesn't exist.

Solution: Check workflow ID is correct:

try:
    workflow = await persistence.load_workflow(workflow_id)
except Exception:
    print(f"Workflow {workflow_id} not found")

Getting help

Enable debug logging

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("ruvon")
logger.setLevel(logging.DEBUG)

Check Ruvon version

import ruvon
print(ruvon.__version__)

Diagnostic information

Collect this information when reporting issues:

# Python version
python --version

# Ruvon version
python -c "import ruvon; print(ruvon.__version__)"

# Database version
psql --version  # or sqlite3 --version

# Environment variables
env | grep -E "(DATABASE|CELERY|RUVON)"

# Workflow status
ruvon show <workflow-id> --state --logs

Workflow debugging checklist

When debugging workflow issues:

  • Check workflow status (ruvon show <workflow-id>)
  • Check audit logs (ruvon logs <workflow-id>)
  • Verify step function can be imported
  • Check database connection
  • Verify YAML configuration is valid
  • Check for unhandled exceptions in step functions
  • Verify state is JSON-serializable
  • Check Celery workers are running (if using Celery)
  • Scan for zombie workflows (ruvon scan-zombies)

Next steps

See also