Timeouts

This guide covers timeout configuration at the platform level — how to set up timeout constraints in AsyncActor manifests, gateway configuration, and transport settings.

Overview

Asya enforces timeouts at three levels:

Actor timeout — per-call limit enforced by the sidecar
SLA timeout — pipeline-level deadline enforced by the sidecar before calling the runtime
Gateway backstop timeout — hard limit for tool invocations enforced by the gateway

These timeouts interact in a layered fashion:

actor timeout (per-call) < SLA timeout = gateway backstop (per-pipeline)

The SLA timeout and gateway backstop share the same timeout value from flows.yaml. The actor timeout is a per-call limit; the SLA is a pipeline-wide deadline.

Actor Timeout (Per-Call)

The actor timeout is the maximum duration a single runtime invocation can take. It is configured in the AsyncActor spec under resiliency.actorTimeout and enforced by the sidecar.

Configuration

```yaml
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: ml-inference
  namespace: prod
spec:
  actor: ml-inference
  image: my-org/ml-inference:latest
  handler: inference.handle

  resiliency:
    actorTimeout: 5m  # 5 minutes per call
```

Format: Duration string — 30s, 5m, 1h30m

Default: 5m (set by Crossplane composition if not specified)
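The following minimal Python sketch shows how these duration strings map to seconds, with the 5m default applied when the field is omitted. It handles only the h, m, and s units listed above. That is an assumption, because the real parser may accept more units. `parse_duration` is a hypothetical name, not an Asya API.

```python
import re
from typing import Optional

# Default applied by the Crossplane composition when actorTimeout is omitted.
DEFAULT_ACTOR_TIMEOUT = "5m"

# One or more <number><unit> groups, e.g. "1h30m"; only h, m, s are handled here.
_PART = re.compile(r"(\d+)([hms])")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(value: Optional[str]) -> int:
    """Convert a duration string such as "30s", "5m" or "1h30m" to seconds."""
    text = value if value else DEFAULT_ACTOR_TIMEOUT
    parts = _PART.findall(text)
    # Reject strings with leftover characters, e.g. "5x" or "1h30".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {value!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


print(parse_duration("30s"))    # 30
print(parse_duration("1h30m"))  # 5400
print(parse_duration(None))     # 300 (default 5m)
```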

Behavior on timeout

When the runtime exceeds actorTimeout:

Sidecar sends the envelope to x-sump with a timeout error
Sidecar crashes the pod (exits with status 1)
Kubernetes restarts the pod to recover clean state

Rationale: Crash-on-timeout prevents zombie processing where the runtime may still be executing after the sidecar gives up.
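The sequence above can be sketched in Python. This is a hypothetical illustration, not the sidecar's actual code. `publish` and `exit_process` are stand-ins, injected so the logic can run without a broker or a real process exit, and the `"error"` payload is an assumed shape. The sketch also shows why the pod is crashed. A Python thread, like an in-flight runtime call, cannot be forcibly cancelled after the wait gives up, so only a process restart guarantees it stops.

```python
import os
import threading
from typing import Any, Callable, Dict


def invoke_with_actor_timeout(
    handler: Callable[[Dict[str, Any]], Any],
    envelope: Dict[str, Any],
    timeout_s: float,
    publish: Callable[[str, Dict[str, Any]], None],
    exit_process: Callable[[int], None] = os._exit,
) -> Any:
    """Run handler(envelope); on timeout, report to x-sump and crash the process."""
    result: Dict[str, Any] = {}

    def run() -> None:
        # Error handling omitted: a raised exception leaves no result.
        result["value"] = handler(envelope)

    # Daemon thread so a stuck handler does not block interpreter shutdown.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # The handler is still running and cannot be cancelled: this is the
        # "zombie processing" that crashing the pod is meant to prevent.
        publish("x-sump", {**envelope, "error": "actor timeout"})
        exit_process(1)  # Kubernetes restarts the pod with clean state
        return None
    return result.get("value")
```

In tests, passing a fake `exit_process` (for example `codes.append`) lets you observe the exit code without terminating the interpreter.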

When to adjust

Increase for long-running tasks: model loading, heavy inference, large file processing
Decrease for fast operations: lookups, simple transforms, caching

Example — GPU model inference with 2-minute timeout:

```yaml
resiliency:
  actorTimeout: 2m
```

SLA Timeout (Pipeline Deadline)

The SLA timeout is the maximum end-to-end duration for an entire pipeline. It is set by the gateway when a tool is invoked and stored in the envelope's status.deadline_at field.

The sidecar checks status.deadline_at before calling the runtime. If the deadline has already passed, the envelope is routed directly to x-sink with phase=failed, reason=Timeout — the runtime is never called.
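The following minimal sketch covers the two halves of this mechanism: the gateway stamping the deadline and the sidecar pre-check. It assumes deadline_at is held as Unix epoch seconds, although the real envelope field may use a different encoding, such as an ISO 8601 timestamp. `stamp_deadline` and `precheck` are hypothetical names.

```python
import time
from typing import Any, Dict, Optional


def stamp_deadline(envelope: Dict[str, Any], timeout_s: Optional[int],
                   now: Optional[float] = None) -> Dict[str, Any]:
    """Gateway side: set status.deadline_at = now + timeout (no SLA if timeout is None)."""
    if timeout_s is not None:
        now = time.time() if now is None else now
        envelope.setdefault("status", {})["deadline_at"] = now + timeout_s
    return envelope


def precheck(envelope: Dict[str, Any], now: Optional[float] = None) -> Optional[Dict[str, str]]:
    """Sidecar side: return an x-sink routing decision if the deadline passed, else None."""
    deadline = envelope.get("status", {}).get("deadline_at")
    now = time.time() if now is None else now
    if deadline is not None and now > deadline:
        # The runtime is never called for this envelope.
        return {"queue": "x-sink", "phase": "failed", "reason": "Timeout"}
    return None
```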

Configuration

SLA is configured per flow in the gateway's flows.yaml ConfigMap:

```yaml
flows:
- name: image-enhance
  entrypoint: download-image
  route_next: [enhance, upload]
  description: Enhance image quality
  timeout: 120  # SLA: 120 seconds (2 minutes)
  mcp:
    inputSchema:
      type: object
      properties:
        image_url:
          type: string
      required: [image_url]
```

Field: timeout (integer, seconds)

Default: No SLA — envelope can run indefinitely

How it works

Gateway receives tool invocation
Gateway stamps status.deadline_at = now + timeout_seconds
Envelope travels through actors
Each sidecar checks: if now > deadline_at, route to x-sink (failed)
Otherwise, sidecar calculates effective timeout: min(actorTimeout, deadline_at - now)

Example timeline:

```text
t=0:    Gateway creates envelope (SLA: 120s, deadline_at = t+120)
t=10:   Actor A starts (remaining SLA: 110s, actorTimeout: 5m)
        Effective timeout: min(5m, 110s) = 110s
t=30:   Actor A completes (remaining SLA: 90s)
t=40:   Actor B starts (remaining SLA: 80s, actorTimeout: 5m)
        Effective timeout: min(5m, 80s) = 80s
t=125:  Actor C receives envelope (remaining SLA: -5s)
        Sidecar routes to x-sink immediately (phase=failed, reason=Timeout)
```
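The arithmetic in this timeline can be reproduced with a short sketch of the effective-timeout rule, min(actorTimeout, deadline_at - now). Times are seconds relative to envelope creation. `effective_timeout` is a hypothetical helper, and a return value of None stands for routing to x-sink.

```python
from typing import Optional


def effective_timeout(actor_timeout_s: float, deadline_at: float, now: float) -> Optional[float]:
    """Return the timeout the sidecar would apply, or None if the SLA already expired."""
    remaining = deadline_at - now
    if remaining < 0:
        return None  # route to x-sink with phase=failed, reason=Timeout
    return min(actor_timeout_s, remaining)


DEADLINE = 0 + 120   # gateway stamps deadline_at at t=0 with a 120s SLA
ACTOR_TIMEOUT = 300  # actorTimeout: 5m

for actor, t in [("A", 10), ("B", 40), ("C", 125)]:
    print(actor, t, effective_timeout(ACTOR_TIMEOUT, DEADLINE, t))
# A 10 110
# B 40 80
# C 125 None
```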

When to adjust

Increase for multi-actor pipelines with slow steps
Decrease for single-actor tools or fast pipelines

Monitoring: Check x-sink for phase=failed, reason=Timeout to identify SLA violations.

Gateway Backstop Timeout

The gateway maintains an independent backstop timer per task. This timer fires when a message is stuck in a queue and no sidecar ever picks it up, ensuring the task eventually reaches a terminal state.

Configuration

The backstop timer uses the same timeout value from flows.yaml that sets status.deadline_at on the envelope. There is no separate environment variable.

```yaml
# Gateway flows.yaml
flows:
- name: image-enhance
  entrypoint: download-image
  route_next: [enhance, upload]
  timeout: 120  # Backstop timer fires after 120 seconds
```

Default: No backstop timer (task can wait indefinitely if timeout is omitted)

Behavior on timeout

When the backstop timer fires:

Gateway marks the task as failed with error "envelope timed out"
SSE subscribers are notified with a failure event
Subsequent sidecar timeout reports are ignored (task is already terminal, first-write-wins)
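The following minimal in-process sketch implements a backstop timer with first-write-wins semantics using `threading.Timer`. The `Task` class and its methods are hypothetical. The real gateway persists task state and notifies SSE subscribers, which this sketch reduces to a list of recorded events.

```python
import threading
from typing import List, Optional, Tuple


class Task:
    """Task state where only the first terminal write is kept (first-write-wins)."""

    def __init__(self, timeout_s: Optional[float]) -> None:
        self.status = "running"
        self.error: Optional[str] = None
        self.events: List[Tuple[str, Optional[str]]] = []  # stand-in for SSE events
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        if timeout_s is not None:  # no timeout in flows.yaml -> no backstop timer
            self._timer = threading.Timer(
                timeout_s, self.finish, args=("failed", "envelope timed out"))
            self._timer.daemon = True
            self._timer.start()

    def finish(self, status: str, error: Optional[str] = None) -> bool:
        """Record a terminal state; return False if the task was already terminal."""
        with self._lock:
            if self.status != "running":
                return False  # later reports (e.g. sidecar timeouts) are ignored
            self.status, self.error = status, error
            self.events.append((status, error))
        if self._timer is not None:
            self._timer.cancel()  # harmless if the timer itself is the caller
        return True
```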

When to use

Enable (set timeout in flows.yaml) for any flow exposed as an MCP tool or A2A skill
Omit for internal pipelines or long-running workflows where no client is waiting

Transport-Level Timeouts

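Transport-specific timeouts control message redelivery and consumer liveness. They are not actor processing timeouts. Instead, they determine how long a consumed message may remain unacknowledged before the broker makes it visible to other consumers again (SQS) or closes the consumer's channel (RabbitMQ).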

SQS Visibility Timeout

When a sidecar consumes a message from SQS, the message becomes invisible to other consumers. If the sidecar doesn't delete the message within the visibility timeout, SQS redelivers it.

Configuration:

The sidecar computes the default visibility timeout as actorTimeout * 2. With the default 5m actor timeout, visibility timeout is 600s (10 minutes). Override with ASYA_SQS_VISIBILITY_TIMEOUT (seconds).

```yaml
# AsyncActor spec: the visibility timeout is derived from actorTimeout
# (set ASYA_SQS_VISIBILITY_TIMEOUT to override it)
resiliency:
  actorTimeout: 2m  # → default visibility timeout = 240s
```

Default: actorTimeout * 2 (e.g., 600s for 5m actor timeout)

Why 2x: The safety margin covers sidecar overhead (message parsing, routing, progress reporting) beyond the actor processing time.
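The default-and-override rule can be sketched as follows. Reading the override from an environment mapping mirrors ASYA_SQS_VISIBILITY_TIMEOUT. The function name is hypothetical, and raising an error when the result does not exceed actorTimeout is an illustrative guard, not documented sidecar behavior.

```python
import os
from typing import Mapping, Optional


def visibility_timeout_s(actor_timeout_s: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Visibility timeout in seconds: ASYA_SQS_VISIBILITY_TIMEOUT if set, else actorTimeout * 2."""
    env = os.environ if env is None else env
    override = env.get("ASYA_SQS_VISIBILITY_TIMEOUT")
    value = int(override) if override else actor_timeout_s * 2
    if value <= actor_timeout_s:
        # Hypothetical guard: SQS could redeliver while the actor is still processing.
        raise ValueError(f"visibility timeout {value}s must exceed actorTimeout {actor_timeout_s}s")
    return value


print(visibility_timeout_s(300, {}))  # 600 (default 5m actor timeout)
print(visibility_timeout_s(120, {}))  # 240 (actorTimeout: 2m)
print(visibility_timeout_s(120, {"ASYA_SQS_VISIBILITY_TIMEOUT": "900"}))  # 900
```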

Interaction with actor timeout:

Visibility timeout should always be longer than actorTimeout
If visibility timeout is shorter, SQS may redeliver while the actor is still processing
Actors must be idempotent to handle duplicate delivery
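One common way to make a handler tolerate redelivery is to record which envelope IDs have already been processed and skip repeats. The following minimal sketch assumes each envelope carries a unique `id` field, which is hypothetical here. A real deployment would need a shared, durable store instead of an in-memory set, because a restarted pod loses its memory.

```python
from typing import Any, Callable, Dict, Set


def idempotent(handler: Callable[[Dict[str, Any]], Any]) -> Callable[[Dict[str, Any]], Any]:
    """Wrap a handler so a redelivered envelope (same id) is processed only once."""
    seen: Set[str] = set()  # in-memory only: illustrative, not crash-safe

    def wrapper(envelope: Dict[str, Any]) -> Any:
        envelope_id = envelope["id"]
        if envelope_id in seen:
            return None  # duplicate delivery: skip the side effects
        result = handler(envelope)
        seen.add(envelope_id)  # mark only after success so failures can be retried
        return result

    return wrapper
```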

RabbitMQ Consumer Timeout

RabbitMQ uses a consumer timeout to detect stuck consumers. If a consumer doesn't ACK or NACK within the timeout, RabbitMQ closes the channel.

Configuration (RabbitMQ server-side):

```ini
# rabbitmq.conf (server-side, not the AsyncActor spec)
# 1 hour, in milliseconds
consumer_timeout = 3600000
```

Default: 30 minutes (1800000 ms) in current RabbitMQ releases. Older releases did not enforce a consumer timeout.

When to adjust: Set consumer_timeout longer than your longest actorTimeout so that RabbitMQ does not close the channel while a long-running actor is still processing.
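consumer_timeout is given in milliseconds, while actorTimeout is a duration string, so the comparison is easy to get wrong by a factor of 1000. The following hypothetical check assumes the actor timeouts have already been converted to seconds:

```python
from typing import Iterable


def consumer_timeout_ok(consumer_timeout_ms: int, actor_timeouts_s: Iterable[float]) -> bool:
    """True if RabbitMQ's consumer_timeout exceeds the longest actorTimeout."""
    longest_ms = max(actor_timeouts_s) * 1000  # seconds -> milliseconds
    return consumer_timeout_ms > longest_ms


print(consumer_timeout_ok(3_600_000, [60, 180, 60]))  # True: 1h > 3m
print(consumer_timeout_ok(1_800_000, [3_600]))        # False: 30m < 1h
```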

Timeout Hierarchy Summary

| Timeout | Scope | Configured in | Enforced by | On exceed |
|---|---|---|---|---|
| Actor timeout | Per-call | AsyncActor `resiliency.actorTimeout` | Sidecar | Send to x-sump, crash pod |
| SLA timeout | Pipeline | Gateway `flows.yaml` (`timeout` field) | Sidecar (pre-check before runtime) | Send to x-sink (phase=failed) |
| Gateway backstop | Tool invocation | Gateway `flows.yaml` (`timeout` field) | Gateway | Mark task as `failed` |
| SQS visibility | Message redelivery | `ASYA_SQS_VISIBILITY_TIMEOUT` (default: `actorTimeout * 2`) | SQS | Redeliver message |
| RabbitMQ consumer | Channel liveness | RabbitMQ server config | RabbitMQ | Close channel |

Best Practices

Set actor timeout longer than typical processing time — leave headroom for variance; if 95th percentile is 30s, set actorTimeout: 1m.
Set SLA timeout to sum of actor timeouts + buffer — if a 3-actor pipeline has actors with 1m, 2m, 1m timeouts, set SLA to 5m (not 4m) to account for routing overhead.
The gateway backstop and SLA use the same timeout — the timeout field in flows.yaml sets both status.deadline_at on the envelope and the gateway backstop timer. They fire at approximately the same time; whichever fires first wins (first-write-wins).
Monitor timeout metrics — track asya_actor_runtime_errors_total{error_type="timeout"} (actor timeouts) and x-sink messages with phase=failed, reason=Timeout (SLA timeouts).
Use SLA for user-facing flows — set timeout in flows.yaml for any flow exposed as an MCP tool or A2A skill to prevent runaway pipelines.
Tune visibility timeout if needed — the default actorTimeout * 2 should work for most cases. Override ASYA_SQS_VISIBILITY_TIMEOUT if you see duplicate processing from SQS redelivery.
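The SLA sizing rule from these practices reduces to a simple sum. In this sketch, the buffer value is a judgment call and `recommended_sla_s` is a hypothetical name:

```python
from typing import Iterable


def recommended_sla_s(actor_timeouts_s: Iterable[int], buffer_s: int) -> int:
    """SLA = sum of per-actor timeouts + a buffer for routing overhead (seconds)."""
    return sum(actor_timeouts_s) + buffer_s


# 1m, 2m, 1m actors plus a 1m buffer -> 300s (5m)
print(recommended_sla_s([60, 120, 60], 60))  # 300
# 1m, 3m, 1m actors plus a 1m buffer -> 360s (6m)
print(recommended_sla_s([60, 180, 60], 60))  # 360
```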

Example: Multi-Actor Pipeline

```yaml
# AsyncActor: download-image
resiliency:
  actorTimeout: 1m  # Download typically takes 20s

---
# AsyncActor: enhance-image
resiliency:
  actorTimeout: 3m  # Model inference takes up to 2m

---
# AsyncActor: upload-image
resiliency:
  actorTimeout: 1m  # Upload typically takes 30s

---
# Gateway flows.yaml
flows:
- name: image-enhance
  entrypoint: download-image
  route_next: [enhance-image, upload-image]
  timeout: 360  # SLA: 6 minutes (1m + 3m + 1m actor timeouts + 1m buffer)
  mcp:
    inputSchema: {...}
```

Result:

Each actor has a per-call timeout appropriate to its workload
Pipeline SLA prevents end-to-end runaway (6 minutes max)
Gateway backstop timer uses the same 360s timeout to mark the task failed if no sidecar reports
SQS visibility timeout defaults to actorTimeout * 2 per actor (120s for download-image and upload-image, 360s for enhance-image), leaving headroom beyond each actor's processing time

Using timeouts: To set timeouts in your actor handlers, see usage/guide-timeouts.md.