Timeouts#
This guide covers timeout configuration at the platform level — how to set up timeout constraints in AsyncActor manifests, gateway configuration, and transport settings.
Overview#
Asya enforces timeouts at three levels:
- Actor timeout — per-call limit enforced by the sidecar
- SLA timeout — pipeline-level deadline enforced by the sidecar before calling the runtime
- Gateway backstop timeout — hard limit for tool invocations enforced by the gateway
These timeouts interact in a layered fashion:
actor timeout (per-call) < SLA timeout = gateway backstop (per-pipeline)
The SLA timeout and gateway backstop share the same timeout value from flows.yaml. The actor timeout is a per-call limit; the SLA is a pipeline-wide deadline.
Actor Timeout (Per-Call)#
The actor timeout is the maximum duration a single runtime invocation can take. It is configured in the AsyncActor spec under resiliency.actorTimeout and enforced by the sidecar.
Configuration#
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: ml-inference
  namespace: prod
spec:
  actor: ml-inference
  image: my-org/ml-inference:latest
  handler: inference.handle
  resiliency:
    actorTimeout: 5m # 5 minutes per call
Format: Duration string — 30s, 5m, 1h30m
Default: 5m (set by Crossplane composition if not specified)
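To reason about these values in scripts or config checks, it can help to convert a duration string to seconds. The following is a minimal sketch, not Asya's own parser: parse_duration is a hypothetical helper, and it assumes only the h, m, and s units shown above are in use.

```python
import re

# Hypothetical helper, not part of Asya: converts "30s", "5m", or "1h30m" to seconds.
# Assumption: only the h, m, and s units shown in this guide are supported.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Return the duration in whole seconds, or raise ValueError if malformed."""
    # Require the whole string to be one or more <number><unit> tokens.
    if not re.fullmatch(r"(?:\d+[hms])+", text or ""):
        raise ValueError(f"unsupported duration string: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _TOKEN.findall(text))
```

With this sketch, the default 5m corresponds to 300 seconds and 1h30m to 5400 seconds.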
Behavior on timeout#
When the runtime exceeds actorTimeout:
- Sidecar sends the envelope to x-sump with a timeout error
- Sidecar crashes the pod (exits with status 1)
- Kubernetes restarts the pod to recover clean state
Rationale: Crash-on-timeout prevents zombie processing where the runtime may still be executing after the sidecar gives up.
When to adjust#
- Increase for long-running tasks: model loading, heavy inference, large file processing
- Decrease for fast operations: lookups, simple transforms, caching
Example — GPU model inference with 2-minute timeout:
resiliency:
  actorTimeout: 2m
SLA Timeout (Pipeline Deadline)#
The SLA timeout is the maximum end-to-end duration for an entire pipeline. It is set by the gateway when a tool is invoked and stored in the envelope's status.deadline_at field.
The sidecar checks status.deadline_at before calling the runtime. If the deadline has already passed, the envelope is routed directly to x-sink with phase=failed, reason=Timeout — the runtime is never called.
Configuration#
SLA is configured per flow in the gateway's flows.yaml ConfigMap:
flows:
  - name: image-enhance
    entrypoint: download-image
    route_next: [enhance, upload]
    description: Enhance image quality
    timeout: 120 # SLA: 120 seconds (2 minutes)
    mcp:
      inputSchema:
        type: object
        properties:
          image_url:
            type: string
        required: [image_url]
Field: timeout (integer, seconds)
Default: No SLA — envelope can run indefinitely
How it works#
- Gateway receives tool invocation
- Gateway stamps status.deadline_at = now + timeout_seconds
- Envelope travels through actors
- Each sidecar checks: if now > deadline_at, route to x-sink (failed)
- Otherwise, sidecar calculates effective timeout: min(actorTimeout, deadline_at - now)
Example timeline:
t=0:   Gateway creates envelope (SLA: 120s, deadline_at = t+120)
t=10:  Actor A starts (remaining SLA: 110s, actorTimeout: 5m)
       Effective timeout: min(5m, 110s) = 110s
t=30:  Actor A completes (remaining SLA: 90s)
t=40:  Actor B starts (remaining SLA: 80s, actorTimeout: 5m)
       Effective timeout: min(5m, 80s) = 80s
t=125: Actor C receives envelope (remaining SLA: -5s)
       Sidecar routes to x-sink immediately (phase=failed, reason=Timeout)
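The deadline check and effective-timeout calculation can be sketched as a single function. This is an illustration under stated assumptions, not the sidecar's actual code: effective_timeout is a hypothetical name, times are plain seconds, and a missing deadline is modeled as None (no SLA configured).

```python
from typing import Optional


def effective_timeout(
    actor_timeout_s: float, deadline_at: Optional[float], now: float
) -> Optional[float]:
    """Return how long the runtime may run, or None if the SLA has already expired.

    None means the envelope would be routed to x-sink without calling the runtime.
    """
    if deadline_at is None:
        # No SLA configured for this flow: only the per-call limit applies.
        return actor_timeout_s
    if now > deadline_at:
        # Deadline already passed: skip the runtime entirely.
        return None
    # Never run longer than the time left on the pipeline deadline.
    return min(actor_timeout_s, deadline_at - now)
```

Applied to the timeline above (deadline_at = 120, actorTimeout = 300 seconds), the function yields 110 at t=10, 80 at t=40, and None at t=125.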
When to adjust#
- Increase for multi-actor pipelines with slow steps
- Decrease for single-actor tools or fast pipelines
Monitoring: Check x-sink for phase=failed, reason=Timeout to identify SLA violations.
Gateway Backstop Timeout#
The gateway maintains an independent backstop timer per task. This timer fires when a message is stuck in a queue and no sidecar ever picks it up, ensuring the task eventually reaches a terminal state.
Configuration#
The backstop timer uses the same timeout value from flows.yaml that sets status.deadline_at on the envelope. There is no separate environment variable.
# Gateway flows.yaml
flows:
  - name: image-enhance
    entrypoint: download-image
    route_next: [enhance, upload]
    timeout: 120 # Backstop timer fires after 120 seconds
Default: No backstop timer (task can wait indefinitely if timeout is omitted)
Behavior on timeout#
When the backstop timer fires:
- Gateway marks the task as failed with error "envelope timed out"
- SSE subscribers are notified with a failure event
- Subsequent sidecar timeout reports are ignored (task is already terminal, first-write-wins)
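The first-write-wins rule can be illustrated with a small state holder. This is a sketch only: the Task class, its finish method, and the status names pending and succeeded are assumptions made for the example, not the gateway's real data model.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed terminal statuses; only "failed" is named in this guide.
TERMINAL_STATUSES = {"succeeded", "failed"}


@dataclass
class Task:
    status: str = "pending"
    error: Optional[str] = None

    def finish(self, status: str, error: Optional[str] = None) -> bool:
        """Record a terminal status; return False if the task was already terminal."""
        if self.status in TERMINAL_STATUSES:
            # First write wins: later reports leave the task unchanged.
            return False
        self.status, self.error = status, error
        return True
```

If the backstop timer calls finish("failed", "envelope timed out") first, a later sidecar report returns False and the recorded error stays "envelope timed out".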
When to use#
- Enable (set timeout in flows.yaml) for any flow exposed as an MCP tool or A2A skill
- Omit for internal pipelines or long-running workflows where no client is waiting
Transport-Level Timeouts#
Transport-specific timeouts control message visibility and redelivery. These are not actor processing timeouts — they determine how long a message remains invisible after being consumed.
SQS Visibility Timeout#
When a sidecar consumes a message from SQS, the message becomes invisible to other consumers. If the sidecar doesn't delete the message within the visibility timeout, SQS redelivers it.
Configuration:
The sidecar computes the default visibility timeout as actorTimeout * 2. With the default 5m actor timeout, visibility timeout is 600s (10 minutes). Override with ASYA_SQS_VISIBILITY_TIMEOUT (seconds).
# Override in AsyncActor spec (injected as ASYA_SQS_VISIBILITY_TIMEOUT)
resiliency:
  actorTimeout: 2m # → default visibility timeout = 240s
Default: actorTimeout * 2 (e.g., 600s for 5m actor timeout)
Why 2x: The safety margin covers sidecar overhead (message parsing, routing, progress reporting) beyond the actor processing time.
Interaction with actor timeout:
- Visibility timeout should always be longer than actorTimeout
- If visibility timeout is shorter, SQS may redeliver while the actor is still processing
- Actors must be idempotent to handle duplicate delivery
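The default-and-override rule for the visibility timeout can be sketched as follows. The function name sqs_visibility_timeout is hypothetical and the environment is passed in as a plain mapping; only the ASYA_SQS_VISIBILITY_TIMEOUT variable and the actorTimeout * 2 default come from this guide.

```python
from typing import Mapping


def sqs_visibility_timeout(actor_timeout_s: int, env: Mapping[str, str]) -> int:
    """Return the SQS visibility timeout in seconds for one actor."""
    override = env.get("ASYA_SQS_VISIBILITY_TIMEOUT")
    if override:
        # An explicit override (in seconds) takes precedence over the default.
        return int(override)
    # Default: twice the actor timeout, leaving margin for sidecar overhead.
    return actor_timeout_s * 2
```

Passing os.environ as env would read the override from the process environment; with an empty mapping, a 2m actor timeout gives 240 seconds.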
RabbitMQ Consumer Timeout#
RabbitMQ uses a consumer timeout to detect stuck consumers. If a consumer doesn't ACK or NACK within the timeout, RabbitMQ closes the channel.
Configuration (RabbitMQ server-side):
# In RabbitMQ config (not AsyncActor spec)
consumer_timeout = 3600000 # 1 hour in milliseconds
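Default: 30 minutes (1800000 ms) in current RabbitMQ releases. The timeout was introduced during the 3.8 series, so check the documentation for your server version.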
When to adjust: Set a consumer timeout longer than your longest actorTimeout to prevent RabbitMQ from killing long-running actors.
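A quick sanity check for this rule might look like the sketch below. consumer_timeout_is_safe is a hypothetical helper; it assumes you already know the server's consumer_timeout in milliseconds and every actor's timeout in seconds.

```python
from typing import Sequence


def consumer_timeout_is_safe(
    consumer_timeout_ms: int, actor_timeouts_s: Sequence[int]
) -> bool:
    """True when the RabbitMQ consumer timeout exceeds every actor timeout."""
    # RabbitMQ expresses the timeout in milliseconds, actorTimeout in seconds here.
    longest_actor_ms = max(actor_timeouts_s, default=0) * 1000
    return consumer_timeout_ms > longest_actor_ms
```

For example, a one-hour consumer timeout (3600000 ms) is safe for actors with 1m, 3m, and 1m timeouts, while a 30-minute consumer timeout is not safe for an actor with a 1h timeout.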
Timeout Hierarchy Summary#
| Timeout | Scope | Configured in | Enforced by | On exceed |
|---|---|---|---|---|
| Actor timeout | Per-call | AsyncActor resiliency.actorTimeout | Sidecar | Send to x-sump, crash pod |
| SLA timeout | Pipeline | Gateway flows.yaml (timeout field) | Sidecar (pre-check before runtime) | Send to x-sink (phase=failed) |
| Gateway backstop | Tool invocation | Gateway flows.yaml (timeout field) | Gateway | Mark task as failed |
| SQS visibility | Message redelivery | ASYA_SQS_VISIBILITY_TIMEOUT (default: actorTimeout * 2) | SQS | Redeliver message |
| RabbitMQ consumer | Channel liveness | RabbitMQ server config | RabbitMQ | Close channel |
Best Practices#
- Set actor timeout longer than typical processing time: leave headroom for variance; if the 95th percentile is 30s, set actorTimeout: 1m.
- Set SLA timeout to sum of actor timeouts + buffer: if a 3-actor pipeline has actors with 1m, 2m, 1m timeouts, set SLA to 5m (not 4m) to account for routing overhead.
- The gateway backstop and SLA use the same timeout: the timeout field in flows.yaml sets both status.deadline_at on the envelope and the gateway backstop timer. They fire at approximately the same time; whichever fires first wins (first-write-wins).
- Monitor timeout metrics: track asya_actor_runtime_errors_total{error_type="timeout"} (actor timeouts) and x-sink messages with phase=failed, reason=Timeout (SLA timeouts).
- Use SLA for user-facing flows: set timeout in flows.yaml for any flow exposed as an MCP tool or A2A skill to prevent runaway pipelines.
- Tune visibility timeout if needed: the default actorTimeout * 2 should work for most cases. Override ASYA_SQS_VISIBILITY_TIMEOUT if you see duplicate processing from SQS redelivery.
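The sizing rule for the SLA (sum of actor timeouts plus a buffer) can be written as a one-line calculation. This is a sketch: recommended_sla_seconds is a hypothetical helper, and the 60-second default buffer is an assumption taken from the 1m buffer used in this guide's examples.

```python
from typing import Sequence


def recommended_sla_seconds(actor_timeouts_s: Sequence[int], buffer_s: int = 60) -> int:
    """Suggest a flows.yaml timeout: all per-call limits plus routing headroom."""
    # Worst case is every actor running to its own limit, plus queueing and routing.
    return sum(actor_timeouts_s) + buffer_s
```

For actors with 1m, 2m, and 1m timeouts this gives 300 seconds (5m), matching the recommendation above.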
Example: Multi-Actor Pipeline#
# AsyncActor: download-image
resiliency:
  actorTimeout: 1m # Download typically takes 20s
---
# AsyncActor: enhance-image
resiliency:
  actorTimeout: 3m # Model inference takes up to 2m
---
# AsyncActor: upload-image
resiliency:
  actorTimeout: 1m # Upload typically takes 30s
---
# Gateway flows.yaml
flows:
  - name: image-enhance
    entrypoint: download-image
    route_next: [enhance-image, upload-image]
    timeout: 360 # SLA: 6 minutes (1m + 3m + 1m actor timeouts + 1m buffer)
    mcp:
      inputSchema: {...}
Result:
- Each actor has a per-call timeout appropriate to its workload
- Pipeline SLA prevents end-to-end runaway (6 minutes max)
- Gateway backstop timer uses the same 360s timeout to mark the task failed if no sidecar reports
- SQS visibility timeout defaults to actorTimeout * 2 per actor, ensuring adequate processing time
Using timeouts: To set timeouts in your actor handlers, see usage/guide-timeouts.md.