Guide error handling#
How to configure resiliency policies, error-matching rules, and error routing
at the infrastructure level. This guide covers the spec.resiliency block in
AsyncActor manifests.
Overview#
Asya handles errors at the sidecar level using a policy-based system:
error occurs at actor X
  └─ rules: first matching rule by error type / MRO?
       ├─ match    → apply matched policy
       └─ no match → apply policies.default (or fail to x-sink)

apply policy:
  └─ attempts < maxAttempts AND wall-clock < maxDuration?
       ├─ yes → retry (back to X's queue with backoff delay)
       └─ no  → thenRoute configured?
                  ├─ yes → route to thenRoute actors
                  └─ no  → x-sink (phase=failed)
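The "apply policy" branch of the diagram can be sketched in Python. This is only an illustration of the decision order, not the sidecar's implementation: the `Policy` dataclass, the `decide` function, and the returned action labels are hypothetical names, and `attempts` is assumed to count executions that have already failed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Policy:
    """Hypothetical in-memory form of one entry under resiliency.policies."""
    max_attempts: int = 1
    max_duration_s: Optional[float] = None  # None = no wall-clock budget
    then_route: List[str] = field(default_factory=list)


def decide(policy: Policy, attempts: int, elapsed_s: float) -> Tuple[str, List[str]]:
    """Return (action, targets) for an error handled under `policy`.

    attempts:  executions already made, including the one that just failed
    elapsed_s: wall-clock seconds since the first attempt
    """
    within_attempts = attempts < policy.max_attempts
    within_budget = policy.max_duration_s is None or elapsed_s < policy.max_duration_s
    if within_attempts and within_budget:
        return ("retry", [])                         # back to the actor's own queue
    if policy.then_route:
        return ("route", list(policy.then_route))    # exhausted, thenRoute configured
    return ("x-sink", [])                            # exhausted, no thenRoute
```

With `maxAttempts: 1` the first failure already exhausts the policy, which is why such a policy never retries.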
Resiliency schema#
The full spec.resiliency block in an AsyncActor manifest:
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: my-actor
spec:
  resiliency:
    timeout:
      actor: 120s               # per-execution deadline (kills runtime)
    policies:
      default:                  # fallback when no rule matches
        maxAttempts: 3
        backoff: exponential    # exponential | constant | linear
        initialDelay: 1s
        maxInterval: 60s
        jitter: true
        maxDuration: 600s       # total wall-clock budget
        # thenRoute omitted → x-sink on exhaustion
      retry-fast:
        maxAttempts: 5
        backoff: exponential
        initialDelay: 500ms
      no-retry:
        maxAttempts: 1
        thenRoute: ["alert-devops"]
    rules:
      - errors: ["ConnectionError", "TimeoutError"]
        policy: retry-fast
      - errors: ["openai.AuthenticationError"]
        policy: no-retry
Policy fields#
| Field | Type | Default | Description |
|---|---|---|---|
| `maxAttempts` | int | 1 | Total attempts (1 = no retry) |
| `backoff` | enum | (none) | `exponential`, `constant`, `linear` |
| `initialDelay` | duration | (none) | First retry delay |
| `maxInterval` | duration | (none) | Cap on backoff delay |
| `jitter` | bool | false | Add random jitter to delays |
| `maxDuration` | duration | (none) | Total wall-clock budget; exhausted = same as `maxAttempts` reached |
| `thenRoute` | []string | (none) | Where to route when exhausted; omit for x-sink |
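How `backoff`, `initialDelay`, `maxInterval`, and `jitter` might combine can be sketched as follows. The growth formulas are assumptions, since this guide does not spell them out: exponential is taken to double on each retry, linear to grow by `initialDelay` per retry, and jitter to scale the delay by a random factor in [0, 1). The function name `backoff_delay` and its parameters are hypothetical.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    retry_number: int,
    strategy: str,
    initial_delay: float,
    max_interval: Optional[float] = None,
    jitter: bool = False,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay in seconds before retry `retry_number` (1 = first retry)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if strategy == "exponential":
        delay = initial_delay * 2 ** (retry_number - 1)   # assumed: doubling
    elif strategy == "linear":
        delay = initial_delay * retry_number              # assumed: +initial_delay per retry
    elif strategy == "constant":
        delay = initial_delay
    else:
        raise ValueError(f"unknown backoff strategy: {strategy}")
    if max_interval is not None:
        delay = min(delay, max_interval)                  # maxInterval caps the delay
    if jitter:
        delay *= rng()                                    # assumed: scale by a factor in [0, 1)
    return delay
```

Under these assumptions, `initialDelay: 1s` with `maxInterval: 60s` gives delays of 1, 2, 4, 8, 16, and 32 seconds, then stays at 60 seconds from the seventh retry onward.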
Error type matching#
errors entries support two forms:
- Short name (no dot): matches any error where `type.__name__` equals the key. `"ConnectionError"` matches `requests.ConnectionError`, `urllib3.ConnectionError`, etc.
- FQN (contains dot): matches exact `type.__module__ + "." + type.__name__`. `"openai.RateLimitError"` only matches `openai.RateLimitError`.
If the exact class doesn't match, ancestors are checked in MRO order.
"Exception" matches all Python exceptions.
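The two key forms and the MRO fallback can be illustrated with a small Python check. This is a sketch of the matching rule as described above, written against live exception objects. The sidecar may work from the serialized `mro` list instead, and `error_matches` is a hypothetical name.

```python
def error_matches(exc: BaseException, key: str) -> bool:
    """Return True if `key` matches the exception's class or any ancestor.

    Keys containing a dot are compared against "<module>.<name>";
    keys without a dot are compared against the bare class name.
    """
    use_fqn = "." in key
    for cls in type(exc).__mro__:          # exact class first, then ancestors
        if use_fqn:
            candidate = f"{cls.__module__}.{cls.__name__}"
        else:
            candidate = cls.__name__
        if candidate == key:
            return True
    return False
```

For example, `error_matches(ConnectionResetError(), "ConnectionError")` is true because `ConnectionError` is an ancestor of `ConnectionResetError`, while `error_matches(ValueError(), "openai.RateLimitError")` is false.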
Rules evaluation order#
Rules are evaluated top-to-bottom; first match wins. When multiple sources contribute rules, priority is:
- Compiler-generated rules (from `try/except` in flows): prepended
- Actor inline rules: defined in the manifest
- Flavor-contributed rules: appended in `spec.flavors` order
policies.default is the fallback when no rule matches.
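The ordering above amounts to concatenating three lists and scanning the result once. The sketch below assumes rules are plain dicts with `errors` and `policy` keys, and it simplifies matching to an exact name lookup (no MRO walk). `merge_rules` and `resolve_policy` are hypothetical helper names.

```python
from typing import Dict, List

Rule = Dict[str, object]  # {"errors": [...], "policy": "<policy name>"}


def merge_rules(
    compiler_rules: List[Rule],
    inline_rules: List[Rule],
    flavor_rules: Dict[str, List[Rule]],
    flavor_order: List[str],
) -> List[Rule]:
    """Build the effective rule list: compiler, then inline, then flavors."""
    merged = list(compiler_rules)            # prepended: highest priority
    merged.extend(inline_rules)              # rules written in the manifest
    for flavor in flavor_order:              # appended in spec.flavors order
        merged.extend(flavor_rules.get(flavor, []))
    return merged


def resolve_policy(error_name: str, rules: List[Rule]) -> str:
    """First matching rule wins; fall back to the 'default' policy."""
    for rule in rules:
        if error_name in rule["errors"]:
            return str(rule["policy"])
    return "default"
```

If both a compiler-generated rule and an inline rule list `ValueError`, the compiler-generated policy is selected because it appears first in the merged list.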
Recipes#
Retry with exponential backoff#
resiliency:
  policies:
    default:
      maxAttempts: 5
      backoff: exponential
      initialDelay: 2s
      maxInterval: 60s
      jitter: true
Non-retryable errors (fail fast)#
Route specific error types directly to x-sink on first failure:
resiliency:
  policies:
    default:
      maxAttempts: 3
      backoff: exponential
      initialDelay: 1s
    hard-fail:
      maxAttempts: 1
  rules:
    - errors: ["ValueError", "TypeError"]
      policy: hard-fail
Route exhausted envelopes to a recovery actor#
When a policy exhausts, forward the envelope to a recovery actor instead of x-sink:
resiliency:
  policies:
    default:
      maxAttempts: 3
      backoff: constant
      initialDelay: 5s
      thenRoute: ["recovery-actor"]
The recovery actor receives the envelope with status.phase=failed
and status.error containing the error details.
Cap total retry time#
Stop retrying after 10 minutes regardless of attempt count:
resiliency:
  policies:
    default:
      maxAttempts: 100
      backoff: exponential
      initialDelay: 1s
      maxInterval: 30s
      maxDuration: 10m
Whitelist mode (retry only specific errors)#
Fail fast on everything except explicitly listed types:
resiliency:
  policies:
    default:
      maxAttempts: 1    # unmatched errors → fail fast
    retry-transient:
      maxAttempts: 3
      backoff: exponential
      initialDelay: 1s
  rules:
    - errors: ["ConnectionError", "TimeoutError"]
      policy: retry-transient
Reusable resiliency via flavors#
Define resiliency rules once as a flavor, apply to many actors:
# EnvironmentConfig (flavor)
apiVersion: apiextensions.crossplane.io/v1beta1
kind: EnvironmentConfig
metadata:
  name: openai-resiliency
  labels:
    asya.sh/flavor: openai-resiliency
data:
  resiliency:
    policies:
      retry-rate-limit:
        maxAttempts: 5
        backoff: exponential
        initialDelay: 10s
        maxDuration: 1800s
      alert-auth:
        thenRoute: ["alert-devops"]
    rules:
      - errors: ["openai.RateLimitError"]
        policy: retry-rate-limit
      - errors: ["openai.AuthenticationError"]
        policy: alert-auth
Apply to any actor: spec.flavors: ["openai-resiliency"]
How the compiler uses resiliency (try/except in flows)#
When a flow author writes try/except, the compiler automatically injects
resiliency rules into the manifests of actors inside the try body. For
example:
try:
    p = validate(p)
except ValueError:
    p = handle_error(p)
The compiler stamps this into validate's manifest:
resiliency:
  policies:
    try_except_line_3_0:
      maxAttempts: 1
      thenRoute: ["router-except-line-5-except-2"]
  rules:
    - errors: ["ValueError"]
      policy: try_except_line_3_0
These compiler-generated rules are prepended to any actor-defined rules,
so flow-level try/except takes precedence for matched error types.
The actor's own policies.default still applies for unmatched errors.
Error flow through the system#
Actor pod                     Sidecar
┌──────────────┐              ┌─────────────────────────┐
│ Runtime      │    error     │ handleErrorResponse()   │
│ executes     │─────────────►│                         │
│ handler      │              │ 1. matchPolicy()        │
│              │              │    rules: first match   │
└──────────────┘              │                         │
                              │ 2. applyPolicy()        │
                              │    retry? → re-queue    │
                              │    exhausted?           │
                              │      thenRoute → send   │
                              │      no route → x-sink  │
                              └─────────────────────────┘
The sidecar sets status.error on the envelope with the error's
type, message, traceback, and mro before routing. Downstream
actors can read these via yield "GET", ".status.error".
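A downstream handler (for example a recovery actor) that reads the error details might look like the sketch below. Only the `yield "GET", ".status.error"` form and the four field names come from this guide. The handler signature, the idea that the runtime sends the value back into the generator, and the `run_with_fake_runtime` driver are assumptions made so the example can run on its own.

```python
def recovery_handler(payload: dict):
    """Hypothetical generator handler that inspects the upstream error."""
    # Ask the runtime for the error record set by the sidecar.
    error = yield "GET", ".status.error"
    payload["recovery_note"] = f"{error['type']}: {error['message']}"
    return payload


def run_with_fake_runtime(handler, payload: dict, envelope: dict):
    """Stand-in for the real runtime: answers GET requests from a dict."""
    gen = handler(payload)
    try:
        request = next(gen)
        while True:
            verb, path = request
            if verb != "GET":
                raise ValueError(f"unsupported verb: {verb}")
            value = envelope
            for key in path.strip(".").split("."):   # ".status.error" -> status, error
                value = value[key]
            request = gen.send(value)
    except StopIteration as stop:
        return stop.value                            # the handler's return value


envelope = {
    "status": {
        "phase": "failed",
        "error": {
            "type": "ValueError",
            "message": "bad input",
            "traceback": "",
            "mro": ["ValueError", "Exception", "BaseException", "object"],
        },
    }
}
result = run_with_fake_runtime(recovery_handler, {"id": 1}, envelope)
```

Here `result` carries the original payload plus a `recovery_note` built from the error's `type` and `message`.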
See also#
- Configuring Timeouts — per-actor and SLA deadlines
- Actor Flavors — reusable resiliency via flavors
- AsyncActor CRD Reference — full schema
Using error handling in flows: To write try/except in the Flow DSL, see
usage/guide-error-handling.md.