Sidecar-Runtime#
Communication between Asya sidecar (Go) and runtime (Python) uses HTTP/1.1 over a Unix domain socket.
Transport#
- Socket path: `/var/run/asya/asya-runtime.sock` (default; override with `ASYA_SOCKET_DIR` + `ASYA_SOCKET_NAME` for testing)
- Protocol: HTTP/1.1 — standard `net/http` client (Go) and `http.server.HTTPServer` (Python)
- One connection per message — no persistent pooling; clean state between requests
Startup Readiness#
The runtime uses late binding: the HTTP server starts only after `_load_function()` completes. The startup sequence is:
- Runtime loads and validates the user handler (may take seconds for model loading)
- HTTP server binds the Unix socket and starts listening
- Ready-file `runtime-ready` is written to `SOCKET_DIR`
- Sidecar polls the ready-file (500 ms interval), then verifies the socket connection
The sidecar never sees a listening socket before the handler is fully loaded, so there is no race condition at startup.
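The startup order above can be sketched in Python. This is an illustrative sketch, not the actual asya-runtime code: the temp directory, `load_function`, and server class are stand-ins for the real implementation.

```python
import http.server
import os
import socketserver
import tempfile

# Stand-in paths; the real runtime derives these from ASYA_SOCKET_DIR/NAME.
SOCKET_DIR = tempfile.mkdtemp()
SOCKET_PATH = os.path.join(SOCKET_DIR, "asya-runtime.sock")
READY_FILE = os.path.join(SOCKET_DIR, "runtime-ready")

def load_function():
    # Step 1: load and validate the user handler first (may take seconds).
    return lambda envelope: envelope

handler = load_function()

# Step 2: only after loading succeeds, bind the Unix socket and listen.
class UnixHTTPServer(socketserver.UnixStreamServer):
    pass

server = UnixHTTPServer(SOCKET_PATH, http.server.BaseHTTPRequestHandler)

# Step 3: finally, write the ready-file the sidecar polls for.
with open(READY_FILE, "w") as f:
    f.write("ready")

print(os.path.exists(SOCKET_PATH), os.path.exists(READY_FILE))  # → True True
server.server_close()
```

Because the socket is bound only after the handler loads, a poller that waits for the ready-file and then connects cannot observe a half-initialised runtime.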
Endpoints#
POST /invoke — Process an envelope#
Request (sidecar → runtime):
POST /invoke HTTP/1.1
Content-Type: application/json
Content-Length: <n>
{
"id": "msg-123",
"route": {
"prev": [],
"curr": "step1",
"next": ["step2"]
},
"payload": {"text": "Hello"},
"headers": {"trace_id": "abc"}
}
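The envelope shape can be modeled from the example body above. This is a hedged sketch inferred from the sample request; the runtime's real types and validation rules may differ.

```python
import json
from dataclasses import dataclass, field

# Hypothetical envelope model, inferred from the example request body.
@dataclass
class Route:
    prev: list
    curr: str
    next: list

@dataclass
class Envelope:
    id: str
    route: Route
    payload: dict
    headers: dict = field(default_factory=dict)

raw = ('{"id": "msg-123",'
       ' "route": {"prev": [], "curr": "step1", "next": ["step2"]},'
       ' "payload": {"text": "Hello"},'
       ' "headers": {"trace_id": "abc"}}')
data = json.loads(raw)
env = Envelope(
    id=data["id"],
    route=Route(**data["route"]),
    payload=data["payload"],
    headers=data.get("headers", {}),
)
print(env.route.curr, env.route.next)  # → step1 ['step2']
```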
Response codes:

| HTTP Status | Meaning | Body |
|---|---|---|
| 200 OK | Handler returned one or more frames | `{"frames": [...]}` |
| 204 No Content | Handler returned `None` — abort pipeline | empty |
| 400 Bad Request | Malformed JSON or validation error | `{"error": "msg_parsing_error", "details": {...}}` |
| 500 Internal Server Error | Unhandled handler exception | `{"error": "processing_error", "details": {...}}` |
Success response (200):
{
"frames": [
{
"payload": {"text": "Hello", "processed": true},
"route": {
"prev": ["step1"],
"curr": "step2",
"next": []
},
"headers": {"trace_id": "abc"}
}
]
}
Fan-out handlers (generators) produce multiple frames in the same frames array.
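A fan-out handler can be sketched as a generator. The signature here (a payload dict in, payloads yielded out) is an assumption for illustration, not the documented handler API.

```python
# Hypothetical fan-out handler: yields one payload per word in the input.
# Each yielded payload becomes one entry in the response's "frames" array.
def split_words(payload):
    for word in payload["text"].split():
        yield {"word": word}

frames = [{"payload": p} for p in split_words({"text": "Hello world"})]
print(len(frames))  # → 2
```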
Error response (400 / 500):
{
"error": "processing_error",
"details": {
"message": "division by zero",
"type": "builtins.ZeroDivisionError",
"mro": ["builtins.ArithmeticError", "builtins.Exception"],
"traceback": "Traceback (most recent call last):\n ..."
}
}
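The error body fields map naturally onto a Python exception. A sketch of how such a body could be derived (illustrative, not the actual runtime code):

```python
import traceback

def error_details(exc: Exception) -> dict:
    """Build a details dict shaped like the documented error body."""
    cls = type(exc)
    mro = [
        f"{c.__module__}.{c.__qualname__}"
        for c in cls.__mro__[1:]              # ancestors, skipping the class itself
        if c not in (BaseException, object)   # trim the uninformative roots
    ]
    return {
        "message": str(exc),
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "mro": mro,
        "traceback": "".join(
            traceback.format_exception(cls, exc, exc.__traceback__)
        ),
    }

try:
    1 / 0
except ZeroDivisionError as e:
    details = error_details(e)

print(details["type"])  # → builtins.ZeroDivisionError
print(details["mro"])   # → ['builtins.ArithmeticError', 'builtins.Exception']
```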
GET /healthz — Kubernetes readiness probe#
Returns 200 OK once the HTTP server is listening (i.e., after handler loading completes).
GET /healthz HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{"status": "ready"}
Any unknown path returns 404 Not Found.
Error Categories#
Runtime-returned error codes (in the `"error"` field of 400/500 responses):

| Code | Cause | Sidecar action |
|---|---|---|
| `msg_parsing_error` | Malformed JSON or missing required fields | Route to x-sink (phase: failed, no retry) |
| `processing_error` | Unhandled Python exception in handler | Apply resiliency policy (retry or route to x-sink) |
Sidecar-side errors (not from runtime):

| Error | Cause | Action |
|---|---|---|
| `context.DeadlineExceeded` | Runtime exceeded `ASYA_RESILIENCY_ACTOR_TIMEOUT` | Send to x-sump (phase: failed), crash pod |
| HTTP parse error | Unexpected non-HTTP response | Route to x-sink (phase: failed) |
Timeout Strategy#
The sidecar enforces timeouts at two levels:
SLA Pre-Check (Pipeline-Level Deadline)#
Before calling the runtime, the sidecar checks status.deadline_at on the incoming envelope. If the current time is past the deadline, the envelope is routed directly to x-sink with phase=failed, reason=Timeout — the runtime is never called.
The gateway stamps status.deadline_at based on the tool's timeout_seconds configuration. This absolute deadline is never mutated as the envelope travels through actors.
Effective Timeout (Per-Call)#
For envelopes that pass the SLA pre-check, the sidecar computes an effective timeout:
effective_timeout = min(ASYA_RESILIENCY_ACTOR_TIMEOUT, remaining_SLA)
Where remaining_SLA = deadline_at - now (only if deadline_at is set).
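The pre-check and the per-call computation combine into one decision. A sketch in Python for illustration (the real sidecar is Go; `ACTOR_TIMEOUT` stands in for the parsed `ASYA_RESILIENCY_ACTOR_TIMEOUT` value):

```python
from datetime import datetime, timedelta, timezone

ACTOR_TIMEOUT = timedelta(minutes=5)  # stand-in for ASYA_RESILIENCY_ACTOR_TIMEOUT

def dispatch_timeout(deadline_at, now):
    """Return None when the SLA pre-check fails (envelope goes straight to
    x-sink); otherwise the effective per-call timeout."""
    if deadline_at is None:
        return ACTOR_TIMEOUT           # no SLA cap: actor timeout alone applies
    remaining = deadline_at - now      # remaining_SLA = deadline_at - now
    if remaining <= timedelta(0):
        return None                    # deadline passed: never call the runtime
    return min(ACTOR_TIMEOUT, remaining)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(dispatch_timeout(now + timedelta(minutes=2), now))  # → 0:02:00
print(dispatch_timeout(now - timedelta(seconds=1), now))  # → None
print(dispatch_timeout(None, now))                        # → 0:05:00
```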
Runtime Timeout Behavior#
On runtime timeout (context.DeadlineExceeded):
1. Sidecar sends the envelope to x-sink (phase: failed, reason: Timeout)
2. Sidecar crashes the pod (exits with status code 1)
3. Kubernetes restarts the pod to recover clean state
Rationale: crash-on-timeout prevents zombie processing where the runtime may still be executing after the sidecar gives up.
Debugging with curl#
Inspect the runtime directly without a sidecar:
# Invoke handler
curl --unix-socket /var/run/asya/asya-runtime.sock \
-X POST http://localhost/invoke \
-H "Content-Type: application/json" \
-d '{"id":"dbg-1","route":{"prev":[],"curr":"my-actor","next":[]},"payload":{"x":1}}'
# → 200 {"frames":[{"payload":{"x":1},"route":{"prev":["my-actor"],"curr":"","next":[]}}]}
# Check handler readiness
curl --unix-socket /var/run/asya/asya-runtime.sock http://localhost/healthz
# → 200 {"status":"ready"}
Configuration Reference#
Runtime Variables#
| Variable | Default | Description |
|---|---|---|
| `ASYA_HANDLER` | (required) | Handler path (`module.function` or `module.Class.method`) |
| `ASYA_SOCKET_CHMOD` | `0o666` | Socket file permissions (octal string) |
| `ASYA_ENABLE_VALIDATION` | `true` | Enable message validation |
| `ASYA_LOG_LEVEL` | `INFO` | Log level (DEBUG, INFO, WARNING, ERROR) |
Sidecar Variables#
| Variable | Default | Description |
|---|---|---|
| `ASYA_RESILIENCY_ACTOR_TIMEOUT` | `5m` | Per-call actor timeout (from XRD `resiliency.actorTimeout`) |
| `ASYA_ACTOR_NAME` | (required) | Actor name for queue consumption |
Best Practices#
For Handler Authors#
- Monitor processing time — return early if approaching the timeout limit; the sidecar will crash the pod on `DeadlineExceeded`, so a graceful early return is preferable.
- Use context managers for resource cleanup (file handles, HTTP clients, DB connections) so teardown happens even when exceptions occur.
- Return `None` to abort — handlers returning `None` produce a `204` response, which routes the envelope to `x-sink` without an error. Use this for intentional pipeline exits, not errors.
- Avoid global mutable state that leaks across requests; class handlers share the instance, so thread-safety matters for concurrent runtimes.
- Let exceptions propagate — the runtime catches all unhandled exceptions and returns `processing_error` with a full traceback. Wrapping everything in a bare `except` hides bugs.
- Use structured logging — log at `DEBUG` during normal processing so `ASYA_LOG_LEVEL=DEBUG` gives a full trace without changing code.
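Several of these practices can be shown in one small handler. A hedged sketch: the signature (a dict payload in, a dict or `None` out) is an assumption for illustration, not the documented handler API.

```python
# Hypothetical handler applying the practices above.
def handle(payload):
    # Return None to abort the pipeline intentionally (204, no error).
    if not payload.get("text"):
        return None
    # Context manager: cleanup runs even if processing raises.
    with open("/dev/null", "w") as log:
        log.write("processing\n")
        # Let real bugs propagate: the runtime turns unhandled exceptions
        # into a processing_error response with a full traceback.
        return {"text": payload["text"], "processed": True}

print(handle({"text": "Hello"}))  # → {'text': 'Hello', 'processed': True}
print(handle({}))                 # → None
```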
For Operators#
- Tune
ASYA_RESILIENCY_ACTOR_TIMEOUTto balance task duration against responsiveness; short timeouts cause false crashes on slow model inference. - Monitor
x-sumpqueue depth — a growing sump queue signals systematic handler errors or timeout spikes. - Size container memory for peak model/data size, not average; OOM kills look like pod crashes and are hard to distinguish from timeout crashes without metrics.
- Use
GET /healthzas the Kubernetes readiness probe target — it becomes available only after the handler is fully loaded, so the pod never receives traffic while still initialising. - Test failure modes in staging before production: inject bad payloads, simulate timeouts, and verify envelopes land in
x-sink(phase: failed) rather than disappearing silently.