This guide covers state proxy configuration at the infrastructure level — how to enable persistent state for actors, configure storage backends, and manage credentials.

Overview#

The state proxy gives actors persistent state access via standard file operations. Handlers read and write to paths like /state/checkpoints/model.pt, and the runtime transparently forwards those operations to a storage backend (S3, GCS, Redis).

From an infrastructure perspective, enabling state proxy involves:

  1. Adding stateProxy entries to the AsyncActor spec
  2. Configuring storage backend credentials (IAM roles, secrets)
  3. Choosing a connector image (S3, Redis, etc.)
  4. Setting consistency guarantees (LWW vs CAS)

The Crossplane composition renders connector sidecar containers into the actor pod based on the stateProxy configuration.

Enabling State Proxy#

State proxy is configured in the AsyncActor spec under spec.stateProxy. Each entry defines a mount with a unique name, a path in the runtime container, and a connector configuration.

Basic Example#

apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: ml-inference
  namespace: prod
spec:
  actor: ml-inference
  image: my-org/ml-inference:latest
  handler: inference.handle

  stateProxy:
  - name: weights
    mount:
      path: /state/weights
    connector:
      image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-lww:v1.0.0
      env:
      - name: STATE_BUCKET
        value: ml-model-weights
      - name: AWS_REGION
        value: us-east-1

What happens:

  1. Crossplane adds a state-sockets emptyDir volume to the pod
  2. Crossplane adds an asya-state-proxy-weights sidecar container with the specified image and env vars
  3. Crossplane sets ASYA_STATE_PROXY_MOUNTS=weights:/state/weights:write=buffered on the runtime container
  4. Runtime patches Python builtins to intercept file operations on /state/weights/*
  5. Handler code can use open("/state/weights/model.pt", "rb") and the runtime forwards it to the connector over Unix socket
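Handler code then needs no storage SDK at all; a minimal sketch (the function names and payload shape are illustrative, and the directory parameter exists only so the helper can be exercised outside the pod):

```python
import os

def load_weights(filename, weights_dir="/state/weights"):
    """Read a weights file through the state proxy mount.

    Inside the pod, the runtime intercepts open() under /state/weights/
    and forwards the read to the S3 connector over the Unix socket;
    handler code sees ordinary file I/O.
    """
    with open(os.path.join(weights_dir, filename), "rb") as f:
        return f.read()

async def handler(payload):
    # Hypothetical handler: the return shape is illustrative.
    weights = load_weights("model.pt")
    return {"bytes_loaded": len(weights)}
```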

Storage Backends#

S3 / MinIO#

Three S3 connector variants are available:

| Image suffix | Write mode | Consistency | Use case |
| --- | --- | --- | --- |
| s3-buffered-lww | buffered | Last-Write-Wins | Single-writer state (checkpoints, configs) |
| s3-buffered-cas | buffered | Check-And-Set (ETag) | Multi-writer state with conflict detection |
| s3-passthrough | passthrough | Last-Write-Wins | Large files (streaming writes) |

s3-buffered-lww#

Consistency: Last-Write-Wins — no conflict detection. Writes always overwrite.

Configuration:

stateProxy:
- name: checkpoints
  mount:
    path: /state/checkpoints
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-lww:v1.0.0
    env:
    - name: STATE_BUCKET
      value: ml-checkpoints
    - name: STATE_PREFIX
      value: inference-v2/  # Optional: key prefix within bucket
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ENDPOINT_URL  # Optional: for MinIO or LocalStack
      value: http://minio.<namespace>.svc.cluster.local:9000
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi

When to use: State written by a single actor instance (model weights, checkpoints, configs).
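Under LWW a checkpoint helper needs no conditional logic; a sketch (the file name and state shape are illustrative, and the path parameter is exposed only for testing outside the pod):

```python
import json

def save_checkpoint(state, path="/state/checkpoints/latest.json"):
    # LWW semantics: this write unconditionally replaces the stored object.
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path="/state/checkpoints/latest.json", default=None):
    # A missing object surfaces as FileNotFoundError, like a local file.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```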

s3-buffered-cas#

Consistency: Check-And-Set with ETag-based conflict detection. Write fails if the object was modified since the last read.

Configuration:

stateProxy:
- name: shared-state
  mount:
    path: /state/shared
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-cas:v1.0.0
    env:
    - name: STATE_BUCKET
      value: shared-state
    - name: AWS_REGION
      value: us-east-1

When to use: State written by multiple actor replicas where conflicts must be detected (e.g., distributed locking, counters).

Error handling: Handler code receives FileExistsError on CAS conflict — application must retry or resolve the conflict.

s3-passthrough#

Consistency: Last-Write-Wins — no conflict detection.

Write mode: Streaming — each write() call sends an HTTP chunk. No buffering in memory.

Configuration:

stateProxy:
- name: large-files
  mount:
    path: /state/large
  writeMode: passthrough  # Required for passthrough connector
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-passthrough:v1.0.0
    env:
    - name: STATE_BUCKET
      value: large-files
    - name: AWS_REGION
      value: us-east-1

When to use: Writing large files (>100 MB) where buffering in memory is not feasible.

Limitations: Does not support seek() or tell() on write file handles.
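A chunked copy into a passthrough mount can be sketched as follows (the function and chunk size are illustrative, not part of the runtime API):

```python
def stream_copy(src_path, dst_path, chunk_size=8 * 1024 * 1024):
    """Copy a large file into a passthrough mount in fixed-size chunks.

    Each write() is forwarded as an HTTP chunk, so peak memory stays
    near chunk_size regardless of total file size. Note that seek()
    and tell() are not supported on the destination handle.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
```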

Redis#

Consistency: Check-And-Set with WATCH/MULTI/EXEC optimistic locking.

Configuration:

stateProxy:
- name: cache
  mount:
    path: /state/cache
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-redis-buffered-cas:v1.0.0
    env:
    - name: REDIS_URL
      value: redis://redis.<namespace>.svc.cluster.local:6379/0
    - name: STATE_PREFIX
      value: actor-cache:  # Optional: key prefix

When to use: Low-latency state access with TTL support (session data, ephemeral state).

TTL: Redis supports per-key TTL via the extended attributes (xattr) API. After writing a key, set its TTL with os.setxattr(path, "user.asya.ttl", b"3600") (value in seconds). Read the remaining TTL with os.getxattr(path, "user.asya.ttl"). See the usage guide for examples.
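A small wrapper makes the write-then-set-TTL sequence reusable; a sketch (the helper is hypothetical, and the set_xattr parameter exists only so it can be tested without an xattr-capable filesystem):

```python
import os

def write_with_ttl(path, data, ttl_seconds, set_xattr=os.setxattr):
    """Write a value to a Redis mount and attach a TTL via the xattr API.

    In the actor, set_xattr defaults to os.setxattr, which the runtime
    intercepts and forwards to the Redis connector; the TTL value is
    encoded as a decimal number of seconds.
    """
    with open(path, "wb") as f:
        f.write(data)
    set_xattr(path, "user.asya.ttl", str(int(ttl_seconds)).encode())
```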

GCS (Google Cloud Storage)#

Two GCS connector variants are available:

| Image suffix | Write mode | Consistency | Use case |
| --- | --- | --- | --- |
| gcs-buffered-lww | buffered | Last-Write-Wins | Single-writer state (checkpoints, configs) |
| gcs-buffered-cas | buffered | Check-And-Set (generation) | Multi-writer state with conflict detection |

gcs-buffered-lww#

Consistency: Last-Write-Wins — no conflict detection. Writes always overwrite.

Configuration:

stateProxy:
- name: context
  mount:
    path: /state/context
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-gcs-buffered-lww:v1.0.0
    env:
    - name: STATE_BUCKET
      value: actor-context
    - name: STATE_PREFIX
      value: prod/
    - name: GCS_PROJECT
      value: my-gcp-project
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi

When to use: State written by a single actor instance (model weights, checkpoints, configs).

gcs-buffered-cas#

Consistency: Check-And-Set with GCS generation-based conflict detection. On read, the object generation number is cached in memory. On write, if_generation_match is used as a precondition. If the object was modified externally since the last read, GCS returns PreconditionFailed (412), which the connector maps to FileExistsError.
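The precondition semantics can be illustrated with a toy in-memory model (this is not the connector's actual code, only the protocol it follows):

```python
class GenerationStore:
    """Toy model of GCS generation-based CAS.

    Each write bumps the object's generation; a write whose
    if_generation_match does not equal the current generation fails,
    which the connector surfaces to handlers as FileExistsError.
    """

    def __init__(self):
        self._objects = {}  # key -> (generation, data); generation 0 = absent

    def read(self, key):
        return self._objects.get(key, (0, None))

    def write(self, key, data, if_generation_match):
        current_gen, _ = self._objects.get(key, (0, None))
        if current_gen != if_generation_match:
            # GCS returns 412 PreconditionFailed here.
            raise FileExistsError(f"generation mismatch on {key}")
        self._objects[key] = (current_gen + 1, data)
```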

Configuration:

stateProxy:
- name: shared-state
  mount:
    path: /state/shared
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-gcs-buffered-cas:v1.0.0
    env:
    - name: STATE_BUCKET
      value: shared-state
    - name: GCS_PROJECT
      value: my-gcp-project

When to use: State written by multiple actor replicas where conflicts must be detected.

Error handling: Handler code receives FileExistsError on CAS conflict. The sidecar handles retries via message requeue with exponential backoff.

Credential strategy: In production on GKE, use Workload Identity (no static keys). The google-cloud-storage SDK auto-discovers credentials via Application Default Credentials (ADC). For local development and testing, set GOOGLE_APPLICATION_CREDENTIALS or STORAGE_EMULATOR_HOST.

NATS KV#

Not yet implemented. Planned connector: nats-kv-buffered-cas.

NATS JetStream KV provides distributed key-value storage with strong consistency via Raft consensus and revision-based CAS support. See the NATS KV connector reference for planned configuration.

IAM and Credentials#

State proxy connectors use the same credential mechanisms as actors: IRSA (IAM Roles for Service Accounts) or Kubernetes Secrets.

IRSA (IAM Roles for Service Accounts)#

IRSA injects AWS credentials into pods via the ServiceAccount. No secrets to manage.

Setup:

  1. Create an IAM role with S3 permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::ml-checkpoints/*",
        "arn:aws:s3:::ml-checkpoints"
      ]
    }
  ]
}
  2. Configure the trust relationship to allow the ServiceAccount:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:prod:asya-actors"
        }
      }
    }
  ]
}
  3. Enable IRSA in the Crossplane chart:
# deploy/helm-charts/asya-crossplane/values.yaml
irsa:
  enabled: true
  serviceAccountName: asya-actors
  roleArn: arn:aws:iam::ACCOUNT_ID:role/asya-actors-prod
  4. The Crossplane composition automatically injects the ServiceAccount annotation:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: asya-actors
  namespace: prod
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/asya-actors-prod

Result: All connector containers in the actor pod inherit AWS credentials from the ServiceAccount.

Kubernetes Secrets#

For Redis or non-AWS backends, use Kubernetes Secrets.

Example — Redis password:

  1. Create a secret:
kubectl create secret generic redis-creds \
  --from-literal=password=my-redis-password \
  -n prod
  2. Reference the secret in the connector env:
stateProxy:
- name: cache
  mount:
    path: /state/cache
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-redis-buffered-cas:v1.0.0
    env:
    # REDIS_PASSWORD must be defined before REDIS_URL: Kubernetes only
    # expands $(VAR) references to variables defined earlier in the list.
    - name: REDIS_PASSWORD
      valueFrom:
        secretKeyRef:
          name: redis-creds
          key: password
    - name: REDIS_URL
      value: redis://:$(REDIS_PASSWORD)@redis.<namespace>.svc.cluster.local:6379/0

Alternative — use secretRefs at the actor level to inject secrets:

spec:
  secretRefs:
  - secretName: redis-creds
    keys:
    - key: password
      envVar: REDIS_PASSWORD

  stateProxy:
  - name: cache
    connector:
      env:
      - name: REDIS_URL
        value: redis://:$(REDIS_PASSWORD)@redis.<namespace>.svc.cluster.local:6379/0

Key Patterns and Namespace Isolation#

State proxy connectors use flat key-value storage. Paths like /state/checkpoints/v2/model.pt are stored as the key v2/model.pt (relative to the mount).

Key prefix#

Use STATE_PREFIX to isolate keys within a shared bucket:

stateProxy:
- name: checkpoints
  connector:
    env:
    - name: STATE_BUCKET
      value: shared-ml-state
    - name: STATE_PREFIX
      value: team-nlp/inference-v2/  # Keys scoped to this prefix

Example:

  • Handler writes to /state/checkpoints/model.pt
  • Connector stores to S3 key: team-nlp/inference-v2/model.pt
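The mapping above can be sketched as a pure function (the helper is illustrative, using the mount path and prefix from the example):

```python
import posixpath

def storage_key(file_path, mount_path, prefix=""):
    # The mount-relative portion of the path, prefixed with STATE_PREFIX,
    # becomes the object key in the bucket.
    return prefix + posixpath.relpath(file_path, mount_path)
```

For example, `storage_key("/state/checkpoints/model.pt", "/state/checkpoints", "team-nlp/inference-v2/")` yields `team-nlp/inference-v2/model.pt`.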

Namespace isolation#

To isolate state by Kubernetes namespace, use a namespace-specific prefix:

stateProxy:
- name: checkpoints
  connector:
    env:
    - name: STATE_BUCKET
      value: ml-state
    - name: STATE_PREFIX
      value: $(NAMESPACE)/  # Requires NAMESPACE env var

Inject the namespace via downward API:

spec:
  env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace

Result: Actors in namespace prod write to prod/model.pt, actors in namespace dev write to dev/model.pt.

Actor-level isolation#

To isolate state per actor, include the actor name in the prefix:

stateProxy:
- name: checkpoints
  connector:
    env:
    - name: STATE_PREFIX
      value: $(ACTOR)/

Inject the actor name via label:

spec:
  env:
  - name: ACTOR
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['asya.sh/actor']

Consistency Guarantees#

| Connector | Consistency model | Conflict behavior |
| --- | --- | --- |
| s3-buffered-lww | Last-Write-Wins | Overwrites silently |
| s3-passthrough | Last-Write-Wins | Overwrites silently |
| s3-buffered-cas | Check-And-Set (ETag) | Raises FileExistsError |
| gcs-buffered-lww | Last-Write-Wins | Overwrites silently |
| gcs-buffered-cas | Check-And-Set (generation) | Raises FileExistsError |
| redis-buffered-cas | Check-And-Set (WATCH/EXEC) | Raises FileExistsError |

Last-Write-Wins (LWW)#

Semantics: Writes always succeed. No conflict detection.

Use case: Single-writer state or state where the latest write is always correct (e.g., checkpoints, configs).

Example:

Actor A writes "version 1" to /state/cache/result.json
Actor B writes "version 2" to /state/cache/result.json
Result: "version 2" (last write wins)

Check-And-Set (CAS)#

Semantics: Write fails if the object was modified since the last read. Handler code must catch FileExistsError and retry.

Use case: Multi-writer state where conflicts must be detected (e.g., distributed counters, leader election).

Example:

import json

async def handler(payload):
    # Read-modify-write with CAS
    try:
        with open("/state/shared/counter.json") as f:
            counter = json.load(f)
    except FileNotFoundError:
        counter = {"value": 0}

    counter["value"] += 1

    # Write — connector uses cached revision for conditional write.
    # On conflict, raises FileExistsError; sidecar requeues the message.
    with open("/state/shared/counter.json", "w") as f:
        json.dump(counter, f)

    return payload

CAS retries are handled by the sidecar: on a CAS conflict (FileExistsError), the sidecar requeues the message with exponential backoff, and the handler runs again from scratch with a fresh read() that sees the latest value. Explicit retry loops in handler code are not needed.
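If you prefer to resolve conflicts in the handler rather than rely on requeue, an explicit loop is possible; a sketch (the helper and attempt limit are illustrative, not part of the runtime):

```python
import json

def read_modify_write(path, update, max_attempts=3):
    """Optional in-handler CAS retry loop.

    The sidecar already requeues the message on FileExistsError, so a
    loop like this is only useful when requeue latency is unacceptable.
    """
    for _ in range(max_attempts):
        try:
            with open(path) as f:
                state = json.load(f)
        except FileNotFoundError:
            state = {}
        state = update(state)
        try:
            with open(path, "w") as f:
                json.dump(state, f)
            return state
        except FileExistsError:
            continue  # another writer won; re-read the fresh value and retry
    raise FileExistsError(f"CAS conflict persisted after {max_attempts} attempts: {path}")
```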

CAS granularity:

  • S3 CAS: ETag is checked per object. Reading model.pt and writing config.json does not cause a conflict.
  • GCS CAS: Generation number is checked per blob. Same per-object granularity as S3.
  • Redis CAS: WATCH is set per key. Same granularity as S3/GCS CAS.

Multiple Mounts#

Actors can have multiple stateProxy entries. Each entry becomes a separate connector sidecar and a separate mount path in the runtime container.

Example:

stateProxy:
- name: weights
  mount:
    path: /state/weights
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-lww:v1.0.0
    env:
    - name: STATE_BUCKET
      value: ml-weights

- name: cache
  mount:
    path: /state/cache
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-redis-buffered-cas:v1.0.0
    env:
    - name: REDIS_URL
      value: redis://redis.<namespace>.svc.cluster.local:6379/0

- name: checkpoints
  mount:
    path: /state/checkpoints
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-lww:v1.0.0
    env:
    - name: STATE_BUCKET
      value: ml-checkpoints

Result:

  • Handler writes to /state/weights/model.pt → S3 bucket ml-weights
  • Handler writes to /state/cache/result.json → Redis key result.json
  • Handler writes to /state/checkpoints/epoch-10.pt → S3 bucket ml-checkpoints

Pod layout:

Pod
├── asya-runtime                 (runtime container)
│   ├── /var/run/asya/state/    ← shared volume
│   ├── /state/weights/         ← logical mount (no real FS)
│   ├── /state/cache/           ← logical mount
│   └── /state/checkpoints/     ← logical mount
├── asya-state-proxy-weights    (connector sidecar)
│   └── /var/run/asya/state/weights.sock
├── asya-state-proxy-cache      (connector sidecar)
│   └── /var/run/asya/state/cache.sock
└── asya-state-proxy-checkpoints (connector sidecar)
    └── /var/run/asya/state/checkpoints.sock

Resource Limits#

Each connector sidecar can have its own resource requests and limits.

Example:

stateProxy:
- name: checkpoints
  connector:
    image: ghcr.io/deliveryhero/asya-state-proxy-s3-buffered-lww:v1.0.0
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi

Tuning:

  • CPU: Connectors are I/O-bound; 50m is usually sufficient.
  • Memory: Buffered connectors hold data in memory before flushing. Set limits based on expected file sizes:
      • Small files (<1 MB): 64 Mi
      • Medium files (1-10 MB): 128 Mi
      • Large files (10-100 MB): 256 Mi or more
  • Passthrough connectors: Use minimal memory (no buffering); 64 Mi is sufficient.

Connector Environment Variables#

| Variable | Connectors | Required | Description |
| --- | --- | --- | --- |
| CONNECTOR_SOCKET | all | set automatically | Unix socket path (set by Crossplane, do not override) |
| STATE_BUCKET | s3-*, gcs-* | yes | S3 or GCS bucket name |
| STATE_PREFIX | s3-*, gcs-*, redis-* | no | Key prefix within bucket or namespace |
| AWS_REGION | s3-* | no | AWS region (default: us-east-1) |
| AWS_ENDPOINT_URL | s3-* | no | Custom endpoint for MinIO/LocalStack |
| GCS_PROJECT | gcs-* | no | GCP project ID (auto-detected from credentials) |
| STORAGE_EMULATOR_HOST | gcs-* | no | Override for fake-gcs-server in testing |
| STATE_PRESIGN_TTL | s3-*, gcs-* | no | Presigned/signed URL expiration in seconds (default: 3600) |
| REDIS_URL | redis-* | yes | Redis connection URL (e.g., redis://localhost:6379/0) |

Note: CONNECTOR_SOCKET is set by the Crossplane composition to /var/run/asya/state/{name}.sock and should never be overridden.

Debugging State Proxy#

Check connector logs#

kubectl logs -n prod ml-inference-abc123 -c asya-state-proxy-weights

Verify socket exists#

kubectl exec -n prod ml-inference-abc123 -c asya-runtime -- ls -lh /var/run/asya/state/

Expected output:

srw-rw-rw- 1 root root 0 Jan 1 12:00 weights.sock
srw-rw-rw- 1 root root 0 Jan 1 12:00 cache.sock

Test connector directly#

# From the runtime container
kubectl exec -n prod ml-inference-abc123 -c asya-runtime -- \
  curl --unix-socket /var/run/asya/state/weights.sock http://localhost/healthz

Expected output:

{"status": "ready"}

Check runtime environment#

kubectl exec -n prod ml-inference-abc123 -c asya-runtime -- env | grep ASYA_STATE_PROXY_MOUNTS

Expected output:

ASYA_STATE_PROXY_MOUNTS=weights:/state/weights:write=buffered;cache:/state/cache:write=buffered

Best Practices#

  1. Use IRSA for S3 — avoid managing AWS credentials in secrets; use IAM roles with IRSA.

  2. Set STATE_PREFIX per-namespace or per-actor — isolate state to prevent accidental overwrites across tenants.

  3. Use CAS for multi-writer state — if multiple replicas write to the same key, use s3-buffered-cas or redis-buffered-cas to detect conflicts.

  4. Use passthrough for large files — if writing files >100 MB, use s3-passthrough to avoid memory pressure.

  5. Set resource limits on connectors — prevent runaway memory usage; tune based on expected file sizes.

  6. Monitor connector errors — check connector logs for HTTP errors (403 Forbidden, 404 Not Found, 409 Conflict).


Using state proxy: To read/write state in your actor handlers, see usage/guide-state-proxy.md.