AsyncActor Spec#
The AsyncActor is the only CRD you need to deploy an actor on Kubernetes. This guide walks through what each section does and how Asya uses it.
The Full Picture#
Here's the full AsyncActor spec. Required fields are uncommented, everything else is commented with defaults — uncomment what you need:
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: image-enhancer
  namespace: production
  # labels:
  #   asya.sh/flow: image-pipeline   # group actors into a flow
  #   team: ml-platform              # custom labels propagate to all child resources
spec:
  # --- Required ---
  actor: image-enhancer
  image: ghcr.io/myorg/image-enhancer:v2.3
  handler: enhancer.SDXLEnhancer.process

  # --- Flavors: inherit platform-team defaults ---
  # flavors: [gpu-inference, llm-resilient]

  # --- Image settings ---
  # imagePullPolicy: Always                  # default: IfNotPresent
  # pythonExecutable: /opt/conda/bin/python  # default: python3

  # --- Autoscaling (KEDA) ---
  # scaling:
  #   minReplicaCount: 0    # default: 0 (scale to zero)
  #   maxReplicaCount: 50   # default: 10
  #   queueLength: 1        # default: 5 (messages per replica)
  #   pollingInterval: 10   # default: 30 (seconds)
  #   cooldownPeriod: 600   # default: 300 (seconds)

  # --- Fixed replicas (instead of autoscaling) ---
  # scaling:
  #   enabled: false
  #   replicas: 3

  # --- Resiliency ---
  # resiliency:
  #   actorTimeout: 300s    # max processing time per message
  #   policies:
  #     default:
  #       maxAttempts: 3
  #       backoff: exponential   # constant | linear | exponential
  #       initialInterval: 2s
  #       maxInterval: 60s
  #   rules:
  #     - match: "RateLimitError"   # retry with default policy
  #       policy: default
  #     - match: "ContentPolicyViolation"
  #       action: fail              # skip retries, send to x-sump

  # --- Environment variables ---
  # env:
  #   - name: MODEL_PATH
  #     value: /models/sdxl-turbo
  #   - name: API_KEY
  #     valueFrom:
  #       secretKeyRef:
  #         name: api-credentials
  #         key: key

  # --- Secret injection (all keys as env vars) ---
  # secretRefs:
  #   - name: aws-creds
  #     namespace: production

  # --- Resources and GPU scheduling ---
  # resources:
  #   requests:
  #     cpu: "2"
  #     memory: 8Gi
  #     nvidia.com/gpu: "1"
  #   limits:
  #     cpu: "4"
  #     memory: 16Gi
  #     nvidia.com/gpu: "1"
  #
  # nodeSelector:
  #   cloud.google.com/gke-accelerator: nvidia-tesla-a100
  #
  # tolerations:
  #   - key: nvidia.com/gpu
  #     operator: Exists
  #     effect: NoSchedule
When you kubectl apply this, Asya creates three resources:
- Message queue (asya-production-image-enhancer) — the actor's inbox
- Deployment with sidecar + runtime containers
- KEDA ScaledObject watching the queue depth
Delete the AsyncActor and all three are cleaned up automatically.
What Asya Fills In#
You write the minimal spec. Asya's Crossplane Composition fills in everything else:
- Sidecar container (Go binary for queue polling and routing)
- Runtime container command (loads asya_runtime.py, which calls your handler)
- Unix socket volume for sidecar-runtime communication
- Environment variables (ASYA_HANDLER, ASYA_SOCKET_DIR, etc.)
- Readiness probes
- Transport configuration (SQS endpoint, RabbitMQ URL, etc.)
Your image only needs Python and your handler code. No Asya dependencies.
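Since the runtime simply imports the dotted path from the spec, the handler is plain Python. A minimal sketch of what `enhancer.SDXLEnhancer.process` might look like — the payload shape and return convention here are assumptions for illustration, not part of the spec:

```python
# enhancer.py — hypothetical handler module matching the spec above.
# Asya's runtime imports "enhancer.SDXLEnhancer.process" and invokes it
# once per message; note there are no Asya imports here.

class SDXLEnhancer:
    def process(self, payload: dict) -> dict:
        # Assumed payload shape: {"image_url": ..., "scale": ...}
        image_url = payload.get("image_url")
        scale = payload.get("scale", 2)
        # ... run the actual enhancement model here ...
        return {"image_url": image_url, "scale": scale, "status": "enhanced"}
```

The return value becomes the payload delivered to the next actor in the route.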
Scaling#
By default, actors scale to zero and back:
spec:
  scaling:
    minReplicaCount: 0    # zero pods when queue is empty
    maxReplicaCount: 20   # burst capacity
    queueLength: 5        # target: 5 messages per replica
    pollingInterval: 15   # check queue every 15 seconds
    cooldownPeriod: 300   # wait 5 min before scaling down
How it works: if the queue has 25 messages and queueLength is 5, KEDA creates
5 replicas. When the queue drains, pods scale back to 0 after the cooldown period.
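The replica target reduces to a one-line formula. A sketch mirroring KEDA's queue-length scaler (the function name is illustrative):

```python
import math

def desired_replicas(queue_depth: int, queue_length: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas KEDA targets for a given queue depth:
    ceil(depth / queueLength), clamped to [minReplicaCount, maxReplicaCount]."""
    raw = math.ceil(queue_depth / queue_length)
    return max(min_replicas, min(raw, max_replicas))

print(desired_replicas(25, 5, 0, 20))    # 25 messages / 5 per replica -> 5
print(desired_replicas(0, 5, 0, 20))     # empty queue -> scale to zero -> 0
print(desired_replicas(500, 5, 0, 20))   # capped at maxReplicaCount -> 20
```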
To disable autoscaling (fixed replicas):
spec:
  scaling:
    enabled: false
    replicas: 3
Resiliency#
Configure retries and timeouts at the infrastructure level — not in your code:
spec:
  resiliency:
    actorTimeout: 120s
    policies:
      default:
        maxAttempts: 3
        backoff: exponential
        initialInterval: 1s
        maxInterval: 60s
    rules:
      - match: "RateLimitError"
        policy: default
      - match: "ContentPolicyViolation"
        action: fail
Policies define retry behavior (how many times, what backoff strategy).
Rules match error messages to policies. Unmatched errors use the default policy.
action: fail sends the message directly to x-sump without retrying.
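The intervals a policy produces can be sketched as below. The doubling-from-initialInterval convention shown here is a common one and an assumption on my part; Asya's exact schedule may differ:

```python
def backoff_interval(attempt: int, initial: float, maximum: float,
                     strategy: str = "exponential") -> float:
    """Seconds to wait before retry `attempt` (1-based), for the
    constant | linear | exponential strategies from the spec."""
    if strategy == "constant":
        interval = initial
    elif strategy == "linear":
        interval = initial * attempt
    else:  # exponential: double the interval each attempt, capped at maximum
        interval = initial * (2 ** (attempt - 1))
    return min(interval, maximum)

# With the example policy (initialInterval: 1s, maxInterval: 60s):
print([backoff_interval(a, 1.0, 60.0) for a in range(1, 4)])  # [1.0, 2.0, 4.0]
```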
Flavors#
Platform engineers pre-configure common patterns as flavors — reusable templates that bundle scaling, resiliency, and infrastructure settings:
spec:
  actor: my-model
  image: my-model:latest
  handler: handler.process
  flavors: [gpu-inference, llm-resilient]
The actor inherits all settings from the flavors. Inline spec fields override flavor defaults. This keeps the data scientist's YAML short while the platform team controls infrastructure.
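That precedence can be pictured as a deep merge of spec fragments. A sketch — the flavor contents are hypothetical, and the assumption that merging happens field-by-field (rather than whole-subtree replacement) is mine:

```python
def merge(base: dict, override: dict) -> dict:
    """Deep-merge two spec fragments; values in `override` win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)   # recurse into nested sections
        else:
            out[key] = value                    # scalars/lists: override wins
    return out

# Hypothetical flavor defaults vs. the data scientist's inline fields:
gpu_inference = {"scaling": {"maxReplicaCount": 10},
                 "resources": {"limits": {"nvidia.com/gpu": "1"}}}
inline_spec = {"scaling": {"maxReplicaCount": 50}}

effective = merge(gpu_inference, inline_spec)
print(effective["scaling"]["maxReplicaCount"])  # inline wins -> 50
```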
See Actor Flavors guide for how to define and use flavors.
Environment Variables and Secrets#
Inject env vars directly or from Kubernetes secrets:
spec:
  env:
    - name: MODEL_PATH
      value: /models/v2
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: api-credentials
          key: key
  secretRefs:
    - name: aws-creds
      namespace: default
secretRefs injects every key of the referenced Secret as an environment variable.
Resources and Scheduling#
Standard Kubernetes resource management:
spec:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: "2"
      memory: 2Gi
      nvidia.com/gpu: "1"
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
What Happens Under the Hood#
When you apply an AsyncActor:
kubectl apply -f actor.yaml
        |
        v
Crossplane sees the AsyncActor CR
        |
        v
Composition renders:
  +-- SQS Queue (or RabbitMQ queue, Pub/Sub subscription)
  +-- Deployment
  |     +-- sidecar container (Go)
  |     +-- runtime container (your image + asya_runtime.py)
  |     +-- shared Unix socket volume
  |     +-- env vars, secrets, volumes
  +-- KEDA ScaledObject
  +-- TriggerAuthentication (queue credentials)
        |
        v
KEDA watches queue depth, scales 0 -> N
The sidecar polls the queue, delivers each envelope to your handler via Unix socket, and routes the result to the next actor in the route. Your handler never touches queues.
Further Reading#
- AsyncActor CRD Reference — full field reference
- Separation of Concerns — the two-file model
- Scale Zero to Infinity — KEDA autoscaling deep dive
- Built-in Resiliency — retry policies and error handling
- Actor Flavors — reusable configuration templates