AWS EKS#
Production deployment on Amazon EKS.
For a full map of all secrets, service accounts, and namespaces, see Credentials Reference.
Prerequisites#
- AWS CLI configured
- kubectl 1.24+
- Helm 3.0+
- eksctl (optional, for cluster creation)
- EKS cluster 1.24+
Required Components#
1. VPC and Networking#
Requirements:
- VPC with public and private subnets
- NAT gateway for private subnet internet access
- Security groups allowing pod-to-pod communication
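If you are creating the cluster from scratch, the networking requirements above can be satisfied with an eksctl ClusterConfig (eksctl provisions public and private subnets and a NAT gateway by default). The names, region, and node sizes below are placeholders, not requirements:
```yaml
# cluster.yaml -- minimal sketch; adjust name, region, and sizes to your needs
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
  version: "1.29"
vpc:
  nat:
    gateway: Single   # one NAT gateway for private-subnet internet access
managedNodeGroups:
  - name: default
    instanceType: m5.large
    minSize: 2
    maxSize: 5
    privateNetworking: true   # place nodes in private subnets
```
Apply with `eksctl create cluster -f cluster.yaml`.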
2. IAM Roles and Permissions#
EKS Pod Identity (recommended):
Crossplane AWS provider role (crossplane-provider-aws-role):
{
  "Effect": "Allow",
  "Action": [
    "sqs:CreateQueue",
    "sqs:DeleteQueue",
    "sqs:GetQueueAttributes",
    "sqs:SetQueueAttributes",
    "sqs:TagQueue",
    "sqs:GetQueueUrl"
  ],
  "Resource": "arn:aws:sqs:*:*:asya-*"
}
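Each role above also needs a trust policy that lets the EKS Pod Identity agent assume it. With Pod Identity, the trust policy is the same for every role:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
```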
Actor role (asya-actor-role): a shared IAM role for all actor sidecars. It grants access to the SQS queues and the S3 bucket used for persisting messages. The role is assigned via IRSA (or EKS Pod Identity) to a shared asya-actors ServiceAccount in each actor namespace, so no static AWS credentials are stored in the cluster:
Note: For local development with LocalStack, IRSA is unavailable. Use a static aws-creds Secret with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY instead. See Quickstart for the dev setup.
{
  "Effect": "Allow",
  "Action": [
    "sqs:ReceiveMessage",
    "sqs:SendMessage",
    "sqs:DeleteMessage",
    "sqs:ChangeMessageVisibility",
    "sqs:GetQueueAttributes"
  ],
  "Resource": "arn:aws:sqs:*:*:asya-*"
},
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::asya-results-bucket",
    "arn:aws:s3:::asya-results-bucket/*"
  ]
}
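If you manage the shared ServiceAccount yourself rather than letting the chart create it, the IRSA binding is a single annotation. A sketch for one actor namespace (account ID and namespace are placeholders):
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: asya-actors
  namespace: default   # repeat per actor namespace
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/asya-actor-role
```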
KEDA role (keda-operator-role):
{
  "Effect": "Allow",
  "Action": [
    "sqs:GetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ListQueues"
  ],
  "Resource": "arn:aws:sqs:*:*:asya-*"
}
3. EKS Addons#
# Install Pod Identity Agent
eksctl create addon --cluster my-cluster \
--name eks-pod-identity-agent
# Install VPC CNI
eksctl create addon --cluster my-cluster \
--name vpc-cni --version v1.16.2
4. KEDA Operator#
# Create namespace
kubectl create namespace keda
# Add Helm repo
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# Install KEDA
helm install keda kedacore/keda \
--namespace keda \
--version 2.15.1
Configure Pod Identity for KEDA:
aws eks create-pod-identity-association \
--cluster-name my-cluster \
--namespace keda \
--service-account keda-operator \
--role-arn arn:aws:iam::ACCOUNT:role/keda-operator-role
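Normally the operator creates the KEDA resources for you. If you ever define a ScaledObject by hand, the SQS trigger can reference this identity through a TriggerAuthentication using KEDA's aws pod-identity provider (the resource name here is illustrative):
```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: asya-sqs-auth
spec:
  podIdentity:
    provider: aws   # use the credentials of the keda-operator pod identity
```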
5. S3 Bucket for Results#
Create a sample S3 bucket for persisting result messages:
aws s3 mb s3://asya-results-bucket --region us-east-1
Optional Components#
GPU Node Group#
For AI/ML workloads:
eksctl create nodegroup \
--cluster my-cluster \
--name gpu-nodes \
--node-type g4dn.xlarge \
--nodes-min 0 \
--nodes-max 10 \
--node-ami-family AmazonLinux2 \
--node-taints nvidia.com/gpu=true:NoSchedule
Install NVIDIA Device Plugin:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml
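Pods scheduled onto this node group must tolerate the taint and request a GPU. In a plain pod spec that looks like the following (whether your AsyncActor exposes these fields directly depends on the CRD schema):
```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # request one GPU; the device plugin advertises this resource
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: "true"
    effect: NoSchedule   # matches the taint set on the GPU node group
```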
Cluster Autoscaler#
For automatic node provisioning:
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=my-cluster \
--set awsRegion=us-east-1
Metrics Server#
For resource metrics:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
CloudWatch Container Insights#
For centralized logging:
eksctl create iamserviceaccount \
--cluster my-cluster \
--namespace amazon-cloudwatch \
--name cloudwatch-agent \
--attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
--approve
# Install CloudWatch agent
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml
Asya🎭 Deployment#
1. Install Crossplane#
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm install crossplane crossplane-stable/crossplane \
--namespace crossplane-system --create-namespace
Custom namespace: If you install Crossplane in a namespace other than crossplane-system (e.g. asya-system), set crossplaneNamespace in your crossplane-values.yaml (see Step 2). The chart creates the required ClusterRoleBinding automatically. Without this, the provider cannot create Deployments in actor namespaces and AsyncActors will remain in the Creating state.
2. Configure Crossplane Values#
# crossplane-values.yaml
providers:
  aws:
    enabled: true # opt-in: disabled by default
    awsRegion: us-east-1
    awsAccountId: "123456789012" # required for KEDA SQS trigger queue URLs
    actorNamespace: default # namespace where AsyncActors will be created
    # crossplaneNamespace: asya-system # set if Crossplane is not in crossplane-system
    irsa:
      enabled: true # opt-in: disabled by default
      roleArnPattern: "arn:aws:iam::123456789012:role/asya-actors-{namespace}"
awsProviderConfig:
  name: default
  credentialsSource: Secret
  secretRef:
    namespace: crossplane-system
    name: aws-creds
    key: credentials
3. Install Asya Crossplane Chart (two-step)#
A Crossplane provider's CRDs only become available once the provider reports Healthy, so ProviderConfigs cannot be created in the same install pass. Install with providerConfigs.install=false first.
helm repo add asya https://asya.sh/charts
helm repo update asya
# Step 1: install providers, XRDs, and compositions (skip ProviderConfigs)
helm install asya-crossplane asya/asya-crossplane --version $ASYA_VERSION \
-n crossplane-system \
-f crossplane-values.yaml \
--set providerConfigs.install=false
Wait for providers to register their CRDs:
kubectl wait --for=condition=Healthy providers.pkg.crossplane.io --all --timeout=300s
Then enable ProviderConfigs:
# Step 2: enable ProviderConfigs (CRDs now exist)
helm upgrade asya-crossplane asya/asya-crossplane --version $ASYA_VERSION \
-n crossplane-system \
--reuse-values \
--set providerConfigs.install=true
4. Install Gateway (Optional)#
# gateway-values.yaml
config:
  sqsRegion: us-east-1
  s3Bucket: asya-results-bucket
  postgresHost: postgres.default.svc.cluster.local
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/asya-gateway-role
routes:
  tools:
    - name: example
      description: Example tool
      parameters:
        text:
          type: string
          required: true
      route: [example-actor]
helm install asya-gateway asya/asya-gateway --version $ASYA_VERSION \
-n default \
-f gateway-values.yaml
5. Install Crew Actors#
Crew actors are pre-defined system actors for handling common scenarios.
For example, the actors x-sink and x-sump are common flow finalizers and can persist messages to S3-compatible storage.
Suppose we want to save all messages to the bucket s3://asya-results-bucket. Note that the bucket name must be globally unique.
# crew-values.yaml
x-sink:
  enabled: true
  env:
    ASYA_PERSISTENCE_MOUNT: /state/checkpoints
x-sump:
  enabled: true
  env:
    ASYA_PERSISTENCE_MOUNT: /state/checkpoints
helm install asya-crew asya/asya-crew --version $ASYA_VERSION \
-n default \
-f crew-values.yaml
Note: The IRSA annotation can be set per-actor in the AsyncActor spec if needed.
6. Deploy Your Actors#
apiVersion: asya.sh/v1alpha1
kind: AsyncActor
metadata:
  name: my-actor
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/asya-actor-role
spec:
  scaling:
    minReplicaCount: 0
    maxReplicaCount: 50
  image: my-actor:v1
  handler: handler.process
kubectl apply -f my-actor.yaml
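For reference, handler: handler.process points at a function inside the actor image. The exact contract between the sidecar and the handler is not specified here; assuming the handler receives the message payload as a dict and returns a JSON-serializable result, a minimal handler.py could look like:

```python
# handler.py -- minimal actor handler sketch.
# Assumption: the sidecar calls process() with the message payload as a dict
# and forwards the returned dict as the result message.
def process(payload: dict) -> dict:
    text = payload.get("text", "")
    return {"echo": text.upper()}
```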
Verification#
# Check Crossplane
kubectl get pods -n crossplane-system
# Check KEDA
kubectl get pods -n keda
# Check actor
kubectl get asyncactor my-actor
kubectl get pods -l asya.sh/actor=my-actor
# Check queue created
aws sqs list-queues | grep asya-my-actor
kubectl get sqsqueue
Troubleshooting#
ProviderConfig CRD not found during install#
The chart uses a two-step install. If you see no matches for kind "ProviderConfig",
install with --set providerConfigs.install=false first, wait for providers, then
upgrade with providerConfigs.install=true. See Step 3 above.
AsyncActor stuck in Creating#
Check the Crossplane Object resource for RBAC errors:
kubectl get objects.kubernetes.crossplane.io -l crossplane.io/composite=$(
kubectl get asyncactor <name> -n <ns> -o jsonpath='{.spec.resourceRef.name}'
) -o yaml | grep -A5 "message:"
If you see "deployments" is forbidden, set crossplaneNamespace in your values to
match the namespace where Crossplane is installed. See the note under Step 1 above.
RabbitMQ: sidecar stuck in backoff#
Known issue (#384): if the RabbitMQ queue does not exist when the sidecar starts, the AMQP channel breaks on the first 404 and subsequent retries cannot recover. Restart the pod after the queue is created. Fix tracked in #372 and #384.
Cost Optimization#
- Use Spot Instances for GPU nodes
- Enable cluster autoscaler scale-to-zero
- Use KEDA scale-to-zero (minReplicaCount: 0)
- Set an appropriate queueLength for scaling efficiency
- Monitor SQS costs (the first 1M requests per month are free)
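For the scale-to-zero and queueLength points, tuning happens per actor. Assuming queueLength sits alongside the replica bounds in the AsyncActor scaling block (this field placement is an assumption, not confirmed by the CRD schema shown above), a cost-conscious configuration might be:
```yaml
scaling:
  minReplicaCount: 0   # scale to zero when the queue is empty
  maxReplicaCount: 10
  queueLength: 5       # target messages per replica before scaling out
```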