Karpenter is a high-performance Kubernetes autoscaler that launches and terminates nodes in response to real-time workload demand.
This guide walks you through integrating Karpenter with Talos Linux clusters running on AWS and managed through Omni.
Prerequisites
Before you begin, you must have:
- AWS CLI configured
- kubectl, talosctl, helm, and omnictl installed
Step 1: Create your Talos AMIs
You will need two AMIs: one for your control plane machines and one for your worker machines. To create each AMI:
- Click Download Installation Media in the Omni dashboard.
- Select the AWS AMI for your architecture.
- Add a Machine User Label that identifies the machine’s role. For example, for a control plane machine: role:karpenter-controlplane-machine.
- Click Download to generate the AMI file.
- Upload the AMI file to your AWS account. Follow the instructions in the Create Your Own AMIs guide.
Repeat the process to create the worker AMI, but use a worker-specific label (e.g. role:karpenter-worker-machine).
Note: The AMI’s final name in AWS will match the filename you upload to S3. You can use naming conventions to clearly distinguish between your control plane and worker AMIs.
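For example, you can confirm that both AMIs were registered under the expected names by listing the images owned by your account (the talos-* filter below is only an assumption about your chosen naming convention):
aws ec2 describe-images \
--owners self \
--filters "Name=name,Values=talos-*" \
--query 'Images[].{Name:Name,ImageId:ImageId}' \
--output table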
Step 2: Create your machines using the AMIs
Use the control plane AMI to create your control plane machines.
Using an odd number of control plane nodes helps etcd maintain quorum reliably. For high availability, we recommend creating three control plane nodes.
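If you launch these instances with the AWS CLI, a minimal sketch looks like this (the instance type, subnet, and security group are placeholders, not recommendations; you can just as well launch the instances from the EC2 console):
aws ec2 run-instances \
--image-id <control-plane-ami-id> \
--count 3 \
--instance-type m5.large \
--subnet-id <subnet-id> \
--security-group-ids <security-group-id> \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=karpenter-controlplane}]'
Once the instances boot, they connect to your Omni instance (the connection details are embedded in the installation media) and appear in the Machines list with the label you set in Step 1.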
Step 3: Create Machine Classes
Machine Classes group machines by labels and act as the pool description for clusters.
You’ll create two Machine Classes:
- one for control plane machines
- one for worker machines
3.1: Control plane Machine Class
To create a machine class for your control plane machines:
- Create a file named controlplane-machine-class.yaml:
metadata:
  namespace: default
  type: MachineClasses.omni.sidero.dev
  id: karpenter-controlplane
spec:
  matchlabels:
    # Match the label you applied to your control plane AMI
    - role = karpenter-controlplane-machine
This creates a karpenter-controlplane Machine Class that matches machines using the control plane label you added to the AMI (e.g., role:karpenter-controlplane-machine).
- Apply it:
omnictl apply -f controlplane-machine-class.yaml
- Verify it exists:
omnictl get machineclasses
3.2: Worker Machine Class
Repeat the process for the worker machines:
- Create a file named worker-machine-class.yaml:
metadata:
  namespace: default
  type: MachineClasses.omni.sidero.dev
  id: karpenter-worker
spec:
  matchlabels:
    # Match the label you applied to your worker AMI
    - role = karpenter-worker-machine
- Apply it:
omnictl apply -f worker-machine-class.yaml
- Verify:
omnictl get machineclasses
Step 4: Create a cluster
Next, create a cluster using cluster templates:
- Create a file named cluster-template.yaml and paste the following YAML, updating the placeholders as needed:
# cluster-template.yaml
kind: Cluster
name: <cluster-name> # Replace with your cluster name
kubernetes:
  version: v1.34.1 # Replace with your preferred Kubernetes version
talos:
  version: v1.11.3 # Replace with your preferred Talos Linux version
---
kind: ControlPlane
machineClass:
  name: <name of your control plane machine class>
  size: <number of control plane machines>
---
kind: Workers
machineClass:
  name: <name of your worker machine class>
  size: Unlimited
This template creates a cluster and assigns machines to the appropriate Machine Classes.
Replace the following placeholders:
- <cluster-name> — the name you want to give your cluster.
- <name of your control plane machine class> — the Machine Class for your control plane nodes.
- <number of control plane machines> — how many control plane nodes your cluster should have.
- <name of your worker machine class> — the Machine Class for your worker nodes.
- Apply the cluster-template.yaml to create the cluster:
omnictl cluster template sync -f cluster-template.yaml
- Once your cluster is running, download and export its kubeconfig so you can interact with it. Replace <cluster-name> with the name of your cluster:
omnictl kubeconfig -c <cluster-name>
- Verify that all machines have joined the cluster and the nodes are Ready:
kubectl get nodes
Step 5: Define all variables
Set the environment variables you will use throughout this guide:
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export CLUSTER_NAME="<cluster-name>" # Your cluster name
export AWS_REGION="<aws-region>" # Example: eu-west-1
export TALOS_AMI_ID="<worker-ami-id>" # Example: ami-0123456789abcdef0
# Subnets where Karpenter is allowed to launch worker nodes
export SUBNET_IDS="subnet-xxxx"
# Security group that will be attached to all worker nodes launched by Karpenter
export SECURITY_GROUP_IDS="sg-xxxx"
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.8.1"
export CLUSTER_ENDPOINT="https://<your-endpoint>.omni.siderolabs.io" # Your Omni API endpoint
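If you do not have the subnet and security group IDs at hand, you can look them up with the AWS CLI (the VPC ID below is a placeholder; adjust the filters to your environment):
aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=<vpc-id>" \
--query 'Subnets[].{Id:SubnetId,AZ:AvailabilityZone,Cidr:CidrBlock}' \
--output table
aws ec2 describe-security-groups \
--filters "Name=vpc-id,Values=<vpc-id>" \
--query 'SecurityGroups[].{Id:GroupId,Name:GroupName}' \
--output table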
Step 6: Tag subnets and security groups for Karpenter discovery
Karpenter can only launch nodes into subnets and security groups that you explicitly mark for discovery.
You do this by adding a karpenter.sh/discovery tag to each resource.
Run the following commands to tag your subnets and security groups:
aws ec2 create-tags \
--resources $SUBNET_IDS \
--tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
aws ec2 create-tags \
--resources $SECURITY_GROUP_IDS \
--tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
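Optionally, confirm that the tags were applied:
aws ec2 describe-tags \
--filters "Name=key,Values=karpenter.sh/discovery" "Name=value,Values=$CLUSTER_NAME" \
--query 'Tags[].{Resource:ResourceId,Type:ResourceType}' \
--output table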
Step 7: Fix IAM permissions on the Karpenter node
Karpenter uses the IAM role of the node where the Karpenter controller pod runs.
If the node running the Karpenter pod has no IAM instance profile (which is common when using Omni), Karpenter will fail with AccessDenied errors.
This step ensures your Karpenter nodes have the correct AWS permissions.
7.1: Identify the Karpenter node
First, retrieve your control plane nodes and their IP addresses:
kubectl get nodes -o wide
Pick the node where Karpenter will run, then set its name and IP address as environment variables:
export KARPENTER_NODE_NAME="<replace-with-node-name>"
export KARPENTER_NODE_IP="<replace-with-node-ip>"
Next, find the EC2 instance ID that corresponds to that IP:
export KARPENTER_INSTANCE_ID=$(aws ec2 describe-instances \
--filters "Name=private-ip-address,Values=$KARPENTER_NODE_IP" \
--query 'Reservations[0].Instances[0].InstanceId' \
--output text)
7.2: Create an instance profile and IAM role
Create a new instance profile and IAM role for the Karpenter node:
aws iam create-instance-profile \
--instance-profile-name talos-karpenter-profile
aws iam create-role \
--role-name talos-karpenter-role \
--assume-role-policy-document file://<(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
)
Next, attach the role to the instance profile:
aws iam add-role-to-instance-profile \
--instance-profile-name talos-karpenter-profile \
--role-name talos-karpenter-role
Then, associate the instance profile with the Karpenter node’s EC2 instance:
aws ec2 associate-iam-instance-profile \
--instance-id "$KARPENTER_INSTANCE_ID" \
--iam-instance-profile Name="talos-karpenter-profile"
Export the values for later steps:
export INSTANCE_PROFILE_NAME="talos-karpenter-profile"
export INSTANCE_ROLE_NAME="talos-karpenter-role"
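Optionally, confirm that the profile is now associated with the instance:
aws ec2 describe-iam-instance-profile-associations \
--filters "Name=instance-id,Values=$KARPENTER_INSTANCE_ID" \
--query 'IamInstanceProfileAssociations[].{State:State,Profile:IamInstanceProfile.Arn}' \
--output table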
7.3: Attach required IAM permissions
Next, create and attach an IAM policy that grants Karpenter the permissions to launch, tag, and terminate EC2 instances for this cluster. First, render the policy document from the template:
curl -fsSL https://docs.siderolabs.com/scripts/karpenter-iam.template \
| envsubst > karpenter-policy.json
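The rendered karpenter-policy.json is defined by the template, but conceptually it grants permissions along these lines (illustrative sketch only, not the authoritative document):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KarpenterController",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:RunInstances",
        "ec2:CreateFleet",
        "ec2:CreateTags",
        "ec2:TerminateInstances",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
Create the IAM policy from the rendered file and capture its ARN: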
export KARPENTER_POLICY_ARN=$(aws iam create-policy \
--policy-name talos-karpenter-controller \
--policy-document file://karpenter-policy.json \
--query 'Policy.Arn' \
--output text)
Then attach the policy to the Karpenter role:
aws iam attach-role-policy \
--role-name "$INSTANCE_ROLE_NAME" \
--policy-arn "$KARPENTER_POLICY_ARN"
Step 8: Create Karpenter interruption queue
Karpenter can respond to AWS node interruption events (such as maintenance, spot interruptions, or scheduled shutdowns).
To enable this, create a simple SQS queue that Karpenter can watch. When AWS publishes an interruption event, Karpenter drains and replaces the node before it terminates.
Note: When Karpenter (or AWS) terminates an EC2 instance, the matching node is removed from Kubernetes, but the machine is not deleted automatically in Omni. You may still see terminated machines listed in the Omni UI and need to clean them up manually.
aws sqs create-queue \
--queue-name "$CLUSTER_NAME" >/dev/null 2>&1 || true
export KARPENTER_QUEUE_NAME="$CLUSTER_NAME"
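You can verify the queue and fetch its URL (the same lookup is used again during cleanup):
aws sqs get-queue-url --queue-name "$KARPENTER_QUEUE_NAME"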
Step 9: Install Karpenter via Helm
At this point Karpenter has everything it needs to provision new machines: the cluster endpoint and the required IAM permissions. Install it with Helm:
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--namespace "$KARPENTER_NAMESPACE" \
--create-namespace \
--version "$KARPENTER_VERSION" \
--set settings.clusterName="$CLUSTER_NAME" \
--set settings.clusterEndpoint="$CLUSTER_ENDPOINT" \
--set settings.aws.defaultInstanceProfile="$INSTANCE_PROFILE_NAME" \
--set settings.aws.interruptionQueueName="$KARPENTER_QUEUE_NAME" \
--set-json tolerations='[{"key":"node-role.kubernetes.io/control-plane","operator":"Exists","effect":"NoSchedule"}]'
Add the topology spread label required by Karpenter:
kubectl label node $KARPENTER_NODE_NAME topology.kubernetes.io/zone=zone-1
Confirm that the controller is running:
kubectl -n kube-system get pods -l app.kubernetes.io/name=karpenter
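If the pod does not become Ready, the controller logs usually point to the cause (for example, missing IAM permissions):
kubectl -n "$KARPENTER_NAMESPACE" logs deployment/karpenter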
Step 10: Create an EC2NodeClass
The EC2NodeClass tells Karpenter how to launch Talos worker machines: which AMI to use, which subnets and security groups to join, and which IAM instance profile to attach.
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: talos-workers
spec:
  amiFamily: Custom
  amiSelectorTerms:
    - id: $TALOS_AMI_ID
  instanceProfile: $INSTANCE_PROFILE_NAME
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: $CLUSTER_NAME
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: $CLUSTER_NAME
  tags:
    karpenter.sh/discovery: $CLUSTER_NAME
EOF
kubectl get ec2nodeclass
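You can also inspect the resource status to confirm that Karpenter resolved the AMI, subnets, and security groups from the selector terms:
kubectl describe ec2nodeclass talos-workers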
Step 11: Create a NodePool
The NodePool defines what Karpenter is allowed to provision.
This includes limits, disruption behavior, labels, and instance-type requirements.
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: talos-default
spec:
  limits:
    cpu: "12" # replace with your limits
    memory: "24Gi" # replace with your limits
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 2m
  template:
    metadata:
      labels:
        node.kubernetes.io/role: worker
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: talos-workers
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
EOF
kubectl get nodepool
Step 12: Deploy a workload that triggers autoscaling
Now deploy a simple workload that Karpenter can scale against:
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-demo
spec:
  replicas: 0
  selector:
    matchLabels:
      app: karpenter-demo
  template:
    metadata:
      labels:
        app: karpenter-demo
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
EOF
Confirm the deployment exists:
kubectl get deploy karpenter-demo
Step 13: Trigger autoscaling
Increase the number of replicas to create pending pods and trigger Karpenter to provision new nodes:
kubectl scale deploy karpenter-demo --replicas=10
Watch the pending pods, the NodeClaims that Karpenter creates, and the new nodes as they join:
kubectl get pods -o wide
kubectl get nodeclaims
kubectl get nodes -o wide
You can also watch the new machines appear in the Omni dashboard as Karpenter provisions them and they join the cluster.
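Each new node corresponds to a NodeClaim; describing it shows the instance type Karpenter selected and the launch progress:
kubectl describe nodeclaim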
Cleanup
Delete the resources created in this guide:
- First, remove the demo deployment and the Karpenter custom resources:
kubectl scale deploy karpenter-demo --replicas=0
kubectl delete deploy karpenter-demo
kubectl delete nodepool talos-default
kubectl delete ec2nodeclass talos-workers
- Uninstall Karpenter:
helm uninstall karpenter -n "$KARPENTER_NAMESPACE"
- Delete the interruption queue:
aws sqs delete-queue \
--queue-url "$(aws sqs get-queue-url \
--queue-name "$KARPENTER_QUEUE_NAME" \
--query 'QueueUrl' \
--output text)"
- Detach and delete the IAM policy:
aws iam detach-role-policy \
--role-name "$INSTANCE_ROLE_NAME" \
--policy-arn "$KARPENTER_POLICY_ARN"
aws iam delete-policy \
--policy-arn "$KARPENTER_POLICY_ARN"
- Remove and delete the instance profile and role:
aws iam remove-role-from-instance-profile \
--instance-profile-name "$INSTANCE_PROFILE_NAME" \
--role-name "$INSTANCE_ROLE_NAME"
aws iam delete-role \
--role-name "$INSTANCE_ROLE_NAME"
aws iam delete-instance-profile \
--instance-profile-name "$INSTANCE_PROFILE_NAME"
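- Optionally, if you also want to remove the cluster itself, delete it with the same template you used to create it (this removes the cluster from Omni; the control plane EC2 instances you created manually in Step 2 still need to be terminated in AWS):
omnictl cluster template delete -f cluster-template.yaml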