Cilium can be installed either via the `cilium` CLI or using `helm`.

This documentation will outline installing Cilium CNI v1.18.0 on Talos in six different ways. Adhering to Talos principles, we'll deploy Cilium with IPAM mode set to Kubernetes, using the `cgroupv2` and `bpffs` mounts that Talos already provides.
As Talos does not allow Kubernetes workloads to load kernel modules, the `SYS_MODULE` capability needs to be dropped from Cilium's default set of values; this override can be seen in the helm/cilium CLI install commands.
Each method can install Cilium either with kube-proxy (the default) or without it: Kubernetes Without kube-proxy.
In this guide we assume that KubePrism is enabled and configured to use port 7445.
Machine config preparation
When generating the machine config for a node, set the CNI to `none`, for example using a config patch. Create a `patch.yaml` file with the following contents:
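A minimal sketch of such a patch, using the Talos machine config keys for the cluster network:

```yaml
cluster:
  network:
    cni:
      name: none # disable the built-in CNI so Cilium can take over
```

If you intend to use one of the kube-proxy-free variants below, the patch also needs to disable kube-proxy:

```yaml
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true # required for Cilium's kube-proxy replacement mode
```

The patch can then be applied when generating the machine config, for example with `talosctl gen config <cluster-name> <cluster-endpoint> --config-patch @patch.yaml`.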
Installation using Cilium CLI
Note: It is recommended to template the Cilium manifest using helm and use it as part of the Talos machine config, but if you want to install Cilium using the Cilium CLI, you can follow the steps below.

Install the Cilium CLI following the steps here.
With kube-proxy
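A sketch of the install command, applying the IPAM, cgroup, and capability overrides described above (the capability lists are Cilium's defaults with `SYS_MODULE` dropped):

```bash
cilium install \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=false \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup
```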
Without kube-proxy
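A sketch of the same command with kube-proxy replacement enabled, pointing Cilium at the KubePrism endpoint on `localhost:7445`:

```bash
cilium install \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=true \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup \
    --set k8sServiceHost=localhost \
    --set k8sServicePort=7445
```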
Note: If you plan to use gRPC and GRPCRoutes with TLS, you must enable ALPN by setting `gatewayAPI.enableAlpn=true`.
Since gRPC relies on HTTP/2, ALPN is required to negotiate HTTP/2 support between the client and server.
Installation using Helm
Refer to Installing with Helm for more information. First we'll need to add the helm repo for Cilium:
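The Cilium chart is served from `https://helm.cilium.io/`:

```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
```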
Method 1: Helm install

After applying the machine config and bootstrapping, Talos will appear to hang on phase 18/19 with the message: retrying error: node not ready. This happens because Kubernetes only marks nodes as ready once the CNI is up. As there is no CNI defined, the boot process is pending and will reboot the node to retry after 10 minutes; this is expected behavior. During this window you can install Cilium manually by running the following:
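A sketch of the install, reusing the overrides from the CLI section (for the kube-proxy-free variant, also add `--set kubeProxyReplacement=true --set k8sServiceHost=localhost --set k8sServicePort=7445`):

```bash
helm install cilium cilium/cilium \
    --version 1.18.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup
```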
Method 2: Helm manifests install

Instead of installing Cilium directly, you can first generate the manifest and then apply it:
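A sketch using `helm template` with the same overrides, applied with `kubectl`:

```bash
helm template cilium cilium/cilium \
    --version 1.18.0 \
    --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set cgroup.autoMount.enabled=false \
    --set cgroup.hostRoot=/sys/fs/cgroup \
    > cilium.yaml

kubectl apply -f cilium.yaml
```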
Method 3: Helm manifests hosted install

After generating `cilium.yaml` using `helm template`, instead of applying the manifest directly during the Talos boot window (before the reboot timeout), you can host the file somewhere and patch the machine config to apply it automatically during bootstrap. To do this, patch your machine configuration to include this config instead of the above:
Create a `patch.yaml` file with the following contents:
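A sketch of the patch; the URL is a placeholder for wherever you host the generated `cilium.yaml`:

```yaml
cluster:
  network:
    cni:
      name: custom # tell Talos to fetch the CNI manifests itself
      urls:
        - https://server.yourdomain.tld/some/path/cilium.yaml
```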
Method 4: Helm manifests inline install
A more secure option would be to include the `helm template` output manifest inside the machine configuration. The machine config should be generated with CNI set to `none`.
Create a `patch.yaml` file with the following contents:
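As in the machine config preparation section:

```yaml
cluster:
  network:
    cni:
      name: none
```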
If deploying Cilium with `kube-proxy` disabled, create the `patch.yaml` file with the following contents instead:
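The kube-proxy-free variant of the patch:

```yaml
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
```

The generated `helm template` output can then be embedded via the machine config's `inlineManifests` field. A sketch of the shape (the actual contents would be the full `cilium.yaml`):

```yaml
cluster:
  inlineManifests:
    - name: cilium
      contents: |
        # paste the full output of the `helm template` command here,
        # putting any namespace yaml at the very top (see the notes below)
        ...
```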
- Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace. As the inline manifest is processed from top to bottom, make sure to manually put the namespace yaml at the start of the inline manifest.
- Only add the Cilium inline manifest to the machine configuration of the control plane nodes.
- Make sure all control plane nodes have an identical configuration.
- If you delete any of the generated resources they will be restored whenever a control plane node reboots.
- As a safety measure, Talos only creates missing resources from inline manifests; it never deletes or updates anything.
- If you need to update a manifest, make sure to first edit all control plane machine configurations and then run `talosctl upgrade-k8s` (see the example after this list), as it will take care of updating inline manifests.
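For example, a sketch with placeholder node address and version:

```bash
talosctl --nodes <controlplane-ip> upgrade-k8s --to <kubernetes-version>
```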
Method 5: Using a job
We can utilize a job pattern to run arbitrary logic during bootstrap time. We can leverage this to our advantage and install Cilium by using an inline manifest, as shown in the example below:
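A condensed sketch of such a job; the ServiceAccount name, image, and tag are illustrative, and a complete version would also need the matching ServiceAccount and ClusterRoleBinding in the same inline manifest:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cilium-install
  namespace: kube-system
spec:
  backoffLimit: 10
  template:
    spec:
      restartPolicy: OnFailure
      hostNetwork: true                        # pod networking is unavailable until Cilium is up
      tolerations:
        - operator: Exists                     # tolerate the not-ready taints during bootstrap
      serviceAccountName: cilium-install       # assumes a ServiceAccount bound to cluster-admin
      containers:
        - name: cilium-install
          image: quay.io/cilium/cilium-cli:v0.16.16 # illustrative image/tag; pin a published cilium-cli release
          command:
            - cilium
            - install
            - --set
            - ipam.mode=kubernetes
```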
Known issues

- There are some gotchas when using Talos and Cilium on Google Cloud Platform with internal load balancers. For more details: GCP ILB support / support scope local routes to be configured
- When using the Talos `forwardKubeDNSToHost=true` option (which is enabled by default) in combination with Cilium's `bpf.masquerade=true`, there is a known issue that causes CoreDNS to not work correctly. As a workaround, configuring `forwardKubeDNSToHost=false` resolves the issue. For more details see the discussion here
Other things to know
- After installing Cilium, `cilium connectivity test` might hang and/or fail with errors similar to: Error creating: pods "client-69748f45d8-9b9jg" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (container "client" must not include "NET_RAW" in securityContext.capabilities.add). This is expected; you can work around it by adding the `pod-security.kubernetes.io/enforce=privileged` label at the namespace level (see the example after this list).
- Talos has full kernel module support for eBPF. See:
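A sketch of the workaround, assuming the connectivity test pods run in the `cilium-test` namespace (the namespace name may differ between Cilium CLI versions):

```bash
kubectl label namespace cilium-test pod-security.kubernetes.io/enforce=privileged
```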