Learn how to set up a development environment for local testing and hacking on Talos itself!
Run make help to see the available make commands.
You need Docker and buildx installed on the host.
Note: it is usually better to install an up-to-date Docker from the Docker apt repositories, e.g. by following the Ubuntu instructions.
If the buildx plugin is not available with the OS Docker packages, it can be installed as a plugin from GitHub releases.
Set up a builder with access to the host network:
Note: network=host allows the buildx builder to access the host network, so that it can push to a local container registry (see below).
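As a rough sketch of the builder setup described above, the snippet below assembles a docker buildx create call that uses the docker-container driver with network=host. The builder name local1 and the run dry-run wrapper are assumptions made for illustration only. By default the script just prints the command; set DRY_RUN=0 to execute it.

```shell
# Dry-run wrapper (illustrative helper, not part of Talos): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create and select a buildx builder that shares the host network.
# The builder name is a placeholder; pass your own as the first argument.
create_builder() {
  run docker buildx create \
    --driver docker-container \
    --driver-opt network=host \
    --name "${1:-local1}" \
    --use
}

create_builder local1
```

Running docker buildx ls afterwards should list the new builder if the command was actually executed.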
Make sure the following steps work:
make talosctl
make initramfs kernel
Note: it is also possible to force a stable image tag by using the TAG variable: make installer-base IMAGE_REGISTRY=127.0.0.1:5005 TAG=v1.0.0-alpha.1 PUSH=true.
--provisioner selects QEMU instead of the default Docker
--cidr makes the QEMU cluster use a different network than the default Docker setup (optional)
--registry-mirror uses the caching proxies set up above to speed up boot time considerably; the last one adds your local registry (the installer image was pushed to it)
--install-image is the image you built with make installer above
--controlplanes and --workers configure the cluster size; choose values that match your resources. 3 controlplanes give you an HA control plane, 1 controlplane is enough, and 2 controlplanes should never be used
--with-bootloader=false disables boot from disk (Talos will always boot from _out/vmlinuz-amd64 and _out/initramfs-amd64.xz). This speeds up the development cycle a lot: there is no need to rebuild the installer and perform an install, rebooting is enough to get new code.
Note: as the boot loader is not used, it is not necessary to rebuild the installer each time (the old image is fine), but sometimes it is needed (when configuration changes are made and the old installer does not validate the config).
talosctl cluster create derives the Talos machine configuration version from the install image tag, so early in the development cycle (when a new minor tag is not released yet) the machine config version sometimes has to be overridden with the --talos-version flag.
If the --with-bootloader=false flag is not enabled, the Talos cluster requires a Talos upgrade to pick up new changes to the code (in initramfs), so a new installer should be built.
With the --with-bootloader=false flag, Talos always boots from the initramfs in the _out/ directory, so a simple reboot is enough to pick up new code changes.
If the installation flow needs to be tested, --with-bootloader=false should not be used.
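The flags described above could be combined roughly as follows. This is only a sketch: the CIDR, the node counts, the talosctl binary location, and the installer image reference (in particular the siderolabs/installer repository name) are placeholder assumptions. The --registry-mirror flags are left out because the mirror addresses depend on your local setup. The run wrapper is an illustrative helper that prints the command unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (illustrative helper, not part of Talos): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder: path to the talosctl binary you built with "make talosctl".
TALOSCTL="${TALOSCTL:-talosctl}"

# Create a small QEMU cluster that boots straight from _out/ (no bootloader).
# $1 is the installer image reference pushed to the local registry.
# Prefix the command with sudo if your setup requires root for QEMU networking.
create_dev_cluster() {
  run "$TALOSCTL" cluster create \
    --provisioner=qemu \
    --cidr=172.20.0.0/24 \
    --install-image="$1" \
    --controlplanes=1 \
    --workers=1 \
    --with-bootloader=false
}

# Hypothetical image reference, reusing the registry and tag from the note above.
create_dev_cluster 127.0.0.1:5005/siderolabs/installer:v1.0.0-alpha.1
```

A single control plane node keeps resource usage low; switch to 3 when the HA control plane itself is what you are testing.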
Watch the console logs with tail:
Once talosctl cluster create finishes successfully, talosconfig and kubeconfig are set up automatically to point to your cluster.
Start playing with talosctl:
The same works with kubectl:
The machine configuration of a running node can be changed with talosctl edit mc --immediate; config patches can be applied via --config-patch flags, and many features have specific flags in talosctl cluster create.
Sending q to a single socket reboots a single node.
Note: this command performs an immediate reboot (as if the machine was powered down and immediately powered back up); for a normal Talos reboot use talosctl reboot.
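Rebuild initramfs with make initramfs
Reboot a node to pick up the new initramfs
Some aspects of Talos development require the bootloader to be enabled (e.g. when working on the installer itself); in that case a quick development cycle is no longer possible, and the cluster should be destroyed and recreated each time.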
The whole test suite can be run by removing the -test.short flag.
Specific tests can be run with -test.run=TestIntegration/api.ResetSuite.
make <something> WITH_RACE=1 enables the Go race detector; Talos runs slower and uses more memory, but data races are detected.
make <something> WITH_DEBUG=1 enables Go profiling and other debug features, useful for local development.
make initramfs WITH_DEBUG_SHELL=true adds bash and minimal utilities for debugging purposes. Combine it with the --with-debug-shell flag when creating the cluster to obtain shell access. This is rarely used, as in this case the bash shell runs in place of machined.
Cluster state is kept in ~/.talos/clusters.
Note: if the host machine is rebooted, QEMU instances and helper processes are not restarted.
In that case the files in the ~/.talos/clusters/<cluster-name> directory have to be cleaned up manually.
Note: the static qemu binaries which come with Ubuntu 21.10 seem to be broken.
Unit tests can be run with make unit-tests; on Ubuntu systems, some tests using loop devices will fail because Ubuntu uses low-index loop devices for snaps.
Most of the unit tests can be run standalone as well, with regular go test, or using IDE integration.
This does not work for tests that need elevated privileges (running as root) or additional binaries available only in the Talos rootfs (containerd tests).
Running tests as root can be done with the -exec flag to go test, but this is risky, as the test code has root access and can potentially make undesired changes.
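A minimal sketch of the -exec approach, assuming sudo is available on the host: go test builds the test binary as the current user and then starts it through the program given to -exec. The package path is a placeholder, and the run wrapper is an illustrative helper that only prints the command unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (illustrative helper, not part of Talos): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the tests of a single package with the test binary launched via sudo.
# $1 is the Go package path to test.
run_tests_as_root() {
  run go test -exec sudo -v "$1"
}

# Placeholder package path; replace it with the package you want to test.
run_tests_as_root ./path/to/package
```

Limiting the run to one package keeps the amount of code executed as root small.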
Build initramfs with debug enabled: make initramfs WITH_DEBUG=1.
Launch the Talos cluster with the bootloader disabled, and use go tool pprof to capture the profile and show the output in your browser:
172.20.0.2 is the address of the Talos node, and the port (:9982 here) depends on the Go application to profile:
apid
machined
trustd
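The profiling step might look like the sketch below. It assumes the debug endpoint follows the /debug/pprof/<profile> layout of Go's net/http/pprof package, which the text above does not confirm. The node address and port are the example values already mentioned, and heap is just one possible profile name. The run wrapper is an illustrative helper that prints the command unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (illustrative helper, not part of Talos): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the profile URL from node address ($1), port ($2) and profile name ($3).
# The path layout is an assumption based on Go's net/http/pprof package.
profile_url() {
  echo "http://$1:$2/debug/pprof/$3"
}

# Fetch the profile and serve the interactive pprof web UI on local port 8080.
show_profile() {
  run go tool pprof -http=:8080 "$(profile_url "$1" "$2" "$3")"
}

show_profile 172.20.0.2 9982 heap
```

The -http flag makes pprof start a local web server and open it in the browser instead of dropping into the interactive console.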
The talosctl debug air-gapped command launches two components:
--advertised-address should match the bridge IP of the Talos node.
The generated machine configuration patch looks like:
The following lines should appear in the output of the talosctl debug air-gapped command:
CONNECT discovery.talos.dev:443: the HTTP proxy is used to talk to the discovery service
http: TLS handshake error from 172.20.0.2:53512: remote error: tls: bad certificate: an expected error on the Talos side, as the self-signed cert is not written to the file yet
GET /debug.yaml: Talos successfully fetches the extra manifest
talosctl --nodes 172.20.0.2 logs auditd > audit.log
The obtained logs can be processed with audit2allow to obtain CIL code that would allow the denied event to happen, alongside an explanation of the denial.
For this we use the SELinux userspace utilities, which can be run in a container if you use a Linux system without SELinux or another OS.
Some of the useful commands are:
Do not treat the output of audit2allow as a final modification for the policy.
It is a good starting point to understand the denial, but the generated code should be reviewed and correctly reformulated once confirmed to be needed and not caused by mislabeling.
make generate generates the compiled SELinux files.
However, if you want to iterate on the policy rapidly, you might want to consider rebuilding only the policy during testing:
talos.auditd.disabled=1 audit=1 audit_backlog_limit=65535 debug=1 sysctl.kernel.printk_ratelimit=0 sysctl.kernel.printk_delay=0 sysctl.kernel.printk_ratelimit_burst=10000
The policy sources are located in internal/pkg/selinux/policy/selinux and are compiled into a binary format (e.g. 33 for the current kernel policy format version) using the secilc tool from the Talos tools bundle.
The policy is embedded into the initramfs init and loaded early in the boot process.
For understanding and modifying the policy, CIL language reference is a recommended starting point to get familiar with the language.
Object Classes and Permissions is another helpful document, listing all SELinux entities and the meaning of all the permissions.
The policy directory contains the following main subdirectories:
immutable: contains the preamble parts, mostly listing SELinux SIDs, classes, policy capabilities and roles, not expected to change frequently.
common: abstractions and common rules, which are used by the other parts of the policy or by all objects of some kind, e.g. the fs_classes classmap for enabling a group of file operations on all types of files.
services: policy files for each service. These files contain the definitions and rules that are specific to the service, like allowing access to its configuration files or communicating over sockets.
Some specific parts that are not services in Talos terms are:
selinux - selinuxfs rules protecting SELinux settings from modifications after the OS has started.
system-containerd - a containerd instance used for apid and similar services internal to Talos.
system-containers - apid, trustd, etcd and other system services, running in the system containerd instance.

fs_classes - contains file classes and their permissions, used for file operations.
  rw - all operations, except SELinux label management.
  ro - read-only operations.
netlink_classes (full) - full (except security labels) access to all netlink socket classes.
process_classes - helpers to allow a wide range of process operations.
  full - all operations, except ptrace (considered to be a rare requirement, so it should be added specifically where needed).
  signal - send any signal to the target process.

service_p - system services.
system_container_p - containerized system services.
pod_p - Kubernetes pods.
system_p - kernel, init, system services (not containerized).
any_p - any process registered with SELinux.

common_f - world-rw files, which can be accessed by any process.
protected_f - mostly files used by specific services, not accessible by other processes (except e.g. machined).
system_f - files and directories used by the system services, also generally to be specified by precise type and not typeattribute.
system_socket_f - sockets used for communication between system services, not accessible by workload processes.
device_f:
  common_device_f - devices not considered protected, like GPUs.
  protected_device_f - protected devices like TPM, watchdog timers.
any_f - any file registered with SELinux.
filesystem_f - filesystems, generally used for allowing mount operations.
service_exec_f - system service executable files.

any_f_any_p - any file or any process, the widest typeattribute.