Trigger
The triggerExpression is a boolean condition used by the OOM controller to
decide whether it should act.
If the expression evaluates to true, the OOM controller will activate and attempt to kill processes
in order to free up memory.
Pressure Stall Information (PSI) is the key parameter provided to the
expression, and it should be the primary indicator of whether OOM killing is required.
To find more information on the meaning of the PSI parameters, please read the linked page.
These are the variables provided to this expression, also listing their types:
- memory_some_avg10 - double - some memory pressure value, averaged over 10 seconds
- memory_some_avg60 - double - some memory pressure value, averaged over 60 seconds
- memory_some_avg300 - double - some memory pressure value, averaged over 300 seconds
- memory_some_total - double - some memory pressure value, absolute cumulative value
- memory_full_avg10 - double - full memory pressure value, averaged over 10 seconds
- memory_full_avg60 - double - full memory pressure value, averaged over 60 seconds
- memory_full_avg300 - double - full memory pressure value, averaged over 300 seconds
- memory_full_total - double - full memory pressure value, absolute cumulative value
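The variable names above map directly onto the fields the Linux kernel exposes in /proc/pressure/memory. As a rough illustration only (how the OOM controller itself collects these values is not described here), the following Python sketch parses text in that kernel format into a dictionary keyed by the variable names. The function name parse_memory_pressure and the sample numbers are made up for this example.

```python
def parse_memory_pressure(text):
    """Parse text in the /proc/pressure/memory format into named doubles."""
    values = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()          # kind is "some" or "full"
        for field in fields:
            name, raw = field.split("=", 1)   # e.g. name="avg10", raw="1.50"
            values[f"memory_{kind}_{name}"] = float(raw)
    return values


# Sample input with invented numbers, in the kernel's PSI text format.
sample = """\
some avg10=1.50 avg60=0.75 avg300=0.20 total=123456
full avg10=0.40 avg60=0.10 avg300=0.02 total=7890
"""

psi = parse_memory_pressure(sample)
print(psi["memory_some_avg10"], psi["memory_full_total"])
```

On a Linux machine with PSI enabled, the same function could be fed the contents of /proc/pressure/memory instead of the sample string.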
d_ prefixed variants of the aforementioned variables (such as d_memory_some_avg10) are also available.
These are doubles representing the current derivative of that value, in absolute units per second.
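To make "absolute units per second" concrete, here is a minimal Python sketch that approximates such a derivative as a finite difference between two consecutive samples. The sampling scheme and the helper names (derivative, d_variables) are assumptions made for illustration; the controller's actual computation may differ.

```python
def derivative(previous, current, elapsed_seconds):
    """Finite-difference rate of change, in units per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (current - previous) / elapsed_seconds


def d_variables(prev_sample, curr_sample, elapsed_seconds):
    """Build d_-prefixed entries for every variable in the current sample."""
    return {
        f"d_{name}": derivative(prev_sample[name], curr_sample[name], elapsed_seconds)
        for name in curr_sample
    }


# Two invented samples taken 2 seconds apart.
prev = {"memory_some_avg10": 2.0, "memory_full_avg10": 1.0}
curr = {"memory_some_avg10": 5.0, "memory_full_avg10": 0.5}

rates = d_variables(prev, curr, elapsed_seconds=2.0)
print(rates)
```

A positive value means the pressure is rising, and a negative value means it is easing.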
Additionally, a time_since_trigger variable is provided, representing the time elapsed since the previous OOM trigger
as the CEL duration type. You may use this variable to rate-limit OOM triggers so that the monitored
parameters have time to reflect the updated system state before a new trigger decision is made.
Default condition in detail
The default value for triggerExpression is:
- The full memory pressure (averaged over 10 seconds) is over 12
- Processes spend more time than a threshold waiting for the requested memory
- The derivative of the memory pressure (averaged over 10 seconds) is positive
- The system is slowing down due to memory pressure, indicated by increasing wait time
- The last OOM kill happened no less than 500 milliseconds ago
- Prevent the OOM killer from being triggered repeatedly without waiting for it to have an effect on the metrics used
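The three conditions listed above can be modelled outside of CEL to see how they interact. The Python sketch below is not the CEL expression itself. It only mirrors the described logic, assuming the conditions are combined with a logical AND. The function and constant names are chosen for this example.

```python
from datetime import timedelta

PRESSURE_THRESHOLD = 12.0                    # "full" pressure, averaged over 10 seconds
MIN_INTERVAL = timedelta(milliseconds=500)   # minimum spacing between triggers


def should_trigger(memory_full_avg10, d_memory_full_avg10, time_since_trigger):
    """Return True only when all three described conditions hold."""
    return (
        memory_full_avg10 > PRESSURE_THRESHOLD    # pressure is over the threshold
        and d_memory_full_avg10 > 0.0             # and it is still increasing
        and time_since_trigger >= MIN_INTERVAL    # and no less than 500 ms have passed
    )


print(should_trigger(15.0, 0.3, timedelta(seconds=2)))         # high, rising, not recent
print(should_trigger(15.0, -0.1, timedelta(seconds=2)))        # high but already easing
print(should_trigger(15.0, 0.3, timedelta(milliseconds=100)))  # too soon after last trigger
```

Only the first call satisfies every condition. The other two show how the derivative check and the rate limit each suppress a trigger on their own.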
Cgroup ranking expression
After the OOM killer is triggered, the controller will create a list of cgroups that can be killed to free up memory. The expression configured by the cgroupRankingExpression property is then used to compute an OOM score for each
of these cgroups.
The cgroup with the highest OOM score is the one that will be killed.
This setting enables the user to customize the priority of killing cgroups by modifying the evaluation rules
dependent on the cgroup class. These are the class constants passed to the expression alongside variables:
- Besteffort - Kubernetes pods of the BestEffort QoS class
- Burstable - Kubernetes pods of the Burstable QoS class
- Guaranteed - Kubernetes pods of the Guaranteed QoS class
- Podruntime - container runtime, usually containerd and accompanying processes
- System - Talos Linux system services, such as machined, apid and udevd

The variables are:

- memory_max - optional<uint> - if reported for the cgroup: max allowed memory usage, in bytes
- memory_current - optional<uint> - if reported for the cgroup: current memory usage, in bytes
- memory_peak - optional<uint> - if reported for the cgroup: peak registered memory usage, in bytes
- path - string - absolute path to the cgroup being evaluated
- class - int - one of the aforementioned cgroup classes, should be matched against those constants
Default formula in detail
- If there is a maximum value defined, return 0. Such cgroups contain processes with well-defined resource demands and are the least likely to be killed by the OOM handler (cgroups with a score of 0 are the last to be killed)
- Prioritize BestEffort pods over Burstable, and ignore other classes
- A map is used here to look up a coefficient depending on the cgroup class
orValue is a method of the optional type that unwraps the option, falling back to a default value when the value is not available
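The ranking rules described above can be approximated in Python to show how a victim cgroup would be chosen. This is a sketch under several assumptions, not the CEL formula itself: the integer values of the class constants, the coefficient values, the weighting of the coefficient by current memory usage, the cgroup paths, and all helper names are hypothetical. Only the overall shape follows the description: a score of 0 when a maximum is defined, a map lookup by class, and an orValue-style fallback.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical integer values for the class constants.
BESTEFFORT, BURSTABLE, GUARANTEED, PODRUNTIME, SYSTEM = range(5)

# Illustrative coefficients: BestEffort ranks above Burstable, others are ignored.
CLASS_COEFFICIENT = {
    BESTEFFORT: 1.0,
    BURSTABLE: 0.5,
    GUARANTEED: 0.0,
    PODRUNTIME: 0.0,
    SYSTEM: 0.0,
}


@dataclass
class Cgroup:
    path: str
    cgroup_class: int
    memory_current: Optional[int] = None   # bytes, None if not reported
    memory_max: Optional[int] = None       # bytes, None if not reported


def or_value(value, default):
    """Mimic orValue: unwrap an optional, falling back to a default."""
    return default if value is None else value


def oom_score(cg):
    if cg.memory_max is not None:
        return 0.0  # a defined maximum means the cgroup is killed last
    # Assumption of this sketch: weight the class coefficient by current usage.
    return CLASS_COEFFICIENT[cg.cgroup_class] * float(or_value(cg.memory_current, 0))


def pick_victim(cgroups):
    """The cgroup with the highest OOM score is the one to kill."""
    return max(cgroups, key=oom_score)


MIB = 1024 * 1024
cgroups = [
    Cgroup("/kubepods/besteffort/pod-a", BESTEFFORT, memory_current=200 * MIB),
    Cgroup("/kubepods/burstable/pod-b", BURSTABLE, memory_current=300 * MIB),
    Cgroup("/kubepods/burstable/pod-c", BURSTABLE, memory_current=900 * MIB,
           memory_max=1024 * MIB),
    Cgroup("/system/apid", SYSTEM, memory_current=50 * MIB),
]

victim = pick_victim(cgroups)
print(victim.path)
```

With these invented numbers, the BestEffort pod scores highest even though a Burstable pod uses more memory, because the Burstable coefficient halves its score and the cgroup with a defined maximum scores 0.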