-XX:+UseParallelGC
-XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4
-XX:AdaptiveSizePolicyWeight=90.
As a cluster administrator, you can help your clusters operate efficiently through managing application memory by:
Determining the memory and risk requirements of a containerized application component and configuring the container memory parameters to suit those requirements.
Configuring containerized application runtimes (for example, OpenJDK) to adhere optimally to the configured container memory parameters.
Diagnosing and resolving memory-related error conditions associated with running in a container.
It is recommended to fully read the overview of how OpenShift Container Platform manages Compute Resources before proceeding.
For each kind of resource (memory, CPU, storage), OpenShift Container Platform allows optional request and limit values to be placed on each container in a pod.
Note the following about memory requests and memory limits:
Memory request
The memory request value, if specified, influences the OpenShift Container Platform scheduler. The scheduler considers the memory request when scheduling a container to a node, then fences off the requested memory on the chosen node for the use of the container.
If a node’s memory is exhausted, OpenShift Container Platform prioritizes evicting its containers whose memory usage most exceeds their memory request. In serious cases of memory exhaustion, the node OOM killer may select and kill a process in a container based on a similar metric.
The cluster administrator can assign quota or assign default values for the memory request value.
The cluster administrator can override the memory request values that a developer specifies, to manage cluster overcommit.
Memory limit
The memory limit value, if specified, provides a hard limit on the memory that can be allocated across all the processes in a container.
If the memory allocated by all of the processes in a container exceeds the memory limit, the node Out of Memory (OOM) killer will immediately select and kill a process in the container.
If both memory request and limit are specified, the memory limit value must be greater than or equal to the memory request.
The cluster administrator can assign quota or assign default values for the memory limit value.
The minimum memory limit is 12 MB. If a container fails to start due to a Cannot allocate memory
pod event, the memory limit is too low.
Either increase or remove the memory limit. Removing the limit allows pods to consume unbounded node resources.
The steps for sizing application memory on OpenShift Container Platform are as follows:
Determine expected container memory usage
Determine expected mean and peak container memory usage, empirically if necessary (for example, by separate load testing). Remember to consider all the processes that may potentially run in parallel in the container: for example, does the main application spawn any ancillary scripts?
Determine risk appetite
Determine risk appetite for eviction. If the risk appetite is low, the container should request memory according to the expected peak usage plus a percentage safety margin. If the risk appetite is higher, it may be more appropriate to request memory according to the expected mean usage.
Set container memory request
Set container memory request based on the above. The more accurately the request represents the application memory usage, the better. If the request is too high, cluster and quota usage will be inefficient. If the request is too low, the chances of application eviction increase.
Set container memory limit, if required
Set container memory limit, if required. Setting a limit has the effect of immediately killing a container process if the combined memory usage of all processes in the container exceeds the limit, and is therefore a mixed blessing. On the one hand, it may make unanticipated excess memory usage obvious early ("fail fast"); on the other hand it also terminates processes abruptly.
Note that some OpenShift Container Platform clusters may require a limit value to be set; some may override the request based on the limit; and some application images rely on a limit value being set as this is easier to detect than a request value.
If the memory limit is set, it should not be set to less than the expected peak container memory usage plus a percentage safety margin.
Ensure application is tuned
Ensure application is tuned with respect to configured request and limit values, if appropriate. This step is particularly relevant to applications which pool memory, such as the JVM. The rest of this page discusses this.
The default OpenJDK settings do not work well with containerized environments. As a result, some additional Java memory settings must always be provided whenever running the OpenJDK in a container.
The JVM memory layout is complex, version dependent, and describing it in detail is beyond the scope of this documentation. However, as a starting point for running OpenJDK in a container, at least the following three memory-related tasks are key:
Overriding the JVM maximum heap size.
Encouraging the JVM to release unused memory to the operating system, if appropriate.
Ensuring all JVM processes within a container are appropriately configured.
Optimally tuning JVM workloads for running in a container is beyond the scope of this documentation, and may involve setting multiple additional JVM options.
For many Java workloads, the JVM heap is the largest single consumer of memory.
Currently, the OpenJDK defaults to allowing up to 1/4 (1/-XX:MaxRAMFraction
)
of the compute node’s memory to be used for the heap, regardless of whether the
OpenJDK is running in a container or not. It is therefore essential to
override this behavior, especially if a container memory limit is also set.
There are at least two ways the above can be achieved:
If the container memory limit is set and the experimental options are
supported by the JVM, set -XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
.
The |
This sets -XX:MaxRAM
to the container memory limit, and the maximum heap size
(-XX:MaxHeapSize
/ -Xmx
) to 1/-XX:MaxRAMFraction
(1/4 by default).
Directly override one of -XX:MaxRAM
, -XX:MaxHeapSize
or -Xmx
.
This option involves hard-coding a value, but has the advantage of allowing a safety margin to be calculated.
By default, the OpenJDK does not aggressively return unused memory to the operating system. This may be appropriate for many containerized Java workloads, but notable exceptions include workloads where additional active processes co-exist with a JVM within a container, whether those additional processes are native, additional JVMs, or a combination of the two.
Java-based agents can use the following JVM arguments to encourage the JVM to release unused memory to the operating system:
-XX:+UseParallelGC
-XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4
-XX:AdaptiveSizePolicyWeight=90.
These arguments are intended to return heap
memory to the operating system whenever allocated memory exceeds 110% of in-use
memory (-XX:MaxHeapFreeRatio
), spending up to 20% of CPU time in the garbage
collector (-XX:GCTimeRatio
). At no time will the application heap allocation
be less than the initial heap allocation (overridden by -XX:InitialHeapSize
/
-Xms
). Detailed additional information is available
Tuning Java’s footprint in OpenShift (Part 1),
Tuning Java’s footprint in OpenShift (Part 2),
and at
OpenJDK
and Containers.
In the case that multiple JVMs run in the same container, it is essential to ensure that they are all configured appropriately. For many workloads it will be necessary to grant each JVM a percentage memory budget, leaving a perhaps substantial additional safety margin.
Many Java tools use different environment variables (JAVA_OPTS
, GRADLE_OPTS
, and so on) to configure their JVMs and it can be challenging to ensure
that the right settings are being passed to the right JVM.
The JAVA_TOOL_OPTIONS
environment variable is always respected by the OpenJDK,
and values specified in JAVA_TOOL_OPTIONS
will be overridden by other options
specified on the JVM command line. By default, to ensure that these options are
used by default for all JVM workloads run in the Java-based agent image, the OpenShift Container Platform Jenkins
Maven agent image sets:
JAVA_TOOL_OPTIONS="-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true"
The |
This does not guarantee that additional options are not required, but is intended to be a helpful starting point.
An application wishing to dynamically discover its memory request and limit from within a pod should use the Downward API.
Configure the pod to add the MEMORY_REQUEST
and MEMORY_LIMIT
stanzas:
Create a YAML file similar to the following:
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
securityContext:
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
containers:
- name: test
image: fedora:latest
command:
- sleep
- "3600"
env:
- name: MEMORY_REQUEST (1)
valueFrom:
resourceFieldRef:
containerName: test
resource: requests.memory
- name: MEMORY_LIMIT (2)
valueFrom:
resourceFieldRef:
containerName: test
resource: limits.memory
resources:
requests:
memory: 384Mi
limits:
memory: 512Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: [ALL]
1 | Add this stanza to discover the application memory request value. |
2 | Add this stanza to discover the application memory limit value. |
Create the pod by running the following command:
$ oc create -f <file-name>.yaml
Access the pod using a remote shell:
$ oc rsh test
Check that the requested values were applied:
$ env | grep MEMORY | sort
MEMORY_LIMIT=536870912
MEMORY_REQUEST=402653184
The memory limit value can also be read from inside the container by the
|
OpenShift Container Platform can kill a process in a container if the total memory usage of all the processes in the container exceeds the memory limit, or in serious cases of node memory exhaustion.
When a process is Out of Memory (OOM) killed, this might result in the container exiting immediately. If the container PID 1 process receives the SIGKILL, the container will exit immediately. Otherwise, the container behavior is dependent on the behavior of the other processes.
For example, a container process exited with code 137, indicating it received a SIGKILL signal.
If the container does not exit immediately, an OOM kill is detectable as follows:
Access the pod using a remote shell:
# oc rsh test
Run the following command to see the current OOM kill count in /sys/fs/cgroup/memory/memory.oom_control
:
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 0
Run the following command to provoke an OOM kill:
$ sed -e '' </dev/zero
Killed
Run the following command to view the exit status of the sed
command:
$ echo $?
137
The 137
code indicates the container process exited with code 137, indicating it received a SIGKILL signal.
Run the following command to see that the OOM kill counter in /sys/fs/cgroup/memory/memory.oom_control
incremented:
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 1
If one or more processes in a pod are OOM killed, when the pod subsequently
exits, whether immediately or not, it will have phase Failed and reason
OOMKilled. An OOM-killed pod might be restarted depending on the value of
restartPolicy
. If not restarted, controllers such as the
replication controller will notice the pod’s failed status and create a new pod
to replace the old one.
Use the follwing command to get the pod status:
$ oc get pod test
NAME READY STATUS RESTARTS AGE
test 0/1 OOMKilled 0 1m
If the pod has not restarted, run the following command to view the pod:
$ oc get pod test -o yaml
...
status:
containerStatuses:
- name: test
ready: false
restartCount: 0
state:
terminated:
exitCode: 137
reason: OOMKilled
phase: Failed
If restarted, run the following command to view the pod:
$ oc get pod test -o yaml
...
status:
containerStatuses:
- name: test
ready: true
restartCount: 1
lastState:
terminated:
exitCode: 137
reason: OOMKilled
state:
running:
phase: Running
OpenShift Container Platform may evict a pod from its node when the node’s memory is exhausted. Depending on the extent of memory exhaustion, the eviction may or may not be graceful. Graceful eviction implies the main process (PID 1) of each container receiving a SIGTERM signal, then some time later a SIGKILL signal if the process has not exited already. Non-graceful eviction implies the main process of each container immediately receiving a SIGKILL signal.
An evicted pod has phase Failed and reason Evicted. It will not be
restarted, regardless of the value of restartPolicy
. However, controllers
such as the replication controller will notice the pod’s failed status and create
a new pod to replace the old one.
$ oc get pod test
NAME READY STATUS RESTARTS AGE
test 0/1 Evicted 0 1m
$ oc get pod test -o yaml
...
status:
message: 'Pod The node was low on resource: [MemoryPressure].'
phase: Failed
reason: Evicted