Common Kubernetes Errors and Solutions: OOMKilled, CrashLoopBackOff, and More

By Contributing Writer
Gilad David Maayan | November 04, 2022

What is Kubernetes Troubleshooting?

Kubernetes is an open source platform for managing Linux containers in private, public, and hybrid cloud environments. It is often used to manage large microservices applications. While Kubernetes is very powerful, it is also complex, and it can be difficult to identify and fix problems with its many components and the resources they create.

When troubleshooting an issue in your Kubernetes deployments, it is important to realize that the symptom you are experiencing might be only part of the problem. For example, a cluster may be unavailable or pods may not respond as expected, but identifying which other components of the cluster or the underlying infrastructure are involved can require deeper inspection.

Let's look at five common Kubernetes errors that IT and DevOps teams may face and how to solve them.

Common Kubernetes Errors and Solutions

OOMKilled

The OOMKilled (Out Of Memory) error indicates that a pod or container terminated because it used more memory than allowed. It has an exit code of 137.
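Exit code 137 follows the common Linux convention of 128 plus the number of the signal that ended the process; signal 9 is SIGKILL, which the kernel's out-of-memory killer sends. The short Python sketch below decodes an exit code this way. The function name is illustrative only, and the signal name lookup assumes a POSIX system.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if exit_code > 128:
        signal_number = exit_code - 128
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            # The number does not map to a named signal on this platform.
            name = f"signal {signal_number}"
        return f"terminated by {name}"
    return f"exited with status {exit_code}"


print(describe_exit_code(137))  # terminated by SIGKILL (on Linux)
```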

To identify the error:

Use the following command to identify the OOMKilled error:

kubectl get pods

The pod with the error will have OOMKilled under the STATUS column. To investigate further, run kubectl describe pod [pod-name] and locate the following fields in the container's state section of the output:

State: Running
  Started: Thu, 10 Oct 2019 11:14:13 +0200
Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137

…
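The same fields are available in machine-readable form from kubectl get pod [pod-name] -o json, under status.containerStatuses. As a minimal sketch, assuming that JSON has already been loaded into a Python dictionary, the helper below lists the containers whose last termination reason was OOMKilled. The function name and the sample pod are hypothetical.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical, trimmed-down pod object for illustration.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "lastState": {}},
        ]
    }
}

print(oom_killed_containers(sample_pod))  # ['app']
```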

Diagnosis and resolution

Now, go through the pod’s recent activity history and pinpoint what caused the error. Here are some potential causes:

A container limit was reached, and the container was terminated.
A pod was terminated because the node was overcommitted. This means that the combined memory needs of the pods scheduled on the node exceeded the memory available on the node.

If the pod termination occurred because the container limit was reached:

Determine if the application indeed needs the extra memory. If it does, increase the container's memory limit in the pod specification.
If the increase in memory use is sudden and cannot be tied to the application's load, the application could have a memory leak. Debug the application and fix the leak rather than raising the memory limit, since a leaking application will keep consuming more of the node's resources.
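To illustrate where the memory limit lives in a pod specification, here is a small Python sketch that sets resources.limits.memory on one container of a manifest held as a dictionary and prints the result as JSON, which kubectl also accepts. The manifest contents (demo-pod, app, demo-image:1.0, and the memory values) are placeholders, not values from a real workload.

```python
import copy
import json


def with_memory_limit(pod: dict, container_name: str, limit: str) -> dict:
    """Return a copy of the manifest with the named container's memory limit set."""
    updated = copy.deepcopy(pod)
    for container in updated["spec"]["containers"]:
        if container["name"] == container_name:
            limits = container.setdefault("resources", {}).setdefault("limits", {})
            limits["memory"] = limit
            return updated
    raise KeyError(f"no container named {container_name!r}")


# Placeholder manifest for illustration only.
manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "demo-image:1.0",
                "resources": {
                    "requests": {"memory": "256Mi"},
                    "limits": {"memory": "256Mi"},
                },
            }
        ]
    },
}

print(json.dumps(with_memory_limit(manifest, "app", "512Mi"), indent=2))
```

In practice the limit is usually changed in the pod template of the Deployment or other controller that owns the pod, so that replacement pods pick up the new value.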

If the pod was terminated because the node was overcommitted, review the memory requests (the minimum amount of memory reserved for each container) and limits of the pods on that node. The total memory requested by all pods on a node should be less than the node's available memory. If needed, adjust the memory requests and limits so that the node does not become overcommitted.
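The arithmetic behind this check can be sketched in a few lines of Python: convert each memory quantity to bytes, add up the values for the pods on the node (requests or limits, depending on what you want to compare), and subtract the total from the node's memory. The parser below handles only the common suffixes and the figures are made up for illustration.

```python
# Binary and decimal suffixes commonly used in Kubernetes memory quantities.
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2G' to bytes (plain integers are bytes)."""
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)


def memory_headroom(node_memory: str, pod_memory: list[str]) -> int:
    """Bytes left on the node after subtracting the pods' memory values."""
    return parse_memory(node_memory) - sum(parse_memory(m) for m in pod_memory)


# Made-up figures: a 4Gi node and three pods.
headroom = memory_headroom("4Gi", ["1Gi", "1536Mi", "2Gi"])
print("overcommitted" if headroom < 0 else "fits", headroom)
```

A negative result means the pods together ask for more memory than the node can provide.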

CrashLoopBackOff

The CrashLoopBackOff error indicates that a container in the pod repeatedly starts and crashes, and that Kubernetes is waiting for a back-off delay before restarting it again. Contributing causes can include a node that lacks the resources the pod needs or volumes that have not mounted successfully.
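The name comes from the back-off delay the kubelet applies between restarts of a crashing container. The Kubernetes documentation describes this delay as exponential (10s, 20s, 40s, and so on) and capped at five minutes. The Python sketch below reproduces that schedule; the base and cap values are taken from that description and should be treated as assumptions for any particular cluster.

```python
def restart_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Back-off delay in seconds before each of the first `restarts` restarts."""
    return [min(base * 2**i, cap) for i in range(restarts)]


print(restart_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```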

To identify the error:

Use the following command to identify the error:

kubectl get pods

The pod facing the issue will have CrashLoopBackOff under STATUS.

Use the following command to get further details about the error:

kubectl describe pod [pod-name]
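When many pods are involved, it can be easier to read the same details from kubectl get pod [pod-name] -o json. As a rough sketch, assuming that JSON has been loaded into a dictionary, the helper below reports each container that is waiting in CrashLoopBackOff together with its restart count. The function name and the sample data are hypothetical.

```python
def crash_looping_containers(pod: dict) -> list[tuple[str, int]]:
    """Return (container name, restart count) for containers in CrashLoopBackOff."""
    result = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            result.append((status["name"], status.get("restartCount", 0)))
    return result


# Hypothetical, trimmed-down pod status for illustration.
crashing_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 6,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
        ]
    }
}

print(crash_looping_containers(crashing_pod))  # [('app', 6)]
```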

Common causes and resolution

Here are some common causes of the error:

Inadequate resources: if the node has insufficient resources, manually evict pods from it or scale up your cluster so that more nodes are available for the pods.
Errors in volume mounting: if there is a problem mounting a storage volume, check which volume the pod is trying to mount and ensure it is correctly defined in the pod's manifest. Also ensure that a storage volume matching those definitions exists.
Using hostPort: if pods are bound to a hostPort, only one such pod can be scheduled per node. In most cases, you can avoid hostPort and instead use a Service object to communicate with the pod.
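To check whether a pod is affected by the hostPort cause, you can look for hostPort entries under spec.containers[].ports in its manifest. The following Python sketch does this for a manifest held as a dictionary; the helper name, container names, and port numbers are invented for the example.

```python
def host_ports(pod: dict) -> list[tuple[str, int]]:
    """Return (container name, hostPort) pairs declared in the pod spec."""
    found = []
    for container in pod.get("spec", {}).get("containers", []):
        for port in container.get("ports", []):
            if "hostPort" in port:
                found.append((container["name"], port["hostPort"]))
    return found


# Invented manifest fragment for illustration.
pod_with_host_port = {
    "spec": {
        "containers": [
            {"name": "web", "ports": [{"containerPort": 8080, "hostPort": 80}]},
            {"name": "metrics", "ports": [{"containerPort": 9090}]},
        ]
    }
}

print(host_ports(pod_with_host_port))  # [('web', 80)]
```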

CreateContainerConfigError

The CreateContainerConfigError error commonly results from a missing Secret or ConfigMap. A Secret is a Kubernetes object that stores confidential information such as database credentials. ConfigMaps store data as key-value pairs and are useful for holding configuration information needed by multiple pods.

To identify the error:

Use the following command to identify the error:

kubectl get pods

The pod facing the issue will have CreateContainerConfigError under STATUS.

Use the following command to get further details about the error:

kubectl describe pod demo-pod

Here is what the output might look like:

Warning Failed 34s (x6 over 1m45s) kubelet Error: configmap "configmap-7" not found

Run the following command to check if the ConfigMap returned by the previous step is present in the cluster:

kubectl get configmap configmap-7

If the command reports that the ConfigMap is not found, create it.

Once created, use the following command to ensure the ConfigMap is available:

kubectl get configmap configmap-7

Run kubectl get pods again to ensure the pod is now running.
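A pod can reference ConfigMaps and Secrets in several places: volumes, envFrom entries, and individual environment variables. As a sketch of how to find every reference up front instead of fixing them one error at a time, the Python helper below collects the names from a pod manifest held as a dictionary and compares them with the names that exist in the cluster. The helper name, the db-credentials Secret, and the list of existing ConfigMaps are hypothetical.

```python
def referenced_config(pod: dict) -> tuple[set[str], set[str]]:
    """Return (ConfigMap names, Secret names) that the pod spec refers to."""
    config_maps, secrets = set(), set()
    spec = pod.get("spec", {})
    for volume in spec.get("volumes", []):
        if "configMap" in volume:
            config_maps.add(volume["configMap"]["name"])
        if "secret" in volume:
            secrets.add(volume["secret"]["secretName"])
    for container in spec.get("containers", []):
        for source in container.get("envFrom", []):
            if "configMapRef" in source:
                config_maps.add(source["configMapRef"]["name"])
            if "secretRef" in source:
                secrets.add(source["secretRef"]["name"])
        for env in container.get("env", []):
            value_from = env.get("valueFrom", {})
            if "configMapKeyRef" in value_from:
                config_maps.add(value_from["configMapKeyRef"]["name"])
            if "secretKeyRef" in value_from:
                secrets.add(value_from["secretKeyRef"]["name"])
    return config_maps, secrets


# Hypothetical manifest fragment and cluster state for illustration.
demo_pod = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "envFrom": [{"configMapRef": {"name": "configmap-7"}}],
                "env": [
                    {
                        "name": "DB_PASSWORD",
                        "valueFrom": {
                            "secretKeyRef": {"name": "db-credentials", "key": "password"}
                        },
                    }
                ],
            }
        ]
    }
}
existing_config_maps = {"configmap-1"}

config_maps, secrets = referenced_config(demo_pod)
print(sorted(config_maps - existing_config_maps))  # ['configmap-7']
```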

ImagePullBackOff or ErrImagePull

The ImagePullBackOff and ErrImagePull errors mean that a pod could not run because an attempt to pull a container image from a registry failed. As a result, the pod cannot start, because it cannot create one or more of the containers defined in its manifest.

To identify the error:

Use the following command to identify the error:

kubectl get pods

The pod facing the issue will have ImagePullBackOff or ErrImagePull under STATUS.

Use the following command to get further details about the error:

kubectl describe pod demo-pod

Root causes and resolution

Here are some causes behind the issue:

Wrong container image name or tag: this commonly happens when the image name or tag was mistyped in the pod manifest.

Ensure that the image name and tag are correct by pulling the image manually with the following command:

docker pull <image-name>:<image-tag>

Authentication error with the container registry: the pod might not have authenticated successfully with the registry to pull the container image. This can happen because of a problem with the Secret that stores the registry credentials, or because the pod lacks the permissions required to perform this operation.

Ensure that the pod has the required permissions and Secrets. Then, manually attempt the operation using the docker pull command.
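When checking for a mistyped image, it helps to see exactly which name and tag a reference resolves to. The Python sketch below splits a reference into the two parts, treating a colon that appears before the last slash as a registry port and falling back to the latest tag when none is given. It is a simplified illustration, not a full reference parser, and the example references are made up.

```python
def split_image(reference: str) -> tuple[str, str]:
    """Split an image reference into (name, tag), defaulting the tag to 'latest'."""
    # A digest such as @sha256:... is ignored in this simplified sketch.
    name = reference.partition("@")[0]
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if last_colon > last_slash:
        return name[:last_colon], name[last_colon + 1:]
    return name, "latest"


print(split_image("nginx:1.25"))          # ('nginx', '1.25')
print(split_image("localhost:5000/app"))  # ('localhost:5000/app', 'latest')
```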

Kubernetes Node Not Ready

All the stateful pods on a node become unavailable when the node crashes or shuts down, and the node shows NotReady as its status. If this status persists for more than five minutes (the default), Kubernetes changes the status of the pods scheduled on that node to Unknown. Kubernetes then attempts to schedule the pods on another node, where they receive a ContainerCreating status.

To identify the error:

Use the following command to identify the error:

kubectl get nodes

The node facing the issue will have NotReady under STATUS.

Use the following command to see if the pods scheduled on the node are being shifted to other nodes:

kubectl get pods -o wide

Check the NODE column to see if the same pod appears on two different nodes in the output.
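The NotReady status corresponds to the node's Ready condition having a value other than True. As a minimal sketch, assuming the output of kubectl get nodes -o json has been loaded into a dictionary, the Python helper below lists the nodes that are not ready. The helper name and the node names in the sample are hypothetical.

```python
def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is missing or not 'True'."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


# Hypothetical, trimmed-down node list for illustration.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "demo-node"},
            "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
        },
        {
            "metadata": {"name": "healthy-node"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
}

print(not_ready_nodes(sample_nodes))  # ['demo-node']
```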

Resolving the issue

The issue can resolve itself if the failed node recovers or you reboot it. Here is what happens once it recovers and joins the cluster:

The pod with Unknown status gets deleted, and the failed node's volumes are detached.
The pod's status changes to ContainerCreating once it's rescheduled to a new node and the required volumes are attached.
Kubernetes waits for a default period of five minutes. After that, the pod's status will change from ContainerCreating to Running once it starts running on the new node.

If you are short on time or the node fails to recover, you must tell Kubernetes to reschedule the stateful pods on a different, working node. There are two ways to achieve this:

Remove failed node:

Use the following command to remove the failed node from the cluster:

kubectl delete node demo-node

Delete stateful pods with unknown status:

Use the following command to delete the stateful pods:

kubectl delete pods demo-pod --grace-period=0 --force -n demo-namespace

Conclusion

In this article, I covered some of the most common Kubernetes errors and showed how to solve them:

OOMKilled - indicates that a pod or container terminated because it used more memory than allowed.
CrashLoopBackOff - indicates that a container in a pod is crashing repeatedly and being restarted after a back-off delay.
CreateContainerConfigError - commonly results from a missing Secret or ConfigMap.
ImagePullBackOff or ErrImagePull - indicates that a pod couldn’t run because it unsuccessfully tried to pull an image from a registry.
Kubernetes Node Not Ready - status shown when a node crashes or shuts down and all stateful pods become unavailable.

I hope this will help you get a head start in the exciting world of Kubernetes troubleshooting.
