What happened:

A bug in nvidia-container-toolkit 1.17.0 results in a containerd config that omits some EKS defaults. Some symptoms we're aware of:

- The pause container image used by containerd will be the default (registry.k8s.io/pause:3.5) instead of the regional ECR image that EKS provides. If your nodes have network access to registry.k8s.io, this will work fine; if they can't reach registry.k8s.io, they will not be able to create pods.
- The cgroup driver used by containerd will be incorrect, resulting in your pods being placed in the wrong part of the cgroup tree. This may impact your telemetry.
The bug has been fixed in 1.17.1, and EKS will release an AMI with this version of the nvidia-container-toolkit as soon as possible.

Bug: NVIDIA/nvidia-container-toolkit@a06d838
Fix: NVIDIA/nvidia-container-toolkit@1995925
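For reference, both symptoms above correspond to settings in /etc/containerd/config.toml. The following is a minimal sketch of what the EKS-provided defaults normally look like, assuming a containerd config version 2 layout and using us-west-2 as an example region (the ECR account ID, region, and pause tag vary by AMI release):

```toml
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  # Regional ECR pause image provided by EKS (example for us-west-2);
  # the buggy 1.17.0-generated config falls back to registry.k8s.io/pause:3.5 instead.
  sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  # EKS-provided config enables the systemd cgroup driver for runc.
  SystemdCgroup = true
```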
Not sure if it's related, but since we updated from AL2 v20241024 to v20241106 we found that /etc/containerd/config.toml is totally changed, and it removes the […]
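To check whether a node is affected, the symptoms described in the issue would show up in /etc/containerd/config.toml roughly like the sketch below. This is only an illustration of what to look for, not verbatim output from the 1.17.0 toolkit:

```toml
[plugins."io.containerd.grpc.v1.cri"]
  # Upstream default pause image instead of the regional ECR image.
  sandbox_image = "registry.k8s.io/pause:3.5"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  # SystemdCgroup missing or false means containerd falls back to the cgroupfs
  # driver, placing pods in a different part of the cgroup tree than EKS expects.
  SystemdCgroup = false
```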