Describe the bug
I dynamically provisioned a volume and attached it to a deployment. When I delete a node and let Cluster API provision a replacement, dangling volumeattachments are sometimes left behind. On a later drain of a node, pods can then get stuck in the ContainerCreating state because of this. For example:
k describe po -n kube-system nginx-59d9859785-lm97h
Name: nginx-59d9859785-lm97h
Namespace: kube-system
Priority: 0
Service Account: default
Node: kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time: Mon, 11 Dec 2023 08:52:50 +0100
Labels: app=nginx
pod-template-hash=59d9859785
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-59d9859785
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from csi-data-vcdplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dn4fn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
csi-data-vcdplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-vcdplugin
ReadOnly: false
kube-api-access-dn4fn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m55s default-scheduler 0/5 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 1 node(s) were unschedulable, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling..
Normal Scheduled 4m58s default-scheduler Successfully assigned kube-system/nginx-59d9859785-lm97h to kubermatic-v3-test-worker-57ccd5c88c-55n5f
Warning FailedAttachVolume 4m59s attachdetach-controller Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 70s kubelet Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
Dangling volumeattachments:
k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600 named-disk.csi.cloud-director.vmware.com pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2 kubermatic-v3-test-worker-57ccd5c88c-65zg2 true 2d19h
csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f named-disk.csi.cloud-director.vmware.com pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2 kubermatic-v3-test-worker-57ccd5c88c-54m5f true 2d19h
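A quick way to cross-check which of these volumeattachments point at nodes that no longer exist (a sketch assuming bash and jq are available):
kubectl get volumeattachments.storage.k8s.io -o json \
  | jq -r '.items[] | [.metadata.name, .spec.nodeName] | @tsv' \
  | while read -r va node; do
      # Flag attachments whose node object is gone from the cluster.
      kubectl get node "$node" >/dev/null 2>&1 || echo "dangling: $va -> $node"
    done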
Reproduction steps
Apply the manifest:
k apply -n default -f nginx.yaml
persistentvolumeclaim/csi-pvc-vcdplugin created
deployment.apps/nginx created
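nginx.yaml itself is not shown above; a minimal manifest along the following lines matches the claim name, labels, and mount path in the pod descriptions (the storage class name and size are illustrative):
kubectl apply -n default -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-vcdplugin
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi               # illustrative size
  storageClassName: vcd-disk-dev # illustrative; use the cluster's named-disk CSI storage class
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: csi-data-vcdplugin
              mountPath: /usr/share/nginx/html
      volumes:
        - name: csi-data-vcdplugin
          persistentVolumeClaim:
            claimName: csi-pvc-vcdplugin
EOF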
Once the pod is running, drain and delete a node and let Cluster API provision a new one:
k drain --ignore-daemonsets --delete-emptydir-data node/kubermatic-v3-test-worker-57ccd5c88c-54m5f
k delete machine.cluster.k8s.io/kubermatic-v3-test-worker-57ccd5c88c-54m5f
Then drain another node:
k drain --ignore-daemonsets --delete-emptydir-data node/kubermatic-v3-test-worker-57ccd5c88c-65zg2
Verify you have dangling volumeattachments. It may take a few attempts, as the problem does not occur every time.
I can delete the pod that is stuck in ContainerCreating with the --force flag, but the replacement pod still does not start.
k delete po -n kube-system nginx-59d9859785-lm97h --force
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "nginx-59d9859785-lm97h" force deleted
k describe po -n kube-system nginx-59d9859785-gbprn
Name: nginx-59d9859785-gbprn
Namespace: kube-system
Priority: 0
Service Account: default
Node: kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time: Mon, 11 Dec 2023 09:23:56 +0100
Labels: app=nginx
pod-template-hash=59d9859785
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-59d9859785
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from csi-data-vcdplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwr9q (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
csi-data-vcdplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-vcdplugin
ReadOnly: false
kube-api-access-dwr9q:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13s default-scheduler Successfully assigned kube-system/nginx-59d9859785-gbprn to kubermatic-v3-test-worker-57ccd5c88c-55n5f
Warning FailedAttachVolume 14s attachdetach-controller Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another
When I try to delete the previous volumeattachments, only the one on the just-drained (and still existing) node is deleted. The one on the no-longer-existing node remains:
k describe volumeattachments.storage.k8s.io csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Name: csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f
Namespace:
Labels: <none>
Annotations: csi.alpha.kubernetes.io/node-id: kubermatic-v3-test-worker-57ccd5c88c-54m5f
API Version: storage.k8s.io/v1
Kind: VolumeAttachment
Metadata:
Creation Timestamp: 2023-12-08T12:48:14Z
Deletion Grace Period Seconds: 0
Deletion Timestamp: 2023-12-08T12:52:26Z
Finalizers:
external-attacher/named-disk-csi-cloud-director-vmware-com
Managed Fields:
API Version: storage.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:csi.alpha.kubernetes.io/node-id:
f:finalizers:
.:
v:"external-attacher/named-disk-csi-cloud-director-vmware-com":
Manager: csi-attacher
Operation: Update
Time: 2023-12-08T12:48:14Z
API Version: storage.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:attacher:
f:nodeName:
f:source:
f:persistentVolumeName:
Manager: kube-controller-manager
Operation: Update
Time: 2023-12-08T12:48:14Z
API Version: storage.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:attached:
f:attachmentMetadata:
.:
f:diskID:
f:diskUUID:
f:filesystem:
f:vmID:
f:detachError:
.:
f:message:
f:time:
Manager: csi-attacher
Operation: Update
Subresource: status
Time: 2023-12-11T08:34:24Z
Resource Version: 1992050
UID: 583797f0-5ebf-46c0-82f6-27c01478c085
Spec:
Attacher: named-disk.csi.cloud-director.vmware.com
Node Name: kubermatic-v3-test-worker-57ccd5c88c-54m5f
Source:
Persistent Volume Name: pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
Status:
Attached: true
Attachment Metadata:
Disk ID: pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
Disk UUID: 6000c295-355f-da02-a25a-f852b7ce31d8
Filesystem: ext4
Vm ID: kubermatic-v3-test-worker-57ccd5c88c-54m5f
Detach Error:
Message: rpc error: code = NotFound desc = Could not find VM with nodeID [kubermatic-v3-test-worker-57ccd5c88c-54m5f] from which to detach [pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2]
Time: 2023-12-11T08:34:24Z
Events: <none>
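The same detach error can be surfaced for all attachments at once, for example with custom columns:
kubectl get volumeattachments.storage.k8s.io \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached,DETACH_ERROR:.status.detachError.message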
The new pod then starts:
k describe po -n kube-system nginx-59d9859785-gbprn
Name: nginx-59d9859785-gbprn
Namespace: kube-system
Priority: 0
Service Account: default
Node: kubermatic-v3-test-worker-57ccd5c88c-55n5f/10.70.27.38
Start Time: Mon, 11 Dec 2023 09:23:56 +0100
Labels: app=nginx
pod-template-hash=59d9859785
Annotations: <none>
Status: Running
IP: 10.244.10.63
IPs:
IP: 10.244.10.63
Controlled By: ReplicaSet/nginx-59d9859785
Containers:
nginx:
Container ID: containerd://ed9a503d4249f0a9b837c73a0c7063b83b98e781599f0727755225ef900cd927
Image: nginx
Image ID: docker.io/library/nginx@sha256:10d1f5b58f74683ad34eb29287e07dab1e90f10af243f151bb50aa5dbb4d62ee
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 11 Dec 2023 09:30:42 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from csi-data-vcdplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwr9q (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
csi-data-vcdplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-vcdplugin
ReadOnly: false
kube-api-access-dwr9q:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m18s default-scheduler Successfully assigned kube-system/nginx-59d9859785-gbprn to kubermatic-v3-test-worker-57ccd5c88c-55n5f
Warning FailedAttachVolume 7m18s attachdetach-controller Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 3m (x2 over 5m15s) kubelet Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
Normal SuccessfulAttachVolume 49s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2"
Normal Pulling 48s kubelet Pulling image "nginx"
Normal Pulled 32s kubelet Successfully pulled image "nginx" in 15.20183854s (15.20186002s including waiting)
Normal Created 32s kubelet Created container nginx
Normal Started 32s kubelet Started container nginx
A new volumeattachment is created, but the dangling one remains:
k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af named-disk.csi.cloud-director.vmware.com pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2 kubermatic-v3-test-worker-57ccd5c88c-55n5f true 5m10s
csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f named-disk.csi.cloud-director.vmware.com pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2 kubermatic-v3-test-worker-57ccd5c88c-54m5f true 2d19h
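The node referenced by the dangling attachment no longer exists, which can be confirmed with:
kubectl get node kubermatic-v3-test-worker-57ccd5c88c-54m5f
# expected to fail with NotFound, since the corresponding machine was deleted earlier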
If I now drain again, the same issue occurs:
k describe po -n kube-system nginx-59d9859785-6k2nx
Name: nginx-59d9859785-6k2nx
Namespace: kube-system
Priority: 0
Service Account: default
Node: kubermatic-v3-test-worker-57ccd5c88c-65zg2/10.70.27.39
Start Time: Mon, 11 Dec 2023 09:36:39 +0100
Labels: app=nginx
pod-template-hash=59d9859785
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-59d9859785
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from csi-data-vcdplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqg5w (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
csi-data-vcdplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-vcdplugin
ReadOnly: false
kube-api-access-bqg5w:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m3s default-scheduler Successfully assigned kube-system/nginx-59d9859785-6k2nx to kubermatic-v3-test-worker-57ccd5c88c-65zg2
Warning FailedAttachVolume 3m4s attachdetach-controller Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already used by pod(s) nginx-59d9859785-gbprn
Warning FailedMount 61s kubelet Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
Now when I remove the finalizers of the dangling volumeattachment:
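For example, a merge patch of this form clears the finalizers:
kubectl patch volumeattachments.storage.k8s.io csi-da9c3c68da884d8459778f4dfedd66cc28bcb7d1e0c89bb8abb967ce6d36407f \
  --type=merge -p '{"metadata":{"finalizers":null}}'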
Then delete the volumeattachment of the just-drained node:
k delete volumeattachments.storage.k8s.io csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af
volumeattachment.storage.k8s.io "csi-c53775fdb3196e08803afeddc59a6fb79f4e3a054241fbcd629ecd09a18b28af" deleted
A new volumeattachment is created:
k get volumeattachments.storage.k8s.io | grep pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2
csi-33c5c660443f4a1d8fa7ec048bfe010699b9245cc2f0d2aac30db6b3b665f600 named-disk.csi.cloud-director.vmware.com pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2 kubermatic-v3-test-worker-57ccd5c88c-65zg2 true 20s
And the pod is able to start:
k describe po -n kube-system nginx-59d9859785-6k2nx
Name: nginx-59d9859785-6k2nx
Namespace: kube-system
Priority: 0
Service Account: default
Node: kubermatic-v3-test-worker-57ccd5c88c-65zg2/10.70.27.39
Start Time: Mon, 11 Dec 2023 09:36:39 +0100
Labels: app=nginx
pod-template-hash=59d9859785
Annotations: <none>
Status: Running
IP: 10.244.9.246
IPs:
IP: 10.244.9.246
Controlled By: ReplicaSet/nginx-59d9859785
Containers:
nginx:
Container ID: containerd://d2467ccd11fa6a0524208e248eeabf2ceafd81e48bfa6e3c192a1c4528a1907a
Image: nginx
Image ID: docker.io/library/nginx@sha256:10d1f5b58f74683ad34eb29287e07dab1e90f10af243f151bb50aa5dbb4d62ee
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 11 Dec 2023 09:43:09 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from csi-data-vcdplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bqg5w (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
csi-data-vcdplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-vcdplugin
ReadOnly: false
kube-api-access-bqg5w:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m52s default-scheduler Successfully assigned kube-system/nginx-59d9859785-6k2nx to kubermatic-v3-test-worker-57ccd5c88c-65zg2
Warning FailedAttachVolume 6m53s attachdetach-controller Multi-Attach error for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2" Volume is already used by pod(s) nginx-59d9859785-gbprn
Warning FailedMount 2m36s (x2 over 4m50s) kubelet Unable to attach or mount volumes: unmounted volumes=[csi-data-vcdplugin], unattached volumes=[csi-data-vcdplugin], failed to process volumes=[]: timed out waiting for the condition
Normal SuccessfulAttachVolume 26s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-8032a622-19c0-4e66-aacf-c0911ac8bda2"
Normal Pulling 24s kubelet Pulling image "nginx"
Normal Pulled 23s kubelet Successfully pulled image "nginx" in 1.073543173s (1.073553964s including waiting)
Normal Created 23s kubelet Created container nginx
Normal Started 23s kubelet Started container nginx
When I now drain again, the pod starts without issues or manual intervention.
Expected behavior
I expected the pod to successfully start on another node.
Kubernetes version:
Cloud provider: VMware Cloud Director, version 10.4.2.21954589
OS version:
Install tools: KubeOne v1.7.1
Container runtime (CRI) and version (if applicable): containerd://1.6.25
Related plugins (CNI, CSI, ...) and versions (if applicable):
Logs:
csi-vcd-nodeplugin-zg6b4.zip
csi-vcd-controllerplugin-76cff99975-xh8vc.zip
kubelet-journal-kubermatic-v3-test-worker-57ccd5c88c-65zg2.zip
Good evening, have you found a solution to this problem? Every time I want to update a Pod with a PVC attached, I get stuck and have to delete the attachment myself...
We also encounter the same issue. Rolling upgrades are failing, and we have to go onto each cluster, identify the affected pods, PVCs, and volumeattachments, and force delete them after removing their finalizers...
This issue completely defeats the purpose of Cluster API automation.
Thanks a lot @vitality411 for all the details you have submitted on this issue. I also saw the issue you opened against Kubernetes about the same problem, which concludes that this is a bug in the VCD CSI driver.
We have no automated workaround that makes CAPI work as expected.