
Version of runc CLI causes elastic-operator not to deploy properly #5325

Closed
ondrej-ivanko opened this issue Feb 3, 2022 · 14 comments

ondrej-ivanko commented Feb 3, 2022

Bug Report

What I did
Tried to deploy the elastic-operator StatefulSet and CRDs with Skaffold, using the Elastic operator/CRDs YAML manifests (https://download.elastic.co/downloads/eck/1.9.1/crds.yaml, https://download.elastic.co/downloads/eck/1.9.1/operator.yaml), to a local Kubernetes cluster created by Minikube.
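
For reference, deploying these two manifests without Skaffold is roughly equivalent to applying them directly with kubectl (the URLs are the ones listed above):

kubectl create -f https://download.elastic.co/downloads/eck/1.9.1/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/1.9.1/operator.yaml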

What did you expect to see?
A successful deployment of elastic-operator and the CRDs.

What did you see instead? Under which circumstances?
First, Minikube created the local cluster with namespaces, then Skaffold tried to deploy the Elastic resources based on the manifests.
The elastic-operator pod went into a CrashLoopBackOff state and the pod's stdout returned the error logs below. The pod never recovers from CrashLoopBackOff.

Environment

  • OS: Artix Linux
    Kernel version: 5.16.3-artix1-1
    arch: x86_64
    init system: runit

  • Docker version 20.10.12, build e91ed5707e
  • ECK version:
    1.9.1

  • Kubernetes information:
    local cluster created by Minikube version: v1.23.2

Cluster parameters it was started with:

minikube start -p k8s-desktop-dev --namespace=my-namespace --cpus=8 --memory=12g --driver=docker --kubernetes-version=stable

Deployment performed with Skaffold v1.35.2

kubectl version:

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
  • Resource definition:
https://download.elastic.co/downloads/eck/1.9.1/crds.yaml
https://download.elastic.co/downloads/eck/1.9.1/operator.yaml
  • Logs:
Waiting for deployments to stabilize...
 - elastic-system:statefulset/elastic-operator: creating container manager
    - elastic-system:pod/elastic-operator-0: creating container manager
 - elastic-system:statefulset/elastic-operator: container manager is backing off waiting to restart
    - elastic-system:pod/elastic-operator-0: container manager is backing off waiting to restart
      > [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-02-02T17:30:34.906Z","log.logger":"manager","message":"Error setting GOMAXPROCS","service.version":"1.9.1+75cb4d4d","service.type":"eck","ecs.version":"1.4.0","error":"path \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1\" is not a descendant of mount point root \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\"","error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:329"}
      > [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-02-02T17:30:34.906Z","log.logger":"manager","message":"Operator stopped with error","service.version":"1.9.1+75cb4d4d","service.type":"eck","ecs.version":"1.4.0","error":"path \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1\" is not a descendant of mount point root \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\""}
      > [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-02-02T17:30:34.906Z","log.logger":"manager","message":"Shutting down due to error","service.version":"1.9.1+75cb4d4d","service.type":"eck","ecs.version":"1.4.0","error":"path \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1\" is not a descendant of mount point root \"/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\"","error.stack_trace":"github.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/main.go:30\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
      > [elastic-operator-0 manager] Error: path "/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1" is not a descendant of mount point root "/docker/b742ffe63ace321d7175e0124fe708dda8551a8295fd9511c4cab0a7887485a1/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"
 - elastic-system:statefulset/elastic-operator failed. Error: container manager is backing off waiting to restart.
1/1 deployment(s) failed

Cause of bug:
New version of the runc CLI, runc-1.1.0-1-x86_64.

How to fix:
It's confirmed that runc-1.0.2-2-x86_64 is working.

botelastic bot added the triage label Feb 3, 2022

barkbay commented Feb 16, 2022

  • Cause of bug:
    New version of runc CLI runc-1.1.0-1-x86_64.

I'm not able to reproduce this; Minikube v1.25.1 comes with runc 1.0.2 (as well as Docker 20.10.12):

minikube ssh -p k8s-desktop-dev
Last login: Wed Feb 16 09:02:48 2022 from 192.168.49.1
docker@k8s-desktop-dev:~$ runc -v
runc version 1.0.2
commit: v1.0.2-0-g52b36a2
spec: 1.0.2-dev
go: go1.16.10
libseccomp: 2.5.1

Could you share how you set up runc 1.1.0?

Thanks

@ondrej-ivanko

Could you share how you set up runc 1.1.0?

Hi @barkbay,

I'm not sure how Skaffold and Minikube use runc internally and didn't know that Minikube comes with its own version of runc. On my system, the runc used during the deployment phase with Skaffold and Minikube is the one installed with the pacman package manager, and it is possibly prioritized over the runc that Minikube ships.

All you have to do is update your system runc the way you normally would and check that runc 1.1.0 is installed.


barkbay commented Mar 15, 2022

Sorry, I haven't found the time to work on this one yet. I'll try to set up an environment with that version of runc. If by any chance you have an easy way to reproduce it (I'm not familiar with Skaffold), please let me know.


barkbay commented Mar 15, 2022

I have set up an environment with the latest versions of Kubernetes, containerd and runc:

# k version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-03-15T14:57:24Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
# runc -v
runc version 1.1.0
commit: v1.1.0-0-g067aaf85
spec: 1.0.2-dev
go: go1.17.6
libseccomp: 2.5.3
# crictl version
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  v1.6.1
RuntimeApiVersion:  v1alpha2

I successfully deployed the operator and a small Elasticsearch+Kibana deployment:

# k get sts -n elastic-system
NAME               READY   AGE
elastic-operator   1/1     15m
# k get es,kb
NAME                                                              HEALTH   NODES   VERSION   PHASE   AGE
elasticsearch.elasticsearch.k8s.elastic.co/elasticsearch-sample   green    1       8.0.0     Ready   4m9s

NAME                                         HEALTH   NODES   VERSION   AGE
kibana.kibana.k8s.elastic.co/kibana-sample   green    1       8.0.0     4m9s

The problem does not seem to come from runc.


barkbay commented Apr 12, 2022

Closing due to inactivity. Feel free to reopen if needed.

@ondrej-ivanko

Hi @barkbay. I tried to deploy ECK again multiple times. After SSHing into Minikube v1.25.2, the runc version there is 1.0.2. The runc version in Minikube does not reflect the version of runc I have installed on my laptop (currently 1.1.1).

These are the versions of the binaries in Minikube:

docker@k8s-desktop-dev:~$ sudo crictl version
Version:  0.1.0
RuntimeName:  docker
RuntimeVersion:  20.10.12
RuntimeApiVersion:  1.41.0

docker@k8s-desktop-dev:~$ docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:42 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}

You seem to have newer versions of runc and containerd in your Minikube. I could not find a reason why my version of Minikube has different versions installed.

Anyway, my issue still persists, and keeping the runc version in my OS installation at 1.0.2 is the only way to avoid this error:

WARN[0003] Image "manager" not configured for debugging: unable to determine runtime for "" subtask=-1 task=DevLoop

  • customresourcedefinition.apiextensions.k8s.io/agents.agent.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/apmservers.apm.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/beats.beat.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/elasticmapsservers.maps.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/elasticsearches.elasticsearch.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/enterprisesearches.enterprisesearch.k8s.elastic.co created
  • customresourcedefinition.apiextensions.k8s.io/kibanas.kibana.k8s.elastic.co created
  • namespace/elastic-system created
  • serviceaccount/elastic-operator created
  • secret/elastic-webhook-server-cert created
  • configmap/elastic-operator created
  • clusterrole.rbac.authorization.k8s.io/elastic-operator created
  • clusterrole.rbac.authorization.k8s.io/elastic-operator-view created
  • clusterrole.rbac.authorization.k8s.io/elastic-operator-edit created
  • clusterrolebinding.rbac.authorization.k8s.io/elastic-operator created
  • service/elastic-webhook-server created
  • statefulset.apps/elastic-operator created
  • validatingwebhookconfiguration.admissionregistration.k8s.io/elastic-webhook.k8s.elastic.co created
    Waiting for deployments to stabilize...
  • elastic-system:statefulset/elastic-operator: creating container manager
    • elastic-system:pod/elastic-operator-0: creating container manager
  • elastic-system:statefulset/elastic-operator: BackOff: Back-off restarting failed container
    • elastic-system:pod/elastic-operator-0: BackOff: Back-off restarting failed container
  • elastic-system:statefulset/elastic-operator: container manager is backing off waiting to restart
    • elastic-system:pod/elastic-operator-0: container manager is backing off waiting to restart

      [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-04-13T13:37:05.854Z","log.logger":"manager","message":"Error setting GOMAXPROCS","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","error":"path "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb" is not a descendant of mount point root "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"","error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:341"}
      [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-04-13T13:37:05.854Z","log.logger":"manager","message":"Operator stopped with error","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","error":"path "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb" is not a descendant of mount point root "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"","error.stack_trace":"runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581"}
      [elastic-operator-0 manager] {"log.level":"error","@timestamp":"2022-04-13T13:37:05.855Z","log.logger":"manager","message":"Shutting down due to error","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","error":"path "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb" is not a descendant of mount point root "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"","error.stack_trace":"github.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:902\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/main.go:31\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
      [elastic-operator-0 manager] Error: path "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb" is not a descendant of mount point root "/docker/99ad9864c3e25495a64dbca4c57361f95e7158a80e6c1a873083355c5046c4eb/kubelet" and cannot be exposed from "/sys/fs/cgroup/rdma/kubelet"

  • elastic-system:statefulset/elastic-operator failed. Error: container manager is backing off waiting to restart.


gllb commented May 11, 2022

Hello,

Bumping this: I get the same issue with the versions you used, @barkbay:

$ k version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
$ docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:42 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
$ runc -v
runc version 1.0.2
commit: v1.0.2-0-g52b36a2
spec: 1.0.2-dev
go: go1.16.10
libseccomp: 2.5.1

It fails when trying to set GOMAXPROCS:

{"log.level":"error","@timestamp":"2022-05-11T12:38:34.348Z","log.logger":"manager","message":"Error setting GOMAXPROCS","service.version":"2.2.0+02f250eb","service.type":"eck","ecs.version":"1.4.0","error":"path \"/docker/21b4c822a3195009aaa7bbee7624d011f0c50a11c88037e0072213c9b8f87fdc\" is not a descendant of mount point root \"/docker/21b4c822a3195009aaa7bbee7624d011f0c50a11c88037e0072213c9b8f87fdc/kubelet\" and cannot be exposed from \"/sys/fs/cgroup/rdma/kubelet\"","error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:341"}


barkbay commented May 11, 2022

RDMA support was added in runc 1.1.
Also, it is stated in the first comment that:

It's confirmed that runc-1.0.2-2-x86_64 is working.

I think that runc 1.0.2 should not be affected.

That being said, something I missed is that the RDMA cgroup controller was introduced in version 4.11 of the Linux kernel. I'll double-check what version of the kernel I've been using.

In the meantime, it would be helpful if you could run the following commands on both your host and on your K8s nodes (illustrative output for the cgroups check is sketched after the list):

  • uname -a
  • cat /proc/cgroups
  • runc -v
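
For reference, here is a sketch of what the cgroups check might show on a kernel where the RDMA controller is available; the controller names are real, but the hierarchy IDs and cgroup counts below are only illustrative and will differ per system. The thing to look for is whether an rdma line is present and enabled:

$ cat /proc/cgroups
#subsys_name    hierarchy   num_cgroups   enabled
cpu             4           120           1
memory          9           130           1
...
rdma            13          1             1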


barkbay commented May 12, 2022

I managed to reproduce the problem and I think I understand the root cause of the issue. I'll explain the problem and post a workaround in a bit.

barkbay reopened this May 12, 2022
barkbay self-assigned this May 12, 2022

barkbay commented May 12, 2022

The root cause of the problem is that the RDMA cgroup controller was introduced in version 4.11 of the Linux kernel, but support for that new controller was only added in runc 1.1.
Since Minikube comes with runc 1.0.2, I guess that something is wrong in the way the cgroup hierarchy is managed, which leads to an error when uber-go/automaxprocs attempts to parse it.

Until a new version of Minikube is released, you can create your own Minikube Docker image with a more recent version of runc:

  1. Write the following Dockerfile in an empty directory:
FROM gcr.io/k8s-minikube/kicbase:v0.0.30@sha256:02c921df998f95e849058af14de7045efc3954d90320967418a0d1f182bbc0b2

RUN curl -sSL --retry 5 --output /tmp/runc "https://github.com/opencontainers/runc/releases/download/v1.1.0/runc.amd64" \
    && mv /tmp/runc /usr/bin/runc \
    && chmod 755 /usr/bin/runc
  2. Create the image: docker build -t k8s-minikube/kicbase:runc110 .
  3. Start Minikube with that new image: minikube start -p k8s-desktop-dev --driver=docker --kubernetes-version=stable --base-image='k8s-minikube/kicbase:runc110'
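
Once Minikube is up with the custom image, you can verify that the node really uses the newer runc, the same way as earlier in this thread (the profile name here is the one used above):

minikube ssh -p k8s-desktop-dev
docker@k8s-desktop-dev:~$ runc -v
runc version 1.1.0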


gllb commented May 12, 2022

I confirm that fixes the issue, thank you for your time @barkbay.
There are just a few typos that can be annoying when copy/pasting:

  • at 1., the chmod: it's missing a 'c' at /usr/bin/run
  • at 2., there is a missing '.' to target the current directory
  • at 3., it lacks a single quote at the end of the base-image option


barkbay commented May 12, 2022

I confirm that fixes the issue, thank you for your time @barkbay.

Thanks for testing, glad it helps

There are just a few typos that can be annoying when copy/pasting

Sorry, it should be fixed in the original comment

I'm closing the issue as it seems to be fixed.

barkbay closed this as completed May 12, 2022

ondrej-ivanko commented May 12, 2022

Hello,

I can confirm that this workaround is working for me. I used a different Dockerfile instruction: instead of just replacing the Minikube runc binary, I updated the whole system with apt (a complete Dockerfile based on this is sketched below):

RUN sudo apt update && \
         sudo apt -y upgrade
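
For completeness, a minimal Dockerfile following that approach might look like this; it assumes the same kicbase base image as in the workaround above, and only the RUN instruction differs:

FROM gcr.io/k8s-minikube/kicbase:v0.0.30@sha256:02c921df998f95e849058af14de7045efc3954d90320967418a0d1f182bbc0b2

# Upgrade all packages in the base image (which pulls in a newer runc)
# instead of replacing the runc binary directly
RUN sudo apt update && \
    sudo apt -y upgrade

It can then be built and passed to minikube start with --base-image, exactly like the image in the workaround above.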
