[docker] dual-publish setcap; default raise limits #1745

Merged 5 commits on Aug 1, 2019
2 changes: 1 addition & 1 deletion .ci
25 changes: 19 additions & 6 deletions docker/images.json
@@ -1,16 +1,29 @@
{
Collaborator:
Could we also add m3aggregator to this btw? Noticed we don't have an image (can do this in followup I suppose though).

Collaborator Author:
Will do

"images": {
"m3dbnode": {
"dockerfile": "docker/m3dbnode/Dockerfile"
"m3aggregator": {
"dockerfile": "docker/m3aggregator/Dockerfile",
"name": "m3aggregator"
},
"m3coordinator": {
"dockerfile": "docker/m3coordinator/Dockerfile"
"dockerfile": "docker/m3coordinator/Dockerfile",
"name": "m3coordinator"
},
"m3query": {
"dockerfile": "docker/m3query/Dockerfile"
"m3dbnode": {
"dockerfile": "docker/m3dbnode/Dockerfile",
"name": "m3dbnode"
},
"m3dbnode-setcap": {
"dockerfile": "docker/m3dbnode/Dockerfile-setcap",
"name": "m3dbnode",
"tag_suffix": "setcap"
Collaborator:
Will it add a "-" automatically, i.e. will the tag be 0.10.2-setcap or 0.10.2setcap?

Collaborator Author:
Yeah, in the builder script we have

TAG="${TAG}-${TAG_SUFFIX}"

Collaborator:
Nice.

},
"m3nsch": {
"dockerfile": "docker/m3nsch/Dockerfile"
"dockerfile": "docker/m3nsch/Dockerfile",
"name": "m3nsch"
},
"m3query": {
"dockerfile": "docker/m3query/Dockerfile",
"name": "m3query"
}
}
}
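
The `name` and `tag_suffix` fields discussed in the review above can be illustrated with a short Go sketch. This is not the project's builder script (which the author describes as a shell script containing `TAG="${TAG}-${TAG_SUFFIX}"`); the `imageSpec` and `imagesFile` types and the `imageRef` helper are hypothetical names introduced here, and the `quay.io/m3` repository and `0.10.2` tag are borrowed from elsewhere in this PR purely as sample inputs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// imageSpec mirrors one entry under "images" in docker/images.json.
type imageSpec struct {
	Dockerfile string `json:"dockerfile"`
	Name       string `json:"name"`
	TagSuffix  string `json:"tag_suffix"`
}

// imagesFile mirrors the top-level object of docker/images.json.
type imagesFile struct {
	Images map[string]imageSpec `json:"images"`
}

// imageRef is a hypothetical helper that joins the tag and the optional
// suffix with "-", following the TAG="${TAG}-${TAG_SUFFIX}" line quoted
// in the review thread.
func imageRef(repo string, spec imageSpec, tag string) string {
	if spec.TagSuffix != "" {
		tag = tag + "-" + spec.TagSuffix
	}
	return fmt.Sprintf("%s/%s:%s", repo, spec.Name, tag)
}

func main() {
	// A trimmed-down copy of the two m3dbnode entries from this PR.
	raw := []byte(`{"images": {
		"m3dbnode": {
			"dockerfile": "docker/m3dbnode/Dockerfile",
			"name": "m3dbnode"
		},
		"m3dbnode-setcap": {
			"dockerfile": "docker/m3dbnode/Dockerfile-setcap",
			"name": "m3dbnode",
			"tag_suffix": "setcap"
		}
	}}`)

	var f imagesFile
	if err := json.Unmarshal(raw, &f); err != nil {
		panic(err)
	}

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(f.Images))
	for k := range f.Images {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		fmt.Println(k, "->", imageRef("quay.io/m3", f.Images[k], "0.10.2"))
	}
}
```

Under these assumptions both entries publish to the same image name, `m3dbnode`, and differ only in the tag (`0.10.2` versus `0.10.2-setcap`), which appears to be what "dual-publish" in the PR title refers to.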
5 changes: 3 additions & 2 deletions docker/m3dbnode/Dockerfile
@@ -26,9 +26,10 @@ EXPOSE 2379/tcp 2380/tcp 7201/tcp 7203/tcp 9000-9004/tcp

RUN apk add --no-cache curl jq

COPY --from=builder /go/src/github.com/m3db/m3/bin/m3dbnode /bin/
COPY --from=builder /go/src/github.com/m3db/m3/src/dbnode/config/m3dbnode-local-etcd.yml /etc/m3dbnode/m3dbnode.yml
COPY --from=builder /go/src/github.com/m3db/m3/scripts/m3dbnode_bootstrapped.sh /bin/
COPY --from=builder /go/src/github.com/m3db/m3/bin/m3dbnode \
/go/src/github.com/m3db/m3/scripts/m3dbnode_bootstrapped.sh \
/bin/

ENTRYPOINT [ "/bin/m3dbnode" ]
CMD [ "-f", "/etc/m3dbnode/m3dbnode.yml" ]
34 changes: 34 additions & 0 deletions docker/m3dbnode/Dockerfile-setcap
@@ -0,0 +1,34 @@
# stage 1: build
FROM golang:1.12-alpine AS builder
LABEL maintainer="The M3DB Authors <[email protected]>"

# Install Glide
RUN apk add --update glide git make bash

# Add source code
RUN mkdir -p /go/src/github.com/m3db/m3
ADD . /go/src/github.com/m3db/m3

# Build m3dbnode binary
RUN cd /go/src/github.com/m3db/m3/ && \
git submodule update --init && \
make m3dbnode-linux-amd64

# Stage 2: lightweight "release"
FROM alpine:latest
LABEL maintainer="The M3DB Authors <[email protected]>"

EXPOSE 2379/tcp 2380/tcp 7201/tcp 7203/tcp 9000-9004/tcp

COPY --from=builder /go/src/github.com/m3db/m3/src/dbnode/config/m3dbnode-local-etcd.yml /etc/m3dbnode/m3dbnode.yml
COPY --from=builder /go/src/github.com/m3db/m3/bin/m3dbnode \
/go/src/github.com/m3db/m3/scripts/m3dbnode_bootstrapped.sh \
/bin/

# Use setcap to set the +e "effective" and +p "permitted" flags for the
# SYS_RESOURCE capability so the process can raise the hard file limit with setrlimit.
RUN apk add --no-cache curl jq libcap && \
Collaborator:
Do we need curl and jq? I'm fine to leave them out and add them on login (i.e. just apk add when you start a shell session), or if you feel we should include them (which I'm fine with too), should we be putting them in the other base image too?

Collaborator Author:
We only need it for M3DB, for the m3dbnode_bootstrapped.sh script. When we deprecate that, though, we can just remove this entirely.

setcap cap_sys_resource=+ep /bin/m3dbnode

ENTRYPOINT [ "/bin/m3dbnode" ]
CMD [ "-f", "/etc/m3dbnode/m3dbnode.yml" ]
25 changes: 24 additions & 1 deletion docs/operational_guide/kernel_configuration.md
@@ -1,10 +1,26 @@
Kernel Configuration
Docker & Kernel Configuration
====================

This document lists the Kernel tweaks M3DB needs to run well. If you are running on Kubernetes, you may use our
`sysctl-setter` [DaemonSet](https://github.com/m3db/m3/blob/master/kube/sysctl-daemonset.yaml) that will set these
values for you. Please read the comment in that manifest to understand the implications of applying it.

## Running with Docker

When running M3DB inside Docker, it is recommended to add the `SYS_RESOURCE` capability to the container (using the
`--cap-add` argument to `docker run`) so that it can raise its file limits:

```
docker run --cap-add SYS_RESOURCE quay.io/m3/m3dbnode:latest
```

If M3DB is being run as a non-root user, M3's `setcap` images are required:
```
docker run --cap-add SYS_RESOURCE -u 1000:1000 quay.io/m3/m3dbnode:latest-setcap
```

More information on Docker's capability settings can be found [here][docker-caps].

## vm.max_map_count
M3DB uses a lot of mmap-ed files for performance; as a result, you might need to bump `vm.max_map_count`. We suggest setting this value to `3000000`, so you don’t have to come back and debug issues later.

@@ -62,3 +78,10 @@ Also note that systemd has a `system.conf` file and a `user.conf` file which may
Be sure to check that those files aren't configured with values lower than the value you configure at the service level.

Before running the process, make sure the limits are set. If running manually, you can raise the limit for the current user with `ulimit -n 3000000`.

## Automatic Limit Raising

During startup, M3DB will attempt to raise its open file limit to the current value of `fs.nr_open`. This is a benign
operation; if it fails, M3DB will simply emit a warning.

[docker-caps]: https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
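
To make the "Automatic Limit Raising" behaviour more concrete, here is a minimal Linux-only Go sketch of the general technique: read `fs.nr_open` from `/proc/sys/fs/nr_open`, then call `setrlimit` for `RLIMIT_NOFILE`. It is not M3's `xos.RaiseProcessNoFileToNROpen` implementation, which this diff does not show; `parseNROpen` and `raiseNoFileTo` are hypothetical helpers written for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// parseNROpen parses the contents of /proc/sys/fs/nr_open, which is a
// single decimal number followed by a newline.
func parseNROpen(contents string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(contents), 10, 64)
}

// raiseNoFileTo is a hypothetical helper (not M3's actual code) that raises
// the soft open-file limit to target, lifting the hard limit too if needed.
// It reports whether a raise was performed.
func raiseNoFileTo(target uint64) (bool, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return false, err
	}
	if lim.Cur >= target {
		return false, nil // already high enough, nothing to do
	}
	lim.Cur = target
	if lim.Max < target {
		// Raising the hard limit needs CAP_SYS_RESOURCE, which is why the
		// docs above suggest --cap-add SYS_RESOURCE and the setcap image.
		lim.Max = target
	}
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	contents, err := os.ReadFile("/proc/sys/fs/nr_open")
	if err != nil {
		fmt.Println("warning: cannot read fs.nr_open:", err)
		return
	}
	target, err := parseNROpen(string(contents))
	if err != nil {
		fmt.Println("warning: cannot parse fs.nr_open:", err)
		return
	}
	raised, err := raiseNoFileTo(target)
	if err != nil {
		// Mirrors the "benign" behaviour described above: warn and carry on.
		fmt.Println("warning: unable to raise rlimit:", err)
		return
	}
	fmt.Println("raise performed:", raised, "target:", target)
}
```

Without `CAP_SYS_RESOURCE`, the `Setrlimit` call would typically fail with a permission error whenever the target exceeds the current hard limit, and the sketch only prints a warning in that case.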
6 changes: 3 additions & 3 deletions glide.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -101,7 +101,7 @@ pages:
- "Placement/Topology Configuration": "operational_guide/placement_configuration.md"
- "Namespace Configuration": "operational_guide/namespace_configuration.md"
- "Bootstrapping": "operational_guide/bootstrapping.md"
- "Kernel Configuration": "operational_guide/kernel_configuration.md"
- "Docker & Kernel Configuration": "operational_guide/kernel_configuration.md"
- "etcd": "operational_guide/etcd.md"
- "Integrations":
- "Prometheus": "integrations/prometheus.md"
28 changes: 17 additions & 11 deletions src/dbnode/server/server.go
@@ -30,6 +30,7 @@ import (
"path"
"runtime"
"runtime/debug"
"strings"
"time"

clusterclient "github.com/m3db/m3/src/cluster/client"
Expand Down Expand Up @@ -93,6 +94,8 @@ const (
cpuProfileDuration = 5 * time.Second
filePathPrefixLockFile = ".lock"
defaultServiceName = "m3dbnode"
skipRaiseProcessLimitsEnvVar = "SKIP_PROCESS_LIMITS_RAISE"
skipRaiseProcessLimitsEnvVarTrue = "true"
)

// RunOptions provides options for running the server
Expand Down Expand Up @@ -153,17 +156,20 @@ func Run(runOpts RunOptions) {

xconfig.WarnOnDeprecation(cfg, logger)

// Raise fd limits to nr_open system limit
result, err := xos.RaiseProcessNoFileToNROpen()
if err != nil {
logger.Warn("unable to raise rlimit to no file fds limit",
zap.Error(err))
} else {
logger.Info("raised rlimit no file fds limit",
zap.Bool("required", result.RaisePerformed),
zap.Uint64("sysNROpenValue", result.NROpenValue),
zap.Uint64("noFileMaxValue", result.NoFileMaxValue),
zap.Uint64("noFileCurrValue", result.NoFileCurrValue))
// By default attempt to raise process limits, which is a benign operation.
skipRaiseLimits := strings.TrimSpace(os.Getenv(skipRaiseProcessLimitsEnvVar))
if skipRaiseLimits != skipRaiseProcessLimitsEnvVarTrue {
// Raise fd limits to nr_open system limit
result, err := xos.RaiseProcessNoFileToNROpen()
if err != nil {
logger.Warn("unable to raise rlimit", zap.Error(err))
} else {
logger.Info("raised rlimit no file fds limit",
zap.Bool("required", result.RaisePerformed),
zap.Uint64("sysNROpenValue", result.NROpenValue),
zap.Uint64("noFileMaxValue", result.NoFileMaxValue),
zap.Uint64("noFileCurrValue", result.NoFileCurrValue))
}
}

// Parse file and directory modes
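
The opt-out added in this hunk can be isolated into a small, testable function. The sketch below is an illustration rather than code from the PR: the constant names and values are copied from the diff, while `shouldRaiseLimits` and its injected `getenv` parameter are hypothetical. It shows that only the exact, whitespace-trimmed value `true` skips the raise.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Constant names and values copied from the diff above.
const (
	skipRaiseProcessLimitsEnvVar     = "SKIP_PROCESS_LIMITS_RAISE"
	skipRaiseProcessLimitsEnvVarTrue = "true"
)

// shouldRaiseLimits is a hypothetical helper that reproduces the check in
// Run: limits are raised unless the environment variable, with surrounding
// whitespace trimmed, is exactly "true". getenv is injected for testability.
func shouldRaiseLimits(getenv func(string) string) bool {
	v := strings.TrimSpace(getenv(skipRaiseProcessLimitsEnvVar))
	return v != skipRaiseProcessLimitsEnvVarTrue
}

func main() {
	// Decision for the real process environment.
	fmt.Println("raise limits:", shouldRaiseLimits(os.Getenv))

	// Decision for a few sample values, using a fake environment lookup.
	for _, v := range []string{"", "true", " true ", "TRUE", "1"} {
		fake := func(string) string { return v }
		fmt.Printf("%q -> raise=%v\n", v, shouldRaiseLimits(fake))
	}
}
```

Because the comparison is case-sensitive, values such as `TRUE` or `1` do not disable the raise under this logic. With Docker, the variable could be set using the standard `-e SKIP_PROCESS_LIMITS_RAISE=true` flag of `docker run`.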