
Eventing pods request much more CPU and memory than they need #17168

Closed
2 tasks done
pbochynski opened this issue Mar 24, 2023 · 9 comments · Fixed by #18006
Labels
area/eventing Issues or PRs related to eventing


@pbochynski
Contributor

pbochynski commented Mar 24, 2023

Description

The default installation of eventing requests a lot of resources:

  • publisher proxy: 410m CPU and 448Mi memory x 2 pods
  • controller: 410m CPU and 704Mi memory x 1 pod
  • nats: 400m CPU and 576Mi memory x 3 pods

In total that amounts to 2.43 CPU and 3.33 GB of memory.
The actual usage of the eventing pods in an idle cluster is 0.015 CPU and 0.35 GB.

Could you please adjust the request settings to allow better utilization of cluster resources?
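
For reference, these figures come from each component's resources.requests settings, i.e. what Kubernetes reserves per replica regardless of actual usage. A minimal sketch of that shape, using the publisher-proxy numbers listed above:

  resources:
    requests:
      cpu: 410m      # reserved per replica
      memory: 448Mi  # reserved per replica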

Acceptance

  • Combine the settings used in MPS-Config and the kyma/production profile and make them as low as possible
  • Validate using the load tester that, with the minimal settings, eventing still comes up and can handle a low workload (10 eps)
@pbochynski pbochynski assigned pbochynski and k15r and unassigned pbochynski Mar 24, 2023
@muralov muralov added the area/eventing Issues or PRs related to eventing label Mar 24, 2023
@kyma-bot
Contributor

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 23, 2023
@pbochynski
Contributor Author

Still valid.

@pbochynski pbochynski removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2023
@kyma-bot
Contributor

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2023
@kyma-bot
Contributor

This issue or PR has been automatically closed due to the lack of activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle stale

If you think that I work incorrectly, kindly raise an issue with the problem.

/close

@kyma-bot
Contributor

@kyma-bot: Closing this issue.

In response to this:

This issue or PR has been automatically closed due to the lack of activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle stale

If you think that I work incorrectly, kindly raise an issue with the problem.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k15r k15r removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2023
@k15r k15r reopened this Aug 3, 2023
@k15r
Contributor

k15r commented Aug 3, 2023

The production profile states:

controller:
  resources:
    limits:
      cpu: 1000m
      memory: 512Mi
    requests:
      cpu: 10m
      memory: 256Mi
  publisherProxy:
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 10m
        memory: 256Mi

nats:
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 10m
      memory: 512Mi
  logging:
    debug: false
    trace: false

This results in:

requests:
  cpu: 70m
  memory: 2.5Gi 

On SKRs, the current configuration is:

  controller.publisherProxy.replicas: "2"
  controller.publisherProxy.resources.limits.cpu: "500m"
  controller.publisherProxy.resources.limits.memory: "512Mi"
  controller.publisherProxy.resources.requests.cpu: "100m"
  controller.publisherProxy.resources.requests.memory: "64Mi"

  controller.resources.limits.cpu: "500m"
  controller.resources.limits.memory: "1Gi"
  controller.resources.requests.cpu: "100m"
  controller.resources.requests.memory: "64Mi"

  nats.cluster.replicas: "3"
  nats.nats.resources.limits.cpu: "500m"
  nats.nats.resources.limits.memory: "1Gi"
  nats.nats.resources.requests.cpu: "400m"
  nats.nats.resources.requests.memory: "512Mi"

That results in:

requests:
  cpu: 1.5
  memory: 1728Mi
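
For clarity, the breakdown behind those totals (assuming a single controller replica, which the overrides above do not set explicitly):

requests:
  cpu: 1500m      # 1 × 100m (controller) + 2 × 100m (publisher proxy) + 3 × 400m (NATS)
  memory: 1728Mi  # 1 × 64Mi (controller) + 2 × 64Mi (publisher proxy) + 3 × 512Mi (NATS)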

CPU requests can safely be decreased for all three deployments.
Memory requests should also be decreased for EPP and EC.

@mfaizanse
Member

@k15r There is a PR that is even increasing the requested resources for eventing.

@marcobebway marcobebway assigned marcobebway and unassigned k15r Aug 16, 2023
@marcobebway marcobebway linked a pull request Aug 18, 2023 that will close this issue
@marcobebway
Contributor

On SKRs, we reduced the Eventing requested resources as follows:

EC (1 replica):

  • CPU: 40m
  • MEM: 64Mi

EPP (sum of 2 replicas):

  • CPU: 80m (2 * 40m)
  • MEM: 128Mi (2 * 64Mi)

NATS (sum of 3 replicas):

  • CPU: 120m (3 * 40m)
  • MEM: 192Mi (3 * 64Mi)

Total:

  • CPU: 240m (6 * 40m)
  • MEM: 384Mi (6 * 64Mi)
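
Expressed in the flat override format quoted earlier in this thread, those per-replica requests correspond roughly to the following (key names assumed from that earlier snippet, values from the list above):

  controller.resources.requests.cpu: "40m"
  controller.resources.requests.memory: "64Mi"
  controller.publisherProxy.resources.requests.cpu: "40m"
  controller.publisherProxy.resources.requests.memory: "64Mi"
  nats.nats.resources.requests.cpu: "40m"
  nats.nats.resources.requests.memory: "64Mi"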

@marcobebway
Contributor

OSS Production:

controller:
  jetstream:
    retentionPolicy: interest
    streamReplicas: 3
    consumerDeliverPolicy: new
    maxMessages: -1
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 10m
      memory: 64Mi
  publisherProxy:
    replicas: 1
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 10m
        memory: 64Mi

nats:
  cluster:
    enabled: true
    replicas: 3
  reloader:
    enabled: false
  nats:
    jetstream:
      memStorage:
        enabled: true
        size: 1Gi
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 64Mi
    logging:
      debug: false
      trace: false
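
Assuming a single controller replica (with one publisher-proxy replica and three NATS replicas, as configured above), these production requests add up to roughly:

requests:
  cpu: 50m       # 10m + 1 × 10m + 3 × 10m
  memory: 320Mi  # 64Mi + 1 × 64Mi + 3 × 64Mi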

OSS Evaluation:

controller:
  jetstream:
    retentionPolicy: interest
    streamReplicas: 1
    consumerDeliverPolicy: new
    maxMessages: -1
  resources:
    limits:
      cpu: 20m
      memory: 256Mi
    requests:
      cpu: 1m
      memory: 32Mi
  publisherProxy:
    replicas: 1
    resources:
      limits:
        cpu: 10m
        memory: 32Mi
      requests:
        cpu: 1m
        memory: 16Mi

nats:
  cluster:
    enabled: false
    replicas: 1
  reloader:
    enabled: false
  nats:
    jetstream:
      memStorage:
        enabled: true
        size: 64Mi
    resources:
      limits:
        cpu: 20m
        memory: 64Mi
      requests:
        cpu: 1m
        memory: 16Mi
    logging:
      debug: true
      trace: true
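
By the same reckoning, the evaluation profile (single replicas throughout) adds up to roughly:

requests:
  cpu: 3m       # 1m + 1m + 1m
  memory: 64Mi  # 32Mi + 16Mi + 16Mi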
