[NEW] CPU throttling #1688

Open
lschmidtcavalcante-sc opened this issue Feb 7, 2025 · 2 comments

Comments

@lschmidtcavalcante-sc

The problem/use-case that the feature addresses

The goal is to prevent node loss caused by heavy CPU load. For example, a sudden change in traffic can overload the CPU and put the cluster's nodes at risk.

Description of the feature

Periodically monitor CPU usage. If it is above a given threshold, respond to commands with THROTTLED: this alleviates CPU usage by not processing the command and gives the client an opportunity to act accordingly.
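To make the proposal concrete, here is a minimal sketch of the check described above. It is not Valkey code: the class name `CpuThrottleGate`, the `read_cpu` callback, the default threshold and interval, and the exact `-THROTTLED` reply text are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class CpuThrottleGate:
    """Hypothetical gate: refresh a CPU reading at most once per interval
    and reject commands while the last reading is above the threshold."""

    def __init__(
        self,
        read_cpu: Callable[[], float],   # assumed to return usage in [0.0, 1.0]
        threshold: float = 0.9,          # illustrative default, not from the issue
        interval_s: float = 0.1,         # illustrative sampling period
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.read_cpu = read_cpu
        self.threshold = threshold
        self.interval_s = interval_s
        self.clock = clock
        self.last_sample_at: Optional[float] = None
        self.throttling = False

    def _refresh(self) -> None:
        # Re-sample only when the sampling interval has elapsed.
        now = self.clock()
        if self.last_sample_at is None or now - self.last_sample_at >= self.interval_s:
            self.throttling = self.read_cpu() > self.threshold
            self.last_sample_at = now

    def handle(self, command: str, execute: Callable[[str], str]) -> str:
        self._refresh()
        if self.throttling:
            # The command is not executed; the reply format is an assumption.
            return "-THROTTLED CPU usage above threshold, retry later"
        return execute(command)
```

Injecting `read_cpu` and `clock` keeps the sketch independent of how CPU usage would actually be measured, which the issue leaves open.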

@xbasel
Member

xbasel commented Feb 9, 2025

If the concern is that high CPU usage puts cluster nodes at risk (and, more generally, prevents ADMIN clients from completing time-constrained operations), would a more aggressive cluster management approach be a better solution? For example, implementing QoS at the ae-loop level could prioritize critical operations and prevent overload more effectively. Also, if the engine returns THROTTLED, clients might abuse it by resending or reconnecting in a busy loop, potentially making the problem worse. Would blocking clients for some time be a better alternative?

@sarthakaggarwal97
Contributor

> Also, if the engine returns THROTTLED, clients might abuse it by resending or reconnecting in a busy loop, potentially making the problem worse. Would blocking clients for some time be a better alternative?

Sending THROTTLED seems analogous to HTTP 429 responses, which are a common practice. This way, clients can have their own retry mechanisms in place, IMO. That said, blocking a client until the load subsides could be a self-healing action by the Valkey engine.
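As an illustration of the client-side retry mechanism mentioned here, the sketch below retries with capped exponential backoff and jitter, which avoids the busy loop raised earlier in the thread. The function name `call_with_backoff`, its parameters, and the `-THROTTLED` reply prefix are hypothetical; no existing client library is implied.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> str:
    """Retry `send` while the reply starts with the assumed -THROTTLED prefix."""
    reply = ""
    for attempt in range(max_attempts):
        reply = send()
        if not reply.startswith("-THROTTLED"):
            return reply
        if attempt + 1 < max_attempts:
            # Full jitter: wait a random fraction of a cap that doubles on
            # each attempt, so throttled clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rand() * cap)
    # Attempts exhausted: return the last throttled reply to the caller.
    return reply
```

The server-side alternative of blocking clients would make this logic unnecessary, at the cost of holding connections open while the node is under load.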
