100% CPU in shared dict :get() #2393

Open
kkmuffme opened this issue Feb 8, 2025 · 1 comment

kkmuffme commented Feb 8, 2025

When using a shared dict where all nginx workers repeatedly :get() the same 1-2 keys under a high number of simultaneous requests, some nginx processes seem to get stuck in what looks like a deadlock on the dict's lock. They stay at 100% CPU even after the number of requests has subsided (all remaining connections are idle keepalive).

stub_status

Active connections: 102
server accepts handled requests
39308 39308 339293
Reading: 0 Writing: 287 Waiting: 93

strace shows the following (however, it takes up to a minute for another line to appear, which is a further indicator that the time is spent not in nginx itself but in userland Lua):

futex(0x7f7ad4c37080, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)

https://mailman.nginx.org/pipermail/nginx/2017-September/054687.html reports a similar issue in nginx; however, nginx doesn't natively use a mutex, and the issue could clearly be traced back to Lua code.

Checking the nginx processes with pstack

#0 0x0000000000438c86 in ngx_shmtx_lock (mtx=0x7f7ad4c37068) at src/core/ngx_shmtx.c:86
#1 0x0000000000527937 in ngx_http_lua_ffi_shdict_get (zone=0xbd5cc0, key=0x7f7ace894b60 "REDACTED-2", key_len=12, value_type=0x7f7ad4099928, str_value_buf=0x7f7ad409c340, str_value_len=0x7f7ad40b8850, num_value=0x7f7ad4093b40, user_flags=0x7f7ad4093b20, get_stale=0, is_stale=0x7f7ad409c300, err=0x7f7ad40bc318) at ../ngx_lua-0.10.26/src/ngx_http_lua_shdict.c:1593
#2 0x00007f7ad635dd19 in ?? ()
#3 0x00007f7ad4093b40 in ?? ()
#4 0x00007f7ad4093b20 in ?? ()
#5 0x0000000000000000 in ?? ()

https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L1568 (-> nginx https://github.com/nginx/nginx/blob/master/src/core/ngx_shmtx.c#L70C1-L70C15) shows that "get" acquires a lock.
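
To make the question concrete, here is a minimal sketch of the pattern the backtrace suggests: a read path that takes a lock, where the lock is acquired by compare-and-swap with a bounded spin and a yield between attempts. This is not the nginx or lua-nginx-module source. All `demo_*` names, the owner id, and the single `int` slot standing in for a dict entry are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Hypothetical lock word: 0 means free, any other value is the owner's id. */
typedef struct {
    atomic_ulong  lock;
    unsigned long spin;   /* upper bound for the backoff rounds */
} demo_shmtx_t;

/* Try once to take the lock; returns true on success. */
static bool demo_shmtx_trylock(demo_shmtx_t *mtx, unsigned long owner)
{
    unsigned long expected = 0;

    return atomic_load(&mtx->lock) == 0
           && atomic_compare_exchange_strong(&mtx->lock, &expected, owner);
}

/* Spin with exponential backoff, then yield the CPU and start over. */
static void demo_shmtx_lock(demo_shmtx_t *mtx, unsigned long owner)
{
    for (;;) {
        if (demo_shmtx_trylock(mtx, owner)) {
            return;
        }

        for (unsigned long n = 1; n < mtx->spin; n <<= 1) {
            for (unsigned long i = 0; i < n; i++) {
                /* compiler barrier so the busy-wait is not optimized away */
                atomic_signal_fence(memory_order_seq_cst);
            }
            if (demo_shmtx_trylock(mtx, owner)) {
                return;
            }
        }

        sched_yield();
    }
}

/* Release the lock; fails (returns false) if the caller is not the owner. */
static bool demo_shmtx_unlock(demo_shmtx_t *mtx, unsigned long owner)
{
    unsigned long expected = owner;

    return atomic_compare_exchange_strong(&mtx->lock, &expected, 0);
}

/* A "get" that locks: even a pure read goes through lock and unlock. */
static int demo_dict_get(demo_shmtx_t *mtx, const int *slot,
                         unsigned long owner)
{
    demo_shmtx_lock(mtx, owner);
    int value = *slot;
    demo_shmtx_unlock(mtx, owner);
    return value;
}
```

In this sketch, if the lock word is never reset to 0 (for example because its owner never calls `demo_shmtx_unlock`), every caller of `demo_dict_get` loops in `demo_shmtx_lock` indefinitely. That would look like a busy process even when no new requests arrive. Whether this is what happens in the reported case is not established here.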

Is the lock for "get" really necessary? Is there a way to disable it?
Any ideas what could cause this? Is it possibly not related to the :get() at all?

kkmuffme commented Feb 8, 2025

Possibly similar to #1207?
