100% cpu #278

Closed
cs0604 opened this issue Sep 6, 2013 · 11 comments

cs0604 commented Sep 6, 2013

Hello, I've found a strange problem: CPU usage is too high in several workers. Here is the gdb info:
#0 0x00002b6a94401183 in ?? () from /usr/lib/libluajit-5.1.so.2
#1 0x00002b6a94401415 in lua_getstack () from /usr/lib/libluajit-5.1.so.2
#2 0x000000000053fb19 in ngx_http_lua_thread_traceback (L=0x40f22378, co=0x7ef24628, coctx=0xbcdbfd0)
    at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_util.c:2682
#3 0x000000000053d176 in ngx_http_lua_run_thread (L=0x40f22378, r=0xbcdaf60, ctx=0xbcdbfb8, nrets=1)
    at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_util.c:1330
#4 0x0000000000526dc9 in ngx_http_lua_subrequest_resume (r=0xbcdaf60)
    at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_subrequest.c:1576
#5 0x0000000000541f5e in ngx_http_lua_content_wev_handler (r=0xbcdaf60)
    at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_contentby.c:131
#6 0x0000000000499caa in ngx_http_run_posted_requests (c=0x2b6a94d32250) at src/http/ngx_http_request.c:2170
#7 0x00000000004b1fa3 in ngx_http_upstream_handler (ev=0x2b6a956931b0) at src/http/ngx_http_upstream.c:972
#8 0x000000000046f860 in ngx_event_process_posted (cycle=0xbbdfdd0, posted=0x9b3088) at src/event/ngx_event_posted.c:40
#9 0x000000000046d3a9 in ngx_process_events_and_timers (cycle=0xbbdfdd0) at src/event/ngx_event.c:276
#10 0x000000000047a59c in ngx_worker_process_cycle (cycle=0xbbdfdd0, data=0x7) at src/os/unix/ngx_process_cycle.c:807
#11 0x0000000000477185 in ngx_spawn_process (cycle=0xbbdfdd0, proc=0x47a3e9 <ngx_worker_process_cycle>, data=0x7, name=0x6d11be "worker process", respawn=-4)
    at src/os/unix/ngx_process.c:198
#12 0x0000000000479477 in ngx_start_worker_processes (cycle=0xbbdfdd0, n=8, type=-4) at src/os/unix/ngx_process_cycle.c:362
#13 0x0000000000479075 in ngx_master_process_cycle (cycle=0xbbdfdd0) at src/os/unix/ngx_process_cycle.c:249
#14 0x0000000000448a19 in main (argc=3, argv=0x7fffaa3edea8) at src/core/nginx.c:412

I can't find any clue to solve the problem, please help me...
I use nginx_lua v0.8.3. I can't reproduce the problem on demand, but after nginx has served many clients for several hours it comes back.


agentzh commented Sep 7, 2013

Thank you for the report!

Assuming you're on Linux, could you please generate a C-land flamegraph for the nginx worker process spinning at 100% CPU usage? See

https://github.com/agentzh/nginx-systemtap-toolkit#ngx-sample-bt
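Roughly, the workflow is something like the following (a sketch only; the exact options for ngx-sample-bt are documented in the toolkit's README, and the PID below is just a placeholder for your spinning worker):

# sample the C-land backtraces of the spinning worker for a few seconds
./ngx-sample-bt -p 12345 -t 5 > a.bt
# fold the samples and render the graph with Brendan Gregg's FlameGraph scripts
stackcollapse-stap.pl a.bt > a.cbt
flamegraph.pl a.cbt > a.svg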

Also, please use the gdb command "bt full" instead of "bt" to get the backtrace, so that we have more context information :)

Thanks!


agentzh commented Sep 7, 2013

BTW, is there anything interesting in your nginx error log file? If the file is empty, please check whether the filtering level configured in the "error_log" directive in your nginx.conf is set too high. Thanks!
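For example, something like the following in nginx.conf keeps error-level (and more severe) messages visible; the log path here is just a placeholder:

# log at the "error" level (or a lower one such as "notice") so Lua errors are not filtered out
error_log  logs/error.log  error;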


cs0604 commented Sep 9, 2013

Hi, @agentzh! Here is the SVG file:

(attached flamegraph screenshot: qq20130909175643)

and the bt full info is:

#0 0x00002aee1279a1a6 in ?? () from /usr/lib/libluajit-5.1.so.2
No symbol table info available.
#1 0x00002aee1279a415 in lua_getstack () from /usr/lib/libluajit-5.1.so.2
No symbol table info available.
#2 0x000000000053fb19 in ngx_http_lua_thread_traceback (L=0x41227378, co=0x7fd5b7a8, coctx=0x6042540)
at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_util.c:2682
base = 0
level = 1
count = 1
firstpart = 1
ar = {event = 0, name = 0x3b3700cf65 "I\211àd\213\004%\030", namewhat = 0x1 <Address 0x1 out of bounds>, what = 0x0, source = 0x7fd5d938 "¨·?\177\005",
currentline = 309959577, nups = 10990, linedefined = 309901048, lastlinedefined = 10990,
short_src = "@%\004\006\000\000\000\000\020Ju_?\177\000\000a)\001\067;\000\000\000@%\004\006\000\000\000\000\bIu_?\177\000\000\000\062TIJAUL????\000\000\000\000¨·?\177", i_ci = 0}
#3 0x000000000053d176 in ngx_http_lua_run_thread (L=0x41227378, r=0x5fc5d60, ctx=0x6042528, nrets=1)
at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_util.c:1330
next_coctx = 0x60306e2
parent_coctx = 0x61de2e8
orig_coctx = 0x6042540
rv = 2
success = 1
next_co = 0x7fff5f754a20
old_co = 0x0
err = 0x6e53ce "runtime error"
msg = 0x7fdd6ad8 "attempt to call a number value"
trace = 0x60304b0 "è"
rc = 100424976
old_pool = 0x0
#4 0x0000000000526dc9 in ngx_http_lua_subrequest_resume (r=0x5fc5d60)
at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_subrequest.c:1576
rc = 100870816
c = 0x2aee1320a550
ctx = 0x6042528
coctx = 0x6042540
lmcf = 0x5f69278
#5 0x0000000000541f5e in ngx_http_lua_content_wev_handler (r=0x5fc5d60)
at /search/nginx_build/BUILD/nginx-sogou-lua/stable/20130627160733_18270/compile/modules/ngx_lua.d8aa2fe/src/ngx_http_lua_contentby.c:131
ctx = 0x6042528
#6 0x0000000000499caa in ngx_http_run_posted_requests (c=0x2aee1320a550) at src/http/ngx_http_request.c:2170
r = 0x5fc5d60
ctx = 0x6049238
pr = 0x61de2e0
#7 0x00000000004b1fa3 in ngx_http_upstream_handler (ev=0x2aee13b6b350) at src/http/ngx_http_upstream.c:972
c = 0x2aee1320a550
r = 0x6031a90
ctx = 0x6049238
u = 0x6032aa0
#8 0x000000000046f860 in ngx_event_process_posted (cycle=0x5f5f120, posted=0x9b3088) at src/event/ngx_event_posted.c:40
ev = 0x2aee13b6b350
#9 0x000000000046d3a9 in ngx_process_events_and_timers (cycle=0x5f5f120) at src/event/ngx_event.c:276
flags = 3
timer = 811
delta = 0
#10 0x000000000047a59c in ngx_worker_process_cycle (cycle=0x5f5f120, data=0x5) at src/os/unix/ngx_process_cycle.c:807
worker = 5
i = 140734794911936
#11 0x0000000000477185 in ngx_spawn_process (cycle=0x5f5f120, proc=0x47a3e9 <ngx_worker_process_cycle>, data=0x5, name=0x6d11be "worker process", respawn=-3)
at src/os/unix/ngx_process.c:198
on = 1
pid = 0
s = 5
#12 0x0000000000479477 in ngx_start_worker_processes (cycle=0x5f5f120, n=8, type=-3) at src/os/unix/ngx_process_cycle.c:362
i = 5
ch = {command = 1, pid = 2651, slot = 4, fd = 17}
#13 0x0000000000478b42 in ngx_master_process_cycle (cycle=0x5f5f120) at src/os/unix/ngx_process_cycle.c:136
title = 0x5f984f4 "master process /usr/local/nginx/sbin/nginx"
p = 0x5f9851e ""
size = 43
i = 1
n = 0
sigio = 100004152
set = {__val = {0 <repeats 16 times>}}
itv = {it_interval = {tv_sec = 0, tv_usec = 9966888}, it_value = {tv_sec = 10, tv_usec = 31}}
live = 0
delay = 0
ls = 0x5
ccf = 0x5f60108
#14 0x0000000000448a19 in main (argc=1, argv=0x7fff5f7550c8) at src/core/nginx.c:412
i = 73
log = 0x9a9fc0
cycle = 0x5f5f120
init_cycle = {conf_ctx = 0x0, pool = 0x5f5e860, log = 0x9a9fc0, new_log = {log_level = 0, file = 0x0, connection = 0, handler = 0, data = 0x0, action = 0x0},
files = 0x0, free_connections = 0x0, free_connection_n = 0, reusable_connections_queue = {prev = 0x0, next = 0x0}, listening = {elts = 0x0, nelts = 0, size = 0,
nalloc = 0, pool = 0x0}, paths = {elts = 0x0, nelts = 0, size = 0, nalloc = 0, pool = 0x0}, open_files = {last = 0x0, part = {elts = 0x0, nelts = 0, next = 0x0},
size = 0, nalloc = 0, pool = 0x0}, shared_memory = {last = 0x0, part = {elts = 0x0, nelts = 0, next = 0x0}, size = 0, nalloc = 0, pool = 0x0}, connection_n = 0,
files_n = 0, connections = 0x0, read_events = 0x0, write_events = 0x0, old_cycle = 0x0, conf_file = {len = 32, data = 0x5f5e8b0 "á?\002\t?"}, conf_param = {len = 0,
data = 0x0}, conf_prefix = {len = 22, data = 0x5f5e8b0 "á?\002\t?"}, prefix = {len = 17, data = 0x6cd4e6 "/usr/local/nginx/"}, lock_file = {len = 0, data = 0x0},
hostname = {len = 0, data = 0x0}}
ccf = 0x5f60108


cs0604 commented Sep 9, 2013

Emm, I saw "nginx: lua atpanic: Lua VM crashed, reason: not enough memory" in my error_log file. BTW, when the nginx worker process is spinning at 100% CPU usage, its memory usage is 1 GB.


agentzh commented Sep 9, 2013

@cs0604 Okay, you're having memory leaks in your Lua VM and your Lua GC eventually hits the 1GB limit on Linux.

Because there've been quite a few important memory leak bugfixes since the ngx_lua 0.8.3 release, could you please try upgrading your ngx_lua to the latest 0.8.8 version? (or you can just upgrade to ngx_openresty 1.4.2.5 if you're using the bundle). If the problem persists, you should check if your own Lua code leaks memory.


cs0604 commented Sep 14, 2013

I upgraded ngx_lua to the latest 0.8.8 version, but the problem still exists. How can I find what takes up memory in the Lua VM?
BTW, maybe nginx-lua is not suitable for my application, because I find that when a single request posts 4 MB of data, my nginx-lua uses 200 MB of memory to handle it! Then, when the request is finished, the memory usage goes down.


agentzh commented Sep 14, 2013

Hello!

On Fri, Sep 13, 2013 at 10:47 PM, cs0604 wrote:

> I upgraded ngx_lua to the latest 0.8.8 version, but the problem still exists. How can I find what takes up memory in the Lua VM?

You can try using the luatraverse.lua script to inspect your Lua memory usage:

http://code.matthewwild.co.uk/luatraverse/file/cefa0be5f567
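As a simpler first check (this is plain Lua, not luatraverse itself), you can also log the Lua VM's total allocation from inside your handlers and watch whether it keeps growing across requests:

-- e.g. at the end of your content_by_lua code:
ngx.log(ngx.WARN, "Lua VM memory in use: ", collectgarbage("count"), " KB")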

> BTW, maybe nginx-lua is not suitable for my application, because I find that when a single request posts 4 MB of data, my nginx-lua uses 200 MB of memory to handle it!
> Then, when the request is finished, the memory usage goes down.

To me it looks like a programming mistake in your Lua code :)
Maybe you're just handling big POST data in the wrong way. Because you
didn't provide any Lua code, I cannot really give concrete suggestions.
Ideally, you would provide a minimal but still complete example that
reproduces the problem you're seeing.

For reference, you can take a look at the lua-resty-upload module for
an example of handling a potentially big request body stream with a
constant, small memory footprint:

https://github.com/agentzh/lua-resty-upload

Basically, you should avoid reading all of the incoming data into
memory at once. Instead, read and process the data in chunks (in other
words, in a streaming fashion).
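For instance, the module's documented synopsis boils down to roughly the following (a sketch; see the lua-resty-upload README for the full details):

local upload = require "resty.upload"

local chunk_size = 4096  -- read the request body 4KB at a time
local form, err = upload:new(chunk_size)
if not form then
    ngx.log(ngx.ERR, "failed to create the upload form: ", err)
    return ngx.exit(500)
end

form:set_timeout(1000)  -- 1 second read timeout

while true do
    local typ, res, err = form:read()
    if not typ then
        ngx.say("failed to read: ", err)
        return
    end

    -- process each "header"/"body"/"part_end" chunk right here instead of
    -- accumulating the whole request body in memory

    if typ == "eof" then
        break
    end
end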

Regards,
-agentzh


agentzh commented Sep 25, 2013

@cs0604 Any updates on this issue?


cs0604 commented Sep 26, 2013

@agentzh It seems OK when I add "collectgarbage('collect')" at the end of the code, and the problem hasn't recurred. Thank you for your suggestions, but it's hard to handle the request body in streaming mode, because the body is encoded with protobuf, and the real content inside the protobuf is encrypted with the AES algorithm...
Well, the problem now looks to be resolved. But I still worry about the 1GB memory limitation; maybe I need to increase the number of nginx workers when the concurrency goes up?
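For reference, what I added is just this at the very end of my content handler's Lua code:

collectgarbage("collect")  -- force a full GC cycle in the LuaJIT VM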


agentzh commented Sep 26, 2013

@cs0604 Forcing a full GC cycle is expensive, but it seems that you're simply allocating too much in the LuaJIT GC-managed space. If you really want to handle big request bodies, you should make your data format itself streamable.

Anyway, you can work around the 1GB limit by spawning more Nginx workers and/or using the following library to increase the limit to 4GB on x86_64:

https://github.com/Neopallium/mmap_lowmem
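For example, each nginx worker process runs its own LuaJIT VM with its own GC-managed space, so something like this in nginx.conf spreads the memory pressure across more VMs (the worker count here is illustrative):

# more workers, each with a separate LuaJIT VM and thus a separate 1GB GC space
worker_processes  16;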

-agentzh


agentzh commented Sep 26, 2013

Anyway, I'm closing this. The new GC in the upcoming LuaJIT 3.0 will probably make your existing fully buffered Lua processing behave better in the future. Still, I believe you should optimize your data format and processing algorithm.

agentzh closed this as completed on Sep 26, 2013