Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
bugfix: fixed a race condition in the new conn queuing mechanism.
This issue appeared in our EC2 test cluster, which compiles Nginx with `-DNGX_LUA_USE_ASSERT` and `-DNGX_LUA_ABORT_AT_PANIC`. The lua-resty-redis test: === TEST 1: github issue openresty#108: ngx.locaiton.capture + redis.set_keepalive in t/bugs.t [1] would produce core dumps in the check leak testing mode. The backtrace for these core dumps was: #0 0x00007fd417bee277 in raise () from /lib64/libc.so.6 openresty#1 0x00007fd417bef968 in abort () from /lib64/libc.so.6 openresty#2 0x00007fd417be7096 in __assert_fail_base () from /lib64/libc.so.6 openresty#3 0x00007fd417be7142 in __assert_fail () from /lib64/libc.so.6 openresty#4 0x000000000050d227 in ngx_http_lua_socket_tcp_resume_conn_op (spool=c/ngx_http_lua_socket_tcp.c:3963 openresty#5 0x000000000050e51a in ngx_http_lua_socket_tcp_finalize (r=r@entry=0x5628) at ../../src/ngx_http_lua_socket_tcp.c:4195 openresty#6 0x000000000050e570 in ngx_http_lua_socket_tcp_cleanup (data=0x7fd419p_lua_socket_tcp.c:3755 openresty#7 0x0000000000463aa5 in ngx_http_free_request (r=r@entry=0xbfaec0, rc=http_request.c:3508 ... Which was caused by the following assertion in ngx_http_lua_socket_tcp.c with `NGX_DEBUG`: #if (NGX_DEBUG) ngx_http_lua_assert(spool->connections >= 0); #else Thanks to Mozilla's rr, a recorded session showed that `spool->connections` was `-1`. Unfortunately, reproducing this case does not seem possible, since the failure is due to the request cleanup (`ngx_http_free_request`). Here is an explanation: -- thread 1 local sock = ngx.socket.tcp() sock:connect() sock:setkeepalive() -- pool created, connections: 1 -- thread 2 local sock = ngx.socket.tcp() sock:connect() -- from pool, connections: 1 -- thread 1 -- sock from thread 1 idle timeout, closes, and calls -- ngx_http_lua_socket_tcp_finalize, connections: 0 -- thread 2 sock:setkeepalive() -- connections: -1 -- ngx_http_lua_socket_tcp_resume_conn_op gets called, assertion fails In order to avoid this race condition, we must determine whether the socket pool exists or not, not from the `ngx_http_lua_socket_tcp_upstream` struct, but from the Lua Registry. This way, when thread 2's socket enters the keepalive state, it will respect the previous call to `ngx_http_lua_socket_free_pool` (which unset the pool from the registry). [1]: https://github.com/openresty/lua-resty-redis/blob/master/t/bugs.t
- Loading branch information