Skip to content

Commit

Permalink
bugfix: fixed a race condition in the new conn queuing mechanism.
Browse files Browse the repository at this point in the history
This issue appeared in our EC2 test cluster, which compiles Nginx with
`-DNGX_LUA_USE_ASSERT` and `-DNGX_LUA_ABORT_AT_PANIC`.

The lua-resty-redis test:

    === TEST 1: github issue openresty#108: ngx.locaiton.capture + redis.set_keepalive

in t/bugs.t [1] would produce core dumps in the check leak testing mode.

The backtrace for these core dumps was:

    #0  0x00007fd417bee277 in raise () from /lib64/libc.so.6
    openresty#1  0x00007fd417bef968 in abort () from /lib64/libc.so.6
    openresty#2  0x00007fd417be7096 in __assert_fail_base () from /lib64/libc.so.6
    openresty#3  0x00007fd417be7142 in __assert_fail () from /lib64/libc.so.6
    openresty#4  0x000000000050d227 in ngx_http_lua_socket_tcp_resume_conn_op (spool=c/ngx_http_lua_socket_tcp.c:3963
    openresty#5  0x000000000050e51a in ngx_http_lua_socket_tcp_finalize (r=r@entry=0x5628) at ../../src/ngx_http_lua_socket_tcp.c:4195
    openresty#6  0x000000000050e570 in ngx_http_lua_socket_tcp_cleanup (data=0x7fd419p_lua_socket_tcp.c:3755
    openresty#7  0x0000000000463aa5 in ngx_http_free_request (r=r@entry=0xbfaec0, rc=http_request.c:3508
    ...

Which was caused by the following assertion in ngx_http_lua_socket_tcp.c
with `NGX_DEBUG`:

    #if (NGX_DEBUG)
        ngx_http_lua_assert(spool->connections >= 0);

    #else

Thanks to Mozilla's rr, a recorded session showed that
`spool->connections` was `-1`.

Unfortunately, reproducing this case does not seem possible, since the
failure is due to the request cleanup (`ngx_http_free_request`). Here is
an explanation:

    -- thread 1
    local sock = ngx.socket.tcp()
    sock:connect()
    sock:setkeepalive() -- pool created, connections: 1

        -- thread 2
        local sock = ngx.socket.tcp()
        sock:connect() -- from pool, connections: 1

    -- thread 1
    -- sock from thread 1 idle timeout, closes, and calls
    -- ngx_http_lua_socket_tcp_finalize, connections: 0

        -- thread 2
        sock:setkeepalive() -- connections: -1
        -- ngx_http_lua_socket_tcp_resume_conn_op gets called, assertion fails

In order to avoid this race condition, we must determine whether the
socket pool exists or not, not from the
`ngx_http_lua_socket_tcp_upstream` struct, but from the Lua Registry.
This way, when thread 2's socket enters the keepalive state, it will
respect the previous call to `ngx_http_lua_socket_free_pool` (which
unset the pool from the registry).

[1]: https://github.com/openresty/lua-resty-redis/blob/master/t/bugs.t
  • Loading branch information
thibaultcha committed Feb 15, 2019
1 parent fd90f4e commit 71ed993
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions src/ngx_http_lua_socket_tcp.c
Original file line number Diff line number Diff line change
Expand Up @@ -4132,6 +4132,7 @@ static void
ngx_http_lua_socket_tcp_finalize(ngx_http_request_t *r,
ngx_http_lua_socket_tcp_upstream_t *u)
{
lua_State *L;
ngx_connection_t *c;
ngx_http_lua_socket_pool_t *spool;

Expand Down Expand Up @@ -4185,6 +4186,19 @@ ngx_http_lua_socket_tcp_finalize(ngx_http_request_t *r,
return;
}

L = spool->lua_vm;

lua_pushlightuserdata(L, ngx_http_lua_lightudata_mask(socket_pool_key));
lua_rawget(L, LUA_REGISTRYINDEX);
lua_pushstring(L, (char *) spool->key);
lua_rawget(L, -2);
spool = lua_touserdata(L, -1);
lua_pop(L, 1);

if (spool == NULL) {
return;
}

spool->connections--;

if (spool->connections == 0) {
Expand Down

0 comments on commit 71ed993

Please sign in to comment.