-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
oidcc_provider_configuration_worker:get_provider_configuration/1 randomly times out #359
Comments
@asabil Hm, I possibly haven’t set any timeouts which will hang it forever. I‘m open to a PR or I’ll investigate this next week. |
Added some printout in
So it looks like we somehow end up being blocked by auth0 because the number of For reference, this is the print out code: io:format("[~B], HTTP: ~p ~p ~p ~p~n", [
os:system_time(millisecond),
Method,
Request,
HttpOpts,
HttpProfile
]), |
Hm, are they setting a strange cache header that leads to continuous refreshing? |
Looks like this helps at least with the accumlation of timers: diff --git a/src/oidcc_provider_configuration_worker.erl b/src/oidcc_provider_configuration_worker.erl
index d4539ec..a9cefa1 100644
--- a/src/oidcc_provider_configuration_worker.erl
+++ b/src/oidcc_provider_configuration_worker.erl
@@ -224,7 +224,8 @@ handle_continue(
%% @private
handle_info(backoff_retry, State) ->
{noreply, State, {continue, load_configuration}};
-handle_info(configuration_expired, State) ->
+handle_info(configuration_expired, #state{jwks_refresh_timer = JwksRefreshTimer} = State) ->
+ maybe_cancel_timer(JwksRefreshTimer),
{noreply, State#state{configuration_refresh_timer = undefined, jwks_refresh_timer = undefined},
{continue, load_configuration}};
handle_info(jwks_expired, State) -> That being said, from my own experience, I think that it is better to use For example, with the current code I think it is still possible to have a race condition where the timer being canceled has already been fired and the message corresponding to the timeout is in the mailbox but not yet processed. |
Yes, 15 seconds:
|
@asabil Sorry for the long wait. I can't reproduce the issue. There should only be one single message triggered after the jwk reload is successful. You are seeing hundreds of messages in the queue. Can you try to log each time |
Sorry for the delay, finally got to add some logs:
I added empty lines to group together the printouts. As you can see we somehow end up with accumulating timers that lead to excessive calls. The qlen is 0, because we will only start to see that once auth0 (in my case) starts to apply rate limiting because of the excessive calls. I still need to test the patch I posted above extensively, but it should solve the problem. |
With the patch above applied, the logs look like this:
|
@asabil That looks promising. I’ll go ahead and merge the PR then. Thanks for debugging. |
Which PR? |
Just looked at the PR, that will not work. The fix is the one I posted in the comments above.
|
@asabil Oh, I misinterpreted your message. I'll revert the PR. I believe you've pinpointed the exact issue though: The timer was cleared in the handle_info and so the timer cancellation can never work. I'll revert and open a new PR to test. |
@asabil Does the new PR work for you? |
oidcc version
3.2.0
Erlang version
26.2.5
Elixir version
Summary
calls to
oidcc_provider_configuration_worker:get_provider_configuration/1
randomly times out. It seems that this is somehow related to httpc timing out?The following crash report is reported by cowboy:
Current behavior
Inspecting the worker state shows that it is hanging in
httpc:handle_answer/3
:Also, it seems like the worker process has accumulated a set of timer events:
Checking the stacktrace:
How to reproduce
Haven't been able to pinpoint the exact scenario leading to the hang/timeout yet.
Expected behavior
It shouldn't hang :)
The text was updated successfully, but these errors were encountered: