Infinite recursion with the stderr wrapper thing #844
Comments
Somehow I'm also getting segfaults and other memory corruption in multithreaded code in 0.11.5 but not in 0.11.3. Is there a less stompy way to suppress the error?
Eek! That's not good. Perhaps we could arrange to replace stderr for only the main thread? That would cover the main use-case (of hiding the noisy default printout from [...]). Absent an MWE, I'd be happy to take a PR on this, if you can identify such a fix.
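For reference, a minimal sketch of what "filter only on the main thread" could look like. This is not Equinox's actual wrapper: the class name `MainThreadFilteredStream` and the `should_drop` predicate are hypothetical, and a real wrapper would also need to forward the other file-like attributes of the underlying stream.

```python
import io
import threading


class MainThreadFilteredStream:
    """Hypothetical stderr-like wrapper that filters writes only on the main thread."""

    def __init__(self, stream, should_drop):
        self._stream = stream            # the real underlying stream
        self._should_drop = should_drop  # predicate: str -> bool

    def write(self, data):
        on_main = threading.current_thread() is threading.main_thread()
        if on_main and self._should_drop(data):
            # Report the text as written so callers see a normal return value.
            return len(data)
        # Writes from any other thread always pass straight through.
        return self._stream.write(data)

    def flush(self):
        self._stream.flush()


if __name__ == "__main__":
    buf = io.StringIO()
    wrapped = MainThreadFilteredStream(buf, lambda s: "noisy" in s)
    wrapped.write("noisy line\n")  # dropped: main thread and predicate matches
    wrapped.write("kept\n")        # passes: predicate does not match
    print(buf.getvalue(), end="")
```

Worker threads never hit the filtering branch here, so whatever they write is handed to the underlying stream unchanged.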
Mostly notes for me (or @hawkinsp said he could maybe take a look). I have isolated it to 2188e01, which isn't surprising. The newer variant (with flush) doesn't fix it, except that it makes wandb happier. A non-minimal, but reasonably reliable, reproducer of a crash is here: https://github.com/stanford-crfm/levanter/tree/reproduce_eqx_crash Running [...] That branch separates the crash from tensorstore, ray, and wandb, which were my other suspects. I'll think on the right way to address this in Equinox. My suspicion is that a judicious use of thread locals is in order.
Hey @patrick-kidger, what's the code path that makes the _FilteredStderr thing useful? I can't find any tests that fail if I turn it off, or even a test that calls write to stderr inside filter_jit.
(In general this approach is clearly not threadsafe, though I still don't understand why it explodes.)
All good questions! I've just poked at this more carefully and noticed that really what's going on is that JAX is making a [...] Take a look at #849 -- how does this behave for you? Notably I've not tried doing anything with threads just yet, we may need to tweak this further.
OK, it's of course hard to know with multithreading issues, but it's not crashing with that branch. Thank you for the quick fix!
Sometimes 'not crashing' is the highest ideal we can aspire to in this industry. Glad that works! We've picked up a few such bugfixes now so I'll do a new release shortly.
Don't have a minimization yet, but it affects 0.11.5. Doesn't seem to be deterministic...