fatal runtime error: stack overflow #1140
I tested on my local machine hashing 300 000 files, but didn't have any problems (it was just slow). How many cores does your CPU have, and which OS do you use?
I tried again multiple times with smaller file counts, like 1600 or so, and had the same issue. I think it may have something to do with the gradient hashing versions; if I switch to blockhash it seems to do better in testing. I'm running on Unraid, which is based on Slackware. It's in a Docker container from this repo: https://hub.docker.com/r/jlesage/czkawka/
I just tested blockhash on 8k photos and it crashed again.
By default most Linux distros have 8 MB of stack, which should be enough for this app, but Slackware is quite an old distribution and can have different limits (it looks like it can have a 1 MB stack size - https://slackwiki.com/Resource_Limits). How many CPU cores/threads does the server have?
Unraid is a custom build; it uses an up-to-date kernel, but I don't know much about its inner workings.

```
root@NAS:~# ulimit -s
```

14 cores, 28 threads

```
[supervisor ] loading service 'openbox'...
```
In the hash_images function I cannot find any place that could use more than a few kilobytes of stack, so I don't know why the stack overflows. Limiting the number of used cores is probably the easiest workaround (I have 8 threads and never had similar problems, but I think that 15/20 should also work fine - this needs to be tested, though).
I'll try that now
Note that this version of Czkawka is compiled against musl instead of glibc. The thread stack size allocated by musl is 128 KB by default, which is small compared to the few MB allocated by glibc (https://wiki.musl-libc.org/functional-differences-from-glibc.html).
Sorry, this is beyond my comprehension. Is there a fix?
The comment was for @qarmin, so he can see if Czkawka could currently approach the thread stack size limit of musl.
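As background, a minimal sketch (not Czkawka's code) of how a Rust program can request an explicit per-thread stack size instead of relying on the libc default; the 4 MiB value here is illustrative:

```rust
use std::thread;

fn main() {
    // Threads spawned with the libc default inherit ~8 MiB on glibc but only
    // 128 KiB on musl; requesting a size explicitly sidesteps that difference.
    let handle = thread::Builder::new()
        .stack_size(4 * 1024 * 1024) // 4 MiB, chosen here for illustration
        .spawn(|| {
            // Deep recursion or large stack frames would be safe here.
            println!("running with an explicit 4 MiB stack");
        })
        .expect("failed to spawn thread");
    handle.join().unwrap();
}
```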
@docwisdom, to see if it's a stack size issue, could you try to run the following commands inside the container? This will increase the default stack size to 1 MB.
Then restart the container and see if it's crashing again. If it still crashes, you can try to increase it to 8 MB:
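(The exact commands did not survive in this copy of the thread; the following is a hypothetical reconstruction, assuming the limit is raised via `ulimit` inside the container or Docker's `--ulimit` flag.)

```sh
# Hypothetical reconstruction - not the original commands from this thread.
# Inside the container: raise the soft stack limit to 1 MB for this session
# (ulimit -s takes KB).
ulimit -s 1024

# Alternatively, set the limit for the whole container at start time
# (Docker's --ulimit stack takes soft:hard values in bytes).
docker run --ulimit stack=1048576:1048576 jlesage/czkawka
```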
Thanks for this. I tried both the 1 MB and 8 MB settings and it still crashed at the end of the hashing.
So it is possible that the thread stack size was set here, and that is why this didn't work (the main thread in the GUI is not responsible for the heavy calculation). I already tried to set the stack size in rayon with https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.stack_size to 1 byte to force a crash, but everything worked fine, so I'm not sure where the problem can be. I tried to debug the stack size with https://crates.io/crates/cargo-call-stack, but it looks like that is not possible due to several crashes, and I don't know which other tool I could use to debug this problem.
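For reference, a minimal, self-contained sketch of that kind of experiment (illustrative values, not Czkawka's actual configuration): build rayon's global pool with an explicit per-worker stack size, then run parallel work on it.

```rust
use rayon::prelude::*;

fn main() {
    // Configure the global pool once, before any parallel work runs;
    // stack_size() applies to each worker thread in the pool.
    rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .stack_size(4 * 1024 * 1024) // 4 MiB per worker, illustrative value
        .build_global()
        .expect("global rayon pool was already initialized");

    // Any parallel iterator work now executes on workers with that stack size.
    let sum: u64 = (0..1_000_000u64).into_par_iter().sum();
    println!("sum = {sum}");
}
```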
In #1102 I changed some stack size values, which may fix the problem, but for me these values just work, so I cannot test whether this will fix it:
Will this be in an upcoming release?
Yes; even now, binaries to test are available here - https://github.com/qarmin/czkawka/actions/runs/6992056327 - but since they are built with glibc, not with musl, running them cannot tell whether the problem has been fixed.
I've reproduced the stack overflow error. I'm currently testing a version that sets the stack size in rayon. I will let you know about the result.
Thank you
Finally, I don't seem to be able to reproduce it in a consistent manner.
I've done 3 test batches so far (3-6k each) and no crashes. I think this may have resolved the issue. I am going to do a larger batch this morning.
Ran 160,000 photo comparisons using 32 gradient and it completed successfully. I would consider the issue resolved. Thank you.
@qarmin, this is the patch that @docwisdom tested:

```diff
--- a/czkawka_core/src/common.rs	2023-11-24 14:45:40.462095198 -0500
+++ b/czkawka_core/src/common.rs	2023-11-24 14:47:29.678337169 -0500
@@ -76,7 +76,7 @@
 pub fn set_number_of_threads(thread_number: usize) {
     NUMBER_OF_THREADS.set(thread_number);
-    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).build_global().unwrap();
+    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).stack_size(8*1024*1204).build_global().unwrap();
 }

 pub const RAW_IMAGE_EXTENSIONS: &[&str] = &[
```

Do you want to integrate the change yourself, or do you want me to create a PR?
I already added slightly different limits - #1140 (comment)
Ok yes, in this PR, 4MB (
Tested on 3900 photos, no issues.
@jlesage @qarmin hey, is this rolled into the available Docker image too?

EDIT: I went to Docker Hub and saw that the "latest" tag is not in fact the latest. There is an image with the tag 1140-2, which was committed around the same time as this issue was closed. I tried to use this image instead, and thus far (on smaller tests with 3-5k images, which also caused the stack overflow in the "latest" image) it's been working.

EDIT2: Still causes a stack overflow with 11k pictures.
The latest version of Czkawka doesn't have the fix. The next version should include it.
This was an unofficial image to test a potential fix.
Can you try
```
[xvnc ] Tue Nov 21 14:22:35 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] Tue Nov 21 14:27:46 2023
[xvnc ] VNCSConnST: closing /tmp/vnc.sock: Clean disconnection
[xvnc ] EncodeManager: Framebuffer updates: 1523
[xvnc ] EncodeManager: Tight:
[xvnc ] EncodeManager: Solid: 34 rects, 1.23945 Mpixels
[xvnc ] EncodeManager: 544 B (1:9114.32 ratio)
[xvnc ] EncodeManager: Bitmap RLE: 18 rects, 13.809 kpixels
[xvnc ] EncodeManager: 582 B (1:95.2784 ratio)
[xvnc ] EncodeManager: Indexed RLE: 2.615 krects, 429.034 kpixels
[xvnc ] EncodeManager: 409.938 KiB (1:4.16297 ratio)
[xvnc ] EncodeManager: Tight (JPEG):
[xvnc ] EncodeManager: Full Colour: 1.622 krects, 2.05636 Mpixels
[xvnc ] EncodeManager: 3.23682 MiB (1:2.42922 ratio)
[xvnc ] EncodeManager: Total: 4.289 krects, 3.73865 Mpixels
[xvnc ] EncodeManager: 3.63822 MiB (1:3.93348 ratio)
[xvnc ] Connections: closed: /tmp/vnc.sock
[xvnc ] ComparingUpdateTracker: 24.3215 Mpixels in / 1.10949 Mpixels out
[xvnc ] ComparingUpdateTracker: (1:21.9214 ratio)
[xvnc ] Tue Nov 21 14:38:02 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] Tue Nov 21 14:38:03 2023
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[app ] thread '' has overflowed its stack
[app ] fatal runtime error: stack overflow
[supervisor ] service 'app' exited (got signal SIGABRT).
[supervisor ] service 'app' exited, shutting down...
[supervisor ] stopping service 'openbox'...
[supervisor ] service 'openbox' exited (with status 0).
[supervisor ] stopping service 'nginx'...
```
Bug Description
When doing image comparison, it fails just after hashing 180,000 images, before showing the results in the GUI.