fatal runtime error: stack overflow #1140

Closed
docwisdom opened this issue Nov 21, 2023 · 27 comments · Fixed by #1102
Labels
bug Something isn't working

Comments

@docwisdom

  • Czkawka version 12.11.2 docker GUI
  • OS version unraid 6.12.4
  • Terminal output[optional]:

[xvnc ] Tue Nov 21 14:22:35 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] Tue Nov 21 14:27:46 2023
[xvnc ] VNCSConnST: closing /tmp/vnc.sock: Clean disconnection
[xvnc ] EncodeManager: Framebuffer updates: 1523
[xvnc ] EncodeManager: Tight:
[xvnc ] EncodeManager: Solid: 34 rects, 1.23945 Mpixels
[xvnc ] EncodeManager: 544 B (1:9114.32 ratio)
[xvnc ] EncodeManager: Bitmap RLE: 18 rects, 13.809 kpixels
[xvnc ] EncodeManager: 582 B (1:95.2784 ratio)
[xvnc ] EncodeManager: Indexed RLE: 2.615 krects, 429.034 kpixels
[xvnc ] EncodeManager: 409.938 KiB (1:4.16297 ratio)
[xvnc ] EncodeManager: Tight (JPEG):
[xvnc ] EncodeManager: Full Colour: 1.622 krects, 2.05636 Mpixels
[xvnc ] EncodeManager: 3.23682 MiB (1:2.42922 ratio)
[xvnc ] EncodeManager: Total: 4.289 krects, 3.73865 Mpixels
[xvnc ] EncodeManager: 3.63822 MiB (1:3.93348 ratio)
[xvnc ] Connections: closed: /tmp/vnc.sock
[xvnc ] ComparingUpdateTracker: 24.3215 Mpixels in / 1.10949 Mpixels out
[xvnc ] ComparingUpdateTracker: (1:21.9214 ratio)
[xvnc ] Tue Nov 21 14:38:02 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] Tue Nov 21 14:38:03 2023
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow

[supervisor ] service 'app' exited (got signal SIGABRT).
[supervisor ] service 'app' exited, shutting down...
[supervisor ] stopping service 'openbox'...
[supervisor ] service 'openbox' exited (with status 0).
[supervisor ] stopping service 'nginx'...

Bug Description
During image comparison, the app crashes just after hashing 180,000 images, before any results are shown in the GUI.

docwisdom added the bug label Nov 21, 2023
@qarmin
Owner

qarmin commented Nov 22, 2023

I tested hashing 300,000 files on my local machine and did not have any problems (it was just slow).

How many cores does your CPU have, and which OS do you use?

@docwisdom
Author

I tried again multiple times with smaller file counts, around 1,600, and had the same issue. I think it may have something to do with the gradient hash variants. If I switch to blockhash, it seems to do better in testing.

I'm running on Unraid, which is based on Slackware. It's in a Docker container from this repo: https://hub.docker.com/r/jlesage/czkawka/

@docwisdom
Author

I just tested blockhash on 8k photos and it crashed again.

[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[app ] 19:07:46.673 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 19:13:32.743 [INFO] czkawka_core::similar_images: find_similar_images: Done in 346.07s
[app ] 19:15:14.553 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 19:17:06.562 [INFO] czkawka_core::similar_images: find_similar_images: Done in 112.01s
[app ] 19:18:10.395 [INFO] czkawka_core::similar_images: find_similar_images
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow

@qarmin
Owner

qarmin commented Nov 23, 2023

Can you add RUST_LOG=debug to the environment variables of this app?
The log shows that the stack overflow happens in the similar images tool, but not the exact function that causes the problem.

By default, most Linux distros have 8 MB of stack, which should be enough for this app, but Slackware is quite an old distribution and may have different limits (it looks like it can have a 1 MB stack size: https://slackwiki.com/Resource_Limits).
Can you check what ulimit -s returns? On my OS it returns 8192 (KB).

How many CPU cores/threads does the server have?
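
As a complement to running ulimit -s in a shell, the stack limit can also be read from inside a process. The sketch below assumes a Linux system, where /proc/self/limits exists and reports the "Max stack size" row in bytes (so 8192 KB appears as 8388608). The function name parse_stack_limit is made up for this illustration and is not part of Czkawka. The limit seen inside a container may differ from the one in the host shell.

```rust
use std::fs;

/// Extracts the soft and hard "Max stack size" limits from the text of
/// /proc/<pid>/limits. Values are returned as strings because the kernel
/// prints either a byte count or the word "unlimited".
/// (Hypothetical helper, written for this example only.)
fn parse_stack_limit(limits: &str) -> Option<(String, String)> {
    let label = "Max stack size";
    let line = limits.lines().find(|l| l.starts_with(label))?;
    // Columns after the label: soft limit, hard limit, units.
    let mut fields = line[label.len()..].split_whitespace();
    let soft = fields.next()?.to_string();
    let hard = fields.next()?.to_string();
    Some((soft, hard))
}

fn main() {
    // Linux-only: this file does not exist on other platforms.
    match fs::read_to_string("/proc/self/limits") {
        Ok(text) => match parse_stack_limit(&text) {
            Some((soft, hard)) => println!("stack limit: soft={soft} hard={hard}"),
            None => println!("no stack limit line found"),
        },
        Err(err) => println!("could not read /proc/self/limits: {err}"),
    }
}
```

This limit applies to the main thread; threads created later by the program can get a different stack size.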

@docwisdom
Author

docwisdom commented Nov 23, 2023

Unraid is a custom build; it uses an up-to-date kernel, but I don't know much about its inner workings.

root@NAS:~# ulimit -s
unlimited

14 cores, 28 threads

[supervisor ] loading service 'openbox'...
[supervisor ] loading service 'logmonitor'...
[supervisor ] service 'logmonitor' is disabled.
[supervisor ] loading service 'logrotate'...
[supervisor ] all services loaded.
[supervisor ] starting services...
[supervisor ] starting service 'xvnc'...
[xvnc ] Xvnc TigerVNC 1.13.1 - built Nov 10 2023 13:43:39
[xvnc ] Copyright (C) 1999-2022 TigerVNC Team and many others (see README.rst)
[xvnc ] See https://www.tigervnc.org for information on TigerVNC.
[xvnc ] Underlying X server release 12014000
[xvnc ] Thu Nov 23 12:57:23 2023
[xvnc ] vncext: VNC extension running!
[xvnc ] vncext: Listening for VNC connections on /tmp/vnc.sock (mode 0660)
[xvnc ] vncext: Listening for VNC connections on all interface(s), port 5900
[xvnc ] vncext: created VNC server for screen 0
[supervisor ] starting service 'nginx'...
[nginx ] Listening for HTTP connections on port 5800.
[supervisor ] starting service 'openbox'...
[supervisor ] starting service 'app'...
[supervisor ] all services started.
[app ] 20:57:25.680 [INFO] czkawka_core::common: Czkawka version: 6.1.0, was compiled with release mode
[app ] 20:57:26.264 [INFO] czkawka_gui: Set thread number to 28
[xvnc ] Thu Nov 23 12:58:01 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[app ] 20:58:11.345 [DEBUG] czkawka_gui::connect_things::connect_button_search: clean_tree_view
[app ] 20:58:11.345 [DEBUG] czkawka_gui::connect_things::connect_button_search: clean_tree_view: Done in 1.04µs
[app ] 20:58:11.347 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 20:58:11.348 [DEBUG] czkawka_core::similar_images: check_for_similar_images
[app ] 20:58:11.584 [DEBUG] czkawka_core::common: send_info_and_wait_for_ending_all_threads
[app ] 20:58:11.589 [DEBUG] czkawka_core::common: send_info_and_wait_for_ending_all_threads: Done in 4.86ms
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: check_for_similar_images: Done in 241.04ms
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: hash_images
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: hash_images_load_cache
[app ] 20:58:11.589 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized_by_path
[app ] 20:58:11.589 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized
[app ] 20:58:11.616 [DEBUG] czkawka_core::common_cache: Starting removing outdated cache entries (removing non existent files from cache - true)
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Completed removing outdated cache entries, removed 0 out of all 3845 entries
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Loaded cache from file cache_similar_images_32_Blockhash_Lanczos3_61.bin (or json alternative) - 3845 results
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized: Done in 120.77ms
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Converting cache Vec into BTreeMap<String, T>
[app ] 20:58:11.712 [DEBUG] czkawka_core::common_cache: Converted cache Vec into BTreeMap<String, T>
[app ] 20:58:11.712 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized_by_path: Done in 123.11ms
[app ] 20:58:11.712 [DEBUG] czkawka_core::similar_images: hash_images-load_cache - starting calculating diff
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images_load_cache - completed diff between loaded and prechecked files, 5383(15.41 GiB) - non cached, 3845(8.66 GiB) - already cached
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images_load_cache: Done in 149.25ms
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images - start hashing images
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow
[supervisor ] service 'app' exited (got signal SIGABRT).
[supervisor ] service 'app' exited, shutting down...
[supervisor ] stopping service 'openbox'...
[supervisor ] service 'openbox' exited (with status 0).
[supervisor ] stopping service 'nginx'...
[xvnc ] Thu Nov 23 13:09:21 2023
[xvnc ] VNCSConnST: closing /tmp/vnc.sock: Clean disconnection
[xvnc ] EncodeManager: Framebuffer updates: 4062
[xvnc ] EncodeManager: CopyRect:
[xvnc ] EncodeManager: Copies: 1 rects, 182.7 kpixels
[xvnc ] EncodeManager: 16 B (1:45675.8 ratio)
[xvnc ] EncodeManager: Tight:
[xvnc ] EncodeManager: Solid: 174 rects, 7.3631 Mpixels
[xvnc ] EncodeManager: 2.71875 KiB (1:10579.9 ratio)
[xvnc ] EncodeManager: Bitmap RLE: 99 rects, 74.165 kpixels
[xvnc ] EncodeManager: 2.84766 KiB (1:102.143 ratio)
[xvnc ] EncodeManager: Indexed RLE: 5.702 krects, 1.23611 Mpixels
[xvnc ] EncodeManager: 866.635 KiB (1:5.64873 ratio)
[xvnc ] EncodeManager: Tight (JPEG):
[xvnc ] EncodeManager: Full Colour: 4.698 krects, 5.61785 Mpixels
[xvnc ] EncodeManager: 8.92706 MiB (1:2.40663 ratio)
[xvnc ] EncodeManager: Total: 10.674 krects, 14.4739 Mpixels
[xvnc ] EncodeManager: 9.77884 MiB (1:5.65873 ratio)
[xvnc ] Connections: closed: /tmp/vnc.sock
[xvnc ] ComparingUpdateTracker: 135.499 Mpixels in / 7.93127 Mpixels out
[xvnc ] ComparingUpdateTracker: (1:17.0842 ratio)
[supervisor ] service 'nginx' exited (with status 0).
[supervisor ] stopping service 'xvnc'...
[xvnc ] Thu Nov 23 13:09:22 2023
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[supervisor ] service 'xvnc' exited (with status 0).
[supervisor ] sending SIGTERM to all processes...
[finish ] executing container finish scripts...
[finish ] all container finish scripts executed.

@qarmin
Owner

qarmin commented Nov 23, 2023

I cannot find any place in the hash_images function that could use more than a few kilobytes of stack, so I don't know why the stack overflows.

Limiting the number of cores used is probably the easiest workaround (I have 8 threads and have never had similar problems; I think 15 to 20 should also work fine, but this needs to be tested).

@docwisdom
Author

I'll try that now

@jlesage
Contributor

jlesage commented Nov 23, 2023

Note that this version of Czkawka is compiled against musl instead of glibc. The default thread stack size allocated by musl is 128K, which is small compared to the few MB used by glibc (https://wiki.musl-libc.org/functional-differences-from-glibc.html).

@docwisdom
Author

Sorry this is beyond my comprehension. Is there a fix?

@jlesage
Contributor

jlesage commented Nov 24, 2023

The comment was for @qarmin, so he can see if currently Czkawka could approach the thread stack size limit of musl.

@jlesage
Contributor

jlesage commented Nov 24, 2023

@docwisdom, to see if it's a stack size issue, could you try to run the following commands inside the container? This will increase the default stack size to 1MB.

export GOPATH=/go
add-pkg go git musl-dev
go install github.com/yaegashi/muslstack@latest
cp /usr/bin/czkawka_gui /usr/bin/czkawka_gui2
/go/bin/muslstack -s 0x100000 /usr/bin/czkawka_gui2
mv /usr/bin/czkawka_gui2 /usr/bin/czkawka_gui

Then restart the container and see if it's crashing again. If it does, you can try to increase to 8MB:

cp /usr/bin/czkawka_gui /usr/bin/czkawka_gui2
/go/bin/muslstack -s 0x800000 /usr/bin/czkawka_gui2
mv /usr/bin/czkawka_gui2 /usr/bin/czkawka_gui
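
For reference, the two hex values passed with -s above correspond to 1 MiB and 8 MiB. A small sketch that checks the arithmetic; hex_to_mib is a hypothetical helper written for this example and has nothing to do with muslstack itself:

```rust
/// Converts a hexadecimal byte count such as "0x100000" to MiB.
/// Returns None if the string is not valid hexadecimal.
/// (Hypothetical helper, for illustration only.)
fn hex_to_mib(hex: &str) -> Option<f64> {
    // Accept the value with or without the "0x" prefix.
    let digits = hex.strip_prefix("0x").unwrap_or(hex);
    let bytes = u64::from_str_radix(digits, 16).ok()?;
    Some(bytes as f64 / (1024.0 * 1024.0))
}

fn main() {
    for arg in ["0x100000", "0x800000"] {
        println!("{arg} = {:?} MiB", hex_to_mib(arg));
    }
}
```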

@docwisdom
Author

Thanks for this. I tried both the 1MB and 8MB settings, and it still crashed at the end of the hashing.

@qarmin
Owner

qarmin commented Nov 24, 2023

From - https://stackoverflow.com/questions/44003589/how-to-increase-the-stack-size-available-to-a-rust-library#comment75039223_44003965:

I'll note that [std::thread::Builder](https://doc.rust-lang.org/1.8.0/std/thread/struct.Builder.html) let you specify the stack size of the created thread from within the program. Only the stack size of the main thread is set by the OS.

So it is possible that the thread stack size is set this way from within the program, which would explain why this did not work (the main thread in the GUI is not responsible for the heavy calculation).

I already tried setting the stack size in rayon to 1 byte with https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.stack_size to provoke a crash, but everything worked fine, so I am not sure where the problem is.

I tried to debug stack usage with https://crates.io/crates/cargo-call-stack, but it does not seem possible because of several crashes, and I don't know which other tool I could use to debug this problem.
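
To illustrate the std::thread::Builder point quoted above, here is a minimal standard-library sketch of spawning a thread with an explicit stack size. The names burn_stack and run_with_stack, the 1 KiB per-frame buffer and the recursion depth are assumptions chosen for the example; none of this is Czkawka code.

```rust
use std::thread;

/// Recursion that keeps a 1 KiB buffer alive in every frame, so stack use
/// grows roughly linearly with `depth`. Returns the number of levels descended.
fn burn_stack(depth: usize) -> usize {
    let buf = [depth as u8; 1024];
    // black_box stops the optimiser from removing the buffer.
    std::hint::black_box(&buf);
    if depth == 0 {
        return buf[0] as usize; // buf is all zeros here, so this is 0
    }
    1 + burn_stack(depth - 1)
}

/// Runs `burn_stack` on a new thread whose stack size is chosen by the
/// program instead of being inherited from OS or libc defaults.
fn run_with_stack(stack_bytes: usize, depth: usize) -> usize {
    thread::Builder::new()
        .name("deep-recursion".into())
        .stack_size(stack_bytes)
        .spawn(move || burn_stack(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    let levels = run_with_stack(8 * 1024 * 1024, 1000);
    println!("recursed {levels} levels on an 8 MiB stack");
}
```

Because the size is requested per thread, changing the process-wide limit or the main thread's stack would not affect a thread created this way.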

@qarmin
Owner

qarmin commented Nov 24, 2023

In #1102 I changed some stack size values, which may fix the problem. For me these values just work, so I cannot test whether this actually fixes it:

  • main thread stack size: OS default, since the GUI does not need to calculate or store much
  • main scan thread stack size: 8MB
  • worker thread stack size: 4MB
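
The layout in the list above (a dedicated scan thread with a larger stack that hands work to worker threads with their own stack size) could look roughly like the following standard-library sketch. The constant names, the process stand-in and the chunking scheme are hypothetical; the actual change in #1102 may be structured differently.

```rust
use std::thread;

// Hypothetical names; the values mirror the list above.
const SCAN_THREAD_STACK: usize = 8 * 1024 * 1024; // 8 MiB
const WORKER_THREAD_STACK: usize = 4 * 1024 * 1024; // 4 MiB

/// Stand-in for per-file work such as hashing one image.
fn process(item: u64) -> u64 {
    item * 2
}

/// Runs the "scan" on a dedicated thread, which splits the items into
/// chunks and processes each chunk on its own worker thread.
fn scan(items: Vec<u64>, workers: usize) -> u64 {
    let scan_thread = thread::Builder::new()
        .name("scan".into())
        .stack_size(SCAN_THREAD_STACK)
        .spawn(move || {
            let workers = workers.max(1);
            // Round up so that at most `workers` chunks are created;
            // never use a chunk length of zero.
            let chunk_len = ((items.len() + workers - 1) / workers).max(1);
            let handles: Vec<_> = items
                .chunks(chunk_len)
                .map(|chunk| {
                    let chunk = chunk.to_vec();
                    thread::Builder::new()
                        .stack_size(WORKER_THREAD_STACK)
                        .spawn(move || chunk.into_iter().map(process).sum::<u64>())
                        .expect("failed to spawn worker")
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("worker panicked"))
                .sum::<u64>()
        })
        .expect("failed to spawn scan thread");
    scan_thread.join().expect("scan thread panicked")
}

fn main() {
    let total = scan((1..=100).collect(), 4);
    println!("total = {total}");
}
```

The main thread keeps whatever stack the OS gives it, matching the first bullet.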

@docwisdom
Author

Will this be in an upcoming release?

@qarmin
Owner

qarmin commented Nov 26, 2023

Yes. Test binaries are already available here: https://github.com/qarmin/czkawka/actions/runs/6992056327, but since they are built with glibc rather than musl, running them cannot tell us whether the problem has been fixed.

@jlesage
Contributor

jlesage commented Nov 26, 2023

I've reproduced the stack overflow error. I'm currently testing a version that sets the stack size in rayon. I will let you know about the result.

@docwisdom
Author

Thank you

@jlesage
Contributor

jlesage commented Nov 27, 2023

In the end, I don't seem to be able to reproduce it consistently.
@docwisdom, could you try the jlesage/czkawka:issue-1140 Docker image and see if you can reproduce it?

@docwisdom
Author

docwisdom commented Nov 27, 2023

I've done 3 test batches so far (3-6k each) with no crashes. I think this may have resolved the issue. I am going to do a larger batch this morning.

@docwisdom
Author

Ran 160,000 photo comparisons using 32 gradient and it completed successfully. I would consider the issue resolved. Thank you.

@jlesage
Contributor

jlesage commented Nov 28, 2023

@qarmin, this is the patch that @docwisdom tested:

--- a/czkawka_core/src/common.rs        2023-11-24 14:45:40.462095198 -0500
+++ b/czkawka_core/src/common.rs        2023-11-24 14:47:29.678337169 -0500
@@ -76,7 +76,7 @@
 pub fn set_number_of_threads(thread_number: usize) {
     NUMBER_OF_THREADS.set(thread_number);

-    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).build_global().unwrap();
+    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).stack_size(8*1024*1024).build_global().unwrap();
 }

 pub const RAW_IMAGE_EXTENSIONS: &[&str] = &[

Do you want to integrate the change yourself, or would you like me to create a PR?

@qarmin
Owner

qarmin commented Nov 28, 2023

I already added slightly different limits: #1140 (comment)

@jlesage
Contributor

jlesage commented Nov 28, 2023

Ok yes, in this PR, 4MB (DEFAULT_WORKER_THREAD_SIZE) is used instead of 8MB. @docwisdom, I pushed jlesage/czkawka:issue-1140-2, if you want to confirm that it's still working with a 4MB stack.

@docwisdom
Author

tested on 3900 photos, no issues

@nicoKoehler

nicoKoehler commented Feb 11, 2024

@jlesage @qarmin hey, is this rolled up into the available docker image too?
I am experiencing the same issue when I run anything other than the default algorithm for similar images. Tested on 3 machines of varying CPU strength, all with the same result.
(pulled the jlesage/czkawka image via docker compose)

EDIT: I went to dockerhub and saw that the "latest" tag is not in fact the latest. There is an image with the tag 1140-2, which was committed around the same time as this issue was closed. I tried this image instead, and so far (in smaller tests with 3-5k images, which also caused the stack overflow in the "latest" image) it has been working.

EDIT2: Still causes a stack overflow with 11k pictures.

@jlesage
Contributor

jlesage commented Feb 18, 2024

is this rolled up into the available docker image too?

The latest version of Czkawka doesn't have the fix. The next version should include it.

I went to dockerhub and saw that the "latest" tag is not in fact the latest. There is an image with the tag 1140-2, which was committed around the same time as this issue was closed.

This was a non-official image to test a potential fix.

EDIT2: Still causes a stack overflow with 11k pictures.

Can you try jlesage/czkawka:issue-1140 instead ?
