fatal runtime error: stack overflow #1140

docwisdom · 2023-11-21T23:13:24Z

Czkawka version 12.11.2 docker GUI
OS version unraid 6.12.4
Terminal output[optional]:

[xvnc ] Tue Nov 21 14:22:35 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] Tue Nov 21 14:27:46 2023
[xvnc ] VNCSConnST: closing /tmp/vnc.sock: Clean disconnection
[xvnc ] EncodeManager: Framebuffer updates: 1523
[xvnc ] EncodeManager: Tight:
[xvnc ] EncodeManager: Solid: 34 rects, 1.23945 Mpixels
[xvnc ] EncodeManager: 544 B (1:9114.32 ratio)
[xvnc ] EncodeManager: Bitmap RLE: 18 rects, 13.809 kpixels
[xvnc ] EncodeManager: 582 B (1:95.2784 ratio)
[xvnc ] EncodeManager: Indexed RLE: 2.615 krects, 429.034 kpixels
[xvnc ] EncodeManager: 409.938 KiB (1:4.16297 ratio)
[xvnc ] EncodeManager: Tight (JPEG):
[xvnc ] EncodeManager: Full Colour: 1.622 krects, 2.05636 Mpixels
[xvnc ] EncodeManager: 3.23682 MiB (1:2.42922 ratio)
[xvnc ] EncodeManager: Total: 4.289 krects, 3.73865 Mpixels
[xvnc ] EncodeManager: 3.63822 MiB (1:3.93348 ratio)
[xvnc ] Connections: closed: /tmp/vnc.sock
[xvnc ] ComparingUpdateTracker: 24.3215 Mpixels in / 1.10949 Mpixels out
[xvnc ] ComparingUpdateTracker: (1:21.9214 ratio)
[xvnc ] Tue Nov 21 14:38:02 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] Tue Nov 21 14:38:03 2023
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow
[supervisor ] service 'app' exited (got signal SIGABRT).
[supervisor ] service 'app' exited, shutting down...
[supervisor ] stopping service 'openbox'...
[supervisor ] service 'openbox' exited (with status 0).
[supervisor ] stopping service 'nginx'...

Bug Description
When doing image comparison, fails just after hashing 180,000 images before showing results in GUI

qarmin · 2023-11-22T15:14:26Z

I tested hashing 300,000 files on my local machine and had no problems (it was just slow).

How many cores does your CPU have, and which OS do you use?

docwisdom · 2023-11-23T19:09:21Z

I tried again multiple times with smaller file counts, around 1600 or so, and had the same issue. I think it may have something to do with the gradient hashing versions. If I switch to blockhash it seems to do better in testing.

I'm running on Unraid, which is based on Slackware. It's in a Docker container from this repo: https://hub.docker.com/r/jlesage/czkawka/

docwisdom · 2023-11-23T19:40:17Z

I just tested blockhash on 8k photos and it crashed again.

[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[app ] 19:07:46.673 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 19:13:32.743 [INFO] czkawka_core::similar_images: find_similar_images: Done in 346.07s
[app ] 19:15:14.553 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 19:17:06.562 [INFO] czkawka_core::similar_images: find_similar_images: Done in 112.01s
[app ] 19:18:10.395 [INFO] czkawka_core::similar_images: find_similar_images
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow

qarmin · 2023-11-23T20:43:22Z

Can you somehow add RUST_LOG=debug to the environment variables of this app?
The log shows that the stack overflow happens in the similar images tool, but it does not show the exact function that causes the problem.

By default most Linux distros have an 8 MB stack, which should be enough for this app, but Slackware is quite an old distribution and may have different limits (it looks like it can have a 1 MB stack size - https://slackwiki.com/Resource_Limits).
Can you check what ulimit -s returns? On my OS it returns 8192 [KB].

How many CPU cores/threads does the server have?

docwisdom · 2023-11-23T21:10:20Z

Unraid is a custom build; it uses an up-to-date kernel, but I don't know much about its inner workings.

root@NAS:~# ulimit -s
unlimited

14 cores, 28 threads

[supervisor ] loading service 'openbox'...
[supervisor ] loading service 'logmonitor'...
[supervisor ] service 'logmonitor' is disabled.
[supervisor ] loading service 'logrotate'...
[supervisor ] all services loaded.
[supervisor ] starting services...
[supervisor ] starting service 'xvnc'...
[xvnc ] Xvnc TigerVNC 1.13.1 - built Nov 10 2023 13:43:39
[xvnc ] Copyright (C) 1999-2022 TigerVNC Team and many others (see README.rst)
[xvnc ] See https://www.tigervnc.org for information on TigerVNC.
[xvnc ] Underlying X server release 12014000
[xvnc ] Thu Nov 23 12:57:23 2023
[xvnc ] vncext: VNC extension running!
[xvnc ] vncext: Listening for VNC connections on /tmp/vnc.sock (mode 0660)
[xvnc ] vncext: Listening for VNC connections on all interface(s), port 5900
[xvnc ] vncext: created VNC server for screen 0
[supervisor ] starting service 'nginx'...
[nginx ] Listening for HTTP connections on port 5800.
[supervisor ] starting service 'openbox'...
[supervisor ] starting service 'app'...
[supervisor ] all services started.
[app ] 20:57:25.680 [INFO] czkawka_core::common: Czkawka version: 6.1.0, was compiled with release mode
[app ] 20:57:26.264 [INFO] czkawka_gui: Set thread number to 28
[xvnc ] Thu Nov 23 12:58:01 2023
[xvnc ] Connections: accepted: /tmp/vnc.sock
[xvnc ] SConnection: Client needs protocol version 3.8
[xvnc ] SConnection: Client requests security type None(1)
[xvnc ] VNCSConnST: Server default pixel format depth 24 (32bpp) little-endian rgb888
[xvnc ] VNCSConnST: Client pixel format depth 24 (32bpp) little-endian bgr888
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[app ] 20:58:11.345 [DEBUG] czkawka_gui::connect_things::connect_button_search: clean_tree_view
[app ] 20:58:11.345 [DEBUG] czkawka_gui::connect_things::connect_button_search: clean_tree_view: Done in 1.04µs
[app ] 20:58:11.347 [INFO] czkawka_core::similar_images: find_similar_images
[app ] 20:58:11.348 [DEBUG] czkawka_core::similar_images: check_for_similar_images
[app ] 20:58:11.584 [DEBUG] czkawka_core::common: send_info_and_wait_for_ending_all_threads
[app ] 20:58:11.589 [DEBUG] czkawka_core::common: send_info_and_wait_for_ending_all_threads: Done in 4.86ms
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: check_for_similar_images: Done in 241.04ms
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: hash_images
[app ] 20:58:11.589 [DEBUG] czkawka_core::similar_images: hash_images_load_cache
[app ] 20:58:11.589 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized_by_path
[app ] 20:58:11.589 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized
[app ] 20:58:11.616 [DEBUG] czkawka_core::common_cache: Starting removing outdated cache entries (removing non existent files from cache - true)
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Completed removing outdated cache entries, removed 0 out of all 3845 entries
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Loaded cache from file cache_similar_images_32_Blockhash_Lanczos3_61.bin (or json alternative) - 3845 results
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized: Done in 120.77ms
[app ] 20:58:11.710 [DEBUG] czkawka_core::common_cache: Converting cache Vec into BTreeMap<String, T>
[app ] 20:58:11.712 [DEBUG] czkawka_core::common_cache: Converted cache Vec into BTreeMap<String, T>
[app ] 20:58:11.712 [DEBUG] czkawka_core::common_cache: load_cache_from_file_generalized_by_path: Done in 123.11ms
[app ] 20:58:11.712 [DEBUG] czkawka_core::similar_images: hash_images-load_cache - starting calculating diff
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images_load_cache - completed diff between loaded and prechecked files, 5383(15.41 GiB) - non cached, 3845(8.66 GiB) - already cached
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images_load_cache: Done in 149.25ms
[app ] 20:58:11.738 [DEBUG] czkawka_core::similar_images: hash_images - start hashing images
[app ] thread '<unknown>' has overflowed its stack
[app ] fatal runtime error: stack overflow
[supervisor ] service 'app' exited (got signal SIGABRT).
[supervisor ] service 'app' exited, shutting down...
[supervisor ] stopping service 'openbox'...
[supervisor ] service 'openbox' exited (with status 0).
[supervisor ] stopping service 'nginx'...
[xvnc ] Thu Nov 23 13:09:21 2023
[xvnc ] VNCSConnST: closing /tmp/vnc.sock: Clean disconnection
[xvnc ] EncodeManager: Framebuffer updates: 4062
[xvnc ] EncodeManager: CopyRect:
[xvnc ] EncodeManager: Copies: 1 rects, 182.7 kpixels
[xvnc ] EncodeManager: 16 B (1:45675.8 ratio)
[xvnc ] EncodeManager: Tight:
[xvnc ] EncodeManager: Solid: 174 rects, 7.3631 Mpixels
[xvnc ] EncodeManager: 2.71875 KiB (1:10579.9 ratio)
[xvnc ] EncodeManager: Bitmap RLE: 99 rects, 74.165 kpixels
[xvnc ] EncodeManager: 2.84766 KiB (1:102.143 ratio)
[xvnc ] EncodeManager: Indexed RLE: 5.702 krects, 1.23611 Mpixels
[xvnc ] EncodeManager: 866.635 KiB (1:5.64873 ratio)
[xvnc ] EncodeManager: Tight (JPEG):
[xvnc ] EncodeManager: Full Colour: 4.698 krects, 5.61785 Mpixels
[xvnc ] EncodeManager: 8.92706 MiB (1:2.40663 ratio)
[xvnc ] EncodeManager: Total: 10.674 krects, 14.4739 Mpixels
[xvnc ] EncodeManager: 9.77884 MiB (1:5.65873 ratio)
[xvnc ] Connections: closed: /tmp/vnc.sock
[xvnc ] ComparingUpdateTracker: 135.499 Mpixels in / 7.93127 Mpixels out
[xvnc ] ComparingUpdateTracker: (1:17.0842 ratio)
[supervisor ] service 'nginx' exited (with status 0).
[supervisor ] stopping service 'xvnc'...
[xvnc ] Thu Nov 23 13:09:22 2023
[xvnc ] ComparingUpdateTracker: 0 pixels in / 0 pixels out
[xvnc ] ComparingUpdateTracker: (1:-nan ratio)
[supervisor ] service 'xvnc' exited (with status 0).
[supervisor ] sending SIGTERM to all processes...
[finish ] executing container finish scripts...
[finish ] all container finish scripts executed.

qarmin · 2023-11-23T21:28:15Z

In the hash_images function I cannot find any place that could use more than a few kilobytes of stack, so I don't know why the stack overflows.

Limiting the number of cores used is probably the easiest workaround (I have 8 threads and have never had similar problems; I think 15 or 20 should also work fine, but this needs to be tested).

docwisdom · 2023-11-23T21:33:59Z

I'll try that now

jlesage · 2023-11-23T22:12:21Z

Note that this version of Czkawka is compiled against musl instead of glibc. The thread stack size allocated by musl is 128K by default, which is small compared to the few MB used by glibc (https://wiki.musl-libc.org/functional-differences-from-glibc.html).

docwisdom · 2023-11-23T23:14:16Z

Sorry this is beyond my comprehension. Is there a fix?

jlesage · 2023-11-24T00:06:46Z

The comment was for @qarmin, so he can see if currently Czkawka could approach the thread stack size limit of musl.

jlesage · 2023-11-24T02:01:09Z

@docwisdom, to see if it's a stack size issue, could you try to run the following commands inside the container? This will increase the default stack size to 1MB.

export GOPATH=/go
add-pkg go git musl-dev
go install github.com/yaegashi/muslstack@latest
cp /usr/bin/czkawka_gui /usr/bin/czkawka_gui2
/go/bin/muslstack -s 0x100000 /usr/bin/czkawka_gui2
mv /usr/bin/czkawka_gui2 /usr/bin/czkawka_gui

Then restart the container and see if it's crashing again. If it does, you can try to increase to 8MB:

cp /usr/bin/czkawka_gui /usr/bin/czkawka_gui2
/go/bin/muslstack -s 0x800000 /usr/bin/czkawka_gui2
mv /usr/bin/czkawka_gui2 /usr/bin/czkawka_gui
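
For readers unsure what the hex values passed to `-s` above mean: a small sketch that decodes them into human-readable sizes. The helper names `parse_hex_size` and `describe` are made up for this illustration and are not part of muslstack or Czkawka.

```rust
/// Parses a hexadecimal size such as "0x100000" into a byte count.
/// Returns None if the "0x"/"0X" prefix is missing or the digits are invalid.
fn parse_hex_size(arg: &str) -> Option<usize> {
    let digits = arg
        .strip_prefix("0x")
        .or_else(|| arg.strip_prefix("0X"))?;
    usize::from_str_radix(digits, 16).ok()
}

/// Formats a byte count using the largest binary unit that divides it evenly.
fn describe(bytes: usize) -> String {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * 1024;
    if bytes % MIB == 0 {
        format!("{} MiB", bytes / MIB)
    } else if bytes % KIB == 0 {
        format!("{} KiB", bytes / KIB)
    } else {
        format!("{} B", bytes)
    }
}
```

With these helpers, `0x100000` decodes to 1 MiB and `0x800000` to 8 MiB, matching the two settings suggested in the commands.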

docwisdom · 2023-11-24T16:56:25Z

Thanks for this. I tried both the 1 MB and 8 MB settings and it still crashed at the end of the hashing.

qarmin · 2023-11-24T18:17:03Z

From - https://stackoverflow.com/questions/44003589/how-to-increase-the-stack-size-available-to-a-rust-library#comment75039223_44003965:

I'll note that [std::thread::Builder](https://doc.rust-lang.org/1.8.0/std/thread/struct.Builder.html) let you specify the stack size of the created thread from within the program. Only the stack size of the main thread is set by the OS.

So it is possible that the thread stack size is set inside the program, which would explain why this did not work (the main thread in the GUI is not responsible for the heavy calculations).
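
To make the quoted point concrete, here is a minimal standard-library sketch of giving one thread an explicit stack size with `std::thread::Builder`. The names `run_with_stack` and `burn_stack`, the thread name, and the sizes in the usage note are illustrative assumptions, not code from Czkawka.

```rust
use std::hint::black_box;
use std::thread;

/// Recursive helper that keeps about 1 KiB of locals alive in every frame,
/// so the stack depth grows with `depth`. Returns (depth + 1) * 1024.
fn burn_stack(depth: usize) -> usize {
    // black_box discourages the optimizer from removing the buffer.
    let pad = black_box([0u8; 1024]);
    if depth == 0 {
        return pad.len();
    }
    // The addition happens after the recursive call returns.
    burn_stack(depth - 1) + pad.len()
}

/// Runs `f` on a freshly spawned thread whose stack is `stack_bytes` large
/// and returns its result. Panics if the thread cannot be spawned or panics.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("big-stack-worker".into())
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

For example, `run_with_stack(32 * 1024 * 1024, || burn_stack(2048))` runs the recursion on a thread with a 32 MiB stack. The size requested here applies only to that spawned thread; the main thread keeps whatever the OS gave it.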

I already tried setting the stack size in rayon to 1 byte with https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.stack_size to provoke a crash, but everything worked fine, so I'm not sure where the problem could be.

I tried to debug stack usage with https://crates.io/crates/cargo-call-stack, but it looks like that is not possible because of several crashes, and I don't know which other tool I could use to debug this problem.

qarmin · 2023-11-24T19:28:14Z

In #1102 I changed some stack size values, which may fix the problem, but for me the current values just work, so I cannot test whether this fixes it:

main thread stack size - OS default value (not much to calculate/store from the GUI perspective)
main scan thread stack size - 8 MB
worker threads stack size - 4 MB
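
The two-tier layout listed above can be sketched with the standard library alone. This is not the code from #1102: the constant names, the function `scan_in_background`, and the summing workload are placeholders chosen for illustration, and the real worker threads are managed by rayon rather than spawned by hand.

```rust
use std::thread;

// Hypothetical constants mirroring the sizes listed above.
const SCAN_THREAD_STACK: usize = 8 * 1024 * 1024; // main scan thread: 8 MiB
const WORKER_THREAD_STACK: usize = 4 * 1024 * 1024; // each worker: 4 MiB

/// Spawns a scan thread with an 8 MiB stack, which splits `items` into
/// chunks and hands each chunk to a worker thread with a 4 MiB stack.
/// The placeholder workload just sums the numbers.
fn scan_in_background(items: Vec<u64>, workers: usize) -> u64 {
    let scan = thread::Builder::new()
        .name("scan".into())
        .stack_size(SCAN_THREAD_STACK)
        .spawn(move || {
            let workers = workers.max(1);
            // Round up so that at most `workers` chunks are produced.
            let chunk_len = ((items.len() + workers - 1) / workers).max(1);
            let handles: Vec<_> = items
                .chunks(chunk_len)
                .enumerate()
                .map(|(i, chunk)| {
                    let chunk = chunk.to_vec();
                    thread::Builder::new()
                        .name(format!("worker-{i}"))
                        .stack_size(WORKER_THREAD_STACK)
                        .spawn(move || chunk.iter().sum::<u64>())
                        .expect("failed to spawn worker")
                })
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("worker panicked"))
                .sum::<u64>()
        })
        .expect("failed to spawn scan thread");
    scan.join().expect("scan thread panicked")
}
```

The main thread is left untouched here, matching the first line of the list: its stack size comes from the OS and is not set by the program.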

docwisdom · 2023-11-26T03:45:46Z

Will this be in an upcoming release?

qarmin · 2023-11-26T07:59:39Z

Yes. Binaries to test are already available here - https://github.com/qarmin/czkawka/actions/runs/6992056327, but since they are built with glibc rather than musl, running them cannot tell us whether the problem has been fixed.

jlesage · 2023-11-26T14:04:23Z

I've reproduced the stack overflow error. I'm currently testing a version that sets the stack size in rayon. I will let you know about the result.

docwisdom · 2023-11-26T15:11:09Z

Thank you

jlesage · 2023-11-27T14:21:41Z

Finally, I don't seem to be able to reproduce it consistently.
@docwisdom, could you try the jlesage/czkawka:issue-1140 Docker image and see if you can reproduce?

docwisdom · 2023-11-27T18:27:15Z

I've done 3 test batches so far (3-6k each) and no crashes. I think this may have resolved the issue. I am going to do a larger batch this morning.

docwisdom · 2023-11-28T01:20:26Z

Ran 160,000 photo comparisons using 32 gradient and it completed successfully. I would consider the issue resolved. Thank you.

jlesage · 2023-11-28T02:02:12Z

@qarmin, this is the patch that @docwisdom tested:

--- a/czkawka_core/src/common.rs        2023-11-24 14:45:40.462095198 -0500
+++ b/czkawka_core/src/common.rs        2023-11-24 14:47:29.678337169 -0500
@@ -76,7 +76,7 @@
 pub fn set_number_of_threads(thread_number: usize) {
     NUMBER_OF_THREADS.set(thread_number);

-    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).build_global().unwrap();
+    rayon::ThreadPoolBuilder::new().num_threads(get_number_of_threads()).stack_size(8*1024*1204).build_global().unwrap();
 }

 pub const RAW_IMAGE_EXTENSIONS: &[&str] = &[

Do you want to integrate the change yourself or you want me to create a PR ?
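
A side note on the arithmetic in the diff above: the value passed to `stack_size` is written as `8*1024*1204`, not `8*1024*1024`. Whether that was intentional is not stated in the thread. The sketch below only works out the two numbers; the constant and function names are invented for this illustration.

```rust
/// Converts a size in mebibytes to bytes.
const fn mib(n: usize) -> usize {
    n * 1024 * 1024
}

/// Stack size exactly as written in the diff (note the 1204 factor).
const STACK_AS_WRITTEN: usize = 8 * 1024 * 1204;

/// Stack size if exactly 8 MiB was meant (an assumption about intent).
const STACK_8_MIB: usize = mib(8);

/// Number of bytes by which the value in the diff exceeds 8 MiB.
fn excess_over_8_mib() -> usize {
    STACK_AS_WRITTEN - STACK_8_MIB
}
```

The value as written is 9,863,168 bytes, about 9.4 MiB, so the tested image ran with slightly more than 8 MiB per rayon thread.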

qarmin · 2023-11-28T07:26:59Z

I already added slightly different limits - #1140 (comment)

jlesage · 2023-11-28T15:28:21Z

Ok yes, in this PR, 4MB (DEFAULT_WORKER_THREAD_SIZE) is used instead of 8MB. @docwisdom, I pushed jlesage/czkawka:issue-1140-2, if you want to confirm that it's still working with a 4MB stack.

docwisdom · 2023-11-28T21:38:26Z

tested on 3900 photos, no issues

nicoKoehler · 2024-02-11T14:30:49Z

@jlesage @qarmin hey, is this rolled up into the available docker image too?
I am experiencing the same issue when I run anything other than the standard selected algorithm for similar images. Tested on 3 machines of varying CPU strength, all resulting in the same issue.
(pulled jlesage/czkawka image via docker compose)

EDIT: I went to Docker Hub and saw that the "latest" tag is not in fact the latest. There is an image with the tag 1140-2, which was committed around the same time as this issue was closed. I tried to use this image instead, and thus far (on smaller tests with 3-5k images, which also caused the stack overflow in the "latest" image) it's been working.

EDIT2: Still causes a stack overflow with 11k pictures.

jlesage · 2024-02-18T01:38:38Z

is this rolled up into the available docker image too?

The latest version of Czkawka doesn't have the fix. The next version should include it.

I went to dockerhub and saw that the "latest" tag is not in fact the latest. There is an image with the tag 1140-2, which was committed around the same time as this issue was closed.

This was a non-official image to test a potential fix.

EDIT2: Still causes a stack overflow with 11k pictures.

Can you try jlesage/czkawka:issue-1140 instead?

docwisdom added the bug Something isn't working label Nov 21, 2023

qarmin mentioned this issue Nov 29, 2023

Alternative frontend with Slint #1102

Merged

qarmin closed this as completed in #1102 Dec 3, 2023

qarmin mentioned this issue Jan 14, 2024

Search for similar images breaks down #1179

Closed
