xpra segfault (musl) on TCP connection refresh #4510

Open
kuhnchris opened this issue Feb 15, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@kuhnchris

Describe the bug
I tried to get an HTML5+SSH remote desktop up and running (using --bind-tcp and/or --bind-ws). During my tests I noticed that the xpra session kept dying every time I reconnected to the TCP socket via the xpra HTML5 client.
After looking around for far too long (without checking dmesg, which would have shown the dumps), I attached GDB and found that there was a segfault whenever I connected to the socket more than once:

[New LWP 24486]
[LWP 24483 exited]

Thread 8 "xpra" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 24479]
get_nominal_size (p=0x7f173422ba80 "", end=0x7f173422ba8c "") at src/malloc/mallocng/meta.h:169
warning: 169    src/malloc/mallocng/meta.h: No such file or directory

(gdb) 
#0  get_nominal_size (p=0x7f173422ba80 "", end=0x7f173422ba8c "") at src/malloc/mallocng/meta.h:169
#1  __libc_free (p=0x7f173422ba80) at src/malloc/mallocng/free.c:110
#2  0x00007f17389dcebc in __pyx_pf_4xpra_7buffers_6membuf_6MemBuf_10__dealloc__ (__pyx_v_self=0x7f17349cb2b0) at xpra/buffers/membuf.c:3765
#3  __pyx_pw_4xpra_7buffers_6membuf_6MemBuf_11__dealloc__ (__pyx_v_self=0x7f17349cb2b0) at xpra/buffers/membuf.c:3736
#4  __pyx_tp_dealloc_4xpra_7buffers_6membuf_MemBuf (o=0x7f17349cb2b0) at xpra/buffers/membuf.c:5489
#5  0x00007f173a4197ce in ??? () at /usr/lib/libpython3.12.so.1.0
#6  0x00007f173a3810b7 in ??? () at /usr/lib/libpython3.12.so.1.0
#7  0x00007f173a380db3 in ??? () at /usr/lib/libpython3.12.so.1.0
#8  0x00007f173a368e11 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.12.so.1.0
#9  0x00007f173a3a6590 in ??? () at /usr/lib/libpython3.12.so.1.0
#10 0x00007f173a36dec4 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.12.so.1.0
#11 0x00007f173a3a6590 in ??? () at /usr/lib/libpython3.12.so.1.0
#12 0x00007f173a4691f4 in ??? () at /usr/lib/libpython3.12.so.1.0
#13 0x00007f173a4269e4 in ??? () at /usr/lib/libpython3.12.so.1.0
#14 0x00007f173a9459d2 in start (p=0x7f173467a160) at src/thread/pthread_create.c:207
#15 0x00007f173a947314 in __clone () at src/thread/x86_64/clone.s:22
(gdb) 

--- dmesg ---
[ 7974.748934] traps: xpra[23125] general protection fault ip:7f55d059bc21 sp:7f55ca09caa0 error:0 in ld-musl-x86_64.so.1[45c21,7f55d056a000+57000]
[ 8033.866944] traps: xpra[23668] general protection fault ip:7fb303577c21 sp:7fb2fd077aa0 error:0 in ld-musl-x86_64.so.1[45c21,7fb303546000+57000]
[ 8145.252047] traps: xpra[23980] general protection fault ip:7f2e89520c21 sp:7f2e8326faa0 error:0 in ld-musl-x86_64.so.1[45c21,7f2e894ef000+57000]

--- xpra list ---
/usr/bin/xpra:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').run_script('xpra==6.3', 'xpra')
Found the following xpra sessions:
/run/user/1000/xpra/0:
        UNKNOWN session at :0
/run/user/1000/xpra:
        UNKNOWN session at :0
/home/kuhnchris/.xpra:
        UNKNOWN session at :0
Re-probing unknown sessions in: /run/user/1000/xpra/0, /home/kuhnchris/.xpra, /run/user/1000/xpra

Looks like some kind of buffer over/underflow to me. I'm not sure why this only triggers with --bind-tcp/--bind-ws; it has worked fine over SSH so far.
I'm using Alpine as the xpra host (server: Alpine; clients: web/Windows/Rocky/Alpine).

To Reproduce
Steps to reproduce the behavior:

  1. literally anything that uses --bind-tcp; the easiest, when running a desktop, is: xpra shadow :0 --bind-tcp=0.0.0.0:13335
  2. connect via the HTML5 client on localhost:13335, press F5 on the localhost:13335 page twice, and see that the server has disappeared
  3. specific action to trigger the bug: nothing specific
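Reconnecting can also be scripted instead of pressing F5, which makes the crash easier to hit repeatedly. The following is a minimal Python sketch that assumes the server accepts a plain RFC 6455 upgrade request on `/` of the --bind-tcp port. The helper names `upgrade_request` and `reconnect` are hypothetical. The real HTML5 client also exchanges xpra packets after the handshake, so a bare handshake may not be enough to trigger the crash.

```python
import base64
import os
import socket


def upgrade_request(host: str, port: int, key: str) -> bytes:
    """Minimal HTTP/1.1 websocket upgrade request (no subprotocol requested)."""
    return (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    ).encode("ascii")


def reconnect(host: str, port: int, times: int = 3, timeout: float = 5.0) -> list[bytes]:
    """Connect, read the handshake reply, then drop the socket, `times` times.

    Returns the HTTP status line of each reply (b"" if the server closed early).
    """
    status_lines = []
    for _ in range(times):
        # Random 16-byte nonce, base64-encoded, as RFC 6455 requires
        key = base64.b64encode(os.urandom(16)).decode("ascii")
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(upgrade_request(host, port, key))
            reply = b""
            while b"\r\n\r\n" not in reply:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
        # Leaving the `with` block closes the TCP connection without sending a
        # websocket close frame, i.e. an abrupt disconnect.
        status_lines.append(reply.split(b"\r\n", 1)[0])
    return status_lines
```

`reconnect("127.0.0.1", 13335)` returns one status line per attempt. If the server dies partway through the loop, a later attempt raises ConnectionRefusedError instead.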

System Information (please complete the following information):

  • Server OS: alpine linux latest/edge
  • Client OS: n/a, but if important: Firefox on Win10, Firefox on Rocky9, Chromium on Rocky9
  • Xpra Server Version: xpra v6.3-r36898 (g72b8c4f1d6) beta, python 3.12.8
  • Xpra Client Version: n/a (xpra v6.3-r37406 (ga3972271e2) beta) / Xpra HTML5 Client version 18 (latest git as of 15 Feb 2025)

Additional context
Not sure whether this is an xpra bug or a "mallocng" (whatever that is) bug.

@kuhnchris kuhnchris added the bug Something isn't working label Feb 15, 2025
@kuhnchris
Author

Just as a note, mallocng seems to be musl's own malloc implementation, and apparently ffmpeg was hit by a similar bug: https://trac.ffmpeg.org/ticket/9551

@totaam
Collaborator

totaam commented Feb 16, 2025

It's possible that the connection-lost handler frees memory from the wrong thread.
Do you get the same behaviour with 6.2.x?
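One way to probe this hypothesis is to record which thread allocates each buffer and which thread releases it. The following is hypothetical Python instrumentation, not xpra code. `TrackedBuffer` stands in for a C-backed buffer such as MemBuf. A real check would need to hook MemBuf's allocation and its `__dealloc__` instead.

```python
import threading


class TrackedBuffer:
    """Hypothetical stand-in for a C-backed buffer such as MemBuf.

    Remembers the thread that allocated it and the thread that released it,
    and refuses a second release (which in C would be a double free).
    """

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.owner = threading.get_ident()
        self.released_by = None

    def release(self) -> None:
        if self.released_by is not None:
            raise RuntimeError("buffer released twice")
        self.released_by = threading.get_ident()
        self.data = None  # drop the payload, standing in for free()

    @property
    def cross_thread(self) -> bool:
        # Thread idents can be reused once a thread exits, so this comparison
        # is only reliable while the allocating thread is still alive.
        return self.released_by is not None and self.released_by != self.owner
```

Logging `cross_thread` for every buffer released in the connection-lost path would show whether the crash lines up with cross-thread releases.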

@kuhnchris
Author

Hi, sorry for the late reply.
I compiled 6.2.x. It survives "longer", in that you need to actually close the tab to cause the crash, but it still crashes:

#0  get_nominal_size (p=0x7ff729db4cc0 "", end=0x7ff729db4ccc "") at src/malloc/mallocng/meta.h:169
#1  __libc_free (p=0x7ff729db4cc0) at src/malloc/mallocng/free.c:110
#2  0x00007ff72eab0ecc in __pyx_pf_4xpra_7buffers_6membuf_6MemBuf_10__dealloc__ (__pyx_v_self=0x7ff72a7005f0) at xpra/buffers/membuf.c:3765
#3  __pyx_pw_4xpra_7buffers_6membuf_6MemBuf_11__dealloc__ (__pyx_v_self=0x7ff72a7005f0) at xpra/buffers/membuf.c:3736
#4  __pyx_tp_dealloc_4xpra_7buffers_6membuf_MemBuf (o=0x7ff72a7005f0) at xpra/buffers/membuf.c:5489
#5  0x00007ff7304197ce in ??? () at /usr/lib/libpython3.12.so.1.0
#6  0x00007ff7303810b7 in ??? () at /usr/lib/libpython3.12.so.1.0
#7  0x00007ff730380db3 in ??? () at /usr/lib/libpython3.12.so.1.0
#8  0x00007ff730368e11 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.12.so.1.0
#9  0x00007ff7303a6590 in ??? () at /usr/lib/libpython3.12.so.1.0
#10 0x00007ff73036dec4 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.12.so.1.0
#11 0x00007ff7303a6590 in ??? () at /usr/lib/libpython3.12.so.1.0
#12 0x00007ff7304691f4 in ??? () at /usr/lib/libpython3.12.so.1.0
#13 0x00007ff7304269e4 in ??? () at /usr/lib/libpython3.12.so.1.0
#14 0x00007ff7309fd9d2 in start (p=0x7ff72a59f230) at src/thread/pthread_create.c:207
#15 0x00007ff7309ff314 in __clone () at src/thread/x86_64/clone.s:22
(gdb) c
Continuing.
(venv) gateway:~/xpra-v62x$ xpra --version
/home/kuhnchris/xpra-v62x/venv/bin/xpra:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').run_script('xpra==6.2.4', 'xpra')
xpra v6.2.4-r43

@totaam
Collaborator

totaam commented Feb 16, 2025

The MemBuf wrapper class is used throughout the codebase to manage C buffers as Python objects.
It is mainly used in the codecs (which you can disable by running with --encodings=rgb24,rgb32 or --encodings=png), in packet compression (which you can turn off with -z 0) and in the websocket masking code (if you're connecting with the HTML5 client, you can't easily disable that one without some kind of websocket proxy, e.g. nginx).
This should narrow it down.
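The narrowing described above amounts to a small test matrix. The sketch below enumerates every combination, using only the options named in this comment and the reproduction command from the original report. The websocket masking code cannot be switched off this way when the HTML5 client is used, so it remains active in every run.

```python
import itertools
import shlex

# Reproduction command from the original report
BASE = ["xpra", "shadow", ":0", "--bind-tcp=0.0.0.0:13335"]

# Each axis switches off one MemBuf user; an empty list keeps the default.
ENCODING_CHOICES = [[], ["--encodings=rgb24,rgb32"], ["--encodings=png"]]
COMPRESSION_CHOICES = [[], ["-z", "0"]]


def option_matrix(base=BASE):
    """Every combination of the narrowing options, as argv lists."""
    return [
        base + enc + comp
        for enc, comp in itertools.product(ENCODING_CHOICES, COMPRESSION_CHOICES)
    ]


for argv in option_matrix():
    print(shlex.join(argv))
```

Run each printed command, reconnect from the HTML5 client a couple of times, and note which variants survive.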


$ xpra --version
/home/kuhnchris/xpra-v62x/venv/bin/xpra:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').run_script('xpra==6.2.4', 'xpra')

Your installation uses the botched setuptools-generated script; use this cleaner one instead:
/bin/xpra

@kuhnchris
Author

kuhnchris commented Feb 16, 2025

  • -z 0: no change (fs/bin/xpra shadow :0 --resize-display=yes --bind-tcp=0.0.0.0:13335 -z 0)
  • --encodings=rgb24,rgb32: same (fs/bin/xpra shadow :0 --resize-display=yes --bind-tcp=0.0.0.0:13335 --encodings=rgb24,rgb32 --debug debug)
  • Same for both combined (fs/bin/xpra shadow :0 --resize-display=yes --bind-tcp=0.0.0.0:13335 --encodings=rgb24,rgb32 -z 0 --debug debug)

I do see the following in the logs, though:
2025-02-16 15:59:15,880 automatic picture encoding enabled, also available:
2025-02-16 15:59:15,880 rgb24, rgb32
Is it worth trying to disable that too?
If there's anything I can do to help debugging the issue let me know!
Thanks

@totaam
Collaborator

totaam commented Feb 17, 2025

automatic picture encoding enabled

That's correct.

is it worth trying disabling that too?

No, and you can't really do that anyway with plain rgb.

--debug debug

FYI: that's invalid, see xpra --debug help


Do you get the same crash re-connecting with the python client in tcp mode?
What about using the python client in websockets mode?

I have manually inspected the masking code which uses MemBuf:

def hybi_unmask(data, unsigned int offset, unsigned int datalen) -> SizedBuffer:

And I really can't see anything wrong with it.
I would have expected one of the many other platforms we support to run into problems with it by now (this code is about 10 years old).
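A pure-Python reference implementation makes the check mechanical. You can feed the same frames to the Cython `hybi_unmask` and to the reference, then compare `bytes()` of both results, including frames whose declared length exceeds the buffer. The argument layout below is an assumption inferred from the signature quoted above, not taken from xpra's source. `unmask_reference` is a hypothetical name.

```python
def unmask_reference(data: bytes, offset: int, datalen: int) -> bytes:
    """Pure-Python RFC 6455 unmasking, for cross-checking a C/Cython version.

    Assumption: the 4-byte masking key sits at data[offset:offset + 4] and
    `datalen` masked payload bytes follow it.
    """
    end = offset + 4 + datalen
    if offset < 0 or datalen < 0 or end > len(data):
        # A C loop without this check would read past the end of the buffer
        raise ValueError(f"frame needs {end} bytes, buffer has {len(data)}")
    mask = data[offset:offset + 4]
    payload = data[offset + 4:end]
    # Each payload byte is XORed with the mask byte at the same position mod 4
    return bytes(b ^ mask[i % 4] for i, b in enumerate(payload))


# Masked "Hello" frame from RFC 6455, section 5.7
frame = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])
print(unmask_reference(frame, 2, 5))  # b'Hello'
```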

Could it be that malloc-ng is causing problems with memalign here:
https://github.com/Xpra-org/xpra/blob/master/xpra/buffers/memalign.c
You could unset it here:
https://github.com/Xpra-org/xpra/blob/647f8a513ed393eb75c1fd037e748f5b81d6021f/xpra/buffers/memalign.c#L11C9-L11C17
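To tell an allocator problem apart from an xpra problem, you can exercise aligned allocation on its own on the Alpine host. The following is a minimal Python/ctypes sketch. It assumes that xpra's memalign.c ends up calling posix_memalign() when the define linked above is set (check the linked file; this is not verified here). If the loop runs cleanly under musl, plain aligned alloc/free is unlikely to be the culprit by itself. `aligned_roundtrip` is a hypothetical name.

```python
import ctypes

# Symbols already loaded into the process, including the C library
# (musl on Alpine). CDLL(None) only works on POSIX systems.
libc = ctypes.CDLL(None)
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t, ctypes.c_size_t,
]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def aligned_roundtrip(size: int, alignment: int = 64, rounds: int = 1000) -> int:
    """posix_memalign + fill + free, `rounds` times; returns the rounds completed.

    `size` must be > 0 and `alignment` a power of two multiple of sizeof(void *).
    """
    for _ in range(rounds):
        ptr = ctypes.c_void_p()
        rc = libc.posix_memalign(ctypes.byref(ptr), alignment, size)
        if rc != 0:
            raise MemoryError(f"posix_memalign failed with error {rc}")
        if ptr.value % alignment:
            libc.free(ptr)
            raise AssertionError(f"{ptr.value:#x} is not {alignment}-byte aligned")
        ctypes.memset(ptr, 0xAB, size)  # touch every byte of the block
        libc.free(ptr)  # memory from posix_memalign is released with plain free()
    return rounds
```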
