test_repl's test_no_memory: AssertionError: -6 not found in (1, 120) #118331

Open
hugovk opened this issue Apr 26, 2024 · 7 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes type-bug An unexpected behavior, bug, or error

Comments

@hugovk
Member

hugovk commented Apr 26, 2024

Bug report

Bug description:

One of my PRs (#118283) started failing (https://github.com/python/cpython/actions/runs/8843882336?pr=118283) with this. Investigating, I can reproduce it locally on main, although I don't know why the CI passes on main.

OS: macOS Sonoma 14.4.1, M2.

To reproduce

git clone https://github.com/python/cpython
cd cpython
./configure --with-pydebug && make -j10
./python.exe Lib/test/test_repl.py

Actual result

.....F.
======================================================================
FAIL: test_no_memory (__main__.TestInteractiveInterpreter.test_no_memory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/tmp/cpython/Lib/test/test_repl.py", line 86, in test_no_memory
    self.assertIn(p.returncode, (1, 120))
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: -6 not found in (1, 120)

Printing output from the test:

Python 3.13.0a5+ (bisect/good-e16062dd3428a5846344e0a8c6ee2f352d34ce1b-1-gdf73179048:df73179048, A) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> >>> >>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    1/0
    ~^~
ZeroDivisionError: division by zero
>>> After the exception.
>>> object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b9a0
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
object address  : 0x102f8b930
object refcount : 3
object type     : 0x102b831a8
object type name: MemoryError
object repr     :
lost sys.stderr
Fatal Python error: _Py_Dealloc: Deallocator of type '_thread._localdummy' raised an exception
Python runtime state: finalizing (tstate=0x0000000102bf3c40)

Current thread 0x0000000201f77ac0 (most recent call first):
  <no Python frame>

Expected result

It passes with v3.13.0a5, where output is:

Python 3.13.0a5 (tags/v3.13.0a5:076d169ebb, Apr 26 2024, 22:13:38) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> >>> >>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    1/0
    ~^~
ZeroDivisionError: division by zero
>>> After the exception.
>>> object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f63a10
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr
object address  : 0x100f639a0
object refcount : 3
object type     : 0x100b5f1a8
object type name: MemoryError
object repr     : 
lost sys.stderr

The main difference is the failure includes this:

Fatal Python error: _Py_Dealloc: Deallocator of type '_thread._localdummy' raised an exception
Python runtime state: finalizing (tstate=0x0000000102bf3c40)

Current thread 0x0000000201f77ac0 (most recent call first):
  <no Python frame>

Bisecting

df7317904849a41d51db39d92c5d431a18e22637 is the first bad commit
commit df7317904849a41d51db39d92c5d431a18e22637
Author: mpage <[email protected]>
Date:   Mon Apr 8 07:58:38 2024 -0700

    gh-111926: Make weakrefs thread-safe in free-threaded builds (#117168)

    Most mutable data is protected by a striped lock that is keyed on the
    referenced object's address. The weakref's hash is protected using the
    weakref's per-object lock.

    Note that this only affects free-threaded builds. Apart from some minor
    refactoring, the added code is all either gated by `ifdef`s or is a no-op
    (e.g. `Py_BEGIN_CRITICAL_SECTION`).

 Include/cpython/weakrefobject.h                |   8 +
 Include/internal/pycore_interp.h               |   7 +
 Include/internal/pycore_object.h               |  40 +-
 Include/internal/pycore_pyatomic_ft_wrappers.h |   5 +
 Include/internal/pycore_weakref.h              |  73 +++-
 Lib/test/test_sys.py                           |   8 +-
 Lib/test/test_weakref.py                       |  19 +
 Modules/_sqlite/blob.c                         |   5 +-
 Modules/_sqlite/connection.c                   |   4 +-
 Modules/_ssl.c                                 |  13 +-
 Modules/_ssl/debughelpers.c                    |   6 +-
 Modules/_weakref.c                             |  42 +-
 Modules/clinic/_weakref.c.h                    |  20 +-
 Objects/dictobject.c                           |   8 +-
 Objects/typeobject.c                           |  12 +-
 Objects/weakrefobject.c                        | 537 ++++++++++++++-----------
 Python/pystate.c                               |   9 +
 17 files changed, 490 insertions(+), 326 deletions(-)
bisect found first bad commit

PR: #117168
Issue: #111926

The PR was merged three weeks ago and the CI is passing.


On my PR (https://github.com/python/cpython/actions/runs/8843882336?pr=118283), free-threaded builds pass, but regular ones fail. Ubuntu and macOS fail with:

AssertionError: -6 not found in (1, 120)

Windows with:

AssertionError: 3221225477 not found in (1, 120)

3221225477 seems to be 0xc0000005 STATUS_ACCESS_VIOLATION.
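
As a rough illustration of how the two failure codes can be read, here is a minimal Python sketch. The helper name `describe_returncode` and the one-entry `KNOWN_NTSTATUS` table are hypothetical and exist only for this example. It assumes the documented `subprocess` behaviour that a negative return code on POSIX means the child was terminated by that signal, and that the Windows value is an unsigned 32-bit status code.

```python
import signal

# Only the status value mentioned in this issue; not a complete table.
KNOWN_NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def describe_returncode(rc: int) -> str:
    """Hypothetical helper: turn a subprocess return code into a readable label."""
    if rc < 0:
        # POSIX: subprocess reports -N when the child was terminated by signal N.
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            # The signal number has no name on this platform.
            return f"killed by signal {-rc}"
    if rc in KNOWN_NTSTATUS:
        return f"{KNOWN_NTSTATUS[rc]} ({rc:#010x})"
    return f"exit status {rc}"


for code in (-6, 3221225477, 1, 120):
    print(code, "->", describe_returncode(code))
```

On Linux and macOS, signal 6 is SIGABRT, which is consistent with the "Fatal Python error" in the output above, since a fatal error aborts the process.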

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

@hugovk hugovk added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes labels Apr 26, 2024
@colesbury
Contributor

I am able to reproduce it locally when checking out #118283 (but not in main for whatever reason).

The bug I saw was due to _PyObject_SetManagedDict calling _PyDict_DetachFromObject, which sets an exception (due to the no-memory condition). The exception is not handled, which causes the fatal error in _Py_Dealloc.

We should call PyErr_WriteUnraisable() or further propagate up the error.

cc @DinoV (I think the relevant PR is #114742)

@colesbury colesbury self-assigned this Apr 26, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Apr 26, 2024
When detaching a dict, the copy_values call may fail due to
out-of-memory errors. This can be triggered by test_no_memory in
test_repl.
colesbury added a commit to colesbury/cpython that referenced this issue Apr 26, 2024
@mpage
Contributor

mpage commented Apr 27, 2024

@hugovk - I still can't repro this after merging gh-118334, but I did take a look through the weakref code and found a preexisting bug that might be what you're hitting. Can you try merging gh-118338 along with gh-118334 and see if it fixes things?

@hugovk
Member Author

hugovk commented Apr 27, 2024

Good news!

Thanks both!

mpage added a commit to mpage/cpython that referenced this issue Apr 29, 2024
colesbury pushed a commit that referenced this issue Apr 29, 2024
…ng weakrefs (#118338)

It's not safe to raise an exception in `PyObject_ClearWeakRefs()` if one
is not already set, since it may be called by `_Py_Dealloc()`, which
requires that the active exception does not change.

Additionally, make sure we clear the weakrefs even when tuple allocation
fails.
colesbury added a commit that referenced this issue Apr 29, 2024
When detaching a dict, the `copy_values` call may fail due to
out-of-memory errors. This can be triggered by test_no_memory in
test_repl.
@colesbury
Contributor

@hugovk, both PRs are now merged. When you get a chance to merge main into your PR, would you please verify that everything is okay and close the issue?

@colesbury colesbury assigned hugovk and unassigned colesbury Apr 29, 2024
@hugovk
Member Author

hugovk commented Apr 29, 2024

Passing now, thank you!

After the first run, 5 tests failed on "Windows (free-threading) / build and test (x86)" but passed when restarted.

https://github.com/python/cpython/actions/runs/8884999534/attempts/1

https://github.com/python/cpython/actions/runs/8884999534/job/24396359883

@ZeroIntensity
Member

ZeroIntensity commented Feb 23, 2025

It looks like this wasn't fully fixed; I'm seeing failures on my PR: #130469

The test case seems to pass on main, but my PR triggers the failure again for whatever reason. On main, actually running the snippet from the test in the REPL results in an assertion failure, and a hang on 3.13 release builds:

Python 3.14.0a5+ (heads/main:1e4a4344af4, Feb 23 2025, 07:51:14) [GCC 14.2.1 20250207] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
... 
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
                           ~^~
ZeroDivisionError: division by zero
python: Objects/object.c:2972: _Py_Dealloc: Assertion `tstate->current_frame == NULL || tstate->current_frame->stackpointer != NULL' failed.

@ZeroIntensity ZeroIntensity reopened this Feb 23, 2025
@picnixz picnixz added the 3.14 new features, bugs and security fixes label Feb 23, 2025
@mpage
Contributor

mpage commented Feb 23, 2025

I think there are three separate issues on main that are being tickled. I haven't had a chance to investigate 3.13.

  1. _PyList_FromStackRefStealOnSuccess is marked as non-escaping. A quick workaround here is to remove it from the non-escaping list.
  2. take_ownership does not save and restore exceptions around the call to _PyFrame_GetFrameObject(). We should do that.
  3. list_dealloc does not set ob_item to NULL after freeing the items array (only affects the free-threaded build). We should also do that.

1. _PyList_FromStackRefStealOnSuccess

First, I think the assertion failure that @ZeroIntensity reported on the default build is because _PyList_FromStackRefStealOnSuccess is marked as non-escaping. We are attempting to allocate a new list (frame 7):

(rr) bt
#0  0x00007ffff7c8bdac in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7c3e886 in raise () from /lib64/libc.so.6
#2  0x00007ffff7c28833 in abort () from /lib64/libc.so.6
#3  0x00007ffff7c2875b in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007ffff7c375c6 in __assert_fail () from /lib64/libc.so.6
#5  0x000000000052d1f1 in _Py_Dealloc (op=op@entry=0x7ffff753e120) at ./Include/refcount.h:392
#6  0x00000000004f8c9c in Py_DECREF (op=0x7ffff753e120, lineno=262, filename=0x7cd6f7 "Objects/listobject.c") at ./Include/refcount.h:393
#7  PyList_New (size=size@entry=1) at Objects/listobject.c:262
#8  0x00000000004fbc4f in _PyList_FromStackRefStealOnSuccess (src=src@entry=0x7ffff7e1d588, n=n@entry=1) at Objects/listobject.c:3206
#9  0x000000000063f54e in _PyEval_EvalFrameDefault (tstate=tstate@entry=0xa5c950 <_PyRuntime+329808>, frame=0x7ffff7e1d4b0,

Allocating the list object succeeds because it is handled by the free list. Allocating the items array fails, and we decref the list object that was returned from the free list:

op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
#endif
if (op->ob_item == NULL) {
    Py_DECREF(op);
    return PyErr_NoMemory();

The decref destroys the last reference, calls _Py_Dealloc, and we end up hitting the assertion:

cpython/Objects/object.c

Lines 2970 to 2972 in 1e4a434

/* This assertion doesn't hold for the free-threading build, as
 * PyStackRef_CLOSE_SPECIALIZED is not implemented */
assert(tstate->current_frame == NULL || tstate->current_frame->stackpointer != NULL);

Since _PyList_FromStackRefStealOnSuccess is marked as non-escaping, the interpreter generator doesn't generate any code to set and clear the stack pointer around the call:

values = &stack_pointer[-oparg];
PyObject *list_o = _PyList_FromStackRefStealOnSuccess(values, oparg);
if (list_o == NULL) {

I'm not sure what the right solution is. I think _PyList_FromStackRefStealOnSuccess is correctly marked as non-escaping because the decref can't execute arbitrary code and it's the only decref that can be performed. Removing it from the list of non-escaping calls gets us past the assertion and probably isn't too bad for performance. I suspect other functions marked as non-escaping have similar properties to _PyList_FromStackRefStealOnSuccess in low-memory scenarios, though, and it doesn't seem great to have to pessimize those as well.

2. take_ownership

After working around the above, we hit another assertion that I believe is caused by unconditionally clearing MemoryErrors in take_ownership.

Python 3.14.0a5+ (tags/v3.14.0a5-162-g1e4a4344af4-dirty:1e4a4344af4, Feb 23 2025, 13:18:39) [GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
                           ~^~
ZeroDivisionError: division by zero
python: Python/generated_cases.c.h:11990: _PyEval_EvalFrameDefault: Assertion `_PyErr_Occurred(tstate)' failed.

Program received signal SIGABRT, Aborted.

We're triggering this assertion:

#else
assert(_PyErr_Occurred(tstate));
#endif

because no exception is set:

(rr) frame 5
#5  0x0000000000667db3 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0xa5c950 <_PyRuntime+329808>, frame=0x7fd7f802f400,
    frame@entry=0x7fd7f802f1d0, throwflag=throwflag@entry=0) at Python/generated_cases.c.h:11970
11970	            JUMP_TO_LABEL(error);
(rr) p tstate->current_exception
$2 = (PyObject *) 0x0
(rr)

Running backwards, we can see that the exception is cleared in take_ownership:

(rr) watch *(&tstate->current_exception)
Hardware watchpoint 1: *(&tstate->current_exception)
(rr) reverse-continue
Continuing.

Program received signal SIGABRT, Aborted.
0x00007fd7f7e8bdac in __pthread_kill_implementation () from /lib64/libc.so.6
(rr) reverse-continue
Continuing.

Hardware watchpoint 1: *(&tstate->current_exception)

Old value = (PyObject *) 0x0
New value = (PyObject *) 0x7fd7f7d9ba80
_PyErr_SetRaisedException (tstate=0xa5c950 <_PyRuntime+329808>, exc=exc@entry=0x0) at Python/errors.c:29
29	    tstate->current_exception = exc;
(rr) bt
#0  _PyErr_SetRaisedException (tstate=0xa5c950 <_PyRuntime+329808>, exc=exc@entry=0x0) at Python/errors.c:29
#1  0x000000000068f2fe in _PyErr_Restore (tstate=0xa5c950 <_PyRuntime+329808>, type=type@entry=0x0, value=value@entry=0x0,
    traceback=traceback@entry=0x0) at Python/errors.c:66
#2  0x000000000068f59f in _PyErr_Clear (tstate=<optimized out>) at Python/errors.c:535
#3  0x000000000068f67e in PyErr_Clear () at Python/errors.c:543
#4  0x000000000069a411 in take_ownership (f=f@entry=0x7fd7f787eb70, frame=0x7fd7f787ebb0, frame@entry=0x7fd7f802f4b0) at Python/frame.c:79
#5  0x000000000069a720 in _PyFrame_ClearExceptCode (frame=frame@entry=0x7fd7f802f4b0) at Python/frame.c:118
#6  0x0000000000635346 in clear_thread_frame (tstate=tstate@entry=0xa5c950 <_PyRuntime+329808>, frame=frame@entry=0x7fd7f802f4b0)
    at Python/ceval.c:1628
#7  0x0000000000638804 in _PyEval_FrameClearAndPop (tstate=tstate@entry=0xa5c950 <_PyRuntime+329808>, frame=frame@entry=0x7fd7f802f4b0)
    at Python/ceval.c:1652
#8  0x000000000063c21f in _PyEval_EvalFrameDefault (tstate=tstate@entry=0xa5c950 <_PyRuntime+329808>, frame=0x7fd7f802f400,
    frame@entry=0x7fd7f802f1d0, throwflag=throwflag@entry=0) at Python/generated_cases.c.h:12077

take_ownership clears any MemoryError exceptions when getting the previous frame fails (even if they were not raised by the call to _PyFrame_GetFrameObject):

cpython/Python/frame.c

Lines 74 to 79 in 1e4a434

PyFrameObject *back = _PyFrame_GetFrameObject(prev);
if (back == NULL) {
    /* Memory error here. */
    assert(PyErr_ExceptionMatches(PyExc_MemoryError));
    /* Nothing we can do about it */
    PyErr_Clear();

In this case, there was a pre-existing MemoryError (caused by failing to allocate the list) that shouldn't be cleared. I think we should save and restore the exception around the call to _PyFrame_GetFrameObject(prev);. Something like:

PyObject *exc = PyErr_GetRaisedException();
PyFrameObject *back = _PyFrame_GetFrameObject(prev);
if (back == NULL) {
    /* Memory error here. */
    assert(PyErr_ExceptionMatches(PyExc_MemoryError));
    /* Nothing we can do about it */
    PyErr_Clear();
}
else {
    f->f_back = (PyFrameObject *)Py_NewRef(back);
}
PyErr_SetRaisedException(exc);

After this, it looks like things are working again on the default build:

> ./python
Python 3.14.0a5+ (tags/v3.14.0a5-162-g1e4a4344af4-dirty:1e4a4344af4, Feb 23 2025, 14:10:04) [GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
                           ~^~
ZeroDivisionError: division by zero
object address  : 0x7fc758b9ba80
object refcount : 3
object type     : 0x9d6060
object type name: MemoryError
object repr     :
lost sys.stderr
>
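
To illustrate why the save/restore matters in section 2, here is a toy Python model. It is not CPython code: `ToyThreadState` and every function in it are invented stand-ins that only track a single "currently raised exception" slot, under the assumption that a failing frame-object lookup overwrites whatever was pending.

```python
class ToyThreadState:
    """Stand-in for tstate; only tracks the currently raised exception."""

    def __init__(self):
        self.current_exception = None


def get_raised_exception(ts):
    # Mirrors the idea of PyErr_GetRaisedException(): return the exception
    # and leave the slot empty.
    exc, ts.current_exception = ts.current_exception, None
    return exc


def failing_get_frame_object(ts):
    # Simulates the frame-object lookup failing under the no-memory condition.
    ts.current_exception = MemoryError("frame object")
    return None


def take_ownership_clearing(ts):
    # Current behaviour as described above: clear unconditionally.
    if failing_get_frame_object(ts) is None:
        ts.current_exception = None  # also drops the pre-existing error


def take_ownership_preserving(ts):
    # Proposed behaviour: save first, clear only the new error, then restore.
    saved = get_raised_exception(ts)
    if failing_get_frame_object(ts) is None:
        ts.current_exception = None
    ts.current_exception = saved


ts = ToyThreadState()
pending = MemoryError("list items")

ts.current_exception = pending
take_ownership_clearing(ts)
lost = ts.current_exception is None

ts.current_exception = pending
take_ownership_preserving(ts)
kept = ts.current_exception is pending

print(lost, kept)
```

In the clearing variant the interpreter would reach its error label with nothing set, which matches the `_PyErr_Occurred(tstate)` assertion failure shown earlier in this section.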

3. list_dealloc

It looks like there's another issue in the free-threaded build that we hit:

(rr) c
Continuing.
Python 3.14.0a5+ experimental free-threading build (tags/v3.14.0a5-162-g1e4a4344af4-dirty:1e4a4344af4, Feb 23 2025, 14:14:18) [GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
                           ~^~
ZeroDivisionError: division by zero

Program received signal SIGSEGV, Segmentation fault.
Py_XDECREF (op=0xdddddddddddddddd) at ./Include/refcount.h:502
502	        Py_DECREF(op);
(rr) bt
#0  Py_XDECREF (op=0xdddddddddddddddd) at ./Include/refcount.h:502
#1  list_dealloc (self=0x200008ebcb0) at Objects/listobject.c:533
#2  0x000000000053ec5c in _Py_Dealloc (op=0x200008ebcb0) at Objects/object.c:2986
#3  0x000000000053edd1 in _Py_MergeZeroLocalRefcount (op=op@entry=0x200008ebcb0) at Objects/object.c:436
#4  0x0000000000504e29 in Py_DECREF (op=0x200008ebcb0, lineno=253, filename=0x817777 "Objects/listobject.c") at ./Include/refcount.h:343
#5  PyList_New (size=size@entry=1) at Objects/listobject.c:253
#6  0x0000000000509b36 in _PyList_FromStackRefStealOnSuccess (src=src@entry=0x7f2791235588, n=n@entry=1) at Objects/listobject.c:3206
#7  0x000000000065de54 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0xabef70 <_PyRuntime+360560>, frame=0x7f27912354b0,
    frame@entry=0x7f27912351d0, throwflag=throwflag@entry=0) at Python/generated_cases.c.h:1019

Again, we're attempting to allocate a new list. Allocating the list object succeeds because it's returned from the free list, but allocating the items array fails:

#ifdef Py_GIL_DISABLED
_PyListArray *array = list_allocate_array(size);
if (array == NULL) {
    Py_DECREF(op);
    return PyErr_NoMemory();
}

Since we hold the only reference, the decref ends up calling list_dealloc, which is attempting to decref items in the items array, which has been filled with poisoned values since it was already freed:

if (op->ob_item != NULL) {
    /* Do it backwards, for Christian Tismer.
       There's a simple test case where somehow this reduces
       thrashing when a *very* large list is created and
       immediately deleted. */
    i = Py_SIZE(op);
    while (--i >= 0) {
        Py_XDECREF(op->ob_item[i]);
    }
    free_list_items(op->ob_item, false);
}

I think setting ob_item to NULL after we free the items array should fix the problem. Something like:

if (op->ob_item != NULL) {
    /* Do it backwards, for Christian Tismer.
       There's a simple test case where somehow this reduces
       thrashing when a *very* large list is created and
       immediately deleted. */
    i = Py_SIZE(op);
    while (--i >= 0) {
        Py_XDECREF(op->ob_item[i]);
    }
    free_list_items(op->ob_item, false);
    op->ob_item = NULL;
}

After this, the free-threaded build no longer crashes:

> ./python
Python 3.14.0a5+ experimental free-threading build (tags/v3.14.0a5-162-g1e4a4344af4-dirty:1e4a4344af4, Feb 23 2025, 14:29:01) [GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import sys, _testcapi; 1/0; _testcapi.set_nomemory(0); sys.exit(0)
                           ~^~
ZeroDivisionError: division by zero
object address  : 0x20000136b10
object refcount : 3
object type     : 0xa2f520
object type name: MemoryError
object repr     :
lost sys.stderr
>
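
The stale-pointer scenario from section 3 can be sketched with a toy Python model. Everything here is hypothetical (`ToyList`, `toy_dealloc`, the `POISON` sentinel), and it assumes one reading of the description above: a list object goes back to the free list still pointing at its freed items array, is handed out again, and is deallocated a second time when the new items allocation fails.

```python
POISON = object()  # stands in for the 0xdd... pattern seen in the backtrace


class ToyList:
    def __init__(self):
        self.items = None  # plays the role of ob_item


free_list = []
touched_poison = []


def toy_dealloc(lst, reset_pointer):
    if lst.items is not None:
        for item in lst.items:
            if item is POISON:
                touched_poison.append(lst)  # the real code would crash here
        # "Free" the array: its contents become poison, the pointer stays put.
        lst.items[:] = [POISON] * len(lst.items)
        if reset_pointer:
            lst.items = None  # the proposed one-line fix
    free_list.append(lst)


def toy_new_with_failed_allocation(reset_pointer):
    lst = free_list.pop()  # object allocation succeeds via the free list
    # The items allocation "fails", so the only reference is dropped again.
    toy_dealloc(lst, reset_pointer)


def run(reset_pointer):
    free_list.clear()
    touched_poison.clear()
    lst = ToyList()
    lst.items = ["a"]
    toy_dealloc(lst, reset_pointer)
    toy_new_with_failed_allocation(reset_pointer)
    return bool(touched_poison)


print("without reset:", run(False))
print("with reset:", run(True))
```

Without the reset, the second deallocation walks the poisoned array. With it, the `is not None` guard skips the loop, which is the effect the suggested `op->ob_item = NULL` is meant to have.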
