add `Py2` as an internal API for optimization and dogfooding #3445

davidhewitt · 2023-09-10T20:13:03Z

This PR adds the Py2 struct and PyAnyMethods trait from #3361 as private APIs which we can use internally.

The idea is that:

we can still benefit from the performance advantages of not touching the pool where possible internally
this lets us start learning about this API so that we can completely satisfy ourselves it's the right API for users
it lets us start making incremental progress with migrating to the new API in reviewable steps

mejrs

This is a good idea, thanks for biting the bullet. I'm still trying to think of a good name...

Some nits below...

src/instance.rs

davidhewitt · 2023-09-13T18:25:51Z

If there are no immediate concerns, I would be keen to merge this now I've pushed the coverage up. I think it'll be useful to keep anything else we build optimal.

adamreichold · 2023-09-13T18:37:52Z

If there are no immediate concerns, I would be keen to merge this now I've pushed the coverage up. I think it'll be useful to keep anything else we build optimal.

No concerns, but also did not find any time to actually look at this specific PR.

davidhewitt · 2023-09-13T18:39:25Z

Would you like me to hold until you've read it? I can also amend post merge if you're happy with that route.

adamreichold · 2023-09-13T18:42:02Z

Is it really pressing, i.e. will this be part of 0.20? I suspect that this is basically what we discussed elsewhere quite in depth, so I am not particularly anxious. But I also do not really see a reason we would need to rush this?

davidhewitt · 2023-09-13T19:08:06Z

No urgency other than that I'm enthusiastic to make stuff go faster 😂 There's no API changes here though, so this can wait until 0.20.1 if needed. For 0.20 I'm just looking at various recent bug reports to understand if any need breaking changes to fix, and otherwise I'm hoping we can release within maybe a week.

davidhewitt · 2023-10-12T21:30:10Z

Revisiting this now that 0.20 is released, I benchmarked this branch against main (raw numbers plus + diff):



          identify_object_type: [195.49 ns 196.00 ns 196.52 ns]
                                [-0.3530% +0.2300% +0.7163%] (p = 0.43 > 0.05)

      collect_generic_iterator: [7.5128 ms 7.5424 ms 7.5733 ms]
                                [-3.0232% -2.3853% -1.7609%] (p = 0.00 < 0.05)

                        call_0: [20.362 µs 20.388 µs 20.422 µs]
                                [+0.2598% +0.4492% +0.6769%] (p = 0.00 < 0.05)

                 call_method_0: [70.852 µs 70.907 µs 70.971 µs]
                                [-2.5288% -2.3959% -2.2817%] (p = 0.00 < 0.05)

        ordered_dunder_methods: [36.186 ns 36.200 ns 36.213 ns]
                                [-22.930% -22.817% -22.713%] (p = 0.00 < 0.05)

               ordered_richcmp: [36.631 ns 36.649 ns 36.666 ns]
                                [-22.134% -22.049% -21.959%] (p = 0.00 < 0.05)

     err_new_restore_and_fetch: [111.64 ns 111.73 ns 111.82 ns]
                                [-0.3089% -0.0347% +0.2335%] (p = 0.80 > 0.05)

           err_new_without_gil: [16.658 ns 16.760 ns 16.856 ns]
                                [-4.6907% -4.0453% -3.3965%] (p = 0.00 < 0.05)

   extract_str_extract_success: [3.7456 ns 3.7469 ns 3.7482 ns]
                                [-0.5158% -0.3141% -0.1338%] (p = 0.00 < 0.05)

      extract_str_extract_fail: [25.260 ns 25.321 ns 25.381 ns]
                                [-1.2420% -0.9741% -0.6840%] (p = 0.00 < 0.05)
                                
  extract_str_downcast_success: [3.1552 ns 3.1602 ns 3.1665 ns]
                                [-0.2737% -0.0839% +0.0852%] (p = 0.37 > 0.05)

     extract_str_downcast_fail: [2.2645 ns 2.2678 ns 2.2717 ns]
                                [+0.0363% +0.1339% +0.2700%] (p = 0.02 < 0.05)

   extract_int_extract_success: [4.4889 ns 4.4962 ns 4.5040 ns]
                                [-0.0016% +0.1069% +0.2132%] (p = 0.04 < 0.05)

      extract_int_extract_fail: [110.09 ns 110.16 ns 110.23 ns]
                                [-0.8899% -0.7757% -0.6548%] (p = 0.00 < 0.05)
                                
  extract_int_downcast_success: [4.6468 ns 4.6485 ns 4.6502 ns]
                                [-0.2473% -0.1213% -0.0161%] (p = 0.04 < 0.05)

     extract_int_downcast_fail: [2.2598 ns 2.2608 ns 2.2617 ns]
                                [+1.1040% +1.1620% +1.2187%] (p = 0.00 < 0.05)

 extract_float_extract_success: [3.5183 ns 3.5197 ns 3.5210 ns]
                                [-0.0716% -0.0015% +0.0669%] (p = 0.97 > 0.05)

    extract_float_extract_fail: [94.550 ns 94.627 ns 94.709 ns]
                                [-0.4231% -0.2887% -0.1431%] (p = 0.00 < 0.05)

extract_float_downcast_success: [3.5179 ns 3.5194 ns 3.5210 ns]
                                [-0.0914% -0.0166% +0.0618%] (p = 0.68 > 0.05)

   extract_float_downcast_fail: [2.2483 ns 2.2502 ns 2.2523 ns]
                                [-0.0554% +0.0402% +0.1338%] (p = 0.43 > 0.05)

            enum_from_pyobject: [570.63 ns 570.99 ns 571.36 ns]
                                [-0.2673% -0.1184% +0.0376%] (p = 0.14 > 0.05)

             list_via_downcast: [373.75 ps 373.89 ps 374.03 ps]
                                [-0.1157% -0.0436% +0.0332%] (p = 0.26 > 0.05)

              list_via_extract: [1.2798 ns 1.2802 ns 1.2807 ns]
                                [-0.0320% +0.0330% +0.0964%] (p = 0.31 > 0.05)

       not_a_list_via_downcast: [373.79 ps 373.95 ps 374.12 ps]
                                [-0.0807% +0.0020% +0.0825%] (p = 0.96 > 0.05)

        not_a_list_via_extract: [22.195 ns 22.243 ns 22.291 ns]
                                [-0.8011% -0.6032% -0.4014%] (p = 0.00 < 0.05)

   not_a_list_via_extract_enum: [332.53 ns 332.72 ns 332.92 ns]
                                [-1.8772% -1.6851% -1.4651%] (p = 0.00 < 0.05)

             f64_from_pyobject: [3.5184 ns 3.5195 ns 3.5207 ns]
                                [-0.1180% -0.0424% +0.0303%] (p = 0.27 > 0.05)

             clean_gilpool_new: [13.690 ns 13.696 ns 13.702 ns]
                                [-0.0077% +0.0706% +0.1515%] (p = 0.08 > 0.05)

             clean_acquire_gil: [53.250 ns 53.270 ns 53.290 ns]
                                [-0.0817% -0.0104% +0.0584%] (p = 0.78 > 0.05)

             dirty_acquire_gil: [58.179 ns 58.256 ns 58.349 ns]
                                [-0.0564% +0.1029% +0.2864%] (p = 0.23 > 0.05)

                getattr_direct: [70.388 ns 70.437 ns 70.487 ns]
                                [-7.6346% -7.3034% -6.9781%] (p = 0.00 < 0.05)

                getattr_intern: [16.056 ns 16.072 ns 16.088 ns]
                                [-10.832% -10.727% -10.626%] (p = 0.00 < 0.05)

                     iter_list: [903.68 µs 906.28 µs 908.90 µs]
                                [-0.1725% +0.1213% +0.4078%] (p = 0.42 > 0.05)

                      list_new: [438.67 µs 439.40 µs 440.24 µs]
                                [-0.8039% -0.2354% +0.2559%] (p = 0.41 > 0.05)

                 list_get_item: [512.93 µs 513.48 µs 514.07 µs]
                                [-3.7039% -3.5536% -3.4087%] (p = 0.00 < 0.05)

       list_get_item_unchecked: [451.73 µs 452.50 µs 453.32 µs]
                                [-0.3969% -0.2051% -0.0033%] (p = 0.05 < 0.05)

            sequence_from_list: [1.2798 ns 1.2802 ns 1.2807 ns]
                                [-0.2589% -0.1713% -0.0898%] (p = 0.00 < 0.05)

               first_time_init: [1.6841 µs 1.6859 µs 1.6878 µs]
                                [-0.8102% -0.4418% -0.1040%] (p = 0.01 < 0.05)

             drop_many_objects: [4.7113 µs 4.7126 µs 4.7139 µs]
                                [-0.7234% -0.4719% -0.2567%] (p = 0.00 < 0.05)

                    iter_tuple: [495.81 µs 496.14 µs 496.49 µs]
                                [-0.9445% -0.6637% -0.4283%] (p = 0.00 < 0.05)

                     tuple_new: [434.86 µs 435.16 µs 435.45 µs]
                                [-1.0320% -0.3762% +0.2590%] (p = 0.27 > 0.05)

                tuple_get_item: [354.50 µs 354.81 µs 355.19 µs]
                                [-0.0440% +0.0855% +0.2054%] (p = 0.18 > 0.05)

      tuple_get_item_unchecked: [289.84 µs 290.88 µs 292.63 µs]
                                [+0.0291% +0.2019% +0.5000%] (p = 0.09 > 0.05)

           sequence_from_tuple: [1.2791 ns 1.2796 ns 1.2801 ns]
                                [-0.0385% +0.0173% +0.0750%] (p = 0.55 > 0.05)

                tuple_new_list: [166.36 µs 166.47 µs 166.57 µs]
                                [-0.0515% +0.0184% +0.0907%] (p = 0.62 > 0.05)

                 tuple_to_list: [120.33 µs 120.80 µs 121.10 µs]
                                [-0.9240% -0.5622% -0.2445%] (p = 0.00 < 0.05)

                 tuple_into_py: [41.072 ns 41.093 ns 41.117 ns]
                                [-0.7140% -0.6204% -0.5210%] (p = 0.00 < 0.05)

I took a look into a couple of the bigger wins, and here's my explanation for them:

ordered_dunder_methods / ordered_richcmp - this -22% change is because PyAny::gt no longer needs to put the intermediate value from PyAnyMethods::rich_compare onto the GILPool, plus we use Py2<'_, PyAny> instead of PyObject instead of the fn inner, which is a little faster to drop because it doesn't need to check if the GIL is acquired.
getattr_direct / getattr_intern - I think this is mostly accounted for by being able to use Py2<'_, PyString> instead of Py<PyString> as the container for the attribute name, which again is faster to drop for the same reason.

davidhewitt · 2023-10-12T21:32:41Z

Overall, it looks like this PR is managing to already make a slight improvement to benchmarks. I'd really love to merge this and start submitting follow-up PRs to rework the rest of PyO3's types (internally) over the next patch release cycle. @adamreichold perhaps if you're still short on time at the moment, you'll forgive me if I merge this and get on with optimising? 🙏

davidhewitt · 2023-10-12T21:34:57Z

(force-pushed to rebase)

mejrs

I'm not adamreichold but I'm happy if you merge :)

davidhewitt · 2023-10-12T23:00:09Z

An approval is an approval, thanks! I can always submit follow-up PRs if @adamreichold wants to see changes later. I expect there will be a fair bit we'll learn by making these internals changes.

🚀

mejrs · 2023-10-12T23:42:49Z

Now that I've read all the discussion again my preference is to call it GIL (or Gil) once we release it.

davidhewitt · 2023-10-13T08:05:13Z

Now that I've read all the discussion again my preference is to call it GIL (or Gil) once we release it.

The only reservation I have about the GIL name is that it doesn't make sense in a nogil world, so we have to break users again with a big rename. I sorta hope that we can pick a name which is good for the future, though maybe I am getting ahead of myself.

adamreichold · 2023-10-13T11:18:50Z

An approval is an approval, thanks! I can always submit follow-up PRs if @adamreichold wants to see changes later. I expect there will be a fair bit we'll learn by making these internals changes.

As you probably noticed, I am currently very much swamped by $DAYJOB and while I try to keep reading and take care of that little NumPy thing, I am rather happy if things are not blocked by me.

And yes, a review is a review is a review.

davidhewitt added the CI-skip-changelog Skip checking changelog entry label Sep 10, 2023

davidhewitt force-pushed the py2-internals branch from 26cd4eb to a6a8ab0 Compare September 10, 2023 20:20

adamreichold mentioned this pull request Sep 11, 2023

Support sub-interpreters #576

Open

mejrs reviewed Sep 11, 2023

View reviewed changes

src/instance.rs Outdated Show resolved Hide resolved

src/instance.rs Show resolved Hide resolved

src/instance.rs Outdated Show resolved Hide resolved

davidhewitt force-pushed the py2-internals branch 2 times, most recently from 958b8ee to 56463b3 Compare September 13, 2023 12:03

davidhewitt force-pushed the py2-internals branch from 56463b3 to 3f6089f Compare October 12, 2023 21:34

mejrs approved these changes Oct 12, 2023

View reviewed changes

davidhewitt enabled auto-merge October 12, 2023 23:00

davidhewitt added this pull request to the merge queue Oct 12, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 12, 2023

add Py2 as an internal API for optimization and dogfooding

cac95f3

davidhewitt force-pushed the py2-internals branch from 454c923 to cac95f3 Compare October 13, 2023 06:10

davidhewitt enabled auto-merge October 13, 2023 06:11

davidhewitt added this pull request to the merge queue Oct 13, 2023

Merged via the queue into PyO3:main with commit db13a97 Oct 13, 2023

davidhewitt deleted the py2-internals branch October 13, 2023 07:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add `Py2` as an internal API for optimization and dogfooding #3445

add `Py2` as an internal API for optimization and dogfooding #3445

davidhewitt commented Sep 10, 2023

mejrs left a comment •

edited

Loading

davidhewitt commented Sep 13, 2023

adamreichold commented Sep 13, 2023

davidhewitt commented Sep 13, 2023

adamreichold commented Sep 13, 2023

davidhewitt commented Sep 13, 2023

davidhewitt commented Oct 12, 2023

davidhewitt commented Oct 12, 2023

davidhewitt commented Oct 12, 2023

mejrs left a comment

davidhewitt commented Oct 12, 2023

mejrs commented Oct 12, 2023 •

edited

Loading

davidhewitt commented Oct 13, 2023

adamreichold commented Oct 13, 2023

add Py2 as an internal API for optimization and dogfooding #3445

add Py2 as an internal API for optimization and dogfooding #3445

Conversation

davidhewitt commented Sep 10, 2023

mejrs left a comment • edited Loading

Choose a reason for hiding this comment

davidhewitt commented Sep 13, 2023

adamreichold commented Sep 13, 2023

davidhewitt commented Sep 13, 2023

adamreichold commented Sep 13, 2023

davidhewitt commented Sep 13, 2023

davidhewitt commented Oct 12, 2023

davidhewitt commented Oct 12, 2023

davidhewitt commented Oct 12, 2023

mejrs left a comment

Choose a reason for hiding this comment

davidhewitt commented Oct 12, 2023

mejrs commented Oct 12, 2023 • edited Loading

davidhewitt commented Oct 13, 2023

adamreichold commented Oct 13, 2023

add `Py2` as an internal API for optimization and dogfooding #3445

add `Py2` as an internal API for optimization and dogfooding #3445

mejrs left a comment •

edited

Loading

mejrs commented Oct 12, 2023 •

edited

Loading