Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Py2 as an internal API for optimization and dogfooding #3445

Merged
merged 1 commit into from
Oct 13, 2023

Conversation

davidhewitt
Copy link
Member

This PR adds the Py2 struct and PyAnyMethods trait from #3361 as private APIs which we can use internally.

The idea is that:

  • we can still benefit from the performance advantages of not touching the pool where possible internally
  • this lets us start learning about this API so that we can completely satisfy ourselves it's the right API for users
  • it lets us start making incremental progress with migrating to the new API in reviewable steps

@davidhewitt davidhewitt added the CI-skip-changelog Skip checking changelog entry label Sep 10, 2023
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, thanks for biting the bullet. I'm still trying to think of a good name...

Some nits below...

@davidhewitt davidhewitt force-pushed the py2-internals branch 2 times, most recently from 958b8ee to 56463b3 Compare September 13, 2023 12:03
@davidhewitt
Copy link
Member Author

If there are no immediate concerns, I would be keen to merge this now I've pushed the coverage up. I think it'll be useful to keep anything else we build optimal.

@adamreichold
Copy link
Member

If there are no immediate concerns, I would be keen to merge this now I've pushed the coverage up. I think it'll be useful to keep anything else we build optimal.

No concerns, but also did not find any time to actually look at this specific PR.

@davidhewitt
Copy link
Member Author

Would you like me to hold until you've read it? I can also amend post merge if you're happy with that route.

@adamreichold
Copy link
Member

Is it really pressing, i.e. will this be part of 0.20? I suspect that this is basically what we discussed elsewhere quite in depth, so I am not particularly anxious. But I also do not really see a reason we would need to rush this?

@davidhewitt
Copy link
Member Author

No urgency other than that I'm enthusiastic to make stuff go faster 😂 There's no API changes here though, so this can wait until 0.20.1 if needed. For 0.20 I'm just looking at various recent bug reports to understand if any need breaking changes to fix, and otherwise I'm hoping we can release within maybe a week.

@davidhewitt
Copy link
Member Author

Revisiting this now that 0.20 is released, I benchmarked this branch against main (raw numbers plus + diff):



          identify_object_type: [195.49 ns 196.00 ns 196.52 ns]
                                [-0.3530% +0.2300% +0.7163%] (p = 0.43 > 0.05)

      collect_generic_iterator: [7.5128 ms 7.5424 ms 7.5733 ms]
                                [-3.0232% -2.3853% -1.7609%] (p = 0.00 < 0.05)

                        call_0: [20.362 µs 20.388 µs 20.422 µs]
                                [+0.2598% +0.4492% +0.6769%] (p = 0.00 < 0.05)

                 call_method_0: [70.852 µs 70.907 µs 70.971 µs]
                                [-2.5288% -2.3959% -2.2817%] (p = 0.00 < 0.05)

        ordered_dunder_methods: [36.186 ns 36.200 ns 36.213 ns]
                                [-22.930% -22.817% -22.713%] (p = 0.00 < 0.05)

               ordered_richcmp: [36.631 ns 36.649 ns 36.666 ns]
                                [-22.134% -22.049% -21.959%] (p = 0.00 < 0.05)

     err_new_restore_and_fetch: [111.64 ns 111.73 ns 111.82 ns]
                                [-0.3089% -0.0347% +0.2335%] (p = 0.80 > 0.05)

           err_new_without_gil: [16.658 ns 16.760 ns 16.856 ns]
                                [-4.6907% -4.0453% -3.3965%] (p = 0.00 < 0.05)

   extract_str_extract_success: [3.7456 ns 3.7469 ns 3.7482 ns]
                                [-0.5158% -0.3141% -0.1338%] (p = 0.00 < 0.05)

      extract_str_extract_fail: [25.260 ns 25.321 ns 25.381 ns]
                                [-1.2420% -0.9741% -0.6840%] (p = 0.00 < 0.05)
                                
  extract_str_downcast_success: [3.1552 ns 3.1602 ns 3.1665 ns]
                                [-0.2737% -0.0839% +0.0852%] (p = 0.37 > 0.05)

     extract_str_downcast_fail: [2.2645 ns 2.2678 ns 2.2717 ns]
                                [+0.0363% +0.1339% +0.2700%] (p = 0.02 < 0.05)

   extract_int_extract_success: [4.4889 ns 4.4962 ns 4.5040 ns]
                                [-0.0016% +0.1069% +0.2132%] (p = 0.04 < 0.05)

      extract_int_extract_fail: [110.09 ns 110.16 ns 110.23 ns]
                                [-0.8899% -0.7757% -0.6548%] (p = 0.00 < 0.05)
                                
  extract_int_downcast_success: [4.6468 ns 4.6485 ns 4.6502 ns]
                                [-0.2473% -0.1213% -0.0161%] (p = 0.04 < 0.05)

     extract_int_downcast_fail: [2.2598 ns 2.2608 ns 2.2617 ns]
                                [+1.1040% +1.1620% +1.2187%] (p = 0.00 < 0.05)

 extract_float_extract_success: [3.5183 ns 3.5197 ns 3.5210 ns]
                                [-0.0716% -0.0015% +0.0669%] (p = 0.97 > 0.05)

    extract_float_extract_fail: [94.550 ns 94.627 ns 94.709 ns]
                                [-0.4231% -0.2887% -0.1431%] (p = 0.00 < 0.05)

extract_float_downcast_success: [3.5179 ns 3.5194 ns 3.5210 ns]
                                [-0.0914% -0.0166% +0.0618%] (p = 0.68 > 0.05)

   extract_float_downcast_fail: [2.2483 ns 2.2502 ns 2.2523 ns]
                                [-0.0554% +0.0402% +0.1338%] (p = 0.43 > 0.05)

            enum_from_pyobject: [570.63 ns 570.99 ns 571.36 ns]
                                [-0.2673% -0.1184% +0.0376%] (p = 0.14 > 0.05)

             list_via_downcast: [373.75 ps 373.89 ps 374.03 ps]
                                [-0.1157% -0.0436% +0.0332%] (p = 0.26 > 0.05)

              list_via_extract: [1.2798 ns 1.2802 ns 1.2807 ns]
                                [-0.0320% +0.0330% +0.0964%] (p = 0.31 > 0.05)

       not_a_list_via_downcast: [373.79 ps 373.95 ps 374.12 ps]
                                [-0.0807% +0.0020% +0.0825%] (p = 0.96 > 0.05)

        not_a_list_via_extract: [22.195 ns 22.243 ns 22.291 ns]
                                [-0.8011% -0.6032% -0.4014%] (p = 0.00 < 0.05)

   not_a_list_via_extract_enum: [332.53 ns 332.72 ns 332.92 ns]
                                [-1.8772% -1.6851% -1.4651%] (p = 0.00 < 0.05)

             f64_from_pyobject: [3.5184 ns 3.5195 ns 3.5207 ns]
                                [-0.1180% -0.0424% +0.0303%] (p = 0.27 > 0.05)

             clean_gilpool_new: [13.690 ns 13.696 ns 13.702 ns]
                                [-0.0077% +0.0706% +0.1515%] (p = 0.08 > 0.05)

             clean_acquire_gil: [53.250 ns 53.270 ns 53.290 ns]
                                [-0.0817% -0.0104% +0.0584%] (p = 0.78 > 0.05)

             dirty_acquire_gil: [58.179 ns 58.256 ns 58.349 ns]
                                [-0.0564% +0.1029% +0.2864%] (p = 0.23 > 0.05)

                getattr_direct: [70.388 ns 70.437 ns 70.487 ns]
                                [-7.6346% -7.3034% -6.9781%] (p = 0.00 < 0.05)

                getattr_intern: [16.056 ns 16.072 ns 16.088 ns]
                                [-10.832% -10.727% -10.626%] (p = 0.00 < 0.05)

                     iter_list: [903.68 µs 906.28 µs 908.90 µs]
                                [-0.1725% +0.1213% +0.4078%] (p = 0.42 > 0.05)

                      list_new: [438.67 µs 439.40 µs 440.24 µs]
                                [-0.8039% -0.2354% +0.2559%] (p = 0.41 > 0.05)

                 list_get_item: [512.93 µs 513.48 µs 514.07 µs]
                                [-3.7039% -3.5536% -3.4087%] (p = 0.00 < 0.05)

       list_get_item_unchecked: [451.73 µs 452.50 µs 453.32 µs]
                                [-0.3969% -0.2051% -0.0033%] (p = 0.05 < 0.05)

            sequence_from_list: [1.2798 ns 1.2802 ns 1.2807 ns]
                                [-0.2589% -0.1713% -0.0898%] (p = 0.00 < 0.05)

               first_time_init: [1.6841 µs 1.6859 µs 1.6878 µs]
                                [-0.8102% -0.4418% -0.1040%] (p = 0.01 < 0.05)

             drop_many_objects: [4.7113 µs 4.7126 µs 4.7139 µs]
                                [-0.7234% -0.4719% -0.2567%] (p = 0.00 < 0.05)

                    iter_tuple: [495.81 µs 496.14 µs 496.49 µs]
                                [-0.9445% -0.6637% -0.4283%] (p = 0.00 < 0.05)

                     tuple_new: [434.86 µs 435.16 µs 435.45 µs]
                                [-1.0320% -0.3762% +0.2590%] (p = 0.27 > 0.05)

                tuple_get_item: [354.50 µs 354.81 µs 355.19 µs]
                                [-0.0440% +0.0855% +0.2054%] (p = 0.18 > 0.05)

      tuple_get_item_unchecked: [289.84 µs 290.88 µs 292.63 µs]
                                [+0.0291% +0.2019% +0.5000%] (p = 0.09 > 0.05)

           sequence_from_tuple: [1.2791 ns 1.2796 ns 1.2801 ns]
                                [-0.0385% +0.0173% +0.0750%] (p = 0.55 > 0.05)

                tuple_new_list: [166.36 µs 166.47 µs 166.57 µs]
                                [-0.0515% +0.0184% +0.0907%] (p = 0.62 > 0.05)

                 tuple_to_list: [120.33 µs 120.80 µs 121.10 µs]
                                [-0.9240% -0.5622% -0.2445%] (p = 0.00 < 0.05)

                 tuple_into_py: [41.072 ns 41.093 ns 41.117 ns]
                                [-0.7140% -0.6204% -0.5210%] (p = 0.00 < 0.05)

I took a look into a couple of the bigger wins, and here's my explanation for them:

  • ordered_dunder_methods / ordered_richcmp - this -22% change is because PyAny::gt no longer needs to put the intermediate value from PyAnyMethods::rich_compare onto the GILPool, plus we use Py2<'_, PyAny> instead of PyObject instead of the fn inner, which is a little faster to drop because it doesn't need to check if the GIL is acquired.
  • getattr_direct / getattr_intern - I think this is mostly accounted for by being able to use Py2<'_, PyString> instead of Py<PyString> as the container for the attribute name, which again is faster to drop for the same reason.

@davidhewitt
Copy link
Member Author

Overall, it looks like this PR is managing to already make a slight improvement to benchmarks. I'd really love to merge this and start submitting follow-up PRs to rework the rest of PyO3's types (internally) over the next patch release cycle. @adamreichold perhaps if you're still short on time at the moment, you'll forgive me if I merge this and get on with optimising? 🙏

@davidhewitt
Copy link
Member Author

(force-pushed to rebase)

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not adamreichold but I'm happy if you merge :)

@davidhewitt
Copy link
Member Author

An approval is an approval, thanks! I can always submit follow-up PRs if @adamreichold wants to see changes later. I expect there will be a fair bit we'll learn by making these internals changes.

🚀

@davidhewitt davidhewitt enabled auto-merge October 12, 2023 23:00
@davidhewitt davidhewitt added this pull request to the merge queue Oct 12, 2023
@mejrs
Copy link
Member

mejrs commented Oct 12, 2023

Now that I've read all the discussion again my preference is to call it GIL (or Gil) once we release it.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 12, 2023
@davidhewitt davidhewitt enabled auto-merge October 13, 2023 06:11
@davidhewitt davidhewitt added this pull request to the merge queue Oct 13, 2023
Merged via the queue into PyO3:main with commit db13a97 Oct 13, 2023
@davidhewitt davidhewitt deleted the py2-internals branch October 13, 2023 07:44
@davidhewitt
Copy link
Member Author

Now that I've read all the discussion again my preference is to call it GIL (or Gil) once we release it.

The only reservation I have about the GIL name is that it doesn't make sense in a nogil world, so we have to break users again with a big rename. I sorta hope that we can pick a name which is good for the future, though maybe I am getting ahead of myself.

@adamreichold
Copy link
Member

An approval is an approval, thanks! I can always submit follow-up PRs if @adamreichold wants to see changes later. I expect there will be a fair bit we'll learn by making these internals changes.

As you probably noticed, I am currently very much swamped by $DAYJOB and while I try to keep reading and take care of that little NumPy thing, I am rather happy if things are not blocked by me.

And yes, a review is a review is a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-skip-changelog Skip checking changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants