
This issue was moved to a discussion.



Merge co_names and co_consts into a single array. #92

Closed
markshannon opened this issue Oct 7, 2021 · 13 comments


@markshannon
Member

Merging co_names and co_consts would have some significant benefits:

  1. It would make code objects smaller. There is unlikely to be much sharing between names and constants, but saving two tuples per code object is significant.
  2. It would free up a register in the interpreter which should be worth a few percent speed up (in the interpreter, less overall).
  3. By using an array, not a tuple, some lazy initialization of constants becomes possible. For example, a deep-frozen module might not be able to include frozensets due to hashing; tuples could be stored instead and converted to frozensets when the code object is initialized.

The obvious downside is for functions where the total number of names and constants exceeds 256. I hypothesize that those functions are rare, but don't have data to confirm that.
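The combined count is easy to check from Python: `co_names` and `co_consts` are attributes of any code object, and their total length is what a merged table would have to index with a one-byte oparg. A minimal sketch (the helper name is mine, not part of the proposal):

```python
def total_name_const_slots(func):
    """Return len(co_names) + len(co_consts) for a function's code
    object -- the number of entries a merged names/consts table
    would hold. With a one-byte oparg, totals above 256 would
    require EXTENDED_ARG prefixes."""
    code = func.__code__
    return len(code.co_names) + len(code.co_consts)

def example(x):
    return max(x, 0) + 1

total = total_name_const_slots(example)
print(total, total <= 256)
```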

This is probably best left until https://bugs.python.org/issue36521 has been resolved.

@iritkatriel
Collaborator

Are there no backwards compatibility issues? They are both part of the API.

Also, would this make the consts not immutable from user code? That would probably be an API change.

@gvanrossum
Collaborator

> The obvious downside is for functions where the total number of names and constants exceeds 256. I hypothesize that those functions are rare, but don't have data to confirm that.

That would be easy to validate by counting these numbers in the top 100 or 5000 packages on GitHub.

@pxeger

pxeger commented Oct 7, 2021

Using the top 1000 PyPI packages, I found that 0.14% of code objects have 256 or more total names and constants combined.

```
names    proportion >=256:   0.000237037
consts   proportion >=256:   0.000597600
total    proportion >=256:   0.001360792

names    proportion >=65536: 0.000000000
consts   proportion >=65536: 0.000000000
total    proportion >=65536: 0.000000000
```

https://github.com/pxeger/pycstats


Edit: updated data with top 1000 packages

@gvanrossum
Collaborator

But how often are they >= 255 together?

Also, it looks like, to minimize EXTENDED_ARG opcodes, we need to place the names first. (The dynamic profile could theoretically be different, but that feels unlikely.)

@pxeger

pxeger commented Oct 8, 2021

I'm not sure what you mean by "But how often are they >= 255 together?".

The 0.12% number is already the proportion for which `len(co_consts) + len(co_names) >= 256`, not just the sum of the separate proportions for names and consts, if that's what you meant.

@gvanrossum
Collaborator

Oh, d’oh. Of course. Sorry!

@markshannon
Member Author

> Are there no backwards compatibility issues? They are both part of the API.
>
> Also, would this make the consts not immutable from user code? That would probably be an API change.

The array would be internal, so could not be mutated. co_consts and co_names would be lazily created if needed.

@gvanrossum
Collaborator

> The array would be internal, so could not be mutated. co_consts and co_names would be lazily created if needed.

Could you update the initial comment to clarify that (a lot)? This wasn't obvious to me at all (though I'm sure it was in your mind when you wrote it down :-).

@markshannon
Member Author

I didn't want to shackle whoever was going to implement it :)

Now that I've given it a bit of thought, here's another way to implement it:

The consts and names go in the same array (names first, but it doesn't really matter), and we store a pointer to the first const.
That way consts can be indexed forwards and names backwards.
We can still access 256 consts and 255 names before needing extended instructions. We already do this for the quickened code (instructions forwards, caches backwards).

E.g. names: `"spam"`, `"eggs"`; consts: `None`, `7`.

```
array = { "spam", "eggs", None, 7 };
pointer = &array[2];
```
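A toy Python model of that two-direction layout (illustrative only — the real structure would be a C array with pointer arithmetic, and the exact backwards-indexing convention below is an assumption):

```python
class MergedTable:
    """Toy model of the proposed single array: names stored first,
    consts after, with a base index pointing at the first const.
    Consts are indexed forwards from the base and names backwards,
    mirroring the pointer = &array[2] example above."""

    def __init__(self, names, consts):
        self.array = list(names) + list(consts)
        self.base = len(names)  # index of the first const

    def const(self, oparg):
        # forwards from the base
        return self.array[self.base + oparg]

    def name(self, oparg):
        # backwards from just below the base
        return self.array[self.base - 1 - oparg]

table = MergedTable(["spam", "eggs"], [None, 7])
print(table.const(0))  # None
print(table.const(1))  # 7
print(table.name(0))   # eggs
print(table.name(1))   # spam
```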

@iritkatriel
Collaborator

When I experimented with MAKE_INT there was a noticeable difference in micro benchmarks between the opcode doing `small_ints[oparg]` and `small_ints[oparg - 5]`.

@markshannon
Member Author

@iritkatriel OOI what was the difference between the two?

@iritkatriel
Collaborator

> @iritkatriel OOI what was the difference between the two?

Start here: #65 (comment)

@gvanrossum
Collaborator

Presumably `small_ints[oparg - 5]` was slower because of the extra subtract?

