Profiling and performance improvements #6

Open
krackers opened this issue Aug 26, 2023 · 19 comments

krackers commented Aug 26, 2023

seq 1 235976 | python3 -m cProfile (which pyp) "p | pp[-1]"

shows that a lot of time (~20 out of 28 seconds) is spent in string_splitter. Some of this might be unavoidable due to Python's slowness, but I wonder if there's some cleverness we could apply to avoid such a heavy performance hit?

The slowdown is in this particular line:

split_variables = dict((x, PypList([PypStr(y) for y in split_variables_raw[x]])) for x in split_variables_raw)

It's not clear to me exactly what it's doing. If that line is removed, we get the following heavy hitters:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   235977    1.415    0.000    7.143    0.000 pyp:1940(process_line)
   235978    0.758    0.000    2.128    0.000 pyp:1561(update_history)
   235982    0.704    0.000    0.724    0.000 {built-in method builtins.eval}
   235977    0.627    0.000    0.743    0.000 pyp:1385(translate_preset_variables)
   235978    0.603    0.000    0.603    0.000 {built-in method maketrans}
  4955517    0.539    0.000    0.539    0.000 pyp:1982(<genexpr>)
   235977    0.363    0.000    0.881    0.000 pyp:1218(string_splitter)
   235977    0.335    0.000    0.817    0.000 pyp:447(__init__)
   235977    0.256    0.000    1.146    0.000 pyp:1484(safe_eval)
   471954    0.253    0.000    0.442    0.000 posixpath.py:100(split)
   471957    0.253    0.000    0.253    0.000 {method 'update' of 'dict' objects}
   235977    0.216    0.000    0.363    0.000 pyp:1204(all_meta_split)
   235979    0.208    0.000    8.508    0.000 pyp:1935(<genexpr>)
  2595764    0.194    0.000    0.194    0.000 {method 'split' of 'str' objects}
   235976    0.175    0.000    0.257    0.000 pyp:1688(format_history)
   235978    0.173    0.000    0.173    0.000 {method 'isatty' of '_io.TextIOWrapper' objects}
   235978    0.148    0.000    0.261    0.000 pyp:1647(flatten_list)

krackers commented Jan 10, 2024

Found a way to reduce the time spent in builtins.eval: Python can precompile a string to bytecode with compile(), and the resulting code object can be passed to eval(). Doing this optimizes about as much as possible, I think; the only remaining hotspots are translate_preset_variables and string_splitter, which are going to be unavoidable. Maybe one could optimize further by first checking whether the variable is used in the expression, but I don't really know if that would be any faster.
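
For illustration, a minimal sketch of the precompilation idea (hypothetical names, not pyp's actual code): compile the user expression to a code object once, then reuse that code object in each per-line eval so the source string is never re-parsed.

    # Minimal sketch of precompiling the user expression (hypothetical names,
    # not pyp's actual implementation).
    user_expression = "p.upper()"

    # Parse/compile the expression to a code object once, outside the per-line loop.
    compiled_expr = compile(user_expression, "<pyp_expression>", "eval")

    results = []
    for line in ["foo", "bar", "baz"]:   # stand-in for the real input stream
        # eval() accepts a code object, so no re-parsing happens per line.
        results.append(eval(compiled_expr, {}, {"p": line}))

    print(results)  # ['FOO', 'BAR', 'BAZ']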

thepyedpiper commented Jan 10, 2024 via email

thepyedpiper commented Nov 14, 2024 via email

krackers commented Nov 14, 2024

I've uploaded the version I use at https://gist.github.com/krackers/3443b5e175096ba5e790360353a864cc

Note that since it's the version I use personally, it has some other fixes/changes as well and is tailored to my needs/tastes (e.g. look at the other issues I opened in this repo; some of these changes are probably API-breaking, especially the switch to generators for pp).

If I remember correctly, this issue in particular came from two observations: we were splitting unnecessarily even when the results were never going to be used, and we were calling builtins.eval repeatedly. "Fixing" the former technically requires losing some features, while the latter can be fixed without any loss of features.
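
For illustration, a rough sketch of the first idea (hypothetical names and splitters, not the actual pyp code): only compute a split variable when its name actually appears in the user's expression, so lines are never split just to throw the result away.

    # Rough sketch: skip computing split variables whose names never appear in
    # the user expression (hypothetical names, not pyp's actual implementation).
    def build_split_variables(line, expression):
        splitters = {
            "whitespace": lambda s: s.split(),
            "slash": lambda s: s.split("/"),
            "underscore": lambda s: s.split("_"),
        }
        variables = {}
        for name, splitter in splitters.items():
            # A simple substring check; a stricter version could inspect the
            # compiled expression's co_names instead.
            if name in expression:
                variables[name] = splitter(line)
        return variables

    print(build_split_variables("a/b c_d", "slash[0] + underscore[-1]"))
    # {'slash': ['a', 'b c_d'], 'underscore': ['a/b c', 'd']}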

As for a benchmark, re-running the same command as in the opening post

seq 1 235976 | python3 -m cProfile -s tottime $(which pyp) "p | pp[-1]"

I get

235976
         10401075 function calls (10400749 primitive calls) in 2.735 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   235977    0.576    0.000    0.976    0.000 pyp:1843(process_line)
   235976    0.331    0.000    0.985    0.000 pyp:443(__init__)
   471952    0.226    0.000    0.383    0.000 posixpath.py:100(split)
   235978    0.126    0.000    0.162    0.000 pyp:1484(update_history)
   235976    0.118    0.000    0.212    0.000 pyp:1597(format_history)
   235976    0.095    0.000    0.095    0.000 pyp:134(__init__)
   235976    0.089    0.000    0.169    0.000 pyp:539(split)
   235977    0.085    0.000    0.096    0.000 pyp:1416(safe_eval)
   235977    0.083    0.000    1.251    0.000 pyp:2015(<genexpr>)
   235979    0.077    0.000    2.304    0.000 pyp:1838(<genexpr>)
        2    0.068    0.034    2.718    1.359 more.py:2437(_islice_helper)
   235976    0.064    0.000    0.083    0.000 pyp:1990(strip_last_newline)
   235976    0.062    0.000    0.062    0.000 pyp:741(__init__)
   235977    0.061    0.000    2.634    0.000 pyp:1615(<genexpr>)
   235976    0.059    0.000    0.086    0.000 pyp:751(__getitem__)

But again, this is with a mix of changes included, and I don't really want to go through and create a clean PR for just the compile change. Hopefully you can piece something together out of the above, though.

thepyedpiper commented Nov 14, 2024 via email

@krackers

Sure, I tested the pyp 3.0.9 source straight from PyPI, which gives this result:

> seq 1 235976 | python3 -m cProfile -s tottime $(which pyp) "p | pp[-1]"

235976
         98304563 function calls (98304330 primitive calls) in 40.519 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4955517    8.215    0.000   16.163    0.000 pyp:407(__init__)
   236005    5.113    0.000    5.113    0.000 {method 'pop' of 'list' objects}
  9911034    4.359    0.000    7.222    0.000 posixpath.py:100(split)
   235979    4.267    0.000    4.289    0.000 {built-in method builtins.eval}
  4955517    1.833    0.000   19.723    0.000 pyp:1202(<genexpr>)
        3    1.609    0.536    1.652    0.551 pyp:1604(stamp_history)
   235977    1.336    0.000   36.755    0.000 pyp:1882(process_line)
  4719540    1.265    0.000   16.715    0.000 pyp:1202(<listcomp>)
  4719540    1.174    0.000    1.174    0.000 pyp:686(__init__)
   235982    1.148    0.000    1.148    0.000 {built-in method builtins.dir}
   235977    1.076    0.000   21.376    0.000 pyp:1157(string_splitter)
  9911041    1.037    0.000    1.559    0.000 posixpath.py:41(_get_sep)
  9911066    0.975    0.000    0.975    0.000 {method 'rfind' of 'str' objects}
   235977    0.657    0.000    0.910    0.000 pyp:1325(translate_preset_variables)
  7315301    0.623    0.000    0.623    0.000 {method 'split' of 'str' objects}
   235977    0.606    0.000    2.367    0.000 pyp:1504(update_history)
        2    0.576    0.288    2.306    1.153 pyp:1805(format_input)
  4955496    0.568    0.000    0.568    0.000 pyp:1912(<genexpr>)
  9911670    0.522    0.000    0.522    0.000 {built-in method builtins.isinstance}
   707933    0.330    0.000    0.330    0.000 {method 'update' of 'dict' objects}
  9911102    0.329    0.000    0.329    0.000 {built-in method posix.fspath}
  4956618    0.287    0.000    0.287    0.000 {method 'rstrip' of 'str' objects}
   235977    0.259    0.000    4.752    0.000 pyp:1429(safe_eval)

thepyedpiper commented Nov 14, 2024 via email

thepyedpiper commented Nov 15, 2024 via email

krackers commented Nov 16, 2024

How long did this take you?

I think I was noodling with it around last Christmas? I didn't spend too much time on it, just enough so that I could use it as an awk replacement without blowing up memory. Some of the more niche features of pyp are probably broken by all the generator stuff (e.g. I didn't test too hard whether things like history still work), but to me that's an acceptable tradeoff.

pp.sort() doesn't work...this should take the whole input and treat it like a line by line array

Hm, this is a good catch. Explicitly converting to a list, like

seq 10 1 | pyp "sorted(list(pp))"

works, but even then the following does not:

seq 10 1 | pyp "list(pp).sort()"

(Actually, Python's sorted() accepts generators, so sorted(pp) is sufficient.) I'd need to look through the code again to see why this is; I'm guessing it's because we no longer operate in place on a pre-materialized pp array, so there's no way to implicitly "pass" the output of an in-place sort on to the next stage: you always need to explicitly return something. A lazy hack to make it work is to define .sort() and .reverse() operations on PowerPipeList that are equivalent to sorted(self), etc.
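
For illustration, a sketch of that lazy hack, assuming a generator-backed PowerPipeList (the names are approximate, not the actual class in the gist): sort() and reverse() materialize the stream and return a new wrapper instead of mutating in place, so the result can flow to the next pipe stage.

    # Sketch of giving a generator-backed pp a list-like sort()/reverse() that
    # return a new value instead of mutating in place (approximate names only).
    class PowerPipeList:
        def __init__(self, iterable):
            self._iterable = iterable

        def __iter__(self):
            return iter(self._iterable)

        def sort(self, key=None, reverse=False):
            # Materialize the stream and hand back a sorted copy, so the result
            # can be passed on to the next pipe stage.
            return PowerPipeList(sorted(self._iterable, key=key, reverse=reverse))

        def reverse(self):
            return PowerPipeList(reversed(list(self._iterable)))


    pp = PowerPipeList(str(n) for n in (10, 2, 33))
    print(list(pp.sort(key=int)))  # ['2', '10', '33']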

pp also has an "analysis" mode, so just a plain pp would return the index per line so you can pick out particular lines easily... for example:

So I do know that the analysis feature works element-wise for arrays:

seq 10 1 | pyp "[p, p, p]"

I don't know what I did that broke it in pp mode. It should definitely be possible to get it working, since adding the indices is an easy operation on generators. Probably the code near

        for idx, out in enumerate(output):
            self.n = idx
            ignore, result = self.update_history(out, second_stream_input, file_input, None, return_output=True)
            if not ignore:
                print(result)

would need to be updated; you already have access to idx there, so it's just a matter of formatting it properly in the PowerPipe output.
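
For illustration, a rough sketch of what that formatting could look like (hypothetical output format and helper; the real update_history/PowerPipe plumbing is more involved):

    # Rough sketch of re-adding per-line indices for analysis mode on a
    # generator-backed output (hypothetical formatting, not the actual code).
    def print_with_indices(output, analysis_mode=True):
        for idx, out in enumerate(output):
            if analysis_mode:
                # Prefix each emitted line with its index, e.g. "[3]  some line"
                print(f"[{idx}]  {out}")
            else:
                print(out)

    print_with_indices(line.strip() for line in ["alpha\n", "beta\n", "gamma\n"])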

thepyedpiper commented Nov 16, 2024 via email

thepyedpiper commented Nov 21, 2024 via email

@krackers

I've modified what you have to get back the missing array methods and analysis mode behavior

Neat, can you post a diff somewhere? If I have time this Christmas I'll noodle around and see if I can squeeze out more performance improvements.

thepyedpiper commented Nov 21, 2024 via email

thepyedpiper commented Dec 12, 2024 via email

@krackers

Here is my latest

Is there supposed to be something attached? (I don't see anything on the releases page either.) Changes 2-4 seem good to me, especially (4), which is a good addition I overlooked. I personally don't like (1), but I realize it's needed for backwards compatibility; I'll probably just revert that one in my own copy.

@thepyedpiper

I'm manually reposting this since it didn't appear here although the email was sent.

Sorry, I guess attachments don't work. I've posted the changes here as an alpha:

https://github.com/thepyedpiper/pyp/releases/tag/v3.0.11

It should be easy to revert (1); it was just a few lines you can kill here:

if hasattr(total_output[0], '__iter__'):  # checks for iterable, otherwise ints break it
    if tuple([x for x in total_output[0] if type(x) in [str, PypStr]]) == total_output[0]:
        total_output = [PypStr(' '.join(total_output[0]))]

Glad you like the other stuff!

cheers,

Toby

krackers commented Dec 30, 2024

Made several fixes - https://gist.github.com/krackers/3443b5e175096ba5e790360353a864cc

The changes were mostly fixing up the other methods in the PowerPipeList class to properly handle the streaming logic. Additionally, some minor changes were made to fix output to a non-tty.

I didn't test anything related to fpp and spp, but I didn't touch that code either, so if you tested it then it should still work. By the way, I noticed that we're still loading the entire file contents into memory; I wonder if we can switch that to be lazily loaded as well?
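
For illustration, a minimal sketch of what lazy file loading could look like (hypothetical helper; the actual fpp/spp handling in pyp is more involved):

    # Minimal sketch of lazily yielding file lines instead of reading the whole
    # file into memory up front (hypothetical helper, not pyp's actual code).
    def lazy_file_lines(path):
        with open(path) as handle:
            for line in handle:          # the file object itself is a lazy iterator
                yield line.rstrip("\n")

    # Usage: nothing is read until the generator is consumed.
    # for p in lazy_file_lines("/var/log/syslog"):
    #     process(p)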

@krackers

Also, the major version should probably be bumped, since this is a fairly major rewrite touching a lot of deep internals. I assume you have some sort of test suite for this? (There are a lot of subtleties around history management, output formatting, etc. that may have been altered in the rewrite.)

thepyedpiper commented Jan 4, 2025 via email
