
feat: Weighted cutflow #1272

Merged
merged 13 commits into from
Feb 24, 2025
Conversation

@NJManganelli (Collaborator) commented Feb 6, 2025

This PR adds an (opinionated) take on weighted cutflows in coffea, permitting the cutflow to be instantiated with a Weights object and a weights modifier (i.e. a weight-variation name, as in Weights.weight("jesUp")).

Edit: An equivalent in NMinusOne would also be highly desirable, but this PR has become large enough that it should be split off (and incorporate any changes requested in review).
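Conceptually, a weighted cutflow replaces raw event counts with sums of per-event weights at each stage of the cumulative selection. A minimal numpy sketch of the idea (illustrative only; this is not coffea's implementation, and the function name is made up):

```python
import numpy as np

# Illustrative sketch of a weighted cutflow (NOT coffea's actual code):
# each cut is a boolean mask over events; the weighted yield after cut i
# is the sum of per-event weights for events passing cuts 0..i cumulatively.
def weighted_cutflow(masks, weights):
    yields = [weights.sum()]  # weighted yield before any cut
    cumulative = np.ones_like(weights, dtype=bool)
    for mask in masks:
        cumulative = cumulative & mask
        yields.append(weights[cumulative].sum())
    return yields

weights = np.array([1.0, 0.5, 2.0, 1.5])
cut_a = np.array([True, True, False, True])
cut_b = np.array([True, False, False, True])
print(weighted_cutflow([cut_a, cut_b], weights))  # [5.0, 3.0, 2.5]
```

Swapping in a different weight variation (e.g. a "jesUp" modifier) just means passing a different `weights` array through the same machinery.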

@NJManganelli NJManganelli changed the title [Feature] Weighted cutflow feat: Weighted cutflow Feb 7, 2025
@lgray (Collaborator) commented Feb 7, 2025

FAILED tests/test_analysis_tools.py::test_packed_selection_cutflow_dak[True] - TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.
FAILED tests/test_analysis_tools.py::test_packed_selection_cutflow_dak[False] - TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.
FAILED tests/test_analysis_tools.py::test_packed_selection_cutflow_dak_uproot_only[True] - TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.
FAILED tests/test_analysis_tools.py::test_packed_selection_cutflow_dak_uproot_only[False] - TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.

@NJManganelli (Collaborator, Author):

Aye, those failures are from the attempt to use fill_flattened in dask, which doesn't currently work, so it will be rolled back to manual flattening for now.

Nick Manganelli added 2 commits February 17, 2025 17:41
…edSelection class

Adjust result object to not break unpacking of cutflow result, restoring one test; need dask-contrib/dask-awkward#531

Remove unnecessary return Cutflow

Add boolean_masks_to_categorical_integers for optimization of histogram fill in cutflow

WIP add the optimized version of filling cutflows, with a testable v2 variable temporarily, and new categorical option for additional axis (needs documentation and example)

Rename slice to slc

WIP weighted cutflow with commonmask, categorical axes, plot_vars still needs much work and cleanup

Untested plot_vars function with commonmask and categorical support, using broadcast_arrays func

Fixes for original tests to pass

WIP - fixes from tests, expanded print header

revert change to masks in compute for npz

Fix commonmask handling of initial weighted event counts

Some logic fixes for weights in npz file

Fix for v2 eager mode

Handle delayed cutflow to_npz pathway, Weights does not have a compute method

black src/coffea/analysis_tools.py
…low from PackedSelection class

weights testing okay, histos next

weighted and commonmask tests in place, categorical next

stricter tests for weights, passing

Try v2 yieldhist

categorical tests added for yieldhist

Added categorical tests, need one more specific to histograms then cleanup

Added cutflow_extended_dak tests

black tests/test_analysis_tools.py
Nick Manganelli and others added 6 commits February 19, 2025 18:39
…ficant performance difference, fix delayed commonmasked path, remove debug print statement, add scale for all paths of yieldhist, plot_vars; reduce LOC and make more consistent
…rmitting parallel testing by pytest-xdist; 1 core speed: '52 passed, 729287 warnings in 487.82s (0:08:07)', 4 core speed: '52 passed, 732599 warnings in 148.59s (0:02:28)'
@NJManganelli (Collaborator, Author) commented Feb 19, 2025

I'll convert this to ready for review, presuming all the tests (very, very slowly) pass.

I tacked on a change to the npz tests to write to temporary directories, which should allow running them in parallel (verified to work on my laptop); cross-reference #1278.
Also, see this issue, which might be alleviated by this change: #1246

The extended cutflow tests technically cover the non-extended tests when all non-dask options are False, but I left the original tests in place out of paranoia. Importantly, though, the dak tests call compute a lot (several dozen times), so an individual test can be O(1 minute). These tests might benefit from refactoring to follow the ideal dask pattern a bit more: set up the graph, then execute/compute once. But that is best left to a dedicated PR, IMO (no time to fold it into this one right now). Edit: The optimization has now been added in a pair of commits; see the Feb 20th updates.

Now the opinionated part:

  1. The cutflow expects a Weights() instance and, optionally, a specific (str) modifier to pass to it. This is partly because I originally envisioned (optionally) passing in a list of weight variations and being able to fill a new systematics axis with them. That may be worth doing one day, but not in this PR.
  2. I added a "commonmask" feature. This lets you apply another boolean array, potentially external to the PackedSelection, to every individual and stacked cut. Originally I used this to handle samples where many different GenModels are stored together per file, with an event-level bit indicating that an event belongs to a particular GenModel hypothesis. Since there can easily be 100s of these hypotheses in the same file (which, BTW, is absolutely terrible for performance when looping over the corresponding masks and filling histograms per hypothesis), they don't fit in a PackedSelection, and creating copies of PackedSelection per hypothesis, when the only cut that differs is effectively a single boolean array, is very wasteful and blows up the task-graph complexity. That was the original motivation, but it works just as well, if not better, for many other cases, and lets you start the cutflows off from some 'base' more easily (one can also ignore the initial unmasked fill, but I like this a bit better).
  3. I added a "categorical" feature. This was also developed for the GenModel case originally, when it became apparent that masking every array for each hypothesis and filling separate histograms, etc. was insanely expensive time-wise. Ergo, this allows one to pass in an event-by-event categorical value (ideally an int, to be filled into an Integer or IntCategory histogram axis) and instantiates an additional axis that can be sliced/projected/etc. Eventually, I'd like this to support a StrCategory axis in an efficient way, but that will depend on getting this feature implemented: https://iris-hep.slack.com/archives/C03UA418VJ8/p1719933533612429
  4. I added a scale to the print, yieldhist, and plot_vars methods. Originally I only put it in print, figuring it is really for the user to scale histograms afterwards; in particular, one cannot (currently) scale a dask-histogram prior to compute and have it registered in the dask task graph (so no scaling will occur pre-compute). Since #1270 ("support per-event weights for PackedSelection.cutflow") indicated a desire for both per-event weighting (which I hope can cover the use case detailed further on the iris-hep slack, with external tools providing weights of some kind, but I'm not sure) and broadcastable weights, I worked an implementation in for the histogram methods as well. I hesitated to balloon the testing time further with an explicit scale test, but I did a variation locally that worked as expected, so it would make sense to add something later (also for the print function, which doesn't have any coverage in the tests I based these extended ones upon)...
  5. The yieldhist and plot_vars methods do not follow the same design pattern (as I wrote the latter recently and the former close to a year ago; this is unfortunate, but it also seemed a necessity to handle the jagged arrays expected for plot_vars). The latter follows a simpler cut-by-cut filling scheme and has a unified method for delayed/non-delayed (I hope using an appropriate approach, but let me know if not). The former, in order to support all features, has three code paths (two being the updated non-delayed and delayed versions from before, now supporting weights and commonmasks; the third supporting categorical fills with any combination of the other features, but at a performance cost, since it cannot rely on the nevonecut/nevcutflow counts, or the corresponding weighted variants, to do a fast few-entry fill). The last code path covers all use cases, but in an end-to-end test ingesting a ~2GB file, setting up PackedSelection and Weights instances with just a few entries, then calling yieldhist and computing, it averaged about 1 minute for the eager version on the 'fast path' and about 3 minutes on the per-event fills in the unified/categorical path. That prompted me to leave the updated delayed and non-delayed paths in. A future change might unify these, presuming others agree it's good to keep them around, since faster is better where possible (at the expense of more code maintenance, of course...). The yieldhist method also relies on some code that converts a list of boolean arrays into a (ragged) set of categorical integers. That was originally developed to improve performance when I was trying to discover the best way forward a year ago. Whether it's worth it now, in hindsight, would take more testing and mimicking the cut-by-cut approach again.
Finally, I'll cross-reference dask-contrib/dask-awkward#531 ("TypeError: meta must be an instance of an Awkward Array, not <class 'module'> triggered by fill_flattened for a dask-histogram and dask-awkward arrays"), as fill_flattened would reduce LOC by a good chunk but needs some development (as outlined by Angus in his reply to that issue).
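The commonmask and categorical ideas above can be sketched in plain numpy (a conceptual sketch only; the function name, signature, and table layout here are hypothetical, not coffea's API):

```python
import numpy as np

# Hypothetical sketch (NOT coffea's implementation) of combining a commonmask
# with a per-event categorical axis in a weighted cutflow.
def categorical_cutflow(masks, weights, categories, commonmask=None):
    # commonmask: an extra boolean array ANDed into every stage, e.g. a
    # per-event GenModel-hypothesis bit stored outside the PackedSelection.
    base = commonmask if commonmask is not None else np.ones(len(weights), dtype=bool)
    cumulative = base.copy()
    ncat = categories.max() + 1
    table = []  # one row of per-category weighted yields per cutflow stage
    for mask in [None] + list(masks):
        if mask is not None:
            cumulative = cumulative & mask
        # a single weighted bincount replaces a per-category loop of fills
        row = np.bincount(categories[cumulative],
                          weights=weights[cumulative], minlength=ncat)
        table.append(row)
    return np.array(table)

weights = np.array([1.0, 0.5, 2.0, 1.5])
categories = np.array([0, 1, 0, 1])          # e.g. hypothesis index per event
cut = np.array([True, True, False, True])
common = np.array([True, True, True, False])  # base mask applied everywhere
print(categorical_cutflow([cut], weights, categories, commonmask=common))
```

The single `bincount` (standing in for one categorical histogram fill) is what avoids the per-hypothesis loop over masks that the comment describes as terrible for performance.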

Fin
(but not really)

@NJManganelli NJManganelli marked this pull request as ready for review February 19, 2025 20:59
@NJManganelli (Collaborator, Author):

[Screenshot 2025-02-19 22:24:00: non-parallel testing times, for comparison to the pytest-xdist PR]

@lgray (Collaborator) commented Feb 19, 2025

Have we got any idea about the really large time taken for the new tests?

@lgray (Collaborator) commented Feb 19, 2025

Could you please mark the tests that use dask, as was done in the recent PR from Andrzej?

@NJManganelli (Collaborator, Author) commented Feb 19, 2025

Should any test using dask be marked, or only those instantiating/utilizing a Client()?

BTW, I tested these cutflow_dak tests with a Client and it was slower on my machine than without.

The original cutflow_dak test (and therefore the 16 variations of cutflow_extended_dak, which started as a carbon copy of cutflow_dak) takes more time than it really should because of the frequent calls to compute. It should be refactored to reduce those calls, as noted above, but it'll take a bit of moving things around to make that work.
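The "compute once" refactor being described follows the standard dask idiom: build all delayed results first, then materialize them with a single dask.compute call so shared graph nodes are evaluated once instead of once per .compute(). A toy sketch (illustrative, not the actual test code):

```python
import dask

@dask.delayed
def yield_after_cut(weights, mask):
    # toy stand-in for a delayed cutflow-yield computation
    return sum(w for w, m in zip(weights, mask) if m)

weights = [1.0, 0.5, 2.0]
a = yield_after_cut(weights, [True, True, False])
b = yield_after_cut(weights, [True, False, True])

# Anti-pattern: a.compute(); b.compute() -- two separate graph executions.
# Preferred: one traversal evaluates both results together.
res_a, res_b = dask.compute(a, b)
print(res_a, res_b)  # 1.5 3.0
```

In a test this means collecting every histogram/count into one dask.compute at the end, then asserting on the materialized results.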

For comparison, the test times after Andrzej's PR was merged in:
[Screenshot 2025-02-20 00:30:27: test times after Andrzej's PR]

@NJManganelli (Collaborator, Author):

From my understanding, marking these tests as with-dask would prevent them from running in parallel, and these ones are closer to ideal scaling than not. Let me know if I've misunderstood.

Some refactoring has eliminated pretty much all the extraneous computes. There's about a 7x improvement (read: reduction) in the dak tests added for this PR; times for just that test and for the whole analysis_tools test suite follow:

d98a039==a184c35 corresponds to optimized tests
98d68ef corresponds to a halfway point where many compute calls were removed
c68955c corresponds to the version whose times I posted most immediately above, with dozens of compute calls.

Commit d98a0395, pytest -n 1 tests/test_analysis_tools.py, 52 passed, 208063 warnings in 159.53s (0:02:39)
Commit d98a0395, pytest -n 4 tests/test_analysis_tools.py, 52 passed, 211375 warnings in 52.04s
Commit d98a0395, pytest -n 4 tests/test_analysis_tools.py::test_packed_selection_cutflow_extended_dak, 16 passed, 74500 warnings in 19.75s
Commit 98d68ef2, pytest -n 1 tests/test_analysis_tools.py, 52 passed, 348223 warnings in 248.91s (0:04:08)
Commit 98d68ef2, pytest -n 4 tests/test_analysis_tools.py, 52 passed, 351535 warnings in 75.05s (0:01:15)
Commit 98d68ef2, pytest -n 4 tests/test_analysis_tools.py::test_packed_selection_cutflow_extended_dak, 16 passed, 214660 warnings in 49.92s
Commit c68955c6, pytest -n 1 tests/test_analysis_tools.py, 52 passed, 729287 warnings in 489.46s (0:08:09)
Commit c68955c6, pytest -n 4 tests/test_analysis_tools.py, 52 passed, 732599 warnings in 148.17s (0:02:28)
Commit c68955c6, pytest -n 4 tests/test_analysis_tools.py::test_packed_selection_cutflow_extended_dak, 16 passed, 595724 warnings in 132.85s (0:02:12)

@ikrommyd (Collaborator):

Thanks for the refactoring, Nick! It was a mistake when I wrote those tests like that (I didn't know the impact back then) and I just never fixed it afterwards.

@NJManganelli (Collaborator, Author):

To play devil's advocate: while it's a bit of an anti-pattern, the tests are a lot clearer to write with all those computes.

Note that I've left the original ('non-extended') test alone. On my M4 Pro with 1 core, it takes 50s to run both variations of it (optimization on/off), whereas the extended_dak test I added based on it has 2x2x2 (i.e. 8x) as many variations, hence the painfully long additional time on the GitHub runners. While that's an optimization opportunity, it'll take a similar amount of shuffling around as for the extended_dak test.

For posterity, the run times now:
[Screenshot 2025-02-20 19:58:05: updated run times]

@ikrommyd (Collaborator):

It may also make sense not to check both optimization options at some point. Those were catching issues in earlier stages, when optimization was messing something up, but they haven't found anything in a while. I would argue that it's generally bad to remove tests, but maybe it's fine to always have optimization on in the future.

@lgray (Collaborator) commented Feb 21, 2025

Honestly, for the weights you may only need to check that the typetracer and the graph build without failing.
The mechanisms for touching data on individual columns are pretty thoroughly tested and probably don't need further exercise here. Just a thought.

@NJManganelli (Collaborator, Author):

Any feedback/change requests on the implementation?

For the new tests, I could disable things now (just optimization-off?), though at about 19s to run all variations multicore on my Mac (maybe 2x worse on the GitHub runners?) there won't be huge savings there. Disabling it in the older test could save a similar amount of time. Just let me know. This also seems like it could be rolled into a dedicated test-optimization pass to assist in improving the scikit-hep integration testing.

@lgray (Collaborator) commented Feb 24, 2025

I think it's in a reasonable state at this point; the tests are a bit long, but we can whittle on this.
The changes have grown the code into a large enough base that factoring it out into its own library area within coffea, for readability, probably makes sense, but that can come as another PR.

Forking off the optimization on/off tests into their own jobs is a good idea, but that can also come in another PR.

Thanks for the addition!

@lgray lgray merged commit 7f65c2f into scikit-hep:master Feb 24, 2025
21 checks passed
@ikrommyd (Collaborator):

@NJManganelli Would you like to extend this to N-1 hists as well at some point? If you ever get some time, of course. It would mostly be copy-pasting, because the original code is pretty much the same.

@NJManganelli (Collaborator, Author):

Hi @ikrommyd, yeah, it should be extended there too. As tempting as it is to 'balance things out,' I'll need to wait a while and work on some Upgrade, Analysis, and R&D work first.

Thanks @lgray , should I make an issue or two to track these to-dos? Test time reductions, refactorization of analysis tools, extension of N-1...

@lgray (Collaborator) commented Feb 25, 2025

Yes, please make a tracking issue! Thanks again!
