-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimagining histogram autobin #3001
Comments
Ok, wow. I didn't realize how much our autobin logic got complex in #1944 🔬 As you might expect, I'm leading toward calling the proposal a breaking change, but I could be convinced otherwise. Here are some questions and comments:
Was this a conscious decision in #1944? Would things break if all those "subsequent" traces get
I'm glad the (heavy) stacked area option logic has some parallels elsewhere in the library 👍
I agree 💯 % here
Is dropping autobin(x|y) entirely necessary to get your proposal to work? Sounds to me like autobin(x|y) will still be valuable in e.g. the chart-editor where one could easily toggle between the auto values and their set values even if we make all non-overlaid histogram traces share the same bins. All in all, it sounds to me the the proposed algorithm could be an opt-on feature (in v1.x) via a
|
No, pure bug. It wouldn't break things, but given that this is already broken I feel we have some extra flexibility to improve things, as long as we don't change things that currently work.
Perhaps not, but I'm not quite sure how
I feel like there's actually more potential for confusion if we have If And if we get that far, the entire Is there something else
Unless we make some option for multiple overlay groups (where each group could have independent binning but we stack or something within each group... would be strange), I think this has to be all-or-nothing, at least per-subplot. So I suppose we could make a boolean to opt into this... but I'm hopeful that we can design it without that in such a way that the initial render is close enough to current behavior (ie just fixes bugs) and changes are mostly confined to dynamic behavior. |
A few things to consider if we stopped mutating autobin values into
It will be interesting to see how 🔪
This gets my vote, but if
turns out ot be true when the PR is ready for review, I might change my mind. |
For the record, some aspects of the current Single traceStarting from
As proposed above, the only change to first draw behavior would be case 3, where partial bin specification would respect the specified parts and auto-determine the others. But all Multiple traces
All of these scenarios have problems, other than (3) explicit and complete autobin (though even that I think we agree we should pick the bin size differently and usually larger). I'd say the multi-trace behavior is broken enough that anyone who's currently using it successfully must either have data that happens to work fine fully auto, or must have inserted complete manual (and explicitly matching!) bin specs. Neither of those cases would break using the above proposal. |
Referencing #1318 which shares some thoughts about autobins and CDF histograms. |
Referencing #1282 for some "old" thoughts on |
Histogram autobin works relatively well for a single trace, but could be better. But for multiple traces, despite several attempts to clean it up over time (#2028, #1944, #1901, ...), it has a bunch of problems:
autobinx: true
back to the first trace, butautobin: false
to all subsequent traces. This doesn't have any immediate impact, but if you then try to alter bins later, the results depend on which trace you edit. In particular if you edit the first one, (which seems logical, right?) the later ones will keep their original bin sizes https://codepen.io/alexcjohnson/pen/pOVrGj?editors=0010, but if instead you edit the other two (so change[0]
to[1,2]
in therestyle
call at the end) the first one will get this new size as well.barmode='overlay'
. I initially thought we might need a "bingroup
" attribute similar to scatter'sstackgroup
but now I think it's better to just enforce a match across the already known group.gd.data
(autobinx
,xbins.(start|end|size)
) - and doing so in buggy ways at that.Proposal:
autobin(x|y)
entirely, and just use the (improved) autobin routine to fill in whatever gaps there are in the explicitly specified attributes. So if you have explicitly specified an attribute and would like it to revert to auto, instead of turning onautobin
you would delete the value.restyle
we can convertautobinx: true
intoxbins: null
. I'll have to investigate the existing behavior whenautobinx: true
andxbins
are both specified upfront to figure out what we would want to do incleanData
.nbinsx
iff an explicitxbins.size
is not found (in any trace in the group)xbins
, and stash these and only these infullTrace._xbins
, but have bothsupplyDefaults
andcalc
ensure the two are merged infullTrace.xbins
. So for example, if you specifytrace.xbins: {end: 30}
we'll setfullTrace._xbins: {start: 0, size: 5}
andfullTrace.xbins: {start: 0, end: 30, size: 5}
. The reason for this isPlotly.react
: we want the full set infullTrace
, andsupplyDefaults
needs access to the auto-generated values, but in case you delete an explicit value fromtrace
we don't wantsupplyDefaults
to be able to fill it back in, so we will see the change and trigger acalc
.@etpinard I know this is a fairly big change, potentially breaking for some users, but the existing behavior is sufficiently broken and problematic already that I think we should consider it. This would also allow us to take a broader view of what "autobin" means, realizing it's not just a boolean but
size
,start
, andend
can all be independently auto or explicit.Note that
start
does not need to match exactly from one trace to the next but does need to be compatible (all of them must be the same modulosize
).end
can be completely independent from trace to trace. So this logic will still be a little bit intricate...The text was updated successfully, but these errors were encountered: