Fix inaccuracies in rank estimation when using FFT histogram method #184
Conversation
Thanks for your contribution! We require contributors to sign our Contributor License Agreement, and we don't have you on file. In order for us to review and merge your code, please follow the instructions on our website.
Can you show a comparison of the rank estimation between NTL, the old method, and this fix? Does it match NTL?
```diff
@@ -7,11 +7,17 @@ mod bnp {
     include!(concat!(env!("OUT_DIR"), "/binding_bnp.rs"));
 }

+const CONV_LIMIT: f64 = 34359738368.0 as f64; //define a certain limit
```
Please explain the choice of this value in more detail
While testing the original histogram-based ranking implementation, I observed that it begins to fail when the sum of the states (bin counts) exceeds 2^70. Since the FFT convolution involves multiplying bin counts, I set the limit to the square root of this value (2^35 = 34359738368), to ensure that the 2^70 limit is never exceeded.
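To make the bound concrete: if the total count of each input histogram is at most 2^35, every output bin of the convolution (a sum of products of bin counts) is bounded by 2^35 * 2^35 = 2^70. A minimal sketch of such a rescaling guard (hypothetical helper, not the PR's actual code):

```rust
/// Sketch only: keep total bin counts below CONV_LIMIT so that products
/// formed inside the convolution stay below ~2^70, where the f64-based
/// FFT convolution was observed to fail.
const CONV_LIMIT: f64 = 34359738368.0; // 2^35 = sqrt(2^70)

/// Divide all bins by a common factor when the total count exceeds the
/// limit; return the factor so the caller can accumulate it.
fn rescale(bins: &mut [f64]) -> f64 {
    let total: f64 = bins.iter().sum();
    if total <= CONV_LIMIT {
        return 1.0;
    }
    let scaler = total / CONV_LIMIT;
    for b in bins.iter_mut() {
        *b /= scaler;
    }
    scaler
}
```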
Can you explain in more detail why it happens at 2^70 and why your solution works? You also need to edit the comment to explain that constant in more detail.
@rishubn thank you for your feedback. Below, I list the output values for the edge case I implemented, using the three requested ranking methods (NTL, original histogram, scaled histogram). As before, the first lines show the results from rank_accuracy, and the second lines present the log2 of these results.

[numeric results for NTL, the original histogram, and the scaled histogram not preserved in this extract]

As you can see, the original histogram implementation is far off, but the scaled version is nearly the same as NTL (accuracy within a 0.052 margin). This difference is caused by the rounding in the LB and UB histograms.
Hi,
Thanks for the PR and the results, that looks very promising.
A few high-level remarks:
- It would be good to keep the old F64Hist implementation around, and add a new method. It might not be useful to expose it to users of the library, but internally it is interesting to be able to keep evaluating the different approaches (in accuracy and performance).
- Did you evaluate the performance impact of this change? At a high level, it should be around 2x (twice as many FFTs), but it might be that the divisions by `scaler` are a bottleneck for some sizes, making the algorithm much slower (if that happens, pre-computing `1/scaler` should fix the problem; see the sketch after this list).
- I'm a bit surprised by the high value for `CONV_LIMIT`: given that `f64` mantissas are 53-bit, I expected `CONV_LIMIT` to be closer to `2**26`. This makes me a bit uncomfortable with the reliability of the code (how much precision do we lose due to rounding?). As a long-term future idea, using integers instead of floats would close this question (that would be quite a bit of work: we could not use complex-number based FFTs, but would need integer FFT/NTT; I don't know if there is a library with a good implementation of those, besides arbitrary-precision libraries like NTL).
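For the division remark, a minimal sketch of the suggested pre-computation (one division hoisted out of the loop, then cheap multiplications; the helper name is illustrative):

```rust
/// Scale all bins by 1/scaler without a division per bin: divisions are
/// much slower than multiplications on most CPUs, so we pre-compute the
/// reciprocal once.
fn scale_bins(bins: &mut [f64], scaler: f64) {
    let inv = 1.0 / scaler; // single division
    for b in bins.iter_mut() {
        *b *= inv;
    }
}
```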
Hi @rishubn @cassiersg, apologies for the delay in the update: I encountered a bug that took longer than expected to resolve. Thank you very much for your valuable feedback; I have tried to incorporate all of your suggestions into the implementation. Specifically, I've separated the scaled histogram implementation from the original, and you can now use it via the "scaledhist" method.

You were absolutely right about the convolution limit. After triggering some more edge cases, I observed that my previous implementation still returned negative values in some cases. I've corrected that, and the value (2^26) should now be suitable for our needs.

Regarding performance, the new implementation requires approximately twice as much memory, since it uses two histograms. The execution speed varies depending on the number of rescalings needed. For example, I measured the execution time of the rank_accuracy function in the two new test cases, once with the "scaledhist" method and once with the original "hist" method. In typical cases, where minimal rescaling is needed, the difference is negligible: the original implementation took about 0.478 seconds, while the scaled version took about 0.504 seconds. However, in the edge case where numerous rescalings are required to avoid faulty results, the difference is more significant: the original "hist" implementation took about 0.092 seconds, while the scaled version took about 0.601 seconds.

I hope this update better aligns with the project's requirements, and I look forward to your feedback.

PS: I think ci.yml has to be updated.
Hi! Thanks for the new version, it looks great. I would like to make this the default rank estimation method (i.e., change the default method).

I do not understand how you have cases with "numerous scalings": the code is always running the scaling, no? Overall, the scaling should have negligible cost, since it is O(n) (with a small constant) while the FFT is O(n log n), so it's not obvious where the issue is coming from. One possibility is the overhead of re-allocating new `Vec`'s.

If that doesn't solve the issue, then I'm not sure what the problem is; we will probably need to profile the code to get a better idea. However, before doing that, it might be worth checking if the slowdown could come from the scaling creating many denormal numbers (on which the CPU is very slow): https://easyperf.net/blog/2018/11/08/Using-denormal-floats-is-slow-how-to-detect-it
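To test the denormal hypothesis, a minimal sketch using Rust's `f64::is_subnormal` (the helper and the flush-to-zero strategy are assumptions for diagnosis, not a confirmed fix):

```rust
/// Count subnormal f64 values in a histogram and flush them to zero.
/// CPUs process subnormal (denormal) floats very slowly, so if many
/// appear after repeated down-scaling, flushing them may restore speed;
/// setting a lower-bound bin to zero keeps it a valid lower bound.
fn flush_subnormals(bins: &mut [f64]) -> usize {
    let mut flushed = 0;
    for b in bins.iter_mut() {
        if b.is_subnormal() {
            *b = 0.0;
            flushed += 1;
        }
    }
    flushed
}
```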
(CI is fixed, you can rebase on main.)
Force-pushed from 040d2fe to a29064a (commits: "…to ensure adaptability for high bin counts", "…king bounds, and apply pull request suggestions").
Hi! Thank you for your quick response! I applied some changes (including the ones you suggested), and in my example the "hist" method returned the result in 0.099279 seconds and the scaled version in 0.213483 seconds. This indicates that the computation time is approximately twice as long, which aligns with our expectations: the presence of both upper and lower histograms (doubling the number of histograms) naturally leads to a twofold slowdown. Thank you for fixing the CI so quickly, it works now.
Thanks for the investigation, that explains the results.
I think we are very close to merging this, I think the only parts missing are:
- Ensuring that the other methods are not broken (see code comments), and adding a test for the "hist" method (it's fine to keep the NTL test disabled, although I'm also ok with enabling the NTL feature by default).
- Add a bit of documentation in `rankestimation.py`: explain what "hist" vs "scaledhist" do inside the top-level comment (e.g. in the note about the algorithm), and add "scaledhist" to the documentation of `rank_nbin`.
- Add an item in the CHANGELOG.
```diff
@@ -70,6 +91,9 @@ impl Histogram for F64Hist {
     fn coefs_f64(&self) -> Vec<f64> {
         self.state.clone()
     }
+    fn coefs_f64_upper(&self) -> Vec<f64> {
+        unimplemented!()
```
I think that this should return `self.state`.
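In other words, the method would presumably mirror `coefs_f64` (a sketch based on the surrounding diff, not the merged code):

```rust
fn coefs_f64_upper(&self) -> Vec<f64> {
    // F64Hist keeps a single state, so the upper-bound coefficients
    // are the same as the plain coefficients.
    self.state.clone()
}
```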
```diff
@@ -101,6 +275,9 @@ impl Histogram for BigNumHist {
     fn coefs_f64(&self) -> Vec<f64> {
         self.coefs().collect()
     }
+    fn coefs_f64_upper(&self) -> Vec<f64> {
+        unimplemented!()
```
Also here, this should be the same as `coefs_f64`.
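That is, presumably (again a sketch based on the diff above):

```rust
fn coefs_f64_upper(&self) -> Vec<f64> {
    // BigNumHist is exact, so lower and upper coefficients coincide.
    self.coefs().collect()
}
```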
Hi, I believe the libntl-dev library is missing. However, when I attempt to install it in the CI, it shows as successfully installed, yet it still doesn't work.
After validating that the FFT convolution operation fails (returns negative values, see the figure below) for histograms with high bin counts, I implemented a potential countermeasure to address this issue. This approach introduces a compressed histogram that aims to mitigate the problem while maintaining precision. A compressed histogram consists of a scaling factor, a lower-bounds (LB) histogram, and an upper-bounds (UB) histogram. The convolution operation for compressed histograms multiplies the scaling factors and computes the convolution for both the LB and UB histograms separately. To make sure that there is no precision loss, I ensured that the max bin count stays under a certain limit.

I implemented this approach in a way that ensures other ranking methods, such as NTL, remain usable.
I tested the implementation with a variety of input values and added a test case that uses a specific input set known to fail with the original implementation. The reference results for the test case are generated using the NTL ranking method. For tests involving standard histograms without high bin counts (those that do not represent edge cases), the implementation behaves exactly as before, yielding the same results.
I hope this implementation helps address the issue of FFT convolution failing for histograms with high bin counts. I am open to any suggestions for improvement and look forward to your feedback! (Also feel free to edit the implementation.)
The figure shows a comparison between the result of the FFT convolution and the result of the convolution implemented in numpy (direct, not FFT).