
Fix inaccuracies in rank estimation when using FFT histogram method #184

Open · wants to merge 12 commits into base: main
Conversation


@lemleb lemleb commented Jan 28, 2025

After validating that the FFT convolution operation fails (returning negative values, see the figure below) for histograms with high bin counts, I implemented a potential countermeasure. The approach introduces a compressed histogram that mitigates the problem while maintaining precision: it consists of a scaling factor, a lower-bound (LB) histogram, and an upper-bound (UB) histogram. Convolving two compressed histograms multiplies their scaling factors and convolves the LB and UB histograms separately. To avoid precision loss, I make sure that the maximum bin count stays under a fixed limit.
I implemented this approach in a way that ensures other ranking methods, such as NTL, remain usable.
I tested the implementation with a variety of input values and added a test case that uses a specific input set known to fail with the original implementation. The reference results for the test case are generated using the NTL ranking method. For tests involving standard histograms without high bin counts (those that do not represent edge cases), the implementation behaves exactly as before, yielding the same results.
I hope this implementation helps address the issue of the FFT convolution failing for histograms with high bin counts. I am open to any suggestions for improvement and look forward to your feedback! (Also, feel free to edit the implementation.)
[Figure 2: comparison between the result of the FFT convolution and the result of a direct (non-FFT) convolution implemented in NumPy]
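The compressed-histogram scheme described in the PR can be sketched roughly as follows (a simplified model with illustrative names, not the PR's actual code; a naive O(n·m) convolution stands in for the FFT, but the bookkeeping of the scaling factors is the same):

```rust
// Hypothetical sketch: a histogram is stored as scaler * [lb, ub] bin
// counts; convolution multiplies the scalers and convolves the bound
// histograms separately, so scaler * lb <= exact counts <= scaler * ub.
#[derive(Clone, Debug)]
struct ScaledHist {
    scaler: f64,  // common scaling factor for all bins
    lb: Vec<f64>, // lower-bound bin counts (rounded down)
    ub: Vec<f64>, // upper-bound bin counts (rounded up)
}

impl ScaledHist {
    /// Divide all bins by `factor` so the max bin count stays under a
    /// limit, rounding down for the lower bound and up for the upper bound.
    fn rescale(&mut self, factor: f64) {
        self.scaler *= factor;
        for x in &mut self.lb {
            *x = (*x / factor).floor();
        }
        for x in &mut self.ub {
            *x = (*x / factor).ceil();
        }
    }

    /// Convolve both bound histograms and multiply the scaling factors.
    fn convolve(&self, other: &Self) -> Self {
        let conv = |a: &[f64], b: &[f64]| {
            let mut out = vec![0.0; a.len() + b.len() - 1];
            for (i, &x) in a.iter().enumerate() {
                for (j, &y) in b.iter().enumerate() {
                    out[i + j] += x * y;
                }
            }
            out
        };
        ScaledHist {
            scaler: self.scaler * other.scaler,
            lb: conv(&self.lb, &other.lb),
            ub: conv(&self.ub, &other.ub),
        }
    }
}

fn main() {
    let mut a = ScaledHist { scaler: 1.0, lb: vec![2.0, 4.0], ub: vec![2.0, 4.0] };
    let b = ScaledHist { scaler: 1.0, lb: vec![1.0, 3.0], ub: vec![1.0, 3.0] };
    a.rescale(2.0); // scaler = 2, lb = [1, 2], ub = [1, 2]
    let c = a.convolve(&b);
    println!("scaler = {}, lb = {:?}, ub = {:?}", c.scaler, c.lb, c.ub);
}
```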


cla-bot bot commented Jan 28, 2025

Thanks for your contribution!

We require contributors to sign our Contributor License Agreement, and we don't have you on file. In order for us to review and merge your code, please follow the instructions on our website.

@rishubn rishubn changed the title Scaled ranking Fix inaccuracies in rank estimation when using FFT histogram method Jan 29, 2025
rishubn (Collaborator) commented Jan 29, 2025

Can you show a comparison of the rank estimation between NTL, the old method, and this fix? Does it match NTL?

@@ -7,11 +7,17 @@ mod bnp {
include!(concat!(env!("OUT_DIR"), "/binding_bnp.rs"));
}

const CONV_LIMIT: f64 = 34359738368.0; // define a certain limit
Collaborator

Please explain the choice of this value in more detail

Author

While testing the original histogram-based ranking implementation, I observed that it begins to fail when the sum of the states (bin counts) exceeds 2^70. Since the FFT convolution involves multiplication, I set the limit to the square root of this value to ensure that 2^70 is never exceeded.
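The arithmetic behind that choice can be checked directly (a standalone illustration of the numbers quoted above, not code from the PR):

```rust
fn main() {
    // Observed failure threshold for summed bin counts in the FFT path.
    let fail_limit: f64 = (2.0_f64).powi(70);
    // Convolution multiplies bin counts, so capping each operand at
    // sqrt(2^70) = 2^35 keeps every product below 2^70.
    let conv_limit: f64 = fail_limit.sqrt();
    assert_eq!(conv_limit, 34359738368.0); // 2^35, the constant in the diff
    assert!(conv_limit * conv_limit <= fail_limit);
    println!("CONV_LIMIT = {}", conv_limit);
}
```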

Collaborator

Can you explain in more detail why it happens at 2^70 and why your solution works? You also need to edit the code comment to explain that constant.


lemleb commented Jan 29, 2025

@rishubn thank you for your feedback.
In a general ranking test case using 16 subkeys (each 8-bit) with random costs, the difference is negligible. To provide more specific details, here are some reference values showing the outputs of the NTL implementation and my scaled histogram implementation when calling the rank_accuracy function. Following the output of the rank_accuracy function, in the second lines, I have also included the log2 of these returned values. If these values would not be precise enough I think the unit tests would also fail as it compares the results with the output of naive ranking (rank2_vs_naive function).

NTL
2.3127785636369424e+38 < 2.697140362283015e+38 < 3.0667306146795653e+38
127.44289474785647 < 127.6646982087543 < 127.84996904981583

scaled histogram
2.3127785652135127e+38 < 2.6971403638637237e+38 < 3.0667306163263063e+38
127.44289474883992 < 127.66469820959982 < 127.84996905059052

Below, I list the output values for the edge case I implemented using the requested three different ranking methods. As before, the first lines show the results from rank_accuracy, and the second lines present the log2 of these results.

NTL
5952304925252321.0 < 6187845177933544.0 < 7321622167993872.0
52.40236985817056 < 52.45835852381074 < 52.70108474865651

original histogram
-447018641455.0 < 1.4837156954018768e+16 < 3.0498092486446173e+20
nan < 53.72006419228016 < 68.04728090939314

scaled histogram
6155488040303454.0 < 6401758333984994.0 < 7590098399148162.0
52.450794669201755 < 52.50738963956567 < 52.753040012382385

As you can see, the original histogram implementation is far off, but the scaled version is nearly the same (accuracy within a 0.052 margin). This difference is caused by the rounding (LB and UB histograms).

cassiersg (Contributor) left a comment

Hi,

Thanks for the PR and the results, that looks very promising.

A few high-level remarks:

  • It would be good to keep the old F64Hist implementation around, and add a new method. It might not be useful to expose it to users of the library, but internally it is interesting to be able to keep evaluating the different approaches (in accuracy and performance).
  • Did you evaluate the performance impact of this change? At a high level, it should be around 2x (twice as many FFTs), but it might be that the divisions by scaler are a bottleneck for some sizes, making the algorithm much slower (if that happens, pre-computing 1/scaler should fix the problem).
  • I'm a bit surprised by the high value for CONV_LIMIT: given that f64 mantissas are 53 bits, I expected CONV_LIMIT to be closer to 2**26. This makes me a bit uncomfortable with the reliability of the code (how much precision do we lose due to rounding?). As a long-term future idea, using integers instead of floats would close this question (that would be quite a bit of work: we could not use complex-number based FFTs, but would need integer FFT/NTT; I don't know if there is a library with a good implementation of those, besides arbitrary-precision libraries like NTL).
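The 53-bit mantissa concern can be demonstrated in a few lines (generic f64 behavior, independent of the PR's code):

```rust
fn main() {
    // f64 has a 53-bit mantissa: integers up to 2^53 are exactly
    // representable...
    let limit = (1u64 << 53) as f64;
    assert_eq!(limit - 1.0, ((1u64 << 53) - 1) as f64);
    // ...but beyond that, adjacent integers collapse to the same value,
    // so accumulated convolution sums silently lose low-order bits.
    assert_eq!(limit + 1.0, limit);
    // With bin counts capped near 2^26, the product of two bins stays
    // around 2^52, leaving headroom below 2^53 for the summation
    // inside the convolution.
    let capped = (1u64 << 26) as f64;
    assert!(capped * capped <= limit);
    println!("2^53 = {}", limit);
}
```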

@cla-bot cla-bot bot added the cla-signed label Feb 19, 2025

lemleb commented Feb 19, 2025

Hi @rishubn @cassiersg,

Apologies for the delay in the update—I encountered a bug that took longer than expected to resolve.

Thank you very much for your valuable feedback. I have tried to incorporate all of your suggestions into the implementation. Specifically, I’ve separated the scaled histogram implementation from the original, and you can now use it via the "scaledhist" method.

You were absolutely right about the convolution limit. After triggering some more edge cases, I observed that my previous implementation still returned negative values in some cases. I've corrected that, and the new value (2^26) should be suitable for our needs.

Regarding performance, the new implementation requires approximately twice as much memory, since it uses two histograms. The execution speed varies with the number of rescalings needed. For example, I measured the execution time of the rank_accuracy function in the two new test cases, once with the "scaledhist" method and once with the original "hist" method. In typical cases, where minimal rescaling is needed, the difference is negligible: the original implementation took about 0.478 seconds, while the scaled version took about 0.504 seconds. However, in the edge case where numerous rescalings are required to avoid faulty results, the difference is more significant: the original "hist" implementation took about 0.092 seconds, while the scaled version took about 0.601 seconds.

I hope this update better aligns with the project’s requirements, and I look forward to your feedback.

PS: I think ci.yml has to be updated.


cassiersg commented Feb 22, 2025

Hi!

Thanks for the new version, it looks great. I would like to make this the default rank estimation method (i.e., change the default in src/scalib/postprocessing/rankestimation.py), but I'm a bit worried by the performance cost. Would you have some time to look into it?

I do not understand how you have cases with "numerous scalings": the code is always running the scaling, no?

Overall, the scaling should have negligible cost, since it is O(n) (with a small constant) while the FFT is O(n*log n), so it's not obvious where the issue is coming from.

One possibility is the overhead of re-allocating new Vec's in rescale. That could be easily resolved by making rescale modify the histograms in-place (i.e. change to fn rescale(&mut self) and (in trait Histogram) fn convolve(&mut self, other: &mut Self) -> Self).

If that doesn't solve the issue, then I'm not sure what the problem is; we will probably need to profile the code to get a better idea. However, before doing that, it might be worth checking whether the slowdown could come from the scaling creating many denormal numbers (on which the CPU is very slow): https://easyperf.net/blog/2018/11/08/Using-denormal-floats-is-slow-how-to-detect-it
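The suggested in-place signature could look roughly like this (a sketch under the assumption of a simple scaler-plus-bins layout; a single bins vector stands in for the PR's LB/UB pair, and a naive convolution stands in for the FFT). It also pre-computes 1/factor so the hot loop multiplies instead of divides:

```rust
// Sketch of the suggested API change: rescale mutates the existing
// buffers instead of allocating new Vecs on every convolution.
trait Histogram: Sized {
    fn rescale(&mut self);
    fn convolve(&mut self, other: &mut Self) -> Self;
}

struct Hist {
    scaler: f64,
    bins: Vec<f64>,
}

const CONV_LIMIT: f64 = 67108864.0; // 2^26, per the updated PR

impl Histogram for Hist {
    fn rescale(&mut self) {
        let max = self.bins.iter().cloned().fold(0.0_f64, f64::max);
        if max > CONV_LIMIT {
            let factor = max / CONV_LIMIT;
            let inv = 1.0 / factor; // one division, then n multiplications
            self.scaler *= factor;
            for x in &mut self.bins {
                *x *= inv; // in-place, no new allocation
            }
        }
    }

    fn convolve(&mut self, other: &mut Self) -> Self {
        self.rescale();
        other.rescale();
        // naive convolution stands in for the FFT here
        let mut out = vec![0.0; self.bins.len() + other.bins.len() - 1];
        for (i, &x) in self.bins.iter().enumerate() {
            for (j, &y) in other.bins.iter().enumerate() {
                out[i + j] += x * y;
            }
        }
        Hist { scaler: self.scaler * other.scaler, bins: out }
    }
}

fn main() {
    let mut a = Hist { scaler: 1.0, bins: vec![1.0, 2.0] };
    let mut b = Hist { scaler: 1.0, bins: vec![3.0, 4.0] };
    let c = a.convolve(&mut b);
    println!("bins = {:?}, scaler = {}", c.bins, c.scaler);
}
```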

cassiersg (Contributor)

(CI is fixed, you can rebase on main)


lemleb commented Feb 23, 2025

Hi!

Thank you for your quick response!

I applied some changes (including the ones you suggested), and in my example the "hist" method returned the result in about 0.099 seconds and the scaled version in about 0.213 seconds, i.e., roughly twice as long. This matches expectations: maintaining both an upper and a lower histogram doubles the number of histograms to convolve.
I also found what caused the extreme performance degradation: the rank_accuracy function in lib.rs calls rank_inner an excessive number of times. Because the scaled solution tracks a lower and an upper bound, it can happen (in extreme edge cases) that the accuracy defined by the default acc_bit (the gap between min and max) is not met, so rank_inner is rerun many times. For the other ranking methods, the min-max range is an artificially defined margin whose extent we determine ourselves. In the scaled version I therefore increased the acc_bit value in the tests to compensate: since rounding the histograms up and down naturally results in slightly larger gaps between the minimum and maximum values, this adjustment ensures accurate and stable behavior.
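The re-run behavior described above can be modeled with a small stand-in (the function names echo the discussion, but the bodies are illustrative, not the actual lib.rs code):

```rust
/// Stand-in for rank_inner: returns (min, max) rank bounds that tighten
/// as the bin count grows. With the scaled method, floor/ceil rounding
/// widens the gap, modeled here by `rounding_penalty`.
fn rank_inner(nb_bin: usize, rounding_penalty: f64) -> (f64, f64) {
    let center = 1.0e15;
    let gap = rounding_penalty * 1.0e18 / nb_bin as f64;
    (center, center + gap)
}

/// Stand-in for the rank_accuracy loop: doubles nb_bin until the bounds
/// are acc_bit-tight, returning the number of rank_inner calls.
fn rank_accuracy(acc_bit: f64, rounding_penalty: f64) -> usize {
    let mut nb_bin = 512;
    let mut calls = 1;
    let (mut min, mut max) = rank_inner(nb_bin, rounding_penalty);
    while (max / min).log2() > acc_bit && nb_bin < 1 << 24 {
        nb_bin *= 2;
        let r = rank_inner(nb_bin, rounding_penalty);
        min = r.0;
        max = r.1;
        calls += 1;
    }
    calls
}

fn main() {
    // The wider the rounding-induced gap, the more rank_inner re-runs are
    // needed for the same acc_bit; raising acc_bit compensates.
    let plain = rank_accuracy(1.0, 1.0);
    let scaled = rank_accuracy(1.0, 16.0);
    assert!(scaled > plain);
    println!("plain: {} calls, scaled: {} calls", plain, scaled);
}
```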

Thank you for fixing the CI so fast, it works now.

cassiersg (Contributor) left a comment

Thanks for the investigation, that explains the results.

I think we are very close to merging this, I think the only parts missing are:

  • Ensuring that the other methods are not broken (see code comments), and adding a test for the "hist" method (it's fine to keep the NTL test disabled - although I'm also ok with enabling the NTL feature by default).
  • Add a bit of documentation in rankestimation.py: explain what hist vs scaledhist do inside the top-level comment (e.g. in the note about the algorithm), and add "scaledhist" to the documentation of rank_nbin.
  • Add an item in the CHANGELOG.

@@ -70,6 +91,9 @@ impl Histogram for F64Hist {
fn coefs_f64(&self) -> Vec<f64> {
self.state.clone()
}
fn coefs_f64_upper(&self) -> Vec<f64> {
unimplemented!()
Contributor

I think that this should return self.state.

@@ -101,6 +275,9 @@ impl Histogram for BigNumHist {
fn coefs_f64(&self) -> Vec<f64> {
self.coefs().collect()
}
fn coefs_f64_upper(&self) -> Vec<f64> {
unimplemented!()
Contributor

Also here, should be same as coefs_f64.


lemleb commented Feb 24, 2025

Hi,
thank you very much for your quick review! I've applied the requested updates. I also enabled NTL by default, as I believe it makes sense, but this caused the CI build-wheels job to fail; locally it works fine. Could you take a look and help identify the cause?


lemleb commented Feb 24, 2025

I believe the libntl-dev library is missing. However, when I attempt to install it in the CI, it shows as successfully installed, yet it still doesn't work.
