
Optimize varint encoding #767

Open — wants to merge 1 commit into master
Conversation

MelonShooter

This optimizes varint encoding by turning the loop into a bounded `for` loop, which lets the compiler see that it can unroll the loop.
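For illustration, a minimal sketch of the technique being discussed (this is not prost's actual implementation; `encode_varint` and its signature here are hypothetical). A `u64` LEB128 varint occupies at most 10 bytes, so bounding the loop at 10 iterations gives the optimizer a known trip count it can unroll:

```rust
// Hypothetical sketch of bounded-loop LEB128 varint encoding.
// Each output byte holds 7 value bits; the high bit flags continuation.
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    // A u64 needs at most ceil(64 / 7) = 10 varint bytes, so the
    // compiler can fully unroll this loop.
    for _ in 0..10 {
        if value < 0x80 {
            // Fits in 7 bits: final byte, continuation bit clear.
            buf.push(value as u8);
            return;
        }
        // Emit low 7 bits with the continuation bit set, then shift.
        buf.push(((value & 0x7F) | 0x80) as u8);
        value >>= 7;
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    // 300 encodes as [0xAC, 0x02] per the protobuf wire format.
    assert_eq!(buf, vec![0xAC, 0x02]);
}
```

An unbounded `while value >= 0x80` loop computes the same bytes, but the compiler cannot prove an iteration limit, so it typically emits a branchy rolled loop instead.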

Bench:

varint/small/encode     time:   [61.577 ns 61.786 ns 62.016 ns]
                        change: [-18.793% -17.983% -17.024%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

varint/medium/encode    time:   [312.28 ns 313.01 ns 313.78 ns]
                        change: [-13.850% -13.002% -12.209%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

varint/large/encode     time:   [533.79 ns 535.11 ns 536.43 ns]
                        change: [-26.542% -25.921% -25.275%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

varint/mixed/encode     time:   [308.34 ns 309.11 ns 309.87 ns]
                        change: [-23.418% -22.569% -21.718%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

@LucioFranco
Member

I am not seeing the same improvements on my laptop

varint/small/encode     time:   [254.51 ns 255.08 ns 255.78 ns]
                        change: [+4.9943% +5.3430% +5.6646%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

varint/small/decode     time:   [230.34 ns 230.96 ns 231.62 ns]
                        change: [+0.0577% +0.4059% +0.7424%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

varint/small/encoded_len
                        time:   [57.304 ns 57.385 ns 57.477 ns]
                        change: [-0.3950% -0.1263% +0.1243%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

varint/medium/encode    time:   [1.2231 us 1.2262 us 1.2301 us]
                        change: [+5.3350% +5.6709% +6.0007%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

varint/medium/decode    time:   [240.14 ns 240.56 ns 241.05 ns]
                        change: [-0.8120% -0.5223% -0.2217%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

varint/medium/encoded_len
                        time:   [57.344 ns 57.443 ns 57.556 ns]
                        change: [-0.3119% -0.0467% +0.2367%] (p = 0.73 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

varint/large/encode     time:   [2.3195 us 2.3250 us 2.3310 us]
                        change: [-1.1808% -0.8771% -0.5842%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild

varint/large/decode     time:   [352.08 ns 352.89 ns 353.81 ns]
                        change: [-0.3626% -0.0670% +0.2361%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

varint/large/encoded_len
                        time:   [57.386 ns 57.487 ns 57.598 ns]
                        change: [-0.5971% -0.2721% +0.0337%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

varint/mixed/encode     time:   [1.4749 us 1.4777 us 1.4808 us]
                        change: [-2.2308% -1.9438% -1.6452%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

varint/mixed/decode     time:   [286.76 ns 287.35 ns 288.00 ns]
                        change: [-0.4404% -0.1303% +0.1851%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  11 (11.00%) high mild
  1 (1.00%) high severe

varint/mixed/encoded_len
                        time:   [57.407 ns 57.537 ns 57.689 ns]
                        change: [-0.5136% -0.1899% +0.1612%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

@LucioFranco
Member

Could you explain more what system you ran those benchmarks on?

@MelonShooter
Author

MelonShooter commented Aug 12, 2023

Apologies for the late response. I ran this on an Intel i5-10300H on Ubuntu 22.04 through WSL. I don't remember what rustc version I ran the original benchmarks on, but I ran it again just now on rustc 1.71.1 and got similar results. What compiler version and CPU did you use to run those benchmarks?

varint/small/encode     time:   [68.863 ns 69.823 ns 70.857 ns]                                
                        change: [-22.839% -20.072% -16.939%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

varint/small/decode     time:   [160.91 ns 162.97 ns 165.16 ns]                                
                        change: [-1.9325% +1.1842% +4.5143%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

varint/small/encoded_len                                                                            
                        time:   [86.179 ns 88.462 ns 90.912 ns]
                        change: [+0.7213% +4.2934% +7.8878%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

varint/medium/encode    time:   [358.18 ns 364.79 ns 372.55 ns]                                 
                        change: [-19.784% -16.565% -13.286%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

varint/medium/decode    time:   [237.39 ns 241.08 ns 245.41 ns]                                 
                        change: [-21.082% -15.509% -9.7628%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

varint/medium/encoded_len                                                                            
                        time:   [82.511 ns 83.446 ns 84.489 ns]
                        change: [+0.2021% +3.5902% +7.4506%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

varint/large/encode     time:   [636.18 ns 647.51 ns 659.99 ns]                                 
                        change: [-26.210% -23.732% -20.976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

varint/large/decode     time:   [357.62 ns 363.08 ns 369.10 ns]                                
                        change: [-4.2180% -0.3207% +3.2406%] (p = 0.87 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

varint/large/encoded_len                                                                            
                        time:   [81.459 ns 82.667 ns 83.991 ns]
                        change: [-4.8438% -1.2792% +2.3932%] (p = 0.49 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

varint/mixed/encode     time:   [348.02 ns 352.78 ns 358.36 ns]                                
                        change: [-26.256% -23.085% -20.160%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

varint/mixed/decode     time:   [237.17 ns 241.66 ns 246.70 ns]                                
                        change: [-4.2677% -1.0296% +2.5542%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

varint/mixed/encoded_len                                                                            
                        time:   [83.815 ns 84.976 ns 86.126 ns]
                        change: [-5.0328% -1.2148% +2.4944%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

@caspermeijn
Collaborator

The current master branch now uses `for _ in 0..10 {` to tell the compiler there is a maximum of 10 iterations. Can you rerun your benchmark to see if this PR is still an improvement?

3 participants