Make generic over a Field trait #40

Merged: 3 commits from rh-wrap-8 into master on Jan 11, 2019

Conversation

@rphmeier (Contributor) commented Jan 8, 2019

And implement it for galois_8.
After #38 I will add an implementation for galois_16 as well.
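
For context, a minimal sketch of the shape such a trait can take. The names here (Elem, mul_slice, mul_slice_xor) follow the discussion below, but they are assumptions for illustration, not necessarily the exact API this PR adds:

```rust
use std::fmt::Debug;

/// Illustrative finite-field abstraction; names and bounds are assumptions.
pub trait Field {
    /// Element type of the field, e.g. u8 for GF(2^8).
    type Elem: Copy + PartialEq + Debug + Default;

    fn zero() -> Self::Elem;
    fn one() -> Self::Elem;
    fn add(a: Self::Elem, b: Self::Elem) -> Self::Elem;
    fn mul(a: Self::Elem, b: Self::Elem) -> Self::Elem;

    /// out[i] = c * input[i]; backends may override with table/SIMD routines.
    fn mul_slice(c: Self::Elem, input: &[Self::Elem], out: &mut [Self::Elem]) {
        for (i, o) in input.iter().zip(out.iter_mut()) {
            *o = Self::mul(c, *i);
        }
    }

    /// out[i] = out[i] + c * input[i] (in GF(2^n), addition is XOR).
    fn mul_slice_xor(c: Self::Elem, input: &[Self::Elem], out: &mut [Self::Elem]) {
        for (i, o) in input.iter().zip(out.iter_mut()) {
            *o = Self::add(*o, Self::mul(c, *i));
        }
    }
}
```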

@coveralls commented Jan 8, 2019

Coverage increased (+0.04%) to 96.3% when pulling 938c394 on paritytech:rh-wrap-8 into 0890891 on darrenldl:master.

@codecov (bot) commented Jan 8, 2019

Codecov Report

Merging #40 into master will increase coverage by 0.07%.
The diff coverage is 94.11%.

@@            Coverage Diff            @@
##           master     #40      +/-   ##
=========================================
+ Coverage   96.22%   96.3%   +0.07%     
=========================================
  Files           7       7              
  Lines        2624    2649      +25     
=========================================
+ Hits         2525    2551      +26     
+ Misses         99      98       -1
Impacted Files Coverage Δ
src/tests/mod.rs 98% <100%> (ø) ⬆️
src/matrix.rs 98.73% <100%> (ø) ⬆️
src/inversion_tree.rs 91.66% <75%> (ø) ⬆️
src/galois_8.rs 98.99% <92.59%> (-0.64%) ⬇️
src/lib.rs 95% <93.75%> (+0.88%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0890891...938c394.

@rphmeier (Contributor, Author) commented Jan 9, 2019

I'd be interested to see if there is a performance hit for GF(2^8) with the generification.

@darrenldl (Contributor)

I can try benchmarking it later.

If there is a significant performance hit, though, what would your suggestion be?

@darrenldl (Contributor) commented Jan 10, 2019

Sorry, I haven't quite gotten to setting up benchmarking due to time constraints, but I went through the changes and it should be fine, as the galois_8::Field implementations of mul_slice and mul_slice_xor use the lower-level routines rather than the higher-level defaults in lib.rs.

The majority of the execution time during encoding is spent in Galois coding, specifically mul_slice, so even if other parts carry a slight overhead, it shouldn't impact the encoding phase much, if it's noticeable at all.

Matrix is pretty simple to begin with, and matrix generation occupies a relatively small portion of the execution time in general, since InversionTree caches matrices. So unless the generification causes InversionTree to cache a lot more slowly for some reason, overall performance should be almost the same as before.
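
To illustrate the point about mul_slice, here is a hedged sketch continuing the trait shape from the PR description above (not the crate's actual code): the GF(2^8) implementation overrides mul_slice and mul_slice_xor so the encoding hot path runs the specialized slice routines rather than the generic element-by-element defaults. slow_mul stands in for the real table-driven multiplication in galois_8.

```rust
/// Marker type for GF(2^8); continues the `Field` trait sketch above.
pub struct Gf8;

/// Slow but correct carry-less multiply modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d),
/// used here only as a stand-in for the optimized table-based multiplication.
fn slow_mul(a: u8, b: u8) -> u8 {
    let (mut a, mut b, mut p) = (a as u16, b, 0u16);
    while b != 0 {
        if b & 1 != 0 { p ^= a; }
        a <<= 1;
        if a & 0x100 != 0 { a ^= 0x11d; }
        b >>= 1;
    }
    p as u8
}

impl Field for Gf8 {
    type Elem = u8;

    fn zero() -> u8 { 0 }
    fn one() -> u8 { 1 }
    fn add(a: u8, b: u8) -> u8 { a ^ b } // addition in GF(2^8) is XOR
    fn mul(a: u8, b: u8) -> u8 { slow_mul(a, b) }

    // Override the defaults so encoding's hot path uses the specialized slice
    // routines (in the real crate, the lower-level galois_8 functions).
    fn mul_slice(c: u8, input: &[u8], out: &mut [u8]) {
        for (i, o) in input.iter().zip(out.iter_mut()) {
            *o = slow_mul(c, *i); // imagine a table/SIMD kernel here
        }
    }

    fn mul_slice_xor(c: u8, input: &[u8], out: &mut [u8]) {
        for (i, o) in input.iter().zip(out.iter_mut()) {
            *o ^= slow_mul(c, *i);
        }
    }
}
```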

@rphmeier (Contributor, Author) commented Jan 11, 2019

The benchmarks show this branch as much slower, but that is mostly because of the removal of Rayon. I wouldn't expect this change itself to make things slower except in the case of some real failure of monomorphization.

@darrenldl (Contributor)

Huh, so rayon did speed things up a lot.

Do you have the specific numbers?

@rphmeier (Contributor, Author)

Old:

encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 1024
    time taken       : 0.418300462
    byte count       : 5242880000
    MB/s             : 11953.130474907293
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 4096
    time taken       : 0.387275401
    byte count       : 5242880000
    MB/s             : 12910.708986652111
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 8192
    time taken       : 0.387310016
    byte count       : 5242880000
    MB/s             : 12909.555119793236
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 16384
    time taken       : 0.366356906
    byte count       : 5242880000
    MB/s             : 13647.893401523595
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 32768
    time taken       : 0.350995964
    byte count       : 5242880000
    MB/s             : 14245.178044269478
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 65536
    time taken       : 0.354697661
    byte count       : 5242880000
    MB/s             : 14096.512466147895

New:

encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 1024
    time taken       : 5.654385699
    byte count       : 5242880000
    MB/s             : 884.269355888522
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 4096
    time taken       : 5.768529836
    byte count       : 5242880000
    MB/s             : 866.7719752086933
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 8192
    time taken       : 5.745482032
    byte count       : 5242880000
    MB/s             : 870.2490012416769
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 16384
    time taken       : 5.824110237
    byte count       : 5242880000
    MB/s             : 858.500233775709
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 32768
    time taken       : 5.703026328
    byte count       : 5242880000
    MB/s             : 876.7274973730404
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 65536
    time taken       : 5.708100441
    byte count       : 5242880000
    MB/s             : 875.9481462670359

@rphmeier (Contributor, Author)

Totally makes sense that rayon speeds things up IMO.

@darrenldl (Contributor) commented Jan 11, 2019

Thanks for benchmarking!

Okay, yeah, that makes a lot of sense now that I think about it: independent slice coding does allow a speedup linear in the (parity) shard count.

But yeah, the generification itself seems fine. If we really want, we can benchmark with rayon restricted to a single thread; that should show the overhead of the generification, if any exists at all. But I personally don't feel this is too useful.
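
For reference, pinning rayon's global pool to one thread would be enough for that kind of measurement. A minimal sketch, assuming the benchmark binary can configure rayon itself:

```rust
// Minimal sketch: force rayon's global thread pool to a single worker so the
// benchmark measures per-thread overhead rather than parallel speedup.
fn main() {
    rayon::ThreadPoolBuilder::new()
        .num_threads(1)
        .build_global()
        .expect("global thread pool already initialized");

    // ... run the encode benchmark as usual here ...
}
```

Setting RAYON_NUM_THREADS=1 in the environment achieves the same effect without code changes.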

@rphmeier (Contributor, Author)

(I was accidentally benchmarking without SIMD acceleration due to the feature change.)

Old without rayon:

encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 1024
    time taken       : 0.745146417
    byte count       : 5242880000
    MB/s             : 6710.090642494494
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 4096
    time taken       : 0.746232383
    byte count       : 5242880000
    MB/s             : 6700.325681256317
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 8192
    time taken       : 0.794534702
    byte count       : 5242880000
    MB/s             : 6292.991341239114
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 16384
    time taken       : 0.762313775
    byte count       : 5242880000
    MB/s             : 6558.978945382431
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 32768
    time taken       : 0.883574129
    byte count       : 5242880000
    MB/s             : 5658.834766539436
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 65536
    time taken       : 0.784516589
    byte count       : 5242880000
    MB/s             : 6373.351526413676

New with simd-accel:

encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 1024
    time taken       : 0.814699297
    byte count       : 5242880000
    MB/s             : 6137.233723426179
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 4096
    time taken       : 0.829256454
    byte count       : 5242880000
    MB/s             : 6029.497842171753
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 8192
    time taken       : 0.736849525
    byte count       : 5242880000
    MB/s             : 6785.645956682947
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 16384
    time taken       : 0.733428098
    byte count       : 5242880000
    MB/s             : 6817.30085557753
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 32768
    time taken       : 0.740080925
    byte count       : 5242880000
    MB/s             : 6756.017931417433
encode :
    shards           : 10 / 2
    shard length     : 1048576
    bytes per encode : 65536
    time taken       : 0.744777619
    byte count       : 5242880000
    MB/s             : 6713.413336337112

@rphmeier (Contributor, Author) commented Jan 11, 2019

In short, it seems about the same. Rayon can roughly double throughput in this configuration.
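
That roughly 2x with two parity shards matches parallelizing over parity rows: each parity shard is an independent linear combination of the data shards. A hedged sketch of that structure follows; it is not the crate's actual encode loop, and gf_mul_slice_xor and the signatures here are assumptions.

```rust
use rayon::prelude::*;

/// Sketch of the parallel structure: each parity shard is an independent
/// linear combination of the data shards, so rows can be computed in parallel.
/// `gf_mul_slice_xor` stands in for the field's out[i] ^= c * in[i] routine.
fn encode_parity(
    matrix_rows: &[Vec<u8>],  // one coding-matrix row per parity shard
    data: &[Vec<u8>],         // data shards
    parity: &mut [Vec<u8>],   // parity shards, pre-zeroed
    gf_mul_slice_xor: fn(u8, &[u8], &mut [u8]),
) {
    parity
        .par_iter_mut()
        .zip(matrix_rows.par_iter())
        .for_each(|(out, row)| {
            for (coeff, shard) in row.iter().zip(data.iter()) {
                gf_mul_slice_xor(*coeff, shard, out);
            }
        });
}
```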

@darrenldl (Contributor)

Sounds about right - 2 parity shards are used in the benchmarks.

So I think this PR looks good, as nothing important is changed. I'm not sure there's a very good way to test the impl crate::Field for Field { ... } part to make sure the right underlying functions are used, other than duplicating all the existing tests, so I'll just open an issue and see if any new ideas pop up.

I'll merge this if you think it's okay.

@rphmeier (Contributor, Author)

yes please :)

@darrenldl merged commit ba2389a into rust-rse:master on Jan 11, 2019
@rphmeier deleted the rh-wrap-8 branch on January 11, 2019