Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster sha256 computation #248

Merged
merged 2 commits into from
Feb 13, 2020
Merged

Faster sha256 computation #248

merged 2 commits into from
Feb 13, 2020

Conversation

dgageot
Copy link
Member

@dgageot dgageot commented Feb 10, 2020

The computation of the sh256 digest is now done concurrently to the tarball creation.
This gives a 30% boost on my tests (with a hot cache)

Signed-off-by: David Gageot [email protected]

The computation is now done concurrently to the
tarball creation.

Signed-off-by: David Gageot <[email protected]>
@dgageot dgageot requested a review from a team as a code owner February 10, 2020 15:05
@ekcasey
Copy link
Member

ekcasey commented Feb 11, 2020

@dgageot How are you you testing this when you see a 30% performance. When I did a quick and dirty benchmark the times were slightly faster without the ConcurrentHasher.

@dgageot
Copy link
Member Author

dgageot commented Feb 11, 2020

On a Mac, I created a 1.2Gb tarball from my go-build folder. I created the archive multiple times and ignored that first result. I'll try to run a better benchmark.

@dgageot
Copy link
Member Author

dgageot commented Feb 11, 2020

I ran the tests again, here's what I got:

I'm taring a copy of my go env GOCACHE folder. The resulting tar file is 1.2Gb. I ran the tests on master and on this branch, on my mac. I ran the tests both in cold cache and hot cache conditions. I used purge to approximate cold cache conditions.

master branch

Run Cold cache Hot cache
1 16.5 5.8
2 16.2 5.7
3 16 5.8

with this PR

Run Cold cache Hot cache
1 12.9 3.4
2 13.2 3.1
3 13.2 3.3

@ekcasey
Copy link
Member

ekcasey commented Feb 13, 2020

@dgageot I was able to reproduce your results and performance gains. There is one tradeoff worth noting. While this implementation ran faster in real time on my workstation the actual CPU time required by this implementation is higher. Therefore, in CPU bound environments, this change might actually degrade performance.

That being said I think this is probably the right tradeoff.

Copy link
Member

@ekcasey ekcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a minimal unit test ensuring that ConcurrentHasher.Sum returns identical to the underlying Hash.Sum. It looks correct right now, but a test will prevent someone from accidentally breaking it in the future.

archive/tar.go Outdated
return fmt.Sprintf("sha256:%x", hasher.Sum(nil)), nil
}

type ConcurrentHasher struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind moving the hasher implementation into a new file (e.g. hash.go)? I think that small organizational change would make both the hashing and the tarring logic more readable.

Signed-off-by: David Gageot <[email protected]>
@dgageot
Copy link
Member Author

dgageot commented Feb 13, 2020

@ekcasey This should be better now. Thanks for your time!

@ekcasey ekcasey merged commit 03c1cb0 into buildpacks:master Feb 13, 2020
@ekcasey ekcasey mentioned this pull request Feb 13, 2020
@zmackie zmackie added this to the lifecycle-0.7.0 milestone Feb 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants