Check modification time when loading stored modules #2698
lukaszcz commented on Mar 21, 2024
- Closes #2680 (Improve per-module compilation by checking modification time)
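The PR's idea is to decide whether a stored module is stale by comparing modification times instead of hashing the source. Below is a minimal shell sketch of the two checks. It assumes a hypothetical layout in which each source file has a stored module file and a small file holding its recorded SHA-256. The function names `stale_by_mtime`, `stale_by_hash` and `record_hash` are made up for illustration and are not Juvix's implementation.

```shell
# Hypothetical sketch: two ways to decide whether a stored (compiled) module
# is out of date with respect to its source file. Function names and file
# layout are illustrative assumptions, not Juvix's actual implementation.

# Strategy 1: modification time. Reads only file metadata, never the contents.
# Stale if the stored module is missing or the source is newer than it.
stale_by_mtime() {
  local src="$1" stored="$2"
  [ ! -e "$stored" ] || [ "$src" -nt "$stored" ]
}

# Strategy 2: content hash. Reads the whole source file on every check.
# Stale if no hash was recorded or the recorded SHA-256 differs from the current one.
stale_by_hash() {
  local src="$1" hashfile="$2" current
  [ -f "$hashfile" ] || return 0
  # Hash via stdin so the output is just "<hex>  -" regardless of the file name
  current=$(sha256sum < "$src" | cut -d' ' -f1)
  [ "$current" != "$(cat "$hashfile")" ]
}

# Record the current hash of a source file, e.g. right after compiling it.
record_hash() {
  sha256sum < "$1" | cut -d' ' -f1 > "$2"
}
```

The trade-off debated in this thread follows from the two bodies: `stale_by_mtime` costs one metadata lookup per file, while `stale_by_hash` has to read every byte of the source. On the other hand, touching a file without editing it makes `stale_by_mtime` report it as stale, whereas `stale_by_hash` does not.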
I've done a rudimentary benchmark to check whether getting the last modification time is cheaper than computing the hash. It turns out that computing the hash is much faster. The benchmark consists of replicating the Juvix stdlib 50 times and then comparing the time it takes to hash every Juvix file with the time it takes to get each file's last modification time:

```bash
#!/usr/bin/env bash
ORIGINAL_DIR="juvix-stdlib"
rm -rf benchtmp
mkdir benchtmp
# Make 50 copies of the stdlib so there are enough files to measure
for i in {1..50}; do
  NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
  cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done
hyperfine --warmup 2 \
  --command-name "hash-256" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
  --command-name "last-modified-time" \
  'find benchtmp -type f -name "*.juvix" -exec stat --format="%y %n" {} \;'
```

The result is:
My expectation is that replicating a similar benchmark in Haskell would give similar results, so I'm not sure merging this PR will be an improvement.
Yeah, maybe it's not worth merging this PR. I guess this depends a lot on the OS: how long it takes to read the modification time compared to computing the SHA-256 hash. Probably there's some disk caching involved.
This has nothing to do with the difference between checking the last modification time and computing the hash. It's because of some weird properties of the

With this benchmark:

```bash
#!/usr/bin/env bash
ORIGINAL_DIR="juvix-stdlib"
rm -rf benchtmp
mkdir benchtmp
for i in {1..50}; do
  NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
  cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done
hyperfine --warmup 2 \
  --command-name "hash-256" \
  'find benchtmp -type f -name "*.juvix" -exec sha256sum {} \;' \
  --command-name "last-modified-time" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'
```

I get these results:

Apparently, the
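A likely culprit for the "weird properties" mentioned above is how `find` hands file names to a command: `-exec cmd {} \;` starts a new `cmd` process for every matched file, whereas `-exec cmd {} +` and `xargs` pack many names into a single invocation. In the first benchmark it was `stat` that ran once per file (and came out slower); the second benchmark swaps the roles. The sketch below is not from the thread; it simply counts invocations in a temporary directory with five dummy `.juvix` files, and the directory, file names and log file are placeholders.

```shell
# Sketch: count how many times find runs a command, depending on how the
# file names are passed. Five empty .juvix files stand in for the stdlib copies.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$dir/M$i.juvix"; done
log="$dir/runs.log"   # each invocation of the inner sh appends one line here

# One process per file: -exec ... {} \;
: > "$log"
find "$dir" -type f -name '*.juvix' -exec sh -c 'echo run >> "$0"' "$log" {} \;
per_file=$(( $(wc -l < "$log") ))

# File names batched into as few invocations as possible: -exec ... {} +
: > "$log"
find "$dir" -type f -name '*.juvix' -exec sh -c 'echo run >> "$0"' "$log" {} +
batched_exec=$(( $(wc -l < "$log") ))

# Same batching via xargs, as in the later benchmarks
: > "$log"
find "$dir" -type f -name '*.juvix' -print0 | xargs -0 sh -c 'echo run >> "$0"' "$log"
batched_xargs=$(( $(wc -l < "$log") ))

echo "per file: $per_file, -exec +: $batched_exec, xargs: $batched_xargs"
# prints: per file: 5, -exec +: 1, xargs: 1
rm -rf "$dir"
```

With 50 copies of the stdlib, the `\;` form means one process start per `.juvix` file for whichever command uses it, which plausibly outweighs the actual `stat` or `sha256sum` work. The later benchmarks in this thread avoid the issue by sending both commands through `xargs -0`.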
fa1647e to 3c29a26
Actually, when both commands are run in a comparable way (both through `xargs -0`):

```bash
#!/usr/bin/env bash
ORIGINAL_DIR="juvix-stdlib"
rm -rf benchtmp
mkdir benchtmp
for i in {1..50}; do
  NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
  cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done
hyperfine --warmup 2 \
  --command-name "hash-256" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
  --command-name "last-modified-time" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'
```

checking the modification time is ~1.5 times faster, as one would expect:
The difference is probably much bigger in typical usage, because when you run the benchmark several times, the file contents will be in the OS disk cache (i.e., in RAM) after the first run, which
Okay, here is a version that drops the filesystem caches before every run (Linux only). That makes more sense. With the caches dropped, checking the modification time is 1.86 times faster than hashing.

```bash
#!/usr/bin/env bash
ORIGINAL_DIR="juvix-stdlib"
rm -rf benchtmp
mkdir benchtmp
for i in {1..50}; do
  NEW_DIR="benchtmp/${ORIGINAL_DIR}-${i}"
  cp -rf "$ORIGINAL_DIR" "$NEW_DIR"
done
# --prepare runs before each timing run: flush dirty pages to disk, then drop
# the page cache, dentries and inodes (Linux only, requires sudo)
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
  --command-name "hash-256" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 sha256sum' \
  --command-name "last-modified-time" \
  'find benchtmp -type f -name "*.juvix" -print0 | xargs -0 stat --format="%y %n"'
```

Results:
I'd say the benchmark with caching is more relevant, since that will be the most common case.
Yes, actually, you're right. The first time, we do the recompilation anyway.
Since this doesn't have a significant impact on performance, we decided not to merge it.