
ci: explicit caching and even smaller runners #2518

Merged: 21 commits into main on Jan 30, 2024

Conversation

tychoish
Contributor

No description provided.

Comment on lines 236 to 240
      name: build cache
      with:
-       path: target/debug/glaredb
-       key: ${{ github.run_id }}
+       path: |
+         target/
+       key: ${{ runner.os }}-cargo--target-${{github.sha}}
Contributor


What is the purpose of this new cache? I have a hard time telling from the changes.

Contributor Author


There's a sort of stacked cache here (looking at the build task is really where this becomes clear):

  • cache the toolchain, keyed off the rust-toolchain file (restores before and saves after the build task). This changes rarely and is shared between branches, so we're not re-downloading big files.
  • cache the user's .cargo directory, keyed off Cargo.lock (restores before and saves after the build task). This should prevent us from re-downloading dependencies.
  • cache the workspace target/, except for the glaredb binary, keyed off Cargo.lock (restores before and saves after the build task only), hopefully with the effect of saving as much incremental build time as possible in a fairly stable way and preventing unnecessary rebuilds.
  • cache the glaredb binary explicitly, keyed off the current run id (saves after the build task, restores before any task that runs SLTs or otherwise needs glaredb, e.g. the pytests).
  • save a second cache of target/, keyed off the current commit (saves after the build task, restores before any task that needs to do its own build).

Then each task restores exactly what it needs before it runs (see the sketch at the end of this comment).

This means we end up caching many more build-specific things, so the cache actually accelerates what each task is doing, and we don't end up thrashing the cache by rebuilding with different flags as much (e.g. the unit tests shouldn't thrash the build cache).

The "workspace cache" is maybe of questionable benefit, but it seems pretty harmless.

@tychoish
Contributor Author

In the penultimate build as of this writing, we saw a very good time: 15m wall clock and 25m of paid compute (this doesn't count the "free runners," which add about 40m per build, but that's fine).

However, one of those wall-clock minutes was spent saving a per-run cache of the target directory, which is probably overkill. My goal was to have a cache per run so that per-run changes didn't blow away the cache between builds if one branch had a lot of churn, but a full minute in the build task was a lot to pay for that. I'm running a second build with a less radical caching strategy to see how it ends up.
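
For reference, the per-run target cache described above amounts to something like this (a sketch; the step name and key layout are assumptions):

    # Save the whole target/ directory once per run; this is the step that
    # cost roughly a minute of wall clock in the build task.
    - name: save per-run target cache
      uses: actions/cache/save@v4
      with:
        path: target/
        key: target-${{ github.run_id }}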

@tychoish
Contributor Author

ok, final report:

  • 20m of large-instance compute time per build (a 65% reduction)
  • 15m of wall-clock time for a CI run (best case, if everything is a cache hit); the major determinant of this is the SQL Server test

caveats:

  • the above time is really a best case, because there aren't a lot of builds going on right now, so the cache hit rate was very good
  • the caching is pretty coarse, and if we have a lot of builds going at once, compiles are going to get slower as caches are overwritten
  • the only things that use large instances now are the compile and (unsurprisingly) the unit tests, which always have to be their own compile

future improvements/observations:

  • SQL Server is slow because of the tests themselves; neither the compile, the fixture, nor the hardware is the bottleneck: it has the same runtime on large hosts as on small hosts. So either there's a problem with the configuration of the fixture that is making it slow (possible), or something in our implementation is terribly slow, in a way that might be observable for any user running a short query, and could also be impacting the server itself (which is a thing we should probably avoid). My suspicion is that connection creation is just slow.
  • I tried creating per-run caches of the target directory (the evolution of that here is caching the binaries specifically), but it added a minute to the build task and didn't save a minute elsewhere. It might be more sustainable to have a per-branch build cache, which means you pay the cost once per branch rather than on every build; the upside is that when lots of builds are running at once, they're less likely to clobber each other's caches (see the sketch after this list).
  • the python bindings seem to compile more than I think they should (I suspect maturin just compiles everything rather than only the dependencies): the SLTs, the proxies, and the RPC service are all compiled, and while it's not huge in terms of time, it seems a bit odd. I avoided letting the bindings update the cache themselves, and contemplated having separate caches for the bindings builds.
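
A per-branch build cache, as floated in the second bullet above, might look roughly like this (github.ref_name and restore-keys are standard Actions features, but this key layout is an assumption, not what the PR implements):

    # Keyed off the branch name, so a cold cache is paid for once per branch;
    # restore-keys falls back to the most recent cache from any branch.
    - name: per-branch target cache
      uses: actions/cache@v4
      with:
        path: target/
        key: ${{ runner.os }}-target-${{ github.ref_name }}
        restore-keys: |
          ${{ runner.os }}-target-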

tychoish merged commit 9c79f4c into main on Jan 30, 2024
21 checks passed
tychoish deleted the tycho/ci-caching-and-runner-size branch on January 30, 2024 at 15:00
tychoish added a commit that referenced this pull request on Feb 1, 2024