Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: run multidim-interop.yml on self-hosted runners #154

Merged
merged 10 commits into from
Jul 7, 2023

Conversation

galargh
Copy link
Contributor

@galargh galargh commented Mar 17, 2023

This PR revives the proposal from #92 (c53be4c).

Please note that the first build on self-hosted runners had to populate the cache https://github.com/libp2p/test-plans/actions/runs/4447196185/attempts/1. This suggests that some part of the machine config influences the cache keys. It might need some tweaks to make it interoperable with hosted GitHub Actions runners; or it might be a good idea to have another job which populates the cache for hosted GitHub Actions runners.

The subsequent runs were using cache as expected:

I did not try to increase worker count nor modify docker run parameters to accommodate the more powerful environment.

This is the self-hosted runner definition: https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf#L28
This is the packer config the AMI was built from: https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/images/ubuntu-jammy/kubo.pkrvars.hcl

We can modify the runners as you wish of course.

@galargh galargh force-pushed the feat/self-hosted-runners branch 3 times, most recently from a2cd76d to 0360dee Compare March 17, 2023 11:51
Comment on lines +71 to +87
env:
AWS_BUCKET: ${{ inputs.s3-cache-bucket }}
AWS_REGION: ${{ inputs.aws-region }}
AWS_ACCESS_KEY_ID: ${{ inputs.s3-access-key-id }}
AWS_SECRET_ACCESS_KEY: ${{ inputs.s3-secret-access-key }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way, we modify the env only for this step.

Comment on lines +61 to +68
with:
config: ${{ steps.buildkit.outputs.config }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On standard, hosted GitHub Actions runners this will be an empty string. Which is the default.

@galargh galargh requested a review from MarcoPolo March 17, 2023 13:42
@galargh galargh marked this pull request as ready for review March 17, 2023 13:42
@thomaseizinger
Copy link
Contributor

Half an hour is still pretty slow. This job usually finishes in 15min on PRs for us? Is that because we filter?

Will this mean we will also be able to have a cache for the PR docker builds that use RUN --type=cache (forgot the exact syntax).

@MarcoPolo
Copy link
Contributor

m5.large is 2 cpus and 8GB of ram. Isn't that roughly the same as the hosted runners? https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

@galargh
Copy link
Contributor Author

galargh commented Mar 20, 2023

I moved you to c5.4xlarge, but, ultimately, you can pick any instance type that suits your workload characteristic the best - https://instances.vantage.sh/.

I requested the job that runs on self-hosted (the one with cache) to use 16 workers here - 255238e

I also had to request the build part to remain sequential - 0626f93

The job - https://github.com/libp2p/test-plans/actions/runs/4465794780/jobs/7843205397?pr=154 - took 20 minutes to finish. It spent ~7 minutes building stuff and then ~13 minutes running the tests.

@thomaseizinger
Copy link
Contributor

The job - libp2p/test-plans/actions/runs/4465794780/jobs/7843205397?pr=154 - took 20 minutes to finish. It spent ~7 minutes building stuff and then ~13 minutes running the tests.

That is an amazing improvement! With the filtering that we do in the pull-request workflows, that is likely to be even quicker!

@thomaseizinger
Copy link
Contributor

Is anything blocking this from merging? I'd be great to improve the runtime of this job.

@MarcoPolo
Copy link
Contributor

defer until after ipfs thing

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work here! I'd suggest we split this into two PRs so we can land parts of it quicker and discuss if we can perhaps avoid introducing the "input".

@MarcoPolo MarcoPolo force-pushed the feat/self-hosted-runners branch from f4b07dd to bc94d5f Compare May 22, 2023 21:16
@MarcoPolo
Copy link
Contributor

rebased, fyi

@MarcoPolo
Copy link
Contributor

26 minutes on this branch versus 54 minutes on master. @galargh What's the cost to running this? It's nice it's twice as fast, but it's also currently free.

Copy link
Contributor

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One open question around cost, but changes look good. Please request another review when the question has been answered.

@MarcoPolo
Copy link
Contributor

@galargh Friendly ping on my open question :)

@galargh
Copy link
Contributor Author

galargh commented Jun 23, 2023

I totally missed the question about cost! Let me do a quick calculation for you.

The monthly cost of moving on with this should be around $38/month (see the AWS resource calculation - https://calculator.aws/#/estimate?id=9a5a201262a70d9f491015a5aa81a4aeaf1ae966). I based this on:

  • the current configuration of 4xlarge runner
  • 66 libp2p multidimensional interop test workflow runs in the past 30 days
  • 36 minutes run-multidim-interop job took to complete on this PR
  • ~3GB of new data in the S3 bucket in the past 30 days

@galargh galargh requested a review from MarcoPolo June 23, 2023 08:06
@thomaseizinger
Copy link
Contributor

With the recently added implementations, the interop test are now by far our longest CI job at > 20 minutes: https://github.com/libp2p/rust-libp2p/actions/runs/5387398505/jobs/9778664726

What is the process around deciding whether the spend is worth it?

@MarcoPolo
Copy link
Contributor

Will merge after CI passes. Thank you so much @galargh! ❤️

@MarcoPolo MarcoPolo merged commit 35ab35b into master Jul 7, 2023
@MarcoPolo MarcoPolo deleted the feat/self-hosted-runners branch July 7, 2023 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ci: better caching for interop-tests docker build
3 participants