Update NCCL plugin to v1.0.3 in A3U #3594

@akiki-liang0 (Contributor) commented Jan 27, 2025

  • Update the A3U Slurm blueprints to NCCL plugin v1.0.3
  • Update the NCCL environment variables in the NeMo example Dockerfile

Tests:

  • 5 consecutive successful NCCL example runs
  • 5 consecutive successful NeMo example runs
  • 2 consecutive successful Ramble workload runs

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@samskillman added the release-version-updates label Jan 27, 2025
@tpdownes (Member) commented:

/gcbrun

@tpdownes self-requested a review January 28, 2025 01:01
@tpdownes merged commit 8fd1598 into GoogleCloudPlatform:develop Jan 28, 2025
11 of 57 checks passed
@abbas1902 mentioned this pull request Feb 6, 2025
Labels
release-version-updates Added to release notes under the "Version Updates" heading.
3 participants