build(nvidia): support arm64 builds #2137

Merged: 1 commit, Feb 13, 2025


@ndbaker1 (Member) commented Feb 5, 2025

Issue #, if available:

Description of changes:

This PR removes x86_64-specific packages from the NVIDIA driver provisioner and adds an override for the repository source URL as an env template.

It also adds g5g to the list of supported GPU instance type families in nodeadm.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

make \
    k8s=1.31 \
    arch=arm64 \
    os_distro=al2023 \
    enable_accelerator=nvidia \
    nvidia_repository_url='https://developer.download.nvidia.com/compute/cuda/repos/rhel9/sbsa/cuda-rhel9.repo'
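
The URL passed above points at the `sbsa` (arm64 server) repository rather than the `amz2023/x86_64` one mentioned later in this thread. As a rough sketch only, the choice of URL could be derived from the target architecture. `nvidia_repo_url_for_arch` is a hypothetical helper that is not part of this PR, and it uses only the two URLs that appear in this conversation; whether either repository suits a given OS release is an assumption to verify.

```shell
# Hypothetical helper (not part of this PR): map a build arch to an NVIDIA repo URL.
# Both URLs come from this conversation; their suitability for your OS is an assumption.
nvidia_repo_url_for_arch() {
  case "$1" in
    arm64|aarch64)
      # sbsa repository used in the arm64 test build above
      echo 'https://developer.download.nvidia.com/compute/cuda/repos/rhel9/sbsa/cuda-rhel9.repo'
      ;;
    x86_64|amd64)
      # x86_64-only repository referenced later in this thread
      echo 'https://developer.download.nvidia.com/compute/cuda/repos/amz2023/x86_64/cuda-amz2023.repo'
      ;;
    *)
      # Unknown architectures fail loudly instead of falling back to a default
      echo "unsupported arch: $1" >&2
      return 1
      ;;
  esac
}
```

With this helper sourced, the build could be invoked with `nvidia_repository_url="$(nvidia_repo_url_for_arch arm64)"` instead of a hard-coded URL.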

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

Comment on lines +57 to +58
# update all current .repo sources to enable gpgcheck
sudo dnf config-manager --save --setopt=*.gpgcheck=1
@ndbaker1 (Member, Author) commented Feb 5, 2025

I think this API is cleaner than using sed.
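
For comparison, a sed-based approach might look roughly like the sketch below. `enable_gpgcheck` is a hypothetical helper, not code from this PR or from the earlier implementation. It only rewrites `gpgcheck` lines that already exist in a `.repo` file, whereas the `dnf config-manager --setopt` call works on repo options rather than on text lines.

```shell
# Hypothetical sed-based alternative (not code from this PR).
# enable_gpgcheck FILE: force gpgcheck=1 on every existing gpgcheck line in FILE.
# Limitation: a repo section without a gpgcheck line is left unchanged.
enable_gpgcheck() {
  repo_file="$1"
  tmp_file="$(mktemp)" || return 1
  # Rewrite into a temp file, then copy back so the original file keeps its permissions.
  if sed -E 's/^gpgcheck[[:space:]]*=.*/gpgcheck=1/' "$repo_file" > "$tmp_file"; then
    cat "$tmp_file" > "$repo_file"
  fi
  rm -f "$tmp_file"
}
```

A call such as `enable_gpgcheck /etc/yum.repos.d/example.repo` (an illustrative path) would need to be repeated for every repo file, which the single `dnf config-manager` invocation avoids.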

@@ -32,6 +32,7 @@
| `launch_block_device_mappings_volume_size` | |
| `nodeadm_build_image` | Image to use as a build environment for nodeadm |
| `nvidia_driver_major_version` | To be used only when ```enable_accelerator = nvidia```. Driver version to install, depends on what is available in NVIDIA repository. |
| `nvidia_repository_url` | YUM/DNF Repository override for the NVIDIA driver packages |
Member

What is the use case for the override?

Member Author

Changing the package source, as in the PR description above; the source could also be a private target.

@ndbaker1 merged commit 2ffb0d5 into awslabs:main on Feb 13, 2025
11 checks passed
@ndbaker1 deleted the arm64-gpu branch on February 14, 2025 22:16
@arousseau-coveo

Was this reviewed before merging?
This doesn't seem to work out of the box with AL2023.
The following repo does not support arm64: https://developer.download.nvidia.com/compute/cuda/repos/amz2023/x86_64/cuda-amz2023.repo
