feat(nvidia): build pytorch to get older cuda compute capabilities and setup arm64 support #578
Conversation
###############################################################################
# 4) Install Pytorch from Source
###############################################################################
# envs needed to make the path of NVCC known to the compilation
ENV CUDA_HOME=/usr/local/cuda
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
ENV PATH=$PATH:$CUDA_HOME/bin
# this list could be minimized based on the supported GPUs
ENV TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.7 8.9 9.0"

RUN pip3 install typing-extensions sympy
RUN git clone \
    --recursive https://github.com/pytorch/pytorch.git \
    --branch $PYTORCH_BRANCH \
    && cd pytorch && eval "$PYTORCH_BUILD_ENV python3 setup.py install" && cd .. \
    && rm -rf pytorch
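As a sanity check before committing to a multi-hour build, it can help to verify that the `TORCH_CUDA_ARCH_LIST` value actually covers the compute capability of the target GPU. This is an illustrative sketch, not code from this PR; the helper name `arch_list_covers` is hypothetical.

```python
# Sketch (not from this PR): check whether a GPU's compute capability is
# covered by a TORCH_CUDA_ARCH_LIST setting before starting a long build.
def arch_list_covers(arch_list: str, capability: tuple) -> bool:
    """Return True if `capability` (major, minor) appears in the
    space-separated TORCH_CUDA_ARCH_LIST string."""
    want = f"{capability[0]}.{capability[1]}"
    # entries may carry a "+PTX" suffix, e.g. "9.0+PTX"
    entries = {e.split("+")[0] for e in arch_list.split()}
    return want in entries

arch_list = "7.5 8.0 8.6 8.7 8.9 9.0"
print(arch_list_covers(arch_list, (7, 5)))  # T4 (Turing) -> True
print(arch_list_covers(arch_list, (6, 1)))  # Pascal -> False
```

Including `7.5` in the list is exactly what makes the resulting wheel usable on the T4-class GPUs this PR targets.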
Any idea how long this step takes? Just curious
the workflow took ~5hr 30min, most of which is spent on this step 😅
- run: docker build --file test/images/nvidia-inference/Dockerfile test/images/nvidia-inference
- run: |
    docker build --file test/images/nvidia-inference/Dockerfile test/images/nvidia-inference \
      --build-arg PYTORCH_BUILD_ENV="MAX_JOBS=8 BUILD_TEST=0 USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 USE_DISTRIBUTED=0"
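A fixed `MAX_JOBS=8` works for this runner, but on other machines a common rule of thumb is to cap parallel compile jobs by available memory. The heuristic below is hypothetical (not from this PR), and the ~4 GiB-per-job figure is an assumption, not a measured value:

```shell
# Sketch (hypothetical heuristic, not from this PR): derive a MAX_JOBS
# value from available memory, assuming roughly 4 GiB per compile job.
mem_gib=$(awk '/MemAvailable/ {print int($2 / 1048576)}' /proc/meminfo)
cores=$(nproc)
jobs=$(( mem_gib / 4 ))
# clamp to at least 1 job and at most one job per core
[ "$jobs" -lt 1 ] && jobs=1
[ "$jobs" -gt "$cores" ] && jobs="$cores"
echo "MAX_JOBS=$jobs"
```

Something like this trades a slower build for not hitting the OOM killer mid-compile on memory-constrained runners.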
Just curious... any reason for choosing a value of 8 for MAX_JOBS?
yea, this was manually tuned: higher MAX_JOBS values hit OOM errors, so I searched for a value that still finished under the default 6hr GitHub Actions limit
LGTM
@@ -110,8 +110,7 @@ def main():
     # Retrieve environment variables
     rank = int(os.getenv("OMPI_COMM_WORLD_RANK", "0"))
     world_size = int(os.getenv("OMPI_COMM_WORLD_SIZE", "1"))
-    num_gpus_per_node = int(os.getenv("NUM_GPUS_PER_NODE", "8"))
-    local_rank = rank % num_gpus_per_node
+    local_rank = int(os.getenv("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
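The change above reads the per-node local rank directly from Open MPI instead of deriving it from the global rank. An illustrative sketch of the two approaches side by side (not the PR's exact code; the fallback branch is hypothetical):

```python
import os

# Sketch (illustrative, not this PR's exact code): prefer Open MPI's
# per-node local rank; fall back to deriving it from the global rank
# when the variable is absent (e.g. under a different launcher).
def local_rank(gpus_per_node: int = 8) -> int:
    if "OMPI_COMM_WORLD_LOCAL_RANK" in os.environ:
        return int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    rank = int(os.getenv("OMPI_COMM_WORLD_RANK", "0"))
    return rank % gpus_per_node

os.environ["OMPI_COMM_WORLD_LOCAL_RANK"] = "3"
print(local_rank())  # -> 3
del os.environ["OMPI_COMM_WORLD_LOCAL_RANK"]
os.environ["OMPI_COMM_WORLD_RANK"] = "11"
print(local_rank())  # -> 3 (11 % 8)
```

The modulo fallback only works when every node has exactly `gpus_per_node` ranks; the Open MPI variable is correct regardless of how ranks are laid out across nodes.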
@mattcjo any reason we didn't do this before based on https://docs.open-mpi.org/en/v5.0.x/tuning-apps/environment-var.html?
test_04_bus_grind()
{
    assert_status_code 0 "$DEMU_SUITE_DIR/busGrind -a"
}
the busGrind suite is not in the upstream cuda samples repo that i moved to; the cuda-demo-suite package it came from was only available for x86_64
LGTM
Issue #, if available:
Description of changes:
Running these containers on instances like the g5g family will not work because the compute capability of the NVIDIA T4 architecture (7.5) is older than what's provided in pytorch's prebuilt binaries (generally 8.0+ at this point). This PR sets up arm64 compatibility and builds pytorch in the images to gain support for older cuda compute capabilities.
Testing
Additionally tested for backwards compatibility with x86_64 nvidia instance types.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.