Preview: GPU support #717
This has been tested and it works, as per above description. TODO in order to move from Preview to fully supported option (some of these may be debated):
I think this, as a preview, can possibly be shipped this week.
…iver not loaded: unknown
Remaining work: Ensure that this functionality is either turned off or can be safely used if there is no public internet access (I don't know if the VM extensions can be reached from a private virtual network with no public IP access).

Bonus work: if we recognize that the image already has the drivers, just pass them through (don't try to reinstall them).
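To make the bonus item concrete, here is a minimal sketch of a "drivers already present" check. It is not what this PR does: it assumes that finding `nvidia-smi` on the `PATH` is an acceptable proxy for the image already having drivers, and the function name `drivers_present` is hypothetical.

```python
import shutil


def drivers_present(which=shutil.which):
    """Return True if the NVIDIA driver tooling appears to be installed.

    Assumption: `nvidia-smi` on the PATH is treated as a proxy for
    "the image already has the drivers". `which` is injectable so the
    check can be exercised without a GPU machine.
    """
    return which("nvidia-smi") is not None


if __name__ == "__main__":
    if drivers_present():
        print("drivers found: skip installation and pass the GPUs through")
    else:
        print("drivers not found: installation step would be needed")
```

A check like this would also cover the no-public-internet case for images that ship with drivers, since no installation would be attempted.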
For the future: the actions taken in the start task for this are not needed for certain VM images, so we can reduce the node startup times (faster tasks, lower costs).
This is an initial implementation of GPU support in TES. It has some current limitations:
- `backend_parameters` in the task resources must include `vm_size` set to a specific SKU within the above-described series of VM families.

This follows a combination of NVIDIA, Docker, and Azure documentation for enabling containers to use GPUs. Specifically, this:

- `--gpus all` docker cli implementation

Things in the NVIDIA documentation that are NOT implemented (it's unclear what should or should not be implemented based on our use cases):

- driver capabilities
- mounted inside docker containers

Note that the Azure documentation for the VM extension for GPU support on Linux points the reader to the following EULA: https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/
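As an illustration of the `vm_size` requirement and the `--gpus all` flag described above, here is a minimal sketch. The helper names `build_gpu_task` and `docker_run_args`, the task name, the container image, and the SKU `Standard_NC6s_v3` are all assumptions chosen for the example. The SKU is only a placeholder for "a specific SKU" in a supported family, and the docker helper shows the CLI flag in isolation rather than how TES actually invokes docker.

```python
import json


def build_gpu_task(image, command, vm_size):
    """Build a TES task dict that pins the VM size via backend_parameters.

    Hypothetical helper: only the resources.backend_parameters.vm_size
    field comes from the PR description; the rest is illustrative.
    """
    if not vm_size:
        raise ValueError("vm_size is required for GPU tasks")
    return {
        "name": "gpu-example",  # illustrative task name
        "executors": [{"image": image, "command": list(command)}],
        "resources": {
            "backend_parameters": {"vm_size": vm_size},
        },
    }


def docker_run_args(image, command):
    """Return a docker CLI argument list that exposes all GPUs to the container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, *command]


# Placeholder image and SKU; substitute values valid for your deployment.
task = build_gpu_task("nvidia/cuda", ["nvidia-smi"], "Standard_NC6s_v3")

if __name__ == "__main__":
    print(json.dumps(task, indent=2))
    print(" ".join(docker_run_args("nvidia/cuda", ["nvidia-smi"])))
```

Running `nvidia-smi` as the task command is a convenient smoke test: it only succeeds if the container can actually see a GPU.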
Addresses microsoft/CromwellOnAzure#356