RuntimeError when running matrix multiplication example: CUDA: Error- invalid ptx #187
Hello,

First: thank you for this great piece of work!

I installed Triton from `pip`. When I try to run the matrix multiplication example, I get the error:

RuntimeError: CUDA: Error- invalid ptx

GPU: GeForce GTX 1080 Ti

Output of `nvcc --version`:

More details from PyTorch's `collect_env.py`:

Full trace:
Comments
Hey! FP16 on Pascal GPUs is not supported. More generally, compute capability < 7.0 is not supported, though you can make it work by changing the datatype to fp32 and using block sizes that don't overflow shared memory.
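A minimal sketch of that workaround, assuming only stock PyTorch (`torch.cuda.get_device_capability` is a standard PyTorch call; the fp32 fallback and the 512 sizes are illustrative, not part of the tutorial):

```python
import torch

# Pick a dtype the card can handle: the tutorial's FP16 path needs
# compute capability >= 7.0 (Volta and newer); Pascal cards like the
# GTX 1080 Ti report 6.1, so fall back to fp32 there.
major, minor = torch.cuda.get_device_capability()
dtype = torch.float16 if (major, minor) >= (7, 0) else torch.float32

a = torch.randn((512, 512), device='cuda', dtype=dtype)
b = torch.randn((512, 512), device='cuda', dtype=dtype)
# On < 7.0 cards, also shrink BLOCK_M/BLOCK_N/BLOCK_K in the tutorial
# kernel so the tiles still fit in shared memory.
```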
Thank you for your quick answer! I tested with a V100S-PCIE-32GB GPU, which has compute capability 7.0.
Hmm, this is very odd, given that the CI happens on a V100 and reliably passes. Is it with a fresh clone of the repo?
No, it is a pip install. For some reason, I actually had changed the matrix shape, which caused the error. I changed the example code to reapply the mm a second time:

```python
import torch
import triton

# `matmul` is the Triton kernel wrapper defined in the matrix
# multiplication tutorial.
matrix_size = 512
torch.manual_seed(0)
a = torch.randn((matrix_size, matrix_size), device='cuda', dtype=torch.float16)
b = torch.randn((matrix_size, matrix_size), device='cuda', dtype=torch.float16)

c_0 = matmul(a, b, activation=None)
c_1 = torch.matmul(a, b)
print(c_0)
print(c_1)
print(triton.testing.allclose(c_0, c_1))

c_0 = matmul(a, b, activation=None)
c_1 = torch.matmul(a, b)
print(c_0)
print(c_1)
print(triton.testing.allclose(c_0, c_1))
```

When the matrix shape is a power of 2, the first and second mm are equal to the torch.matmul result. When it is not, the Triton results no longer match.
Ah yes, this is expected in the tutorial. For a more robust matmul you can refer to https://github.com/openai/triton/blob/master/python/triton/ops/matmul.py. Look in particular at the lines that mask the loads along K: in the tutorial that masking is not there, which means that the kernel will accumulate out-of-bounds memory elements when K isn't a multiple of BLOCK_K.
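For reference, a minimal sketch of that kind of boundary handling. The kernel below is illustrative, not the exact code in ops/matmul.py; the block sizes, the names (`matmul_kernel`, `rm`, `rn`, `rk`, `matmul_masked`), and the fp32 accumulator are assumptions, but the masked `tl.load` along K is the mechanism being described:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(A, B, C, M, N, K,
                  stride_am, stride_ak,
                  stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    rk = tl.arange(0, BLOCK_K)
    a_ptrs = A + rm[:, None] * stride_am + rk[None, :] * stride_ak
    b_ptrs = B + rk[:, None] * stride_bk + rn[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        k_remaining = K - k
        # The point under discussion: mask the loads along K so that when
        # K isn't a multiple of BLOCK_K, out-of-bounds elements read as 0.
        # and don't pollute the accumulator.
        a = tl.load(a_ptrs, mask=(rm[:, None] < M) & (rk[None, :] < k_remaining), other=0.)
        b = tl.load(b_ptrs, mask=(rk[:, None] < k_remaining) & (rn[None, :] < N), other=0.)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = C + rm[:, None] * stride_cm + rn[None, :] * stride_cn
    tl.store(c_ptrs, acc, mask=(rm[:, None] < M) & (rn[None, :] < N))

def matmul_masked(a, b, BLOCK_M=64, BLOCK_N=64, BLOCK_K=32):
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, BLOCK_M), triton.cdiv(N, BLOCK_N))
    matmul_kernel[grid](a, b, c, M, N, K,
                        a.stride(0), a.stride(1),
                        b.stride(0), b.stride(1),
                        c.stride(0), c.stride(1),
                        BLOCK_M=BLOCK_M, BLOCK_N=BLOCK_N, BLOCK_K=BLOCK_K)
    return c
```

Compare with the tutorial kernel, which loads the K tiles unmasked: with a shape like 500, which is not a multiple of BLOCK_K, the unmasked version accumulates whatever sits in memory past the end of each row, which matches the power-of-2 behavior observed above.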
Thank you very much for your responsiveness!