
Change the sparse attention API to be compatible with latest changes of triton #902

Merged: 4 commits into master on Jun 2, 2021

Conversation

RezaYazdaniAminabadi
Contributor

This PR changes the parameters passed to the Triton kernels so that DeepSpeed's sparse attention is compatible with the latest Triton.

This addresses #900 and #838.

@sdtblck
Contributor

sdtblck commented Apr 5, 2021

Hi @RezaYazdaniAminabadi, will this update also make the sparse kernels compatible with different GPU architectures (e.g. A100s)?

@RezaYazdaniAminabadi
Contributor Author

> Hi @RezaYazdaniAminabadi, will this update also make the sparse kernels compatible with different GPU architectures (e.g. A100s)?

Hi @sdtblck

Yes, I think the main intention of these changes to Triton is to support the A100. I have already tested on V100, and I will run more tests on A100 as well. Please feel free to try the changes out and let me know if there are any issues.
Thanks,
Reza

@ShivanshuPurohit

Hi @RezaYazdaniAminabadi, I updated our fork to the sparse-attn/support-latest-triton branch, but even then ds_report shows that the A100 isn't compatible with sparsity:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.8.0+cu111
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/mchorse/gpt-neox/src/deepspeed/deepspeed']
deepspeed info ................... 0.3.13+feb288a, feb288a, HEAD
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1

But I did notice that triton version in requirements-sparse-attn.txt is 1.0.0.dev20210329 and not 1.1. Could that be the issue here?

@RezaYazdaniAminabadi
Contributor Author

> Hi @RezaYazdaniAminabadi I updated our fork to the sparse-attn/support-latest-triton branch, but even then ds_report shows that A100 isn't compatible with sparsity
> [...]
> But I did notice that triton version in requirements-sparse-attn.txt is 1.0.0.dev20210329 and not 1.1. Could that be the issue here?

The issue is that there are several places, in both the installer and the JIT-compile path, where we guard against using a higher version of Triton. I am going to resolve them soon!
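For illustration, the kind of version guard described here can be sketched as a small helper. This is a hypothetical sketch, not DeepSpeed's actual code; the function name, the comparison logic, and the pinned minimum version are all assumptions made for the example.

```python
# Hypothetical sketch of a Triton version guard of the kind described above.
# Name and minimum version are illustrative, not DeepSpeed's actual code.
def triton_version_ok(installed: str, minimum: str = "1.0.0") -> bool:
    """Compare dotted versions, ignoring dev suffixes like '.dev20210329'."""
    def key(v: str):
        # strip a '.devYYYYMMDD' suffix before comparing numeric components
        return tuple(int(p) for p in v.split(".dev")[0].split("."))
    return key(installed) >= key(minimum)

assert triton_version_ok("1.0.0.dev20210329")  # the pinned dev build passes
assert not triton_version_ok("0.4.0")          # an older Triton is rejected
```

A real guard would need to follow PEP 440 ordering (where dev releases sort before the final release), so this naive tuple comparison is only a sketch of the idea.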

@sdtblck
Contributor

sdtblck commented Apr 7, 2021

@RezaYazdaniAminabadi can confirm this runs on the A100s 🚀. I had to make some changes to the op builder to pass the compatibility checks: https://github.com/EleutherAI/DeeperSpeed/blob/eb9a6a8201215307ba071357a06a9b03c03af3be/op_builder/sparse_attn.py
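The ds_report warning above ("sparse_attn requires CUDA version 10.1+, does not currently support >=11 or <10.1") suggests the op builder's CUDA gate rejected the 11.x line outright. A minimal sketch of relaxing such a gate, assuming a simple major.minor comparison (the function name and thresholds are illustrative, not the actual op_builder code):

```python
# Illustrative sketch (not DeepSpeed's actual op_builder code) of the kind of
# CUDA-version gate that had to be relaxed: the old guard rejected >= 11,
# while the relaxed guard only requires >= 10.1, so CUDA 11.x also passes.
def sparse_attn_cuda_compatible(cuda_version: str) -> bool:
    """Return True when the CUDA toolkit version can build sparse_attn."""
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    return (major, minor) >= (10, 1)

assert sparse_attn_cuda_compatible("11.1")      # the A100 setup reported above
assert not sparse_attn_cuda_compatible("10.0")  # genuinely too old
```

With a gate like this, the environment shown earlier (torch 1.8.0+cu111, nvcc 11.1) would report sparse_attn as compatible instead of [NO].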

@mgrankin

mgrankin commented Apr 9, 2021

@RezaYazdaniAminabadi sparse-attn/support-latest-triton works for me on an Nvidia 3090, thank you 👍

@sdtblck
Contributor

sdtblck commented May 6, 2021

Hi @RezaYazdaniAminabadi - it seems Triton has completely changed its API and removed triton==1.0.0.dev20210329 from pip. Any plans to update to the latest version?

@exelents

exelents commented May 12, 2021

> Hi @RezaYazdaniAminabadi - it seems Triton has completely changed its API and removed triton==1.0.0.dev20210329 from pip. Any plans to update to the latest version?

Same problem here. I cannot run ruGPT3 on my 3090 because I have to use the newest CUDA, and therefore the newest Triton. But triton==1.0.0.dev20210329 seems to have been removed from pip, and I don't know what to do. The newer triton==1.0.0.dev20210510 does not seem to work for me.

UPD: Here is the error I get when I try to run the ruGPT model:
ImportError: /export/DeepSpeed-triton/deepspeed/ops/sparse_attention/sparse_attn_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv

@exelents

UPD2: You can also get this error if you install the older triton==0.4.0, set up DeepSpeed, and then run this import:

# This cell should run without errors
import deepspeed.ops.sparse_attention.sparse_attn_op

@exelents

UPD3: the problem has been solved.

@arashashari arashashari merged commit 26e3841 into master Jun 2, 2021
@jeffra jeffra mentioned this pull request Jun 14, 2021
elnardu added a commit to socrash-ai/DeepSpeed that referenced this pull request Feb 18, 2022