Added compiler hints to enable buffer loads #729

Merged: 1 commit, Feb 21, 2025


@azaidy azaidy commented Feb 21, 2025

Performance:
python gemm.py -model all

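| model | M | N | K | global_load Bf16 | global_load Fp8E4 | global_load Fp8E5 | buffer_load Bf16 | buffer_load Fp8E4 | buffer_load Fp8E5 |
|---|---|---|---|---|---|---|---|---|---|
| llama3-8B | 4096 | 4096 | 14336 | 536 | 913 | 932 | 553 | 941 | 949 |
| llama3-70B | 4096 | 8192 | 28672 | 485 | 907 | 915 | 496 | 930 | 945 |
| llama3-405B | 4096 | 16384 | 53248 | 464 | 803 | 815 | 485 | 824 | 870 |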
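
To make the comparison easier to read, the relative change from global_load to buffer_load can be computed directly from the numbers reported above. This is only a sketch: the PR does not state the units, and the code assumes that larger values mean better performance. The names `RESULTS` and `percent_change` are illustrative and do not come from `gemm.py`.

```python
# Figures copied from the performance table in this PR.
# Assumption: higher is better; units are not stated in the PR.
RESULTS = {
    "llama3-8B": {
        "global_load": {"Bf16": 536, "Fp8E4": 913, "Fp8E5": 932},
        "buffer_load": {"Bf16": 553, "Fp8E4": 941, "Fp8E5": 949},
    },
    "llama3-70B": {
        "global_load": {"Bf16": 485, "Fp8E4": 907, "Fp8E5": 915},
        "buffer_load": {"Bf16": 496, "Fp8E4": 930, "Fp8E5": 945},
    },
    "llama3-405B": {
        "global_load": {"Bf16": 464, "Fp8E4": 803, "Fp8E5": 815},
        "buffer_load": {"Bf16": 485, "Fp8E4": 824, "Fp8E5": 870},
    },
}


def percent_change(results):
    """Return the percent change of buffer_load over global_load per model and dtype."""
    changes = {}
    for model, paths in results.items():
        base = paths["global_load"]
        new = paths["buffer_load"]
        changes[model] = {
            dtype: 100.0 * (new[dtype] - base[dtype]) / base[dtype]
            for dtype in base
        }
    return changes


if __name__ == "__main__":
    for model, by_dtype in percent_change(RESULTS).items():
        row = ", ".join(f"{dtype}: {delta:+.1f}%" for dtype, delta in by_dtype.items())
        print(f"{model}: {row}")
```

Because the result is a ratio, it does not depend on whichever unit the benchmark reports.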


azaidy commented Feb 21, 2025

To use buffer_loads instead of global_loads, set the environment variable AMDGCN_USE_BUFFER_OPS to 1.
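
One way to apply this setting is to launch the benchmark as a child process with the variable already in its environment, so the flag is set before any compilation happens. The sketch below is an assumption about usage, not part of this PR: `env_with_buffer_ops` and `run_benchmark` are hypothetical helper names, and the script name and arguments are taken from the command in the PR description.

```python
import os
import subprocess
import sys


def env_with_buffer_ops(base_env=None):
    """Return a copy of the environment with AMDGCN_USE_BUFFER_OPS set to "1"."""
    env = dict(os.environ if base_env is None else base_env)
    env["AMDGCN_USE_BUFFER_OPS"] = "1"
    return env


def run_benchmark(script="gemm.py", args=("-model", "all")):
    """Run the benchmark script in a child process with buffer ops enabled.

    Hypothetical helper: assumes the script is in the current directory.
    """
    return subprocess.run(
        [sys.executable, script, *args],
        env=env_with_buffer_ops(),
        check=True,
    )
```

Building a copy of the environment leaves the parent process untouched, which keeps a global_load baseline run and a buffer_load run from interfering with each other in the same session.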

@azaidy azaidy requested a review from vgokhale February 21, 2025 17:40
@azaidy azaidy merged commit 752d83c into main_perf Feb 21, 2025
3 checks passed