
[SYCL] fix mul_mat_vec_q error #9939

Merged
merged 1 commit into ggml-org:master on Oct 21, 2024

Conversation

NeoZhangJianyu
Collaborator

This PR fixes the following issues:
#9612
#9106

When WARP_SIZE was changed to 16 for Intel GPUs, mul_mat_vec_q() could enter an infinite loop.

@github-actions github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Oct 18, 2024
@qnixsynapse
Contributor

Are we considering setting warp size of 32 for all mmvq kernels? Why not just change the default warp size for all Intel GPUs instead of using a separate define QK_WARP_SIZE here?

@characharm

crash log

llama_new_context_with_model: freq_scale = 1
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
found 1 SYCL devices:
| ID | Device Type        | Name                    | Version | Max compute units | Max work group | Max sub group | Global mem size | Driver version |
|----|--------------------|-------------------------|---------|-------------------|----------------|---------------|-----------------|----------------|
| 0  | [level_zero:gpu:0] | Intel Arc A770 Graphics | 1.5     | 512               | 1024           | 32            | 16704M          | 1.3.30714      |

Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Exception caught at file:S:/LLM/SYCL/llama.cpp/ggml/src/ggml-sycl.cpp, line:434, func:operator()
SYCL error: CHECK_TRY_ERROR((*stream) .memset(ctx->dev_ptr, value, buffer->size) .wait()): Meet error in this line code!
in function ggml_backend_sycl_buffer_clear at S:/LLM/SYCL/llama.cpp/ggml/src/ggml-sycl.cpp:434
S:\LLM\SYCL\llama.cpp\ggml\src\ggml-sycl\common.hpp:107: SYCL error

Qwen2.5-32B-Instruct-Q3_K_M.gguf is 14.8 GB. But I think this is not related to #9612 or #9106.

@NeoZhangJianyu
Collaborator Author

NeoZhangJianyu commented Oct 19, 2024

crash log
[full device table and SYCL error log quoted from the previous comment]
Qwen2.5-32B-Instruct-Q3_K_M.gguf is 14.8 GB. But I think this is not related to #9612 or #9106.

This issue is due to insufficient memory.
The model is 14.8GB, but the Arc A770 only has 16GB.
Loading the model requires additional memory, so the total needed is more than 16GB.
Please use a smaller model.

@NeoZhangJianyu
Collaborator Author

Are we considering setting warp size of 32 for all mmvq kernels? Why not just change the default warp size for all Intel GPUs instead of using a separate define QK_WARP_SIZE here?

WARP_SIZE=16 speeds up common cases on Intel GPUs.
But it clearly has side effects, so QK_WARP_SIZE is used for the cases that need the value 32.
This PR fixes the known issues.

For Intel GPUs, WARP_SIZE is defined as 16.
For the cases that need 32, QK_WARP_SIZE is used.
This was introduced by a9554e2.
I have discussed it with the author, and we decided to keep WARP_SIZE=16.

@qnixsynapse
Contributor

qnixsynapse commented Oct 19, 2024

@NeoZhangJianyu Indeed. I did discuss this with the author when they were working on it. Kernels which use the Q_K data type were having problems, and we had to revert to 32 to fix them, with a performance penalty.

Will the oneDNN GEMM path fix this? oneDNN GEMM was implemented by the same author in ggml-sycl. I think this entire issue is related to an old version of the driver (the working driver version is 1.3.30872; 1.3.30714 is affected).

@characharm

This issue is due to insufficient memory.

Got it. I was hoping this would work the same way as the Vulkan backend, where system RAM is used when there is not enough VRAM.

@NeoZhangJianyu
Collaborator Author

@NeoZhangJianyu Indeed. I did discuss this with the author when they were working on it. Kernels which use the Q_K data type were having problems, and we had to revert to 32 to fix them, with a performance penalty.

Will the oneDNN GEMM path fix this? oneDNN GEMM was implemented by the same author in ggml-sycl. I think this entire issue is related to an old version of the driver (the working driver version is 1.3.30872; 1.3.30714 is affected).

I hope to fix all of the Q_K issues with this PR. Some users are blocked by them.

Better optimization via oneDNN GEMM is another big topic. I suggest creating a separate discussion for it; there may be more possible solutions.

@luoyu-intel
Contributor

LGTM

@NeoZhangJianyu NeoZhangJianyu merged commit 1db8c84 into ggml-org:master Oct 21, 2024
53 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw added a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw added a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024