
Bug: Intel Arc - not working at all #9106

Closed
Eugeniusz-Gienek opened this issue Aug 20, 2024 · 28 comments
Assignees
Labels
bug-unconfirmed critical severity Used to report critical severity bugs in llama.cpp (e.g. Crashing, Corrupted, Dataloss) stale SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language

Comments

@Eugeniusz-Gienek

What happened?

Going through the manual - the SYCL one, I mean. Everything compiles okay, but running it always throws an error. I can't make it work. OS used: Gentoo Linux. P.S. Docker doesn't work either. P.P.S. The device IS listed in the device list.

Name and Version

./build/bin/llama-cli --version

version: 3609 (2f3c146)
built with Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711) for x86_64-unknown-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

# ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
Log start
main: build = 3609 (2f3c1466)
main: built with Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711) for x86_64-unknown-linux-gnu
main: seed  = 1724182694
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from models/llama-2-7b.Q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: max token length = 48
ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 2 SYCL devices:
llm_load_tensors: ggml ctx size =    0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  3577.56 MiB
llm_load_tensors:        CPU buffer size =    70.31 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: yes
found 2 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A380 Graphics|    1.3|    128|    1024|   32|  6064M|            1.3.29735|
| 1| [level_zero:gpu:1]|                 Intel UHD Graphics 630|    1.3|     24|     256|   32| 46333M|            1.3.29735|
llama_kv_cache_init:      SYCL0 KV buffer size =  2048.00 MiB
llama_new_context_with_model: KV self size  = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =     0.12 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =   296.00 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    16.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
Unexpected pattern!
UNREACHABLE executed at /var/tmp/portage/dev-util/spirv-llvm-translator-15.0.0-r1/work/SPIRV-LLVM-Translator-15.0.0/lib/SPIRV/SPIRVUtil.cpp:2037!
The program was built for 1 devices
Build program log for 'Intel(R) Arc(TM) A380 Graphics':
 -11 (PI_ERROR_BUILD_PROGRAM_FAILURE)Exception caught at file:/home/username/llama/ggml/src/ggml-sycl.cpp, line:2722
@Eugeniusz-Gienek Eugeniusz-Gienek added bug-unconfirmed critical severity Used to report critical severity bugs in llama.cpp (e.g. Crashing, Corrupted, Dataloss) labels Aug 20, 2024
@Eugeniusz-Gienek
Author

Additionally:

# ./build/bin/llama-ls-sycl-device
found 2 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A380 Graphics|    1.3|    128|    1024|   32|  6064M|            1.3.29735|
| 1| [level_zero:gpu:1]|                 Intel UHD Graphics 630|    1.3|     24|     256|   32| 46333M|            1.3.29735|

And

# sycl-ls
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A380 Graphics OpenCL 3.0 NEO  [24.22.029735]
[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 630 OpenCL 3.0 NEO  [24.22.029735]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A380 Graphics 1.3 [1.3.29735]
[level_zero:gpu][level_zero:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 630 1.3 [1.3.29735]

@Eugeniusz-Gienek
Author

...and just in case:

# lspci -k
[...]
03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05)
        Subsystem: ASRock Incorporation Device 6006
        Kernel driver in use: i915
        Kernel modules: xe
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
        Subsystem: Hewlett-Packard Company Device 83ee
        Kernel driver in use: i915

@qnixsynapse
Contributor

Run ./test-backend-ops -b SYCL0 and paste the output here.

@NeoZhangJianyu
Collaborator

@Eugeniusz-Gienek
Did you build it with FP16?
If yes, please build it with FP32 first.
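
For reference, a minimal sketch of the two build variants as the SYCL docs described them around that time (oneAPI environment sourced first; exact CMake flags may differ on other versions of llama.cpp):

source /opt/intel/oneapi/setvars.sh

# FP32 build (the default, which is what is being asked for here)
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j

# FP16 build differs only by one extra flag
cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_F16=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j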

@airMeng
Collaborator

airMeng commented Aug 22, 2024

@ClarkChin08 please give a review

@ngxson ngxson added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Aug 22, 2024
@ProjectMoon

I am having this exact same issue, on Gentoo, with the exact same LLVM spirv library.

I built llama.cpp according to the SYCL docs, using the example build script, which builds with FP32 by default. So I have this issue when using FP32.

@ClarkChin08
Contributor

-sm none -mg 0

Hi, you used '-sm none -mg 0'. This keeps the whole model on GPU 0, whose memory is only 6064M. Please use '-sm layer' instead of '-sm none -mg 0'; this will split the graph across the two GPUs.
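
For clarity, the suggested change applied to the original invocation from the top of this issue (a sketch, only the split-mode flags differ):

ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf \
    -p "Building a website can be done in 10 simple steps:" \
    -n 400 -e -ngl 33 -sm layer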

@NeoZhangJianyu
Collaborator

@Eugeniusz-Gienek
I see your CPU is an i5-8500.
Intel Arc GPUs are supported starting from 11th-gen Core CPUs or newer.
I'm not sure your CPU + Arc A380 combination is supported well by oneAPI. It looks like it is not.

@Eugeniusz-Gienek
Author

@Eugeniusz-Gienek
I see your CPU is an i5-8500.
Intel Arc GPUs are supported starting from 11th-gen Core CPUs or newer.
I'm not sure your CPU + Arc A380 combination is supported well by oneAPI. It looks like it is not.

Oh, that's very important information which I would assume belongs in the SYCL installation manual... Thanks for sharing!

@ProjectMoon

Hmm, I have a 10th gen CPU. That might be why mine does not work, either.

@Eugeniusz-Gienek
Author

To be honest, I bought the Intel Arc A380 specifically with the intent to use it in an HP ProDesk 600 G4 SFF, as there are not a lot of options out there in the SFF format. If I had known in advance it wouldn't work, I wouldn't have bought it. Anyway, I've managed to return it to the seller and, out of curiosity, checked whether the onboard Intel GPU would work at least somehow. The only way to get it working was Vulkan, which on that GPU is TERRIBLY slow. So, to sum up - nope, you probably don't want to use any Intel GPU if your CPU is not Intel 11th gen or newer. Ah, and actually I've switched to another PC for llama.cpp, with an Intel 10th-gen CPU and an NVidia 3060, which works great with GPT4ALL but to my surprise didn't work as well with llama.cpp: it turns out that if you want to run a webserver, the RAM requirement is also a thing, not only VRAM - so for an 8 GB llama model my 16 GB is .... not enough. Anyway, now it's a waiting game for additional RAM to arrive, and maybe the local GPT webserver will finally be functional on the network.

@NeoZhangJianyu
Collaborator

It's limited by oneAPI/SYCL support, in fact.

Some 10th-gen Core motherboards support Arc GPUs because they support the Resizable BAR feature.
Please check: https://www.intel.com/content/www/us/en/support/articles/000090831/graphics.html

@Eugeniusz-Gienek
Author

I would propose including this information in the manual, so that potential new users are not frustrated by purchasing hardware that won't fulfill their expectations... What do you think? Does it make sense?

@piDack
Contributor

piDack commented Aug 28, 2024

I suggest you consider using the Vulkan version of llama.cpp, as it is compatible with almost all graphics cards. Intel's GPU software & drivers are bad.

@Eugeniusz-Gienek
Author

I suggest you consider using the Vulkan version of llama.cpp, as it is compatible with almost all graphics cards. Intel's GPU software & drivers are bad.

This is exactly what I did, to be honest, but that implementation is extremely slow...

@NeoZhangJianyu
Collaborator

I would propose including this information in the manual, so that potential new users are not frustrated by purchasing hardware that won't fulfill their expectations... What do you think? Does it make sense?

Yes.

The Resizable BAR feature can be checked in the BIOS.
It's hard to list all the hardware that supports it.

If your PC with a 10th-gen Core CPU has Resizable BAR, it could support an Arc GPU.

@NeoZhangJianyu
Collaborator

With an i7-8700K on a GIGABYTE Z370d motherboard and "Above 4G Decoding" enabled in the BIOS, the Arc A770 is supported.
But because the "Resizable BAR" feature is missing, the performance is at 50%.

@jshield

jshield commented Sep 11, 2024

Technically, resizable BARs have been supported since PCIe 2.0; you can most likely enable them by modifying the DSDTs/DXE of your UEFI BIOS. There is a framework for that here: https://github.com/xCuri0/ReBarUEFI

If you do have Above 4G Decoding as an option, the Linux kernel will itself attempt to resize the BARs when loading the drivers, although I'm not sure whether the Arc drivers do that. The amdgpu drivers do, however.
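
If it helps, a rough way to check from Linux whether the card actually got a full-size BAR (the device address is taken from the lspci output earlier in this thread; the exact capability wording varies between lspci versions):

# inspect the Arc card's memory regions and resizable-BAR capability
sudo lspci -vv -s 03:00.0 | grep -iE "memory at|bar"

# see whether the kernel logged any BAR resizing at boot
sudo dmesg | grep -iE "resiz|BAR"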

@NineMeowICT

NineMeowICT commented Oct 2, 2024

@NeoZhangJianyu
This is mine:
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz OpenCL 3.0 (Build 0) [2024.18.7.0.11_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [24.35.30872.22]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.5 [1.3.30872]

The command that works properly:
./llama-server -m /media/ninemeow/Delta/NLP/Lumimaid-v0.2-8B-Q6_K-imat.gguf -c 48000

And I got this error after I specified the -ngl option:

MKL Warning: Incompatible OpenCL driver version. GPU performance may be reduced.
Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)
Exception caught at file:/home/ninemeow/llama.cpp/ggml/src/ggml-sycl.cpp, line:3438, func:operator()
SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *main_stream, oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, ne01, ne11, ne10, alpha, (const void **)(ptrs_src.get() + 0 * ne23), dpct::library_data_t::real_half, nb01 / nb00, (const void **)(ptrs_src.get() + 1 * ne23), dpct::library_data_t::real_half, nb11 / nb10, beta, (void **)(ptrs_dst.get() + 0 * ne23), cu_data_type, ne01, ne23, cu_compute_type)): Meet error in this line code!
  in function ggml_sycl_mul_mat_batched_sycl at /home/ninemeow/llama.cpp/ggml/src/ggml-sycl.cpp:3438
/home/ninemeow/llama.cpp/ggml/src/ggml-sycl/common.hpp:107: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

llama.cpp: build: 3848 (c919d5d)
oneAPI: 24.2.1
Intel OpenCL compute runtime: 24.35.30872.22
Kernel: 6.8
ReBAR and Above 4G Decoding are enabled. (I use an i3-9100, though. The B360 chipset does in fact support ReBAR, and ASUS has pushed a BIOS update that adds ReBAR support in software.)

My A770 can run ComfyUI flawlessly in the meantime, so it doesn't look like a oneAPI compatibility issue.

@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented Oct 12, 2024

@NineMeowICT
Could you remove the "build" folder and compile again with the default settings?
And please share the build command and the run command.

Additionally, does it work well with llama2-7b-q4, as in the example in SYCL.md?

@NineMeowICT

@NeoZhangJianyu
I tried what you said and got the same error.

build cmd: ./examples/sycl/build.sh
running cmd: ZES_ENABLE_SYSMAN=1 ./llama-server -m /media/ninemeow/Delta/NLP/Lumimaid-v0.2-8B-Q6_K-imat.gguf -ngl 33 -c 48000

But it does pass the run-llama2.sh test.

@NeoZhangJianyu
Collaborator

@NineMeowICT
I have reproduced this issue with b3848.
I am checking it.

@qnixsynapse
Contributor

@NineMeowICT Seems like #9612 (comment)

@NeoZhangJianyu
Collaborator

@NineMeowICT
Before we fix it, please try commit id c35e586 - it should work well.
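
Something along these lines should get you onto that commit (a sketch reusing the build script mentioned earlier in this thread):

git checkout c35e586
rm -rf build
./examples/sycl/build.sh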

@NeoZhangJianyu
Collaborator

@Eugeniusz-Gienek
Could you test with the latest code?
I have fixed it.

@Eugeniusz-Gienek
Author

Thank you, I will test it, hopefully today after work :)

@NineMeowICT

@NeoZhangJianyu Thank you. I just built it and tested several models and no errors occurred.

@github-actions github-actions bot added the stale label Dec 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
