[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs #12063

Draft: wants to merge 169 commits into base: master
b0c3013
ggml: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine…
zhouwg Apr 24, 2024
d325088
ggml: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine…
zhouwg Apr 24, 2024
c75817b
rebase
zhouwg Jun 5, 2024
9c872cb
refine ggml-qnn-ut program and script to make reviewers happy
zhouwg Jun 5, 2024
926a866
review: replace external declaration with NDK header file
zhouwg Jun 5, 2024
dd29834
add supportive of quantize data type Q8_0
zhouwg Jun 6, 2024
f4c5303
review: remove unused QNN helper functions
zhouwg Jun 6, 2024
2fab33d
ggml-qnn: remove static global vars to support multi-instance simulta…
zhouwg Jun 7, 2024
94ee775
review: remove static global vars to support multi-instance simultane…
zhouwg Jun 7, 2024
5d691c6
review: put qnn's internal log inside preprocessor diretive
zhouwg Jun 8, 2024
fdf0272
review: code format using clang-format + manually modification accord…
zhouwg Jun 8, 2024
3e8b61f
review: fix a memory leak introduced by review modification which exp…
zhouwg Jun 9, 2024
d38d4a6
npu: probe htp info and capacity of rpc ion memory
zhouwg Jun 9, 2024
5f8cfe4
ggml-qnn: refine source code of ggml-qnn.cpp to make reviewer more happy
zhouwg Jun 10, 2024
5269e08
ggml-qnn: refine ggml inference using QNN NPU
zhouwg Jun 11, 2024
faaa86b
ggml-qnn: refine ggml inference using QNN NPU
zhouwg Jun 12, 2024
5598fbd
review: make a MVP(Minimum Viable PR) style PR in upstream
zhouwg Jun 13, 2024
5e18cdc
init the test array with const values
chraac Jun 15, 2024
6c68adc
add ggml_qnn_tensor_binder
chraac Jun 14, 2024
37bb926
use tensor wrapper in add
chraac Jun 15, 2024
36e41a1
use tensor wrapper in matmul
chraac Jun 16, 2024
a5679dd
use ggml_qnn_tensor_reader for output tensor
chraac Jun 16, 2024
5fe7b87
use ggml_qnn_tensor_writer for all parameters
chraac Jun 16, 2024
9456bba
rename
chraac Jun 17, 2024
65a14d9
fix todo
chraac Jun 18, 2024
aeef0c6
make the constant condition first
chraac Jun 19, 2024
dfe159f
remove TODO
chraac Jun 19, 2024
9932062
split logger function, tensors and backend from main qnn source
chraac Jun 19, 2024
3c491a3
remove reference of g_qnn_mgr in qnn_instance
chraac Jun 19, 2024
3fe07eb
fix compiling error
chraac Jun 19, 2024
37a1585
rename
chraac Jun 19, 2024
ff0359d
move qnn helper function into utility files
chraac Jun 19, 2024
e1056da
fix op handle checker
chraac Jun 24, 2024
c9e99bd
split qnn ops into file
chraac Jun 24, 2024
3808a4c
Merge branch 'master' into dev-refactoring
chraac Jul 1, 2024
8b677d1
move qnn backend into sub folder
chraac Jul 2, 2024
38f88d5
fix compiling error after merge latest master
chraac Jul 2, 2024
000240c
add clang format file and reformating
chraac Jul 4, 2024
ca0d999
add ggml_qnn_graph
chraac Jul 4, 2024
4b2ee61
move graph map to backend object
chraac Jul 5, 2024
a688ed3
add op param to add_nodes
chraac Jul 5, 2024
13dc3a0
use qnn graph inside add and mul ops
chraac Jul 5, 2024
58cec14
reformat
chraac Jul 5, 2024
0f2e687
move tensor related function to utils
chraac Jul 5, 2024
4b0f6b0
add helper function to get Qnn_TensorType_t from ggml_tensor
chraac Jul 5, 2024
263ffa9
small opt of the qnn graph config init
chraac Jul 5, 2024
874216b
remove unused members
chraac Jul 7, 2024
5f2e391
refactoring ggml_qnn_tensor
chraac Jul 7, 2024
af869fd
fix compiling error in debug build
chraac Jul 9, 2024
a7be069
add log
chraac Jul 9, 2024
9add256
use helper function instead
chraac Jul 9, 2024
dc7d83e
add log
chraac Jul 9, 2024
e97d3a6
fix tensor buffer allocation
chraac Jul 10, 2024
3feb574
merge register_rpc_mem into alloc_rpc_mem
chraac Jul 10, 2024
b49b501
fix sprintf type
chraac Jul 10, 2024
80051cf
remove unused variables
chraac Jul 10, 2024
b6f2927
add function to get graph from cache
chraac Jul 10, 2024
7ea28a6
add helper function for binary op
chraac Jul 10, 2024
8932135
add sqrt and mul ops
chraac Jul 10, 2024
be3aa96
use template function directly
chraac Jul 10, 2024
f0894d8
wip
chraac Jul 12, 2024
0eb595c
use table to simpilify the op mapping
chraac Jul 12, 2024
e3aa43a
suppress warning
chraac Jul 12, 2024
7cbc4fb
add mul
chraac Jul 12, 2024
100ccd5
add unary op template and more ops
chraac Jul 12, 2024
c1e2283
expose op at unit test
chraac Jul 13, 2024
148ceab
add log op
chraac Jul 14, 2024
30b4000
remove unused declarations
chraac Jul 14, 2024
c46b4de
[unit test] init all tensor by one function
chraac Jul 15, 2024
4410fd6
format with clang-format
chraac Jul 15, 2024
cd5a733
add cpu backend as cross reference
chraac Jul 15, 2024
f32327e
remove multiply declearation of log in unit test
chraac Jul 15, 2024
ff601ab
add todo
chraac Jul 15, 2024
0301b50
refactoring: prevent leak the QNN_INTERFACE_VER_TYPE and QNN_SYSTEM_I…
chraac Jul 16, 2024
b1ef302
refactoring: remove depend of dlsym at utils.hpp
chraac Jul 17, 2024
63dc587
refactoring: make the buffer alloc and free stay in same class
chraac Jul 17, 2024
bb13795
refactoring: remove unused functions and variables
chraac Jul 17, 2024
861bb9c
Merge tag 'b3405' into dev-refactoring
chraac Jul 17, 2024
eed9605
add build step of QNN backend at ggml
chraac Jul 17, 2024
454deef
register qnn backend
chraac Jul 17, 2024
2502b57
fix warnings
chraac Jul 17, 2024
b7d781e
remove qnn dedicated unit tests since we're now using the `test-backe…
chraac Jul 17, 2024
6457a68
disable qnn profiling in release build
chraac Jul 17, 2024
c76fc9a
fix warnings
chraac Jul 17, 2024
ce199b2
refactoring: downgrade some log to debug level
chraac Jul 17, 2024
d82b3a0
feat: add GGML_UNARY_OP_GELU
chraac Jul 18, 2024
15f5cc4
bug: fix allocation size overflow at log
chraac Jul 18, 2024
665f823
fix op checker
chraac Jul 18, 2024
ce3d09e
tried fix the add node error 6005
chraac Jul 19, 2024
f45fbec
Revert "tried fix the add node error 6005"
chraac Jul 19, 2024
0153a23
fix support ops
chraac Jul 19, 2024
a607995
Reapply "tried fix the add node error 6005"
chraac Jul 19, 2024
b1b5cc1
add function to convert qnn error into string
chraac Jul 19, 2024
1679dcf
fix: check all dimentions in `can offload`
chraac Jul 19, 2024
28a00e5
fix: try fix QNN_GRAPH_ERROR_INVALID_OP_CONFIG
chraac Jul 20, 2024
2729946
fix: try fix tensor type error
chraac Jul 20, 2024
51f95d6
fix: dimension could be wrong for tensor liked 1x1x8
chraac Jul 20, 2024
5f3b1ae
fix: try fix graph cache with append the tensors name
chraac Jul 20, 2024
b173c4e
feat: update tensor name when bind to graph
chraac Jul 20, 2024
3b47056
refactoring: change the tensor binding mode between qnn tensor and gg…
chraac Jul 22, 2024
706793f
fix: back to qnn tensor v1 to fix the create tensor error
chraac Jul 22, 2024
f843e5a
fix: 1.free up rpc memory at destruct
chraac Jul 22, 2024
ee305cc
refactoring: split qnn rpc buffer into dedicated class
chraac Jul 26, 2024
47735cb
fix: try fix error in 2nd run by appending dimension into graph key
chraac Jul 26, 2024
be9a8c7
fix: suppress warning
chraac Jul 26, 2024
18aa665
refactoring: opt graph key gen
chraac Jul 27, 2024
2c73791
refactoring: remove dup code
chraac Jul 27, 2024
ccfec70
refactoring: remove unused get_rpcmem_from_memhandle func
chraac Jul 27, 2024
867c91b
feat: add error string for QnnOpPackage_Error_t
chraac Jul 27, 2024
5da73f8
refactoring: move forward and supports_op into ops file
chraac Jul 27, 2024
e0c9b34
feat: check if dims equal for add
chraac Jul 27, 2024
8ab1f15
refactoring: remove internal functions, use op table directly
chraac Jul 27, 2024
e33b5c9
refactoring: print the name of unsupport op
chraac Jul 27, 2024
1f9d2a7
refactoring: improve tensor print
chraac Jul 28, 2024
5ecbeb5
Merge branch 'master' into dev-refactoring
chraac Jul 29, 2024
6da8294
refactoring: set the default qnn lib search path at CMakeLists.txt by…
chraac Jul 29, 2024
9a5f802
refactoring: add convient macro to disable copy and move of class
chraac Jul 29, 2024
74eb05a
feat: add ggml_qnn_op_config for handle different op
chraac Jul 29, 2024
6cc7432
Merge remote-tracking branch 'origin/master' into dev-refactoring
chraac Jul 31, 2024
47f6e02
fix: try fix the tensor rank of mul mat
chraac Jul 31, 2024
5ea980d
Merge branch 'master' into dev-refactoring
chraac Aug 5, 2024
dedadf2
Fixed a bug where debug code was included in the release, resulting i…
myan-o Aug 20, 2024
6bee798
Merge branch 'master' into dev-refactoring
chraac Aug 20, 2024
c9be2ba
Merge branch 'master' into dev-refactoring
chraac Aug 30, 2024
67e8af7
Merge branch 'master' into dev-refactoring
chraac Sep 7, 2024
481cb3a
fix compiling error
chraac Sep 7, 2024
b0b75d4
Merge branch 'master' into dev-refactoring
chraac Sep 10, 2024
8e7807e
Merge tag 'b3779' into dev-refactoring
chraac Sep 18, 2024
b7aea04
fix compiling error
chraac Sep 18, 2024
a1ceaae
fix compiling error at older ndk (r23c)
chraac Sep 28, 2024
2ef0904
Merge branch 'master' into dev-refactoring
chraac Sep 30, 2024
1da8a3e
fix compiling error after merge
chraac Sep 30, 2024
8e30038
Merge branch 'master' into dev-refactoring
chraac Oct 7, 2024
181cf52
adapt new register backend interface and fix missing ops
chraac Oct 10, 2024
17cc17e
Merge branch 'master' into dev-refactoring
chraac Oct 11, 2024
f260498
remove unused function
chraac Oct 11, 2024
4abaf7d
feat: fix mulmat (#2)
chraac Oct 28, 2024
c42433c
Merge branch 'master' into dev-refactoring
chraac Oct 28, 2024
5c1e6d4
disable gelu in NPU
chraac Oct 28, 2024
fe565cf
fix compiling error in release
chraac Oct 29, 2024
d963250
Merge branch 'master' into dev-refactoring
chraac Nov 4, 2024
0fec56f
fix compiling error
chraac Nov 4, 2024
8ad86dc
feat: add QNN_OP_TRANSPOSE (#6)
chraac Nov 4, 2024
e6dbdac
feat: fix llama-bench (#7)
chraac Nov 13, 2024
9f62fc9
Merge branch 'master' into dev-refactoring
chraac Nov 13, 2024
a2df09b
[WIP] feat: perf opt (#10)
chraac Nov 28, 2024
5103b16
bugfix: block large tensor calc in npu
chraac Nov 29, 2024
67b183c
Merge branch 'master' into dev-refactoring
chraac Nov 29, 2024
6d4feae
redo conflict changes
chraac Nov 29, 2024
09efaa3
define compile flag as module private
chraac Nov 29, 2024
c5e6549
fix: fix assertion
chraac Nov 29, 2024
cf91253
Merge branch 'master' into dev-refactoring
chraac Dec 3, 2024
0d02ee0
fix int overflow and remove view op to pass unit test
chraac Dec 3, 2024
e36ad89
bugfix: error pre-allocated tensor (k_cache_view-0) (#12)
chraac Dec 11, 2024
6d3267a
Merge branch 'master' into dev-refactoring
chraac Dec 14, 2024
79f124a
add missing op
chraac Dec 14, 2024
8f07b3e
Merge branch 'master' into dev-refactoring
chraac Dec 26, 2024
f2d8d01
[feat] Port ggml graph to QNN graph (#16)
chraac Jan 10, 2025
c410717
Merge branch 'master' into dev-refactoring
chraac Jan 10, 2025
5f93376
fix compiling error after merged
chraac Jan 10, 2025
10bd671
[feat]add more op support (#18)
chraac Jan 18, 2025
3ed9f5b
Merge branch 'master' into dev-refactoring
chraac Jan 18, 2025
34d9b38
Merge branch 'master' into dev-refactoring
chraac Feb 1, 2025
ba324b0
Merge branch 'master' into dev-refactoring
chraac Feb 12, 2025
12c75f1
Merge branch 'master' into dev-refactoring
chraac Feb 13, 2025
a822d00
feat: run on win (#24)
chraac Feb 24, 2025
84328ff
Merge branch 'master' into dev-refactoring
chraac Feb 24, 2025
ff033e1
opt mulmat base on official doc (#25)
chraac Feb 25, 2025
c867641
feat: fix some TODO item in upstream PR #26 (#27)
chraac Feb 27, 2025
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -117,6 +117,7 @@ llama_option_depr(WARNING LLAMA_RPC GGML_RPC)
llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
llama_option_depr(WARNING LLAMA_CANN GGML_CANN)
llama_option_depr(WARNING LLAMA_QNN GGML_QNN)

if (NOT MSVC)
if (LLAMA_SANITIZE_THREAD)
2 changes: 2 additions & 0 deletions ggml/CMakeLists.txt
@@ -185,6 +185,7 @@ set (GGML_SYCL_TARGET "INTEL" CACHE STRING
"ggml: sycl target device")
set (GGML_SYCL_DEVICE_ARCH "" CACHE STRING
"ggml: sycl device architecture")
option(GGML_QNN "ggml: use QNN" OFF)

option(GGML_OPENCL "ggml: use OpenCL" OFF)
option(GGML_OPENCL_PROFILING "ggml: use OpenCL profiling (increases overhead)" OFF)
@@ -253,6 +254,7 @@ set(GGML_PUBLIC_HEADERS
include/ggml-rpc.h
include/ggml-sycl.h
include/ggml-vulkan.h
include/ggml-qnn.h
include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
27 changes: 27 additions & 0 deletions ggml/include/ggml-qnn.h
@@ -0,0 +1,27 @@
#pragma once

#include "ggml.h"

#include "ggml-backend.h"

#ifdef __cplusplus
extern "C" {
#endif

#define GGML_QNN_NAME "QNN"
#define GGML_QNN_MAX_DEVICES QNN_BACKEND_COUNT

enum QNNBackend {
    QNN_BACKEND_CPU = 0,
    QNN_BACKEND_GPU,
    QNN_BACKEND_NPU,
    QNN_BACKEND_COUNT,
};

GGML_API bool ggml_backend_is_qnn(ggml_backend_t backend);

GGML_API ggml_backend_reg_t ggml_backend_qnn_reg(void);

#ifdef __cplusplus
}
#endif
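The header sizes `GGML_QNN_MAX_DEVICES` from the trailing `QNN_BACKEND_COUNT` enumerator, so the device count tracks the enum automatically. A minimal sketch of how that pattern can be used follows. The enum is repeated so the snippet compiles on its own, and `qnn_device_label` and its label strings are hypothetical names chosen for illustration, not part of this PR.

```cpp
#include <cassert>
#include <cstring>

// Copy of the enum from ggml-qnn.h, repeated so this sketch is self-contained.
enum QNNBackend {
    QNN_BACKEND_CPU = 0,
    QNN_BACKEND_GPU,
    QNN_BACKEND_NPU,
    QNN_BACKEND_COUNT,
};

#define GGML_QNN_MAX_DEVICES QNN_BACKEND_COUNT

// Hypothetical helper (an assumption, not the PR's API): map a device index to a label.
// The table is sized by GGML_QNN_MAX_DEVICES, so adding an enumerator before
// QNN_BACKEND_COUNT grows the table with it.
inline const char * qnn_device_label(int device) {
    static const char * const labels[GGML_QNN_MAX_DEVICES] = { "qnn-cpu", "qnn-gpu", "qnn-npu" };
    if (device < 0 || device >= GGML_QNN_MAX_DEVICES) {
        return "unknown";
    }
    return labels[device];
}

// Sanity check: every valid index has a real label; the first invalid index does not.
inline void qnn_device_label_self_test() {
    for (int i = 0; i < GGML_QNN_MAX_DEVICES; ++i) {
        assert(std::strcmp(qnn_device_label(i), "unknown") != 0);
    }
    assert(std::strcmp(qnn_device_label(GGML_QNN_MAX_DEVICES), "unknown") == 0);
}
```

Keeping the count as the last enumerator means loops and tables over devices need no separate constant to maintain.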
1 change: 1 addition & 0 deletions ggml/src/CMakeLists.txt
@@ -329,6 +329,7 @@ ggml_add_backend(RPC)
ggml_add_backend(SYCL)
ggml_add_backend(Vulkan)
ggml_add_backend(OpenCL)
ggml_add_backend(QNN)

foreach (target ggml-base ggml)
target_include_directories(${target} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../include> $<INSTALL_INTERFACE:include>)
7 changes: 7 additions & 0 deletions ggml/src/ggml-backend-reg.cpp
@@ -66,6 +66,10 @@
#include "ggml-kompute.h"
#endif

#ifdef GGML_USE_QNN
#include "ggml-qnn.h"
#endif

// disable C++17 deprecation warning for std::codecvt_utf8
#if defined(__clang__)
# pragma clang diagnostic push
@@ -180,6 +184,9 @@ struct ggml_backend_registry {
#ifdef GGML_USE_KOMPUTE
register_backend(ggml_backend_kompute_reg());
#endif
#ifdef GGML_USE_QNN
register_backend(ggml_backend_qnn_reg());
#endif
#ifdef GGML_USE_CPU
register_backend(ggml_backend_cpu_reg());
#endif
45 changes: 45 additions & 0 deletions ggml/src/ggml-qnn/CMakeLists.txt
@@ -0,0 +1,45 @@
message(STATUS "Using QNN backend")

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
    find_library(LOG_LIB log)
    set(QNN_LINK_LIBRARIES ${LOG_LIB})
    set(QNN_DEFAULT_LIB_SEARCH_PATH "/data/local/tmp/" CACHE STRING "customized library search path for QNN backend")
elseif(CMAKE_SYSTEM_NAME STREQUAL "Windows" OR CMAKE_SYSTEM_NAME STREQUAL "Linux")
    set(QNN_DEFAULT_LIB_SEARCH_PATH "" CACHE STRING "customized library search path for QNN backend")
else()
    message(FATAL_ERROR "QNN now only available on Android, Windows and Linux")
endif()

if(NOT DEFINED GGML_QNN_SDK_PATH)
    # try read from environment variable
    # TODO: create a function to search for the SDK path
    if(DEFINED ENV{QNN_SDK_PATH})
        set(GGML_QNN_SDK_PATH $ENV{QNN_SDK_PATH})
    else()
        message(FATAL_ERROR "GGML_QNN_SDK_PATH not defined")
    endif()
endif()

message("QNN_SDK_PATH: ${GGML_QNN_SDK_PATH}")

set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")

file(GLOB QNN_SOURCES "${CMAKE_CURRENT_LIST_DIR}/*.cpp")
ggml_add_backend_library(ggml-qnn
    ${QNN_SOURCES}
)

target_include_directories(ggml-qnn PRIVATE ${GGML_QNN_SDK_PATH}/include/QNN ${CMAKE_CURRENT_LIST_DIR})
target_link_libraries(ggml-qnn PRIVATE ${QNN_LINK_LIBRARIES})

if(NOT "${QNN_DEFAULT_LIB_SEARCH_PATH}" STREQUAL "")
    string(REGEX REPLACE "/$" "" QNN_DEFAULT_LIB_SEARCH_PATH "${QNN_DEFAULT_LIB_SEARCH_PATH}")
endif()

message("GGML_QNN_DEFAULT_LIB_SEARCH_PATH: ${QNN_DEFAULT_LIB_SEARCH_PATH}")
target_compile_definitions(ggml-qnn PRIVATE GGML_QNN_DEFAULT_LIB_SEARCH_PATH="${QNN_DEFAULT_LIB_SEARCH_PATH}")

if(GGML_QNN_ENABLE_CPU_BACKEND)
    message("GGML_QNN_ENABLE_CPU_BACKEND is enabled")
    target_compile_definitions(ggml-qnn PRIVATE GGML_QNN_ENABLE_CPU_BACKEND)
endif()
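The CMake file above strips a trailing `/` from `QNN_DEFAULT_LIB_SEARCH_PATH` and passes the result to the compiler as the string definition `GGML_QNN_DEFAULT_LIB_SEARCH_PATH`. The sketch below shows one way C++ code could consume such a definition. It is an illustration under assumptions, not the PR's loader: `qnn_join_lib_path`, `qnn_default_lib_path`, and the file name `libexample.so` are hypothetical.

```cpp
#include <cassert>
#include <string>

// The build system normally supplies this definition; fall back to an empty
// path so the sketch compiles on its own.
#ifndef GGML_QNN_DEFAULT_LIB_SEARCH_PATH
#define GGML_QNN_DEFAULT_LIB_SEARCH_PATH ""
#endif

// Hypothetical helper: join a search directory and a library file name.
// The directory is expected without a trailing slash, which matches what the
// CMake REGEX REPLACE step produces.
inline std::string qnn_join_lib_path(const std::string & dir, const std::string & lib_name) {
    assert(!lib_name.empty());
    if (dir.empty()) {
        // No custom directory: return the bare name and let the platform's
        // dynamic loader apply its default search order.
        return lib_name;
    }
    return dir + "/" + lib_name;
}

// Hypothetical helper: resolve a library name against the compiled-in default path.
inline std::string qnn_default_lib_path(const std::string & lib_name) {
    return qnn_join_lib_path(GGML_QNN_DEFAULT_LIB_SEARCH_PATH, lib_name);
}
```

Normalizing the directory once at configure time keeps the runtime join trivial: with the Android default the result would start with `/data/local/tmp/`, and with the empty Windows and Linux default the bare file name is returned.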