Hi @TerryT9, I encountered the same issue and I'm not sure how to resolve it. Could you share how you solved it?
Hi @Gianthard-cyh, would you mind sharing which model you were using?
I'm using Llama-3.2-1B-Instruct-f16.gguf. By the way, I got the model running successfully after increasing the size limit of the MUL_MAT op and changing the precision option of the NPU (the Convert op fails to execute without this).
However, as stated in previous issues, the NPU backend achieves only around 1/3 of the CPU backend's performance, so I think more profiling and optimization work could be done. I'm happy to help with that.
My device is a OnePlus Ace 3 with Snapdragon 8 Gen 2.
```diff
--- a/ggml/src/ggml-qnn/graph.cpp
+++ b/ggml/src/ggml-qnn/graph.cpp
@@ -192,8 +192,15 @@ qnn_graph::qnn_graph(const std::string &graph_name, QNNBackend device, std::shar
 graph_vtcm_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
 graph_vtcm_config.customConfig = &vtcm_config;
+ QnnHtpGraph_CustomConfig_t precision_config;
+ precision_config.option = QNN_HTP_GRAPH_CONFIG_OPTION_PRECISION;
+ precision_config.precision = QNN_PRECISION_FLOAT16;
+ QnnGraph_Config_t graph_precision_config;
+ graph_precision_config.option = QNN_GRAPH_CONFIG_OPTION_CUSTOM;
+ graph_precision_config.customConfig = &precision_config;
+
 const QnnGraph_Config_t *graph_configs[] = {&graph_hvx_config, &graph_dlbc_config, &graph_vtcm_config,
-                                            &graph_opt_config, nullptr};
+                                            &graph_opt_config, &graph_precision_config, nullptr};
 error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), graph_configs, &graph_handle);
 } else {
 error = qnn_interface->qnn_graph_create(qnn_context, graph_name.c_str(), nullptr, &graph_handle);
```
Originally posted by @Gianthard-cyh in #20