Version: 2.8.1

AI

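This document describes how to use TORQ-Toolkit to convert models and deploy the converted models on Zhihe Computing (知合计算) series chips.

TORQ-Toolkit supports the A200 and A210 hardware platforms.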

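Development Environment Setup

Installing TORQ-Toolkit

TORQ-Toolkit is currently available for x86 hosts. You can set up the development environment directly with Docker.

Installing Docker

Note: If Docker is already installed, skip this step.

Install Docker by following the official Docker documentation.

Add your user to the docker group.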

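# Create the docker group
sudo groupadd docker

# Add the current user to the docker group
sudo usermod -aG docker $USER

# Apply the new group membership in the current shell
newgrp docker

# Verify that docker runs without sudo
docker run hello-world

The output looks like this: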

Unable to find image 'hello-world:latest' locally 
latest: Pulling from library/hello-world 
719385e32844: Pull complete

Digest:  sha256:88ec0acaa3ec199d3b7eaf73588f4518c25f9d34f58ce9a0df68429c5af48e8d 
Status: Downloaded newer image for hello-world:latest
Hello from Docker!  

Starting the TORQ-Toolkit Image

Prepare the image. Download and load the TORQ-Toolkit Docker image. Check Artifactory for newer versions.

# Example commands for preparing the image
wget http://developer.zhcomputing.com/artifactory/generic-local/docker/torq_toolkit_v1.6.0.tar.xz
xz -dc torq_toolkit_v1.6.0.tar.xz | docker load

# List the Docker images
docker images

The TORQ-Toolkit image appears in the listing, for example:

REPOSITORY        TAG         IMAGE ID       CREATED         SIZE
torq_toolkit      v1.6.0      be61db9cefaa   1 hours ago     16.4GB

Run the Docker image. This opens a bash shell inside the container.

# Run the TORQ-Toolkit image
docker run -it torq_toolkit:v1.6.0 /bin/bash

Check the TORQ-Toolkit version.

# Show the TORQ-Toolkit version
pip3 show torq_toolkit

Sample output:

Name: torq-toolkit
Version: 1.6.0
Summary: A toolkit for model conversion, optimization and quantization
Home-page: https://github.com/username/torq-toolkit
Author: TORQ Team
Author-email: torq-support@example.com
License:
Location: /usr/local/lib/python3.12/dist-packages
Requires: numpy, pyyaml, torch, tqdm
Required-by:
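
The same check can be scripted from Python. The following is a minimal sketch, not part of TORQ-Toolkit: it uses the standard importlib.metadata module, the helper name installed_version is chosen here for illustration, and the distribution name torq-toolkit is taken from the sample output above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string inside the TORQ-Toolkit container,
# or None on a machine where the package is not installed.
print(installed_version("torq-toolkit"))
```

Returning None instead of raising makes the helper easy to use in a setup script that should report a missing toolkit rather than crash.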

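Device-Side NPU Environment Setup

To flash the board image, refer to 基础 (Basics).

SSH accounts on the board:

Username: zhihe
Password: zhihe
Username: root
Password: none

Checking the NPU Driver Version

Query the NPU driver version.

Note:

If the driver version is outdated, flash the latest image.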

# A210
dmesg | grep -i VIPLite

# Output
VIPLite driver version 2.1.3.0

# A200
dmesg | grep -i vha_plat_probe

# Output
vha_plat_probe: Version VHA DT driver version : REL_3.8-c16140200

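Installing and Updating the TORQ Runtime Library

The TORQ Runtime library, libtorqrt_<TARGET_PLATFORM>.so, is the runtime library used on the board.

Note:

Make sure the TORQ Runtime library and TORQ-Toolkit are the same version and that both are up to date.

Query the version of libtorqrt_<TARGET_PLATFORM>.so.

# Query the libtorqrt library version, using A210 as an example:
strings /usr/lib/riscv64-linux-gnu/libtorqrt_a210.so | grep -i "torq runtime"

# Reports the libtorqrt version as: torq runtime version: 1.6.0 (Jan 19 2026 09:39:39)

If the version of libtorqrt_<TARGET_PLATFORM>.so does not match the TORQ-Toolkit version, update the library to the same version.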
```
# Using A210 as an example:
scp 3rdparty/torq/linux-a210/lib/libtorqrt_a210.so root@192.168.0.23:/usr/lib/riscv64-linux-gnu
```
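Note:

If libtorqrt_a210.so is installed in another directory, such as /mnt/lib, you can point the dynamic linker at it by setting the library search path environment variable.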
```
export LD_LIBRARY_PATH=/mnt/lib:$LD_LIBRARY_PATH
```
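
The version comparison described in this section can be automated. The sketch below is illustrative only: the function names are hypothetical, and it assumes the version line has the form shown in the strings output above (`... runtime version: X.Y.Z (...)`) and that the toolkit version is a plain X.Y.Z string.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "torq runtime version: 1.6.0 (Jan 19 2026 09:39:39)".
_VERSION_RE = re.compile(r"runtime version:\s*(\d+)\.(\d+)\.(\d+)", re.IGNORECASE)


def parse_runtime_version(line: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from a runtime version line, or None."""
    match = _VERSION_RE.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def versions_match(runtime_line: str, toolkit_version: str) -> bool:
    """Return True when the runtime library and toolkit versions are identical."""
    runtime = parse_runtime_version(runtime_line)
    toolkit = tuple(int(part) for part in toolkit_version.split("."))
    return runtime is not None and runtime == toolkit


line = "torq runtime version: 1.6.0 (Jan 19 2026 09:39:39)"
print(versions_match(line, "1.6.0"))
```

Because the pattern only keys on "runtime version:", the same helper should also accept the "tllm runtime version: ..." line printed for the LLM runtime library.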

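Installing and Updating the TORQ LLM Runtime Library

The TORQ LLM Runtime library, libtllm_a210.so, is the LLM runtime library used on the board.

Note:

Make sure the TORQ LLM Runtime library and TORQ-Toolkit are the same version and that both are up to date.

The TORQ LLM Runtime library is available for the A210 hardware platform only.

Query the version of libtllm_a210.so.

# Query the libtllm_a210.so version
strings /usr/lib/riscv64-linux-gnu/libtllm_a210.so | grep -i "tllm runtime"

# Reports the libtllm_a210 version as: tllm runtime version: 1.6.0 (Jan 19 2026 09:50:44)

If the version of libtllm_a210.so does not match the TORQ-Toolkit version, update the library to the same version.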
```
scp 3rdparty/torq/linux-a210/lib/libtllm_a210.so  root@192.168.0.23:/usr/lib/riscv64-linux-gnu 
```
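Note:

If libtllm_a210.so is installed in another directory, such as /mnt/lib, you can point the dynamic linker at it by setting the library search path environment variable.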
```
export LD_LIBRARY_PATH=/mnt/lib:$LD_LIBRARY_PATH
```
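
When a demo is launched from a Python script rather than an interactive shell, the same search path can be set for that one child process. The following is a sketch under assumptions: the helper name with_library_dir is hypothetical, and /mnt/lib is simply the example directory used above.

```python
import os
from typing import Dict, Mapping, Optional


def with_library_dir(lib_dir: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of env with lib_dir placed first in LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing copy of lib_dir to avoid duplicates.
    parts = [p for p in current.split(":") if p and p != lib_dir]
    new_env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + parts)
    return new_env


env = with_library_dir("/mnt/lib", {"LD_LIBRARY_PATH": "/usr/lib"})
print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting an on-board executable. Unlike the plain export, the helper does not leave an empty trailing entry when LD_LIBRARY_PATH was previously unset.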

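Examples

TORQ provides reference examples for a range of models, including MobileNet image classification and YOLOv5 object detection.

torq-model-zoo is open source and hosted on Gitee.

# Clone the torq-model-zoo example project
git clone https://gitee.com/zhcomputing/torq-model-zoo.git

# Download the required dependencies
cd torq-model-zoo
./scripts/download_ucann.sh

MobileNet Model Deployment Example

This section uses MobileNet to show how to get started with model conversion and on-board deployment.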

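Preparing the Model

Run the download script. It downloads the file mobilenetv2-12.onnx.

# Enter the examples/mobilenet/model directory
cd examples/mobilenet/model

./download_model.sh

Model Conversion

Convert the model. By default, the converted model is saved to ../model/mobilenet_v2.torq.

# Enter the examples/mobilenet/python directory
cd ../python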

python3 convert.py --model ../model/mobilenetv2-12.onnx --target a210 --dtype u8
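
For batch conversion it can help to assemble the command line programmatically. The sketch below only rebuilds the exact invocation shown above; build_convert_command is a hypothetical helper, and no flags beyond --model, --target, and --dtype are assumed.

```python
import shlex
from typing import List


def build_convert_command(model: str, target: str, dtype: str) -> List[str]:
    """Build the convert.py argument list using only the flags shown above."""
    return [
        "python3", "convert.py",
        "--model", model,
        "--target", target,
        "--dtype", dtype,
    ]


cmd = build_convert_command("../model/mobilenetv2-12.onnx", "a210", "u8")
# shlex.join renders the list as a shell-safe string for logging.
print(shlex.join(cmd))
```

Keeping the command as a list lets it be passed directly to subprocess.run(cmd, check=True) from the examples/mobilenet/python directory without shell quoting issues.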

On-Board Deployment

Build the model's demo files.

# Return to the torq-model-zoo root directory
cd ../../../

./build-linux.sh -t a210 -d mobilenet

Build output: ./install/a210_linux/torq_mobilenet_demo

Copy the demo to the board.

# For valid <TARGET_PLATFORM> values, see torq.config()
scp -r ./install/<TARGET_PLATFORM>_linux/torq_mobilenet_demo root@<IP_ADDRESS>:

# Example
scp -r ./install/a210_linux/torq_mobilenet_demo root@192.168.0.23:

Log in to the board and run the following command.

./torq_mobilenet_demo model/mobilenet_v2.torq model/bell.jpg

The output is as follows:

[494] score=0.991719 class=n03017168 chime, bell, gong
[469] score=0.003797 class=n02939185 caldron, cauldron
[442] score=0.001032 class=n02825657 bell cote, bell cot
[577] score=0.000643 class=n03447721 gong, tam-tam
[406] score=0.000451 class=n02699494 altar
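
To post-process the classification lines above in a script, they can be parsed into structured records. This is a sketch that assumes the exact line layout printed by the demo (`[index] score=... class=<id> <label>`); the Prediction type and its field names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    class_id: int   # number inside the square brackets
    score: float    # value after "score="
    synset: str     # identifier after "class=", for example n03017168
    label: str      # remaining human-readable label text


_LINE_RE = re.compile(r"^\[(\d+)\]\s+score=([0-9.]+)\s+class=(\S+)\s+(.+)$")


def parse_predictions(text: str) -> List[Prediction]:
    """Parse demo output lines, skipping anything that does not match."""
    predictions = []
    for line in text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            predictions.append(
                Prediction(int(match.group(1)), float(match.group(2)),
                           match.group(3), match.group(4))
            )
    return predictions


sample = """[494] score=0.991719 class=n03017168 chime, bell, gong
[469] score=0.003797 class=n02939185 caldron, cauldron"""
top = parse_predictions(sample)[0]
print(top.class_id, top.score, top.label)
```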

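YOLOv5 Model Deployment Example

This section uses YOLOv5 to show how to get started with model conversion and on-board deployment.

Preparing the Model

Run the download script. It downloads the file yolov5s_relu.onnx.

# Enter the examples/yolov5/model directory
cd examples/yolov5/model

./download_model.sh

Model Conversion

Convert the model. By default, the converted model is saved to ../model/yolov5.torq.

# Enter the examples/yolov5/python directory
cd ../python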

python3 convert.py --model ../model/yolov5s_relu.onnx --target a210 --dtype u8

On-Board Deployment

Build the model's demo files. Build output: ./install/a210_linux/torq_yolov5_demo.

# Return to the torq-model-zoo root directory
cd ../../../

./build-linux.sh -t a210 -d yolov5

Copy the demo to the board.

# For valid <TARGET_PLATFORM> values, see torq.config()
scp -r ./install/<TARGET_PLATFORM>_linux/torq_yolov5_demo root@<IP_ADDRESS>:

# Example
scp -r ./install/a210_linux/torq_yolov5_demo root@192.168.0.23:

Log in to the board and run the following command.

./torq_yolov5_demo model/yolov5.torq model/bus.jpg

The example prints the label and score of each detection in the test image, as shown below:

person @ (210 239 286 513) 0.871
person @ (483 236 559 528) 0.862
person @ (107 233 233 536) 0.859
bus @ (89 128 553 463) 0.610
person @ (79 356 119 518) 0.183


Note: Results may differ slightly across platforms and across tool and driver versions.
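
Low-confidence detections, such as the last line of the sample output, are often filtered out before further use. The sketch below parses the printed lines and applies a score threshold. It assumes the line layout shown above; the Detection type, the function name, and the 0.5 threshold are illustrative choices, and the four box numbers are kept in printed order without interpreting them.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    box: Tuple[int, ...]  # the four integers inside the parentheses, in printed order
    score: float


_DET_RE = re.compile(
    r"^(.+?)\s+@\s+\((-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\)\s+([0-9.]+)$"
)


def parse_detections(text: str, min_score: float = 0.0) -> List[Detection]:
    """Parse detection lines and keep those with score >= min_score."""
    detections = []
    for line in text.splitlines():
        match = _DET_RE.match(line.strip())
        if match is None:
            continue
        score = float(match.group(6))
        if score >= min_score:
            box = tuple(int(match.group(i)) for i in range(2, 6))
            detections.append(Detection(match.group(1), box, score))
    return detections


sample = """person @ (210 239 286 513) 0.871
person @ (483 236 559 528) 0.862
person @ (107 233 233 536) 0.859
bus @ (89 128 553 463) 0.610
person @ (79 356 119 518) 0.183"""
for det in parse_detections(sample, min_score=0.5):
    print(det.label, det.box, det.score)
```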

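Qwen3-VL Model Deployment Example

Qwen3-VL is a vision-language model that accepts multimodal image and text input. This section uses it to show how to deploy and run the Qwen3-VL-2B-Instruct model on the TORQ platform.

Preparing the Model

Run the script to download the prebuilt TORQ model, which is ready for the TORQ inference engine.

Note:

You can also compile the TORQ model files yourself. For instructions, see torq-model-zoo/examples/qwen3-vl/README.md.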

cd examples/qwen3-vl/model
./download_prebuild_model.sh

# Download result
drwxr-xr-x 4 319202364 319200513 4.0K Nov 14 07:34 qwen3-vl-2b-8bit

Python Application Deployment and Inference

Build the low-level interface library. The build produces ./libtorq_vlm_api.so in the current directory.
```
cd examples/qwen3-vl/application
make
```
Copy the following files to the board.
- The application, its low-level library, and a test image. This section uses person.jpg as the test image.
```
# Example; replace <board_ip> and work_path with your board's actual IP address and path

cd examples/qwen3-vl/application
scp demo.py torq_vlm_api.py libtorq_vlm_api.so requirements.txt person.jpg root@<board_ip>:/work_path/
```
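  Note:
  
  torq_vlm_api.py loads the shared library with ctypes.CDLL("./libtorq_vlm_api.so", ...). Because the path is relative ("./"), libtorq_vlm_api.so must be in the current working directory at run time. You can switch to an absolute path to suit a different directory layout.
- The prebuilt model files: the examples/qwen3-vl/model/qwen3-vl-2b-8bit/vit and examples/qwen3-vl/model/qwen3-vl-2b-8bit/lm directories.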
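
One way to apply the absolute-path suggestion from the note above is to resolve the library relative to the Python file that loads it. This is a minimal sketch, not the actual contents of torq_vlm_api.py; the helper names library_path and load_library are hypothetical.

```python
import ctypes
from pathlib import Path
from typing import Optional, Union


def library_path(name: str = "libtorq_vlm_api.so",
                 base_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return an absolute path to the library.

    When base_dir is omitted, the directory containing this file is used,
    so the result no longer depends on the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path(__file__).resolve().parent
    return (base / name).resolve()


def load_library(name: str = "libtorq_vlm_api.so",
                 base_dir: Optional[Union[str, Path]] = None) -> ctypes.CDLL:
    """Load the shared library from an absolute path, failing early if missing."""
    path = library_path(name, base_dir)
    if not path.is_file():
        raise FileNotFoundError(f"shared library not found: {path}")
    return ctypes.CDLL(str(path))


print(library_path("libtorq_vlm_api.so", "/work_path"))
```

Checking for the file before calling ctypes.CDLL produces a clearer error than the loader's generic failure when the library was not copied to the board.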

Log in to the board and install the dependencies.

 apt update
 apt-mark unhold openssl
 apt install -y libopenblas-dev liblapack-dev libcjson1 python3-pip libzmq5 libzmq3-dev libjpeg-dev libjpeg-turbo-progs libffi-dev nginx libsndfile1
 pip3 install -r requirements.txt -i http://developer.zhcomputing.com/pypi/private/+simple/ --trusted-host developer.zhcomputing.com --break-system-packages

On the board, download all Qwen3-VL-2B-Instruct configuration files from HuggingFace, excluding the weight files.

Then run the following command on the board.

python3 demo.py \
  --vit_path qwen3-vl-2b-8bit/vit \
  --llm_path qwen3-vl-2b-8bit/lm \
  --hf_config_path Qwen3-VL-2B-Instruct/ \
  --image_path person.jpg

# Expected result

Preprocess time: 0.6448 seconds

This is a photograph of a woman in a wheelchair being pushed by a younger woman, likely a caregiver or nurse, in a  park setting. The scene is set on a paved path with a red running track visible on the left. The woman in the    wheelchair is wearing a brown cardigan and light-colored pants, and she is smiling. The caregiver is wearing a     light blue shirt and has her hair tied back. The background features lush green trees and a few palm trees, with a  clear blue sky above

Execution time: 7.7358 seconds

C Application Deployment and Inference

Cross-compile the application. The build produces qwen3_vl_exec.elf in the current directory.
```
# Build
cd examples/qwen3-vl/cpp

# Build for the A210 platform
make
```
Copy the following files to the board.
- The executable and the preprocessed binary file (examples/qwen3-vl/cpp/pv.bin).
```
# Replace <board_ip> and work_path with your board's actual IP address and path
scp qwen3_vl_exec.elf pv.bin user@<board_ip>:/work_path/
```
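- The prebuilt model files: the examples/qwen3-vl/model/qwen3-vl-2b-8bit/vit and examples/qwen3-vl/model/qwen3-vl-2b-8bit/lm directories.

Log in to the board and run the following command. In this demo, the conversation asks the model to describe the test image. The program prints the time to first token, each sampled next token, and the per-token decoding time.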

./qwen3_vl_exec.elf <vit_model_dir> <lm_model_dir> <img_input>

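# Example
./qwen3_vl_exec.elf ./qwen3-vl-2b-8bit/vit ./qwen3-vl-2b-8bit/lm pv.bin

Parameter	Description
`<vit_model_dir>`	ViT model directory, for example `./qwen3-vl-2b-8bit/vit`
`<lm_model_dir>`	LLM model directory, for example `./qwen3-vl-2b-8bit/lm`
`<img_input>`	Path to the preprocessed binary file of the input image, for example `./cpp/pv.bin`

# Expected result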

[TTFT] Time to first token: 1097.6460 ms, 153.9659 token/s
Next id: 1986 = 33.562500
Position 169 Time: 65.115000 ms
Next id: 374 = 30.656250

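Note:

Results may differ across platforms and across tool and driver versions.

Output quality and speed depend on the hardware configuration and quantization settings.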
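
The timing lines printed by the demo can be collected for benchmarking. The sketch below parses the three line types seen in the expected result above. The function and key names are hypothetical, and the value after "=" on the "Next id" lines is ignored because its meaning is not documented here.

```python
import re
from typing import Dict, List, Optional

_TTFT_RE = re.compile(
    r"\[TTFT\] Time to first token:\s*([0-9.]+)\s*ms,\s*([0-9.]+)\s*token/s"
)
_NEXT_RE = re.compile(r"Next id:\s*(\d+)")
_STEP_RE = re.compile(r"Position\s+(\d+)\s+Time:\s*([0-9.]+)\s*ms")


def parse_decode_log(text: str) -> Dict[str, object]:
    """Collect TTFT, sampled token ids, and per-position decode times."""
    ttft_ms: Optional[float] = None
    prefill_rate: Optional[float] = None
    next_ids: List[int] = []
    step_ms: Dict[int, float] = {}
    for line in text.splitlines():
        ttft = _TTFT_RE.search(line)
        if ttft:
            ttft_ms, prefill_rate = float(ttft.group(1)), float(ttft.group(2))
            continue
        step = _STEP_RE.search(line)
        if step:
            step_ms[int(step.group(1))] = float(step.group(2))
            continue
        nxt = _NEXT_RE.search(line)
        if nxt:
            next_ids.append(int(nxt.group(1)))
    return {
        "ttft_ms": ttft_ms,
        "prefill_tokens_per_s": prefill_rate,
        "next_ids": next_ids,
        "step_ms": step_ms,
    }


log = """[TTFT] Time to first token: 1097.6460 ms, 153.9659 token/s
Next id: 1986 = 33.562500
Position 169 Time: 65.115000 ms
Next id: 374 = 30.656250"""
print(parse_decode_log(log))
```

As a sanity check on the sample, 1.097646 s multiplied by 153.9659 token/s is about 169 tokens, which appears to match the first decode position (169) in the log.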

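Qwen3 Model Deployment Example

This section describes how to deploy and run the Qwen3 model on the TORQ platform.

Preparing the Model

Run the script to download the prebuilt TORQ model, which is ready for the TORQ inference engine.

Note:

You can also compile the TORQ model files yourself. For instructions, see torq-model-zoo/examples/qwen3/README.md.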

cd examples/qwen3/model
./download_prebuild_model.sh

# Download result
drwx------ 4 root root 4.0K Jan 20 08:34 qwen3-4b_seq256_4bit

On-Board Application Deployment and Inference

Cross-compile the application. The build produces chat.elf in the current directory.
```
# Build
cd examples/qwen3/cpp

# Build for the A210 platform
make
```

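Copy the following files to the board.

File	Description
`examples/qwen3/cpp/chat.elf`	Executable
`examples/qwen3/cpp/vocab.json`	Vocabulary file for the Qwen3 model, used to map between token IDs and text
`examples/qwen3/model/qwen3-4b_seq256_4bit`	Prebuilt model files

# Example commands, run from examples/qwen3/cpp
scp -r ../model/qwen3-4b_seq256_4bit user@<board_ip>:/path/qwen3/
scp chat.elf vocab.json user@<board_ip>:/path/qwen3/

Log in to the board and run the following command. The program prints the time to first token, each sampled next token, and the per-token decoding time.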

./chat.elf model_path vocab_path [--prompt <text>] [options]

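# Example
./chat.elf qwen3-4b_seq256_4bit vocab.json --prompt "who are you" -n 50

Parameter	Description
`model_path`	TORQ model directory
`vocab_path`	Path to the vocabulary file, for example `vocab.json`
`prompt`	Input prompt text, for example `"who are you"`
`n`	Number of tokens to generate; defaults to 20

# Expected result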
prefill input token size: 28

[TTFT] Time to first token: 722.3100 ms, 38.7645 token/s
Next id: 358 = 16.921875
Position 28 Time: 108.453000 ms
Next id: 1079 = 22.375000
Position 29 Time: 101.981000 ms
Next id: 264 = 20.218750
Position 30 Time: 101.701000 ms
Next id: 3460 = 26.359375
...

[TPOT] Time per output token: 101.9099 ms, 9.8126 token/s
<|system|>You are a helpful assistant<|system_end|><|user|>who are you?<|user_end|> I am a large-scale language     model developed by Alibaba Cloud\'s Tongyi Lab. My name is Qwen. I can assist you in various ways, such as   answering questions, creating text, and solving problems. If you have any questions or need help.

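Note:

Results may differ across platforms and across tool and driver versions.

Output quality and speed depend on the hardware configuration and quantization settings.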
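
The token/s figures in the expected result above are consistent with simple ratios: prefill tokens divided by the time to first token, and one token divided by the time per output token. The following worked example checks this against the sample log. The function name is hypothetical, and this reading of the log is an inference from the numbers rather than a documented definition.

```python
def tokens_per_second(num_tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and an elapsed time in milliseconds to token/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return num_tokens / (elapsed_ms / 1000.0)


# Values copied from the sample log above.
prefill_rate = tokens_per_second(28, 722.3100)  # 28 prefill tokens over TTFT
decode_rate = tokens_per_second(1, 101.9099)    # one token per TPOT interval

print(f"prefill: {prefill_rate:.2f} token/s")
print(f"decode:  {decode_rate:.2f} token/s")
```

The two results come out at roughly 38.76 and 9.81 token/s, in line with the rates printed next to TTFT and TPOT in the log.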
