Tech Share | How to Run Application Tests with the RK182X on the RK3588

Qiyang ARM Embedded Development (启扬ARM嵌入式开发), 2026-06-04


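For the past two years, much of our anxiety about large models has come from the "connection": network lag, data leaks, and pay-as-you-go bills. The RK182X, which Rockchip released in the third quarter of 2025, may let us breathe a little easier.

Billed as the world's first 3D-packaged on-device large-model coprocessor, it is the first to run 7B-parameter models smoothly on local hardware (close to 100 tokens/s), with a sixfold improvement in energy efficiency. It has already been quietly designed into several hundred industry projects, and commercial adoption is moving far faster than expected.

The arrival of the RK182X confirms one judgment: AI computing is being "decentralized".

On-device AI is not an appendage of the cloud. Just as the PC did not disappear because of the internet, local compute is building a large ecosystem independent of the cloud on the strength of three advantages: real-time response (low latency), data immunity (high privacy), and one-time purchase (low cost). The intelligent world of the future will be one in which cloud, edge, and device work together.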

01
Development Framework

The figure below shows the development framework for the RK1820/RK1828 platform:

[Figure: RK1820/RK1828 development framework]

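PC-side development environment: includes RKNN3 Toolkit, RKNN3 Model Zoo, and related components.

RKNN3 Toolkit: a software development kit that converts models trained with deep learning frameworks such as PyTorch and ONNX into the RKNN format, and supports model conversion, inference, and performance evaluation.

RKNN3 Model Zoo: provides a wide range of model conversion examples covering many types of AI models.

Board-side development environment: includes RKNN3 Runtime, AI application examples, tools, and drivers.

RKNN3 Runtime: once a model has been converted, it can be loaded and run on the development board through the RKNN3 API. In addition to the RKNN3 API, LLM models can also be called through an OpenAI-compatible API.

Examples: reference examples of model applications based on real use cases.

Tools: for example RKNN-SMI and RKNN Console, used for debugging.

Driver: the RK1820/RK1828 PCIe EP driver.

RK1820/RK1828 coprocessor: provides the firmware and connects to the host SoC over a high-speed PCIe/USB interface.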
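The Runtime description mentions that LLM models can be called through an OpenAI-compatible API. The sketch below shows what such a request could look like, assuming the standard OpenAI chat-completions request shape. The helper name chat_request_body, the server address and port, and the model name are placeholders chosen for illustration; none of them come from the SDK documentation.

```shell
# Build a minimal OpenAI-style chat completion request body.
# Assumption: the model name ($1) and prompt ($2) contain no characters
# that would need JSON escaping (quotes, backslashes, newlines).
chat_request_body() {
    printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$2"
}

# Hypothetical usage on a host where an OpenAI-compatible server is listening.
# The address, port, and model name are placeholders, not SDK defaults:
#
#   curl -s http://127.0.0.1:8080/v1/chat/completions \
#        -H 'Content-Type: application/json' \
#        -d "$(chat_request_body Qwen2.5-0.5B 'Hello')"
```

Keeping the body construction in its own function makes it easy to inspect the JSON before sending it anywhere.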

02
Supported Platforms

SDK Version	Host Platform	Device Platform (Interface)
Alpha V0.0.1	RK3588, RK3576	RK1820/RK1828 (PCIe)
Alpha V0.0.2	RK3588, RK3576	RK1820/RK1828 (PCIe)
Beta V0.4.0	RK3588, RK3576	RK1820/RK1828 (PCIe/USB)
Release V1.0.0	RK3588, RK3576	RK1820/RK1828 (PCIe/USB)

This document uses Release V1.0.0 by default. If you need another version, please contact us.

03
Quick Start

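This article provides a complete runtime environment based on the Qiyang IAC-RK3588-KIT and the RK1828, together with pre-converted RKNN models, so users can verify model inference without compiling the SDK.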


3.1 Preparation

3.1.1 Hardware Preparation

Host: IAC-RK3588-MB, IAC-RK3588-CM

Device: RK1828 M.2 module

1. The IAC-RK3588-MB and IAC-RK3588-CM boards:

[Photo: IAC-RK3588-MB and IAC-RK3588-CM]

2. The RK1828 M.2 module:

[Photo: RK1828 M.2 module]

Hardware Connection

The RK1828 M.2 module assembled with the IAC-RK3588-MB and IAC-RK3588-CM:

The module is installed in the J11 SSD slot on the baseboard.

[Figure: RK1828 M.2 module installed on the IAC-RK3588-MB and IAC-RK3588-CM]

Note: the RK1828 requires an external 12 V supply. Power on the RK1828 first, then the RK3588.


Confirm the RK1820/RK1828 connection: check the device connection status on the RK3588 host board.

root@linaro-alip:/# lspci
000200.0 Processing accelerators: Rockchip Electronics Co., Ltd Device 182a (rev 01)
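The same check can be scripted so that a test run stops early when the module is not enumerated. This is a minimal sketch that only matches the text pattern shown in the lspci output above; has_rk182x is a hypothetical helper name, not an SDK tool.

```shell
# Succeeds (exit status 0) if lspci-style text on stdin lists a
# Rockchip "Processing accelerators" device, as in the output above.
has_rk182x() {
    grep -q 'Processing accelerators: Rockchip'
}

# Usage on the board (assumes lspci is installed):
#
#   lspci | has_rk182x && echo "RK182X detected" || echo "RK182X not found"
```

Matching on the device class and vendor string avoids depending on the bus address, which can differ between boards and slots.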

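3.1.2 Model Preparation

The models needed for quick verification can be downloaded from the network drive.

Network drive link: https://console.box.lenovo.com/l/H1fig1
Extraction code: rknn

The models are located under: RKNN3_SDK/rknn3_models/v1.0.0

Common models

Type	Path
LLM model	RKNN3_SDK/rknn3_models/v1.0.0/Qwen2.5-instruct-0.5B
CNN model	RKNN3_SDK/rknn3_models/v1.0.0/MobilenetV2
VLM model	RKNN3_SDK/rknn3_models/v1.0.0/InternVL3_2B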

3.2 Model Testing

3.2.1 CNN Model

Full CNN model verification (using MobilenetV2 as an example; for the model location, see the Common models table):

# Run on the PC
# Push the model to the board
adb push MobilenetV2 /userdata/
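Before pushing, it can help to confirm that the local model directory actually holds the files the test command expects. This is a minimal sketch, assuming a model directory contains at least one .rknn file and one .weight file, as the MobilenetV2 test command in this section uses; check_model_dir is a hypothetical helper, not part of the SDK.

```shell
# Return 0 if the directory given as $1 contains at least one *.rknn
# and one *.weight file; otherwise print a message and return 1.
check_model_dir() {
    cmd_dir=$1
    for cmd_ext in rknn weight; do
        # ls exits non-zero when the glob matches no file
        if ! ls "$cmd_dir"/*."$cmd_ext" >/dev/null 2>&1; then
            echo "missing .$cmd_ext file in $cmd_dir" >&2
            return 1
        fi
    done
    return 0
}

# Usage on the PC:
#
#   check_model_dir MobilenetV2 && adb push MobilenetV2 /userdata/
```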

Run the model:

# If you are not currently root, switching to the root user is recommended
[ "$(whoami)" != "root" ] && sudo su -
# Model inference command
cd /userdata/MobilenetV2
rknn3_model_test mobilenetv2-12.rknn mobilenetv2-12.weight '' '' 0x01 10

The following log output indicates that the model completed inference successfully:

root@linaro-alip:/userdata/MobilenetV2# rknn3_model_test mobilenetv2-12.rknn mobilenetv2-12.weight '' '' 0x10 10
Input paths or golden output paths not provided, using random input and skipping golden comparison
Found 1 RK182X devices
Device 0: transfer_type=PCIE, id=000200.0
Info: Only one device found (id=000200.0), init_extend can be NULL
rknn3_init success, cost 561.752 ms
Device Memory Info: total=19 MB, free=18 MB
Node 0: total=639 MB, free=639 MB
Node 1: total=639 MB, free=639 MB
Node 2: total=619 MB, free=619 MB
Node 3: total=639 MB, free=639 MB
Node 4: total=619 MB, free=619 MB
Node 5: total=619 MB, free=619 MB
Node 6: total=639 MB, free=639 MB
Node 7: total=639 MB, free=639 MB
model_len=63680, model=0x5749c80
weight_len=3714048, weight_data=0x7fba5d4010
rknn3_load_model_from_data success, cost 2.577 ms
Core number: 1
model_config:0xf8c0
rknn3_model_init success, cost 27.672 ms
input tensors:
name=input, n_dims=4, shape=[1, 224, 224, 3], stride=[150528, 672, 3, 1], aligned_size=150528, layout=NHWC, dtype=UINT8, qnt_type=PER_LAYER_ASYMMETRIC, scale=0.01866, zero_point=-14
output tensors:
name=output, n_dims=2, shape=[1, 1000], stride=[1000, 1], aligned_size=1024, layout=UNDEFINED, dtype=INT8, qnt_type=PER_LAYER_ASYMMETRIC, scale=0.14192, zero_point=-55
Generating random input data for input 0 (name: input)
Input 0 size check passed: 150528 elements
Input[0] first values (type: 3):
[0] UINT8: 103
[1] UINT8: 198
[2] UINT8: 105
[3] UINT8: 115
[4] UINT8: 81
[5] UINT8: 255
[6] UINT8: 74
[7] UINT8: 236
[8] UINT8: 41
[9] UINT8: 205
Running model 10 times...
Syncing input[0] to device...
Syncing output[0] from device...
loop: 1/10, model inference cost 4.022 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 2/10, model inference cost 3.944 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 3/10, model inference cost 3.954 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 4/10, model inference cost 4.141 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 5/10, model inference cost 4.481 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 6/10, model inference cost 4.598 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 7/10, model inference cost 4.440 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 8/10, model inference cost 4.806 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 9/10, model inference cost 5.273 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 10/10, model inference cost 5.293 ms
All 10 loops completed successfully
rknn3_destroy success, cost 1.809 ms
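To summarize a run, the per-loop latencies can be averaged straight from the log. The sketch below assumes the log lines keep the "model inference cost <value> ms" wording shown above; avg_inference_ms is a hypothetical helper name.

```shell
# Read an rknn3_model_test log on stdin and print the mean per-loop
# inference time in milliseconds, to three decimals.
# Only lines containing "model inference cost" are counted, so the
# init/load/destroy timings in the same log are ignored.
avg_inference_ms() {
    awk '/model inference cost/ {
             for (i = 1; i < NF; i++)
                 if ($i == "cost") { sum += $(i + 1); n++ }
         }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Usage on the board:
#
#   rknn3_model_test mobilenetv2-12.rknn mobilenetv2-12.weight '' '' 0x01 10 | avg_inference_ms
```

For the ten loops in the log above, the costs sum to 44.952 ms, an average of about 4.5 ms per inference.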

3.2.2 LLM Model

LLM model verification (using qwen2.5_0.5b_instruct as an example; for the model location, see the Common models table):

# Run on the PC
# Push the model to the board
adb push Qwen2.5-0.5B /userdata/

Run the model:

# If you are not currently root, switching to the root user is recommended
[ "$(whoami)" != "root" ] && sudo su -
# Model inference command
cd /userdata/Qwen2.5-0.5B
rknn3_session_test Qwen2.5-0.5B.rknn Qwen2.5-0.5B.weight Qwen2.5-0.5B.tokenizer.gguf Qwen2.5-0.5B.embed.bin 1024 256 0xff

If the LLM holds a correct conversation, inference has succeeded, as in the following log:

root@linaro-alip:/userdata/Qwen2.5-0.5B# rknn3_session_test Qwen2.5-0.5B.rknn Qwen2.5-0.5B.weight Qwen2.5-0.5B.tokenizer.gguf Qwen2.5-0.5B.embed.bin 1024 256 0xff
*******************************NEW TEST**********************************
llama_model_loader: loaded meta data with 23 key-value pairs and 0 tensors from Qwen2.5-0.5B.tokenizer.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Grq
llama_model_loader: - kv 3: qwen2.block_count u32 = 24
llama_model_loader: - kv 4: qwen2.context_length u32 = 32768
llama_model_loader: - kv 5: qwen2.embedding_length u32 = 896
llama_model_loader: - kv 6: qwen2.feed_forward_length u32 = 4864
llama_model_loader: - kv 7: qwen2.attention.head_count u32 = 14
llama_model_loader: - kv 8: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 9: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 10: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 1
llama_model_loader: - kv 12: general.quantization_version u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
Warning: max_context_len (1024) is less than llm_config.max_ctx_len (2048). It's recommended to set to 2048.
=============================================================
Model Config
=============================================================
Max Context Length : 2048
Max Position Embeddings : 8192
Model Type :
Task Type : RKNN3_LLM_TASK_GENERATE
=============================================================
--------------------Input[0]--------------------
请解释一下相对论的基本概念。
--------------------Output----------------------
相对论是爱因斯坦在20世纪初提出的物理学理论，它改变了我们对时间、空间和物质的理解。以下是相对论的基本概念：
1. **狭义相对论**（Special Relativity）：
- **光速不变原理**：所有惯性参考系中的光速都是一样的，无论光源的运动速度如何。
- **相对性原理**：在不同的惯性参考系中，物理定律是相同的。这意味着时间、空间和长度都是相对的。
2. **广义相对论**（General Relativity）：
- **引力场**：物质和能量通过引力场影响时空，使物体运动轨迹发生弯曲。
- **时空曲率**：时空中的曲线反映了物质分布和引力场的强度。这种曲率与物质的质量成正比。
3. **时间膨胀**（Time Dilation）：
- 物质在加速时会缩短时间，即时间流逝得更快。
4. **空间膨胀**（Space Dilation）：
- 物质在加速时也会导致空间的弯曲，使距离物体更远的地方看起来变长。
5. **引力红移**：由于物质和能量的影响，光谱线向
--------------Max new token reached-------------
Performance Statistics:
-----------------------------------------------------------------------------------------
Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second
-----------------------------------------------------------------------------------------
Prefill | 52.69 | 48 | 1.10 | 910.99
Generate | 1385.01 | 255 | 5.43 | 184.11
-----------------------------------------------------------------------------------------
--------------------Input[1]--------------------
Please explain the basic concept of relativity.
--------------------Output----------------------
Relativity is a set of theories developed by Albert Einstein, which fundamentally changed our understanding of time, space, and the nature of physical objects. Here are the key concepts of relativity:
1. **Special Relativity** (Special Relativity):
- **Einstein's first law**: The laws of physics are the same for all observers in uniform motion relative to each other.
- **Einstein's second law**: The speed of light in a vacuum is constant, regardless of the motion of the light source or observer.
2. **General Relativity** (General Relativity):
- **Gravitational Field**: The gravitational field is caused by mass and energy. This field affects the curvature of spacetime.
- **Curvature of Space-Time**: The geometry of space-time changes when objects are accelerated, leading to phenomena like time dilation and length contraction.
3. **Time Dilation** (Time Dilation):
- When an object accelerates, its clocks run slower compared to a stationary observer. This effect is known as time dilation.
4. **Space Dilation** (Space Dilation):
- The same principle applies when objects are accelerated in space. Objects moving at high speeds will appear to move faster than they actually do, due to
--------------Max new token reached-------------
Performance Statistics:
-----------------------------------------------------------------------------------------
Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second
-----------------------------------------------------------------------------------------
Prefill | 38.39 | 17 | 2.26 | 442.81
Generate | 1398.52 | 255 | 5.48 | 182.34
-----------------------------------------------------------------------------------------
*******************************END TEST**********************************
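The "Tokens per Second" column in the performance statistics is simply the token count divided by the total time in seconds. The sketch below reproduces that arithmetic for the first run in the log; tokens_per_second is a hypothetical helper name, and the inputs are the rounded values printed in the table.

```shell
# Print throughput in tokens per second, to two decimals.
#   $1 = total time in milliseconds
#   $2 = number of tokens processed in that time
tokens_per_second() {
    awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n * 1000 / ms }'
}

# First run in the log above:
#   tokens_per_second 1385.01 255   # Generate stage: 184.11
#   tokens_per_second 52.69 48      # Prefill stage:  910.99
```

Because the table prints times rounded to two decimals, recomputing from those values can differ from the logged figure in the last digit for some rows.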

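You can also refer to the application documents under the docs/Examples directory to integrate the models into an application for testing.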
