Tech Share | How to Run Application Tests with the RK182X on an RK3588


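Over the past two years, much of our anxiety about large models has come from "connectivity": network lag, data leaks, and pay-as-you-go bills. The RK182X, released by Rockchip in the third quarter of 2025, may let us breathe a little easier.

This chip, the world's first 3D-packaged on-device large-model coprocessor, is the first to run 7B-parameter models smoothly on local hardware (close to 100 tokens/s), with a sixfold improvement in energy efficiency. It has already been quietly designed into several hundred industry projects, and its commercial rollout is moving far faster than expected.

The arrival of the RK182X confirms one judgment: AI computing is undergoing "decentralization".

On-device AI is not an appendage of the cloud. Just as the PC did not disappear because of the internet, local compute is building a large ecosystem independent of the cloud on the strength of three advantages: real-time response (low latency), data immunity (strong privacy), and one-time purchase (low cost). The intelligent world of the future will be one in which cloud, edge, and device work together.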



01
Development Framework
 

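The figure below shows the development framework for the RK1820/RK1828 platform:

[Figure: RK1820/RK1828 development framework]

PC-side development environment, which includes the RKNN3 Toolkit, the RKNN3 Model Zoo, and related components:

RKNN3 Toolkit: a software development kit that converts models trained in deep learning frameworks such as PyTorch, or exported in formats such as ONNX, into the RKNN format. It supports model conversion, inference, and performance evaluation.

RKNN3 Model Zoo: provides a rich set of model conversion examples covering many types of AI models.

Board-side development environment, which includes the RKNN3 Runtime, AI application Examples, a tool collection, and drivers:

RKNN3 Runtime: once a model has been converted, the RKNN3 API can be used on the development board to load and run the RKNN model. Besides the RKNN3 API, LLM models can also be called through an OpenAI-compatible API.

Examples: reference examples of model applications based on real use cases.

Tool collection: for example, RKNN-SMI and RKNN Console for debugging.

Drivers: the RK1820/RK1828 PCIe EP driver.

RK1820/RK1828 coprocessor: provides the firmware and connects to the host SoC over a high-speed PCIe/USB interface.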
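The OpenAI-compatible API mentioned for the RKNN3 Runtime can be illustrated with a minimal request sketch. The request and response shapes below follow the public OpenAI chat completions format. The server address, port, and model name are assumptions made for illustration and are not taken from the SDK documentation, so check them against the service actually running on your board.

```python
import json
import urllib.request

# Assumptions (hypothetical, not from the SDK): where the board-side service
# listens and which model name it expects.
BASE_URL = "http://127.0.0.1:8080/v1"
MODEL_NAME = "Qwen2.5-0.5B"


def build_chat_request(prompt, max_tokens=256, base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(prompt, timeout_s=60):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout_s) as response:
        return extract_reply(response.read())
```

With a compatible service running, `ask("Please explain the basic concept of relativity.")` would return the generated text. Keeping request construction separate from sending makes the payload easy to inspect before anything touches the network.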


 


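02
Supported Platforms

| SDK version | Host platform | Device platform (interface) |
|---|---|---|
| Alpha V0.0.1 | RK3588, RK3576 | RK1820/RK1828 (PCIe) |
| Alpha V0.0.2 | RK3588, RK3576 | RK1820/RK1828 (PCIe) |
| Beta V0.4.0 | RK3588, RK3576 | RK1820/RK1828 (PCIe/USB) |
| Release V1.0.0 | RK3588, RK3576 | RK1820/RK1828 (PCIe/USB) |

This document uses Release V1.0.0 by default. If you need another version, please contact us.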


 


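03
Quick Start

This article provides a complete runtime environment based on the Qiyang IAC-RK3588-KIT and the RK1828, together with pre-converted RKNN models, so that users can verify model inference without having to build the SDK.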


 

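3.1 Preparation

3.1.1 Hardware Preparation

Host: IAC-RK3588-MB, IAC-RK3588-CM

Device: RK1828 M.2 module

1. The IAC-RK3588-MB and IAC-RK3588-CM boards:

[Figure: IAC-RK3588-MB and IAC-RK3588-CM boards]

2. The RK1828 M.2 module:

[Figure: RK1828 M.2 module]

Hardware connection

The RK1828 M.2 module assembled with the IAC-RK3588-MB and IAC-RK3588-CM, installed in the J11 SSD slot on the carrier board:

[Figure: RK1828 M.2 module installed in the J11 SSD slot]

Note: the RK1828 requires an external 12 V supply. Power on the RK1828 first, then the RK3588.

Confirm the RK1820/RK1828 connection by checking the device status on the RK3588 EVB10:

root@linaro-alip:/# lspci
000200.0 Processing accelerators: Rockchip Electronics Co., Ltd Device 182a (rev 01)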
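The connection check above can also be scripted, for example in a board bring-up test. The sketch below filters `lspci` output for the vendor string and the `182a` device ID that appear in the output shown here. Whether every RK1820/RK1828 module reports this same ID is an assumption, and the function names are illustrative only.

```python
import subprocess


def find_rk182x_devices(lspci_output):
    """Return the lspci lines that look like an RK182X PCIe endpoint.

    Matches on the vendor name and the 182a device ID seen in this article;
    other module revisions may report something different (assumption).
    """
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "Rockchip" in line and "Device 182a" in line
    ]


def read_lspci():
    """Run lspci on the host and return its text output."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the board, `find_rk182x_devices(read_lspci())` should return one line per detected module. An empty list is a hint to recheck the M.2 seating, the 12 V supply, and the power-on order.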
 

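3.1.2 Model Preparation

The models needed for quick verification can be downloaded from the network drive.

Download link: https://console.box.lenovo.com/l/H1fig1
Extraction code: rknn

The models are located in: RKNN3_SDK/rknn3_models/v1.0.0

Common models

| Type | Path |
|---|---|
| LLM model | RKNN3_SDK/rknn3_models/v1.0.0/Qwen2.5-instruct-0.5B |
| CNN model | RKNN3_SDK/rknn3_models/v1.0.0/MobilenetV2 |
| VLM model | RKNN3_SDK/rknn3_models/v1.0.0/InternVL3_2B |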

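3.2 Model Testing

3.2.1 CNN Models

Full verification of a CNN model (using MobilenetV2 as an example; for the model location, see section 3.1.2):

# Run on the PC
# Push the model to the board
adb push MobilenetV2 /userdata/

Run the model:

# If the current user is not root, switching to root is recommended
[ "$(whoami)" != "root" ] && sudo su -
# Model inference command
cd /userdata/MobilenetV2
rknn3_model_test mobilenetv2-12.rknn mobilenetv2-12.weight '' '' 0x01 10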
 

The following log output indicates that the model completed inference successfully:

root@linaro-alip:/userdata/MobilenetV2# rknn3_model_test mobilenetv2-12.rknn mobilenetv2-12.weight '' '' 0x10 10
Input paths or golden output paths not provided, using random input and skipping golden comparison
Found 1 RK182X devices
Device 0: transfer_type=PCIE, id=000200.0
Info: Only one device found (id=000200.0), init_extend can be NULL
rknn3_init success, cost 561.752 ms
Device Memory Info: total=19 MB, free=18 MB
Node 0: total=639 MB, free=639 MB
Node 1: total=639 MB, free=639 MB
Node 2: total=619 MB, free=619 MB
Node 3: total=639 MB, free=639 MB
Node 4: total=619 MB, free=619 MB
Node 5: total=619 MB, free=619 MB
Node 6: total=639 MB, free=639 MB
Node 7: total=639 MB, free=639 MB
model_len=63680, model=0x5749c80
weight_len=3714048, weight_data=0x7fba5d4010
rknn3_load_model_from_data success, cost 2.577 ms
Core number: 1
model_config:0xf8c0
rknn3_model_init success, cost 27.672 ms
input tensors:
name=input, n_dims=4, shape=[1, 224, 224, 3], stride=[150528, 672, 3, 1], aligned_size=150528, layout=NHWC, dtype=UINT8, qnt_type=PER_LAYER_ASYMMETRIC, scale=0.01866, zero_point=-14
output tensors:
name=output, n_dims=2, shape=[1, 1000], stride=[1000, 1], aligned_size=1024, layout=UNDEFINED, dtype=INT8, qnt_type=PER_LAYER_ASYMMETRIC, scale=0.14192, zero_point=-55
Generating random input data for input 0 (name: input)
Input 0 size check passed: 150528 elements
Input[0] first values (type: 3):
[0] UINT8: 103
[1] UINT8: 198
[2] UINT8: 105
[3] UINT8: 115
[4] UINT8: 81
[5] UINT8: 255
[6] UINT8: 74
[7] UINT8: 236
[8] UINT8: 41
[9] UINT8: 205
Running model 10 times...
Syncing input[0] to device...
Syncing output[0] from device...
loop: 1/10, model inference cost 4.022 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 2/10, model inference cost 3.944 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 3/10, model inference cost 3.954 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 4/10, model inference cost 4.141 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 5/10, model inference cost 4.481 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 6/10, model inference cost 4.598 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 7/10, model inference cost 4.440 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 8/10, model inference cost 4.806 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 9/10, model inference cost 5.273 ms
Syncing input[0] to device...
Syncing output[0] from device...
loop: 10/10, model inference cost 5.293 ms
All 10 loops completed successfully
rknn3_destroy success, cost 1.809 ms
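To turn the per-loop timings in the log above into a single figure, the `model inference cost` values can be extracted and averaged. The sketch below assumes the log keeps the `model inference cost <value> ms` wording shown here. The function name and the returned fields are illustrative and not part of the SDK.

```python
import re
import statistics

# Matches the per-loop timing text printed by rknn3_model_test in this article.
LOOP_COST = re.compile(r"model inference cost (\d+(?:\.\d+)?) ms")


def summarize_latency(log_text):
    """Summarize per-loop inference latency found in a captured log."""
    costs = [float(value) for value in LOOP_COST.findall(log_text)]
    if not costs:
        raise ValueError("no 'model inference cost' entries found in log")
    mean_ms = statistics.mean(costs)
    return {
        "runs": len(costs),
        "min_ms": min(costs),
        "max_ms": max(costs),
        "mean_ms": mean_ms,
        # Inferences per second implied by the mean per-loop cost.
        "inferences_per_s": 1000.0 / mean_ms,
    }
```

For the ten loops shown above, the costs sum to 44.952 ms, so the mean is about 4.50 ms per inference. The log does not say whether this figure includes the input and output sync steps, so treat it as the tool's own measurement and not as end-to-end latency.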

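3.2.2 LLM Models

Verification of an LLM model (using qwen2.5_0.5b_instruct as an example; for the model location, see section 3.1.2):

# Run on the PC
# Push the model to the board
adb push Qwen2.5-0.5B /userdata/

Run the model:

# If the current user is not root, switching to root is recommended
[ "$(whoami)" != "root" ] && sudo su -
# Model inference command
cd /userdata/Qwen2.5-0.5B
rknn3_session_test Qwen2.5-0.5B.rknn Qwen2.5-0.5B.weight Qwen2.5-0.5B.tokenizer.gguf Qwen2.5-0.5B.embed.bin 1024 256 0xff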
 

If the LLM model carries on a correct conversation, inference has succeeded, as in the following log:

root@linaro-alip:/userdata/Qwen2.5-0.5B# rknn3_session_test Qwen2.5-0.5B.rknn Qwen2.5-0.5B.weight Qwen2.5-0.5B.tokenizer.gguf Qwen2.5-0.5B.embed.bin 1024 256 0xff
*******************************NEW TEST**********************************
llama_model_loader: loaded meta data with 23 key-value pairs and 0 tensors from Qwen2.5-0.5B.tokenizer.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Grq
llama_model_loader: - kv 3: qwen2.block_count u32 = 24
llama_model_loader: - kv 4: qwen2.context_length u32 = 32768
llama_model_loader: - kv 5: qwen2.embedding_length u32 = 896
llama_model_loader: - kv 6: qwen2.feed_forward_length u32 = 4864
llama_model_loader: - kv 7: qwen2.attention.head_count u32 = 14
llama_model_loader: - kv 8: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 9: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 10: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 1
llama_model_loader: - kv 12: general.quantization_version u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: printing all EOG tokens - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
Warning: max_context_len (1024) is less than llm_config.max_ctx_len (2048). It's recommended to set to 2048.
=============================================================
Model Config
=============================================================
Max Context Length : 2048
Max Position Embeddings : 8192
Model Type :
Task Type : RKNN3_LLM_TASK_GENERATE
=============================================================
--------------------Input[0]--------------------
请解释一下相对论的基本概念。
--------------------Output----------------------
相对论是爱因斯坦在20世纪初提出的物理学理论,它改变了我们对时间、空间和物质的理解。以下是相对论的基本概念:
1. **狭义相对论**(Special Relativity):
- **光速不变原理**:所有惯性参考系中的光速都是一样的,无论光源的运动速度如何。
- **相对性原理**:在不同的惯性参考系中,物理定律是相同的。这意味着时间、空间和长度都是相对的。
2. **广义相对论**(General Relativity):
- **引力场**:物质和能量通过引力场影响时空,使物体运动轨迹发生弯曲。
- **时空曲率**:时空中的曲线反映了物质分布和引力场的强度。这种曲率与物质的质量成正比。
3. **时间膨胀**(Time Dilation):
- 物质在加速时会缩短时间,即时间流逝得更快。
4. **空间膨胀**(Space Dilation):
- 物质在加速时也会导致空间的弯曲,使距离物体更远的地方看起来变长。
5. **引力红移**:由于物质和能量的影响,光谱线向
--------------Max new token reached-------------
Performance Statistics:
-----------------------------------------------------------------------------------------
Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second
-----------------------------------------------------------------------------------------
Prefill | 52.69 | 48 | 1.10 | 910.99
Generate | 1385.01 | 255 | 5.43 | 184.11
-----------------------------------------------------------------------------------------
--------------------Input[1]--------------------
Please explain the basic concept of relativity.
--------------------Output----------------------
Relativity is a set of theories developed by Albert Einstein, which fundamentally changed our understanding of time, space, and the nature of physical objects. Here are the key concepts of relativity:
1. **Special Relativity** (Special Relativity):
- **Einstein's first law**: The laws of physics are the same for all observers in uniform motion relative to each other.
- **Einstein's second law**: The speed of light in a vacuum is constant, regardless of the motion of the light source or observer.
2. **General Relativity** (General Relativity):
- **Gravitational Field**: The gravitational field is caused by mass and energy. This field affects the curvature of spacetime.
- **Curvature of Space-Time**: The geometry of space-time changes when objects are accelerated, leading to phenomena like time dilation and length contraction.
3. **Time Dilation** (Time Dilation):
- When an object accelerates, its clocks run slower compared to a stationary observer. This effect is known as time dilation.
4. **Space Dilation** (Space Dilation):
- The same principle applies when objects are accelerated in space. Objects moving at high speeds will appear to move faster than they actually do, due to
--------------Max new token reached-------------
Performance Statistics:
-----------------------------------------------------------------------------------------
Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second
-----------------------------------------------------------------------------------------
Prefill | 38.39 | 17 | 2.26 | 442.81
Generate | 1398.52 | 255 | 5.48 | 182.34
-----------------------------------------------------------------------------------------
*******************************END TEST**********************************
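The throughput columns in the Performance Statistics tables follow directly from the first two columns: time per token is total time divided by tokens, and tokens per second is tokens divided by total time in seconds. For the first Generate row, 255 tokens in 1385.01 ms gives 255 / 1.38501 ≈ 184.11 tokens/s, matching the log. The sketch below parses those rows from a captured log and recomputes the derived values. The row layout is taken from the log above. The function and field names are illustrative only.

```python
import re

# One table row: Stage | Total Time (ms) | Tokens | Time per Token | Tokens/s
STAT_ROW = re.compile(
    r"(Prefill|Generate)\s*\|\s*(\d+(?:\.\d+)?)\s*\|\s*(\d+)\s*\|"
    r"\s*(\d+(?:\.\d+)?)\s*\|\s*(\d+(?:\.\d+)?)"
)


def parse_llm_stats(log_text):
    """Extract Prefill/Generate rows and recompute the derived columns."""
    rows = []
    for stage, total_ms, tokens, _reported_ms, reported_tps in STAT_ROW.findall(log_text):
        total_ms = float(total_ms)
        tokens = int(tokens)
        rows.append({
            "stage": stage,
            "total_ms": total_ms,
            "tokens": tokens,
            "ms_per_token": total_ms / tokens,
            "tokens_per_s": tokens * 1000.0 / total_ms,
            "reported_tokens_per_s": float(reported_tps),
        })
    return rows
```

Recomputing from the raw columns is a cheap sanity check when comparing runs. Small differences from the reported values, such as 442.82 against the logged 442.81 for the second Prefill row, are consistent with the tool rounding the total time before printing.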
 

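You can also follow the application documents under the docs/Examples directory to integrate the models into an application and test them there.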

 
