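This article comes from HPMicro developer @Xusiwei1236 and describes how to run an edge AI framework on the HPM6750. Read on if you are interested.
--------------- Review content follows ---------------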
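What is TFLM?
You have probably heard of TensorFlow, the open-source machine learning library developed by Google that supports both model training and model inference.
The subject of this article, TFLM, stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?
TensorFlow Lite (usually abbreviated TFLite) is the solution the TensorFlow team developed for deploying models to mobile devices; put simply, it is the mobile edition of TensorFlow. The TensorFlow website introduces TFLite as follows:
"TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and IoT devices."
TensorFlow Lite for Microcontrollers (TFLM), in turn, is the microcontroller edition of TensorFlow Lite. The official website describes it as follows:
"TensorFlow Lite for Microcontrollers (TFLM below) is an experimental port of TensorFlow Lite designed for microcontrollers and other devices with only kilobytes of memory. It can run directly on bare metal and does not require operating system support, any standard C/C++ libraries, or dynamic memory allocation. The core runtime fits in 16 KB on a Cortex M3, and with enough operators to run a speech keyword detection model it takes up only 22 KB."
All three come from Google and share one lineage. The difference is that TensorFlow supports both training and inference, while the other two support inference only. TFLite mainly targets mobile devices such as phones and tablets, and TFLM targets microcontrollers. Historically, the latter two are side projects of TensorFlow, and the three form a tree: TFLite was split off from TensorFlow, and TFLM (the tflite-micro repository) was split off from TFLite. Today the three are developed in parallel. For a long time the source code of all three lived in a single repository, and the directory layout reflected the same nesting: TensorFlow contained the other two, and TFLite contained tflite-micro.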
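TFLM in the HPM SDK
The HPM SDK integrates TFLM as middleware (similar to a library, but not built as a separate library), located in the hpm_sdk\middleware subdirectory.
The code in this subdirectory is trimmed down from the open-source TFLM project, with many unneeded files removed.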
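TFLM example
The HPM SDK also ships a TFLM example, located in the hpm_sdk\samples\tflm subdirectory.
The example is adapted from the official person_detection example, with camera image capture and LCD display of the result added.
Because I do not have the matching camera and display at hand, this article does not use that example for the experiment.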
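Running the TFLM benchmark on the HPM6750
The following uses the person detection benchmark as an example to show how to run a TFLM benchmark on the HPM6750.
Follow these steps to add the person detection benchmark source files to the HPM SDK environment:
In the samples subdirectory of the HPM SDK, create a tflm_person_detect_benchmark directory, and create a src directory inside it;
From the tflite-micro directory described earlier, in which the person detection benchmark has already been run, copy the following files into the src directory: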
tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc
tensorflow\lite\micro\benchmarks\micro_benchmark.h
tensorflow\lite\micro\examples\person_detection\model_settings.h
tensorflow\lite\micro\examples\person_detection\model_settings.cc
Create a testdata subdirectory under src, and copy all files from the following tflite-micro directory into testdata:
tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata
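In person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc, and person_image_data.cc, update the file paths in the affected #include directives so that they match the relative paths after copying;
In person_detection_benchmark.cc, add the line #include "board.h" at the top of the file and the line board_init(); at the very beginning of the main function;
Create a CMakeLists.txt file at the same level as src, with the following content: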
cmake_minimum_required(VERSION 3.13)
# Enable the TFLM middleware bundled with the HPM SDK
set(CONFIG_TFLM 1)
find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)
# Benchmark sources and test images copied from tflite-micro
sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)
# Headers are resolved relative to src
sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# Alternative ABI, architecture, and optimization settings, left disabled
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
# Optimization level for the generated SEGGER Embedded Studio project
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
Create an app.yaml file at the same level as src, with the following content:
dependency:
- tflm
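The file-copy steps above can also be scripted. The following is a minimal sketch, not part of the HPM SDK or of tflite-micro: the function name stage_benchmark_sources and both of its arguments are hypothetical, and it assumes the generated testdata directory already exists from the earlier PC run. The paths are the ones listed above, written with forward slashes so that pathlib can resolve them on either Windows or Linux.

```python
import shutil
from pathlib import Path

# Files named in the steps above, relative to the tflite-micro checkout.
SOURCE_FILES = [
    "tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc",
    "tensorflow/lite/micro/benchmarks/micro_benchmark.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.cc",
]

# Directory holding the generated test images (produced by the earlier PC run).
TESTDATA_DIR = (
    "tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/"
    "tensorflow/lite/micro/examples/person_detection/testdata"
)


def stage_benchmark_sources(tflm_root, sample_dir):
    """Copy the benchmark sources and test data into <sample_dir>/src.

    Both arguments are hypothetical, caller-chosen paths: tflm_root is the
    tflite-micro checkout, sample_dir is the new
    samples/tflm_person_detect_benchmark directory inside the HPM SDK.
    Returns the sorted list of files that were written.
    """
    tflm_root = Path(tflm_root)
    src_dir = Path(sample_dir) / "src"
    testdata_dst = src_dir / "testdata"
    testdata_dst.mkdir(parents=True, exist_ok=True)  # also creates src/

    copied = []
    for rel in SOURCE_FILES:
        dst = src_dir / Path(rel).name
        shutil.copy2(tflm_root / rel, dst)
        copied.append(dst)

    # Copy every regular file from the generated testdata directory.
    for item in sorted((tflm_root / TESTDATA_DIR).iterdir()):
        if item.is_file():
            dst = testdata_dst / item.name
            shutil.copy2(item, dst)
            copied.append(dst)
    return sorted(copied)
```

The script only copies files. The #include paths and the board_init() call still have to be edited by hand as described in the steps.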
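Next comes the familiar part: build and run. First, use generate_project to generate the project. Then connect the HPM6750 board to the PC and open the newly generated project in Embedded Studio. Because the project pulls in the TFLM sources, it contains many files, so the Indexing task in the source navigator pane on the right takes a long time to finish.
After that, press F7 to build the project and F5 to debug it.
Once the build finishes, open a serial terminal and connect it to the board's serial port at 115200 baud. Start debugging and simply continue execution; the benchmark output then appears in the serial terminal: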
==============================
hpm6750evkmini clock summary
==============================
cpu0: 816000000Hz
cpu1: 816000000Hz
axi0: 200000000Hz
axi1: 200000000Hz
axi2: 200000000Hz
ahb: 200000000Hz
mchtmr0: 24000000Hz
mchtmr1: 1000000Hz
xpi0: 133333333Hz
xpi1: 400000000Hz
dram: 166666666Hz
display: 74250000Hz
cam0: 59400000Hz
cam1: 59400000Hz
jpeg: 200000000Hz
pdma: 200000000Hz
==============================
----------------------------------------------------------------------
$$\ $$\ $$$$$$$\ $$\ $$\ $$\
$$ | $$ |$$ __$$\ $$$\ $$$ |\__|
$$ | $$ |$$ | $$ |$$$$\ $$$$ |$$\ $$$$$$$\ $$$$$$\ $$$$$$\
$$$$$$$$ |$$$$$$$ |$$\$$\$$ $$ |$$ |$$ _____|$$ __$$\ $$ __$$\
$$ __$$ |$$ ____/ $$ \$$$ $$ |$$ |$$ / $$ | \__|$$ / $$ |
$$ | $$ |$$ | $$ |\$ /$$ |$$ |$$ | $$ | $$ | $$ |
$$ | $$ |$$ | $$ | \_/ $$ |$$ |\$$$$$$$\ $$ | \$$$$$$ |
\__| \__|\__| \__| \__|\__| \_______|\__| \______/
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).
NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).
WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
As the log shows, on the HPM6750EVKMINI board, 10 consecutive runs of the person detection model take 4456 ms in total, an average of 445.6 ms per run.
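To look at the log in more detail, the tick counts can be converted to milliseconds and summed per operator. The sketch below assumes that ticks are counted at the 24 MHz rate shown for mchtmr0 in the clock summary; that is an inference from the log (for example, 10694521 ticks at 24 MHz is about 445.6 ms, matching the printed 445 ms), not something the benchmark states. The function names are illustrative.

```python
import re
from collections import defaultdict

# Assumption: ticks are counted at the mchtmr0 rate from the clock summary.
TICKS_PER_SECOND = 24_000_000

# Matches lines such as "CONV_2D took 516051 ticks (21 ms)."
LINE_RE = re.compile(r"^(\S+) took (\d+) ticks \((\d+) ms\)\.?$")


def ticks_to_ms(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count to milliseconds (as a float)."""
    return ticks * 1000 / ticks_per_second


def sum_ticks_by_name(log_text):
    """Return {entry name: total ticks} for every 'took N ticks' line."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


# A few lines taken from the log above.
sample_log = """\
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
"""

for name, ticks in sum_ticks_by_name(sample_log).items():
    print(f"{name}: {ticks} ticks, {ticks_to_ms(ticks):.1f} ms")
```

Feeding the complete log to sum_ticks_by_name gives the per-operator totals for one inference, which makes it easier to see that the CONV_2D layers account for most of the time.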
Running the TFLM benchmark on the Raspberry Pi 3B+
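On the Raspberry Pi 3B+, just as on the PC, the benchmark results can be obtained by running the same test command used on the PC.
On the Raspberry Pi 3B+, for the image that contains a person, 10 consecutive runs of the person detection model take 4186 ms in total, an average of 418.6 ms per run. For the image without a person, 10 consecutive runs take 4190 ms, an average of 419 ms per run.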
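Benchmark comparison: HPM6750, Raspberry Pi 3B+, and AMD R7 4800H
This section compares the average time taken to run the person detection model on the HPM6750EVKMINI board, the Raspberry Pi 3B+, and the AMD R7 4800H.
For the TFLM person detection model, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform about the same: 445.6 ms versus roughly 419 ms per inference. Although the HPM6750's 816 MHz CPU clock is lower than the 1.4 GHz of the Cortex-A53 cores in the BCM2837B0 on the Raspberry Pi 3B+, its single-core compute performance is not far behind.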
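One rough way to put the two results on a common footing is to multiply each average inference time by the CPU clock, which gives an approximate cycle count per inference. The sketch below uses the 816 MHz reported in the clock summary and the nominal 1.4 GHz of the Raspberry Pi 3B+; it assumes both cores actually ran at those frequencies during the test, and it ignores differences in instruction set, compiler, and memory system, so treat the result as an estimate only.

```python
def cycles_per_inference(avg_ms, cpu_hz):
    """Approximate CPU cycles spent on one inference."""
    return avg_ms * cpu_hz / 1000


# Average times from the measurements above; clock rates as stated in the text.
hpm6750_cycles = cycles_per_inference(445.6, 816_000_000)
rpi3bp_cycles = cycles_per_inference(418.6, 1_400_000_000)

print(f"HPM6750:          {hpm6750_cycles / 1e6:.0f} M cycles per inference")
print(f"Raspberry Pi 3B+: {rpi3bp_cycles / 1e6:.0f} M cycles per inference")
print(f"Wall-clock ratio (HPM6750 / Pi): {445.6 / 418.6:.2f}")
print(f"Cycle ratio (Pi / HPM6750):      {rpi3bp_cycles / hpm6750_cycles:.2f}")
```

Under these assumptions the HPM6750 is about 6% slower in wall-clock time while spending noticeably fewer clock cycles per inference.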
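Note that the TFLM benchmark on the Raspberry Pi 3B+ ran on a 64-bit Debian Linux distribution, whereas the benchmark on the HPM6750 ran directly on bare metal. The task scheduler in the operating system kernel takes away some of the CPU's compute capacity. This is therefore not a rigorous comparison, and the results are for reference only.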