Inference with the Scene Text Detection and Recognition Models in OpenVINO 2021.4

Intel IoT, 2021-08-26


Scene text detection and recognition models

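OpenVINO 2021.4 supports scene text detection through PixelLink models with a MobileNetV2 backbone. There are two such models, text-detection-0003 and text-detection-0004. Taking text-detection-0003 as an example, the model has two outputs: a pixel segmentation (text/no-text) output and a pixel-link output, from which the text bounding boxes are derived.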

The original PixelLink model uses VGG16 as its backbone.

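The network ultimately outputs a text/no-text mask, and simple post-processing of that mask yields the ROI of each scene-text region; the post-processing is demonstrated in code later on.

Besides detection, OpenVINO 2021.4 also supports scene text recognition. The recognition model is based on VGG16 plus a bidirectional LSTM. It recognizes the digits 0-9 and the 26 letters, plus a blank symbol, and it is case-insensitive.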

Model input and output formats

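The input and output formats of the PixelLink scene text detection model are as follows.

Input: 1x3x768x1280 BGR color image (NCHW layout)

Outputs:

name: "model/link_logits_/add", [1x16x192x320]: the PixelLink link output (8 neighbour directions x 2 classes)

name: "model/segm_logits/add", [1x2x192x320]: per-pixel classification, text / no text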
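Both outputs use the NCHW layout with batch 1 and are a quarter of the input resolution (768 / 4 = 192, 1280 / 4 = 320). The following is a minimal standard-library sketch, with the hypothetical helper names nchw_index and hwc_to_chw, of the flat-offset formula that the parsing code relies on, and of the conversion needed to fill the planar 1x3x768x1280 input from an interleaved BGR buffer such as the data of a cv::Mat. It assumes the image has already been resized to 1280x768.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flat offset of element (c, y, x) in a [1 x C x H x W] (NCHW) buffer with batch 1.
inline std::size_t nchw_index(int c, int y, int x, int H, int W) {
    return (static_cast<std::size_t>(c) * H + y) * W + x;
}

// Convert an interleaved HWC uint8 BGR image (the memory layout of a CV_8UC3 cv::Mat)
// into a planar [1 x 3 x H x W] float buffer, keeping the B, G, R channel order.
std::vector<float> hwc_to_chw(const std::vector<std::uint8_t>& hwc, int H, int W) {
    assert(hwc.size() == static_cast<std::size_t>(H) * W * 3);
    std::vector<float> chw(hwc.size());
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            for (int c = 0; c < 3; c++) {
                const std::size_t src = (static_cast<std::size_t>(y) * W + x) * 3 + c;
                chw[nchw_index(c, y, x, H, W)] = static_cast<float>(hwc[src]);
            }
        }
    }
    return chw;
}
```

For example, the text logit of pixel (row, col) in the segmentation output sits at nchw_index(1, row, col, 192, 320), which is exactly step + row * out_w + col in the parsing code of the detection demo.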

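The input and output formats of the text recognition model (Figure 3) are as follows.

Input: 1x1x32x120 (a single-channel grayscale image, 32 pixels high and 120 pixels wide)

Output: [30, 1, 37], i.e. 30 time steps x batch 1 x 37 classes

The output is interpreted with greedy CTC decoding.

37 is the length of the character set, which is:

0123456789abcdefghijklmnopqrstuvwxyz#

where # denotes the blank.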
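Greedy CTC decoding takes the highest-scoring class at each of the 30 time steps, merges consecutive repeats, and then removes blanks. Here is a minimal sketch of that collapse step, operating on an already computed argmax sequence; ctc_collapse, kAlphabet and kBlank are illustrative names, not part of OpenVINO.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Character set of the recognition model; the last entry '#' is the CTC blank.
const std::string kAlphabet = "0123456789abcdefghijklmnopqrstuvwxyz#";
const int kBlank = 36;  // index of '#'

// Greedy CTC collapse: best_path holds the argmax class index of each time step.
// A class is emitted only if it differs from the previous step and is not the blank.
std::string ctc_collapse(const std::vector<int>& best_path) {
    std::string text;
    int prev = -1;  // class of the previous time step (-1 = none yet)
    for (int cls : best_path) {
        assert(cls >= 0 && cls < static_cast<int>(kAlphabet.size()));
        if (cls != prev && cls != kBlank) {
            text += kAlphabet[cls];
        }
        prev = cls;
    }
    return text;
}
```

The blank is what makes genuinely doubled letters possible: the path a a # a b (indices 10, 10, 36, 10, 11) decodes to "aab", whereas a a a b decodes to just "ab".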

Synchronous and asynchronous inference

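The Inference Engine (IE) SDK in OpenVINO supports both synchronous and asynchronous inference. Synchronous inference blocks until the result is returned; asynchronous inference returns immediately after the request is started, and the outputs are parsed once a completion notification arrives. Because the host is not blocked while inference runs, asynchronous inference is better suited to multi-stream video inference. Asynchronous inference roughly works as follows: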

```cpp
// async_infer_request is an InferenceEngine::InferRequest obtained from
// executable_network.CreateInferRequest()

// start the async infer request (puts the request to the queue and immediately returns)
async_infer_request.StartAsync();
// here you can continue execution on the host until results of the current request are really needed
// ...
async_infer_request.Wait(InferenceEngine::InferRequest::WaitMode::RESULT_READY);
auto output = async_infer_request.GetBlob(output_name);
```
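The start-then-wait pattern itself does not depend on OpenVINO. As an illustration only, the sketch below mimics it with std::async from the C++ standard library. The functions fake_infer and run_overlapped are hypothetical stand-ins, and StartAsync()/Wait() appear only in the comments.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a network: sums its input after a short simulated delay.
// This is NOT OpenVINO code; it only imitates a request that takes time to finish.
float fake_infer(std::vector<float> input) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    float sum = 0.0f;
    for (float v : input) sum += v;
    return sum;
}

// Start "inference" asynchronously, keep the host busy meanwhile, then wait for the result.
float run_overlapped(const std::vector<float>& frame, int& host_work_done) {
    assert(!frame.empty());
    // Analogous to StartAsync(): launches the work on another thread and returns at once.
    std::future<float> pending = std::async(std::launch::async, fake_infer, frame);
    // Host-side work that overlaps with inference (e.g. decoding the next video frame).
    host_work_done = 0;
    for (int i = 0; i < 1000; i++) host_work_done++;
    // Analogous to Wait(RESULT_READY) + GetBlob(): block until the result is ready.
    return pending.get();
}
```

In a multi-stream video pipeline, one common arrangement is to keep one or two in-flight requests per stream, so that frame decoding and post-processing overlap with inference.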

Scene text detection code demo

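Scene text detection in OpenVINO 2021.4 is again illustrated with text-detection-0003. Loading the model files and creating the inference request work the same way as before and are not repeated here; the focus is on parsing the PixelLink output, which is done with the following code: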

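```cpp
// detection_out: FP32 data of "model/segm_logits/add" [1x2x192x320]; out_h = 192, out_w = 320
// im_h, im_w: height and width of the original image src
cv::Mat mask = cv::Mat::zeros(cv::Size(out_w, out_h), CV_8U);
const int step = out_h * out_w;  // size of one channel plane
for (int row = 0; row < out_h; row++) {
    for (int col = 0; col < out_w; col++) {
        // channel 0 (offset 0) holds the no-text logit, channel 1 (offset step) the text logit
        float p2 = detection_out[step + row * out_w + col];
        if (p2 > 1.0) {  // threshold on the raw text logit
            mask.at<uchar>(row, col) = 255;
        }
    }
}
cv::resize(mask, mask, cv::Size(im_w, im_h));  // scale the mask back to the image size
std::vector<std::vector<cv::Point>> contours;
cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
for (size_t t = 0; t < contours.size(); t++) {
    cv::Rect box = cv::boundingRect(contours[t]);  // upright box; cv::minAreaRect gives the rotated one
    cv::rectangle(src, box, cv::Scalar(0, 0, 255), 2);
}
```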

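The mask data is first classified into text and non-text to obtain a binary image. Contours are then found in the binary image, and the upright bounding rectangle or the minimum-area rectangle of each contour gives the detection result for each text region. The final result of running the model is shown in the result figure.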
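The same post-processing can be expressed without OpenCV. The sketch below (text_boxes and Box are hypothetical names) thresholds the text channel of the segmentation output at the same raw-logit value of 1.0 used above, groups 8-connected text pixels with a breadth-first flood fill, and returns one upright box per group. The flood fill is a swapped-in substitute for cv::findContours plus cv::boundingRect, not what OpenCV does internally, and like the code above it ignores the link output. The boxes are in output-grid coordinates (192x320), so they would need to be scaled by im_w / 320 and im_h / 192 to map back onto the original image.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Upright bounding rectangle in output-grid coordinates.
struct Box { int x, y, w, h; };

// segm holds the [2 x H x W] segmentation logits with the batch dimension dropped:
// channel 0 = no text, channel 1 = text. A pixel counts as text when its text logit
// exceeds thresh; 8-connected text pixels are grouped and one box is returned per group.
std::vector<Box> text_boxes(const std::vector<float>& segm, int H, int W, float thresh) {
    const std::size_t step = static_cast<std::size_t>(H) * W;  // one channel plane
    assert(segm.size() == 2 * step);
    std::vector<unsigned char> mask(step), seen(step, 0);
    for (std::size_t i = 0; i < step; i++) mask[i] = segm[step + i] > thresh ? 1 : 0;

    std::vector<Box> boxes;
    for (int y0 = 0; y0 < H; y0++) {
        for (int x0 = 0; x0 < W; x0++) {
            const std::size_t start = static_cast<std::size_t>(y0) * W + x0;
            if (!mask[start] || seen[start]) continue;
            int minx = x0, maxx = x0, miny = y0, maxy = y0;
            std::queue<std::pair<int, int>> q;  // (row, col) pixels still to visit
            q.push(std::make_pair(y0, x0));
            seen[start] = 1;
            while (!q.empty()) {
                const int y = q.front().first, x = q.front().second;
                q.pop();
                if (x < minx) minx = x;
                if (x > maxx) maxx = x;
                if (y < miny) miny = y;
                if (y > maxy) maxy = y;
                for (int dy = -1; dy <= 1; dy++) {  // visit the 8 neighbours
                    for (int dx = -1; dx <= 1; dx++) {
                        const int ny = y + dy, nx = x + dx;
                        if (ny < 0 || ny >= H || nx < 0 || nx >= W) continue;
                        const std::size_t n = static_cast<std::size_t>(ny) * W + nx;
                        if (mask[n] && !seen[n]) {
                            seen[n] = 1;
                            q.push(std::make_pair(ny, nx));
                        }
                    }
                }
            }
            boxes.push_back(Box{minx, miny, maxx - minx + 1, maxy - miny + 1});
        }
    }
    return boxes;
}
```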

Scene text recognition code demo

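Scene text recognition takes the TEXT regions output by the detection model as input and predicts from a grayscale image, using the text-recognition-0012 model. Model loading and input/output setup are again omitted. Each detected TEXT ROI is used as the input, and the code for inference, text prediction and display is as follows: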

```cpp
auto reco_output = reco_request.GetBlob(reco_output_name);
const float* blob_out = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(reco_output->buffer());
const SizeVector reco_dims = reco_output->getTensorDesc().getDims();
const int RW = reco_dims[0];  // time steps (30)
const int RB = reco_dims[1];  // batch (1)
const int RL = reco_dims[2];  // classes (37)
std::string ocr_txt = ctc_decode(blob_out, RW, RL);
std::cout << ocr_txt << std::endl;
cv::putText(src, ocr_txt, box.tl(), cv::FONT_HERSHEY_PLAIN, 1.0, cv::Scalar(255, 0, 0), 1);
```

Here RW x RB x RL = 30x1x37. The CTC decoding function ctc_decode is implemented as follows:

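```cpp
#include <string>

// Character set of text-recognition-0012; '#' (the last index) is the CTC blank.
static const std::string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz#";

// Greedy CTC decoding of a [seq_w x 1 x seq_l] output: seq_w time steps, seq_l classes.
std::string ctc_decode(const float* blob_out, int seq_w, int seq_l) {
    std::string res;
    bool prev_pad = false;  // true if the previous time step was the blank
    const int num_classes = static_cast<int>(alphabet.length());  // expected to equal seq_l (37)
    for (int i = 0; i < seq_w; i++) {
        // argmax over the class scores of time step i
        int argmax = 0;
        float max_prob = blob_out[i * seq_l];
        for (int j = 1; j < num_classes; j++) {
            if (blob_out[i * seq_l + j] > max_prob) {
                max_prob = blob_out[i * seq_l + j];
                argmax = j;
            }
        }
        const char symbol = alphabet[argmax];
        if (symbol == '#') {
            prev_pad = true;  // blank: emit nothing, but allow the next symbol to repeat
        } else {
            // emit unless it repeats the previous symbol with no blank in between
            if (res.empty() || prev_pad || symbol != res.back()) {
                res += symbol;
            }
            prev_pad = false;
        }
    }
    return res;
}
```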

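Decoding takes the resulting 30x37 two-dimensional matrix, computes the argmax of each row, then merges repeated characters and drops blanks, and finally returns the predicted text.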
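Because the recognition model expects a 1x1x32x120 grayscale input, each detected ROI has to be converted before inference. With OpenCV this would typically be cv::cvtColor with cv::COLOR_BGR2GRAY followed by cv::resize to 120x32. The sketch below (make_reco_input is a hypothetical name) does the equivalent with the standard library only. It uses the same BT.601 weights as cv::cvtColor but simple nearest-neighbour sampling instead of OpenCV's default bilinear interpolation, and it assumes the model takes raw 0-255 pixel values with no further normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a BGR ROI (interleaved HWC uint8, h rows x w columns) into the
// [1 x 1 x out_h x out_w] float input of the recognition model:
// grayscale conversion plus nearest-neighbour (floor) resampling.
std::vector<float> make_reco_input(const std::vector<std::uint8_t>& bgr, int h, int w,
                                   int out_h = 32, int out_w = 120) {
    assert(h > 0 && w > 0 && bgr.size() == static_cast<std::size_t>(h) * w * 3);
    std::vector<float> blob(static_cast<std::size_t>(out_h) * out_w);
    for (int y = 0; y < out_h; y++) {
        const int sy = y * h / out_h;  // source row
        for (int x = 0; x < out_w; x++) {
            const int sx = x * w / out_w;  // source column
            const std::uint8_t* p = &bgr[(static_cast<std::size_t>(sy) * w + sx) * 3];
            // gray = 0.299 R + 0.587 G + 0.114 B, with p[0] = B, p[1] = G, p[2] = R
            blob[static_cast<std::size_t>(y) * out_w + x] =
                0.114f * p[0] + 0.587f * p[1] + 0.299f * p[2];
        }
    }
    return blob;
}
```

Feeding the three BGR planes, or a color image reinterpreted as one channel, instead of this single gray plane is exactly the mistake the summary warns about.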

Summary

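This article covered inference with the scene text detection and recognition models in OpenVINO 2021.4, as well as the basic concepts of synchronous and asynchronous inference. Note in particular that the scene text recognition model takes a grayscale image, not an RGB color image; getting this wrong produces incorrect text predictions.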

Editor: jq
