Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers

Haiyang HU; Zepin LI; Zhongjin LI

doi:10.11959/j.issn.1000-0801.2022139


Research and Development | Updated: 2024-06-05

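- Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers
- Telecommunications Science, 2022, Vol. 38, No. 7, pp. 75-87
- Author affiliations:
  
  1. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  2. Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, Zhejiang, China
- About the authors:
  
  Haiyang HU (1977- ), male, professor at Hangzhou Dianzi University. His main research interests are machine vision and intelligent manufacturing.
  Zepin LI (1997- ), male, master's student at Hangzhou Dianzi University. His main research interests are computer vision and text detection and recognition.
  Zhongjin LI (1988- ), male, lecturer at Hangzhou Dianzi University. His main research interests are computer vision and mobile edge computing.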
- Funding:
  
  The National Natural Science Foundation of China (61572162, 61802095); The Zhejiang Provincial Key Science and Technology Project (2018C01012); The Zhejiang Provincial Natural Science Foundation of China (LQ17F020003)
- DOI: 10.11959/j.issn.1000-0801.2022139
  CLC number: TN929.5
- Published online: 2022-07
  
  Published in print: 2022-07-20
Haiyang HU, Zepin LI, Zhongjin LI. Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers[J]. Telecommunications Science, 2022, 38(7): 75-87. DOI: 10.11959/j.issn.1000-0801.2022139.

Abstract

In real industrial environments, dim lighting, irregular text, and limited device resources make text detection a challenging task. To address this problem, a feature vector fusion module based on bilinear operations was designed and combined with feature enhancement and semi-convolution to form RGFFD, a lightweight text detection network (ResNet18 + Ghost module + feature pyramid enhancement module (FPEM) + feature fusion module (FFM) + differentiable binarization (DB)). In this network, the Ghost module embeds a feature enhancement module to improve feature extraction, the bilinear feature vector fusion module fuses multi-scale information, and an adaptive threshold segmentation algorithm is added to improve the segmentation capability of the DB module. In a real factory environment, with the embedded device UP2 board used to detect container numbers, RGFFD reached a detection speed of 6.5 frames per second (f/s). On the public datasets ICDAR2015 and Total-text, the detection speed reached 39.6 f/s and 49.6 f/s, respectively. On the custom dataset, the accuracy reached 88.9% at a detection speed of 30.7 f/s.
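The two operations named in the abstract can be illustrated with a minimal sketch. It is not the RGFFD implementation: the function names are hypothetical, the bilinear step is shown as a plain outer product of two feature vectors (the paper's fusion module may differ), and the binarization uses the standard DB approximation 1 / (1 + exp(-k(P - T))) with k = 50 from the original DB formulation, without the adaptive threshold segmentation the authors add.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate step function of the standard DB formulation.

    prob and thresh are 2D lists (probability map P and threshold map T)
    with values in [0, 1]; k is the amplification factor (50 in the DB paper).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]


def bilinear_fuse(x, y):
    """Generic bilinear fusion (assumption): flattened outer product of x and y."""
    return [xi * yj for xi in x for yj in y]


# Toy inputs, chosen only for illustration.
prob_map = [[0.9, 0.2], [0.55, 0.5]]
thresh_map = [[0.3, 0.3], [0.5, 0.5]]
binary_map = differentiable_binarization(prob_map, thresh_map)

# Fusing a 2-dimensional and a 3-dimensional feature vector gives 6 values.
fused = bilinear_fuse([1.0, 2.0], [3.0, 4.0, 5.0])
```

Pixels whose probability is well above the threshold map to values near 1 and those well below map to values near 0, while the function stays differentiable, which is what allows the threshold map to be learned jointly with the probability map.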


