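1. School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
2. Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou, Zhejiang 310018
Haiyang HU (1977- ), male, professor at Hangzhou Dianzi University. His main research interests are machine vision and intelligent manufacturing.
Zepin LI (1997- ), male, master's student at Hangzhou Dianzi University. His main research interests are computer vision and text detection and recognition.
Zhongjin LI (1988- ), male, lecturer at Hangzhou Dianzi University. His main research interests are computer vision and mobile edge computing.
Online publication date: 2022-07
Print publication date: 2022-07-20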
Haiyang HU, Zepin LI, Zhongjin LI. Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers[J]. Telecommunications Science, 2022, 38(7): 75-87. DOI: 10.11959/j.issn.1000-0801.2022139.
In real industrial environments, dim lighting, irregular text, and limited device resources make text detection a challenging task. To address this problem, a feature vector fusion module based on bilinear operations was designed and combined with feature enhancement and semi-convolution to form a lightweight text detection network, RGFFD (ResNet18 + Ghost module + feature pyramid enhancement module (FPEM) + feature fusion module (FFM) + differentiable binarization (DB)). In this network, the Ghost module embeds a feature enhancement module to improve feature extraction, the bilinear feature vector fusion module fuses multi-scale information, and an adaptive threshold segmentation algorithm is added to improve the segmentation capability of the DB module. In a real factory environment, RGFFD reached a detection speed of 6.5 f/s when detecting container numbers on the embedded device UP2 board. On the public datasets ICDAR2015 and Total-text, the detection speed reached 39.6 f/s and 49.6 f/s, respectively. On the custom dataset, the accuracy reached 88.9% at a detection speed of 30.7 f/s.
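For readers unfamiliar with the DB component named in the abstract, the following is a minimal sketch of the standard differentiable binarization step, B = 1 / (1 + exp(-k(P - T))), where P is the probability map, T is a per-pixel threshold map, and k is an amplifying factor (50 in the original DB work). It is not the RGFFD implementation: the abstract does not specify how the adaptive threshold segmentation algorithm modifies this step, so the function name, the list-of-lists map layout, and the sample values are illustrative assumptions.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized 2D lists with values in [0, 1].
    This is the generic DB formula, not the modified version used by RGFFD.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 maps, chosen only to show the behaviour.
    prob = [[0.9, 0.2], [0.5, 0.5]]
    thresh = [[0.3, 0.3], [0.5, 0.4]]
    for row in differentiable_binarization(prob, thresh):
        print([round(v, 4) for v in row])
```

Pixels whose probability is well above the local threshold are pushed toward 1, those well below toward 0, and a pixel exactly at its threshold maps to 0.5. Because the function is smooth, the threshold map can be learned jointly with the probability map during training, which a hard step function would not allow.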