A survey of image object detection algorithm based on deep learning

Tingting ZHANG; Jianwu ZHANG; Chunsheng GUO; Huahua CHEN; Di ZHOU; Yansong WANG; Aihua XU

doi:10.11959/j.issn.1000-0801.2020199

Survey | Updated: 2024-06-05

- A survey of image object detection algorithm based on deep learning
- Telecommunications Science, 2020, 36(7): 92-106
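- Author affiliations:
  
  1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
  2. Zhejiang Uniview Technologies Co., Ltd., Hangzhou, Zhejiang 310051
  3. Zhejiang Lab, Hangzhou, Zhejiang 311121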
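- Author biographies:
  
  Tingting ZHANG (1995- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests include computer vision and artificial intelligence.
  Jianwu ZHANG (1961- ), male, Ph.D., professor and doctoral supervisor at the School of Communication Engineering, Hangzhou Dianzi University, senior member of the Chinese Institute of Electronics, and executive director of the Zhejiang Institute of Communications. His main research interests are mobile communications, multimedia signal processing and artificial intelligence, and communication networks and information security.
  Chunsheng GUO (1971- ), male, Ph.D., associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are video analysis and pattern recognition.
  Huahua CHEN (1975- ), male, Ph.D., associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are video analysis and pattern recognition.
  Di ZHOU (1975- ), male, professor-level senior engineer at Zhejiang Uniview Technologies Co., Ltd. and president of the Uniview Research Institute. His main research interests include video security and artificial intelligence.
  Yansong WANG (1970- ), male, researcher at Zhejiang Lab and professor-level senior engineer. He serves as a member of the overall expert group and the guideline drafting group for the "Broadband Communications and New Networks" field of the Ministry of Science and Technology, a consulting expert for the "Network Communication Technology" field of the Ministry of Industry and Information Technology, a committee member of the China Institute of Communications, and deputy leader of the Industrial Internet group ST8 of the China Communications Standards Association. His main research interests include the industrial Internet, SDN/NFV, and network security.
  Aihua XU (1989- ), female, engineer at Zhejiang Uniview Technologies Co., Ltd. Her main research interests include video security and artificial intelligence.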
- Funding:
  
  The National Natural Science Foundation of China (U1866209, 61772162); The National Key Research Development Program of China (2018YFC0831503); The Natural Science Foundation of Zhejiang Province of China (LY16F020016); The Key Research Development Program of Zhejiang Province of China (2018C01059, 2019C01062)
- DOI: 10.11959/j.issn.1000-0801.2020199
  CLC number: TP393
- Online publication date: 2020-07
  
  Print publication date: 2020-07-20
Tingting ZHANG, Jianwu ZHANG, Chunsheng GUO, et al. A survey of image object detection algorithm based on deep learning[J]. Telecommunications Science, 2020, 36(7): 92-106. DOI: 10.11959/j.issn.1000-0801.2020199.
Abstract

Image object detection aims to find the objects of interest in an image and determine their categories and locations; it is a current research hotspot in computer vision. In recent years, owing to the marked improvement in the accuracy of deep learning on image classification, image object detection models based on deep learning have gradually become mainstream. First, the convolutional neural networks commonly used in image object detection models are introduced. Then, the existing classical image object detection models are reviewed from the perspectives of candidate-region, regression, and anchor-free methods. Finally, based on detection results on public datasets, the advantages and disadvantages of the models are analyzed, the open problems in image object detection research are summarized, and future developments are discussed.
