Offline visual aid system for the blind based on image captioning

Yue CHEN (陈悦); Yu GUO (郭宇); Yuanyan XIE (谢圆琰); Zhenqiang MI (米振强)

doi:10.11959/j.issn.1000-0801.2022014


Research and Development | Updated: 2024-06-05

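- Offline visual aid system for the blind based on image captioning
- Telecommunications Science, 2022, Vol. 38, No. 1, pp. 61-72
- Author affiliations:
  
  1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
  2. Shunde Graduate School, University of Science and Technology Beijing, Foshan, Guangdong 528399, China
- Author biographies:
  
  Yue CHEN (1998- ), female, master's student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Research interests: computer vision and artificial intelligence.
  Yu GUO (1992- ), male, PhD, lecturer at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Research interests: wireless sensor networks, cloud computing, and multi-robot systems.
  Yuanyan XIE (1996- ), female, PhD student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Research interests: cloud robotics, service science, and cloud computing.
  Zhenqiang MI (1983- ), male, PhD, associate professor at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Research interests: service computing, multi-robot systems, and point cloud computing in mobile environments.
- DOI: 10.11959/j.issn.1000-0801.2022014
  CLC number: TP391
- Online publication date: 2022-01
  
  Print publication date: 2022-01-20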
Yue CHEN, Yu GUO, Yuanyan XIE, et al. Offline visual aid system for the blind based on image captioning[J]. Telecommunications Science, 2022, 38(1): 61-72. DOI: 10.11959/j.issn.1000-0801.2022014.

Abstract

To address the inconveniences of existing visual aid devices for the blind, this paper discusses a method for running an image captioning model on portable mobile devices by means of model pruning. Image captioning models and model pruning techniques are reviewed, and an improved pruning algorithm for image captioning models is proposed. Experimental results show that, while preserving accuracy, the pruned image captioning model substantially reduces processing time and battery consumption, and can describe the surrounding environment and announce it by voice quickly and accurately, anytime and anywhere.
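
To make the idea of model pruning concrete, the following is a minimal sketch of generic filter pruning, not the improved algorithm proposed in the paper, whose details are not given in this abstract. It assumes a simple importance score (the L1 norm of each filter's weights); the names `l1_norm`, `prune_filters`, and `keep_ratio` are hypothetical and chosen only for illustration.

```python
from typing import List

Filter = List[float]  # one convolution filter, flattened to a list of weights


def l1_norm(weights: Filter) -> float:
    """Sum of absolute weights, used here as a simple importance score."""
    return sum(abs(w) for w in weights)


def prune_filters(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Return the sorted indices of the filters to keep.

    Assumption: importance is the L1 norm of each filter. The paper's
    improved criterion is not described in the abstract and may differ.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from most to least important.
    ranked = sorted(
        range(len(filters)),
        key=lambda i: l1_norm(filters[i]),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy layer with four filters of two weights each (illustrative values).
    layer = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.4], [0.0, 0.03]]
    print(prune_filters(layer, keep_ratio=0.5))
```

In a real network the kept indices would be used to slice the layer's weight tensor and the matching input channels of the next layer, and the model would typically be fine-tuned afterwards to recover accuracy.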

Keywords

