Video crowd counting method based on conv-pooling deep spatial and temporal features

Qiang LI; Zilu KANG

doi:10.11959/j.issn.1000-0801.2018161


Research and Development | Updated: 2024-06-05

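- Video crowd counting method based on conv-pooling deep spatial and temporal features
- Telecommunications Science, 2018, Vol. 34, No. 6, pp. 72-79
- Author biographies:
  Qiang LI (born 1984), male, PhD, is an engineer at the Institute of Internet of Things Technology, Information Science Academy of China Electronics Technology Group Corporation. His main research interests are video/image processing, pattern recognition, and machine learning.
  Zilu KANG (born 1972), male, is a senior engineer at the Institute of Internet of Things Technology, Information Science Academy of China Electronics Technology Group Corporation. His main research interests are the Internet of Things and data architecture.
- DOI: 10.11959/j.issn.1000-0801.2018161
  CLC number: TP391
- Online publication date: 2018-06
  Print publication date: 2018-06-20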

Qiang LI, Zilu KANG. Video crowd counting method based on conv-pooling deep spatial and temporal features[J]. Telecommunications Science, 2018, 34(6): 72-79. DOI: 10.11959/j.issn.1000-0801.2018161.

Abstract

Camera angle, background, crowd density distribution, and occlusion limit traditional video crowd counting methods based on low-level visual features, which often fail to achieve satisfactory results. The proposed method forms high-level visual features from the spatial and temporal features of video using a conv-pooling approach, and uses local feature aggregation descriptors for quantization and codebook calculation to describe video crowd information accurately. It makes full use of the motion and appearance information in video, and relies on convolutional neural networks and pooling to improve the description of the intrinsic attributes and features of video. Experimental results show that the proposed method achieves higher precision and better robustness than traditional video crowd counting methods.
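The abstract names quantization and codebook calculation with local feature aggregation descriptors but gives no detail. As an illustration only, the sketch below assumes a VLAD-style encoding: each local descriptor is assigned to its nearest codebook center, the residuals are summed per center, and the concatenated result is L2-normalized. The function names, the toy codebook, and the 2-D descriptors are hypothetical and are not taken from the paper. In a real pipeline the descriptors would presumably come from the conv-pooling features, and the codebook would typically be learned, for example with k-means.

```python
import math


def nearest_center(descriptor, codebook):
    """Return the index of the codebook center closest to the descriptor."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def vlad_encode(descriptors, codebook):
    """Aggregate local descriptors into one L2-normalized vector.

    For each codebook center, sum the residuals (descriptor - center) of the
    descriptors assigned to it, then concatenate the per-center sums.
    """
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for descriptor in descriptors:
        k = nearest_center(descriptor, codebook)
        for i in range(dim):
            residual_sums[k][i] += descriptor[i] - codebook[k][i]
    flat = [value for row in residual_sums for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [value / norm for value in flat] if norm > 0 else flat


# Toy data (hypothetical): four 2-D local descriptors, two codebook centers.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0], [10.0, 12.0]]
encoding = vlad_encode(descriptors, codebook)
```

The encoding has a fixed length (number of centers times descriptor dimension) regardless of how many local descriptors a video segment produces, which is what lets a fixed-size representation be passed to a downstream count regressor.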

