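Hongwei XU (徐宏伟, born 1990), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and information security.
Diqun YAN (严迪群, born 1979), male, Ph.D., is an associate professor and master's supervisor at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication, information security, and deep-learning-based digital speech forensics.
Fan YANG (阳帆, born 1991), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and information security.
Rangding WANG (王让定, born 1962), male, Ph.D., is a professor and doctoral supervisor at the Institute of Advanced Technology, Ningbo University. His main research interests include multimedia communication and forensics, information hiding and steganalysis, and smart meter reading and sensor network technology.
Chao JIN (金超, born 1990), male, is a doctoral student at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and information security.
Li XIANG (向立, born 1994), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and information security.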
Online publication date: 2018-02
Print publication date: 2018-02-20
Hongwei XU, Diqun YAN, Fan YANG, et al. Detection algorithm of electronic disguised voice based on convolutional neural network[J]. Telecommunications Science, 2018, 34(2): 46-57. DOI: 10.11959/j.issn.1000-0801.2018041.
An electronic disguised voice detection algorithm based on statistical features of Mel-frequency cepstral coefficients (MFCC) and a convolutional neural network (CNN) is proposed. First, the MFCC and their delta (difference) coefficients are extracted from the speech under test, and the statistical features of these coefficients are arranged into a purpose-built structure that serves as the CNN input. Next, 24 network structures, differing in convolution kernel size, number of convolution kernels, and pooling size, are evaluated to determine a CNN structure that is effective for disguise detection. Experimental results show that the proposed algorithm effectively detects traces of electronic voice disguise and can also accurately estimate the specific forgery operation applied to the disguised speech, providing a new method for detecting electronically disguised voices.
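The feature-construction step in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistics are used or how they are arranged, so everything here is an assumption made for illustration: the per-coefficient mean and standard deviation as the statistics, first- and second-order deltas computed with the common regression formula, and the hypothetical helper names `delta`, `column_stats`, and `build_feature_map`. The MFCC matrix itself is taken as given (one list of coefficients per frame).

```python
from statistics import mean, pstdev


def delta(frames, width=2):
    """Delta (difference) coefficients via the common regression formula.

    frames: list of per-frame coefficient lists, shape (T, D).
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    n_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n_frames - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def column_stats(frames):
    """Per-coefficient mean and population standard deviation across frames."""
    columns = list(zip(*frames))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def build_feature_map(mfcc):
    """Stack statistics of MFCC, delta, and delta-delta into a 6 x D map.

    The choice of statistics and the row layout are illustrative assumptions,
    not the arrangement used in the paper.
    """
    d1 = delta(mfcc)
    d2 = delta(d1)
    rows = []
    for block in (mfcc, d1, d2):
        mu, sigma = column_stats(block)
        rows.append(mu)
        rows.append(sigma)
    return rows
```

For an utterance with T frames and D coefficients per frame, `build_feature_map` returns a fixed-size 6 x D array regardless of T, which is what makes utterance-level statistics convenient as input to a network with a fixed input shape.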