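Detection algorithm of electronic disguised voice based on convolutional neural network

Hongwei XU (徐宏伟); Diqun YAN (严迪群); Fan YANG (阳帆); Rangding WANG (王让定); Chao JIN (金超); Li XIANG (向立)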

doi:10.11959/j.issn.1000-0801.2018041

Research and Development | Updated: 2024-06-05

- Detection algorithm of electronic disguised voice based on convolutional neural network
- Telecommunications Science, 2018, 34(2): 46-57
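- About the authors:
  - Hongwei XU (b. 1990), male, master's student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and information security.
  - Diqun YAN (b. 1979), male, Ph.D., associate professor and master's supervisor at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication, information security, and deep-learning-based digital speech forensics.
  - Fan YANG (b. 1991), male, master's student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and information security.
  - Rangding WANG (b. 1962), male, Ph.D., professor and doctoral supervisor at the Institute of Advanced Technology, Ningbo University. His research interests include multimedia communication and forensics, information hiding and steganalysis, and intelligent meter reading and sensor network technology.
  - Chao JIN (b. 1990), male, doctoral student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and information security.
  - Li XIANG (b. 1994), male, master's student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and information security.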
- Funding:
  
  The National Natural Science Foundation of China (61300055, 61672302); Natural Science Foundation of Zhejiang Province of China (LZ15F020002, LY17F020010); Ningbo Natural Science Foundation of China (2017A610123)
- DOI: 10.11959/j.issn.1000-0801.2018041
  CLC number: TP391
- Published online: 2018-02
  
  Print publication date: 2018-02-20
Hongwei XU, Diqun YAN, Fan YANG, et al. Detection algorithm of electronic disguised voice based on convolutional neural network[J]. Telecommunications Science, 2018, 34(2): 46-57. DOI: 10.11959/j.issn.1000-0801.2018041.

Abstract

An electronic disguised voice detection algorithm based on statistical features of Mel-frequency cepstral coefficients (MFCC) and a convolutional neural network (CNN) is proposed. First, the MFCCs of the speech under test and their difference (delta) coefficients are extracted, and statistical features of these coefficients are purposefully constructed to serve as the input of the CNN. A total of 24 network structures, differing in convolution kernel size, number of convolution kernels, and pooling size, were tested and evaluated, and a CNN structure that can be used effectively for disguise detection was determined. The experimental results show that the proposed algorithm can effectively detect traces of electronic disguising and can accurately estimate the specific forgery operation applied to the disguised voice, providing a new method for detecting electronically disguised voices.
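The feature-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistics are used or how they are arranged, so the choices here are assumptions: the per-coefficient mean and standard deviation of the MFCCs and of their first-order differences, stacked into a small 2-D map. The function names `delta` and `statistical_feature_map` are hypothetical, and random numbers stand in for real MFCCs.

```python
import numpy as np


def delta(coeffs: np.ndarray) -> np.ndarray:
    """First-order difference along the frame axis, keeping the input shape.

    The first frame is repeated before differencing, so row 0 of the
    result is all zeros.
    """
    padded = np.pad(coeffs, ((1, 0), (0, 0)), mode="edge")
    return np.diff(padded, axis=0)


def statistical_feature_map(mfcc: np.ndarray) -> np.ndarray:
    """Stack per-coefficient statistics into a 2-D map.

    mfcc: array of shape (n_frames, n_coeffs).
    Returns an array of shape (4, n_coeffs) whose rows are
    [mean(mfcc), std(mfcc), mean(delta), std(delta)].
    The choice of mean and standard deviation is an assumption,
    not necessarily the statistics used in the paper.
    """
    rows = []
    for block in (mfcc, delta(mfcc)):
        rows.append(block.mean(axis=0))
        rows.append(block.std(axis=0))
    return np.stack(rows)


# Random stand-in for the MFCCs of one utterance: 200 frames, 24 coefficients.
rng = np.random.default_rng(0)
fake_mfcc = rng.standard_normal((200, 24))

features = statistical_feature_map(fake_mfcc)

# Add batch and channel axes, the layout a 2-D convolution layer
# typically expects: (batch, height, width, channels).
cnn_input = features[np.newaxis, ..., np.newaxis]
print(features.shape, cnn_input.shape)
```

In practice the stand-in array would be replaced by MFCCs computed from the speech under test, and the resulting map would be fed to whichever of the evaluated network structures is selected.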
