A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient

Lang LIN; Rangding WANG; Diqun YAN; Can LI

doi:10.11959/j.issn.1000-0801.2018020


Research and Development | Updated: 2024-06-05

- A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient
- Telecommunications Science, 2018, Vol. 34, No. 5, pp. 90-98
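- Author biographies:
  
  Lang LIN (1994-), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and information security.
  Rangding WANG (1962-), male, Ph.D., is a professor and doctoral supervisor at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication and forensics, information hiding and steganalysis, smart meter reading, and sensor network technology.
  Diqun YAN (1979-), male, Ph.D., is an associate professor and master's supervisor at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication, information security, and deep-learning-based digital speech forensics.
  Can LI (1992-), female, is a master's student at the College of Information Science and Engineering, Ningbo University. Her main research interests include multimedia communication and information security.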
- Funding:
  
  The National Natural Science Foundation of China (61672302, 61300055); Natural Science Foundation of Zhejiang Province of China (LZ15F020002, LY17F020010); The Scientific Research Foundation of Ningbo University (XKXL1405, XKXL1420); Ningbo University Fund (XKXL1509, XKXL1503); K.C. Wong Magna Fund in Ningbo University
- DOI: 10.11959/j.issn.1000-0801.2018020
  CLC number: TN912.3
- Online publication date: 2018-05
  
  Print publication date: 2018-05-20
Lang LIN, Rangding WANG, Diqun YAN, et al. A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient[J]. Telecommunications Science, 2018, 34(5): 90-98. DOI: 10.11959/j.issn.1000-0801.2018020.

Abstract

The spread and growing portability of high-fidelity recording and playback devices pose a serious challenge to the ability of speaker recognition systems to resist playback attacks. Spectrogram analysis shows that original speech and playback speech differ in the high-frequency region. To target this difference, the algorithm inverts the Mel filter bank used in the computation of Mel-frequency cepstral coefficients (MFCC) and takes the log Mel spectral coefficients obtained before the discrete cosine transform (DCT) as its features. A support vector machine (SVM) then classifies the speech under test. Experimental results show that the algorithm detects playback speech effectively. In addition, integrating the algorithm into a GMM-UBM speaker recognition system significantly improves the system's resistance to playback attacks.
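The feature extraction summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 16 kHz sample rate, the 25 ms frame with 10 ms hop, the 512-point FFT, the 24 filters, and the choice to invert the bank by flipping it along both the filter and frequency axes are all assumptions made here for the example.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale mapping (an assumed variant; others exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Standard triangular Mel filter bank, shape (n_filters, n_fft // 2 + 1)."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):      # rising slope
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope, peak at centre
            bank[i, k] = (right - k) / (right - centre)
    return bank


def log_inverse_mel_spectrum(signal, sample_rate=16000, frame_len=400,
                             hop=160, n_fft=512, n_filters=24):
    """Log energies of an inverted Mel filter bank, one row per frame.

    All default parameter values are illustrative assumptions.
    """
    # Assumed inversion: mirror the bank so narrow filters sit at high frequencies.
    inv_bank = mel_filter_bank(n_filters, n_fft, sample_rate)[::-1, ::-1]
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_filters))
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        # Stop before the DCT: the log filter-bank energies are the features.
        feats[t] = np.log(inv_bank @ power + 1e-10)
    return feats
```

Calling `log_inverse_mel_spectrum(x)` on a one-dimensional array `x` of samples returns a matrix with one 24-dimensional feature vector per frame. In the pipeline the abstract describes, features of this kind would then be passed to an SVM classifier; how frames are pooled per utterance is not stated in the abstract and is left open here.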


