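1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang
2. Zhejiang Provincial Key Laboratory of Data Storage, Transmission and Application Technology, Hangzhou 310018, Zhejiang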
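XU Jia (1998- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofed speech detection.
JIAN Zhihua (1978- ), male, Ph.D., associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and faculty member of the Zhejiang Provincial Key Laboratory of Data Storage, Transmission and Application Technology. His main research interests include voice conversion, spoofed speech detection, and voiceprint recognition.
JIN Honghui (1999- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing detection.
WU Chao (1988- ), male, Ph.D., lecturer and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are navigation signal processing and spoofing interference detection.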
Online publication date: 2023-11
Print publication date: 2023-11-20
XU Jia, JIAN Zhihua, JIN Honghui, et al. A method of synthetic speech spoofing detection using constant Q modulation envelope[J]. Telecommunications Science, 2023, 39(11): 107-115. DOI: 10.11959/j.issn.1000-0801.2023187.
Traditional acoustic feature parameters suffer from low accuracy in synthetic speech spoofing detection, poor detection performance on unknown types of synthetic speech, and degraded performance in noisy environments. To address these problems, a method for detecting spoofed synthetic speech using the constant Q modulation envelope (CQME) was proposed. The method is motivated by the fact that the temporal envelope of speech carries abundant information and that the envelopes of synthetic and genuine speech differ considerably in their details. The modulation envelope spectrum of the speech was obtained with the constant Q transform (CQT), and the root mean square of each frequency component was calculated to derive the CQME feature vector. The CQME feature vector was then used to train a random forest classifier to discriminate genuine speech from spoofed synthetic speech. Experimental results show that the random forest trained with CQME features achieves high detection performance on the ASVspoof 2019 dataset and also detects unknown types of synthetic speech well. Furthermore, the proposed method maintains high detection performance under a variety of noise conditions, indicating good noise robustness.
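The feature pipeline summarized in the abstract (temporal envelope, constant Q transform of the envelope, root mean square per frequency component) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the envelope is extracted or which CQT settings are used, so the Hilbert envelope, the naive time-domain CQT, the RMS taken across frames, and every name and default below (`temporal_envelope`, `cqme_features`, `f_min`, `n_bins`, `bins_per_octave`, `hop_s`) are hypothetical choices.

```python
import numpy as np


def temporal_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal, computed via the FFT.

    Assumption: the abstract does not specify the envelope extractor.
    """
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def cqme_features(x, fs, f_min=2.0, n_bins=20, bins_per_octave=4, hop_s=0.25):
    """Return (center frequencies, RMS of the envelope's CQT per bin).

    All parameter defaults are illustrative, not values from the paper.
    """
    env = temporal_envelope(np.asarray(x, dtype=float))
    env = env - env.mean()             # drop the DC offset of the envelope

    # Constant Q: center frequency / bandwidth is the same for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    lengths = np.ceil(q * fs / freqs).astype(int)   # longer windows at low f

    hop = max(1, int(round(hop_s * fs)))
    # Frames are left-aligned; each must fit the longest analysis window.
    starts = np.arange(0, len(env) - lengths.max() + 1, hop)
    if len(starts) == 0:
        raise ValueError("signal is shorter than the longest analysis window")

    feats = np.empty(n_bins)
    for k, (f_k, n_k) in enumerate(zip(freqs, lengths)):
        t = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * t / fs) / n_k
        coeffs = np.array([np.dot(env[s:s + n_k], kernel) for s in starts])
        # Root mean square of this frequency component across frames.
        feats[k] = np.sqrt(np.mean(np.abs(coeffs) ** 2))
    return freqs, feats
```

One feature vector per utterance would then be passed to a random forest, for example scikit-learn's `RandomForestClassifier`, with genuine/spoofed labels. For real speech the envelope would normally be low-pass filtered and downsampled before the transform, since the lowest bins otherwise need very long windows.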