Synthetic spoofing speech detection method based on center-symmetric local binary pattern

Jia XU; Zhihua JIAN; Honghui JIN; Chao WU; Lin YOU; Yingxiao WU

doi:10.11959/j.issn.1000-0801.2023005


Research and Development | Updated: 2024-06-05

- Synthetic spoofing speech detection method based on center-symmetric local binary pattern
- Telecommunications Science, 2023, 39(1): 72-78
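- Author affiliations:
  
  1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  2. School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China
  3. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China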
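- About the authors:
  
  Jia XU (1998- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofing speech detection.
  Zhihua JIAN (1978- ), male, associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include voice conversion, spoofing speech detection, and voiceprint recognition.
  Honghui JIN (1999- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing speech detection.
  Chao WU (1988- ), male, lecturer at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are navigation signal processing and spoofing interference detection.
  Lin YOU (1966- ), male, professor and doctoral supervisor at the School of Cyberspace Security, Hangzhou Dianzi University. His main research interests include biometric information processing, information security, and cryptography.
  Yingxiao WU (1980- ), female, distinguished professor at the School of Computer Science, Hangzhou Dianzi University. Her main research interests are millimeter-wave sensing for voiceprint recognition and authentication, radio-frequency information processing, and the industrial internet.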
- Funding:
  
  The National Natural Science Foundation of China (61201301, 61772166, 61901154)
- DOI: 10.11959/j.issn.1000-0801.2023005
  CLC number: TP391.42
- Online publication date: 2023-01
  
  Print publication date: 2023-01-20
Jia XU, Zhihua JIAN, Honghui JIN, et al. Synthetic spoofing speech detection method based on center-symmetric local binary pattern[J]. Telecommunications Science, 2023, 39(1): 72-78. DOI: 10.11959/j.issn.1000-0801.2023005.

Abstract

Speech spoofing detection methods based on the local binary pattern (LBP) have low accuracy when detecting synthetic speech. To address this, a spoofing speech detection method based on the center-symmetric local binary pattern (CSLBP) is proposed. The method obtains the spectrogram of the speech signal through the short-time Fourier transform (STFT), extracts texture features from the spectrogram using CSLBP, and trains a random forest classifier on these features to discriminate genuine from spoofed speech. By jointly considering the values and positional relationships of pixels in the spectrogram, the CSLBP feature captures more complete texture information while reducing the feature dimension to 16, which lowers the computational cost. Experimental results on the ASVspoof 2019 dataset show that, compared with the traditional LBP-based spoofing detection method, the proposed method reduces the tandem detection cost function (t-DCF) for synthetic spoofing speech by 16.98% and increases detection speed by 89.73%.
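To illustrate how a center-symmetric LBP descriptor can yield a 16-dimensional feature, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used CS-LBP formulation with 8 neighbours at radius 1: each of the 4 center-symmetric neighbour pairs is compared against a threshold, giving a 4-bit code (16 possible values) per pixel, and the normalised histogram of codes is the feature vector. The function names, the neighbour ordering, the default threshold of 0, and the toy input are all assumptions made for illustration; the spectrogram is taken to be an already computed 2D array of magnitudes, and the classifier stage is not shown.

```python
# Offsets (row, col) of the 8 neighbours of a pixel, listed in circular order.
# Neighbour i and neighbour i + 4 lie opposite each other across the centre.
# The ordering is an illustrative assumption, not taken from the paper.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]


def cslbp_code(img, r, c, threshold=0.0):
    """Return the 4-bit CS-LBP code (0..15) of the interior pixel at (r, c)."""
    code = 0
    for i in range(4):
        dr1, dc1 = NEIGHBOURS[i]
        dr2, dc2 = NEIGHBOURS[i + 4]
        # Compare the two pixels of one center-symmetric pair.
        diff = img[r + dr1][c + dc1] - img[r + dr2][c + dc2]
        if diff > threshold:
            code |= 1 << i
    return code


def cslbp_histogram(img, threshold=0.0):
    """Return the normalised 16-bin histogram of CS-LBP codes.

    Only interior pixels are used, so border rows and columns are skipped.
    """
    rows, cols = len(img), len(img[0])
    hist = [0] * 16
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[cslbp_code(img, r, c, threshold)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 16


# Toy stand-in for a spectrogram: values grow left to right and top to bottom.
toy = [[float(r + 2 * c) for c in range(5)] for r in range(4)]
features = cslbp_histogram(toy)  # 16-dimensional feature vector
```

In a pipeline like the one the abstract describes, one such 16-dimensional vector per utterance would be passed to a classifier such as a random forest. Because the center pixel is never used and only 4 comparisons are made per pixel, the code space is 16 values instead of the 256 of basic 8-neighbour LBP.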


