A method for synthetic speech detection using local phase quantization

Jia XU; Zhihua JIAN; Honghui JIN; Man YANG

doi:10.11959/j.issn.1000-0801.2024024

Research and Development | Updated: 2024-06-05

- Telecommunications Science, 2024, 40(2): 63-71
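- Affiliations:

  1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang
  2. Zhejiang Provincial Key Laboratory of Data Storage, Transmission and Application Technology, Hangzhou 310018, Zhejiang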
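- Author biographies:

  - Jia XU (1998- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Main research interest: speech spoofing detection.
  - Zhihua JIAN (1978- ), male, Ph.D., associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and associate professor at the Zhejiang Provincial Key Laboratory of Data Storage, Transmission and Application Technology. Main research interests: voice conversion, spoofed speech detection, speaker (voiceprint) recognition, and speech privacy protection.
  - Honghui JIN (1999- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Main research interests: voice conversion and spoofing detection.
  - Man YANG (2000- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Main research interest: speech spoofing detection.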
- Funding:

  National Natural Science Foundation of China (61201301, 61772166)
- DOI: 10.11959/j.issn.1000-0801.2024024
  CLC number: TP391.42
- Published online: 2024-02; print publication date: 2024-02-20

Cite this article: Jia XU, Zhihua JIAN, Honghui JIN, et al. A method for synthetic speech detection using local phase quantization[J]. Telecommunications Science, 2024, 40(2): 63-71. DOI: 10.11959/j.issn.1000-0801.2024024.

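Abstract

Because speech synthesis has become so easy to use, synthesized spoofing speech poses a serious threat to the security of speaker verification systems. To further improve the ability of speaker verification systems to detect spoofed speech, a synthetic speech detection method is proposed that exploits the frequency-domain information of the spectrogram and describes it with the local phase quantization (LPQ) algorithm. First, the spectrogram is divided into several sub-blocks, and LPQ is applied to each sub-block. Histogram statistics of the resulting codes then yield an LPQ feature vector, which serves as the input feature of a random forest classifier that performs the synthetic speech detection. Experimental results show that the method further lowers the tandem detection cost function (t-DCF) of the synthetic speech detection system and shows stronger generalization ability.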
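The feature pipeline summarized in the abstract (spectrogram, sub-blocks, per-block LPQ, histogram statistics) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the STFT settings (`n_fft=512`, `hop=160`), the 3x3 LPQ window, the four-band frequency split, and the helper names `log_spectrogram`, `lpq_codes` and `lpq_block_features` are all hypothetical. The LPQ step follows the basic descriptor of Ojansivu and Heikkilä: a short-term Fourier transform over each MxM neighbourhood at the four frequencies (a, 0), (0, a), (a, a), (a, -a) with a = 1/M, whose real and imaginary parts are sign-quantized into an 8-bit code. The optional decorrelation (whitening) step is omitted.

```python
import numpy as np


def log_spectrogram(x, n_fft=512, hop=160):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, frames).

    n_fft and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.log(np.abs(np.fft.rfft(frames, axis=0)) + 1e-8)


def _correlate(img, kernel, axis):
    """'Valid' 1-D correlation of a 2-D array with a (possibly complex) kernel along one axis."""
    m = len(kernel)
    n = img.shape[axis] - m + 1
    if axis == 0:
        return sum(kernel[j] * img[j:j + n, :] for j in range(m))
    return sum(kernel[j] * img[:, j:j + n] for j in range(m))


def lpq_codes(img, win=3):
    """Basic LPQ (no decorrelation): one 8-bit code per pixel whose full
    win x win neighbourhood lies inside img."""
    r = (win - 1) // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win                      # lowest non-zero frequency
    w0 = np.ones(win)                  # frequency 0
    w1 = np.exp(-2j * np.pi * a * x)   # frequency +a
    w2 = np.conj(w1)                   # frequency -a
    img = np.asarray(img, dtype=float)

    def stft(k_rows, k_cols):
        # Separable STFT: k_rows runs along axis 0 (vertical), k_cols along axis 1 (horizontal).
        return _correlate(_correlate(img, k_rows, 0), k_cols, 1)

    # Frequencies written as (horizontal, vertical): (a,0), (0,a), (a,a), (a,-a).
    responses = [stft(w0, w1), stft(w1, w0), stft(w1, w1), stft(w2, w1)]
    parts = [p for f in responses for p in (f.real, f.imag)]
    codes = np.zeros(parts[0].shape, dtype=np.int64)
    for bit, g in enumerate(parts):
        codes |= (g >= 0).astype(np.int64) << bit  # sign quantization -> one bit each
    return codes


def lpq_block_features(spec, n_freq_blocks=4, n_time_blocks=1, win=3):
    """Split the spectrogram into sub-blocks and concatenate their normalized
    256-bin LPQ histograms. The block grid is an illustrative assumption."""
    feats = []
    for band in np.array_split(spec, n_freq_blocks, axis=0):
        for block in np.array_split(band, n_time_blocks, axis=1):
            hist = np.bincount(lpq_codes(block, win).ravel(), minlength=256).astype(float)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)


rng = np.random.default_rng(0)
spec = log_spectrogram(rng.standard_normal(16000))  # 1 s of noise at an assumed 16 kHz rate
features = lpq_block_features(spec)
print(spec.shape, features.shape)  # (257, 97) (1024,)
```

Each `features` vector would form one row of the training matrix for the random forest classifier; the paper's actual block layout, window size and any feature normalization may differ. Computing the STFT separably keeps the sketch dependent on NumPy alone.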

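The abstract reports results in terms of the tandem detection cost function (t-DCF). A minimal pure-Python sketch of the minimum normalized t-DCF follows, assuming the formulation and default cost parameters of the ASVspoof 2019 evaluation (spoofing prior 0.05, miss cost 1 and false-alarm cost 10 for both the ASV and the countermeasure); whether the paper uses exactly these settings is not stated here, and later revisions of the metric differ in detail. The ASV error rates (`pfa_asv`, `pmiss_asv`, `pmiss_spoof_asv`) would come from a fixed ASV system, the function name `min_norm_tdcf` is hypothetical, and higher countermeasure scores are assumed to mean "more likely bona fide".

```python
def min_norm_tdcf(bona_cm, spoof_cm, pfa_asv, pmiss_asv, pmiss_spoof_asv,
                  p_spoof=0.05, cmiss_asv=1.0, cfa_asv=10.0, cmiss_cm=1.0, cfa_cm=10.0):
    """Minimum normalized t-DCF over all CM thresholds (ASVspoof 2019-style cost model)."""
    p_tar = (1 - p_spoof) * 0.99  # prior of target trials
    p_non = (1 - p_spoof) * 0.01  # prior of zero-effort non-target trials
    c1 = p_tar * (cmiss_cm - cmiss_asv * pmiss_asv) - p_non * cfa_asv * pfa_asv
    c2 = cfa_cm * p_spoof * (1 - pmiss_spoof_asv)
    if c1 <= 0 or c2 <= 0:
        raise ValueError("ASV error rates give non-positive t-DCF weights")

    best = float("inf")
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    for t in sorted(set(bona_cm) | set(spoof_cm)) + [float("inf")]:
        pmiss_cm = sum(s < t for s in bona_cm) / len(bona_cm)   # bona fide rejected by CM
        pfa_cm = sum(s >= t for s in spoof_cm) / len(spoof_cm)  # spoofs accepted by CM
        best = min(best, (c1 * pmiss_cm + c2 * pfa_cm) / min(c1, c2))
    return best


# Hypothetical ASV error rates, for illustration only.
asv = dict(pfa_asv=0.01, pmiss_asv=0.02, pmiss_spoof_asv=0.4)
print(min_norm_tdcf([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0], **asv))  # 0.0 (perfectly separated)
```

Because the normalization divides by min(C1, C2), a countermeasure that accepts or rejects everything scores at least 1, so the minimum over thresholds never exceeds 1 and lower values indicate a better countermeasure.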
