A noise-robust spoofing synthetic speech detection method using activation function

YANG Man; JIAN Zhihua; LIANG Chenghan

doi:10.11959/j.issn.1000-0801.2026038

Updated: 2026-01-21

- Telecommunications Science, 2026, pp. 1-11
- Affiliation:
  
  School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
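- Author biographies:
  
  YANG Man (2000- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofed speech detection.
  
  JIAN Zhihua (1978- ), male, corresponding author, PhD, is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include spoofed speech detection, privacy protection in speech, and voice conversion and generation.
  
  LIANG Chenghan (2001- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are spoofed speech detection and voiceprint anti-spoofing.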
- Funding:
  
  National Natural Science Foundation of China (61201301; 61772166)
- CLC number: TN912
- Revised: 2025-09-08; accepted: 2025-09-30; published online: 2026-01-06
YANG Man, JIAN Zhihua, LIANG Chenghan. A noise-robust spoofing synthetic speech detection method using activation function[J]. Telecommunications Science, DOI: 10.11959/j.issn.1000-0801.2026038.

Abstract

In real-world application scenarios, attackers often add interference such as additive noise or reverberation to spoofed speech, which causes the performance of a detection system trained on clean speech to drop sharply. To address this, an activation function was designed to replace the skip connection in the residual network, yielding a noise-robust synthetic speech detection system. After analyzing how different activation functions affect the skip connection of the residual block, the input features were divided into non-significant, significant, and undetermined features, and a novel activation function was proposed. The optimal parameters of the activation function were determined by a variance-growth method. Experimental results show that, compared with existing methods, the proposed method not only significantly reduces the equal error rate (EER) of the system but is also robust to noise interference.
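The threat model above is spoofed speech degraded with additive noise before it reaches a detector trained on clean speech. A common way to build such a noisy evaluation set is to scale a noise signal to a target signal-to-noise ratio (SNR) and add it to the speech. The sketch below does this with plain Python lists; the sample rate, test tone, Gaussian noise, and the helper names `mix_at_snr` and `measured_snr_db` are illustrative assumptions, since the abstract does not state which noise types or SNR levels the paper uses. Reverberation, the other interference mentioned, would instead be simulated by convolving speech with a room impulse response and is not covered here.

```python
import math
import random
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean-square power of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10*log10(P_clean / (g^2 * P_noise))  =>  g = sqrt(P_clean / (P_noise * 10^(SNR_dB/10)))
    gain = math.sqrt(power(clean) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR of `noisy` relative to `clean`, treating the difference as noise."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


if __name__ == "__main__":
    random.seed(0)
    fs = 16000  # assumed sample rate (Hz), one second of audio
    clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]  # stand-in for speech
    noise = [random.gauss(0.0, 1.0) for _ in range(fs)]                # stand-in for recorded noise
    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    print(round(measured_snr_db(clean, noisy), 3))  # 5.0
```

Computing the gain from measured powers, rather than fixing a noise amplitude, keeps the SNR consistent across utterances of different loudness, which is what makes results at a given SNR comparable.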

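The abstract does not give the form of the proposed activation function or the variance-growth criterion used to tune its parameters. The sketch below only illustrates the structural idea, a residual block whose identity skip connection is replaced by an elementwise activation, using a hypothetical three-region function (`three_region_activation`, with placeholder thresholds `t_low`, `t_high` and slope `alpha`) that maps the three feature categories named in the abstract onto suppress, attenuate, and pass through. It is not the authors' function, and the residual branch is a toy single linear layer with ReLU rather than the convolutional layers a real residual network would use.

```python
from typing import Callable, List, Sequence


def three_region_activation(x: float, t_low: float = 0.1, t_high: float = 1.0,
                            alpha: float = 0.5) -> float:
    """HYPOTHETICAL skip-path activation (illustration only, not the paper's function).

    |x| <  t_low           -> "non-significant": suppressed to 0
    t_low <= |x| < t_high  -> "undetermined": attenuated by alpha
    |x| >= t_high          -> "significant": passed through unchanged
    """
    mag = abs(x)
    if mag < t_low:
        return 0.0
    if mag < t_high:
        return alpha * x
    return x


def relu(v: float) -> float:
    return max(0.0, v)


def residual_block(x: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float],
                   skip_act: Callable[[float], float] = three_region_activation) -> List[float]:
    """y = ReLU(F(x) + skip_act(x)), where F is a toy linear layer followed by ReLU."""
    if len(weights) != len(x) or len(bias) != len(x):
        raise ValueError("toy block assumes a square weight matrix (dim_out == dim_in)")
    # Residual branch F(x): one linear layer followed by ReLU.
    branch = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(weights, bias)]
    # Skip path: an elementwise activation instead of the identity mapping.
    skip = [skip_act(xi) for xi in x]
    return [relu(f + s) for f, s in zip(branch, skip)]


if __name__ == "__main__":
    x = [0.05, -0.4, 2.0]
    W = [[0.0] * 3 for _ in range(3)]  # zero weights isolate the skip path
    b = [0.0, 0.0, 0.0]
    print(residual_block(x, W, b))                        # [0.0, 0.0, 2.0]
    print(residual_block(x, W, b, skip_act=lambda v: v))  # [0.05, 0.0, 2.0]
```

Passing `skip_act=lambda v: v` recovers an ordinary identity shortcut, so the two variants can be compared on the same inputs; how the thresholds would be chosen by the paper's variance-growth method is not reproduced here.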

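Results above are reported as the equal error rate (EER): the operating point at which the rate of bona fide speech wrongly rejected equals the rate of spoofed speech wrongly accepted. A minimal threshold-sweep sketch follows, assuming higher scores mean "more likely bona fide" (a convention chosen here for illustration; the helper name `compute_eer` is likewise hypothetical). Evaluation toolkits typically interpolate along the ROC or DET curve, so their values can differ slightly on small score sets.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    Convention assumed here: a trial is accepted as bona fide when score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap, best_eer, best_thr = abs(frr - far), (frr + far) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(eer, thr)  # 0.25 0.6
```

With these overlapping scores the sweep finds FRR = FAR = 0.25 at threshold 0.6; a lower EER means the bona fide and spoof score distributions are better separated.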