School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
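YING Na (1978- ), female, PhD, associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests are intelligent signal processing and its applications.
WU Shunpeng (1994- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is speech signal processing.
YANG Meng (1980- ), male, associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are SAR image target detection and recognition, and large-model applications.
ZOU Yujian (1998- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is speech signal processing.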
Received: 2023-11-30
Revised: 2024-04-10
Published in print: 2024-05-20
YING Na, WU Shunpeng, YANG Meng, et al. Dual-feature speech emotion recognition fusion algorithm based on wavelet scattering transform and MFCC[J]. Telecommunications Science, 2024, 40(05): 62-72. DOI: 10.11959/j.issn.1000-0801.2024088.
To fully exploit the emotional information contained in the spectrum of speech signals and thereby improve the accuracy of speech emotion recognition, a fusion algorithm based on the wavelet scattering transform and Mel-frequency cepstral coefficients (MFCC) is proposed, combining permutation entropy weighting with a bias adjustment rule (PEW-BAR). First, wavelet scattering features and MFCC-related features are extracted from the speech signal. Then, the wavelet scattering features are expanded along the scale dimension, a support vector machine is used to obtain the posterior probabilities for emotion recognition, and the permutation entropy is computed and used to weight those posterior probabilities. Finally, a bias adjustment rule further fuses in the recognition results obtained from the MFCC-related features. Experimental results on the EMODB, RAVDESS, and eNTERFACE05 datasets show that, compared with the traditional speech emotion recognition method based on wavelet scattering coefficients, the proposed algorithm improves accuracy (ACC) by 2.82%, 2.85%, and 5.92%, respectively, and unweighted average recall (UAR) by 3.40%, 2.87%, and 5.80%, respectively, with a 6.89% improvement on the IEMOCAP dataset.
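The permutation-entropy weighting step named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which sequence the permutation entropy is computed over, nor how it is mapped to a weight, so everything below is an assumption for illustration only: `permutation_entropy` is the standard normalized Bandt-Pompe definition, and `entropy_weighted_fusion` is a hypothetical helper that gives each scale a weight proportional to one minus its entropy. The bias adjustment rule is not described in enough detail to sketch and is omitted.

```python
import math
from collections import Counter


def permutation_entropy(x, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy of a 1-D sequence, in [0, 1]."""
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("sequence too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window
        # (ties are broken by position because sorted() is stable).
        patterns[tuple(sorted(range(order), key=lambda k: window[k]))] += 1
    h = sum(-(c / n) * math.log(c / n) for c in patterns.values())
    # Divide by log(order!) so the result lies between 0 and 1.
    return h / math.log(math.factorial(order))


def entropy_weighted_fusion(posteriors, entropies, eps=1e-12):
    """Fuse per-scale posterior vectors with weights proportional to (1 - entropy).

    ASSUMPTION: this weighting rule is illustrative and is not taken from the paper.
    """
    raw = [max(1.0 - e, 0.0) + eps for e in entropies]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_classes = len(posteriors[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    return fused, weights
```

For example, calling `entropy_weighted_fusion([[0.7, 0.3], [0.4, 0.6]], [0.2, 0.8])` gives the lower-entropy first scale the larger weight, so the fused posterior leans toward its prediction. In practice the per-scale posteriors would come from classifiers such as support vector machines trained on each scale's wavelet scattering features.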