School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
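YING Na (应娜, 1978- ), female, Ph.D., is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests are intelligent signal processing and its applications.
ZOU Yujian (邹雨鉴, 1998- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is multimodal emotion recognition.
YANG Xueying (杨雪滢, 1997- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is speech emotion recognition.
SUN Wensheng (孙文胜, 1966- ), male, works at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is network communication.
YE Xueyi (叶学义, 1973- ), male, Ph.D., is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are image processing, pattern recognition, and information hiding.
JIANG Yinhe (蒋银河, 1999- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interest is multimodal emotion recognition.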
Received: 2024-09-27
Revised: 2024-12-05
Published in print: 2025-06-20
YING Na, ZOU Yujian, YANG Xueying, et al. A speech emotion recognition algorithm based on DN-ResNet11[J]. Telecommunications Science, 2025, 41(06): 139-153. DOI: 10.11959/j.issn.1000-0801.2025042.
To reduce the complexity of network training and improve speech emotion feature extraction, a dual-branch feature extraction model based on a dual nested residual network (DN-ResNet11) and a channel attention residual network (CRNet) is proposed. First, the low-complexity DN-ResNet11 is designed to efficiently extract fused emotional features from spectrograms, improving the emotion recognition rate. Second, multi-scale guided filtering and the local binary pattern (LBP) algorithm are combined to enhance spectrogram details. Finally, the two sets of features are fused for emotion classification, forming a dual-branch weighted fusion model based on the dual nested residual and channel residual networks (WFDN_CRNet), which further strengthens emotional representation. On the CASIA, EMO-DB, and IEMOCAP speech emotion datasets, the model achieves emotion recognition rates of 94.58%, 85.59%, and 65.72%, respectively. The proposed method outperforms baseline models such as ResNet18 in recognition rate while significantly reducing computational cost, which supports the model's effectiveness.
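The abstract names the LBP algorithm for spectrogram detail enhancement but does not give its configuration. As a minimal sketch, assuming the basic 3×3, 8-neighbor LBP variant applied to a spectrogram stored as a 2-D NumPy array, the operator could look like the following. The function name `lbp_8neighbors`, the clockwise bit ordering, and the `>=` comparison are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def lbp_8neighbors(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 local binary pattern (hypothetical helper, not the paper's code).

    Each interior pixel is compared with its 8 neighbors; a neighbor that is
    greater than or equal to the center sets one bit of an 8-bit code.
    Border pixels are skipped, so the output is 2 rows and 2 columns smaller.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]

    # Neighbor offsets (row, col), clockwise from the top-left corner.
    # This ordering is an assumption; any fixed ordering gives a valid LBP.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]

    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= np.where(neighbor >= center, 1 << bit, 0).astype(np.uint8)
    return codes


if __name__ == "__main__":
    # Stand-in for a spectrogram patch; real input would be a (freq, time) array.
    patch = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
    print(lbp_8neighbors(patch))
```

In this sketch the resulting code map encodes local texture around each time-frequency bin. How the paper combines the LBP output with multi-scale guided filtering is not stated in the abstract, so that step is left out here.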