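Jinhua WANG (1992- ), female, master's student at Hangzhou Dianzi University. Her main research interests are deep learning and speech processing.
Na YING (1978- ), female, PhD, associate professor and master's supervisor at Hangzhou Dianzi University. Her main research interests are signal processing and artificial intelligence.
Chendu ZHU (1995- ), male, master's student at Hangzhou Dianzi University. His main research interest is speech processing.
Zhaosen LIU (1995- ), male, master's student at Hangzhou Dianzi University. His main research interests are deep learning and image processing.
Zhedong CAI (1994- ), male, master's student at Hangzhou Dianzi University. His main research interests are deep learning and image processing.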
Published online: 2019-07
Published in print: 2019-07-20
Jinhua WANG, Na YING, Chendu ZHU, et al. Speech emotion recognition algorithm based on spectrogram feature extraction of deep space attention feature[J]. Telecommunications Science, 2019, 35(7): 100-108. DOI: 10.11959/j.issn.1000-0801.2019052.
Starting from the extraction and classification modeling of speech emotion features, and building on a hybrid convolutional neural network model, the Itti model used in feature extraction was improved in two ways: texture features extracted with local binary patterns (LBP) were added, and auditory sensitivity weights were incorporated to extract features strongly correlated with emotion. A constrained squeeze-and-excitation network structure was then proposed, which extracts recalibrated weight features under feature constraints. Finally, a fine-tuned model based on a hybrid of VGGnet and a long short-term memory (LSTM) network was formed, further improving the emotion representation ability. Validated on a natural emotion database and the Berlin German database, the model showed a clear increase in emotion recognition rate, 8.43% higher than the baseline model. The recognition performance of the model on the natural database (FAU-AEC) and the Berlin database (EMO-DB) was also compared, and the experimental results indicate that the model generalizes well.
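The abstract mentions texture features obtained with local binary patterns on the spectrogram. As an illustration only, the following is a minimal sketch of the basic 8-neighbour LBP operator on a 2-D grid of magnitudes, followed by a normalised histogram of the codes. It is not the authors' implementation: the function names `lbp_codes` and `lbp_histogram`, the clockwise bit order, the `>=` comparison rule and the toy input are all assumptions made for this example.

```python
def lbp_codes(image):
    """Return the 8-neighbour LBP code for every interior cell of a 2-D grid."""
    # Clockwise neighbour offsets starting at the top-left cell
    # (the bit order is an assumption; any fixed order gives a valid LBP).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as large as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes, bins=256):
    """Normalised histogram of LBP codes, usable as a texture descriptor."""
    hist = [0] * bins
    total = 0
    for row in codes:
        for code in row:
            hist[code] += 1
            total += 1
    return [h / total for h in hist] if total else hist


# Toy 4x4 "spectrogram" patch (hypothetical magnitudes, not data from the paper).
patch = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
codes = lbp_codes(patch)
hist = lbp_histogram(codes)
```

In a pipeline like the one the abstract outlines, such a histogram (or the code map itself) could serve as one texture channel alongside the other feature maps. How the paper actually combines it with the Itti model is not stated in the abstract.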