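Zhendong WU (born 1976), male, is a lecturer at the School of Cyberspace Security, Hangzhou Dianzi University. His main research interests include biometric recognition, biometric keys, network security, natural language processing, and artificial intelligence.
Shucheng PAN (born 1991), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include deep-learning-based voiceprint and face recognition.
Jianwu ZHANG (born 1961), male, is a professor and doctoral supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include mobile communication systems, multimedia communication technology, and network security.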
Online publication date: 2017-03
Print publication date: 2017-03-20
Zhendong WU, Shucheng PAN, Jianwu ZHANG. Continuous speech speaker recognition based on CNN[J]. Telecommunications Science, 2017, 33(3): 59-66. DOI: 10.11959/j.issn.1000-0801.2017046.
In recent years, as living standards have risen, demand for machine recognition of human voices has grown steadily. The Gaussian mixture model-hidden Markov model (GMM-HMM) has been the most important model in speaker recognition research. However, it models large volumes of speech data poorly and is not robust to noise, so its development has reached a bottleneck. To address this problem, researchers have turned to deep learning. This paper applies a CNN deep learning model to continuous speech speaker recognition and proposes the CSR-CNN (continuous speaker recognition of convolutional neural network) algorithm. The model extracts fixed-length speech segments that preserve the order of the utterance, forming a time-ordered sequence of spectrograms. A CNN extracts a feature sequence from these spectrograms, and a reward-penalty function then measures combinations of the feature sequence continuously. Experimental results show that CSR-CNN achieves better recognition performance than GMM-HMM in continuous-segment speaker recognition.
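The front end of the pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the segment length, the spectrogram parameters, the CNN architecture, or the form of the reward-penalty function, so the function names, the 1-second segments, the 400-sample Hann frames, and the simple additive scoring rule below are all hypothetical. The CNN itself is omitted; `segment_labels` stands in for whatever per-segment decisions a trained network would produce.

```python
import numpy as np

def ordered_spectrograms(signal, sample_rate, segment_seconds=1.0,
                         frame_len=400, hop=160):
    """Split a 1-D waveform into fixed-length segments, kept in time order,
    and return one log-magnitude spectrogram per segment.
    All default parameter values are assumptions, not taken from the paper."""
    seg_len = int(segment_seconds * sample_rate)
    n_segments = len(signal) // seg_len          # the incomplete tail is dropped
    window = np.hanning(frame_len)
    specs = []
    for i in range(n_segments):
        seg = signal[i * seg_len:(i + 1) * seg_len]
        n_frames = 1 + (seg_len - frame_len) // hop
        frames = np.stack([seg[j * hop:j * hop + frame_len] * window
                           for j in range(n_frames)])
        mag = np.abs(np.fft.rfft(frames, axis=1))
        specs.append(np.log(mag + 1e-8).T)       # shape: (freq_bins, n_frames)
    return specs

def reward_penalty_decision(segment_labels, target, reward=1.0,
                            penalty=1.0, threshold=0.0):
    """Hypothetical reward-penalty rule: walk through the per-segment
    predictions in order, add `reward` for each match with `target`,
    subtract `penalty` for each mismatch, and accept the speaker if the
    final score exceeds `threshold`."""
    score = 0.0
    for label in segment_labels:
        score += reward if label == target else -penalty
    return score > threshold, score

# Usage with synthetic data: 3.5 s of noise sampled at 16 kHz.
rng = np.random.default_rng(0)
signal = rng.standard_normal(int(3.5 * 16000))
specs = ordered_spectrograms(signal, 16000)
accepted, score = reward_penalty_decision(["a", "a", "b", "a"], "a")
```

Keeping the segments in their original order is what lets a downstream scoring rule treat the per-segment outputs as a sequence rather than as an unordered set of votes.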