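1. School of Information Engineering, Nanchang University, Nanchang, Jiangxi 330029
2. Jiangxi Branch of China Telecom Co., Ltd., Nanchang, Jiangxi 330029
Jie PENG (1992-), male, is a master's student at the School of Information Engineering, Nanchang University. His main research interests include natural language processing, text analysis, and computer networks.
Yongge SHI (1953-), male, is a professor at the School of Information Engineering, Nanchang University. His main research interests are computer networks and information security.
Shengbao GAO (1966-), male, is deputy director of the Network Operation Support Division, Jiangxi Branch of China Telecom Co., Ltd. His main research interests include communication network operations, network information security, cloud computing, and big data analysis.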
Online publication date: 2016-09
Print publication date: 2016-09-15
Jie PENG, Yongge SHI, Shengbao GAO. Session topic mining for interactive text based on conversational content[J]. Telecommunications Science, 2016, 32(9): 139-145. DOI: 10.11959/j.issn.1000-0801.2016238.
Traditional topic mining models generally extract only document-level topics from interactive text. To mine session topics as well, and to improve the generality of the mining model, a session topic generation model for interactive text based on dialogue content is proposed. First, by analyzing the characteristics of interactive text and building on the concept of the topic tree, a dialogue generation tree with a five-layer structure is defined. On this basis, a session topic generation model (ST-LDA) is constructed from LDA. Finally, Gibbs sampling is used to perform inference on ST-LDA, yielding the session topics and their probability distributions. Validation on real data shows that the ST-LDA model can effectively mine session topics from interactive text. In addition, the results can reduce the complexity of classification algorithms and make it possible to trace back topic-participant associations, so the approach generalizes well.
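The abstract names Gibbs sampling as the inference method for ST-LDA but does not give the update rule. As a rough illustration only, the sketch below implements collapsed Gibbs sampling for plain LDA, not ST-LDA: it has no dialogue generation tree and no session or participant layers. The function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions made for this example and do not come from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an illustrative stand-in, not ST-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions, estimated from the final sample.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                 # total tokens per topic
    z = []                                                 # topic assignment of each token

    # Randomly initialise the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi
```

Calling `lda_gibbs(docs, num_topics=2, vocab_size=6)` on a small list of word-id lists returns one topic distribution per document and one word distribution per topic, each summing to 1. A model in the spirit of ST-LDA would presumably add further count tables for the extra layers of the tree, but that extension is not attempted here.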