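Yang SUN (1983- ), female, engineer at the Research Institute of China Mobile Communications Co., Ltd. Her main research interests are natural language processing, machine learning, and big data security.
Li SU (1981- ), male, Ph.D., professor-level senior engineer at the Research Institute of China Mobile Communications Co., Ltd. His main research interests are big data security and cryptography.
Xing ZHANG (1980- ), male, engineer and technical manager at the Research Institute of China Mobile Communications Co., Ltd. His main research interests are data security, data storage, and virtualization technology.
Fengsheng WANG (1979- ), male, works at the Research Institute of China Mobile Communications Co., Ltd. His main research interest is mobile communication network security.
Haitao DU (1979- ), male, Ph.D., senior engineer at the Research Institute of China Mobile Communications Co., Ltd. His main research interests are big data security and mobile communication network security.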
Online publication date: 2020-03
Print publication date: 2020-03-20
Yang SUN, Li SU, Xing ZHANG, et al. Method of short text strategy mining based on sub-semantic space[J]. Telecommunications Science, 2020, 36(3): 83-94. DOI: 10.11959/j.issn.1000-0801.2020061.
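To address the problem of accurately identifying short text data, a short text strategy mining method based on sub-semantic spaces is proposed. The method first applies semantic space techniques to address the "lexical gap" and "data sparsity" problems that arise in short text analysis. It then uses a clustering algorithm to partition the semantic space into multiple sub-semantic spaces and mines association rules in each sub-semantic space in parallel, which improves the efficiency and quality of strategy generation. Finally, a binary tree is used to merge strategies and produce a minimal strategy set. Experiments show that, compared with traditional classification models, the strategy set generated by this scheme reaches an accuracy of 88% at a false positive rate of 6.5%. In detecting and handling illegal short messages, strategy sets mined with this technique offer strong coverage and high accuracy, and are highly practical.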
To solve the problem of identifying short text data accurately, a method of short text strategy mining based on sub-semantic space was proposed. Firstly, semantic space technology was used to solve the problems of "vocabulary gap" and "data sparseness" in short text analysis. Then, based on a clustering algorithm, the semantic space was divided into several sub-semantic spaces, and association rules were mined in each sub-semantic space, which improved the efficiency and quality of strategy generation. Finally, a binary tree was used to merge strategies and generate the simplest strategy set. Experiments show that, compared with the traditional classification model, the accuracy rate of the strategy set generated by the proposed scheme can reach 85% when the false positive rate is 6.5%. In the processing of illegal short messages, strategy sets mined with this technology have strong coverage ability, high accuracy, and strong practicability.
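The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names only "a clustering algorithm", so k-means is an assumption here. The 2-D word vectors, the toy messages, the support threshold, and all function names are hypothetical. Frequent keyword sets stand in for the association rule mining step. The binary tree merge is not reproduced: keeping only maximal frequent itemsets is substituted as a simple way to compact the result.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 2-D word vectors standing in for a trained semantic space.
WORD_VECS = {
    "loan": (1.0, 0.1), "credit": (0.9, 0.2), "rate": (0.8, 0.0),
    "prize": (0.1, 1.0), "winner": (0.0, 0.9), "claim": (0.2, 0.8),
}

# Hypothetical tokenized short messages.
MESSAGES = [
    ["loan", "rate"], ["loan", "credit", "rate"], ["loan", "credit"],
    ["prize", "winner"], ["prize", "winner", "claim"], ["prize", "claim"],
]


def embed(tokens):
    """Average the word vectors of a message: its point in the semantic space."""
    vecs = [WORD_VECS[t] for t in tokens]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centers, rounds=10):
    """Plain k-means from the given initial centers; returns one label per point."""
    centers = list(init_centers)
    labels = []
    for _ in range(rounds):
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's center where it was
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def frequent_itemsets(messages, min_support, max_size=2):
    """Keyword sets (up to max_size) appearing in at least min_support of messages."""
    counts = Counter()
    for tokens in messages:
        unique = sorted(set(tokens))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c / len(messages) >= min_support}


def keep_maximal(itemsets):
    """Drop every itemset that is a proper subset of another one in the collection."""
    return {s for s in itemsets if not any(s < other for other in itemsets)}


def mine_strategies(messages, init_idx, min_support):
    """Cluster messages into sub-spaces, then mine keyword sets in each one."""
    points = [embed(m) for m in messages]
    labels = kmeans(points, [points[i] for i in init_idx])
    strategies = {}
    # Each sub-space is independent, so this loop could be run by parallel workers.
    for k in sorted(set(labels)):
        sub = [m for m, lab in zip(messages, labels) if lab == k]
        strategies[k] = keep_maximal(frequent_itemsets(sub, min_support))
    return strategies


if __name__ == "__main__":
    for cluster, sets in mine_strategies(MESSAGES, [0, 5], 0.6).items():
        print(cluster, sorted(sorted(s) for s in sets))
```

Mining inside each cluster keeps the candidate keyword combinations small, because words from unrelated topics never appear in the same counting pass. That is one plausible reading of the efficiency gain the abstract attributes to sub-semantic spaces.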