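Lina ZHANG (1980-), female, is a lecturer at Zhejiang College of Security Technology. Her research interests include data mining, graphics and image processing, intelligent algorithms, and cloud computing.
Tai KUANG (1964-), male, is director of the Department of Information Engineering and an associate professor at Zhejiang College of Security Technology. His research interests include big data and artificial intelligence.
Diqing JIANG (1965-), male, currently works at Zhejiang College of Security Technology. His research interests include public opinion management and personnel management.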
Published online: 2017-01
Published in print: 2017-01-15
Lina ZHANG, Tai KUANG, Diqing JIANG. Blog screening and mining based on temporal features and hybrid search in big data[J]. Telecommunications Science, 2017, 33(1): 77-84. DOI: 10.11959/j.issn.1000-0801.2017001.
Many existing blog screening and mining methods capture relevance only loosely, and the information retrieval methods they rely on have shortcomings. To address this, a method based on temporal features and hybrid search was proposed. Because user comments are an important source of evidence for combination, and because time also influences relevance, the method uses the following temporal feature set: the average number of comments per blog post, the BM25 relevance score of the feed, the BM25 score of the oldest blog post, and the time span between the newest relevant post and the oldest post. In addition, to exploit both the local search strength of linear search (LS) and the global search strength of differential evolution (DE), the two search methods were combined. The experiments used the Blogs06 collection, which consists of blog home pages, XML feeds, and blog entry pages and was used for the TREC 2007 and TREC 2008 blog screening and mining experiments. Experimental results show that the proposed method achieves satisfactory running time and effectiveness.
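Two of the temporal features are BM25 scores (one for the feed, one for its oldest post). As a minimal sketch of how such a score can be computed, the Python function below applies the standard Okapi BM25 formula to pre-tokenized documents. The defaults k1 = 1.2 and b = 0.75 and the Lucene-style IDF term, which keeps scores non-negative, are assumptions; the paper's own BM25 configuration is not stated here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`
    using Okapi BM25. k1, b and the Lucene-style IDF are assumed defaults."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization term: longer-than-average docs are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [["blog", "search", "feed"], ["cooking", "recipes"], ["blog", "blog", "post"]]
    print(bm25_scores(["blog"], docs))
```

With the query `["blog"]`, the second toy document scores 0 and the third, which repeats the term, scores higher than the first. A real system would compute document frequencies and the average length once per index rather than on every call.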
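The four temporal features can be assembled per feed and folded into a single ranking score. The sketch below is hypothetical: the `Post` record and its field names, treating `posts` as the feed's query-relevant posts, measuring the time span in days, and scoring with a weighted sum are illustrative assumptions rather than the paper's exact formulation. Per-post and per-feed BM25 scores are assumed to be precomputed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical per-post record; field names are illustrative."""
    published: datetime
    num_comments: int
    bm25: float  # precomputed BM25 relevance of this post to the query


def temporal_features(posts, feed_bm25):
    """Return the four temporal features for one feed:
    [average comments per post, feed BM25, BM25 of oldest post, span in days]."""
    if not posts:
        raise ValueError("feed has no posts")
    avg_comments = sum(p.num_comments for p in posts) / len(posts)
    oldest = min(posts, key=lambda p: p.published)
    newest = max(posts, key=lambda p: p.published)
    span_days = (newest.published - oldest.published).total_seconds() / 86400
    return [avg_comments, feed_bm25, oldest.bm25, span_days]


def feed_score(features, weights):
    """Assumed linear combination; the weights are what a search would tune."""
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    posts = [Post(datetime(2006, 1, 1), 4, 1.5), Post(datetime(2006, 1, 11), 2, 2.0)]
    feats = temporal_features(posts, feed_bm25=3.0)
    print(feats, feed_score(feats, [0.1, 1.0, 0.5, 0.01]))
```

For two posts ten days apart with 4 and 2 comments, the features are the average comment count 3.0, the feed's BM25 score, the earlier post's BM25 score, and a span of 10.0 days. Treating the weights as the parameters of the hybrid search is an assumption about how the search is applied.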
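The hybrid search pairs the global exploration of differential evolution (DE) with the local refinement of linear search (LS). One plausible arrangement, sketched below, runs a standard DE/rand/1/bin loop and then polishes the best vector with a coordinate-wise line search that halves its step when no move improves the objective. The two-stage order, population size, F, CR, step sizes, and bounds are assumptions, since the abstract does not specify how the methods are combined. The objective is maximized; in practice it would be a retrieval effectiveness measure over training queries, and a toy quadratic stands in for it here.

```python
import random


def differential_evolution(objective, dim, bounds, pop_size=20, gens=100,
                           f_scale=0.5, cr=0.9, seed=0):
    """DE/rand/1/bin maximizing `objective` over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            t_fit = objective(trial)
            if t_fit >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, t_fit
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def linear_search(objective, x, bounds, step=0.1, min_step=1e-4):
    """Coordinate-wise line search: try +/- step on each coordinate, keep
    any improvement, and halve the step when no coordinate improves."""
    lo, hi = bounds
    x = list(x)
    best = objective(x)
    while step > min_step:
        improved = False
        for j in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[j] = min(max(cand[j] + delta, lo), hi)
                val = objective(cand)
                if val > best:
                    x, best, improved = cand, val, True
                    break
        if not improved:
            step /= 2
    return x, best


def hybrid_search(objective, dim, bounds, seed=0):
    """Global DE stage followed by local linear-search refinement."""
    x, _ = differential_evolution(objective, dim, bounds, seed=seed)
    return linear_search(objective, x, bounds)


if __name__ == "__main__":
    # Toy stand-in for an effectiveness measure, peaking at a known vector.
    target = [0.3, 0.7, 0.1, 0.5]

    def toy_objective(w):
        return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    weights, score = hybrid_search(toy_objective, dim=4, bounds=(0.0, 1.0))
    print([round(w, 3) for w in weights], score)
```

Running DE first lets the population move away from poor regions of weight space, while the cheap coordinate moves at the end fine-tune the vector the population settled on. Interleaving the two stages or using a different DE variant would be equally reasonable designs to compare.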