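1. Zhejiang Gongshang University, Hangzhou 310018, China
2. Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, China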
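Junqiang Liu, male, Ph.D., professor at Zhejiang Gongshang University. His main research interests are big data analytics and cloud computing, network information security and privacy protection, and information management and software engineering.
Qingfeng Zhou, male, master's student at Zhejiang Gongshang University. His main research interests are big data analytics and cloud computing, data mining, and Web Services.
Wenhui Wang, female, lecturer at Zhejiang University of Water Resources and Electric Power. Her main research interests are data mining, machine learning, and the modeling and control of complex systems.
Lei Shi, male, Ph.D., lecturer at Zhejiang Gongshang University. His main research interests are wireless sensor networks and Ad Hoc networks.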
Online publication date: 2015-04
Print publication date: 2015-04-15
Junqiang Liu, Qingfeng Zhou, Wenhui Wang, et al. Fast Single Phase Algorithm for Utility Mining in Big Data[J]. Telecommunications Science, 2015, 31(4): 77-85. DOI: 10.11959/j.issn.1000-0801.2015100.
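Existing data mining algorithms generate a large number of candidate patterns when mining big data, which creates a scalability bottleneck. A few algorithms avoid candidate generation, but they are computationally expensive and lack effective pruning, so their running efficiency becomes the bottleneck. To address this, a new single-phase data mining algorithm that generates no candidate patterns is proposed. It has three novel aspects: (1) pattern enumeration based on prefix growth, with a pruning strategy based on evaluating utility upper bounds; (2) a representation of utility information based on a sparse matrix and pseudo projection; (3) a depth-first search method that saves storage space. Extensive experiments show that the new algorithm runs more than 5 times faster than existing algorithms, uses 20% to 60% less memory, and scales well.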
Most recent work on utility mining generates a huge number of candidates when dealing with big data, and therefore suffers from scalability issues. Some work does not generate candidates, but suffers from efficiency issues due to a lack of strong pruning and high computation overhead. A novel algorithm that finds high utility patterns in a single phase without generating candidates is proposed. Its novelties lie in a prefix growth strategy with strong pruning and a sparse-matrix-based representation of transactions with pseudo projection. The proposed algorithm works in a depth-first manner and does not materialize high utility patterns in memory, which further improves scalability. Extensive experiments on synthetic and real-world data show that the proposed algorithm outperforms the latest works in terms of running time, memory overhead, and scalability.
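The ideas named in the abstract (prefix growth, upper-bound pruning, a sparse row representation with pseudo projection, and depth-first search that does not hold results in memory) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `mine_high_utility`, the input layout (one dict per transaction mapping an item to its non-negative utility in that transaction), and the choice of the "prefix utility plus remaining utility" upper bound are all assumptions made for illustration; the paper's actual bound and data structures may differ.

```python
from typing import Dict, Iterator, List, Tuple

Pattern = Tuple[str, ...]


def mine_high_utility(
    transactions: List[Dict[str, int]], min_util: int
) -> Iterator[Tuple[Pattern, int]]:
    """Yield (pattern, utility) for every itemset with utility >= min_util.

    Assumption: each transaction maps item -> non-negative utility of that
    item in the transaction (for example quantity * unit profit).
    """
    # Fix a global item order so every pattern is enumerated exactly once.
    items = sorted({item for t in transactions for item in t})
    rank = {item: r for r, item in enumerate(items)}
    # Sparse rows: each transaction keeps only its non-empty cells,
    # as (item rank, utility) pairs sorted by rank.
    rows = [sorted((rank[i], u) for i, u in t.items()) for t in transactions]

    def grow(prefix: Pattern, proj: List[Tuple[int, int, int]]):
        # proj is a pseudo projection: (row index, start position, utility of
        # the prefix in that row). Rows are never copied, only pointed into.
        ext: Dict[int, list] = {}  # item rank -> [utility, bound, projection]
        for r, pos, prefix_util in proj:
            row = rows[r]
            remaining = sum(u for _, u in row[pos:])
            for k in range(pos, len(row)):
                item, u = row[k]
                remaining -= u  # utility of the items after position k
                entry = ext.setdefault(item, [0, 0, []])
                entry[0] += prefix_util + u              # exact utility
                entry[1] += prefix_util + u + remaining  # upper bound
                entry[2].append((r, k + 1, prefix_util + u))
        for item in sorted(ext):
            util, bound, new_proj = ext[item]
            pattern = prefix + (items[item],)
            if util >= min_util:
                yield pattern, util  # emitted immediately, never stored
            if bound >= min_util:    # otherwise prune the whole subtree
                yield from grow(pattern, new_proj)

    yield from grow((), [(r, 0, 0) for r in range(len(rows))])


if __name__ == "__main__":
    txs = [{"a": 5, "b": 2, "c": 1}, {"a": 10, "c": 6}, {"b": 4, "c": 3}]
    print(list(mine_high_utility(txs, 15)))
    # prints [(('a',), 15), (('a', 'c'), 22)]
```

Because the bound for a pattern adds the utility of every item that could still be appended to it, no extension of that pattern can exceed the bound, so a subtree whose bound falls below `min_util` can be skipped without losing results. The generator yields each pattern as soon as it is found, which is one simple way to avoid materializing the result set in memory.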