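Zhongwei Wang, male, is a master's student at Ningbo University. His main research interest is data mining.
Yefang Chen, female, is a lecturer at Ningbo University. Her main research interests are data processing and data mining.
Siyou Xiao, male, Ph.D., is an associate professor at Ningbo University. His main research interests are data processing and data mining.
Jiangbo Qian, male, Ph.D., is a professor at Ningbo University. His main research interests are data processing and data mining, and logic circuit design.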
Online publication date: 2015-07
Print publication date: 2015-07-20
Zhongwei Wang, Yefang Chen, Siyou Xiao, et al. An AkNN Algorithm for High-Dimensional Big Data[J]. Telecommunications Science, 2015, 31(7): 52-62. DOI: 10.11959/j.issn.1000-0801.2015171.
The all k-nearest neighbor (AkNN) query, a variant of the k-nearest neighbor query, finds the k nearest neighbors of every object in a given dataset within a single query process. An AkNN query algorithm for high-dimensional big data on the Hadoop distributed platform is proposed. The algorithm first reduces the dimensionality of the data objects by combining the banding technique with the p-stable LSH algorithm, then uses the Z-order space-filling curve to embed the reduced data in a one-dimensional space, and finally performs range queries on the embedded data. The entire process runs in a distributed, parallel manner on the MapReduce framework. Experimental results show that the proposed algorithm can efficiently handle AkNN queries over high-dimensional big data.
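The pipeline outlined in the abstract (LSH-based dimensionality reduction, Z-order embedding, range query) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it omits the banding step and the MapReduce distribution entirely, and every name and parameter here (`make_pstable_hash`, `z_value`, `approx_aknn`, `m`, `w`, `bits`, `window`) is hypothetical. The hash uses the standard p-stable form h(v) = floor((a · v + b) / w) with Gaussian `a`, and the "range query" is approximated by scanning a fixed window of neighbors in Z-order.

```python
import math
import random


def make_pstable_hash(dim, w, rng):
    """Return h(v) = floor((a . v + b) / w), with a ~ N(0, 1)^dim and b ~ U[0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h


def z_value(coords, bits):
    """Interleave the low `bits` bits of each non-negative integer coordinate."""
    z = 0
    for bit in range(bits):
        for i, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * len(coords) + i)
    return z


def approx_aknn(points, k, m=4, w=4.0, bits=8, window=None, seed=0):
    """Approximate the k nearest neighbors of every point (assumed simplification).

    1. Reduce each point to m integer coordinates with m p-stable hashes.
    2. Map the reduced coordinates to a one-dimensional Z-order key.
    3. Sort by key and, for each point, rank the points within `window`
       positions on either side by true Euclidean distance.
    """
    rng = random.Random(seed)
    hashes = [make_pstable_hash(len(points[0]), w, rng) for _ in range(m)]
    reduced = [[h(p) for h in hashes] for p in points]

    # Shift every reduced coordinate to be non-negative and clamp it to `bits` bits.
    mins = [min(r[j] for r in reduced) for j in range(m)]
    limit = (1 << bits) - 1
    keys = [
        z_value([min(r[j] - mins[j], limit) for j in range(m)], bits)
        for r in reduced
    ]

    order = sorted(range(len(points)), key=lambda i: keys[i])
    if window is None:
        window = 2 * k

    result = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        candidates = [j for j in order[lo:hi] if j != i]
        candidates.sort(key=lambda j: math.dist(points[i], points[j]))
        result[i] = candidates[:k]
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(32)] for _ in range(100)]
    neighbors = approx_aknn(data, k=5)
    print(neighbors[0])
```

A larger `window` trades speed for accuracy: when it covers the whole dataset, the sketch degenerates to an exact brute-force search. In a distributed setting, the sorted Z-order keys would presumably be partitioned into ranges across workers, but that part is beyond what this sketch shows.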