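Jianxin Ren (male) is a master's student at the College of Information Science and Engineering, Ningbo University. His main research interests are large-scale data processing and information retrieval.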
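Huahui Chen (male), Ph.D., is a professor at the College of Information Science and Engineering, Ningbo University. His main research interests are databases, data streams, data mining, and cloud computing.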
Published online: 2015-07
Print publication date: 2015-07-20
Jianxin Ren, Huahui Chen. An Adaptive Subspace Similarity Search Approach[J]. Telecommunications Science, 2015, 31(7): 63-74. DOI: 10.11959/j.issn.1000-0801.2015190.
In recent years, similarity search has attracted considerable attention in database areas such as multimedia information retrieval, similarity joins, and time-series matching. Most existing work finds the target set by computing nearest neighbors (e.g., kNN queries and kNN joins, kNNJ) with metric distance functions in Euclidean space. However, studies have shown that under these conditions the accuracy of the search results is easily degraded by dimensions with high dissimilarity, and the corresponding solutions still lack flexibility and robustness. This paper first formulates the similarity search problem over dynamic subspaces (i.e., partial dimensions) in a single-machine (centralized) setting and proposes algorithms to solve it. Because centralized algorithms do not scale well as datasets grow very large, it then proposes distributed algorithms under the Hadoop framework. Experiments confirm that the distributed algorithms outperform the centralized ones in performance without loss of accuracy.
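The abstract combines two ideas: kNN search evaluated over only some of the dimensions, and distributed evaluation of that search. The sketch below illustrates both in generic form. It is not the authors' algorithm. The function names (`subspace_lp`, `partial_lp`, `local_topk`, `merge_topk`, `distributed_knn`) are hypothetical. The "keep the m least dissimilar dimensions" rule in `partial_lp` is one simple partial-dimension distance, assumed here for illustration. The map and reduce steps run sequentially in plain Python rather than on Hadoop.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]
Item = Tuple[int, Vector]          # (object id, feature vector)
Hit = Tuple[float, int]            # (distance, object id)
Dist = Callable[[Vector, Vector], float]


def subspace_lp(dims: Sequence[int], p: float = 2.0) -> Dist:
    """Lp distance computed only over the query-chosen dimensions `dims`."""
    def dist(x: Vector, y: Vector) -> float:
        return sum(abs(x[i] - y[i]) ** p for i in dims) ** (1.0 / p)
    return dist


def partial_lp(m: int, p: float = 2.0) -> Dist:
    """Lp distance over the m dimensions in which x and y differ least.

    Assumed illustrative rule (not from the paper): the subspace is picked
    per pair, so a few highly dissimilar dimensions cannot dominate.
    """
    def dist(x: Vector, y: Vector) -> float:
        diffs = sorted(abs(a - b) for a, b in zip(x, y))
        return sum(d ** p for d in diffs[:m]) ** (1.0 / p)
    return dist


def local_topk(partition: Iterable[Item], query: Vector, k: int, dist: Dist) -> List[Hit]:
    """'Map' side: the k best candidates inside one data partition."""
    return heapq.nsmallest(k, ((dist(query, vec), oid) for oid, vec in partition))


def merge_topk(partials: Iterable[List[Hit]], k: int) -> List[Hit]:
    """'Reduce' side: the global k nearest lie in the union of the local top-k lists."""
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


def distributed_knn(partitions: Sequence[Sequence[Item]], query: Vector,
                    k: int, dist: Dist) -> List[Hit]:
    # In a cluster each partition would be handled by its own task;
    # here the partitions are processed one after another.
    return merge_topk((local_topk(part, query, k, dist) for part in partitions), k)


if __name__ == "__main__":
    data = [
        (0, [0.0, 0.0, 0.0]),
        (1, [1.0, 0.0, 9.0]),
        (2, [0.5, 0.5, 0.0]),
        (3, [3.0, 3.0, 3.0]),
        (4, [0.1, 0.2, 50.0]),   # near the query except in one outlying dimension
    ]
    query = [0.0, 0.0, 0.0]
    partitions = [data[:2], data[2:]]
    for name, dist in [("full L2", subspace_lp([0, 1, 2])),         # [0, 2, 3]
                       ("subspace {0,1}", subspace_lp([0, 1])),     # [0, 4, 2]
                       ("partial, m=2", partial_lp(2))]:            # [0, 4, 2]
        hits = distributed_knn(partitions, query, 3, dist)
        print(name, [oid for _, oid in hits])
```

On the toy data, full-space L2 ranks object 4 last because of its single outlying coordinate. Both the fixed subspace and the partial distance rank it second. Merging per-partition top-k lists gives the same answer as a centralized scan, because each global top-k hit is also in the top k of its own partition.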