浏览全部资源
扫码关注微信
[ "钱途(1992-),男,宁波大学信息科学与工程学院硕士生,主要研究方向为大数据、数据挖掘。" ]
[ "钱江波(1974-),男,博士,宁波大学信息科学与工程学院教授,主要研究方向为数据处理与挖掘、逻辑电路设计、多维索引与查询优化。" ]
[ "董一鸿(1969-),男,博士,宁波大学信息科学与工程学院教授,主要研究方向为大数据、数据挖掘和人工智能。" ]
[ "陈华辉(1964-),男,博士,宁波大学信息科学与工程学院教授,主要研究方向为数据处理与挖掘、云计算。" ]
网络出版日期:2017-09,
纸质出版日期:2017-09-20
移动端阅览
钱途, 钱江波, 董一鸿, 等. SLSB-forest:高维数据的近似k近邻查询[J]. 电信科学, 2017,33(9):58-68.
Tu QIAN, Jiangbo QIAN, Yihong DONG, et al. SLSB-forest:approximate k nearest neighbors searching on high dimensional data[J]. Telecommunications science, 2017, 33(9): 58-68.
钱途, 钱江波, 董一鸿, 等. SLSB-forest:高维数据的近似k近邻查询[J]. 电信科学, 2017,33(9):58-68. DOI: 10.11959/j.issn.1000-0801.2017193.
Tu QIAN, Jiangbo QIAN, Yihong DONG, et al. SLSB-forest:approximate k nearest neighbors searching on high dimensional data[J]. Telecommunications science, 2017, 33(9): 58-68. DOI: 10.11959/j.issn.1000-0801.2017193.
近似 k 近邻查询的研究一直受到广泛关注,局部敏感散列(LSH)是解决此问题的主流方法之一。LSH 及目前大部分改进版本都会面临以下问题:数据散列以后在桶里分布不均匀;无法准确计算对应参数 k的查询范围建立索引。基于此,将支持动态数据索引的LSH和B-tree结合,构建新的SLSB-forest索引结构,使散列桶里的数据维持在一个合理的区间。针对SLSB-forest提出了两种查询算法:快速查找和准确率优先查找,并通过理论和实验证明查找过程中查询范围的动态变化。
The study of approximate k nearest neighbors query has attracted broad attention.Local sensitive hash is one of the mainstream ways to solve this problem.Local sensitive hash and its varients have noted the following problems:the uneven distribution of hashed data in the buckets
it cannot calculate the appropriate query range (for the parameter k) to build index.To tackle the above problem
a new data struct which called SLSB-forest was presented.The SLSB-forest combined the LSH and the B-tree to maintain the amount of bucket’s data in reasonable range.Two query algorithms were proposed:fast and accurate priority search.Theory and experimental results prove that query range can dynamic change during searching approximate k nearest neighbors.
GUTTMAN A . R-trees:a dynamic index structure for spatial searching [J ] . ACM SIGMOD Record , 1984 , 14 ( 2 ): 47 - 57 .
KATAYAMA N , SATOH S . The SR-tree:an index structure for high-dimensional nearest neighbor queries [C ] // ACM SIGMOD International Conference on Management of Data,May 11-15,1997,Tucson,Arizona,USA . New York:ACM Press , 1997 : 369 - 380 .
过洁 , 徐晓旸 , 潘金贵 . 虚拟场景的一种快速优化 Kd-Tree构造方法 [J ] . 电子学报 , 2011 , 39 ( 8 ): 1811 - 1817 .
GUO J , XU X Y , PAN J G . Build Kd-Tree for virtual scenes in a fast and optimal way [J ] . Chinese Journal of Electronics , 2011 , 39 ( 8 ): 1811 - 1817 .
WEBER R , SCHEK H J , BLOTT S . A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces [C ] // International Conference on Very Large Data Bases,August 24-27,1998,San Francisco,CA,USA . New York:ACM Press , 1998 : 194 - 205 .
INDYK P , MOTWANI R . Approximate nearest neighbors:towards removing the curse of dimensionality [J ] .Theory of Computing,2000,604-613(11):604-613. Theory of Computing , 2000 , 604-613 ( 11 ): 604 - 613 .
DATAR M , IMMORLICA N , INDYK P , et al . Locality-sensitive hashing scheme based on p-stable distributions [C ] // Twentieth Symposium on Computational Geometry,June 8-11,2004,Brooklyn,New York,USA . New York:ACM Press , 2004 : 253 - 262 .
LV Q , JOSEPHSON W , WANG Z , et al . Multi-probe LSH:efficient indexing for high-dimensional similarity search [C ] // International Conference on Very Large Data Bases,September 23-27,2007,Vienna,Austria . New York:ACM Press , 2007 : 950 - 961 .
PANIGRAHY R , . Entropy based nearest neighbor search in high dimensions [C ] // The Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms,November 4,2005,Miami,Florida,USA.[S.l.:s.n] . 2005 : 1186 - 1195 .
LIU Y F , CUI J T , HUANG Z , et al . SK-LSH:an efficient index structure for approximate nearest neighbor search [J ] . Proceedings of the VLDB , 2014 , 7 ( 9 ): 745 - 756 .
TAO Y F , YI K , SHENG C , et al . Quality and efficiency in high dimensional nearest neighbor search [C ] // ACM Sigmod International Conference on Management of Data,June 29-July 2,2009,Providence,Rhode Island,USA . New York:ACM Press , 2009 : 563 - 576 .
QIAN J , ZHU Q , CHEN H . Multi-granularity locality-sensitive bloom filter [J ] . IEEE Transactions on Computers , 2015 , 64 ( 12 ): 3500 - 3514 .
PAULEV L , GOU H , AMSALEG L . Locality sensitive hashing:a comparison of hash function types and querying mechanisms [J ] . Pattern Recognition Letters , 2010 , 31 ( 11 ): 1348 - 1358 .
QIAN J , ZHU Q , WANG Y . Bloom filter based associative deletion [J ] . IEEE Transactions on Parallel and Distributed Systems , 2014 , 24 ( 8 ): 1986 - 1998 .
胡海苗 , 姜帆 . 基于可扩展LSH的高维动态数据索引 [J ] . 软件学报 , 2015 , 26 ( 2 ): 228 - 238 .
HU H M , JIANG F . Scalable locality sensitive hashing scheme for dynamic high-dimension data indexing [J ] . Journal of Software , 2015 , 26 ( 2 ): 228 - 238 .
吴军 . 大数据和机器智能对未来社会的影响 [J ] . 电信科学 , 2015 , 31 ( 2 ): 7 - 16 .
WU J . Big data,machine intelligence and their impacts to the future world [J ] . Telecommunications Science , 2015 , 31 ( 2 ): 7 - 16 .
0
浏览量
888
下载量
0
CSCD
关联资源
相关文章
相关作者
相关机构