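Zhenyu SHOU (1993- ), male, master's student at the School of Information Science and Engineering, Ningbo University. His research interests include machine learning, artificial intelligence, and big data retrieval.
Jiangbo QIAN (1974- ), male, Ph.D., professor at the School of Information Science and Engineering, Ningbo University. His research interests include data processing and mining, logic circuit design, and multidimensional indexing and query optimization.
Yihong DONG (1969- ), male, Ph.D., professor at the School of Information Science and Engineering, Ningbo University. His research interests include big data, data mining, and artificial intelligence.
Huahui CHEN (1964- ), male, Ph.D., professor at the School of Information Science and Engineering, Ningbo University. His research interests include data processing and mining, and cloud computing.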
Online publication date: 2020-03
Print publication date: 2020-03-20
Zhenyu SHOU, Jiangbo QIAN, Yihong DONG, et al. EFH: an online unsupervised hash learning algorithm[J]. Telecommunications Science, 2020, 36(3): 71-82. DOI: 10.11959/j.issn.1000-0801.2020055.
Existing unsupervised learning-to-hash algorithms must load the entire data set into memory during training, which consumes a large amount of memory and makes them unsuitable for streaming data. As an exploratory step, an unsupervised online learning-to-hash algorithm called evolutionary forest hashing (EFH) is proposed. For large-scale data retrieval scenarios, an improved evolving tree is used to learn the spatial topology of the data, and a path coding strategy is proposed that maps the path a data point takes while traversing the evolving tree to a similarity-preserving binary code. To further improve query performance, ensemble learning is combined with evolving tree hashing to obtain an online evolutionary forest hashing method. Finally, experiments on two widely used data sets demonstrate the feasibility of the method.
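The path coding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the details of the improved evolving tree or of the coding rule, so the binary branching, the nearest-prototype descent, the fixed toy tree, and every name below (`Node`, `path_code`, `forest_code`, `toy_tree`) are assumptions made for illustration only. The tree is built by hand rather than learned online.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node holding a prototype vector."""
    prototype: Sequence[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def path_code(root: Node, point: Sequence[float]) -> str:
    """Descend from the root to a leaf, always moving to the child whose
    prototype is nearer to the point; emit '0' for left and '1' for right."""
    bits: List[str] = []
    node = root
    while node.left is not None and node.right is not None:
        go_right = (sq_dist(point, node.right.prototype)
                    < sq_dist(point, node.left.prototype))
        bits.append("1" if go_right else "0")
        node = node.right if go_right else node.left
    return "".join(bits)


def forest_code(roots: Sequence[Node], point: Sequence[float]) -> str:
    """Concatenate the path codes from several trees (an assumed way to
    combine trees into a forest code)."""
    return "".join(path_code(root, point) for root in roots)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_tree() -> Node:
    """A hand-built depth-2 tree over 2-D points, used only as a demo."""
    left = Node((-1.0, 0.0), Node((-1.0, -1.0)), Node((-1.0, 1.0)))
    right = Node((1.0, 0.0), Node((1.0, -1.0)), Node((1.0, 1.0)))
    return Node((0.0, 0.0), left, right)


tree = toy_tree()
near_a = path_code(tree, (0.9, 0.8))
near_b = path_code(tree, (1.1, 0.9))
far = path_code(tree, (-0.8, -1.2))
print(near_a, near_b, far, hamming(near_a, far))
```

In this toy setting, nearby points follow the same path and receive identical codes, while a distant point receives a code at a larger Hamming distance. That is the similarity-preserving property the abstract refers to, shown here only under the assumptions stated above.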