An Association-Aware Import Method for Big Data

Huaiyu GONG; Jinsong XU; Pan WANG

doi:10.11959/j.issn.1000-0801.2016044


Operation Technology Wide Angle | Updated: 2024-06-05

- Published English title: An associated perception import method for big data
- Telecommunications Science, 2016, vol. 32, no. 3, pp. 130-134
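- Author affiliations:
  
  1. China Telecom Co., Ltd. Jiyuan Branch, Jiyuan, Henan 454650, China
  2. Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
- About the authors:
  
  Huaiyu GONG (born 1973), male, engineer and general manager at China Telecom Co., Ltd. Jiyuan Branch. His main research interests are big data analysis and traffic operation.
  Jinsong XU (born 1974), male, Ph.D., associate professor and head of a teaching and research section at Tongda College, Nanjing University of Posts and Telecommunications. His main research interests are information security, cloud computing, and big data applications.
  Pan WANG (born 1979), male, associate research fellow at Nanjing University of Posts and Telecommunications. His main research interests are big data analysis and traffic operation.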
- Funding:
  
  The Natural Science Foundation of Jiangsu Province (BK2009426); The Natural Science Fund of Higher Education of Jiangsu Province (14KJD520005); 2013 Six Talent Peaks Project in Jiangsu Province; 2013 Information Security Special Funds of the National Development and Reform Commission; State Grid 2014 Science and Technology Project: Research and Application of Network Traffic Prediction and Smart Pipe Key Technologies for Electric Power Information Communication Network; 2015 Prospective Joint Research Project of Jiangsu Province (BY2015011-02)
- DOI: 10.11959/j.issn.1000-0801.2016044
  Chinese Library Classification (CLC) number: TP393
- Online publication date: 2016-03
  
  Print publication date: 2016-03-20
Huaiyu GONG, Jinsong XU, Pan WANG. An associated perception import method for big data[J]. Telecommunications Science, 2016, 32(3): 130-134. DOI: 10.11959/j.issn.1000-0801.2016044.

Abstract

As existing databases migrate to big data platforms, Apache introduced Sqoop as the main tool for moving relational data into Hadoop. Sqoop simply splits each data table and stores the pieces on randomly chosen nodes. To address the inefficient relational queries that this storage scheme causes in Hadoop, an association-aware preprocessing method for data import is designed. Tables with a high degree of association are stored, as far as possible, on adjacent virtual machine nodes, which reduces the network transmission delay incurred by queries over associated data and improves system performance. Comparative experiments show that storing strongly associated tables on the same or adjacent nodes can improve query performance several times over.
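The placement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the association score (for example, how often two tables are joined), the per-node capacity, the round-robin baseline, and all function and table names below are assumptions made for illustration only.

```python
def place_tables(tables, assoc, num_nodes, capacity):
    """Greedily co-locate strongly associated tables (illustrative only).

    tables:    list of table names
    assoc:     dict mapping frozenset({a, b}) -> association score (assumed metric)
    num_nodes: number of storage nodes
    capacity:  max tables per node (a stand-in for real storage limits)
    """
    if num_nodes * capacity < len(tables):
        raise ValueError("not enough capacity for all tables")
    placement = {}             # table -> node index
    load = [0] * num_nodes     # number of tables on each node

    def put(table, node):
        placement[table] = node
        load[node] += 1

    def least_loaded():
        return min(range(num_nodes), key=lambda n: load[n])

    # Visit table pairs from strongest to weakest association.
    pairs = sorted(assoc.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _score in pairs:
        a, b = sorted(pair)
        if a in placement and b in placement:
            continue
        if a not in placement and b not in placement:
            # Put both tables on the least-loaded node if it can hold them.
            node = least_loaded()
            if load[node] + 2 <= capacity:
                put(a, node)
                put(b, node)
            continue
        # Exactly one table is placed: try to join it on the same node.
        placed, free = (a, b) if a in placement else (b, a)
        node = placement[placed]
        if load[node] < capacity:
            put(free, node)

    # Tables with no placed partner go to the least-loaded node.
    for table in tables:
        if table not in placement:
            put(table, least_loaded())
    return placement


def round_robin(tables, num_nodes):
    """Association-unaware baseline (a simplification, not Sqoop's behavior)."""
    return {t: i % num_nodes for i, t in enumerate(tables)}


def cross_node_weight(placement, assoc):
    """Total association score of table pairs split across different nodes."""
    return sum(w for pair, w in assoc.items()
               if len({placement[t] for t in pair}) > 1)


# Hypothetical schema and association scores.
tables = ["orders", "order_items", "customers", "logs"]
assoc = {
    frozenset({"orders", "order_items"}): 9,
    frozenset({"orders", "customers"}): 5,
    frozenset({"customers", "logs"}): 1,
}
aware = place_tables(tables, assoc, num_nodes=2, capacity=2)
naive = round_robin(tables, num_nodes=2)
print(cross_node_weight(aware, assoc), cross_node_weight(naive, assoc))
```

In this toy setup the cross-node weight serves as a rough proxy for the network traffic that join queries would generate; a lower value means more associated tables share a node. The sketch treats nodes as either the same or different and does not model adjacency between virtual machines.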

