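1. Jiyuan Branch of China Telecom Co., Ltd., Jiyuan, Henan 454650, China
2. Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China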
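Huaiyu GONG (1973-), male, is an engineer and general manager at the Jiyuan Branch of China Telecom Co., Ltd. His main research interests are big data analysis and data traffic operations.
Jinsong XU (1974-), male, Ph.D., is an associate professor and head of a teaching and research section at Tongda College of Nanjing University of Posts and Telecommunications. His main research interests are information security, cloud computing, and big data applications.
Pan WANG (1979-), male, is an associate research fellow at Nanjing University of Posts and Telecommunications. His main research interests are big data analysis and data traffic operations.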
Online publication date: 2016-03
Print publication date: 2016-03-20
Huaiyu GONG, Jinsong XU, Pan WANG. An associated perception import method for big data[J]. Telecommunications Science, 2016, 32(3): 130-134. DOI: 10.11959/j.issn.1000-0801.2016044.
As existing databases are migrated to big data platforms, Apache introduced Sqoop as the main tool for moving relational data into Hadoop. Sqoop simply splits each data table and stores the pieces on different nodes at random. This storage layout makes relational queries on Hadoop inefficient. To address the problem, an association-aware preprocessing method for data import is designed. Tables with a high degree of association are stored, as far as possible, on adjacent virtual machine nodes, which reduces the network transmission delay incurred by queries over associated data and improves system performance. Comparative experiments show that placing strongly associated tables on the same or adjacent nodes can improve query performance several-fold.
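The abstract does not state how the association degree is computed or which placement algorithm is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a precomputed pairwise association score between tables (the `assoc` dictionary is hypothetical), treats nodes `i` and `i + 1` as adjacent, and uses a simple greedy ordering that is an illustrative choice, not the authors' method. Mapping the resulting node index onto actual Hadoop storage is outside the scope of the sketch.

```python
def association_order(tables, assoc):
    """Order tables so that strongly associated ones sit next to each other.

    tables: list of table names.
    assoc:  dict mapping frozenset({a, b}) to an association score
            (hypothetical input; missing pairs count as 0.0).
    """
    def degree(a, b):
        return assoc.get(frozenset((a, b)), 0.0)

    remaining = list(tables)
    # Start from the table with the largest total association to the others.
    first = max(remaining,
                key=lambda t: sum(degree(t, u) for u in remaining if u != t))
    order = [first]
    remaining.remove(first)

    while remaining:
        # Prefer the table most associated with the one placed last;
        # break ties by total association with everything placed so far.
        nxt = max(remaining,
                  key=lambda t: (degree(order[-1], t),
                                 sum(degree(p, t) for p in order)))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def place_tables(tables, assoc, num_nodes):
    """Assign each table a node index; neighbours in the ordering share
    a node or land on consecutive (assumed adjacent) nodes."""
    order = association_order(tables, assoc)
    per_node = -(-len(order) // num_nodes)  # ceiling division
    return {table: i // per_node for i, table in enumerate(order)}


# Illustrative data only: table names and scores are made up.
tables = ["orders", "customers", "logs", "items"]
assoc = {
    frozenset(("orders", "customers")): 0.9,
    frozenset(("orders", "items")): 0.8,
    frozenset(("logs", "items")): 0.1,
}
placement = place_tables(tables, assoc, num_nodes=2)
print(placement)
```

With this toy input, the two most strongly associated tables share a node and the next most associated table lands on the neighbouring node, which is the property the abstract attributes the query speed-up to. A real preprocessing step would also need to account for table sizes and node capacity, which the sketch ignores.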