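1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003
2. Skills Training Center, State Grid Jibei Electric Power Co., Ltd., Baoding 071051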
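Guoliang Zhou, male, Ph.D., is a postdoctoral researcher at North China Electric Power University. His main research interests are cloud computing and big data processing.
Yongli Zhu, male, Ph.D., is a professor and doctoral supervisor at North China Electric Power University. His main research interests are artificial intelligence and its applications, networked monitoring, and power system automation.
Guilan Wang, female, is a doctoral candidate at North China Electric Power University. Her main research interest is wind turbine fault prediction.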
Published online: 2013-10
Published in print: 2013-10-20
Guoliang Zhou, Yongli Zhu, Guilan Wang. CC-MRSJ: Cache Conscious Star Join Algorithm on Hadoop Platform[J]. Telecommunications Science, 2013, 29(10): 31-37. DOI: 10.3969/j.issn.1000-0801.2013.10.007.
This paper proposes CC-MRSJ, a cache-conscious star join algorithm for MapReduce. Each column of the fact table is stored separately, and each dimension table is divided into several column families according to its dimension hierarchy. Each fact table foreign key column is co-located with its corresponding dimension table, which reduces data movement during the join. The algorithm has two phases: first, each foreign key column is joined with its corresponding dimension table; then the intermediate results are joined and the measure columns are accessed randomly to produce the final result. Because CC-MRSJ reads only the data it needs, cache utilization is high, which makes the algorithm cache conscious. It also relies on late materialization to avoid unnecessary data access and movement. Comparative tests against Hive on the SSB dataset show that CC-MRSJ executes more efficiently.
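The two phases described in the abstract can be illustrated with a small single-process sketch. This is not the authors' Hadoop implementation: it runs in memory without MapReduce, and every name in it (`join_fk_with_dimension`, `star_join`, the sample columns and predicates) is hypothetical and chosen only for illustration. It assumes the fact table is stored column by column, so that a row is identified by its position in each column.

```python
from typing import Callable, Dict, List, Tuple

DimRow = Dict[str, object]

def join_fk_with_dimension(
    fk_column: List[int],
    dimension: Dict[int, DimRow],
    keep: Callable[[DimRow], bool],
) -> Dict[int, DimRow]:
    """Phase 1 (sketch): scan one foreign-key column and return
    {fact row position: matching dimension row} for rows passing the predicate."""
    matches: Dict[int, DimRow] = {}
    for pos, key in enumerate(fk_column):
        dim_row = dimension.get(key)
        if dim_row is not None and keep(dim_row):
            matches[pos] = dim_row
    return matches

def star_join(
    fk_columns: Dict[str, List[int]],
    dimensions: Dict[str, Dict[int, DimRow]],
    predicates: Dict[str, Callable[[DimRow], bool]],
    measure_column: List[int],
) -> List[Tuple[int, Dict[str, DimRow], int]]:
    """Phase 2 (sketch): join the per-dimension results on row position,
    then fetch the measure only for surviving rows (late materialization)."""
    if not fk_columns:
        return []
    per_dim = {
        name: join_fk_with_dimension(fk_columns[name], dimensions[name], predicates[name])
        for name in fk_columns
    }
    # A fact row survives only if it matched in every dimension.
    surviving = set.intersection(*(set(m) for m in per_dim.values()))
    return [
        (pos, {name: per_dim[name][pos] for name in per_dim}, measure_column[pos])
        for pos in sorted(surviving)
    ]

# Hypothetical sample data: two foreign-key columns and one measure column.
FK_COLUMNS = {"date": [1, 2, 1, 3, 2], "customer": [10, 11, 10, 10, 12]}
DIMENSIONS = {
    "date": {1: {"year": 1997}, 2: {"year": 1998}, 3: {"year": 1997}},
    "customer": {10: {"region": "ASIA"}, 11: {"region": "EUROPE"}, 12: {"region": "ASIA"}},
}
PREDICATES = {
    "date": lambda row: row["year"] == 1997,
    "customer": lambda row: row["region"] == "ASIA",
}
REVENUE = [100, 200, 300, 400, 500]

rows = star_join(FK_COLUMNS, DIMENSIONS, PREDICATES, REVENUE)
total_revenue = sum(measure for _, _, measure in rows)
```

In this sketch the measure column is indexed only at the positions that survive both dimension joins, which mirrors the idea of reading only the data that is needed; the per-column scans in phase 1 stand in for the co-located foreign key and dimension storage described above.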