TY - GEN
T1 - Exploring MapReduce efficiency with highly-distributed data
AU - Cardosa, Michael
AU - Wang, Chenyu
AU - Nangia, Anshuman
AU - Chandra, Abhishek
AU - Weissman, Jon
PY - 2011
Y1 - 2011
AB - MapReduce is a highly popular paradigm for high-performance computing over large data sets in large-scale platforms. However, when the source data is widely distributed and the computing platform is also distributed, e.g., when data is collected in separate data center locations, choosing the most efficient architecture for running Hadoop jobs over the entire data set becomes non-trivial. In this paper, we show that the traditional single-cluster MapReduce setup may not be suitable for situations where data and compute resources are widely distributed. Further, we provide recommendations for alternative (and even hierarchical) distributed MapReduce setup configurations, depending on the workload and data set.
KW - MapReduce
KW - distributed systems
UR - http://www.scopus.com/inward/record.url?scp=79961048998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79961048998&partnerID=8YFLogxK
U2 - 10.1145/1996092.1996100
DO - 10.1145/1996092.1996100
M3 - Conference contribution
AN - SCOPUS:79961048998
SN - 9781450307000
T3 - MapReduce'11 - Proceedings of the 2nd International Workshop on MapReduce and Its Applications
SP - 27
EP - 33
BT - MapReduce'11 - Proceedings of the 2nd International Workshop on MapReduce and Its Applications
T2 - 2nd International Workshop on MapReduce and Its Applications, MapReduce'11, Co-located with 20th International ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2011
Y2 - 8 June 2011 through 8 June 2011
ER -