TY - JOUR
T1 - Nebula: Distributed Edge Cloud for Data Intensive Computing
AU - Jonathan, Albert
AU - Ryden, Mathew
AU - Oh, Kwangsung
AU - Chandra, Abhishek
AU - Weissman, Jon
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2017/11/1
Y1 - 2017/11/1
AB - Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may be spread at multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques including location-aware data and computation placement, replication, and recovery. We evaluate Nebula performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.
KW - distributed systems
KW - cloud computing
KW - data intensive computing
KW - edge cloud
UR - http://www.scopus.com/inward/record.url?scp=85021835257&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021835257&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2017.2717883
DO - 10.1109/TPDS.2017.2717883
M3 - Article
AN - SCOPUS:85021835257
SN - 1045-9219
VL - 28
SP - 3229
EP - 3242
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 11
M1 - 7954728
ER -