Nebula: Distributed Edge Cloud for Data Intensive Computing

Albert Jonathan; Mathew Ryden; Kwangsung Oh; Abhishek Chandra; Jon Weissman

doi:10.1109/TPDS.2017.2717883

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may be spread at multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques including location-aware data and computation placement, replication, and recovery. We evaluate Nebula performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.

Original language	English (US)
Article number	7954728
Pages (from-to)	3229-3242
Number of pages	14
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	28
Issue number	11
DOIs	https://doi.org/10.1109/TPDS.2017.2717883
State	Published - Nov 1 2017

Bibliographical note

Publisher Copyright:
© 1990-2012 IEEE.

Keywords

Distributed Systems
cloud computing
data intensive computing
edge cloud

Cite this

@article{266576b9d26a4d6685a11247a3335dbf,

title = "Nebula: Distributed Edge Cloud for Data Intensive Computing",

abstract = "Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may be spread at multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques including location-aware data and computation placement, replication, and recovery. We evaluate Nebula performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.",

keywords = "Distributed Systems, cloud computing, data intensive computing, edge cloud",

author = "Albert Jonathan and Mathew Ryden and Kwangsung Oh and Abhishek Chandra and Jon Weissman",

note = "Publisher Copyright: {\textcopyright} 1990-2012 IEEE.",

year = "2017",

month = nov,

day = "1",

doi = "10.1109/TPDS.2017.2717883",

language = "English (US)",

volume = "28",

pages = "3229--3242",

journal = "IEEE Transactions on Parallel and Distributed Systems",

issn = "1045-9219",

publisher = "IEEE Computer Society",

number = "11",

}

TY - JOUR

T1 - Nebula

T2 - Distributed Edge Cloud for Data Intensive Computing

AU - Jonathan, Albert

AU - Ryden, Mathew

AU - Oh, Kwangsung

AU - Chandra, Abhishek

AU - Weissman, Jon

PY - 2017/11/1

Y1 - 2017/11/1

AB - Centralized cloud infrastructures have become the popular platforms for data-intensive computing today. However, they suffer from inefficient data mobility due to the centralization of cloud resources, and hence, are highly unsuited for geo-distributed data-intensive applications where the data may be spread at multiple geographical locations. In this paper, we present Nebula: a dispersed edge cloud infrastructure that explores the use of voluntary resources for both computation and data storage. We describe the lightweight Nebula architecture that enables distributed data-intensive computing through a number of optimization techniques including location-aware data and computation placement, replication, and recovery. We evaluate Nebula performance on an emulated volunteer platform that spans over 50 PlanetLab nodes distributed across Europe, and show how a common data-intensive computing framework, MapReduce, can be easily deployed and run on Nebula. We show Nebula MapReduce is robust to a wide array of failures and substantially outperforms other wide-area versions based on emulated existing systems.

KW - Distributed Systems

KW - cloud computing

KW - data intensive computing

KW - edge cloud

UR - http://www.scopus.com/inward/record.url?scp=85021835257&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021835257&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2017.2717883

DO - 10.1109/TPDS.2017.2717883

M3 - Article

AN - SCOPUS:85021835257

SN - 1045-9219

VL - 28

SP - 3229

EP - 3242

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

IS - 11

M1 - 7954728

ER -
