A Network Cost-aware Geo-distributed Data Analytics System

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

Many geo-distributed data analytics (GDA) systems improve performance by targeting the network performance bottleneck: inter-data center network bandwidth. Unfortunately, these systems may encounter a cost bottleneck ($) because they have not considered data transfer cost ($), one of the most expensive and heterogeneous resources in a multi-cloud environment. In this paper, we present Kimchi, a network cost-aware GDA system that addresses the cost-performance tradeoff by exploiting data transfer cost heterogeneity to avoid the cost bottleneck. Kimchi makes cost-aware task placement decisions for scheduling tasks, given inputs that include data transfer cost, network bandwidth, input data size and location, and the desired cost-performance tradeoff preference. Kimchi also remains mindful of data transfer cost in the presence of dynamics. A Kimchi prototype has been implemented on Spark, and experiments show that, compared with baseline approaches (centralized, vanilla Spark, and bandwidth-aware, e.g., Iridium), it reduces cost by 14–24% without impacting performance and reduces query execution time by 45–70% without impacting cost. More importantly, Kimchi allows applications to explore a much richer cost-performance tradeoff space in a multi-cloud environment.
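The kind of placement decision described in the abstract can be illustrated with a toy model. The sketch below is not Kimchi's published algorithm; it assumes a single task whose input is spread across sites, a weighted-sum objective in which a hypothetical knob `alpha` stands in for the cost-performance tradeoff preference, and made-up bandwidth and price figures. The names `Site` and `place_task` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    input_gb: float  # task input stored at this site, in GB


def place_task(sites, bandwidth_gbps, cost_per_gb, alpha):
    """Return (site, transfer_seconds, transfer_dollars) for the chosen site.

    Hypothetical model, not Kimchi's formulation:
    bandwidth_gbps[(src, dst)]: inter-site bandwidth in Gbit/s
    cost_per_gb[(src, dst)]: data transfer price in $/GB
    alpha: 1.0 optimizes transfer time only, 0.0 optimizes cost only
    """
    candidates = []
    for dst in sites:
        seconds, dollars = 0.0, 0.0
        for src in sites:
            if src.name == dst.name or src.input_gb == 0:
                continue  # local or empty input: nothing to move
            link = (src.name, dst.name)
            # Transfers proceed in parallel, so the slowest link dominates.
            seconds = max(seconds, src.input_gb * 8 / bandwidth_gbps[link])
            # Every GB that leaves its site is billed at that link's price.
            dollars += src.input_gb * cost_per_gb[link]
        candidates.append((dst.name, seconds, dollars))

    # Normalize both objectives to [0, 1] so alpha weighs them comparably.
    max_s = max(s for _, s, _ in candidates) or 1.0
    max_d = max(d for _, _, d in candidates) or 1.0

    def score(candidate):
        _, s, d = candidate
        return alpha * s / max_s + (1 - alpha) * d / max_d

    return min(candidates, key=score)


# Hypothetical example: two data-holding sites joined by a slow, cheap link,
# plus a third site reachable over fast but expensive links.
SITES = [Site("A", 10.0), Site("B", 10.0), Site("C", 0.0)]
BANDWIDTH_GBPS = {("A", "B"): 1, ("B", "A"): 1, ("A", "C"): 10, ("B", "C"): 10}
COST_PER_GB = {("A", "B"): 0.01, ("B", "A"): 0.01, ("A", "C"): 0.09, ("B", "C"): 0.09}

for alpha in (1.0, 0.5, 0.0):
    print(alpha, place_task(SITES, BANDWIDTH_GBPS, COST_PER_GB, alpha)[0])
```

With these made-up numbers, a time-only preference moves the task to the well-connected site C, while cost-leaning preferences keep it at A over the cheap link. This mirrors, in miniature, the point that a bandwidth-only placement can sit at a different point in the cost-performance space than a cost-aware one.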

Original language: English (US)
Title of host publication: Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020
Editors: Laurent Lefevre, Carlos A. Varela, George Pallis, Adel N. Toosi, Omer Rana, Rajkumar Buyya
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 649-658
Number of pages: 10
ISBN (Electronic): 9781728160955
State: Published - May 2020
Event: 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020, Melbourne, Australia
Duration: May 11, 2020 to May 14, 2020

Publication series

Name: Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020

Conference

Conference: 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020
Country/Territory: Australia
City: Melbourne
Period: 5/11/20 to 5/14/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Data Analytics
  • Multi-cloud
  • Network Cost

