Abstract
Many geo-distributed data analytics (GDA) systems have focused on the network performance bottleneck, namely inter-data center network bandwidth, to improve performance. Unfortunately, these systems may encounter a cost bottleneck ($) because they have not considered data transfer cost ($), one of the most expensive and heterogeneous resources in a multi-cloud environment. In this paper, we present Kimchi, a network cost-aware GDA system that meets the cost-performance tradeoff by exploiting data transfer cost heterogeneity to avoid the cost bottleneck. Kimchi makes cost-aware task placement decisions for scheduling tasks, given inputs including data transfer cost, network bandwidth, input data size and locations, and the desired cost-performance tradeoff preference. In addition, Kimchi remains mindful of data transfer cost in the presence of dynamics. A Kimchi prototype has been implemented on Spark, and experiments show that, compared to baseline approaches (centralized, vanilla Spark, and bandwidth-aware, e.g., Iridium), it reduces cost by 14% to 24% without impacting performance and reduces query execution time by 45% to 70% without impacting cost. More importantly, Kimchi allows applications to explore a much richer cost-performance tradeoff space in a multi-cloud environment.
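To make the inputs listed above concrete (data transfer cost, bandwidth, input sizes and locations, and a tradeoff preference), the sketch below brute-forces how a shuffle's downstream work could be split across data centers by minimizing a weighted sum of normalized dollar cost and transfer time. This is a hypothetical illustration, not Kimchi's actual algorithm: the weight `alpha`, the functions `place`, `evaluate`, and `compositions`, the $/GB `price` matrix, the GB/s `bw` matrix, and the simplified cost and time model (free local reads, slowest link dominates) are all assumptions made for this example.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def evaluate(fractions, sizes, bw, price):
    """Return (cost_usd, time_s) when data center j runs a share fractions[j]
    of the downstream tasks and pulls that share of every DC's input.

    Assumed model: price[i][j] is $/GB to move data from DC i to DC j,
    bw[i][j] is GB/s on that link, and local reads are free and instant.
    """
    cost, time = 0.0, 0.0
    n = len(sizes)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # local read: no transfer cost, no transfer time
            gb = sizes[i] * fractions[j]
            cost += gb * price[i][j]
            time = max(time, gb / bw[i][j])  # slowest link bounds the shuffle
    return cost, time


def place(sizes, bw, price, alpha, steps=10):
    """Search task fractions on a grid of 1/steps and return the
    (fractions, cost, time) tuple minimizing
    alpha * cost / max_cost + (1 - alpha) * time / max_time.
    alpha = 1 optimizes cost only; alpha = 0 optimizes time only."""
    candidates = []
    for comp in compositions(steps, len(sizes)):
        fractions = tuple(c / steps for c in comp)
        candidates.append((fractions, *evaluate(fractions, sizes, bw, price)))
    max_cost = max(c for _, c, _ in candidates) or 1.0  # avoid divide-by-zero
    max_time = max(t for _, _, t in candidates) or 1.0
    return min(
        candidates,
        key=lambda x: alpha * x[1] / max_cost + (1 - alpha) * x[2] / max_time,
    )


if __name__ == "__main__":
    sizes = [10.0, 10.0]               # GB of input stored in DC 0 and DC 1
    price = [[0.0, 0.09], [0.01, 0.0]]  # hypothetical egress prices, $/GB
    bw = [[1.0, 1.0], [0.2, 1.0]]       # hypothetical link bandwidths, GB/s
    print(place(sizes, bw, price, alpha=1.0))  # cost-only preference
    print(place(sizes, bw, price, alpha=0.0))  # time-only preference
```

With these made-up numbers, a cost-only preference moves all work to DC 0 (only the cheap DC 1 to DC 0 link is used), while a time-only preference puts most work in DC 1 because the DC 1 to DC 0 link is slow. Sweeping `alpha` between 0 and 1 traces intermediate points, which is the kind of cost-performance tradeoff space the abstract refers to; a real system would replace the grid search with a proper optimizer.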
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 |
Editors | Laurent Lefevre, Carlos A. Varela, George Pallis, Adel N. Toosi, Omer Rana, Rajkumar Buyya |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 649-658 |
Number of pages | 10 |
ISBN (Electronic) | 9781728160955 |
DOIs | |
State | Published - May 2020 |
Event | 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 - Melbourne, Australia Duration: May 11 2020 → May 14 2020 |
Publication series
Name | Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 |
---|---|
Conference
Conference | 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 5/11/20 → 5/14/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Data Analytics
- Multi-cloud
- Network Cost