Dynamically negotiating capacity between on-demand and batch clusters

Feng Liu; Kate Keahey; Pierre Riteau; Jon B Weissman

doi:10.1109/SC.2018.00041

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

In the era of rapid experimental expansion, data analysis needs are rapidly outpacing the capabilities of small institutional clusters, prompting efforts to integrate HPC resources into analysis workflows. We propose one way of reconciling the on-demand needs of experimental analytics with batch-managed HPC resources: a system that dynamically moves nodes between an on-demand cluster configured with cloud technology (OpenStack) and a traditional HPC cluster managed by a batch scheduler (Torque). We evaluate this system experimentally, both on real-life traces representing two years of a specific institutional need and on synthetic traces that capture generalized characteristics of potential batch and on-demand workloads. Our results for the real-life scenario show that our approach could reduce the current investment in on-demand infrastructure by 82% while improving the mean batch wait time by almost an order of magnitude (8x).
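
The idea of moving nodes between the two clusters can be illustrated with a toy model. The sketch below is not the authors' system and does not call OpenStack or Torque; the `Pools` type, the `rebalance` function, and the `reserve` parameter are all hypothetical names introduced here. It assumes a simple policy: keep enough idle nodes in the on-demand cluster to cover pending requests plus a small reserve, and hand everything else to the batch cluster. A real system would also have to drain or preempt batch jobs before reclaiming a node, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Pools:
    """Node counts per role (hypothetical model, not taken from the paper)."""
    on_demand_free: int  # idle nodes held by the on-demand (cloud) cluster
    on_demand_busy: int  # nodes currently serving on-demand requests
    batch: int           # nodes currently assigned to the batch scheduler


def rebalance(pools: Pools, pending_on_demand: int, reserve: int) -> Pools:
    """Return the pool sizes after one rebalancing step.

    pending_on_demand: number of nodes on-demand users are asking for now.
    reserve: idle nodes to keep ready for future on-demand requests
             (an assumed tuning knob, not a parameter from the paper).
    """
    wanted_free = pending_on_demand + reserve
    if pools.on_demand_free < wanted_free:
        # Shortfall: reclaim nodes from the batch cluster, as many as exist.
        moved = min(wanted_free - pools.on_demand_free, pools.batch)
        free = pools.on_demand_free + moved
        batch = pools.batch - moved
    else:
        # Surplus: lend idle on-demand nodes to the batch cluster.
        moved = pools.on_demand_free - wanted_free
        free = pools.on_demand_free - moved
        batch = pools.batch + moved
    # Serve as many pending on-demand requests as the free nodes allow.
    served = min(pending_on_demand, free)
    return Pools(free - served, pools.on_demand_busy + served, batch)


if __name__ == "__main__":
    start = Pools(on_demand_free=2, on_demand_busy=0, batch=10)
    print(rebalance(start, pending_on_demand=4, reserve=1))
```

Under this policy the total node count is conserved at every step, so the trade-off is only in how the fixed capacity is split between the two clusters.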

Original language	English (US)
Title of host publication	Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	493-503
Number of pages	11
ISBN (Electronic)	9781538683842
DOIs	https://doi.org/10.1109/SC.2018.00041
State	Published - Jul 2 2018
Event	2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States Duration: Nov 11 2018 → Nov 16 2018

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Keywords

Computers and information processing
Distributed computing
Grid computing
Metacomputing

Cite this

Liu, F., Keahey, K., Riteau, P., & Weissman, J. B. (2018). Dynamically negotiating capacity between on-demand and batch clusters. In Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 (pp. 493-503). Article 8665750 (Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SC.2018.00041

@inproceedings{649480a04e494bbc80e34b9b9aeae040,
title = "Dynamically negotiating capacity between on-demand and batch clusters",
abstract = "In the era of rapid experimental expansion, data analysis needs are rapidly outpacing the capabilities of small institutional clusters, prompting efforts to integrate HPC resources into analysis workflows. We propose one way of reconciling the on-demand needs of experimental analytics with batch-managed HPC resources: a system that dynamically moves nodes between an on-demand cluster configured with cloud technology (OpenStack) and a traditional HPC cluster managed by a batch scheduler (Torque). We evaluate this system experimentally, both on real-life traces representing two years of a specific institutional need and on synthetic traces that capture generalized characteristics of potential batch and on-demand workloads. Our results for the real-life scenario show that our approach could reduce the current investment in on-demand infrastructure by 82% while improving the mean batch wait time by almost an order of magnitude (8x).",
keywords = "Computers and information processing, Distributed computing, Grid computing, Metacomputing",
author = "Feng Liu and Kate Keahey and Pierre Riteau and Weissman, {Jon B}",
note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 ; Conference date: 11-11-2018 Through 16-11-2018",
year = "2018",
month = jul,
day = "2",
doi = "10.1109/SC.2018.00041",
language = "English (US)",
series = "Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "493--503",
booktitle = "Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018",
}

TY - GEN
T1 - Dynamically negotiating capacity between on-demand and batch clusters
AU - Liu, Feng
AU - Keahey, Kate
AU - Riteau, Pierre
AU - Weissman, Jon B
PY - 2018/7/2
Y1 - 2018/7/2
AB - In the era of rapid experimental expansion, data analysis needs are rapidly outpacing the capabilities of small institutional clusters, prompting efforts to integrate HPC resources into analysis workflows. We propose one way of reconciling the on-demand needs of experimental analytics with batch-managed HPC resources: a system that dynamically moves nodes between an on-demand cluster configured with cloud technology (OpenStack) and a traditional HPC cluster managed by a batch scheduler (Torque). We evaluate this system experimentally, both on real-life traces representing two years of a specific institutional need and on synthetic traces that capture generalized characteristics of potential batch and on-demand workloads. Our results for the real-life scenario show that our approach could reduce the current investment in on-demand infrastructure by 82% while improving the mean batch wait time by almost an order of magnitude (8x).
KW - Computers and information processing
KW - Distributed computing
KW - Grid computing
KW - Metacomputing
UR - http://www.scopus.com/inward/record.url?scp=85064125051&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064125051&partnerID=8YFLogxK
U2 - 10.1109/SC.2018.00041
DO - 10.1109/SC.2018.00041
M3 - Conference contribution
AN - SCOPUS:85064125051
T3 - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
SP - 493
EP - 503
BT - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Y2 - 11 November 2018 through 16 November 2018
ER -
