Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications

Feng Liu; Jon B. Weissman

doi:10.1145/2807591.2807610

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

In today's batch queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce mean application turnaround time by up to 48%.
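The abstract describes the decomposition idea only at a high level. As a rough illustration, and not the authors' algorithm, the Python sketch below splits one large processor request into subjobs sized to fit currently idle processor "holes" that a backfilling scheduler could fill, leaving the remainder queued as an ordinary request. The function name `bundle_subjobs`, the `min_size` threshold, the greedy largest-hole-first rule, and the representation of holes as plain processor counts are all hypothetical assumptions. The sketch ignores hole durations (the runtime limits that govern backfill) and how the application actually expands across subjobs.

```python
from typing import List, Tuple


def bundle_subjobs(requested: int, free_holes: List[int],
                   min_size: int = 1) -> Tuple[List[int], int]:
    """Split one large processor request into smaller subjobs.

    Hypothetical helper (not from the paper): each entry of `free_holes`
    is the number of idle processors in one backfill hole. Subjobs are
    carved out of these holes so part of the application can start now;
    whatever cannot be placed stays queued as a conventional request.

    Returns (sizes of subjobs that can start now, processors still queued).
    """
    if requested < 0 or min_size < 1:
        raise ValueError("requested must be >= 0 and min_size >= 1")
    started: List[int] = []
    remaining = requested
    # Greedy choice: use the largest holes first so fewer subjobs are needed.
    for hole in sorted(free_holes, reverse=True):
        if remaining == 0:
            break
        size = min(hole, remaining)
        if size < min_size:
            # Too small to be worth launching as a separate subjob.
            continue
        started.append(size)
        remaining -= size
    return started, remaining


if __name__ == "__main__":
    # A 256-processor job facing holes of 64, 32, 128 and 8 idle processors.
    print(bundle_subjobs(256, [64, 32, 128, 8], min_size=16))
    # → ([128, 64, 32], 32)
```

In this example three subjobs (128, 64 and 32 processors) could start immediately while the remaining 32 processors wait in the queue; the 8-processor hole is skipped because it falls below the assumed `min_size`. A real scheduler would also have to check that each hole lasts long enough for the subjob's runtime estimate, which this sketch deliberately omits.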

Original language	English (US)
Title of host publication	Proceedings of SC 2015
Subtitle of host publication	The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher	IEEE Computer Society
ISBN (Electronic)	9781450337236
DOIs	https://doi.org/10.1145/2807591.2807610
State	Published - Nov 15 2015
Event	International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, United States. Duration: Nov 15 2015 → Nov 20 2015

Publication series

Name	International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume	15-20-November-2015
ISSN (Print)	2167-4329
ISSN (Electronic)	2167-4337

Other

Other	International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Country/Territory	United States
City	Austin
Period	11/15/15 → 11/20/15

Keywords

HPC
elasticity
parallel job scheduling

Cite this

Liu, F., & Weissman, J. B. (2015). Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications. In Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis Article a33 (International Conference for High Performance Computing, Networking, Storage and Analysis, SC; Vol. 15-20-November-2015). IEEE Computer Society. https://doi.org/10.1145/2807591.2807610

Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications. / Liu, Feng; Weissman, Jon B.
Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 2015. a33 (International Conference for High Performance Computing, Networking, Storage and Analysis, SC; Vol. 15-20-November-2015).

Liu, F & Weissman, JB 2015, Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications. in Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis, a33, International Conference for High Performance Computing, Networking, Storage and Analysis, SC, vol. 15-20-November-2015, IEEE Computer Society, International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, United States, 11/15/15. https://doi.org/10.1145/2807591.2807610

Liu F, Weissman JB. Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications. In Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society. 2015. a33. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC). doi: 10.1145/2807591.2807610

Liu, Feng; Weissman, Jon B. / Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications. Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society, 2015. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC).

@inproceedings{ba574d4aac1b473db115c3190ca79cb7,
  title = "Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications",
  abstract = "In today's batch queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce mean application turnaround time by up to 48%.",
  keywords = "HPC, elasticity, parallel job scheduling",
  author = "Liu, Feng and Weissman, {Jon B.}",
  year = "2015",
  month = nov,
  day = "15",
  doi = "10.1145/2807591.2807610",
  language = "English (US)",
  series = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC",
  publisher = "IEEE Computer Society",
  booktitle = "Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis",
  note = "International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015; Conference date: 15-11-2015 through 20-11-2015",
}

TY  - GEN
T1  - Elastic job bundling: An adaptive resource request strategy for large-scale parallel applications
T2  - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
AU  - Liu, Feng
AU  - Weissman, Jon B.
PY  - 2015/11/15
Y1  - 2015/11/15
N2  - In today's batch queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce mean application turnaround time by up to 48%.
AB  - In today's batch queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce mean application turnaround time by up to 48%.
KW  - HPC
KW  - elasticity
KW  - parallel job scheduling
UR  - http://www.scopus.com/inward/record.url?scp=84966658965&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84966658965&partnerID=8YFLogxK
U2  - 10.1145/2807591.2807610
DO  - 10.1145/2807591.2807610
M3  - Conference contribution
AN  - SCOPUS:84966658965
T3  - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT  - Proceedings of SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
PB  - IEEE Computer Society
Y2  - 15 November 2015 through 20 November 2015
ER  -
