TY - GEN
T1 - Elastic job bundling
T2 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
AU - Liu, Feng
AU - Weissman, Jon B.
PY - 2015/11/15
Y1 - 2015/11/15
AB - In today's batch queue HPC cluster systems, the user submits a job requesting a fixed number of processors. The system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times due to this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously achieving progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce application mean turnaround time by up to 48%.
KW - HPC
KW - elasticity
KW - parallel job scheduling
UR - http://www.scopus.com/inward/record.url?scp=84966658965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966658965&partnerID=8YFLogxK
U2 - 10.1145/2807591.2807610
DO - 10.1145/2807591.2807610
M3 - Conference contribution
AN - SCOPUS:84966658965
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2015
PB - IEEE Computer Society
Y2 - 15 November 2015 through 20 November 2015
ER -