XPS: Full: CCA: Enhancing Scalability and Energy Efficiency in Extreme-Scale Parallel Systems through Application-Aware Communication Reduction

Project: Research project

Project Details

Description

The high-performance computing systems available today are far from meeting the performance and energy efficiency targets necessary to satisfy the needs of future computing systems. With processor clock frequencies leveling off over the past several years, the only way to achieve the expected levels of performance within the same physical size and the same or lower energy requirements is to have more processors computing in parallel. However, increasing concurrency among the processors increases inter-processor communication, which then becomes the critical bottleneck. This work proposes an application-centric approach to minimize communication-induced overheads in order to enable extreme scalability. The implicit fault tolerance of future parallel applications can be exploited to mitigate the degraded locality and increased communication costs of highly concurrent systems while ensuring an acceptable level of application output quality.

This work focuses on (i) how data to be transferred can be compressed, and possibly discarded, and (ii) how communication and synchronization can be avoided or relaxed in certain application contexts, all while maintaining an acceptable level of output quality for the specific application. The intellectual merit of this work stems from its exploration of cross-layer communication and synchronization techniques that exploit application characteristics to significantly reduce the cost of communication and synchronization. This approach will enable a wider array of applications to exploit parallelism and thereby increase their scalability. The work pursues extreme scalability of domain-specific applications on large-scale computing systems through four activities: quantifying the costs and benefits of application-aware communication and synchronization; designing application-aware scalable communication and synchronization primitives; developing software to support application-aware scalable communication; and designing new micro-architectural and system support for application-aware scalable communication.
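As a rough illustration of item (i), the sketch below shows one generic way a sender might trade a bounded loss of precision for a smaller message. It is not the project's method: the function names, the 16-bit encoding, and the `error_bound` parameter are all assumptions made for this example.

```python
import struct

# Hypothetical helpers for illustration only; not taken from the project.

def quantize(values, error_bound):
    """Map each float to the nearest integer bucket of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def dequantize(codes, error_bound):
    """Recover approximate floats; each differs from the original by at most error_bound."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

def pack(codes):
    """Serialize bucket indices as 16-bit signed integers (2 bytes per value).

    Assumes every index fits in the int16 range; struct.pack raises otherwise.
    """
    return struct.pack(f"<{len(codes)}h", *codes)

def unpack(payload):
    """Inverse of pack: read back the 16-bit bucket indices."""
    count = len(payload) // 2
    return list(struct.unpack(f"<{count}h", payload))

# Sender side: 4 doubles (32 bytes raw) become an 8-byte payload.
values = [0.0, 0.1234, -0.9876, 0.5]
error_bound = 0.001  # application-chosen tolerance (assumed)
payload = pack(quantize(values, error_bound))

# Receiver side: values are restored to within the tolerance.
restored = dequantize(unpack(payload), error_bound)
```

The application sets the acceptable output quality through `error_bound`; a larger tolerance narrows the range of bucket indices, which a real encoder could exploit with fewer bits per value.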

The broader impact of this research lies in significantly reducing the costs of communication and synchronization for a wide spectrum of applications and consequently enabling new levels of parallel scalability and energy efficiency across a range of application programs. This project will produce open source tools for analyzing and enhancing the scalability of parallel applications, and will provide new opportunities for both graduate and undergraduate students to participate in cutting-edge computer systems research.

Status: Finished
Effective start/end date: 9/1/14 to 8/31/20

Funding

  • National Science Foundation: $666,000.00
