CSR: Small: Collaborative Research: Dispersed Real-time Data Analytics

Project: Research project

Project Details

Description

Recent years have seen an explosion of data produced by a wide variety of mobile applications, sensors, and Internet of Things (IoT) devices spread across multiple geographic locations. This data must be aggregated and analyzed to gain real-time insights across several domains. There has also been a dramatic increase in the heterogeneity of the computing platform: compute and storage resources are available at end-user devices and data centers at the edge of the Internet, as well as at centralized clouds. This project proposes algorithms and architectures for leveraging such a dispersed computing platform to perform real-time processing of large volumes of data. Gaining real-time insights from large volumes of data is central to the digital economy, as information from users and sensors must be processed quickly to yield immediately actionable insights. This project aims to make such insights feasible, leading to greater productivity and user satisfaction. Data science and analytics is a key area of workforce demand in the US. The proposed educational and outreach activities include new course material and Research Experiences for Undergraduates (REU) programs designed to provide greater exposure to this area. This project also includes strong outreach initiatives to attract both women and under-represented minorities to data analytics research.

This project proposes a two-level architecture for a dispersed real-time analytics system across a large number of autonomous resource providers (ARPs): a global subsystem that allocates resources end-to-end across multiple ARPs, and a local subsystem that schedules resources within each ARP. The global resource allocation process introduces the notion of an aggregation tree that is used for discovering, allocating, and coordinating resources across multiple ARPs. The local resource allocation process orchestrates the movement of computation and data within an ARP. Across both levels, the project investigates techniques for handling dynamic variability in resources and for providing resilience to failure. All software developed as a result of this project will be clearly documented and open-sourced. For any datasets collected from public sources, sufficient metadata will be published to enable others to reuse the data for their own purposes. The technical reports and papers produced by the project will be published and shared with the general public. The information generated by the project will be maintained, preserved, and made available for the duration required by NSF. A comprehensive website will provide access to this information. The website URL is: http://geo-anal.cs.umn.edu.
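The aggregation tree mentioned above can be illustrated with a minimal sketch. The project description does not specify how the tree is built or what it computes, so everything here is an assumption made for illustration: the `Node` class, its `aggregate` method, and the choice of a (count, sum) partial aggregate are hypothetical and are not the project's design. Each node stands in for one ARP that summarizes its own data locally and merges the summaries of its children, so only small partial results travel up the tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one autonomous resource provider (ARP)."""
    name: str
    local_values: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def aggregate(self) -> Tuple[int, float]:
        """Return (count, sum) over this node's data and all descendants."""
        # Local step: summarize the data held at this node.
        count = len(self.local_values)
        total = sum(self.local_values)
        # Global step: merge the partial aggregates reported by children.
        for child in self.children:
            child_count, child_total = child.aggregate()
            count += child_count
            total += child_total
        return count, total


# Two assumed edge sites feeding one assumed central cloud node.
edge_a = Node("edge-a", local_values=[1.0, 2.0, 3.0])
edge_b = Node("edge-b", local_values=[4.0, 5.0])
cloud = Node("cloud", children=[edge_a, edge_b])

count, total = cloud.aggregate()
mean = total / count
print(f"count={count} sum={total} mean={mean}")
```

In this sketch the root never sees raw readings, only one (count, sum) pair per child. That is the property that makes tree-based aggregation attractive when data is spread across many sites.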

Status: Finished
Effective start/end date: 9/1/17 to 8/31/22

Funding

  • National Science Foundation: $258,000.00
