EAGER-DynamicData: Judicious Censoring, Random Sketching, and Efficient Validation for Learning Patterns from Dynamically-Changing and Large-Scale Data Sets

Project: Research project

Project Details

Description

Abstract. With pervasive sensors continuously collecting and recording massive amounts of information, there is no doubt this is an era of data deluge. Learning from these dynamic and large volumes of data is expected to bring significant science and engineering advances, along with consequent improvements in quality of life. The present early-concept grant for exploratory research aims to develop potentially transformative pattern recognition techniques that will be specifically tested on dynamically deforming (due to, e.g., patient motion) cardiac magnetic resonance images, as well as on information extraction from large-scale healthcare datasets. The big challenges this project addresses include the sheer volume of online and growing datasets, which makes it impossible to run analytics, especially in batch form, and the fact that large-scale datasets are inevitably noisy, dynamic, incomplete, prone to outliers and (un)intentional misses, and vulnerable to cyber-attacks. The project's large-scale analytics will also bring interdisciplinary benefits to environmental data mining, neuroscience, and the future power grid. At a broader scale, the developed technologies will provide valuable tools for foundational science and engineering research, promote societal adoption of emergent big data technologies, and support training of the next generation of data science professionals.

This early-concept grant for exploratory research aspires to tackle big data challenges by putting forth large-scale learning tools, and their performance analyses, that leverage two untested but potentially transformative ideas for extracting computationally affordable yet informative subsets of massive and dynamic datasets, namely i) adaptive censoring, and ii) random data sketching-and-validation. Data in this project can be stationary or nonstationary; they may become available in batch or sequential (a.k.a. online) modes; they can be collected in vectors, matrices, or general multi-way arrays (called tensors); noise, possibly outliers, and (un)intentional misses may be present; and data processing can be linear or nonlinear, in adaptive or non-adaptive modes. The proposed high-risk, high-payoff research lies at the intersection of essential big data tools, including compressive sampling, matrix and tensor completion, anomaly and outlier identification, and online and parallel optimization techniques. In accordance with the major inference tasks, three intertwined research thrusts will be pursued: T1) Adaptive censoring for large-scale regressions; T2) Subspace tracking and imputation for dynamic large-scale tensors; and T3) Sketch-and-validate for large-scale clustering and classification. The resulting tools will be tested on healthcare data and multi-dimensional magnetic resonance imaging, with the ultimate goal of acquiring, processing, and displaying high-resolution biomedical movies in real time.
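To make the adaptive-censoring idea of thrust T1 concrete, the following minimal Python sketch illustrates one common way such schemes operate: an online least-squares regression that discards (censors) incoming samples whose prediction error is small, updating only on informative data. The function name, the fixed threshold tau, and the LMS-style update are illustrative assumptions for exposition, not the project's actual algorithm or code.

import numpy as np

def censored_online_lms(X, y, tau=0.3, mu=0.05):
    """Illustrative adaptive-censoring sketch (not the project's algorithm):
    update the weights only on samples whose prediction error exceeds tau."""
    n, d = X.shape
    w = np.zeros(d)
    n_used = 0
    for t in range(n):
        x_t, y_t = X[t], y[t]
        err = y_t - x_t @ w            # instantaneous prediction error
        if abs(err) > tau:             # keep only "informative" samples
            w = w + mu * err * x_t     # LMS-style stochastic update
            n_used += 1
        # censored samples are neither stored nor used for updates
    return w, n_used

# Toy usage: once the estimate is accurate, most samples are censored.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(5000)
w_hat, n_used = censored_online_lms(X, y)
print(f"used {n_used}/5000 samples, error {np.linalg.norm(w_hat - w_true):.3f}")

The point of the sketch is the trade-off the abstract describes: by judiciously censoring low-information data, only a fraction of a massive stream needs to be processed while the estimate remains close to the batch solution.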

Status: Finished
Effective start/end date: 9/15/15 - 8/31/18

Funding

  • National Science Foundation: $300,000.00
