Collaborative Research: New Regression Models and Methods for Studying Multiple Categorical Responses

Project: Research project

Project Details

Description

In many areas of scientific study, including bioengineering, epidemiology, genomics, and neuroscience, an important task is to model the relationship between multiple categorical outcomes and a large number of predictors. In cancer research, for example, it is crucial to model whether a patient has cancer of subtype A, B, or C and high or low mortality risk given the expression of thousands of genes. However, existing statistical methods either cannot be applied, fail to capture the complex relationships between the response variables, or lead to models that are difficult to interpret and thus yield little scientific insight. The PIs will address this deficiency by developing multiple new statistical methods. For each new method, the PIs will provide theoretical justifications and fast computational algorithms. Along with graduate and undergraduate students, the PIs will also create publicly available software that will enable applications across both academia and industry.

This project aims to address a fundamental problem in multivariate categorical data analysis: how to parsimoniously model the joint probability mass function of many categorical random variables given a common set of high-dimensional predictors. The PIs will tackle this problem by using emerging technologies on tensor decompositions, dimension reduction, and both convex and non-convex optimization. The project focuses on three research directions: (1) a latent variable approach for the low-rank decomposition of a conditional probability tensor; (2) a new overlapping convex penalty for intrinsic dimension reduction in a multivariate generalized linear regression framework; and (3) a direct non-convex optimization-based approach for low-rank tensor regression utilizing explicit rank constraints on the Tucker tensor decomposition. Unlike the approach of regressing each (univariate) categorical response on the predictors separately, the new models and methods will allow practitioners to characterize the complex and often interesting dependencies between the responses.
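
As a rough illustration of the idea behind direction (1), and not the PIs' actual method, the joint probability mass function of two categorical responses given predictors can be written as a mixture over a small number of latent classes, which amounts to a low-rank decomposition of the conditional probability tensor. The sketch below assumes a softmax link for the latent class weights and randomly generated parameters; the names `joint_pmf`, `W`, `A`, and `B` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def joint_pmf(x, W, A, B):
    """P(Y1 = j, Y2 = k | x) under an assumed latent class model.

    W: (R, p) coefficients mapping predictors to latent class scores
    A: (R, J) rows are P(Y1 = j | class r)
    B: (R, K) rows are P(Y2 = k | class r)
    """
    pi = softmax(W @ x)  # (R,) latent class weights, depend on x
    # Sum over classes r of pi[r] * A[r, j] * B[r, k]
    return np.einsum("r,rj,rk->jk", pi, A, B)

# Hypothetical sizes: 5 predictors, 2 latent classes,
# 3 subtypes (A, B, C) and 2 risk levels (high, low).
rng = np.random.default_rng(0)
p, R, J, K = 5, 2, 3, 2
W = rng.normal(size=(R, p))
A = rng.dirichlet(np.ones(J), size=R)  # each row sums to 1
B = rng.dirichlet(np.ones(K), size=R)  # each row sums to 1
x = rng.normal(size=p)

P = joint_pmf(x, W, A, B)  # (J, K) table of joint probabilities
```

In this sketch the responses are conditionally independent within each latent class but dependent marginally, so the table `P` captures dependence between subtype and risk that separate univariate regressions would miss.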

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Active
Effective start/end date: 9/1/21 to 8/31/24

Funding

  • National Science Foundation: $96,410.00
