RI: Small: Learning 3D Equivariant Visual Representation for Animals

Project: Research project

Project Details

Description

Recent advances in computer vision make it possible to track humans in the wild with remarkable accuracy. Generalizing these approaches to diverse animal species, however, remains premature, despite the significant scientific and societal impact such a generalization would have on disciplines such as biology, neuroscience, and medicine. These approaches are built on a supervised learning paradigm that requires sizable annotated data, but collecting comparable annotated visual data for animal species is fundamentally infeasible: annotation requires expert knowledge, and species-specific images are in limited supply, leading to large biases in the tracking models. In this research project, the investigator will develop new computer vision theories and algorithms that can effectively leverage a large, potentially infinite, number of unlabeled images of animals. The developed fundamentals are generic and therefore readily applicable, with some modification, to related computer vision tasks such as 2D/3D human pose estimation, deformable object registration, and dense correspondence estimation. The project integrates research with education and outreach to K-12 students from under-represented groups through a series of programs.

While the primary focus of this research program is on learning a visual representation of animals, the project addresses a core computer vision problem: landmark localization (keypoint detection, pose estimation) given a limited amount of labeled data. The project will make use of 3D equivariance, an intrinsic property of visual data from articulated deformable objects, to uncover shared and repeatable visual relationships across views, time, and species. The investigator will integrate the proposed equivariance, via 3D reconstruction of animals, into representation learning, facilitating the transfer of visual information from one image to another. More specifically, the project will develop: (1) a new multiview geometry to learn the visual transformation across views, which allows cross-view self-supervision; (2) a re-formulation of non-rigid structure from motion, parametrized by 3D pose, to enable learning from monocular videos; and (3) a disentanglement of appearance and 3D pose to learn the visual transformation across species.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
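To make the cross-view self-supervision idea concrete, the sketch below (Python/NumPy, not drawn from the project's actual codebase) illustrates a minimal 3D-equivariance consistency term: 3D keypoints estimated from one view, rigidly transformed by the known relative camera pose, should reproject onto the 2D keypoints detected in a second view. All function and variable names here are illustrative assumptions.

```python
# Hypothetical sketch of a cross-view 3D-equivariance loss (NumPy).
# Names, shapes, and the loss form are illustrative assumptions,
# not the project's actual formulation.
import numpy as np

def project(points_3d, K):
    """Pinhole projection of N x 3 camera-frame points with intrinsics K."""
    uvw = points_3d @ K.T            # N x 3 homogeneous image points
    return uvw[:, :2] / uvw[:, 2:3]  # N x 2 pixel coordinates

def equivariance_loss(kpts_3d_a, kpts_2d_b, R_ab, t_ab, K_b):
    """Map 3D keypoints from view A into view B's camera frame and
    compare their projections against the 2D keypoints seen in view B."""
    kpts_3d_b = kpts_3d_a @ R_ab.T + t_ab   # rigid change of camera frame
    reproj_b = project(kpts_3d_b, K_b)
    return np.mean(np.sum((reproj_b - kpts_2d_b) ** 2, axis=1))

# Toy usage: view-B detections generated to be consistent with view A,
# so the loss is ~0 when the two views agree.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
pts_a = np.random.rand(17, 3) + np.array([0.0, 0.0, 2.0])  # 17 "joints" in front of camera
kpts_b = project(pts_a @ R.T + t, K)
print(equivariance_loss(pts_a, kpts_b, R, t, K))
```

In the self-supervised setting the abstract describes, minimizing such a consistency term across views would supply a training signal for the keypoint detector without manual labels; the same idea extends across time (video frames) and, with the proposed disentanglement, across species.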
Status: Active
Effective start/end date: 10/1/22 – 9/30/25

Funding

  • National Science Foundation: $501,659.00
