Data exploration by representative region selection: Axioms and convergence

Alexander S. Estes, Michael O. Ball, David J. Lovell

Research output: Contribution to journal › Article › peer-review

Abstract

We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.

Original language: English (US)
Pages (from-to): 970-1007
Number of pages: 38
Journal: Mathematics of Operations Research
Volume: 46
Issue number: 3
State: Published - Aug 2021

Bibliographical note

Publisher Copyright:
© 2021 INFORMS.

Keywords

  • Data analysis
  • Data summarization
  • Density estimation
  • Representative region selection
  • Unsupervised learning
