Data exploration by representative region selection: Axioms and convergence

Alexander S. Estes; Michael O. Ball; David J. Lovell

doi:10.1287/MOOR.2020.1115

Industrial and Systems Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.

Original language	English (US)
Pages (from-to)	970-1007
Number of pages	38
Journal	Mathematics of Operations Research
Volume	46
Issue number	3
DOIs	https://doi.org/10.1287/MOOR.2020.1115
State	Published - Aug 2021

Bibliographical note

Publisher Copyright:
© 2021 INFORMS.

Keywords

Data analysis
Data summarization
Density estimation
Representative region selection
Unsupervised learning

Cite this

@article{4861bede1aa542b5abd00b6c94880a59,

title = "Data exploration by representative region selection: Axioms and convergence",

abstract = "We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.",

keywords = "Data analysis, Data summarization, Density estimation, Representative region selection, Unsupervised learning",

author = "Estes, {Alexander S.} and Ball, {Michael O.} and Lovell, {David J.}",

note = "Publisher Copyright: {\textcopyright} 2021 INFORMS.",

year = "2021",

month = aug,

doi = "10.1287/MOOR.2020.1115",

language = "English (US)",

volume = "46",

pages = "970--1007",

journal = "Mathematics of Operations Research",

issn = "0364-765X",

publisher = "INFORMS Inst. for Operations Res. and the Management Sciences",

number = "3",

}

TY - JOUR

T1 - Data exploration by representative region selection

T2 - Axioms and convergence

AU - Estes, Alexander S.

AU - Ball, Michael O.

AU - Lovell, David J.

PY - 2021/8

Y1 - 2021/8

N2 - We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.

AB - We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.

KW - Data analysis

KW - Data summarization

KW - Density estimation

KW - Representative region selection

KW - Unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=85113889197&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85113889197&partnerID=8YFLogxK

U2 - 10.1287/MOOR.2020.1115

DO - 10.1287/MOOR.2020.1115

M3 - Article

AN - SCOPUS:85113889197

SN - 0364-765X

VL - 46

SP - 970

EP - 1007

JO - Mathematics of Operations Research

JF - Mathematics of Operations Research

IS - 3

ER -
