Bayesian consensus clustering

Eric F. Lock; David B. Dunson

doi:10.1093/bioinformatics/btt425

Biostatistics

Research output: Contribution to journal › Article › peer-review

176 Scopus citations

Abstract

Motivation: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources.

Results: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas.

Availability: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html.

Contact: Eric.Lock@duke.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
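As a rough illustration of what it could mean for source-specific clusterings to "adhere loosely" to a consensus clustering, the sketch below simulates labels for one data source that agree with the consensus label with some probability and otherwise fall into one of the other clusters uniformly at random. The function names `simulate_source_labels` and `agreement`, the adherence parameter `alpha`, and the uniform fallback are assumptions made for illustration only. The abstract does not state the form of the model, so this is not the authors' implementation and it performs no Bayesian estimation.

```python
import random


def simulate_source_labels(consensus, n_clusters, alpha, seed=0):
    """Draw source-specific labels that loosely follow consensus labels.

    Hypothetical sketch: each object keeps its consensus label with
    probability alpha; otherwise it is assigned one of the remaining
    clusters uniformly at random.
    """
    if n_clusters < 2:
        raise ValueError("n_clusters must be at least 2")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    labels = []
    for c in consensus:
        if rng.random() < alpha:
            # Adhere to the consensus cluster.
            labels.append(c)
        else:
            # Deviate: pick any other cluster with equal probability.
            others = [k for k in range(n_clusters) if k != c]
            labels.append(rng.choice(others))
    return labels


def agreement(a, b):
    """Fraction of objects that carry the same label in both clusterings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy consensus clustering: 300 objects spread evenly over 3 clusters.
consensus = [i % 3 for i in range(300)]
for alpha in (1.0, 0.8, 1 / 3):
    source = simulate_source_labels(consensus, 3, alpha)
    print(alpha, round(agreement(consensus, source), 2))
```

Under these assumptions, `alpha = 1` reproduces the consensus exactly, which corresponds to a single joint clustering, while `alpha = 1 / n_clusters` makes the source labels uninformative about the consensus, which corresponds to clustering each source independently. Values in between give the intermediate behaviour the abstract describes.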

Original language	English (US)
Pages (from-to)	2610-2616
Number of pages	7
Journal	Bioinformatics
Volume	29
Issue number	20
DOIs	https://doi.org/10.1093/bioinformatics/btt425
State	Published - Oct 15 2013

Cite this

@article{ab41b4696926459ca5db2a52d746b4c1,

title = "Bayesian consensus clustering",

abstract = "Motivation: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources. Results: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. Availability: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html. Contact: Eric.Lock@duke.edu Supplementary information: Supplementary data are available at Bioinformatics online.",

author = "Lock, {Eric F.} and Dunson, {David B.}",

year = "2013",

month = oct,

day = "15",

doi = "10.1093/bioinformatics/btt425",

language = "English (US)",

volume = "29",

pages = "2610--2616",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "20",

}

TY - JOUR

T1 - Bayesian consensus clustering

AU - Lock, Eric F.

AU - Dunson, David B.

PY - 2013/10/15

Y1 - 2013/10/15

AB - Motivation: In biomedical research a growing number of platforms and technologies are used to measure diverse but related information, and the task of clustering a set of objects based on multiple sources of data arises in several applications. Most current approaches to multisource clustering either independently determine a separate clustering for each data source or determine a single 'joint' clustering for all data sources. There is a need for more flexible approaches that simultaneously model the dependence and the heterogeneity of the data sources. Results: We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source independently. We present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas. Availability: R code with instructions and examples is available at http://people.duke.edu/%7Eel113/software.html. Contact: Eric.Lock@duke.edu Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=84885617335&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885617335&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btt425

DO - 10.1093/bioinformatics/btt425

M3 - Article

C2 - 23990412

AN - SCOPUS:84885617335

SN - 1367-4803

VL - 29

SP - 2610

EP - 2616

JO - Bioinformatics

JF - Bioinformatics

IS - 20

ER -
