Model-based overlapping clustering

Arindam Banerjee, Chase Krumpelman, Joydeep Ghosh, Sugato Basu, Raymond J. Mooney

Research output: Chapter in Book/Report/Conference proceedingConference contribution

164 Scopus citations

Abstract

While the vast majority of clustering algorithms are partitional, many real world datasets have inherently overlapping clusters. Several approaches to finding overlapping clusters have come from work on analysis of biological datasets. In this paper, we interpret an overlapping clustering model proposed by Segal et al. [23] as a generalization of Gaussian mixture models, and we extend it to an overlapping clustering model based on mixtures of any regular exponential family distribution and the corresponding Bregman divergence. We provide the necessary algorithm modifications for this extension, and present results on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.

Original languageEnglish (US)
Title of host publicationKDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
EditorsR.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya
Pages532-537
Number of pages6
DOIs
StatePublished - Dec 1 2005
EventKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005Aug 24 2005

Other

OtherKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/TerritoryUnited States
CityChicago, IL
Period8/21/058/24/05

Keywords

  • Bregman divergences
  • Exponential model
  • Graphical model
  • High-dimensional clustering
  • Overlapping clustering

Fingerprint

Dive into the research topics of 'Model-based overlapping clustering'. Together they form a unique fingerprint.

Cite this