Discovering identities in web contexts with unsupervised clustering

Ted Pedersen, Anagha Kulkarni

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and with dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to use as a gold standard.

Original language: English (US)
Pages: 23-30
Number of pages: 8
State: Published - 2007
Event: IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007 - Hyderabad, India
Duration: Jan 8 2007 to Jan 8 2007
