Discovering identities in web contexts with unsupervised clustering

Ted Pedersen; Anagha Kulkarni

Computer Science (Duluth)

Research output: Contribution to conference › Paper › peer-review

8 Scopus citations

Abstract

We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and with dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to use as a gold standard.

Original language	English (US)
Pages	23-30
Number of pages	8
State	Published - 2007
Event	IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007 - Hyderabad, India
Duration	Jan 8 2007 → Jan 8 2007

Other

Other	IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007
Country/Territory	India
City	Hyderabad
Period	1/8/07 → 1/8/07

Cite this

@conference{660f8857971e4883bede73e03d27b4a3,

title = "Discovering identities in web contexts with unsupervised clustering",

abstract = "We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and with dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to use as a gold standard.",

author = "Ted Pedersen and Anagha Kulkarni",

year = "2007",

language = "English (US)",

pages = "23--30",

note = "IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007 ; Conference date: 08-01-2007 Through 08-01-2007",

}

TY - CONF

T1 - Discovering identities in web contexts with unsupervised clustering

AU - Pedersen, Ted

AU - Kulkarni, Anagha

PY - 2007

Y1 - 2007

N2 - We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and with dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to use as a gold standard.

AB - We describe the application of unsupervised clustering methodologies to the problem of discriminating among ambiguous names found in short passages of text that appear on Web pages. We show how to tailor these methods to handle the very noisy data that we typically find on the Web. We experiment with several variations in feature selection, two methods that automatically determine the number of clusters in the data, two different representations of the contexts to be discriminated, and with dimensionality reduction. Our evaluation is carried out using Web contexts for five different ambiguous names that were manually disambiguated to use as a gold standard.

UR - http://www.scopus.com/inward/record.url?scp=67149124261&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67149124261&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:67149124261

SP - 23

EP - 30

T2 - IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data, AND 2007

Y2 - 8 January 2007 through 8 January 2007

ER -
