Frequent sub-structure-based approaches for classifying chemical compounds

Mukund Deshpande, Michihiro Kuramochi, George Karypis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

86 Scopus citations

Abstract

In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that all relevant sub-structures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 10% to 35%.
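The decoupled pipeline described in the abstract (discover frequent sub-structures first, then select discriminating ones for the classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes toy compounds represented as sets of labeled bonds, with single bonds standing in for the larger subgraphs a real frequent subgraph miner would enumerate, and it uses a simple class-support difference as a stand-in for the paper's feature selection. The names `frequent_fragments`, `select_discriminating`, and `to_features` are hypothetical.

```python
from collections import Counter

# Assumption: each compound is a set of labeled bonds (atom, bond, atom).
# Real frequent subgraph discovery finds larger connected subgraphs;
# single bonds are used here only to keep the sketch short.
compounds = [
    {("C", "-", "C"), ("C", "=", "O")},
    {("C", "-", "C"), ("C", "=", "O"), ("C", "-", "N")},
    {("C", "-", "C"), ("C", "-", "Cl")},
    {("C", "-", "C"), ("C", "-", "Cl"), ("C", "-", "N")},
]
labels = [1, 1, 0, 0]  # hypothetical class labels (1 = active, 0 = inactive)


def frequent_fragments(compounds, min_support):
    """Step 1: discovery. Keep fragments present in at least
    min_support (a fraction) of all compounds, ignoring class labels."""
    counts = Counter(frag for c in compounds for frag in c)
    threshold = min_support * len(compounds)
    return sorted(f for f, n in counts.items() if n >= threshold)


def select_discriminating(fragments, compounds, labels, top_k):
    """Step 2: feature selection. Rank fragments by the absolute
    difference between their support in the two classes."""
    pos = [c for c, y in zip(compounds, labels) if y == 1]
    neg = [c for c, y in zip(compounds, labels) if y == 0]

    def score(frag):
        p = sum(frag in c for c in pos) / len(pos)
        n = sum(frag in c for c in neg) / len(neg)
        return abs(p - n)

    return sorted(fragments, key=score, reverse=True)[:top_k]


def to_features(compound, fragments):
    """Step 3: encode a compound as a binary vector over the chosen fragments."""
    return [1 if f in compound else 0 for f in fragments]


frequent = frequent_fragments(compounds, min_support=0.5)
selected = select_discriminating(frequent, compounds, labels, top_k=2)
vectors = [to_features(c, selected) for c in compounds]
```

The resulting binary vectors could then be passed to any standard classifier. Because discovery runs before model construction, the selection step sees every frequent fragment at once, which is the property the abstract highlights.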

Original language: English (US)
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
Pages: 35-42
Number of pages: 8
State: Published - 2003
Event: 3rd IEEE International Conference on Data Mining, ICDM '03 - Melbourne, FL, United States
Duration: Nov 19 2003 to Nov 22 2003

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 3rd IEEE International Conference on Data Mining, ICDM '03
Country/Territory: United States
City: Melbourne, FL
Period: 11/19/03 to 11/22/03
