Frequent sub-structure-based approaches for classifying chemical compounds

Mukund Deshpande; Michihiro Kuramochi; George Karypis

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

86 Scopus citations

Abstract

In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on the average, outperforms existing schemes by 10% to 35%.
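The decoupling described in the abstract (discover frequent sub-structures first, then let feature selection pick the discriminating ones before any classifier is built) can be illustrated with a minimal sketch. This is not the authors' algorithm: real frequent subgraph discovery is replaced here by pre-enumerated fragment strings, and the function names, the toy compounds, the `min_support` and `k` parameters, and the scoring rule (absolute difference in per-class occurrence rate) are all assumptions made for illustration.

```python
from collections import Counter


def frequent_features(compounds, min_support):
    """Return fragments present in at least a min_support fraction of compounds.

    Each compound is a set of hypothetical fragment strings; a real system
    would obtain these from a frequent subgraph miner.
    """
    counts = Counter(f for frags in compounds for f in set(frags))
    threshold = min_support * len(compounds)
    return {f for f, c in counts.items() if c >= threshold}


def select_discriminating(compounds, labels, features, k):
    """Keep the k features whose occurrence rate differs most between classes.

    The score |rate in positives - rate in negatives| is an assumed,
    deliberately simple stand-in for a real feature selection criterion.
    """
    pos = [c for c, y in zip(compounds, labels) if y == 1]
    neg = [c for c, y in zip(compounds, labels) if y == 0]

    def rate(group, f):
        return sum(f in c for c in group) / len(group) if group else 0.0

    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(features, key=lambda f: (-abs(rate(pos, f) - rate(neg, f)), f))
    return ranked[:k]


def to_vectors(compounds, selected):
    """Encode each compound as a binary vector over the selected features."""
    return [[int(f in c) for f in selected] for c in compounds]


# Toy data: four compounds described by made-up fragment labels.
compounds = [
    {"C-C", "C=O", "C-N"},
    {"C-C", "C=O"},
    {"C-C", "C-Cl"},
    {"C-C", "C-Cl", "C-N"},
]
labels = [1, 1, 0, 0]

features = frequent_features(compounds, min_support=0.5)
selected = select_discriminating(compounds, labels, features, k=2)
vectors = to_vectors(compounds, selected)
```

The resulting binary vectors could be passed to any off-the-shelf classifier, which is the point of keeping the discovery step separate from model construction.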

Original language	English (US)
Title of host publication	Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
Pages	35-42
Number of pages	8
State	Published - 2003
Event	3rd IEEE International Conference on Data Mining, ICDM '03 - Melbourne, FL, United States Duration: Nov 19 2003 → Nov 22 2003

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)	1550-4786

Other

Other	3rd IEEE International Conference on Data Mining, ICDM '03
Country/Territory	United States
City	Melbourne, FL
Period	11/19/03 → 11/22/03

Cite this

Frequent sub-structure-based approaches for classifying chemical compounds. / Deshpande, Mukund; Kuramochi, Michihiro; Karypis, George.
Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003. 2003. p. 35-42 (Proceedings - IEEE International Conference on Data Mining, ICDM).

Deshpande, M, Kuramochi, M & Karypis, G 2003, Frequent sub-structure-based approaches for classifying chemical compounds. in Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003. Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 35-42, 3rd IEEE International Conference on Data Mining, ICDM '03, Melbourne, FL, United States, 11/19/03.

@inproceedings{b2bf8c08fea54569b4d87e0dfb7064d7,
  title = "Frequent sub-structure-based approaches for classifying chemical compounds",
  abstract = "In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on the average, outperforms existing schemes by 10% to 35%.",
  author = "Mukund Deshpande and Michihiro Kuramochi and George Karypis",
  year = "2003",
  language = "English (US)",
  isbn = "0769519784",
  series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
  pages = "35--42",
  booktitle = "Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003",
  note = "3rd IEEE International Conference on Data Mining, ICDM '03 ; Conference date: 19-11-2003 Through 22-11-2003",
}

TY - GEN
T1 - Frequent sub-structure-based approaches for classifying chemical compounds
AU - Deshpande, Mukund
AU - Kuramochi, Michihiro
AU - Karypis, George
PY - 2003
Y1 - 2003
AB - In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that during classification model construction, all relevant sub-structures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on the average, outperforms existing schemes by 10% to 35%.
UR - http://www.scopus.com/inward/record.url?scp=34547984408&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547984408&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:34547984408
SN - 0769519784
SN - 9780769519784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 35
EP - 42
BT - Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
T2 - 3rd IEEE International Conference on Data Mining, ICDM '03
Y2 - 19 November 2003 through 22 November 2003
ER -
