Estimating missed actual positives using independent classifiers

Sandeep Mane, Jaideep Srivastava, San Yih Hwang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Data mining is increasingly being applied in environments with a very high rate of data generation, such as network intrusion detection [7], where routers generate about 300,000 to 500,000 connections every minute. In such rare-class data domains, the cost of missing a rare-class instance is much higher than the cost of missing an instance of any other class. However, the high cost of manually labeling instances, the high rate at which data is collected, and real-time response constraints do not always allow one to determine the actual classes for the collected unlabeled datasets. In our previous work [9], this problem of missed false negatives was explained in the context of two different domains: "network intrusion detection" and "business opportunity classification". In such cases, an estimate of the number of missed high-cost, rare instances aids in evaluating the performance of the modeling technique (e.g. classification) used. A capture-recapture method was used for estimating false negatives, using two or more learning methods (i.e. classifiers). This paper focuses on the dependence between the class labels assigned by such learners. We define conditional independence of classifiers given a class label and show its relation to the conditional independence of the feature sets (used by the classifiers) given a class label. The latter is a computationally expensive problem; hence, a heuristic algorithm is proposed for obtaining conditionally independent (or less dependent) feature sets for the classifiers. Initial results of this algorithm on synthetic datasets are promising, and further research is being pursued.
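
The abstract does not state which capture-recapture estimator the authors use. As an illustration only, the sketch below applies the classical two-sample Lincoln-Petersen estimator, a standard capture-recapture formula, to two classifiers. It assumes that every flagged instance is a verified true positive and that the two classifiers are conditionally independent given the positive class. The function name and inputs are hypothetical and are not taken from the paper.

```python
def estimate_missed_positives(flagged_a, flagged_b):
    """Lincoln-Petersen sketch (assumed estimator, not the paper's own).

    flagged_a, flagged_b: iterables of instance IDs that classifier A and
    classifier B each labeled positive (assumed to be true positives).
    Returns (estimated total positives, estimated positives missed by both).
    """
    a, b = set(flagged_a), set(flagged_b)
    both = len(a & b)  # "recaptured": positives found by both classifiers
    if both == 0:
        raise ValueError("no overlap between classifiers; estimate is undefined")
    # Under independence, |A| / N is approximately |A and B| / |B|,
    # so N is approximately |A| * |B| / |A and B|.
    total_est = len(a) * len(b) / both
    found = len(a | b)  # positives found by at least one classifier
    return total_est, total_est - found


# Example: A flags IDs 0..59, B flags IDs 30..89, and they agree on 30 IDs.
total, missed = estimate_missed_positives(range(0, 60), range(30, 90))
print(total, missed)
```

If the classifiers are positively dependent given the class label, the overlap is inflated and this estimator undercounts the missed positives, which is why the dependence between classifiers matters for the estimate.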

Original language: English (US)
Title of host publication: KDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: R.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya
Pages: 648-653
Number of pages: 6
State: Published - Dec 1 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005 to Aug 24 2005

Other

Other: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory: United States
City: Chicago, IL
Period: 8/21/05 to 8/24/05

Keywords

  • Capture-recapture method
  • Conditional independence of classifiers given class label
  • Conditional independence of features given class label
  • Conditional mutual information
  • False negative
