Performance characterization of data mining benchmarks

Vineeth Mekkat, Ragavendra Natarajan, Wei Chung Hsu, Antonia Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Explosive growth in the availability of various kinds of data in both commercial and scientific domains has resulted in an unprecedented need to develop novel data-driven knowledge discovery techniques. Data mining is one such data-centric application. It consists of methods to discover interesting, nontrivial, and useful patterns hidden within massive amounts of data. Researchers from both academia and industry have recognized that the challenges of data mining applications will help shape the future of multi-core processor and parallelizing compiler designs. However, relatively little has been done to understand the performance characteristics of these applications on modern multi-core processors. The exponential growth of on-chip resources makes it critical to exploit parallelism at all granularities to improve the performance of data mining applications. In this paper, we examine the instruction-level, memory-level, and thread-level parallelism available in data mining applications. We observe that (i) data mining applications have a slightly different instruction mix from SPEC integer applications, and this difference can potentially lead to different ILP extraction; (ii) although many data mining applications suffer from data cache miss penalties, similar to SPEC integer applications, different techniques must be developed to enable effective prefetching due to the existence of complex and irregular data structures, such as hash tables; (iii) although data mining applications have a large amount of thread-level parallelism, efficient extraction of such parallelism depends on on-chip cache performance; and (iv) the performance characteristics of data mining applications can vary at runtime, and thus techniques that dynamically tune the applications to adapt to such variations are desired.

Original language: English (US)
Title of host publication: INTERACT-14 - Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
State: Published - 2010
Event: 2010 Workshop on Interaction between Compilers and Computer Architecture, INTERACT-14 - Pittsburgh, PA, United States
Duration: Mar 13 2010 → Mar 13 2010

Publication series

Name: Proceedings - Annual Workshop on Interaction between Compilers and Computer Architectures, INTERACT
ISSN (Print): 1550-6207

Conference

Conference: 2010 Workshop on Interaction between Compilers and Computer Architecture, INTERACT-14
Country/Territory: United States
City: Pittsburgh, PA
Period: 3/13/10 → 3/13/10
