Model-agnostic Methods for Text Classification with Inherent Noise

Kshitij Tayal, Rahul Ghosh, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Text classification is a fundamental problem, and deep neural networks (DNNs) have recently shown promising results on many natural language tasks. However, their human-level performance relies on high-quality annotations, which are time-consuming and expensive to collect. As we move towards large, inexpensive datasets, the inherent label noise degrades the generalization of DNNs. While most machine learning literature focuses on building complex networks to handle noise, in this work we evaluate model-agnostic methods for handling inherent noise in large-scale text classification that can be easily incorporated into existing machine learning workflows with minimal disruption. Specifically, we conduct a point-by-point comparative study of several noise-robust methods on three datasets, encompassing three popular classification models. To our knowledge, this is the first such comprehensive study in text classification covering popular models and model-agnostic loss methods. We describe our findings and demonstrate the application of our approach, which outperformed baselines by up to 10% in classification accuracy while requiring no network modifications. Code for this paper is hosted at www.kshitijtayal.com/code/model-agnostic-methods.
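The abstract's central idea is that a noise-robust loss can replace standard cross-entropy without any change to the network. As an illustration (not necessarily one of the specific methods evaluated in the paper), the sketch below implements the Generalized Cross Entropy loss of Zhang and Sabuncu (2018), a well-known model-agnostic loss of this kind; the function name and example values are chosen here for demonstration only:

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized Cross Entropy loss: L_q = (1 - p_y^q) / q.
    Interpolates between cross-entropy (as q -> 0) and the
    noise-robust mean absolute error (q = 1), so it can be
    swapped in for CE with no changes to the model itself."""
    # Probability the model assigned to each sample's true class.
    p_y = probs[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y ** q) / q)

# Toy example: two samples, three classes, softmax outputs.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = gce_loss(probs, labels)
```

Because the loss only consumes the model's output probabilities and the labels, the same function applies unchanged to any of the classifiers compared in the study.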

Original language: English (US)
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Industry Track
Editors: Ann Clifton, Courtney Napoles
Publisher: Association for Computational Linguistics (ACL)
Pages: 202-213
Number of pages: 12
ISBN (Electronic): 9781952148293
State: Published - 2020
Event: 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: Dec 12 2020 → …

Publication series

Name: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Industry Track

Conference

Conference: 28th International Conference on Computational Linguistics, COLING 2020
Country/Territory: Spain
City: Virtual, Online
Period: 12/12/20 → …

Bibliographical note

Funding Information:
This research was supported by the National Science Foundation under grants 1838159 and 1739191. Access to computing facilities was provided by the University of Minnesota Supercomputing Institute.

Publisher Copyright:
© COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Industry Track.
