Abstract
Supervised learning scenarios, where labels and features are possibly mismatched, have been an emerging concern in machine learning applications. For example, researchers often need to align heterogeneous data from multiple resources to the same entities without a unique identifier in the socioeconomic study. Such a mismatch problem can significantly affect the learning performance if it is not appropriately addressed. Due to the combinatorial nature of the mismatch problem, existing methods are often designed for small datasets and simple linear models but are not scalable to large-scale datasets and complex models. In this paper, we first present a new formulation of the mismatch problem that supports continuous optimization problems and allows for gradient-based methods. Moreover, we develop a computation and memory efficient method to process complex data and models. Empirical studies on synthetic and real-world data show significantly better performance of the proposed algorithms than state-of-the-art methods.
Original language | English (US) |
---|---|
Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4228-4232 |
Number of pages | 5 |
ISBN (Electronic) | 9781665405409 |
DOIs | |
State | Published - 2022 |
Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore Duration: May 23 2022 → May 27 2022 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2022-May |
ISSN (Print) | 1520-6149 |
Conference
Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
---|---|
Country/Territory | Singapore |
City | Virtual, Online |
Period | 5/23/22 → 5/27/22 |
Bibliographical note
Funding Information:This paper is based upon work supported by the Cisco Research and the National Science Foundation under grant number DMS-2134148.
Publisher Copyright:
© 2022 IEEE
Keywords
- Mismatched Data
- Permutation Matrix
- Stochastic Gradient Descent (SGD)
- Supervised Learning