Collaborative Research: SaTC: CORE: Small: Foundations for the Next Generation of Private Learning Systems


Project Details

Description

Recent advances in large-scale machine learning (ML) promise a range of benefits to society, but they also introduce new risks. One major risk is a loss of privacy for the individuals whose data powers machine learning algorithms. There are now convincing demonstrations that machine learning algorithms can reveal sensitive information about individuals in their training data, either by memorizing specific strings of sensitive text such as bank account numbers or through membership-inference attacks. In recent years, a framework called differential privacy---a mathematically principled, quantitative notion of what it means for an algorithm to ensure privacy for the individuals who contribute training data---has led to significant progress toward privacy in machine learning. This progress offers a proof-of-concept that we can hope to enjoy some of the benefits of using machine learning on sensitive data while measuring and limiting breaches of confidentiality. This project will investigate and begin to make the fundamental advances necessary to make differentially private ML a viable technology. The focus will be on laying the groundwork for differentially private ML for entire systems, rather than for the standalone tasks that have been the focus of prior work. The project team, comprising researchers with a broad range of expertise in ML, algorithms, systems, and cybersecurity, has planned a set of education activities: a public-facing set of course materials on differentially private machine learning and statistics, and an undergraduate-level textbook on differential privacy.
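For context, the guarantee that the differential privacy framework provides can be stated precisely. The following is the standard (ε, δ) definition from the literature, not a formulation specific to this award: a randomized algorithm M is (ε, δ)-differentially private if, for every pair of datasets D and D' differing in a single individual's record, and every set of outcomes S,

\[ \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta. \]

Smaller ε and δ mean the algorithm's output reveals less about whether any one individual's data was present.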

This project includes three technical thrusts that will lay the groundwork for future efforts to build private ML systems. The first thrust will improve the foundational algorithms that enable differentially private ML on high-dimensional data. The second thrust will build a bridge between algorithms for standalone ML tasks and algorithms for systems-level workloads of ML tasks by developing differentially private algorithms for training many personalized models, a paradigmatic workload in ML. The final thrust will consist of empirical work on auditing differentially private ML methods to understand how real-world privacy costs compare to those predicted by the theory of differential privacy when these algorithms are used as part of realistic workloads, such as models that are continually updated with new data. This privacy auditing will facilitate detecting unwanted memorization of training data in machine learning and will provide more quantitative approaches to auditing differentially private algorithms based on membership inference and data poisoning.
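To illustrate the kind of privacy auditing the final thrust describes, here is a minimal Python sketch (all names, such as dp_sum and audit, are hypothetical; this is not the project's actual methodology). It audits a toy differentially private mechanism via membership inference: it repeatedly runs the mechanism with and without a target record and converts the attacker's true- and false-positive rates into an empirical lower bound on ε.

# Hypothetical sketch of a membership-inference privacy audit.
# We audit a toy DP mechanism (a Laplace-noised sum) rather than a real
# ML training run; names and thresholds are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def dp_sum(data, epsilon, sensitivity=1.0):
    """Laplace mechanism: release the sum of values in [0, 1] with DP noise."""
    return data.sum() + rng.laplace(scale=sensitivity / epsilon)

def audit(epsilon, n=100, trials=20000):
    """Estimate an empirical lower bound on epsilon via membership inference.

    Each trial: fix a base dataset, flip a coin for whether a target record
    (value 1.0) is included, release the DP sum, and let the attacker guess
    membership by thresholding the output.
    """
    base = rng.random(n)                 # records the attacker already knows
    threshold = base.sum() + 0.5         # midpoint between the two worlds
    tp = fp = in_trials = out_trials = 0
    for _ in range(trials):
        member = rng.random() < 0.5
        data = np.append(base, 1.0) if member else base
        guess = dp_sum(data, epsilon) > threshold
        if member:
            in_trials += 1
            tp += guess
        else:
            out_trials += 1
            fp += guess
    tpr = tp / in_trials
    fpr = max(fp / out_trials, 1 / out_trials)  # avoid division by zero
    # Crude point estimate; a careful audit would add confidence intervals.
    return np.log(tpr / fpr)

for eps in (0.5, 1.0, 2.0):
    print(f"claimed eps = {eps:.1f}, empirical lower bound ~ {audit(eps):.2f}")

In this worst-case setup the attacker knows every record except the target's, which is what lets a simple threshold test translate into a bound on the privacy loss; the empirical estimates should stay below the claimed ε when the mechanism's guarantee holds.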

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 10/1/21 – 9/30/23

Funding

  • National Science Foundation: $115,995.00
