Threat Models for Differential Privacy

A look at central, local, and hybrid models

Published: Monday, April 12, 2021 - 12:03

It’s not so simple to deploy a practical system that satisfies differential privacy. Our example in the last post was a simple Python program that adds Laplace noise to a function computed over the sensitive data. For this to work in practice, we’d need to collect all the sensitive data on one server to run our program.
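That program isn’t reproduced here, but a minimal sketch of the idea might look like the following; the ages list, the counting query of sensitivity 1, and the epsilon value of 1.0 are illustrative assumptions rather than details from the earlier post.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Differentially private count: the true count plus Laplace noise.

    A counting query has sensitivity 1, so noise drawn with scale 1/epsilon
    is enough to satisfy epsilon-differential privacy for the result.
    """
    true_count = sum(1 for x in data if predicate(x))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative sensitive data, all collected on a single server
ages = [31, 45, 27, 62, 54, 38]
print(laplace_count(ages, lambda age: age > 40, epsilon=1.0))
```

Note that the raw ages list has to live on the server running this code, which is exactly the exposure considered next.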

What if that server gets hacked? Differential privacy provides no protection in this case—it only protects the output of our program.

When deploying differentially private systems, it’s important to consider the threat model—that is, what kind of adversaries we want the system to protect against. If the threat model includes adversaries who might compromise the server holding the sensitive data, then we need to modify the system to protect against this kind of attack.

The design of systems for differential privacy thus needs to consider both privacy and security. As described in our last post, “privacy” refers to controlling what can be inferred from a data release. We can think of security as an orthogonal issue: security mechanisms control who is allowed to access a piece of data, but make no guarantees about what can be learned from that data.

Central differential privacy

The most commonly used threat model in differential privacy research is called the “central model of differential privacy” (or simply, “central differential privacy”).

The key component of the central model is a trusted data curator. Each individual submits their sensitive data to the data curator, who stores all the data in a central location (i.e., on a single server). The data curator is trusted: We assume they will not look at the sensitive data directly, will not share it with anyone, and cannot be compromised by any other adversary. In other words, with this model we assume that the server holding the sensitive data cannot be hacked.

In the central model, we typically add noise to query results, as we did in our Laplace mechanism example. The advantage of this model is that it allows algorithms to add the smallest possible amount of noise, and therefore produce results with the maximum accuracy allowed under differential privacy. The figure below demonstrates this process. We place the privacy barrier in between the trusted data curator and the analyst; to the right of the privacy barrier, only differentially private results can be viewed, so the analyst does not need to be trusted.

Figure 1: Central model of differential privacy
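One way to picture this privacy barrier is as a curator object that holds the raw data and exposes only differentially private query methods; the class below is an illustrative sketch rather than the interface of any particular library.

```python
import numpy as np

class TrustedCurator:
    """Illustrative trusted curator: raw data stays inside, only noisy answers leave."""

    def __init__(self, data):
        self._data = data  # sensitive data never crosses the privacy barrier

    def dp_count(self, predicate, epsilon):
        # A single Laplace sample of scale 1/epsilon protects the whole
        # result of this sensitivity-1 counting query.
        true_count = sum(1 for x in self._data if predicate(x))
        return true_count + np.random.laplace(scale=1.0 / epsilon)

# The analyst, on the untrusted side of the barrier, sees only noisy results.
curator = TrustedCurator([31, 45, 27, 62, 54, 38])  # assumed example data
print(curator.dp_count(lambda age: age > 40, epsilon=1.0))
```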

The disadvantage of the central model is that it requires a trusted data curator—and many data curators are not considered trustworthy. In fact, lack of trust in the data collector is often a primary motivation for the use of differential privacy.

Local differential privacy

The “local model of differential privacy” addresses the security issue in the central model by eliminating the trusted data curator. Each individual adds noise to their own data before sending it to the data curator. This means that the data curator never sees the sensitive data, so does not need to be trusted. The figure below demonstrates the local model; here, the privacy barrier stands between each data owner and the (untrusted) data curator.


Figure 2: Local model of differential privacy

The local model of differential privacy avoids the security issues of the central model—if the data curator’s server is hacked, the hackers only see noisy data that already satisfy differential privacy. This is a major benefit of the local model, and it’s why the local model was adopted in Google’s RAPPOR system [1] and Apple’s data collection system [2].

The drawback? The local model produces less accurate answers than the central model. In the local model, each individual adds enough noise to satisfy differential privacy; thus, the total noise for all participants is much larger than the single noise sample used in the central model.
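As a rough illustration of that gap (a toy simulation, not a model of Google’s or Apple’s actual mechanisms), the sketch below compares a central count, which uses a single Laplace sample, with a local count in which each of 10,000 simulated individuals perturbs their own 0/1 response before submitting it.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 1.0

# Illustrative 0/1 responses ("does this person use emoji X?"); assumed data.
responses = rng.integers(0, 2, size=10_000)
true_count = responses.sum()

# Central model: one noise sample protects the entire query result.
central_estimate = true_count + rng.laplace(scale=1.0 / epsilon)

# Local model: every individual perturbs their own response (sensitivity 1),
# so the aggregate contains 10,000 noise samples instead of one.
local_estimate = (responses + rng.laplace(scale=1.0 / epsilon, size=responses.size)).sum()

print(true_count, round(central_estimate), round(local_estimate))
```

With these assumed parameters, the central estimate typically lands within a few counts of the truth, while the local estimate is usually off by around a hundred, because its noise grows with the number of participants.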

As a result, the local model is only useful for queries with a very strong “signal.” Apple’s system, for example, uses the local model to estimate the popularity of emojis, but the results are only useful for the most popular emojis (i.e., where the “signal” is strongest). The local model is typically not used for more complex queries, like those used in the U.S. Census [3] or applications like machine learning.

Hybrid models

The central and local models have their advantages and drawbacks, so achieving the best of both is a natural goal and an active area of research.

One alternative approach is the “shuffling model,” implemented in a system called Prochlo [4]. The shuffling model includes an untrusted data curator and individual data contributors, and adds a set of partially trusted shufflers. In this model, each individual adds a small amount of noise to their own data, and then submits it to the shuffler, which adds additional noise before forwarding batches of data to the data curator. The idea is that shufflers are unlikely to collude with the data curator or each other, so the small amount of noise added by individuals is sufficient to guarantee privacy. Each shuffler operates on a batch of inputs—in the same way as the central model—so a small amount of additional noise guarantees privacy for the whole batch. The shuffling model is a compromise between the local and central models: It allows adding less noise than the local model but requires more noise than the central model.
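Working out how the noise is split between clients and shuffler is the substance of the Prochlo design; the toy sketch below only illustrates the data flow described above, and the client noise scale and single batch-level noise record are illustrative placeholders rather than parameters that achieve a stated privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
epsilon = 1.0

# Illustrative 0/1 responses; assumed data for the sketch.
responses = rng.integers(0, 2, size=10_000).astype(float)

# Step 1: each individual adds a small amount of noise to their own value.
client_scale = 0.1  # placeholder; a real deployment derives this from its privacy analysis
reports = responses + rng.laplace(scale=client_scale, size=responses.size)

# Step 2: the shuffler permutes the batch (unlinking reports from their senders)
# and adds one extra noise record to protect the batch as a whole.
rng.shuffle(reports)
batch = np.append(reports, rng.laplace(scale=1.0 / epsilon))

# Step 3: the untrusted data curator aggregates the shuffled, noisy batch.
print(responses.sum(), round(batch.sum()))
```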

Another possibility is to combine differential privacy with techniques from cryptography, such as secure multiparty computation (MPC) or fully homomorphic encryption (FHE). FHE allows computing on encrypted data without decrypting it first, and MPC allows a group of parties to securely compute functions over distributed inputs without revealing the inputs. Computing a differentially private function using secure computation is a promising way to achieve the accuracy of the central model with the security benefits of the local model. In this approach, the use of secure computation eliminates the need for a trusted data curator. Recent work [5] demonstrates the promise of combining MPC and differential privacy, and achieves most of the benefits of both the central and local models. The main drawback is performance: In most cases, secure computation is several orders of magnitude slower than native execution, which is often impractical for large datasets or complex queries. Secure computation is an active area of research, however, and its performance is improving quickly. (Stay tuned for a more complete blog post on this topic later in the series.)

Coming up next
In our next post, we’ll look at our first open-source tools for enforcing differential privacy. We’ll explore tools that can be deployed to answer counting queries on real databases, including tools accessible to nonexperts and ones that can scale to very large databases, like the ones used by the U.S. Census Bureau to count people with differential privacy.

References
1. Erlingsson, Úlfar, Vasyl Pihur, and Aleksandra Korolova. “RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response.” In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. 2014.
2. Apple Inc. “Apple Differential Privacy Technical Overview.” Accessed July 31, 2020.
3. Garfinkel, Simson L., John M. Abowd, and Sarah Powazek. “Issues Encountered Deploying Differential Privacy.” In Proceedings of the 2018 Workshop on Privacy in the Electronic Society, pp. 133–137. 2018.
4. Bittau, Andrea, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. “Prochlo: Strong Privacy for Analytics in the Crowd.” In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 441–459. 2017.
5. Roy Chowdhury, Amrita, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha. “Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers.” In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 603–619. 2020.

First published Sept. 15, 2020 on NIST’s Cybersecurity Insights blog.


About The Authors

Joseph Near

Joseph Near is an assistant professor of computer science at the University of Vermont who supports NIST as a moderator for the Privacy Engineering Collaboration Space.

David Darais

David Darais is a principal scientist at Galois and supports NIST as a moderator for the Privacy Engineering Collaboration Space.