Threat Models for Differential Privacy
A look at central, local, and hybrid models
Published: Monday, April 12, 2021 - 12:03

It’s not so simple to deploy a practical system that satisfies differential privacy. Our example in the last post was a simple Python program that adds Laplace noise to a function computed over the sensitive data. For this to work in practice, we’d need to collect all the sensitive data on one server to run our program. What if that server gets hacked? Differential privacy provides no protection in this case—it only protects the output of our program.

When deploying differentially private systems, it’s important to consider the threat model—that is, what kind of adversaries we want the system to protect against. If the threat model includes adversaries who might compromise the server holding the sensitive data, then we need to modify the system to protect against this kind of attack.

The design of systems for differential privacy thus needs to consider both privacy and security. As described in our last post, “privacy” refers to controlling what can be inferred from a data release. We can think of security as an orthogonal issue: Security mechanisms control who is allowed to access a piece of data, but they make no guarantees about what can be learned from that data.

Central differential privacy

The most commonly used threat model in differential privacy research is called the “central model of differential privacy” (or simply, “central differential privacy”). The key component of the central model is a trusted data curator. Each individual submits their sensitive data to the data curator, who stores all the data in a central location (i.e., on a single server). The data curator is trusted: We assume they will not look at the sensitive data directly, will not share it with anyone, and cannot be compromised by any other adversary. In other words, with this model we assume that the server holding the sensitive data cannot be hacked.

In the central model, we typically add noise to query results, as we did in our Laplace mechanism example. The advantage of this model is that it allows algorithms to add the smallest possible amount of noise, and therefore produce results with the maximum accuracy allowed under differential privacy. The figure below demonstrates this process: We place the privacy barrier between the trusted data curator and the analyst; to the right of the privacy barrier, only differentially private results can be viewed, so the analyst does not need to be trusted.

Figure 1: Central model of differential privacy

The disadvantage of the central model is that it requires a trusted data curator—and many data curators are not considered trustworthy. In fact, lack of trust in the data collector is often a primary motivation for the use of differential privacy.
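To make the central model concrete, here is a minimal sketch in the spirit of the Laplace-mechanism example from our last post: The trusted curator computes the true answer over the raw data and releases only a noisy result. The dataset, the counting query, and the epsilon value below are illustrative assumptions, not part of any deployed system.

```python
import numpy as np

# Central model: a trusted curator holds the raw data and releases
# only noisy query results to the (untrusted) analyst.

def central_count(data, predicate, epsilon):
    """Differentially private counting query under the central model.
    A counting query has sensitivity 1, so adding a single sample of
    Laplace noise with scale 1/epsilon satisfies epsilon-differential
    privacy for the released answer."""
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Illustrative data held by the trusted curator (assumption).
ages = [34, 29, 41, 57, 23, 62, 45, 38]

# The analyst only ever sees this single noisy answer.
print(central_count(ages, lambda age: age >= 40, epsilon=1.0))
```

Notice that a single noise sample protects the entire dataset; this is why the central model gives the best accuracy allowed under differential privacy.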
Local differential privacy

The “local model of differential privacy” addresses the security issue in the central model by eliminating the trusted data curator. Each individual adds noise to their own data before sending it to the data curator. This means that the data curator never sees the sensitive data, so it does not need to be trusted. The figure below demonstrates the local model; here, the privacy barrier stands between each data owner and the (untrusted) data curator.

Figure 2: Local model of differential privacy

The local model avoids the security issues of the central model: If the data curator’s server is hacked, the hackers see only noisy data that already satisfy differential privacy. This is a major benefit of the local model, and it’s why the local model was adopted in Google’s RAPPOR system [1] and Apple’s data collection system [2].

The drawback? The local model produces less accurate answers than the central model. In the local model, each individual adds enough noise to satisfy differential privacy on their own; the total noise across all participants is therefore much larger than the single noise sample used in the central model. As a result, the local model is only useful for queries with a very strong “signal.” Apple’s system, for example, uses the local model to estimate the popularity of emojis, but the results are only useful for the most popular emojis (i.e., where the “signal” is strongest). The local model is typically not used for more complex queries, like those used in the U.S. Census [3] or in applications like machine learning.
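For contrast, here is a minimal sketch of a local-model mechanism: classic randomized response for a single yes/no question. Randomized response is the idea underlying systems like RAPPOR, but the code below is the textbook construction, not Google’s or Apple’s actual protocol, and the population size and true answer rate are made up for illustration.

```python
import random

# Local model: each individual randomizes their own answer before it
# leaves their device, so the data curator never sees a true value.

def local_release(true_answer):
    """Classic randomized response: flip a fair coin; on heads, answer
    truthfully; on tails, answer with a second fair coin flip. Each
    individual's single bit satisfies differential privacy with
    epsilon = ln(3)."""
    if random.random() < 0.5:
        return true_answer
    return random.random() < 0.5

def estimate_count(noisy_answers):
    """The untrusted curator debiases the aggregate: if p is the observed
    fraction of 'yes' responses, the true fraction is estimated as 2p - 0.5."""
    n = len(noisy_answers)
    p = sum(noisy_answers) / n
    return n * (2 * p - 0.5)

# Illustrative population (assumption): 10,000 people, 3,000 true 'yes' answers.
true_answers = [i < 3000 for i in range(10000)]
noisy_answers = [local_release(a) for a in true_answers]
print(estimate_count(noisy_answers))  # close to 3,000, but noisy
```

Even with 10,000 participants, the estimate typically misses the true count by roughly 100 respondents (one standard deviation here is about 2·sqrt(n·p(1−p)) ≈ 100), while a central-model count at a comparable epsilon is off by only 1 or 2. This is the accuracy gap described above.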
Hybrid models

The central and local models have their advantages and drawbacks, so achieving the best of both is a natural goal and an active area of research.

One alternative approach is the “shuffling model,” implemented in a system called Prochlo [4]. The shuffling model includes an untrusted data curator, individual data contributors, and a set of partially trusted shufflers. In this model, each individual adds a small amount of noise to their own data and then submits it to a shuffler, which adds additional noise before forwarding batches of data to the data curator. The idea is that shufflers are unlikely to collude with the data curator or with each other, so the small amount of noise added by individuals is sufficient to guarantee privacy. Because each shuffler operates on a batch of inputs—in the same way as the central model—a small amount of additional noise guarantees privacy for the whole batch. The shuffling model is a compromise between the local and central models: It allows adding less noise than the local model but requires more noise than the central model.

Another possibility is to combine differential privacy with techniques from cryptography, such as secure multiparty computation (MPC) or fully homomorphic encryption (FHE). FHE allows computing on encrypted data without decrypting it first, and MPC allows a group of parties to securely compute functions over distributed inputs without revealing those inputs. Computing a differentially private function using secure computation is a promising way to achieve the accuracy of the central model with the security benefits of the local model. In this approach, the use of secure computation eliminates the need for a trusted data curator. Recent work [5] demonstrates the promise of combining MPC and differential privacy, and achieves most of the benefits of both the central and local models. In most cases, secure computation is several orders of magnitude slower than native execution, which is often impractical for large datasets or complex queries. Secure computation is an active area of research, however, and its performance is improving quickly. (Stay tuned for a more complete blog post on this topic later in the series.)
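As a rough illustration of the cryptographic approach, the sketch below uses additive secret sharing between two aggregation servers that are assumed not to collude: Each server sees only uniformly random shares, yet together they can publish a noisy total with central-model accuracy. This is a toy example of the general idea, not the Cryptε protocol or any deployed system; the modulus, the two-server setup, and the place where the noise is added are all simplifying assumptions.

```python
import random
import numpy as np

# Toy "MPC + differential privacy" sketch: clients additively secret-share
# their 0/1 values between two non-colluding servers, so neither server
# ever sees a raw value, and Laplace noise protects the published total.

Q = 2**61 - 1  # large modulus for additive secret sharing (assumption)

def share(value):
    """Split an integer into two random additive shares mod Q."""
    r = random.randrange(Q)
    return r, (value - r) % Q

def secure_noisy_sum(values, epsilon):
    # Conceptually, server A receives one share per client and server B
    # receives the other; both are simulated in one function for brevity.
    shares_a, shares_b = zip(*(share(v) for v in values))
    sum_a = sum(shares_a) % Q  # looks uniformly random to server A
    sum_b = sum(shares_b) % Q  # looks uniformly random to server B

    # A single Laplace sample protects the total (0/1 values give the sum
    # a sensitivity of 1). In a real protocol the noise is generated inside
    # the secure computation so no single party sees the exact total; here
    # one server adds it before publishing, and we ignore the small effect
    # of rounding the noise to an integer.
    noise = int(round(np.random.laplace(scale=1.0 / epsilon)))
    noisy_total = (sum_a + sum_b + noise) % Q

    # Map back from the modular encoding to a signed integer.
    return noisy_total if noisy_total < Q // 2 else noisy_total - Q

# Illustrative data (assumption): 1,000 clients each holding a 0/1 value.
bits = [random.randint(0, 1) for _ in range(1000)]
print(secure_noisy_sum(bits, epsilon=1.0))  # near the true count, with
                                            # only one noise sample
```

The price, as noted above, is performance: Real secure-computation protocols add communication rounds and cryptographic overhead that this sketch does not capture.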
Coming up next
In our next post, we’ll look at our first open-source tools for enforcing differential privacy. We’ll explore tools that can be deployed to answer counting queries on real databases, including tools accessible to nonexperts and ones that can scale to very large databases, like the ones used by the U.S. Census Bureau to count people with differential privacy.

First published Sept. 15, 2020, on NIST’s Cybersecurity Insights blog.

References
1. Erlingsson, Úlfar, Vasyl Pihur, and Aleksandra Korolova. “RAPPOR: Randomized aggregatable privacy-preserving ordinal response.” In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. 2014.
2. Apple Inc. “Apple Differential Privacy Technical Overview.” Accessed July 31, 2020.
3. Garfinkel, Simson L., John M. Abowd, and Sarah Powazek. “Issues encountered deploying differential privacy.” In Proceedings of the 2018 Workshop on Privacy in the Electronic Society, pp. 133–137. 2018.
4. Bittau, Andrea, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. “Prochlo: Strong Privacy for Analytics in the Crowd.” In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 441–459. 2017.
5. Roy Chowdhury, Amrita, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha. “Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers.” In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 603–619. 2020.
About The Authors
Joseph Near is an assistant professor of computer science at the University of Vermont who supports NIST as a moderator for the Privacy Engineering Collaboration Space.
David Darais is a principal scientist at Galois and supports NIST as a moderator for the Privacy Engineering Collaboration Space.