The Composition Theorem for Differential Privacy

Peter Kairouz, Sewoong Oh, Pramod Viswanath

Introduction

Differential privacy is a formal framework to quantify to what extent individual privacy in a statistical database is preserved while releasing useful aggregate information about the database. It provides strong privacy guarantees by requiring the indistinguishability of whether an individual is in the database or not based on the released information, regardless of the side information on the other aspects of the database the adversary may possess. Denoting the database when the individual is present as $D$ and as $D^{\prime}$ when the individual is not, a differentially private mechanism provides indistinguishability guarantees with respect to the pair $(D,D^{\prime})$ . More generally, we consider pairs of databases that indistinguishability is guaranteed for as “neighbors”. The formal definition of $(\epsilon,\delta)$ -differential privacy is the following.

A randomized mechanism $M$ over a set of databases is $(\varepsilon,\delta)$ -differentially private if for all pairs of neighboring databases $D$ and $D^{\prime}$ , and for all sets $S$ in the output space of the mechanism ${\cal X}$ ,

We start with the view of differential privacy as providing certain guarantees for the two error types (false alarm and missed detection) in a binary hypothesis testing problem (involving two neighboring databases), as in previous work [WZ10]. We brings two benefits of this operational interpretation of the privacy definition to bear on the problem at hand.

The first is conceptual: the operational setting directs the logic of the steps of the proof, makes the arguments straightforward and readily allows generalizations such as heterogeneous compositions.

The second is technical: the operational interpretation of hypothesis testing brings both the natural data processing inequality, and the strong converse to the data processing inequality. These inequalities, while simple by themselves, lead to surprisingly strong technical results. As an aside, we mention that there is a strong tradition of such derivations in the information theory literature: the Fisher information inequality [Bla65, Zam98], the entropy power inequality [Sta59, Bla65, VG06], an extremal inequality involving mutual informations [LV07], matrix determinant inequalities [CT88], the Brunn-Minkowski inequality and its functional analytic variants [DCT91] – Chapter 17 of [CT12] enumerates a detailed list – were all derived using operational interpretations of mutual information and corresponding data processing inequalities.

One special case of our results, the strengthening of the state-of-the-art result in [DRV10], could also have been arrived at directly by using stronger technical methods than used in [DRV10]. Specifically, we use a direct expression for the privacy region (instead of an upper bound) to arrive at our strengthened result.

The optimal composition theorem (Theorem 3.3) provides a fundamental limit on how much privacy degrades under composition. Such a characterization is a basic result in differential privacy and has been used widely in the literature [DRV10, HLM10, BBDS12, GRU12, MN12, HR13]. In each of these instances, the optimal composition theorem derived here (or the simpler characterization of Theorem 3.4) could be “cut-and-pasted”, allowing for corresponding strengthening of their conclusions. We demonstrate this strengthening for two instances: variance of noise adding mechanisms in Section 4.1 and [BBDS12] in Appendix C.1. We further show that a variety of existing noise adding mechanisms ensure the same level of privacy with similar variances. This implies that there is nothing special about the popular choice of adding a Gaussian noise when composing multiple queries, and the same utility as measured through the noise variance can be obtained using other known mechanisms. As an application to the operational definition of differential privacy, we prove, in Section 5, that a simple non-interactive randomize response mechanism is optimal in secure multi-party computation. We start our discussions by operationally introducing differential privacy as certain guarantees on the error probabilities in a binary hypothesis testing problem.

Differential Privacy as Hypothesis Testing

Given a random output $Y$ of a database access mechanism $M$ , consider the following hypothesis testing experiment. We choose a null hypothesis as database $D_{0}$ and alternative hypothesis as $D_{1}$ :

For any $\varepsilon\geq 0$ and $\delta\in$ , a database mechanism $M$ is $(\varepsilon,\delta)$ -differentially private if and only if the following conditions are satisfied for all pairs of neighboring databases $D_{0}$ and $D_{1}$ , and all rejection region $S\subseteq{\cal X}$ :

This operational perspective of differential privacy relates the privacy parameters $\varepsilon$ and $\delta$ to a set of conditions on probability of false alarm and missed detection. This shows that it is impossible to get both small $P_{\rm MD}$ and $P_{\rm FA}$ from data obtained via a differentially private mechanism, and that the converse is also true. This operational interpretation of differential privacy suggests a graphical representation of differential privacy as illustrated in Figure 1. We define the privacy region for $(\varepsilon,\delta)$ -differential privacy as

Similarly, we define the privacy region of a database access mechanism $M$ with respect to two neighboring databases $D$ and $D^{\prime}$ as

For all neighboring databases $D$ and $D^{\prime}$ , and a database access mechanism $M$ , the pair of a false alarm and a missed detection probabilities achieved by any decision rule $\gamma$ is included in the privacy region:

Let $D\sim D^{\prime}$ denote that the two databases are neighbors. The union over all neighboring databases define the privacy region of the mechanism.

The following corollary, which follows immediately from Theorem 2.1, gives a necessary and sufficient condition on the privacy region for $(\varepsilon,\delta)$ -differential privacy.

A mechanism $M$ is $(\varepsilon,\delta)$ -differentially private if and only if ${\cal R}(M)\subseteq{\cal R}(\varepsilon,\delta)$ .

To illustrate the strengths of the graphical representation of differential privacy, we provide simpler proofs for some well-known results in differential privacy in Appendix A.

Consider two database access mechanisms $M(\cdot)$ and $M^{\prime}(\cdot)$ . Let $X$ and $Y$ denote the random outputs of mechanisms $M$ and $M^{\prime}$ respectively. We say $M$ dominates $M^{\prime}$ if $M^{\prime}(D)$ is conditionally independent of the database $D$ conditioned on the outcome of $M(D)$ . In other words, the database $D$ , $X=M(D)$ and $Y=M^{\prime}(D)$ form the following Markov chain: $D$ – $X$ – $Y$ .

If a mechanism $M$ dominates a mechanism $M^{\prime}$ , then for all pairs of neighboring databases $D_{1}$ and $D_{2}$ ,

We provide a proof in Section 9.1. Wasserman and Zhu [WZ10, Lemma 2.6] have proved that, for a special case when $M$ is $(\varepsilon,0$ )-differentially private, $M^{\prime}$ is also $(\varepsilon,0$ )-differentially private, which is a corollary of the above theorem. Perhaps surprisingly, the converse is also true.

Fix a pair of neighboring databases $D_{1}$ and $D_{2}$ and let $X$ and $Y$ denote the random outputs of mechanisms $M$ and $M^{\prime}$ , respectively. If $M$ and $M^{\prime}$ satisfy

then there exists a coupling of the random outputs $X$ and $Y$ such that they form a Markov chain $D$ – $X$ – $Y$ where $D\in\{D_{1},D_{2}\}$ .

When the privacy region of $M^{\prime}$ is included in $M$ , then there exists a stochastic transformation $T$ that operates on $X$ and produce a random output that has the same marginal distribution as $Y$ conditioned on the database $D$ . We can consider this mechanism $T$ as a privatization mechanism that takes a (privatized) output $X$ and provides even further privatization. The above theorem was proved in [Bla53, Corollary of Theorem 10] in the context of comparing two experiments, where a statistical experiment denotes a mechanism in the context of differential privacy.

Composition of Differentially Private Mechanisms

In this section, we address how differential privacy guarantees compose: when accessing databases multiple times via differentially private mechanisms, each of which having its own privacy guarantees, how much privacy is still guaranteed on the union of those outputs? To formally define composition, we consider the following scenario known as the ‘composition experiment’, proposed in [DRV10].

A composition experiment takes as input a parameter $b\in\{0,1\}$ , and an adversary ${\cal A}$ . From the hypothesis testing perspective proposed in the previous section, $b$ can be interpreted as the hypothesis: null hypothesis for $b=0$ and alternative hypothesis for $b=1$ . At each time $i$ , a database $D^{i,b}$ is accessed depending on $b$ . For example, one includes a particular individual and another does not. An adversary ${\cal A}$ is trying to break privacy (and figure out whether the particular individual is in the database or not) by testing the hypotheses on the output of $k$ sequential access to those databases via differentially private mechanisms. In full generality, we allow the adversary to have full control over which pair of databases to access, which query to ask, and which mechanism to be used at each repeated access. Further, the adversary is free to make these choices adaptively based on the previous outcomes. The only restrictions are the differentially private mechanisms belong to a family ${\cal M}$ (e.g., the family of all $(\varepsilon,\delta)$ -differentially private mechanisms), the internal randomness of the mechanisms are independent at each repeated access, and that the hypothesis $b$ is not known to the adversary.

The outcome of this $k$ -fold composition experiment is the view of the adversary ${\cal A}$ : $V^{b}\equiv(R,Y_{1}^{b},\ldots,Y_{k}^{b})$ , which is the sequence of random outcomes $Y_{1}^{b},\ldots,Y_{k}^{b}$ , and the outcome $R$ of any internal randomness of ${\cal A}$ .

In terms of testing whether a particular individual is in the database $(b=0)$ or not $(b=1$ ), we want to characterize how much privacy degrades after a $k$ -fold composition experiment. It is known that the privacy degrades under composition by at most the ‘sum’ of the differential privacy parameters of each access.

For any $\varepsilon>0$ and $\delta\in$ , the class of $(\varepsilon,\delta)$ -differentially private mechanisms satisfy $(k\varepsilon,k\delta)$ -differential privacy under $k$ -fold adaptive composition.

We give a complete answer to this fundamental question in the following theorems. We prove a tighter bound on the privacy under composition. Further, we also prove the achievability of the privacy guarantee: we provide a set of mechanisms such that the privacy region under $k$ -fold composition is exactly the region defined by the conditions in (5). Hence, this bound on the privacy region is tight and cannot be improved upon.

For any $\varepsilon\geq 0$ and $\delta\in$ , the class of $(\varepsilon,\delta)$ -differentially private mechanisms satisfies

under $k$ -fold adaptive composition, for all $i=\{0,1,\ldots,\lfloor k/2\rfloor\}$ , where

Hence, the privacy region of $k$ -fold composition is an intersection of $k$ regions, each of which is $((k-2i)\varepsilon,1-(1-\delta)^{k}(1-\delta_{i}))$ -differentially private: ${\cal R}(\{(k-2i)\varepsilon,1-(1-\delta)^{k}(1-\delta_{i})\}_{i\in[k/2]})\,\equiv\bigcap_{i=0}^{\lfloor\frac{k}{2}\rfloor}{\cal R}((k-2i)\varepsilon,1-(1-\delta)^{k}(1-\delta_{i}))$ . We give a proof in Section 6 where we give an explicit mechanism that achieves this region under composition. Hence, this bound on the privacy region is tight, and gives the exact description of how much privacy can degrade under $k$ -fold adaptive composition. This settles the question left open in [DMNS06, DKM+06a, DL09, DRV10] by providing, for the first time, the fundamental limit of composition, and proving a matching mechanism with the worst-case privacy degradation.

To prove the optimality of our main result in Theorem 3.3, namely that it is impossible to have a privacy worse than (5), we rely on the operational interpretation of the privacy as hypothesis testing. To this end, we use the new analysis tools (Theorem 2.4 and Theorem 2.5) provided in the previous section. Figure 2 illustrates how much the privacy region of Theorem 3.3 degrades as we increase the number of composition $k$ . Figure 3 provides a comparison of the three privacy guarantees in Theorems 3.1, 3.2 and 3.3 for $30$ -fold composition of $(0.1,0.001)$ -differentially private mechanisms. Smaller region gives a tighter bound, since it guarantees the higher privacy.

2 Simplified privacy region under composition

In the high privacy regime, where $\varepsilon\leq 0.9$ , this bound can be further simplified as

3 Composition Theorem for Heterogeneous Mechanisms

Applications of the Optimal Composition Theorem

When analyzing a complex mechanism with multiple sub-mechanisms each with ( $\varepsilon_{0},\delta_{0}$ )-differential privacy guarantee, we can apply the composition theorem (Theorem 3.3 and Theorem 3.4). To ensure overall $(\varepsilon,\delta)$ -differential privacy for the whole complex mechanism, one chooses $\varepsilon_{0}=\varepsilon/(2\sqrt{k\log(e+\varepsilon/\delta)})$ and $\delta_{0}=\delta/2k$ , when there are $k$ sub-mechanisms. The existing composition theorem guarantees the desired overall privacy. Then, the utility of the complex mechanism is calculated for the choice of $\varepsilon_{0}$ and $\delta_{0}$ .

Following this recipe, we first provide a sufficient condition on the variance of noise adding mechanisms. This analysis shows that one requires smaller variance than what is previously believed, in the regime where $\varepsilon=\Theta(\delta)$ . Further, we show that a variety of known mechanisms achieve the desired privacy under composition with the same level of variance. Applying this analysis to known mechanisms for cut queries of a graph, we show that again in the regime where $\varepsilon=\Theta(\delta)$ , one can achieve the desired privacy under composition with improved utility.

For count queries with sensitivity one, the geometric noise adding mechanism is known to be universally optimal in a general cost minimization framework (Bayesian setting in [GRS12] and worst-case setting in [GV12]). Here we provide a new interpretation of the geometric noise adding mechanism as an optimal mechanism under composition for counting queries. In the course of proving Theorem 3.3, we show that a family of mechanisms are optimal under composition, in the sense that they achieve the largest privacy region among $k$ -fold compositions of any $(\varepsilon_{i},\delta_{i})$ differentially private mechanisms. Larger region under composition implies that one can achieve smaller error rates, while ensuring the same level of privacy at each step of the composition. In this section, we show that the geometric mechanism is one of such mechanisms, thus providing the new interpretation to the optimality of the geometric mechanisms.

where $\sim$ indicates that the pair of databases are neighbors. A common approach to privatize such a query output is to add noise to it, and the variance of the noise grows with sensitivity of the query and the desired level of privacy. A popular choice of the noise is Gaussian. It is previously known that it is sufficient to add Gaussian noise with variance $O(k\Delta^{2}\log(1/\delta)/\varepsilon^{2})$ to each query output in order to ensure $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition. We improve the analysis of Gaussians under composition, and show that for a certain regime where $\varepsilon=\Theta(\delta)$ , the sufficient condition can be improved by a log factor.

When composing real-valued queries, the Gaussian mechanism is a popular choice [DN03, DN04, BDMN05, BBDS12, HR13]. However, we show that there is nothing special about Gaussian mechanisms for composition. We prove that the Laplacian mechanism or the staircase mechanism introduced in [GV12] can achieve the same level of privacy under composition with the same variance.

For any $\varepsilon\in(0,0.9]$ and $\delta\in(0,1]$ , if the database access mechanism satisfies $(\sqrt{\varepsilon^{2}/4k\log(e+(\varepsilon/\delta))},\delta/2k)$ -differential privacy on each query output, then it satisfies $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition.

One of the most popular noise adding mechanisms is the Laplacian mechanism, which adds Laplacian noise to real-valued query outputs. When the sensitivity is $\Delta$ , one can achieve $(\varepsilon_{0},0)$ -differential privacy with the choice of the distribution ${\rm Lap}(\varepsilon_{0}/\Delta)=(\varepsilon_{0}/2\Delta)e^{-\varepsilon_{0}|x|/\Delta}$ . The resulting variance of the noise is $2\Delta^{2}/\varepsilon_{0}^{2}$ . The above corollary implies a certain sufficient condition on the variance of the Laplacian mechanism to ensure privacy under composition.

For real-valued queries with sensitivity $\Delta>0$ , the mechanism that adds Laplacian noise with variance $\big{(}8k\Delta^{2}\log\big{(}e+(\varepsilon/\delta)\big{)}/\varepsilon^{2}\big{)}$ satisfies $(\varepsilon,\delta)$ -differential privacy under $k$ -fold adaptive composition for any $\varepsilon\in(0,0.9]$ and $\delta\in(0,1]$ .

In terms of variance-privacy trade-off for real-valued queries, the optimal noise-adding mechanism known as the staircase mechanism was introduced in [GV12]. The probability density function of this noise is piecewise constant, and the probability density on the pieces decay geometrically. It is shown in [GV13] that that with variance of $O(\min\{1/\varepsilon^{2},1/\delta^{2}\})$ , the staircase mechanism achieved $(\varepsilon,\delta)$ -differential privacy. Corollary 4.1 implies that with variance $O\big{(}k\Delta^{2}\log(e+\varepsilon/\delta)/\varepsilon^{2}\big{)}$ , the staircase mechanism satisfies $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition.

Another popular mechanism known as the Gaussian mechanism privatizes each query output by adding a Gaussian noise with variance $\sigma^{2}$ . It is not difficult to show that when the sensitivity of the query is $\Delta$ , with a choice of $\sigma^{2}\geq 2\Delta^{2}\log(2/\delta_{0})/\varepsilon_{0}^{2}$ , the Gaussian mechanism satisfies $(\varepsilon_{0},\delta_{0})$ -differential privacy (e.g. [DKM+06a]). The above corollary implies that the Gaussian mechanism with variance $O(k\Delta^{2}\log(1/\delta)\log(e+(\varepsilon/\delta))/\varepsilon^{2})$ ensures $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition. However, we can get a tighter sufficient condition by directly analyzing how Gaussian mechanisms compose, and the proof is provided in Appendix B.

For real-valued queries with sensitivity $\Delta>0$ , the mechanism that adds Gaussian noise with variance $\big{(}8k\Delta^{2}\log\big{(}e+(\varepsilon/\delta)\big{)}/\varepsilon^{2}\big{)}$ satisfies $(\varepsilon,\delta)$ -differential privacy under $k$ -fold adaptive composition for any $\varepsilon>0$ and $\delta\in(0,1]$ .

It is previously known that it is sufficient to add i.i.d. Gaussian noise with variance $O(k\Delta^{2}\log(1/\delta)/\varepsilon^{2})$ to ensure $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition (e.g. [HT10, Theorem 2.7]). The above theorem shows that when $\delta=\Theta(\varepsilon)$ , one can achieve the same privacy with smaller variance by a factor of $\log(1/\delta)$ .

2 Geometric noise adding mechanism under composition

The geometric noise adding mechanism is a discrete variant of the popular Laplacian mechanism. For integer-valued queries with sensitivity one, the mechanism adds a noise distributed according to a double-sided geometric distribution whose probability density function is $p(k)=\big{(}(e^{\varepsilon}-1)/(e^{\varepsilon}+1)\big{)}e^{-\varepsilon|k|}$ . This mechanism is known to be universally optimal in a general cost minimization framework (Bayesian setting in [GRS12] and worst-case setting in [GV12]). In this section, we show that the geometric noise adding mechanism achieves the fundamental limit on the privacy region under composition.

Consider the composition experiment for counting queries. For a pair of neighboring databases $D_{0}$ and $D_{1}$ , some of the query outputs differ by one, since sensitivity is one, and for other queries the output might be the same. Let $k$ denote the number of queries whose output differs with respect to $D_{0}$ and $D_{1}$ . Then, we show in Section C that the privacy region achieved by geometric mechanism, that adds geometric noise for each integer-valued query output, is exactly described by the optimal composition theorem of (5). Further, since this is the largest privacy region under composition for the pair of database $D_{0}$ and $D_{1}$ that differ in $k$ queries, no other mechanism can achieve a larger privacy region. Since the geometric mechanism does not depend on the particular choice of pairs of databases $D_{0}$ and $D_{1}$ , nor does it depend on the specific query being asked, the mechanism achieves the exact composed privacy region universally for every pair of neighboring databases simultaneously.

Among the mechanisms guaranteeing the same level of privacy, one with larger privacy region under composition is considered better, in terms of allowing for smaller false alarm and missed detection rate in hypothesis testing whether the database contains a particular entry or not. In this sense, larger privacy degradation under composition has more utility. The geometric mechanism has the largest possible privacy degradation under composition, stated formally below; the proof is deferred to Appendix C.

Under the $k$ -fold composition experiment of counting queries, the geometric mechanism achieves the largest privacy region among all $(\varepsilon,0)$ -differentially private mechanisms, universally for every pair of neighboring databases simultaneously.

Applications of the Operational Interpretation to Private Multi-Party Computation

In this section, we showcase the power of the operational interpretation of differential privacy in the differentially private multi-party computation (MPC) setting [BNO08, DKM+06b, MMP+10, GMPS13]. We study the following problem of secure multi-party differential privacy: each party possesses a single bit of information; the information bits are statistically independent. Each party is interested in computing a function, which could differ from party to party, and there could be a central observer (observing the entire transcript of the interactive communication protocol) interested in computing a separate function. The interactive communication is achieved via a broadcast channel that all parties and central observer can hear. It is useful to distinguish between two types of communication protocols: interactive and non-interactive. We say a communication protocol is non-interactive if a message broadcasted by one party does not depend on the messages broadcasted by any other parties. In contrast, interactive protocols allows the messages at any stage of the communication to depend on all the previous messages.

Our main result is the exact optimality of a simple non-interactive protocol in terms of maximizing accuracy for given privacy levels: each party randomizes (sufficiently) and publishes its own bit. Each party and the central observer then separately compute their respective decision functions to maximize the appropriate notion of their accuracy measure. The optimality is general: it holds for all types of functions, heterogeneous privacy conditions on the parties, all types of cost metrics, and both average and worst-case (over the inputs) measures of accuracy. Finally, the optimality result is simultaneous, in terms of maximizing accuracy at each of the parties and the central observer. Each party only needs to know its own desired level of privacy, its own function to be computed, and its measure of accuracy. Optimal data release and optimal decision making are naturally separated.

The proof of this result critically relies on the operational interpretation of differential privacy. In this multi-party and local privacy setting, we show that the randomized response still dominates any other $(\varepsilon,\delta)$ -differentially private mechanisms. Given this, any other mechanism, interactive or not, can be simulated at the receiver. This powerful technique bypasses the previous results on the same setting, where weaker results were proved with heavier proof techniques. In [GMPS13], optimal mechanisms are proposed for only two-party computation and only for AND and XOR functions. In [KOV15], only $(\varepsilon,0)$ -differential privacy is addressed, and the proof techniques developed in [KOV15] cannot be generalized to the more general $(\varepsilon,\delta)$ -differential privacy setting.

Consider the setting where there are $k$ parties, each with its own private binary data $x_{i}\in\{0,1\}$ generated independently. The independence assumption here is necessary because without it each party can learn something about others, which violates differential privacy, even without revealing any information. Differential privacy implicitly imposes independence in a multi-party setting. The goal of each party $i\in[k]$ is to compute an arbitrary function $f_{i}:\{0,1\}^{k}\to{\cal Y}$ of interest by interactively broadcasting messages. There might be a central observer who listens to all the messages being broadcasted, and wants to compute another arbitrary function $f_{0}:\{0,1\}\to{\cal Y}$ . The $k$ parties are honest in the sense that once they agree on what protocol to follow, every party follows the rules. At the same time, they can be curious, and each party needs to ensure that other parties cannot learn its bit with sufficient confidence. This is done by imposing local differential privacy constraints. This setting is similar to the one studied in [DJW13, KOV14b] in the sense that there are multiple privacy barriers, each one separating an individual party from the rest of the world. However, the main difference is that we consider multi-party computation, where there are multiple functions to be computed, and each node might possess a different function to be computed.

In the end, each party makes a decision on what the value of function $f_{i}$ is, based on its own bit $x_{i}$ and the transcript $\tau$ that was broadcasted. A decision rule is a mapping from a transcript $\tau\in{\cal T}$ and private bit $x_{i}\in\{0,1\}$ to a decision $y\in{\cal Y}$ represented by a function ${\hat{f}}_{i}(\tau,x_{i})$ . We allow randomized decision rules, in which case ${\hat{f}}_{i}(\tau,x_{i})$ can be a random variable. For the central observer, a decision rule is a function of just the transcript, denoted by a function ${\hat{f}}_{0}(\tau)$ .

for each $i\in[k]$ . The computational cost of $(2^{k}\,|{\cal T}|)$ for computing the optimal decision rule is unavoidable in general, since that is the inherent complexity of the problem: describing the distribution of the transcript requires the same cost. We will show that the optimal protocol requires a set of transcripts of size $|{\cal T}|=2^{k}$ , and the computational complexity of the decision rule for a general function is $2^{2k}$ . However, for a fixed protocol, this decision rule needs to be computed only once before any message is transmitted. Further, it is also possible to find a closed form solution for the decision rule when $f$ has a simple structure. One example is the XOR function where the optimal decision rule is as simple as evaluating the XOR of all the received bits, which requires $O(k)$ operations. When there are multiple maximizers $y$ , we can choose either one of them arbitrarily, and it follows that there is no gain in randomizing the decision rule for average accuracy. Similarly, the worst-case accuracy is defined as

For worst-case accuracy, given a protocol $P$ , the optimal decision rule of the $i$ -th party with a bit $x_{i}$ can be computed by solving the following convex program:

Privacy is measured by approximate differential privacy [Dwo06, DMNS06]. Since we allow heterogeneous privacy constraints, we use $(\varepsilon_{i},\delta_{i})$ to denote the desired privacy level of the $i$ -th party. We say that a protocol $P$ is $(\varepsilon_{i},\delta_{i})$ -differentially private for the $i$ -th party if for $i\in[k]$ , and all $x_{i},x_{i}^{\prime}\in\{0,1\}$ , $x_{-i}\in\{0,1\}^{k-1}$ , and $S\subseteq{\cal T}$ ,

This condition ensures that no adversary can infer the private data $x_{i}$ with high enough confidence, no matter what auxiliary information or computational power she might.

2 Main Result

We show, perhaps surprisingly, that the simple randomized response presented in (24) is the unique optimal protocol in a very general sense. For any desired privacy level $(\varepsilon_{i},\delta_{i})$ , and arbitrary function $f_{i}$ , for any accuracy measure $w_{i}$ , and any notion of accuracy (either average or worst case), we show that the randomized response is universally optimal.

This is a strong optimality result. Every party and the central observer can simultaneously achieve the optimal accuracy, using a universal randomized response. Each party only needs to know its own desired level of privacy, its own function to be computed, and its measure of accuracy. Optimal data release and optimal decision making are naturally separated. It is not immediate at all that such a simple non-interactive randomized response mechanism would achieve the maximum accuracy. The proof critically harnesses the data processing inequalities and is provided in Appendix D.

Proof of Theorem 3.3

We first propose a simple mechanism and prove that the proposed mechanism dominates over all $(\varepsilon,\delta)$ -differentially private mechanisms. Analyzing the privacy region achieved by the $k$ -fold composition of the proposed mechanism, we get a bound on the privacy region under the adaptive composition. This gives an exact characterization of privacy under composition, since we show both converse and achievability. We prove that no other family of mechanisms can achieve ‘more degraded’ privacy (converse), and that there is a mechanism that we propose which achieves the privacy region (achievability).

In particular, the output of this mechanism does not depend on the database $D^{i,b}$ or the query $q_{i}$ , and only depends on the hypothesis $b$ . The privacy region of a single access to this mechanism is ${\cal R}(\varepsilon,\delta)$ in Figure 1. Hence, by Theorem 2.5, all $(\varepsilon,\delta)$ -differentially private mechanisms are dominated by this mechanism.

For a database access mechanism $M$ over a output space ${\cal X}$ and a pair of neighboring databases $D_{0}$ and $D_{1}$ , let $P_{0}$ and $P_{1}$ denote the probability density function for random variables $M(D_{0})$ and $M(D_{1})$ respectively. Assume there exists a permutation $\pi$ over ${\cal X}$ such that $P_{0}(x)=P_{1}(\pi(x))$ . Then, the privacy region is

The symmetry assumption is to simplify notations, and the analysis can be easily generalized to deal with non-symmetric distributions.

for $i\in\{0,\ldots,\lfloor k/2\rfloor\}$ . From Remark 6.1, it follows that the privacy region is ${\cal R}(\{\varepsilon_{i},\delta_{i}\})=\bigcap_{i=0}^{\lfloor k/2\rfloor}{\cal R}\big{(}\varepsilon_{i},\delta_{i}\big{)}$ , where $\varepsilon_{i}=(k-2i)\varepsilon$ and $\delta_{i}$ ’s are defined as in (6). Figure 2 shows this privacy region for $k=1,\ldots,5$ and for $\varepsilon=0.4$ and for two values of $\delta=0$ and $\delta=0.1$ .

2 Converse

We will now prove that this region is the largest region achievable under $k$ -fold adaptive composition of any ( $\varepsilon,\delta$ )-differentially private mechanisms.

where $R$ is any internal randomness of the adversary ${\cal A}$ . Since, $(X,Y)$ – $Z$ – $W$ implies $X$ – $(Y,Z)$ – $W$ , we have

Further, since $X^{i,b}$ is independent of the past conditioned on $b$ , we have

where we used (36) in the first equality and (37) in the second. By induction, we get a decomposition

This finishes the proof of the desired claim.

Alternatively, one can prove (38), using a probabilistic graphical model. Precisely, the following Bayesian network describes the dependencies among various random quantities of the experiment described above. Since the set of nodes $(X^{1,b},X^{2,b},X^{3,b},X^{4,b})$ d-separates node $b$ from the rest of the bayesian network, it follows immediately from the Markov property of this Bayesian network that (38) is true (cf. [Lau96]).

Proof of Theorem 3.4

To find the tangent line, we need to maximize the shift, which is equivalent to moving the line downward until it is tangent to the boundary of the privacy region (cf. Figure 3).

Next, we give an upper bound on the moment generating function of $Z$ .

Proof of Theorem 3.5

Using the similar notations as Section 7, it follows that under $k$ -fold composition,

Proofs

Consider hypothesis testing between $D_{1}$ and $D_{2}$ . If there is a point $(P_{\rm MD},P_{\rm FA})$ achieved by $M^{\prime}$ but not by $M$ , then we claim that this is a contradiction to the assumption that $D$ – $X$ – $Y$ form a Markov chain. Consider a decision maker who have only access to the output of $M$ . Under the Markov chain assumption, he can simulate the output of $M^{\prime}$ by generating a random variable $Y$ conditioned on $M(D)$ and achieve every point in the privacy region of $M^{\prime}$ (cf. Theorem 2.2). Hence, the privacy region of $M^{\prime}$ must be included in the privacy region of $M$ .

2 Proof of Theorem 2.1

3 Proof of Remark 2.2

We have a decision rule $\gamma$ represented by a partition $\{S_{i}\}_{i\in\{1,\ldots,N\}}$ and corresponding accept probabilities $\{p_{i}\}_{i\in\{1,\ldots,N\}}$ , such that if the output is in a set $S_{i}$ , we accept with probability $p_{i}$ . We assume the subsets are sorted such that $1\geq p_{1}\geq\ldots\geq p_{N}\geq 0$ . Then, the probability of false alarm is

where we used $p_{0}=1$ and $p_{N+1}=0$ , and hence it is included in the convex hull of the privacy region achieved by decision rules with hard thresholding.

Acknowledgement

The authors thank Maxim Raginsky for helpful discussions and for pointing out [Bla53], and Moritz Hardt for pointing out an error in an earlier version of this paper. This research is supported in part by NSF CISE award CCF-1422278, NSF SaTC award CNS-1527754, NSF CMMI award MES-1450848 and NSF ENG award ECCS-1232257.

Appendix A Examples illustrating the strengths of graphical representation of differential privacy

From Figure 1 it follows immediately that the total variation distance cannot be larger than $\delta+(1-\delta)(e^{\varepsilon}-1)/(e^{\varepsilon}+1)$ . $\Box$

Appendix B Analysis of the Gaussian mechanism in Theorem 4.3

Following the analysis in Section 7, we know that the privacy region of a composition of mechanisms is described by a set of $(\varepsilon,\delta)$ pairs that satisfy the following:

In the case of Gaussian mechanisms, we can assume without loss of generality that $D_{0}$ is such that $q_{i}(D_{0})=0$ and $D_{1}$ is such that $q_{i}(D_{1})=\Delta$ for all $i\in\{1,\ldots,k\}$ . When adding Gaussian noises with variances $\sigma^{2}$ , we want to ask how small the variance can be and still ensure $(\varepsilon,\delta)$ -differential privacy under $k$ -fold composition.

Let $f_{0}^{k}(x_{1},\ldots,x_{k})=\prod_{i=1}^{k}f_{0}(x_{i})=(1/\sqrt{2\pi\sigma^{2}})^{k}e^{-\sum_{i=1}^{k}x_{i}^{2}/2\sigma^{2}}$ and $f_{1}^{k}(x_{1}\ldots,x_{k})=\prod_{i=1}^{k}f_{1}(x_{i})=(1/\sqrt{2\pi\sigma^{2}})^{k}e^{-\sum_{i=1}^{k}(x_{i}-\Delta)^{2}/2\sigma^{2}}$ be the probability density functions of Gaussians centered at zero and $\Delta{\mathds{1}}_{k}$ respectively. Using a similar technique as in (41), we know that

Next, we give an upper bound on the moment generating function of $Z$ .

for any $\lambda\geq 0$ . Substituting this into (47) with a choice of $\lambda=\frac{\sigma^{2}}{k\Delta^{2}}\big{(}\varepsilon-\frac{k\Delta^{2}}{2\sigma^{2}}\big{)}$ , which is positive for $\varepsilon>k\Delta^{2}/2\sigma^{2}$ , we get

for our choice of $\sigma^{2}$ such that $\varepsilon\geq k\Delta^{2}/(2\sigma^{2})+\sqrt{(2k\Delta^{2}/\sigma^{2})\log(e+(1/\delta)\sqrt{k\Delta^{2}/\sigma^{2}})}$ . The right-hand side is always less than $\delta$ .

With $\sigma^{2}\geq(4k\Delta^{2}/\varepsilon^{2})\log(e+(\varepsilon/\delta))$ and $\sigma^{2}\geq k\Delta^{2}/(4\varepsilon)$ , this ensures that the above condition is satisfied. This implies that we only need $\sigma^{2}=O((k\Delta^{2}/\varepsilon^{2})\log(e+(\varepsilon/\delta)))$ .

Appendix C Analysis of the geometric mechanism in Theorem 4.4

It now follows from the proof of Theorem 3.3 that the $k$ -fold composition privacy region is exactly the optimal privacy region described in (5) with $\delta=0$ . We also know that this is the largest possible privacy region achieved by a class of $(\varepsilon,0)$ -differentially private mechanisms.

Blocki et. al. [BBDS12] showed that classical Johnson-Lindenstrauss transform can be used to produce a differentially private version of a database. Further, they show that this achieves the best tradeoff between privacy and utility for two applications: cut queries of a graph and variance queries of a matrix. In this section, we show how the best known trade off can be further improved by applying Theorem 3.4.

First, Blocki et. al. provide a differentially private mechanism for cut queries $q(G,S)$ : the number of edges crossing a ( $S,{\bar{S}}$ )-cut in a weighted undirected graph $G$ . This mechanism produces a sanitized graph satisfying $(\varepsilon,\delta)$ -differential privacy, where two graphs are neighbors if they only differ on a single edge. The utility of the mechanism is measured via the additive error $\tau$ incurred by the privatization. Precisely, a mechanism $M$ is said to give a $(\eta,\tau,\nu)$ -approximation for a single cut query $q(\cdot,\cdot)$ , if for every graph $G$ and every nonempty $S$ it holds that

For the proposed Johnson-Lindenstrauss mechanism satisfying $(\varepsilon,\delta$ )-differential privacy, it is shown that the additive error $\tau_{0}$ incurred by querying the database $k$ times is bounded by [BBDS12, Theorem 3.2]The original theorem is stated for a single query with $k=1$ . Here we state it more generally with arbitrary $k$ . This requires scaling $\nu$ by $1/k$ to take into account the union bound over $k$ query outputs in the utility guarantee in (48).

Compared to other state-of-the-art privacy mechanisms such as the Laplace noise adding mechanism [Dwo06], Exponential mechanism [MT07], Multiplicative weights [HR10], and Iterative Database Construction [GRU12], it is shown in [BBDS12] that the Johnson-Lindenstrauss mechanism achieves the best tradeoff between the additive error $\tau_{0}$ and the privacy $\varepsilon$ . This tradeoff in (49) is proved using the existing Theorem 3.2. We can improve this analysis using the optimal composition theorem of Theorem 3.4, which gives

This is smaller than (49) by (a square root of) a logarithmic factor when $\varepsilon=\Theta(\delta)$ . The proof of the analysis in (50) is provided below.

A similar technique has been used in [BBDS12] to provide a differentially private mechanism for variance queries $v(A,x)=x^{T}A^{T}Ax$ : the variance of a given matrix in a direction $x$ . The proposed mechanism produces a sanitized covariance matrix that satisfy $(\varepsilon,\delta)$ -differential privacy, where two matrices are neighbors if they differ only in a single row and the difference is by Euclidean distance at most one. With the previous composition theorem in Theorem 3.2, the authors of [BBDS12] get an error bound $\tau_{1}=O\Big{(}\,\frac{\log(1/\delta)\log(k/\nu)}{\varepsilon^{2}\eta}\log^{2}\Big{(}\frac{\log(k/\nu)}{\eta^{2}\delta}\Big{)}\,\Big{)}.$ Using our tight composition theorem, this can be improved as $\tau=O\Big{(}\,\frac{\log(e+\varepsilon/\delta)\log(k/\nu)}{\varepsilon^{2}\eta}\log^{2}\Big{(}\frac{\log(k/\nu)}{\eta^{2}\delta}\Big{)}\,\Big{)}.$ Again, for $\varepsilon=\Theta(\delta)$ , this is an improvement of a logarithmic factor.

For cut queries, Johnson-Lindenstrauss mechanism proceeds as follows:

each row of $NE_{G}$ satisfy $(\varepsilon_{0},\delta_{0})$ -differential privacy, and the resulting Johnson-Lindenstrauss mechanism satisfy $(\eta,\tau,\nu)$ -approximation guarantee with

where $|S|$ is the size of the smaller partition $S$ of the cut $(S,{\bar{S}})$ .

The error bound in (49) follows from choosing

and applying Theorem 3.2 to ensure that the resulting mechanism with $r$ -composition of the $r$ rows of $ME_{G}$ is $(\varepsilon,\delta)$ -differentially private. Here it is assumed that $\varepsilon<1$ .

Now, with Theorem 3.3, we do not require $\varepsilon_{0}$ to be as small, which in turn allows us to add smaller noise $w$ , giving us an improved error bound on $\tau$ . Precisely, using Theorem 3.4 it follows that a choice of

suffices to ensure that after $r$ -composition we get $(\varepsilon,\delta)$ -differential privacy. Resulting noise is bounded by $w\leq 4\sqrt{4r\log(e+2\varepsilon/\delta)}\log(4r/\delta)/\varepsilon$ , which gives the error bound in (50). The proof follows analogously for the matrix variance queries.

Appendix D Proof of Theorem 5.1

For any protocol that generates a random transcript $\tau$ , there exists a stochastic transformation $T$ such that the joint distribution of the bits and the transcript can be simulated from the randomized outputs:

by the fact that $\tau$ is $(\varepsilon_{1},\delta_{1})$ -differentially private and Corollary 2.3. Next, notice that by construction, the randomized response achieves this outer bound, i.e.

Next, notice that by construction, the randomized response achieves this outer bound, i.e.