Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds

Mark Bun, Thomas Steinke

Introduction

Differential privacy [DMNS06] is a formal mathematical standard for protecting individual-level privacy in statistical data analysis. In its simplest form, (pure) differential privacy is parameterized by a real number ε>0\varepsilon>0, which controls how much “privacy loss”The privacy loss is a random variable which quantifies how much information is revealed about an individual by a computation involving their data; it depends on the outcome of the computation, the way the computation was performed, and the information that the individual wants to hide. We discuss it informally in this introduction and define it precisely in Definition 1.2 on page 1.2. an individual can suffer when a computation (i.e., a statistical data analysis task) is performed involving his or her data.

One particular hallmark of differential privacy is that it degrades smoothly and predictably under the composition of multiple computations. In particular, if one performs kk computational tasks that are each ε\varepsilon-differentially private and combines the results of those tasks, then the computation as a whole is kεk\varepsilon-differentially private. This property makes differential privacy amenable to the type of modular reasoning used in the design and analysis of algorithms: When a sophisticated algorithm is comprised of a sequence of differentially private steps, one can establish that the algorithm as a whole remains differentially private.

A widely-used relaxation of pure differential privacy is approximate or (ε,δ)(\varepsilon,\delta)-differential privacy [DKM+06], which essentially guarantees that the probability that any individual suffers privacy loss exceeding ε\varepsilon is bounded by δ\delta. For sufficiently small δ\delta, approximate (ε,δ)(\varepsilon,\delta)-differential privacy provides a comparable standard of privacy protection as pure ε\varepsilon-differential privacy, while often permitting substantially more useful analyses to be performed.

Unfortunately, there are situations where, unlike pure differential privacy, approximate differential privacy is not a very elegant abstraction for mathematical analysis, particularly the analysis of composition. The “advanced composition theorem” of Dwork, Rothblum, and Vadhan [DRV10] (subsequently improved by [KOV15, MV16]) shows that the composition of kk tasks which are each (ε,δ)(\varepsilon,\delta)-differentially private is ( ⁣ ⁣kε, ⁣ ⁣kδ)(\approx\!\!\sqrt{k}\varepsilon,\approx\!\!k\delta)-differentially private. However, these bounds can be unwieldy; computing the tightest possible privacy guarantee for the composition of kk arbitrary mechanisms with differing (εi,δi)(\varepsilon_{i},\delta_{i})-differential privacy guarantees is #P\#\mathsf{P}-hard [MV16]! Furthermore, these bounds are not tight even for simple and natural privacy-preserving computations. For instance, consider the mechanism which approximately answers kk statistical queries on a given database by adding independent Gaussian noise to each answer. Even for this basic computation, the advanced composition theorem does not yield a tight analysis.In particular, consider answering kk statistical queries on a dataset of nn individuals by adding noise drawn from N(0,(σ/n)2)\mathcal{N}(0,(\sigma/n)^{2}) independently for each query. Each individual query satisfies (O(log(1/δ)/σ),δ)(O(\sqrt{\log(1/\delta)}/\sigma),\delta)-differential privacy for any δ>0\delta>0. Applying the advanced composition theorem shows that the composition of all kk queries satisfies (O(klog(1/δ)/σ),(k+1)δ)(O(\sqrt{k}\log(1/\delta)/\sigma),(k+1)\delta)-differential privacy for any δ>0\delta>0. However, it is well-known that this bound can be improved to (O(klog(1/δ)/σ),δ)(O(\sqrt{k\log(1/\delta)}/\sigma),\delta)-differential privacy.

Dwork and Rothblum [DR16] recently put forth a different relaxation of differential privacy called concentrated differential privacy. Roughly, a randomized mechanism satisfies concentrated differentially privacy if the privacy loss has small mean and is subgaussian. Concentrated differential privacy behaves in a qualitatively similar way as approximate (ε,δ)(\varepsilon,\delta)-differential privacy under composition. However, it permits sharper analyses of basic computational tasks, including a tight analysis of the aforementioned Gaussian mechanism.

Using the work of Dwork and Rothblum [DR16] as a starting point, we introduce an alternative formulation of the concept of concentrated differential privacy that we call “zero-concentrated differential privacy” (zCDP for short). To distinguish our definition from that of Dwork and Rothblum, we refer to their definition as “mean-concentrated differential privacy” (mCDP for short). Our definition uses the Rényi divergence between probability distributions as a different method of capturing the requirement that the privacy loss random variable is subgaussian.

As is typical in the literature, we model a dataset as a multiset or tuple of nn elements (or “rows”) in Xn\mathcal{X}^{n}, for some “data universe” X\mathcal{X}, where each element represents one individual’s information. A (privacy-preserving) computation is a randomized algorithm M:XnYM:\mathcal{X}^{n}\to\mathcal{Y}, where Y\mathcal{Y} represents the space of all possible outcomes of the computation.

A randomised mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} is (ξ,ρ)(\xi,\rho)-zero-concentrated differentially private (henceforth (ξ,ρ)(\xi,\rho)-zCDP) if, for all x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing on a single entry and all α(1,)\alpha\in(1,\infty),

We define ρ\rho-zCDP to be (0,ρ)(0,\rho)-zCDP.For clarity of exposition, we consider only ρ\rho-zCDP in the introduction and give more general statements for (ξ,ρ)(\xi,\rho)-zCDP later. We also believe that having a one-parameter definition is desirable.

where Z=PrivLoss(M(x)M(x))Z=\mathsf{PrivLoss}\left(M(x)\middle\|M(x^{\prime})\right) is the privacy loss random variable:

Intuitively, the value of the privacy loss Z=PrivLoss(M(x)M(x))Z=\mathsf{PrivLoss}\left(M(x)\middle\|M(x^{\prime})\right) represents how well we can distinguish xx from xx^{\prime} given only the output M(x)M(x) or M(x)M(x^{\prime}). If Z>0Z>0, then the observed output of MM is more likely to have occurred if the input was xx than if xx^{\prime} was the input. Moreover, the larger ZZ is, the bigger this likelihood ratio is. Likewise, Z<0Z<0 indicates that the output is more likely if xx^{\prime} is the input. If Z=0Z=0, both xx and xx^{\prime} “explain” the output of MM equally well.

for all λ>0\lambda>0.We only discuss bounds on the upper tail of ZZ. We can obtain similar bounds on the lower tail of Z=PrivLoss(M(x)M(x))Z=\mathsf{PrivLoss}\left(M(x)\middle\|M(x^{\prime})\right) by considering Z=PrivLoss(M(x)M(x))Z^{\prime}=\mathsf{PrivLoss}\left(M(x^{\prime})\middle\|M(x)\right).

Thus zCDP requires that the privacy loss random variable is concentrated around zero (hence the name). That is, ZZ is “small” with high probability, with larger deviations from zero becoming increasingly unlikely. Hence we are unlikely to be able to distinguish xx from xx^{\prime} given the output of M(x)M(x) or M(x)M(x^{\prime}). Note that the randomness of the privacy loss random variable is taken only over the randomnesss of the mechanism MM.

For comparison, Dwork and Rothblum [DR16] define (μ,τ)(\mu,\tau)-concentrated differential privacy for a randomized mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} as the requirement that, if Z=PrivLoss(M(x)M(x))Z=\mathsf{PrivLoss}\left(M(x)\middle\|M(x^{\prime})\right) is the privacy loss for x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing on one entry, then

Our definition, zCDP, is a relaxation of mCDP. In particular, a (μ,τ)(\mu,\tau)-mCDP mechanism is also (μτ2/2,τ2/2)(\mu-\tau^{2}/2,\tau^{2}/2)-zCDP (which is tight for the Gaussian mechanism example), whereas the converse is not true. (However, a partial converse holds; see Lemma 4.3.)

2 Results

Like Dwork and Rothblum’s formulation of concentrated differential privacy, zCDP can be thought of as providing guarantees of (ε,δ)(\varepsilon,\delta)-differential privacy for all values of δ>0\delta>0:

If MM provides ρ\rho-zCDP, then MM is (ρ+2ρlog(1/δ),δ)(\rho+2\sqrt{\rho\log(1/\delta)},\delta)-differentially private for any δ>0\delta>0.

We also prove a slight strengthening of this result (Lemma 3.6). Moreover, there is a partial converse, which shows that, up to a loss in parameters, zCDP is equivalent to differential privacy with this δ>0\forall\delta>0 quantification (see Lemma 3.7).

There is also a direct link from pure differential privacy to zCDP:

If MM satisfies ε\varepsilon-differential privacy, then MM satisfies (12ε2)(\frac{1}{2}\varepsilon^{2})-zCDP.

Dwork and Rothblum [DR16, Theorem 3.5] give a slightly weaker version of Proposition 1.4, which implies that ε\varepsilon-differential privacy yields (12ε(eε1))(\frac{1}{2}\varepsilon(e^{\varepsilon}-1))-zCDP; this improves on an earlier bound [DRV10] by the factor 12\frac{1}{2}.

We give proofs of these and other properties using properties of Rényi divergence in Sections 2 and 3.

Propositions 1.3 and 1.4 show that zCDP is an intermediate notion between pure differential privacy and approximate differential privacy. Indeed, many algorithms satisfying approximate differential privacy do in fact also satisfy zCDP.

2.2 Gaussian Mechanism

Just as with mCDP, the prototypical example of a mechanism satisfying zCDP is the Gaussian mechanism, which answers a real-valued query on a database by perturbing the true answer with Gaussian noise.

We remark that either inequality defining zCDP — (1) or (2) — is exactly tight for the Gaussian mechanism for all values of α\alpha. Thus the definition of zCDP seems tailored to the Gaussian mechanism.

2.3 Basic Properties of zCDP

Our definition of zCDP satisfies the key basic properties of differential privacy. Foremost, these properties include smooth degradation under composition, and invariance under postprocessing:

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} and M:XnZM^{\prime}:\mathcal{X}^{n}\to\mathcal{Z} be randomized algorithms. Suppose MM satisfies ρ\rho-zCDP and MM^{\prime} satisfies ρ\rho^{\prime}-zCDP. Define M:XnY×ZM^{\prime\prime}:\mathcal{X}^{n}\to\mathcal{Y}\times\mathcal{Z} by M(x)=(M(x),M(x))M^{\prime\prime}(x)=(M(x),M^{\prime}(x)). Then MM^{\prime\prime} satisfies (ρ+ρ)(\rho+\rho^{\prime})-zCDP.

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} and f:YZf:\mathcal{Y}\to\mathcal{Z} be randomized algorithms. Suppose MM satisfies ρ\rho-zCDP. Define M:XnZM^{\prime}:\mathcal{X}^{n}\to\mathcal{Z} by M(x)=f(M(x))M^{\prime}(x)=f(M(x)). Then MM^{\prime} satisfies ρ\rho-zCDP.

These properties follow immediately from corresponding properties of the Rényi divergence outlined in Lemma 2.2.

We remark that Dwork and Rothblum’s definition of mCDP is not closed under postprocessing; we provide a counterexample in Appendix A. (However, an arbitrary amount of postprocessing can worsen the guarantees of mCDP by at most constant factors.)

2.4 Group Privacy

A mechanism MM guarantees group privacy if no small group of individuals has a significant effect on the outcome of a computation (whereas the definition of zCDP only refers to individuals, which are groups of size 11). That is, group privacy for groups of size kk guarantees that, if xx and xx^{\prime} are inputs differing on kk entries (rather than a single entry), then the outputs M(x)M(x) and M(x)M(x^{\prime}) are close.

Dwork and Rothblum [DR16, Theorem 4.1] gave nearly tight bounds on the group privacy guarantees of concentrated differential privacy, showing that a (μ=τ2/2,τ)(\mu=\tau^{2}/2,\tau)-concentrated differentially private mechanism affords (k2μ(1+o(1)),kτ(1+o(1)))(k^{2}\mu\cdot(1+o(1)),k\tau\cdot(1+o(1)))-concentrated differential privacy for groups of size k=o(1/τ)k=o(1/\tau). We are able to show a group privacy guarantee for zCDP that is exactly tight and works for a wider range of parameters:

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy ρ\rho-zCDP. Then MM guarantees (k2ρ)(k^{2}\rho)-zCDP for groups of size kk — i.e. for every x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing in up to kk entries and every α(1,)\alpha\in(1,\infty), we have

In particular, this bound is achieved (simultaneously for all values α\alpha) by the Gaussian mechanism. Our proof is also simpler than that of Dwork and Rothblum; see Section 5.

2.5 Lower Bounds

The strong group privacy guarantees of zCDP yield, as an unfortunate consequence, strong lower bounds as well. We show that, as with pure differential privacy, zCDP is susceptible to information-based lower bounds, as well as to so-called packing arguments [HT10, MMP+10, De12]:

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy ρ\rho-zCDP. Let XX be a random variable on Xn\mathcal{X}^{n}. Then

where I(;)I(\cdot;\cdot) denotes the mutual information between the random variables (in nats, rather than bits). Furthermore, if the entries of XX are independent, then I(X;M(X))ρnI(X;M(X))\leq\rho\cdot n.

Theorem 1.10 yields strong lower bounds for zCDP mechanisms, as we can construct distributions XX such that, for any accurate mechanism MM, M(X)M(X) reveals a lot of information about XX (i.e. I(X;M(X))I(X;M(X)) is large for any accurate MM).

In particular, we obtain a strong separation between approximate differential privacy and zCDP. For example, we can show that releasing an accurate approximate histogram (or, equivalently, accurately answering all point queries) on a data domain of size kk requires an input with at least n=Θ(logk)n=\Theta(\sqrt{\log k}) entries to satisfy zCDP. In contrast, under approximate differential privacy, nn can be independent of the domain size kk [BNS13]! In particular, our lower bounds show that “stability-based” techniques (such as those in the propose-test-release framework [DL09]) are not compatible with zCDP.

Our lower bound exploits the strong group privacy guarantee afforded by zCDP. Group privacy has been used to prove tight lower bounds for pure differential privacy [HT10, De12] and approximate differential privacy [SU15a]. These results highlight the fact that group privacy is often the limiting factor for private data analysis. For (ε,δ)(\varepsilon,\delta)-differential privacy, group privacy becomes vacuous for groups of size k=Θ(log(1/δ)/ε)k=\Theta(\log(1/\delta)/\varepsilon). Indeed, stability-based techniques exploit precisely this breakdown in group privacy.

As a result of this strong lower bound, we show that any mechanism for answering statistical queries that satisfies zCDP can be converted into a mechanism satisfying pure differential privacy with only a quadratic blowup in its sample complexity. More precisely, the following theorem illustrates a more general result we prove in Section 7.

for all xXnx\in\mathcal{X}^{n}. Then there exists M:XnkM^{\prime}:\mathcal{X}^{n^{\prime}}\to^{k} for n=5n2n^{\prime}=5n^{2} satisfying ε\varepsilon-differential privacy and

For some classes of queries, this reduction is essentially tight. For example, for kk one-way marginals, the Gaussian mechanism achieves sample complexity n=Θ(k)n=\Theta(\sqrt{k}) subject to zCDP, whereas the Laplace mechanism achieves sample complexity n=Θ(k)n=\Theta(k) subject to pure differential privacy, which is known to be optimal.

2.6 Approximate zCDP

To circumvent these strong lower bounds for zCDP, we consider a relaxation of zCDP in the spirit of approximate differential privacy that permits a small probability δ\delta of (catastrophic) failure:

where M(x)EM(x)|_{E} denotes the distribution of M(x)M(x) conditioned on the event EE. We further define δ\delta-approximate ρ\rho-zCDP to be δ\delta-approximate (0,ρ)(0,\rho)-zCDP.

In particular, setting δ=0\delta=0 gives the original definition of zCDP. However, this definition unifies zCDP with approximate differential privacy:

If MM satisfies (ε,δ)(\varepsilon,\delta)-differential privacy, then MM satisfies δ\delta-approximate 12ε2\frac{1}{2}\varepsilon^{2}-zCDP.

Approximate zCDP retains most of the desirable properties of zCDP, but allows us to incorporate stability-based techniques and bypass the above lower bounds. This also presents a unified tool to analyse a composition of zCDP with approximate differential privacy; see Section 8.

3 Related Work

Our work builds on the aforementioned prior work of Dwork and Rothblum [DR16].Although Dwork and Rothblum’s work only appeared publicly in March 2016, they shared a preliminary draft of their paper with us before we commenced this work. As such, our ideas are heavily inspired by theirs. We view our definition of concentrated differential privacy as being “morally equivalent” to their definition of concentrated differential privacy, in the sense that both definitions formalize the same concept.We refer to our definition as “zero-concentrated differential privacy” (zCDP) and their definition as “mean-concentrated differential privacy” (mCDP). We use “concentrated differential privacy” (CDP) to refer to the underlying concept formalized by both definitions. (The formal relationship between the two definitions is discussed in Section 4.) However, the definition of zCDP generally seems to be easier to work with than that of mCDP. In particular, our formulation in terms of Rényi divergence simplifies many analyses.

Dwork and Rothblum prove several results about concentrated differential privacy that are similar to ours. Namely, they prove analogous properties of mCDP as we prove for zCDP (cf. Sections 1.2.1, 1.2.2, 1.2.3, and 1.2.4). However, as noted, some of their bounds are weaker than ours; also, they do not explore lower bounds.

Several of the ideas underlying concentrated differential privacy are implicit in earlier works. In particular, the proof of the advanced composition theorem of Dwork, Rothblum, and Vadhan [DRV10] essentially uses the ideas of concentrated differential privacy. Their proof contains analogs of Propositions 1.7, 1.3, and 1.4,.

We also remark that Tardos [Tar08] used Rényi divergence to prove lower bounds for cryptographic objects called fingerprinting codes. Fingerprinting codes turn out to be closely related to differential privacy [Ull13, BUV14, SU15b], and Tardos’ lower bound can be (loosely) viewed as a kind of privacy-preserving algorithm.

4 Further Work

We believe that concentrated differential privacy is a useful tool for analysing private computations, as it provides both simpler and tighter bounds. We hope that CDP will be prove useful in both the theory and practice of differential privacy.

Furthermore, our lower bounds show that CDP can really be a much more stringent condition than approximate differential privacy. Thus CDP defines a “subclass” of all (ε,δ)(\varepsilon,\delta)-differentially private algorithms. This subclass includes most differentially private algorithms in the literature, but not all — the most notable exceptions being algorithms that use the propose-test-release approach [DL09] to exploit low local sensitivity.

This “CDP subclass” warrants further exploration. In particular, is there a “complete” mechanism for this class of algorithms, in the same sense that the exponential mechanism [MT07, BLR13] is complete for pure differential privacy? Can we obtain a simple characterization of the sample complexity needed to satisfy CDP? The ability to prove stronger and simpler lower bounds for CDP than for approximate DP may be useful for showing the limitations of certain algorithmic paradigms. For example, any differentially private algorithm that only uses the Laplace mechanism, the exponential mechanism, the Gaussian mechanism, and the “sparse vector” technique, along with composition and postprocessing will be subject to the lower bounds for CDP.

There is also room to examine how to interpret the zCDP privacy guarantee. In particular, we leave it as an open question to understand the extent to which ρ\rho-zCDP provides a stronger privacy guarantee than the implied (ε,δ)(\varepsilon,\delta)-DP guarantees (cf. Proposition 1.3).

In general, much of the literature on differential privacy can be re-examined through the lens of CDP, which may yield new insights and results.

Rényi Divergence

Recall the definition of Rényi divergence:

Let PP and QQ be probability distributions on Ω\Omega. For α(1,)\alpha\in(1,\infty), we define the Rényi divergence of order α\alpha between PP and QQ as

Alternatively, Rényi divergence can be defined in terms of the privacy loss (Definition 1.2) between PP and QQ:

We record several useful and well-known properties of Rényi divergence. We refer the reader to [vEH14] for proofs and discussion of these (and many other) properties. Self-contained proofs are given in Appendix B.1.

Let PP and QQ be probability distributions and α[1,]\alpha\in[1,\infty].

Composition: Suppose PP and QQ are distributions on Ω×Θ\Omega\times\Theta. Let PP^{\prime} and QQ^{\prime} denote the marginal distributions on Ω\Omega induced by PP and QQ respectively. For xΩx\in\Omega, let PxP^{\prime}_{x} and QxQ^{\prime}_{x} denote the conditional distributions on Θ\Theta induced by PP and QQ respectively, where xx specifies the first coordinate. Then

In particular if PP and QQ are product distributions, then the Rényi divergence between PP and QQ is just the sum of the Rényi divergences of the marginals.

Note that quasi-convexity allows us to extend this guarantee to the case where ff is a randomized mapping.

The following lemma gives the postprocessing and (adaptive) composition bounds (extending Lemmas 1.7 and 1.8).

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} and M:Xn×YZM^{\prime}:\mathcal{X}^{n}\times\mathcal{Y}\to\mathcal{Z}. Suppose MM satisfies (ξ,ρ)(\xi,\rho)-zCDP and MM^{\prime} satisfies (ξ,ρ)(\xi^{\prime},\rho^{\prime})-zCDP (as a function of its first argument). Define M:XnZM^{\prime\prime}:\mathcal{X}^{n}\to\mathcal{Z} by M(x)=M(x,M(x))M^{\prime\prime}(x)=M^{\prime}(x,M(x)). Then MM^{\prime\prime} satisfies (ξ+ξ,ρ+ρ)(\xi+\xi^{\prime},\rho+\rho^{\prime})-zCDP.

The proof is immediate from Lemma 2.2. Note that, while Lemma 2.3 is only stated for the composition of two mechanisms, it can be inductively applied to analyse the composition of arbitrarily many mechanisms.

2 Gaussian Mechanism

The following lemma gives the Rényi divergence between two Gaussian distributions with the same variance.

Consequently, the Gaussian mechanism, which answers a sensitivity-Δ\Delta query by adding noise drawn from N(0,σ2)\mathcal{N}(0,\sigma^{2}), satisfies (Δ22σ2)\left(\frac{\Delta^{2}}{2\sigma^{2}}\right)-zCDP (Proposition 1.6).

For the multivariate Gaussian mechanism, Lemma 2.4 generalises to the following.

Relation to Differential Privacy

We now discuss the relationship between zCDP and the traditional definitions of pure and approximate differential privacy. There is a close relationship between the notions, but not an exact characterization.

For completeness, we state the definition of differential privacy:

A randomized mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (ε,δ)(\varepsilon,\delta)-differential privacy if, for all x,xXx,x^{\prime}\in\mathcal{X} differing in a single entry, we have

for all (measurable) SYS\subset\mathcal{Y}. Further define ε\varepsilon-differential privacy to be (ε,0)(\varepsilon,0)-differential privacy.

Pure differential privacy is exactly characterized by (ξ,0)(\xi,0)-zCDP:

A mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies ε\varepsilon-DP if and only if it satisfies (ε,0)(\varepsilon,0)-zCDP.

for all α\alpha. So MM satisfies (ε,0)(\varepsilon,0)-zCDP. Conversely, suppose MM satisfies (ε,0)(\varepsilon,0)-zCDP. Then

We now show that ε\varepsilon-differential privacy implies (12ε2)(\frac{1}{2}\varepsilon^{2})-zCDP (Proposition 1.4).

We may assume 12εα1\frac{1}{2}\varepsilon\alpha\leq 1, as otherwise 12ε2α>ε\frac{1}{2}\varepsilon^{2}\alpha>\varepsilon, whence the result follows from monotonicity. We must show that

The result now follows from the following inequality, which is proved in Lemma B.1.

2 Approximate DP versus zCDP

The statements in this section show that, up to some loss in parameters, zCDP is equivalent to a family of (ε,δ)(\varepsilon,\delta)-DP guarantees for all δ>0\delta>0.

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy (ξ,ρ)(\xi,\rho)-zCDP. Then MM satisfies (ε,δ)(\varepsilon,\delta)-DP for all δ>0\delta>0 and

Thus to achieve a given (ε,δ)(\varepsilon,\delta)-DP guarantee it suffices to satisfy (ξ,ρ)(\xi,\rho)-zCDP with

Choosing α=(εξ+ρ)/2ρ>1\alpha=(\varepsilon-\xi+\rho)/2\rho>1 gives

Now, for any measurable SYS\subset\mathcal{Y},

Lemma 3.5 is not tight. In particular, we have the following refinement of Lemma 3.5, the proof of which is deferred to the appendix (Lemma B.2).

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy (ξ,ρ)(\xi,\rho)-zCDP. Then MM satisfies (ε,δ)(\varepsilon,\delta)-DP for all δ>0\delta>0 and

Alternatively MM satisfies (ε,δ)(\varepsilon,\delta)-DP for all εξ+ρ\varepsilon\geq\xi+\rho and

Note that the last of three options in the minimum dominates the first two options. We have included the first two options as they are simpler.

Now we show a partial converse to Lemma 3.5, which is proved in the appendix (Lemma B.3).

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy (ε,δ)(\varepsilon,\delta)-DP for all δ>0\delta>0 and

for some constants ξ^,ρ^\hat{\xi},\hat{\rho}\in. Then MM is (ξ^14ρ^+5ρ^,14ρ^)\left(\hat{\xi}-\frac{1}{4}\hat{\rho}+5\sqrt{\hat{\rho}},\frac{1}{4}\hat{\rho}\right)-zCDP.

Thus zCDP and DP are equivalent up to a (potentially substantial) loss in parameters and the quantification over all δ\delta.

Zero- versus Mean-Concentrated Differential Privacy

We begin by stating the definition of mean-concentrated differential privacy:

A randomized mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (μ,τ)(\mu,\tau)-mean-concentrated differential privacy if, for all x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing in one entry, and letting Z=PrivLoss(M(x)M(x))Z=\mathsf{PrivLoss}\left(M(x)\middle\|M(x^{\prime})\right), we have

If M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (μ,τ)(\mu,\tau)-mCDP, then MM satisfies (μτ2/2,τ2/2)(\mu-\tau^{2}/2,\tau^{2}/2)-zCDP.

If M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (ξ,ρ)(\xi,\rho)-zCDP, then MM satisfies (ξ+ρ,O(ξ+2ρ))(\xi+\rho,O(\sqrt{\xi+2\rho}))-mCDP.

The proof of Lemma 4.3 is deferred to the appendix.

Thus we can convert (μ,τ)(\mu,\tau)-mCDP into (μτ2/2,τ2/2)(\mu-\tau^{2}/2,\tau^{2}/2)-zCDP and then back to (μ,O(μ+τ2/2))(\mu,O(\sqrt{\mu+\tau^{2}/2}))-mCDP. This may result in a large loss in parameters, which is why, for example, pure DP can be characterised in terms of zCDP, but not in terms of mCDP.

We view zCDP as a relaxation of mCDP; mCDP requires the privacy loss to be “tightly” concentrated about its mean and that the mean is close to the origin. The triangle inequality then implies that the privacy loss is “weakly” concentrated about the origin. (The difference between “tightly” and “weakly” accounts for the use of the triangle inequality.) On the other hand, zCDP direcly requires that the privacy loss is weakly concentrated about the origin. That is to say, zCDP gives a subgaussian bound on the privacy loss that is centered at zero, whereas mCDP gives a subgaussian bound that is centered at the mean and separately bounds the mean.

There may be some advantage to the stronger requirement of mCDP, either in terms of what kind of privacy guarantee it affords, or how it can be used as an analytic tool. However, it seems that for most applications, we only need what zCDP provides.

Group Privacy

In this section we show that zCDP provides privacy protections to small groups of individuals.

We say that a mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} provides (ξ,ρ)(\xi,\rho)-zCDP for groups of size kk if, for every x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing in at most kk entries, we have

The usual definition of zCDP only applies to groups of size 11. Here we show that it implies bounds for all group sizes. We begin with a technical lemma.

Let PP, QQ, and RR be probability distributions. Then

Let p=kα1α(k1)p=\frac{k\alpha-1}{\alpha(k-1)} and q=kα1α1q=\frac{k\alpha-1}{\alpha-1}. Then 1p+1q=α(k1)+(α1)kα1=1\frac{1}{p}+\frac{1}{q}=\frac{\alpha(k-1)+(\alpha-1)}{k\alpha-1}=1. By Hölder’s inequality,

Now pα=kα1k1p\alpha=\frac{k\alpha-1}{k-1}, q(α1)+1=kαq(\alpha-1)+1=k\alpha, and

If M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (ξ,ρ)(\xi,\rho)-zCDP, then MM gives (ξki=1k1i,ρk2)(\xi\cdot k\sum_{i=1}^{k}\frac{1}{i},\rho\cdot k^{2})-zCDP for groups of size kk.

Thus (ξ,ρ)(\xi,\rho)-zCDP implies (ξO(klogk),ρk2)(\xi\cdot O(k\log k),\rho\cdot k^{2})-zCDP for groups of size kk.

The Gaussian mechanism shows that k2ρk^{2}\rho is the optimal dependence on ρ\rho. However, O(klogk)ξO(k\log k)\xi is not the optimal dependence on ξ\xi: (ξ,0)(\xi,0)-zCDP implies (kξ,0)(k\xi,0)-zCDP for groups of size kk.

We show this by induction on kk. The statement is clearly true for groups of size 11. We now assume the statement holds for groups of size k1k-1 and will verify it for groups of size kk.

Let x,xXnx,x^{\prime}\in\mathcal{X}^{n} differ in kk entries. Let x^Xn\hat{x}\in\mathcal{X}^{n} be such that xx and x^\hat{x} differ in k1k-1 entries and xx^{\prime} and x^\hat{x} differ in one entry.

where the last inequality follows from the fact that kαkα1\frac{k\alpha}{k\alpha-1} is a decreasing function of α\alpha for α>1\alpha>1. ∎

Lower Bounds

In this section we develop tools to prove lower bounds for zCDP. We will use group privacy to bound the mutual information between the input and the output of a mechanism satisfying zCDP. Thus, if we are able to construct a distribution on inputs such that any accurate mechanism must reveal a high amount of information about its input, we obtain a lower bound showing that no accurate mechanism satisfying zCDP can be accurate for this data distribution.

We begin with the simplest form of our mutual information bound, which is an analogue of the bound of [MMP+10] for pure differential privacy:

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy (ξ,ρ)(\xi,\rho)-zCDP. Let XX be a random variable in Xn\mathcal{X}^{n}. Then

where II denotes mutual information (measured in nats, rather than bits).

By Proposition 5.3, MM provides (ξni=1n1i,ρn2)(\xi\cdot n\sum_{i=1}^{n}\frac{1}{i},\rho\cdot n^{2})-zCDP for groups of size nn. Thus

for all x,xXnx,x^{\prime}\in\mathcal{X}^{n}. Since KL-divergence is convex,

The reason this lower bound works is the strong group privacy guarantee — even for groups of size nn, we obtain nontrivial privacy guarantees. While this is good for privacy it is bad for usefulness, as it implies that even information that is “global” (rather than specific to a individual or a small group) is protected. These lower bounds reinforce the connection between group privacy and lower bounds [HT10, De12, SU15a].

In contrast, (ε,δ)(\varepsilon,\delta)-DP is not susceptible to such a lower bound because it gives a vacuous privacy guarantee for groups of size k=O(log(1/δ)/ε)k=O(\log(1/\delta)/\varepsilon). This helps explain the power of the propose-test-release paradigm.

Furthermore, we obtain even stronger mutual information bounds when the entries of the distribution are independent:

Let M:XmYM:\mathcal{X}^{m}\to\mathcal{Y} satisfy (ξ,ρ)(\xi,\rho)-zCDP. Let XX be a random variable in Xm\mathcal{X}^{m} with independent entries. Then

where II denotes mutual information (measured in nats, rather than bits).

First, by the chain rule for mutual information,

By zCDP, we know that for all xXi1x\in\mathcal{X}^{i-1}, y,yXy,y^{\prime}\in\mathcal{X}, and zXmiz\in\mathcal{X}^{m-i}, we have

for all xx and yy. The result follows. ∎

More generally, we can combine dependent and independent entries as follows.

where II denotes the mutual information (measured in nats, rather than bits).

By the chain rule for mutual information,

By (6) and the convexity of KL-divergence,

for all xx and yy. The result follows. ∎

We informally discuss a few applications of our information-based lower bounds to some simple and well-studied problems in differential privacy.

Consider M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} where X={0,1}d\mathcal{X}=\{0,1\}^{d} and Y=d\mathcal{Y}=^{d}. The goal of MM is to estimate the attribute means, or one-way marginals, of its input database xx:

We now analyze what can be accomplished with zCDP. Adding independent noise drawn from N(0,d/2n2ρ)\mathcal{N}(0,d/2n^{2}\rho) to each of the dd coordinates of x\overline{x} satisfies ρ\rho-zCDP. This gives accurate answers as long as nd/ρn\gg\sqrt{d/\rho}.

For a lower bound, consider sampling X1{0,1}dX_{1}\in\{0,1\}^{d} uniformly at random. Set Xi=X1X_{i}=X_{1} for all i[n]i\in[n]. By Proposition 6.1,

for any ρ\rho-zCDP M:({0,1}d)ndM:(\{0,1\}^{d})^{n}\to^{d}. However, if MM is accurate, we can recover (most of) X1X_{1} from M(X)M(X), whence I(X;M(X))Ω(d)I(X;M(X))\geq\Omega(d). This yields a lower bound of nΩ(d/ρ)n\geq\Omega(\sqrt{d/\rho}), which is tight up to constant factors.

For ε\varepsilon-DP it is possible to do this if and only if n=Θ(log(T)/ε))n=\Theta(\log(T)/\varepsilon)); the optimal algorithm is to independently sample

However, for (ε,δ)(\varepsilon,\delta)-DP, it is possible to attain sample complexity n=O(log(1/δ)/ε)n=O(\log(1/\delta)/\varepsilon) [BNS13, BNS16b, Theorem 3.13]. Interestingly, for zCDP we can show that n=Θ(log(T)/ρ)n=\Theta(\sqrt{\log(T)/\rho}) is sufficient and necessary:

independently for t[T]t\in[T] satisfies ρ\rho-zCDP. Moreover,

On the other hand, if we sample X1[T]X_{1}\in[T] uniformly at random and set Xi=X1X_{i}=X_{1} for all i[n]i\in[n], then I(X;M(X))Ω(logT)I(X;M(X))\geq\Omega(\log T) for any accurate MM, as we can recover X1X_{1} from M(X)M(X) if MM is accurate. Proposition 6.1 thus implies that nΩ(log(T)/ρ)n\geq\Omega(\sqrt{\log(T)/\rho}) is necessary to obtain accuracy.

This gives a strong separation between approximate DP and zCDP.

Consider M:{±1}n{±1}nM:\{\pm 1\}^{n}\to\{\pm 1\}^{n} where the goal is to maximize x,M(x)\langle x,M(x)\rangle subject to ρ\rho-zCDP.

One solution is randomized response [War65]: Each output bit ii of MM is chosen independently with

Turning our attention to lower bounds: Let X{±1}nX\in\{\pm 1\}^{n} be uniformly random. By Lemma 6.2, since the bits of XX are independent, we have I(X;M(X))ρnI(X;M(X))\leq\rho\cdot n for any ρ\rho-zCDP MM. However, if MM is accurate, we can recover part of XX from M(X)M(X) [BSU16], whence I(X;M(X))Ω(n)I(X;M(X))\geq\Omega(n).

Randomized response is a special case of the exponential mechanism [MT07, BLR13]. Consequently this can be interpreted as a lower bound for search problems.

The above examples can be easily discussed in terms of a more formal and quantitative definition of accuracy. In particular, we consider the histogram example again:

then nΩ(log(α2T)/ρα2)n\geq\Omega(\sqrt{\log(\alpha^{2}T)/\rho\alpha^{2}}).

where X=(X1,,Xm)XnX=(X_{1},\cdots,X_{m})\in\mathcal{X}^{n}. However,

for any functions ff and gg, where HH is the entropy (in nats). In particular, we let

Clearly H(X)=mlogTH(X)=m\log T. Furthermore, H(Xf(X))mlogmH(X|f(X))\leq m\log m, since XX can be specified by naming mm elements of f(X)f(X), which is a set of at most mm elements.

then g(M(X))g(M(X)) contains exactly all the values in XX — i.e. f(X)=g(M(X))f(X)=g(M(X)). By Markov’s inequality, (8) holds with probability at least 4/54/5.

Now we can upper bound H(f(X)g(M(X)))H(f(X)|g(M(X))) by giving a scheme for specifying f(X)f(X) given g(M(X))g(M(X)). If (8) holds, we simply need one bit to say so. If (8) does not hold, we need one bit to say this and mlog2Tm\log_{2}T bits to describe f(X)f(X). This gives

Combining this with (7) completes the proof. ∎

We remark that our lower bounds for zCDP can be converted to lower bounds for mCDP using Lemma 4.2.

Obtaining Pure DP Mechanisms from zCDP

We now establish limits on what more can be achieved with zCDP over pure differential privacy. In particular, we will prove that any mechanism satisfying zCDP can be converted into a mechanism satisfying pure DP with at most a quadratic blowup in sample complexity. Formally, we show the following theorem.

Assume ξα2\xi\leq\alpha^{2}, ρα2\rho\leq\alpha^{2}, and

for all xXnx\in\mathcal{X}^{n^{\prime}} and β>0\beta>0.

Before discussing the proof of Theorem 7.1, we make some remarks about its statement:

If ξ=0\xi=0, we have n=O(n2ρ/εα)n^{\prime}=O(n^{2}\rho/\varepsilon\alpha). So, if ρ\rho, ε\varepsilon, and α\alpha are all constants, we have n=O(n2)n^{\prime}=O(n^{2}). This justifies our informal statement that we can convert any mechanism satisfying zCDP into one satisfying pure DP with a quadratic blowup in sample complexity.

This example illustrates that the theorem is not tight in terms of α\alpha; it loses a 1/α21/\alpha^{2} factor here. However, the other parameters are tight.

The requirement that ξ,ρα2\xi,\rho\leq\alpha^{2} is only used to show that

using Lemma 7.5. However, in many situations (9) holds even when ξ,ρα2\xi,\rho\gg\alpha^{2}. For example, if nO(log(k)/α2)n\geq O(\log(k)/\alpha^{2}) or even nO(VC(q)/α2)n\geq O(VC(q)/\alpha^{2}) then (9) is automatically satisfied.

The technical condition (9) is needed to relate the part of the proof with inputs of size nn to the part with inputs of size nn^{\prime}.

Thus we can restate Theorem 7.1 with the condition ξ,ρα2\xi,\rho\leq\alpha^{2} replaced by (9). This would be more general, but also more mysterious.

Alas, the proof of Theorem 7.1 is not constructive. Rather than directly constructing a mechanism satisfying pure DP from any mechanism satisfying zCDP, we show the contrapositive statement: any lower bound for pure DP can be converted into a lower bound for zCDP. Pure DP is characterized by so-called packing lower bounds and the exponential mechanism.

We begin by giving a technical lemma showing that for any output space and any desired accuracy we have a “packing” and a “net:”

Let (Y,d)(\mathcal{Y},d) be a metric space. Fix α>0\alpha>0. Then there exists a countable TYT\subset\mathcal{Y} such that both of the following hold.

(Net:) Either TT is infinite or for all yYy^{\prime}\in\mathcal{Y} there exists yTy\in T with d(y,y)αd(y,y^{\prime})\leq\alpha.

(Packing:) For all y,yTy,y^{\prime}\in T, if yyy\neq y^{\prime}, then d(y,y)>αd(y,y^{\prime})>\alpha.

Consider the following procedure for producing TT.

Initialize AYA\leftarrow\mathcal{Y} and TT\leftarrow\emptyset.

Update A{yA:d(y,y)>α}A\leftarrow\{y^{\prime}\in A:d(y^{\prime},y)>\alpha\}.

This procedure either terminates giving a finite TT or runs forever enumerating a countably infinite TT.

(Net:) If TT is infinite, we immediately can dispense the first condition, so suppose the procedure terminates and TT is finite. Fix yYy^{\prime}\in\mathcal{Y}. Since the procedure terminates, A=A=\emptyset at the end, which means yy^{\prime} was removed from AA at some point. This means some yTy\in T was added such that d(y,y)αd(y^{\prime},y)\leq\alpha, as required.

(Packing:) Fix yyTy\neq y^{\prime}\in T. We assume, without loss of generality, that yy was added to TT before yy^{\prime}. This means yy^{\prime} was not removed from AA when yy was added to TT. In particular, this means d(y,y)>αd(y^{\prime},y)>\alpha. ∎

It is well-known that a net yields a pure DP algorithm:

for all xXnx\in\mathcal{X}^{n} and β>0\beta>0.

The analysis can be found in [DR14, Theorems 3.10 and 3.11] and [BNS+16a, Lemma 7.1]. ∎

We also show that a packing yields a lower bound for zCDP:

Let (Y,d)(\mathcal{Y},d) be a metric space and q:XnYq:\mathcal{X}^{n}\to\mathcal{Y} a function. Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} be a (ξ,ρ)(\xi,\rho)-zCDP mechanism satisfying

for all xXnx\in\mathcal{X}^{n}. Let TYT\subset\mathcal{Y} be such that d(y,y)>αd(y,y^{\prime})>\alpha, for all y,yTy,y^{\prime}\in T with yyy\neq y^{\prime}. Assume that for all yTy\in T there exists xXnx\in\mathcal{X}^{n} with q(x)=yq(x)=y. Then

Let q1:TXnq^{-1}:T\to\mathcal{X}^{n} be a function such that q(q1(y))=yq(q^{-1}(y))=y for all yTy\in T. Define f:YTf:\mathcal{Y}\to T by

Let YY be a uniformly random element of TT and let X=q1(Y)X=q^{-1}(Y). By the data processing inequality and Proposition 6.1,

Clearly H(Y)=logTH(Y)=\log|T| and H(EZ)H(E)log2H(E|Z)\leq H(E)\leq\log 2. Moreover,

The result now follows by combining inequalities. ∎

for all xXnx\in\mathcal{X}^{n}. For all nn^{\prime},

The proof of Lemma 7.5 is deferred to the appendix.

Now we can combine Lemmas 7.2, 7.3, 7.4, and 7.5 to prove Theorem 7.1:

(Packing:) For all y,yTy,y^{\prime}\in T, if yyy\neq y^{\prime}, then yy>4α\|y-y^{\prime}\|>4\alpha.

This gives an upper bound on T|T|. In particular, TT must be finite.

for all xXnx\in\mathcal{X}^{n^{\prime}}. For xXnx\in\mathcal{X}^{n^{\prime}}, by the Net property and Lemma 7.5,

The theorem now follows by combining inequalities. ∎

Approximate zCDP

In the spirit of approximate DP, we propose a relaxation of zCDP:

A randomised mechanism M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} is δ\delta-approximately (ξ,ρ)(\xi,\rho)-zCDP if, for all x,xXnx,x^{\prime}\in\mathcal{X}^{n} differing on a single entry, there exist events E=E(M(x))E=E(M(x)) and E=E(M(x))E^{\prime}=E^{\prime}(M(x^{\prime})) such that, for all α(1,)\alpha\in(1,\infty),

Clearly -approximate zCDP is simply zCDP. Hence we have a generalization of zCDP. As we will show later in this section, δ\delta-approximate (ε,0)(\varepsilon,0)-zCDP is equivalent to (ε,δ)(\varepsilon,\delta)-DP. Thus we have also generalized approximate DP. Hence, this definition unifies both relaxations of pure DP.

Approximate zCDP is a three-parameter definition which allows us to capture many different aspects of differential privacy. However, three parameters is quite overwhelming. We believe that use of the one-parameter ρ\rho-zCDP (or the two-parameter δ\delta-approximate ρ\rho-zCDP if necessary) is sufficient for most purposes.

It is easy to verify that the definition of approximate zCDP satisfies the following basic properties.

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} and M:Xn×YZM^{\prime}:\mathcal{X}^{n}\times\mathcal{Y}\to\mathcal{Z} be randomized algorithms. Suppose MM satisfies δ\delta-approximate (ξ,ρ)(\xi,\rho)-zCDP and, for all yYy\in\mathcal{Y}, M(,y):XnZM^{\prime}(\cdot,y):\mathcal{X}^{n}\to\mathcal{Z} satisfies δ\delta^{\prime}-approximate (ξ,ρ)(\xi^{\prime},\rho^{\prime})-zCDP. Define M:XnZM^{\prime\prime}:\mathcal{X}^{n}\to\mathcal{Z} by M(x)=M(x,M(x))M^{\prime\prime}(x)=M^{\prime}(x,M(x)). Then MM^{\prime\prime} satisfies (δ+δδδ)(\delta+\delta^{\prime}-\delta\cdot\delta^{\prime})-approximate (ξ+ξ,ρ+ρ)(\xi+\xi^{\prime},\rho+\rho^{\prime})-zCDP.

Suppose M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies δ\delta-approximate (ξ,0)(\xi,0)-zCDP. Then MM satisfies δ\delta-approximate ξ\xi-zCDP and δ\delta-approximate 12ξ2\frac{1}{2}\xi^{2}-zCDP.

However, the strong group privacy guarantees of Section 5 no longer apply to approximate zCDP and, hence, the strong lower bounds of Section 6 also no longer hold. Circumventing these lower bounds is part of the motivation for considering approximate zCDP. However, approximate zCDP is not necessarily the only way to relax zCDP that circumvents our lower bounds:

Consider relaxing the definition of zCDP to only require the bound (1) or (2) to hold when αm\alpha\leq m:

This relaxed definition may also be able to circumvent the group privacy-based lower bounds, as our group privacy proof would no longer work for groups of size larger than mm. We do not know what group privacy guarantees Definition 8.4 provides for groups of size kmk\gg m. This relaxed definition may be worth exploring, but is beyond the scope of our work.

We can convert approximate DP to approximate zCDP using the following lemma.

First we define a approximate DP version of the randomized response mechanism:

The above mechanism is “complete” for approximate DP:

If M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (ε,δ)(\varepsilon,\delta)-DP, then MM satisfies δ\delta-approximate (ε,0)(\varepsilon,0)-zCDP, which, in turn, implies δ\delta-approximate (0,12ε2)(0,\frac{1}{2}\varepsilon^{2})-zCDP.

Fix neighbouring x0,x1Xnx_{0},x_{1}\in\mathcal{X}^{n}. Let T:{0,1}×{,}YT:\{0,1\}\times\{\bot,\top\}\to\mathcal{Y} be as in Lemma 8.6.

for both b{0,1}b\in\{0,1\} and all α(1,)\alpha\in(1,\infty). Thus we have satisfied the definition of δ\delta-approximate (ε,0)(\varepsilon,0)-zCDP.

Applying Proposition 3.3 shows that this also implies δ\delta-approximate (0,12ε2)(0,\frac{1}{2}\varepsilon^{2})-zCDP. ∎

2 Approximate zCDP Implies Approximate DP

Suppose M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies δ\delta-approximate (ξ,ρ)(\xi,\rho)-zCDP. If ρ=0\rho=0, then MM satisfies (ξ,δ)(\xi,\delta)-DP. In general, MM satisfies (ε,δ+(1δ)δ)(\varepsilon,\delta+(1-\delta)\delta^{\prime})-DP for all εξ+ρ\varepsilon\geq\xi+\rho, where

which proves the first half of the lemma.

Secondly, by Lemma B.2 (cf. Lemma 3.5), for all εξ+ρ\varepsilon\geq\xi+\rho,

3 Application of Approximate zCDP

Approximate zCDP subsumes approximate DP. A result of this is that we can apply our tightened lemmas to give a tighter version of the so-called advanced composition theorem [DRV10].

Note that the following results are subsumed by the bounds of Kairouz, Oh, and Viswanath [KOV15] and Murtagh and Vadhan [MV16]. However, these bounds may be extended to analyse the composition of mechanisms satisfying CDP with mechanisms satisfying approximate DP. We believe that such a “unified” analysis of composition will be useful.

Applying Corollary 8.7, Lemma 8.2, and Lemma 8.8 yields the following result.

Let M1,,Mk:XnYM_{1},\cdots,M_{k}:\mathcal{X}^{n}\to\mathcal{Y} and let M:XnYkM:\mathcal{X}^{n}\to\mathcal{Y}^{k} be their composition. Suppose each MiM_{i} satisfies (εi,δi)(\varepsilon_{i},\delta_{i})-DP. Set ρ=12ikεi2\rho=\frac{1}{2}\sum_{i}^{k}\varepsilon_{i}^{2}. Then MM satisfies

Let M1,,Mk:XnYM_{1},\cdots,M_{k}:\mathcal{X}^{n}\to\mathcal{Y} and let M:XnYkM:\mathcal{X}^{n}\to\mathcal{Y}^{k} be their composition. Suppose each MiM_{i} satisfies (εi,δi)(\varepsilon_{i},\delta_{i})-DP. Set ε2=12ikεi2\varepsilon^{2}=\frac{1}{2}\sum_{i}^{k}\varepsilon_{i}^{2}. Then MM satisfies

Finally, by picking the second term in the minimum and using 1i(1δi)iδi1-\prod_{i}(1-\delta_{i})\leq\sum_{i}\delta_{i}, we have the following simpler form of the lemma.

Let M1,,Mk:XnYM_{1},\cdots,M_{k}:\mathcal{X}^{n}\to\mathcal{Y} and let M:XnYkM:\mathcal{X}^{n}\to\mathcal{Y}^{k} be their composition. Suppose each MiM_{i} satisfies (εi,δi)(\varepsilon_{i},\delta_{i})-DP. Then MM satisfies

for all λ0\lambda\geq 0. Alternatively MM satisfies

In comparison to the composition theorem of [DRV10], we save modestly by a constant factor in the first term and, in most cases π/2ε2<1\sqrt{\pi/2}\|\varepsilon\|_{2}<1, whence the logarithmic term is an improvement over the usual advanced composition theorem.

We thank Cynthia Dwork and Guy Rothblum for sharing a preliminary draft of their work with us. We also thank Ilya Mironov, Kobbi Nissim, Adam Smith, Salil Vadhan, and the Harvard Differential Privacy Research Group for helpful discussions and suggestions.

References

Appendix A Postprocessing and mCDP

In this appendix we give a family of counterexamples showing that mCDP is not closed under postprocessing (unlike zCDP).

We examine the mCDP guarantees of the postprocessed mechanism M:{1,1}{1,0,1}M^{\prime}:\{-1,1\}\to\{-1,0,1\} defined by M(x)=T(M(x))M^{\prime}(x)=T(M(x)):

Let σ1\sigma\geq 1 and let t6σ3+1t\geq 6\sigma^{3}+1. Then while the mechanism MM is (2/σ2,2/σ)(2/\sigma^{2},2/\sigma)-mCDP, the postprocessed mechanism MM^{\prime} is not (2/σ2,2/σ)(2/\sigma^{2},2/\sigma)-mCDP.

Note that p>qp>q. Hence, for each x{1,1}x\in\{-1,1\},

Now consider the privacy loss random variable Z=PrivLoss(M(1)M(1))=f(M(1))Z=\mathsf{PrivLoss}\left(M(1)\middle\|M(-1)\right)=f(M(1)). Then ZZ is distributed according to

The values p,qp,q satisfy the following inequalities:

12πσt+σ1e(t1)2/2σ2p12πσt1e(t1)2/2σ2\sqrt{\frac{1}{2\pi}}\cdot\frac{\sigma}{t+\sigma-1}\cdot e^{-(t-1)^{2}/2\sigma^{2}}\leq p\leq\sqrt{\frac{1}{2\pi}}\cdot\frac{\sigma}{t-1}\cdot e^{-(t-1)^{2}/2\sigma^{2}}

The first part now follows from the fact that σt1\sigma\leq t-1.

To establish the second inequality, we write

Now set λ=t2\lambda=\frac{t}{2}. Then this quantity becomes

The expression exp((t1)/2σ2)/2t\exp((t-1)/2\sigma^{2})/2t is monotone increasing for t2σ2t\geq 2\sigma^{2}. Thus, it is strictly larger than 2π\sqrt{2\pi} as long as t6σ3+1t\geq 6\sigma^{3}+1. ∎

Appendix B Miscellaneous Proofs and Lemmata

This technical lemma may be “verified” numerically by inspecting a plot of z=e12xysinh(xy)(sinh(x)sinh(y))z=e^{\frac{1}{2}xy}\cdot\sinh(x-y)-(\sinh(x)-\sinh(y)) for (x,y)2(x,y)\in^{2}.

For intuition, consider the third-order Taylor approximation to sinh(x)\sinh(x) about :

Unfortunately, turning this intuition into an actual proof is quite involved. We instead provide a proof from [hh]:

We need the following hyperbolic trigonometric identities.

We also use the fact that 0tanh(z)<min{z,1}0\leq\tanh(z)<\min\{z,1\} for all z>0z>0. Thus

where t=tanh(x2)tanh(y2)<1t=\tanh\left(\frac{x}{2}\right)\tanh\left(\frac{y}{2}\right)<1. Now

So it only remains to show that tanh(x2)tanh(y2)tanh(xy4)\tanh\left(\frac{x}{2}\right)\tanh\left(\frac{y}{2}\right)\leq\tanh\left(\frac{xy}{4}\right). This is clearly true when y=0y=0. Now

Since 0y<x20\leq y<x\leq 2, we have 0xy/4y/20\leq xy/4\leq y/2 and, hence, cosh(y/2)cosh(xy/4)\cosh(y/2)\geq\cosh(xy/4). Thus

The inequality now follows by integration of the inequality on the derivatives. ∎

The following lemma immediately implies Lemma 3.6 and is also used to prove Lemma 8.8.

As in Lemma 3.5, by Markov’s inequality, for all α>1\alpha>1 and λ>ξ+ρ\lambda>\xi+\rho,

Choosing α=(λξ+ρ)/2ρ>1\alpha=(\lambda-\xi+\rho)/2\rho>1 gives

Now we can substitute upper bounds on hh to obtain the desired results.

Let M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfy (ε,δ)(\varepsilon,\delta)-DP for all δ>0\delta>0 and

for some constants ξ^,ρ^\hat{\xi},\hat{\rho}\in. Then MM is (ξ^14ρ^+5ρ^,14ρ^)\left(\hat{\xi}-\frac{1}{4}\hat{\rho}+5\sqrt{\hat{\rho}},\frac{1}{4}\hat{\rho}\right)-zCDP.

We use the inequality 1+xeξex+ξ1+xe^{\xi}\leq e^{x+\xi} for all x,ξ0x,\xi\geq 0. We have

We make use of the following technical lemma taken from [DSS+15].

Let XX^{\prime} be an independent copy of XX and let Y{0,1}Y\in\{0,1\} be uniformly random and independent from XX and XX^{\prime}. By Jensen’s inequality,

If M:XnYM:\mathcal{X}^{n}\to\mathcal{Y} satisfies (ξ,ρ)(\xi,\rho)-zCDP, then MM satisfies (ξ+ρ,O(ξ+2ρ))(\xi+\rho,O(\sqrt{\xi+2\rho}))-CDP.

The other side of the inequality is symmetric. ∎

Unfortunately Rényi divergence is not convex for α>1\alpha>1. (Although KL-divergence is.) However, the following property implies that Rényi divergence is quasi-convex.

Let P0,P1,Q0,Q1P_{0},P_{1},Q_{0},Q_{1} be distributions on Ω\Omega. For tt\in, define Pt=tP1+(1t)P0P_{t}=tP_{1}+(1-t)P_{0} and Qt=tQ1+(1t)Q0Q_{t}=tQ_{1}+(1-t)Q_{0} to be the convex combinations specified by tt. Then

Moreover, the limit as α1+\alpha\to 1+ gives

Let h(x)=xαh(x)=x^{\alpha}. Note that hh is convex. Let f1(y)={xΩ:f(x)=y}f^{-1}(y)=\{x\in\Omega:f(x)=y\}. Let QyQ_{y} be the conditional distribution on xQx\sim Q conditioned on f(x)=yf(x)=y. By Jensen’s inequality,

Let 1<αα<1<\alpha\leq\alpha^{\prime}<\infty. Let h(x)=xα1α1h(x)=x^{\frac{\alpha^{\prime}-1}{\alpha-1}}. Then

Appendix C Privacy versus Sampling

In this appendix we prove the technical Lemma 7.5. Essentially we show that if a private mechanism can accurately answer a set of queries with a given sample complexity, then those queries can be approximated on an unknown distribution with the same sample complexity. This is related to lower bounds on private sample complexity using Vapnik-Chervonenkis dimension e.g. [DR14, Theorem 4.8] and [BNSV15, Theorem 5.5].

First we state Pinsker’s inequality [vEH14, Theorem 31].

Let XX and YY be random variables on $$. Then

We can also generalise Pinsker’s inequality using Rényi divergence:

for all xXnx\in\mathcal{X}^{n}. Then, for any distribution D\mathcal{D} on X\mathcal{X},

By postprocessing, WW satisfies (ξ,ρ)(\xi,\rho)-zCDP and

The following is similar to [BNS+16a, Lemma 3.1]. Let xDnx\sim\mathcal{D}^{n} and yDy\sim\mathcal{D}. Now

Fix xXnx\in\mathcal{X}^{n^{\prime}}. Let D\mathcal{D} be the uniform distribution on elements of xx so that q(D)=q(x)q(\mathcal{D})=q(x). By Proposition C.3,

In particular, there must exist x^Dn\hat{x}\sim\mathcal{D}^{n} such that q(x^)q(D)2α+2(ξ+ρ)\|q(\hat{x})-q(\mathcal{D})\|\leq 2\alpha+\sqrt{2(\xi+\rho)}, as required. ∎