Between Pure and Approximate Differential Privacy

Thomas Steinke, Jonathan Ullman

Introduction

The goal of privacy-preserving data analysis is to enable rich statistical analysis of a database while protecting the privacy of individuals whose data is in the database. A formal privacy guarantee is given by (ε,δ)(\varepsilon,\delta)-differential privacy [DMNS06, DKM+06], which ensures that no individual’s data has a significant influence on the information released about the database. The two parameters ε\varepsilon and δ\delta control the level of privacy. Very roughly, ε\varepsilon is an upper bound on the amount of influence an individual’s record has on the information released and δ\delta is the probability that this bound fails to holdThis intuition is actually somewhat imprecise, although it is suitable for this informal discussion. See [KS08] for a more precise semantic interpretation of (ε,δ)(\varepsilon,\delta)-differential privacy., so the definition becomes more stringent as ε,δ0\varepsilon,\delta\rightarrow 0.

A natural way to measure the tradeoff between privacy and utility is sample complexity—the minimum number of records nn that is sufficient in order to publicly release a given set of statistics about the database, while achieving both differential privacy and statistical accuracy. Intuitively, it’s easier to achieve these two goals when nn is large, as each individual’s data will have only a small influence on the aggregate statistics of interest. Conversely, the sample complexity nn should increase as ε\varepsilon and δ\delta decrease (which strengthens the privacy guarantee).

The strongest version of differential privacy, in which δ=0\delta=0, is known as pure differential privacy. The sample complexity of achieving pure differential privacy is well known for many settings (e.g. [HT10]). The more general case where δ>0\delta>0 is known as approximate differential privacy, and is less well understood. Recently, Bun, Ullman, and Vadhan [BUV14] showed how to prove strong lower bounds for approximate differential privacy that are essentially optimal for δ1/n\delta\approx 1/n, which is essentially the weakest privacy guarantee that is still meaningful.When δ1/n\delta\geq 1/n there are algorithms that are intuitively not private, yet satisfy (0,δ)(0,\delta)-differential privacy.

Since δ\delta bounds the probability of a complete privacy breach, we would like δ\delta to be very small. Thus we would like to quantify the cost (in terms of sample complexity) as δ0\delta\rightarrow 0. In this work we give lower bounds for approximately differentially private algorithms that are nearly optimal for every choice of δ\delta, and smoothly interpolate between pure and approximate differential privacy.

Specifically, we consider algorithms that compute the one-way marginals of the database—an extremely simple and fundamental family of queries. For a database D{±1}n×dD\in\{\pm 1\}^{n\times d}, the dd one-way marginals are simply the mean of the bits in each of the dd columns. Formally, we define

where Di{±1}dD_{i}\in\{\pm 1\}^{d} is the ii-th row of DD. A mechanism MM is said to be accurate if, on input DD, its output is “close to” D\overline{D}. Accuracy may be measured in a worst-case sense—i.e. M(D)Dα\left|\left|M(D)-\overline{D}\right|\right|_{\infty}\leq\alpha, meaning every one-way marginal is answered with accuracy α\alpha—or in an average-case sense—i.e. M(D)D1αd\left|\left|M(D)-\overline{D}\right|\right|_{1}\leq\alpha d, meaning the marginals are answered with average accuracy α\alpha.

Some of the earliest results in differential privacy [DN03, DN04, BDMN05, DMNS06] give a simple (ε,δ)(\varepsilon,\delta)-differentially private algorithm—the Laplace mechanism—that computes the one-way marginals of D{±1}n×dD\in\{\pm 1\}^{n\times d} with average error α\alpha as long as

More generally, this is the first result showing that the sample complexity must grow by a multiplicative factor of log(1/δ)\sqrt{\log(1/\delta)} for answering any family of queries, as opposed to an additive dependence on δ\delta. We also remark that the assumption on the range of δ\delta is necessary, as the Laplace mechanism gives accuracy α\alpha and satisfies (ε,0)(\varepsilon,0)-differential privacy when nO(d/εα)n\geq O(d/\varepsilon\alpha).

Our lower bound holds for mechanisms with an average-case (L1L_{1}) error guarantee. Thus, it also holds for algorithms that achieve worst-case (LL_{\infty}) error guarantees. The Laplace mechanism gives a matching upper bound for average-case error. In many cases worst-case error guarantees are preferrable. For worst-case error, the sample complexity of the Laplace mechanism degrades by an additional logd\log d factor compared to (1).

Surprisingly, this degradation is not necessary. We present algorithms that answer every one-way marginal with α\alpha accuracy and improve on the sample complexity of the Laplace mechanism by roughly a logd\log d factor. These algorithms demonstrate that the widely used technique of adding independent noise to each query is suboptimal when the goal is to achieve worst-case error guarantees.

Our algorithm for pure differential privacy satisfies the following.

For every ε,α>0\varepsilon,\alpha>0, d1d\geq 1, and n4d/εαn\geq 4d/\varepsilon\alpha, there exists an efficient mechanism M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} that is (ε,0)(\varepsilon,0)-differentially private and

And our algorithm for approximate differential privacy is as follows.

For every ε,δ,α>0\varepsilon,\delta,\alpha>0, d1d\geq 1, and

there exists an efficient mechanism M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} that is (ε,δ)(\varepsilon,\delta)-differentially private and

These algorithms improve over the sample complexity of the best known mechanisms for each privacy and accuracy guarantee by a factor of (log(d))Ω(1)(\log(d))^{\Omega(1)}. Namely, the Laplace mechanism requires nO(dlogd/εα)n\geq O(d\cdot\log d/\varepsilon\alpha) samples for pure differential privacy and the Gaussian mechanism requires nO(dlog(1/δ)logd/εα)n\geq O(\sqrt{d\cdot\log(1/\delta)\cdot\log d}/\varepsilon\alpha) samples for approximate differential privacy.

2 Techniques

Our proof uses a new, more general reduction from breaking fingerprinting codes to differentially private data release. Specifically, our reduction uses group differential privacy. This property states that if an algorithm is (ε,δ)(\varepsilon,\delta)-differentially private with respect to the change of one individual’s data, then for any kk, it is roughly (kε,ekεδ)(k\varepsilon,e^{k\varepsilon}\delta)-differentially private with respect to the change of kk individuals’ data. Thus an (ε,δ)(\varepsilon,\delta)-differentially private algorithm provides a meaningful privacy guarantee for groups of size klog(1/δ)/εk\approx\log(1/\delta)/\varepsilon.

To use this in our reduction, we start with a mechanism MM that takes a database of nn rows and is (ε,δ)(\varepsilon,\delta)-differentially private. We design a mechanism MkM_{k} that takes a database of n/kn/k rows, copies each of its rows kk times, and uses the result as input to MM. The resulting mechanism MkM_{k} is roughly (kε,ekεδ)(k\varepsilon,e^{k\varepsilon}\delta)-differentially private. For our choice of kk, these parameters will be small enough to apply the attack of [BUV14] to obtain a lower bound on the number of samples used by MkM_{k}, which is n/kn/k. Thus, for larger values of kk (equivalently, smaller values of δ\delta), we obtain a stronger lower bound. The remainder of the proof is to quantify the parameters precisely.

Upper Bounds:

Preliminaries

We define a database D{±1}n×dD\in\{\pm 1\}^{n\times d} to be a matrix of nn rows, where each row corresponds to an individual, and each row has dimension dd (consists of dd binary attributes). We say that two databases D,D{±1}n×dD,D^{\prime}\in\{\pm 1\}^{n\times d} are adjacent if they differ only by a single row, and we denote this by DDD\sim D^{\prime}. In particular, we can replace the iith row of a database DD with some fixed element of {±1}d\{\pm 1\}^{d} to obtain another database DiDD_{-i}\sim D.

Let M:{±1}n×dRM:\{\pm 1\}^{n\times d}\to\mathcal{R} be a randomized mechanism. We say that MM is (ε,δ)(\varepsilon,\delta)-differentially private if for every two adjacent databases DDD\sim D^{\prime} and every subset SRS\subseteq\mathcal{R},

A well known fact about differential privacy is that it generalizes smoothly to databases that differ on more than a single row. We say that two databases D,D{±1}n×dD,D^{\prime}\in\{\pm 1\}^{n\times d} are kk-adjacent if they differ by at most kk rows, and we denote this by DkDD\sim_{k}D^{\prime}.

For every k1k\geq 1, if M:{±1}n×dRM:\{\pm 1\}^{n\times d}\to\mathcal{R} is (ε,δ)(\varepsilon,\delta)-differentially private, then for every two kk-adjacent databases DkDD\sim_{k}D^{\prime}, and every subset SRS\subseteq\mathcal{R},

All of the upper and lower bounds for one-way marginals have a multiplicative 1/αε1/\alpha\varepsilon dependence on the accuracy α\alpha and the privacy loss ε\varepsilon. This is no coincidence - there is a generic reduction:

Let p[1,]p\in[1,\infty] and α,ε,δ[0,1/10]\alpha,\varepsilon,\delta\in[0,1/10].

Suppose there exists a (ε,δ)(\varepsilon,\delta)-differentially private mechanism M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} such that for every database D{±1}n×dD\in\{\pm 1\}^{n\times d},

Then there exists a (1,δ/ε)(1,\delta/\varepsilon)-differentially private mechanism M:{±1}n×d[±1]dM^{\prime}:\{\pm 1\}^{n^{\prime}\times d}\to[\pm 1]^{d} for n=Θ(αεn)n^{\prime}=\Theta(\alpha\varepsilon n) such that for every database D{±1}n×dD^{\prime}\in\{\pm 1\}^{n^{\prime}\times d},

Lower Bounds for Approximate Differential Privacy

Our main theorem can be stated as follows.

Let M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} be a (1,δ)(1,\delta)-differentially private mechanism that answers one-way marginals such that

where D\overline{D} is the true answer vector. If 2Ω(n)δ1/n1+Ω(1)2^{-\Omega(n)}\leq\delta\leq 1/n^{1+\Omega(1)} and nn is sufficiently large, then

Theorem 1.1 in the introduction follows by rearranging terms, and applying Fact 2.3. The statement above is more convenient technically, but the statement in the introduction is more consistent with the literature.

First we must introduce fingerprinting codes. The following definition is tailored to the application to privacy. Fingerprinting codes were originally defined by Boneh and Shaw [BS98] with a worst-case accuracy guarantee. Subsequent works [BUV14, SU14] have altered the accuracy guarantee to an average-case one, which we use here.

A ε\varepsilon-complete δ\delta-sound α\alpha-robust L1L_{1} fingerprinting code for nn users with length dd is a pair of random variables D{±1}n×dD\in\{\pm 1\}^{n\times d} and Trace:[±1]d2[n]\mathit{Trace}:[\pm 1]^{d}\to 2^{[n]} such that the following hold.

Completeness: For any fixed M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d},

Soundness: For any i[n]i\in[n] and fixed M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d},

where DiD_{-i} denotes DD with the ithi^{\text{th}} row replaced by some fixed element of {±1}d\{\pm 1\}^{d}.

Fingerprinting codes with optimal length were first constructed by Tardos [Tar08] (for worst-case error) and subsequent works [BUV14, SU14] have adapted Tardos’ construction to work for average-case error guarantees, which yields the following theorem.

For every n1n\geq 1, δ>0\delta>0, and ddn,δ=O(n2log(1/δ))d\geq d_{n,\delta}=O(n^{2}\log(1/\delta)), there exists a 1/1001/100-complete δ\delta-sound 1/81/8-robust L1L_{1} fingerprinting code for nn users with length dd.

We now show how the existence of fingerprinting codes implies our lower bound.

Let M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} be a (1,δ)(1,\delta)-differentially private mechanism such that

Let kk be a parameter to be chosen later. Let nk=n/kn_{k}=\lfloor n/k\rfloor. Let Mk:{±1}nk×d[±1]dM_{k}:\{\pm 1\}^{n_{k}\times d}\to[\pm 1]^{d} be the following mechanism. On input D{±1}nk×dD^{*}\in\{\pm 1\}^{n_{k}\times d}, MkM_{k} creates D{±1}n×dD\in\{\pm 1\}^{n\times d} by taking kk copies of DD^{*} and filling the remaining entries with 1s. Then MkM_{k} runs MM on DD and outputs M(D)M(D).

By group privacy (Fact 2.2), MkM_{k} is a (εk=k,δk=ek1e1δ)\left(\varepsilon_{k}=k,\delta_{k}=\frac{e^{k}-1}{e-1}\delta\right)-differentially private mechanism. By the triangle inequality,

Thus DD12k/n\left|\left|\overline{D}-\overline{D^{*}}\right|\right|_{1}\leq 2k/n. Assume kn/200k\leq n/200. Thus DD1d/100\left|\left|\overline{D}-\overline{D^{*}}\right|\right|_{1}\leq d/100 and, by (2) and (3),

Assume ddnk,δd\geq d_{n_{k},\delta}, were dnk,δ=O(nk2log(1/δ))d_{n_{k},\delta}=O(n_{k}^{2}\log(1/\delta)) is as in Theorem 3.3. We will show by contradiction that this cannot be – that is dO(nk2log(1/δ))d\leq O(n_{k}^{2}\log(1/\delta)). Let D{±1}nk×dD^{*}\in\{\pm 1\}^{n_{k}\times d} and Trace:[±1]d2[nk]\mathit{Trace}:[\pm 1]^{d}\to 2^{[n_{k}]} be a 1/1001/100-complete δ\delta-sound 1/81/8-robust L1L_{1} fingerprinting code for nkn_{k} users of length dd.

By the completeness of the fingerprinting code,

In particular, there exists i[nk]i^{*}\in[n_{k}] such that

We have that Trace(Mk(D))\mathit{Trace}(M_{k}(D^{*})) is a (εk,δk)(\varepsilon_{k},\delta_{k})-differentially private function of DD^{*}, as it is only postprocessing Mk(D)M_{k}(D^{*}). Thus

where the second inequality follows from the soundness of the fingerprinting code.

If klog(1/12nkδ)1k\leq\log(1/12n_{k}\delta)-1, then (8) gives a contradiction. Let k=log(1/12nδ)1k=\lfloor\log(1/12n\delta)-1\rfloor. Assuming δen/200\delta\geq e^{-n/200} ensures kn/200k\leq n/200, as required. Assuming δ1/n1+γ\delta\leq 1/n^{1+\gamma} implies klog(1/δ)/(1+1/γ)5Ω(log(1/δ))k\geq\log(1/\delta)/(1+1/\gamma)-5\geq\Omega(\log(1/\delta)). This setting of kk gives a contradiction, which implies that

Adding independent noise seems very natural for one-way marginals, but it is suboptimal if one is interested in worst-case (i.e. LL_{\infty}) error bounds, rather than average-case (i.e. L1L_{1}) error bounds.

Theorem 1.2 follows from Theorem 4.1. In particular, the mechanism M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} in Theorem 1.2 is given by M(D)=D+YM(D)=\overline{D}+Y, where YDY\sim\mathcal{D} and D\mathcal{D} is the distribution from Theorem 4.1 with Δ=2/n\Delta=2/n.Note that we must truncate the output of MM to ensure that M(D)M(D) is always in [±1]d[\pm 1]^{d}.

Efficiency: D\mathcal{D} can be efficiently sampled.

The distribution D\mathcal{D} is simply an instantiation of the exponential mechanism [MT07]. In particular, the probability density function is given by

Firstly, this is clearly a well-defined distribution as long as ε/Δ>0\varepsilon/\Delta>0.

Define a distribution D\mathcal{D}^{*} on [0,)[0,\infty) to by ZDZ\sim\mathcal{D}^{*} meaning Z=YZ=\left|\left|Y\right|\right|_{\infty} for YDY\sim\mathcal{D}. To prove accuracy, we must give a tail bound on D\mathcal{D}^{*}. The probability density function of D\mathcal{D}^{*} is given by

which is obtained by integrating the probability density function of D\mathcal{D} over the infinity-ball of radius zz, which has surface area d2dzd1zd1d2^{d}z^{d-1}\propto z^{d-1}. Thus D\mathcal{D}^{*} is precisely the gamma distribution with shape dd and mean dΔ/εd\Delta/\varepsilon. The moment generating function is therefore

for all t<ε/Δt<\varepsilon/\Delta. By Markov’s inequality

Setting t=ε/Δd/αt=\varepsilon/\Delta-d/\alpha gives the required bound.

It is easy to verify that YDY\sim\mathcal{D} can be sampled by first sampling a radius RR from a gamma distribution with shape d+1d+1 and mean (d+1)Δ/ε(d+1)\Delta/\varepsilon and then sampling Y[±R]dY\in[\pm R]^{d} uniformly at random. To sample RR we can set R=Δεi=0dlogUiR=\frac{\Delta}{\varepsilon}\sum_{i=0}^{d}\log U_{i}, where each Ui(0,1]U_{i}\in(0,1] is uniform and independent. This gives an algorithm (in the form of an explicit circuit) to sample D\mathcal{D} that uses only O(d)O(d) real arithmetic operations, d+1d+1 logarithms, and 2d+12d+1 independent uniform samples from $$.

2 Approximate Differential Privacy

Our algorithm for approximate differential privacy makes use of a powerful tool from the literature [DNR+09, HR10, DNPR10, RR10] called the sparse vector algorithm:

For every c,k1c,k\geq 1, ε,δ,α,β>0\varepsilon,\delta,\alpha,\beta>0, and

there exists a mechanism SV\mathit{SV} with the following properties.

SV\mathit{SV} takes as input a database DXnD\in\mathcal{X}^{n} and provides answers a1,,ak[±1]a_{1},\cdots,a_{k}\in[\pm 1] to kk (adaptive) linear queries q1,,qk:X[±1]q_{1},\cdots,q_{k}:\mathcal{X}\to[\pm 1].

SV\mathit{SV} is (ε,δ)(\varepsilon,\delta)-differentially private.

A proof of this theorem can be found in [DR13, Theorem 3.28].Note that the algorithms in the literature are designed to sometimes output \perp as an answer or halt prematurely. To modify these algorithms into the form given by Theorem 4.2 simply output in these cases. We now describe our approximately differentially private mechanism.

Now we must prove accuracy. Suppose that a^jqj(D)αSV=α/2|\hat{a}_{j}-q_{j}(D)|\leq\alpha_{\mathit{SV}}=\alpha/2 for all j[d]j\in[d]. Then

as required. So we need only show that a^jqj(D)αSV|\hat{a}_{j}-q_{j}(D)|\leq\alpha_{\mathit{SV}} for all j[d]j\in[d], which sparse vector guarantees will happen with probability at least 1βSV1-\beta_{\mathit{SV}} as long as

Now we verify that (9) holds with high probability.

By our setting of parameters, we have qj(D)=zj/2q_{j}(D)=-z_{j}/2. This means

Let Ej{0,1}E_{j}\in\{0,1\} be the indicator of the event qj(D)>αSV/2|q_{j}(D)|>\alpha_{\mathit{SV}}/2. Since the zjz_{j}s are independent, so are the EjE_{j}s. Thus we can apply a Chernoff bound:

The failure probability of MM is bounded by the failure probability of SV\mathit{SV} plus (10), which is dominated by βSV=exp(log4d)\beta_{\mathit{SV}}=\exp(-\log^{4}d).

Finally we consider the sample complexity. The accuracy is bounded by

for sparse vector to work, which is also satisfied. ∎

We remark that we have not attempted to optimize the constant factors in this analysis.

References

Appendix A Alternative Lower Bound for Pure Differential Privacy

It is known [HT10] that any ε\varepsilon-differentially private mechanism that answers dd one-way marginals requires nΩ(d/ε)n\geq\Omega(d/\varepsilon) samples. Our techniques yield an alternative simple proof of this fact.

Let M:{±1}n×d[±1]dM:\{\pm 1\}^{n\times d}\to[\pm 1]^{d} be a ε\varepsilon-differentially private mechanism. Suppose

The proof uses a special case of Hoeffding’s Inequality:

Let x,x{±1}dx,x^{\prime}\in\{\pm 1\}^{d} be independent and uniform. Let D{±1}n×dD\in\{\pm 1\}^{n\times d} be nn copies of xx and, likewise, let D{±1}n×dD^{\prime}\in\{\pm 1\}^{n\times d} be nn copies of xx^{\prime}. Let Z=M(D),xZ=\langle M(D),x\rangle and Z=M(D),xZ^{\prime}=\langle M(D^{\prime}),x\rangle.

Now we give conflicting tail bounds for ZZ and ZZ^{\prime}, which we can relate by privacy.

By our hypothesis and Markov’s inequality,

Since M(D)M(D^{\prime}) is independent from xx, we have

Now DD and DD^{\prime} are databases that differ in nn rows, so privacy implies that

Rearranging 1/20<enεed/8001/20<e^{n\varepsilon}e^{-d/800}, gives