Between Pure and Approximate Differential Privacy

Thomas Steinke, Jonathan Ullman

Introduction

The goal of privacy-preserving data analysis is to enable rich statistical analysis of a database while protecting the privacy of individuals whose data is in the database. A formal privacy guarantee is given by $(\varepsilon,\delta)$ -differential privacy [DMNS06, DKM+06], which ensures that no individual’s data has a significant influence on the information released about the database. The two parameters $\varepsilon$ and $\delta$ control the level of privacy. Very roughly, $\varepsilon$ is an upper bound on the amount of influence an individual’s record has on the information released and $\delta$ is the probability that this bound fails to hold, so the definition becomes more stringent as $\varepsilon,\delta\rightarrow 0$ . (This intuition is actually somewhat imprecise, although it is suitable for this informal discussion. See [KS08] for a more precise semantic interpretation of $(\varepsilon,\delta)$ -differential privacy.)

A natural way to measure the tradeoff between privacy and utility is sample complexity—the minimum number of records $n$ that is sufficient in order to publicly release a given set of statistics about the database, while achieving both differential privacy and statistical accuracy. Intuitively, it’s easier to achieve these two goals when $n$ is large, as each individual’s data will have only a small influence on the aggregate statistics of interest. Conversely, the sample complexity $n$ should increase as $\varepsilon$ and $\delta$ decrease (which strengthens the privacy guarantee).

The strongest version of differential privacy, in which $\delta=0$ , is known as pure differential privacy. The sample complexity of achieving pure differential privacy is well known for many settings (e.g. [HT10]). The more general case where $\delta>0$ is known as approximate differential privacy, and is less well understood. Recently, Bun, Ullman, and Vadhan [BUV14] showed how to prove strong lower bounds for approximate differential privacy that are essentially optimal for $\delta\approx 1/n$ , which is essentially the weakest privacy guarantee that is still meaningful. (When $\delta\geq 1/n$ there are algorithms that are intuitively not private, yet satisfy $(0,\delta)$ -differential privacy.)

Since $\delta$ bounds the probability of a complete privacy breach, we would like $\delta$ to be very small. Thus we would like to quantify the cost (in terms of sample complexity) as $\delta\rightarrow 0$ . In this work we give lower bounds for approximately differentially private algorithms that are nearly optimal for every choice of $\delta$ , and smoothly interpolate between pure and approximate differential privacy.

Specifically, we consider algorithms that compute the one-way marginals of the database, an extremely simple and fundamental family of queries. For a database $D\in\{\pm 1\}^{n\times d}$ , the $d$ one-way marginals are simply the mean of the bits in each of the $d$ columns. Formally, we define

$$\overline{D}=\frac{1}{n}\sum_{i=1}^{n}D_{i}\in[\pm 1]^{d},$$

where $D_{i}\in\{\pm 1\}^{d}$ is the $i$ -th row of $D$ . A mechanism $M$ is said to be accurate if, on input $D$ , its output is “close to” $\overline{D}$ . Accuracy may be measured in a worst-case sense, i.e. $\left|\left|M(D)-\overline{D}\right|\right|_{\infty}\leq\alpha$ , meaning every one-way marginal is answered with accuracy $\alpha$ , or in an average-case sense, i.e. $\left|\left|M(D)-\overline{D}\right|\right|_{1}\leq\alpha d$ , meaning the marginals are answered with average accuracy $\alpha$ .
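The two accuracy notions can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `one_way_marginals`, `worst_case_error`, and `average_error` are hypothetical, and the database is represented as a plain list of rows with entries in $\{\pm 1\}$.

```python
def one_way_marginals(D):
    """Column means of a database given as a list of rows with entries in {-1, +1}."""
    n = len(D)
    d = len(D[0])
    return [sum(row[j] for row in D) / n for j in range(d)]


def worst_case_error(answer, truth):
    """L_infinity distance: the largest error over all d marginals."""
    return max(abs(a - t) for a, t in zip(answer, truth))


def average_error(answer, truth):
    """L_1 distance divided by d: the error averaged over the d marginals."""
    return sum(abs(a - t) for a, t in zip(answer, truth)) / len(truth)


# A toy database with n = 4 rows and d = 3 columns.
D = [[1, -1, 1], [1, 1, -1], [-1, 1, 1], [1, 1, 1]]
truth = one_way_marginals(D)   # every column sums to 2, so each mean is 0.5
guess = [0.5, 0.25, 0.5]       # a hypothetical mechanism output
```

Here `guess` is off by 0.25 in a single coordinate, so its worst-case error is 0.25 while its average error is three times smaller. This gap between the two measures is what the later discussion of $L_{\infty}$ versus $L_{1}$ guarantees turns on.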

Some of the earliest results in differential privacy [DN03, DN04, BDMN05, DMNS06] give a simple $(\varepsilon,\delta)$ -differentially private algorithm—the Laplace mechanism—that computes the one-way marginals of $D\in\{\pm 1\}^{n\times d}$ with average error $\alpha$ as long as

More generally, this is the first result showing that the sample complexity must grow by a multiplicative factor of $\sqrt{\log(1/\delta)}$ for answering any family of queries, as opposed to an additive dependence on $\delta$ . We also remark that the assumption on the range of $\delta$ is necessary, as the Laplace mechanism gives accuracy $\alpha$ and satisfies $(\varepsilon,0)$ -differential privacy when $n\geq O(d/\varepsilon\alpha)$ .

Our lower bound holds for mechanisms with an average-case ( $L_{1}$ ) error guarantee. Thus, it also holds for algorithms that achieve worst-case ( $L_{\infty}$ ) error guarantees. The Laplace mechanism gives a matching upper bound for average-case error. In many cases worst-case error guarantees are preferable. For worst-case error, the sample complexity of the Laplace mechanism degrades by an additional $\log d$ factor compared to (1).
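For readers who want to see the baseline concretely, the following is a minimal sketch of the Laplace mechanism for one-way marginals in the pure ( $\delta=0$ ) setting. It assumes the standard calibration: replacing one row changes each column mean by at most $2/n$ , so the $L_{1}$ sensitivity of $\overline{D}$ is at most $2d/n$ and independent Laplace noise of scale $2d/(n\varepsilon)$ per coordinate suffices. The names `laplace_noise` and `laplace_marginals` are hypothetical, and the constants are not taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_marginals(D, epsilon, rng=None):
    """Answer all d one-way marginals with independent Laplace noise (pure DP sketch)."""
    rng = rng or random.Random()
    n, d = len(D), len(D[0])
    # Assumed calibration: L1 sensitivity of the mean vector is at most 2d/n.
    scale = 2.0 * d / (n * epsilon)
    answers = []
    for j in range(d):
        mean_j = sum(row[j] for row in D) / n
        noisy = mean_j + laplace_noise(scale, rng)
        # Clamping to [-1, 1] is post-processing and cannot hurt accuracy.
        answers.append(max(-1.0, min(1.0, noisy)))
    return answers
```

The expected absolute noise in each coordinate equals the scale $2d/(n\varepsilon)$ , which is at most $\alpha$ once $n\geq 2d/\varepsilon\alpha$ ; this matches the $O(d/\varepsilon\alpha)$ average-case behaviour described above. The maximum of $d$ independent Laplace variables is larger by roughly a $\log d$ factor, which is the degradation for worst-case error.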

Surprisingly, this degradation is not necessary. We present algorithms that answer every one-way marginal with $\alpha$ accuracy and improve on the sample complexity of the Laplace mechanism by roughly a $\log d$ factor. These algorithms demonstrate that the widely used technique of adding independent noise to each query is suboptimal when the goal is to achieve worst-case error guarantees.

Our algorithm for pure differential privacy satisfies the following.

For every $\varepsilon,\alpha>0$ , $d\geq 1$ , and $n\geq 4d/\varepsilon\alpha$ , there exists an efficient mechanism $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ that is $(\varepsilon,0)$ -differentially private and

And our algorithm for approximate differential privacy is as follows.

For every $\varepsilon,\delta,\alpha>0$ , $d\geq 1$ , and

there exists an efficient mechanism $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ that is $(\varepsilon,\delta)$ -differentially private and

These algorithms improve over the sample complexity of the best known mechanisms for each privacy and accuracy guarantee by a factor of $(\log(d))^{\Omega(1)}$ . Namely, the Laplace mechanism requires $n\geq O(d\cdot\log d/\varepsilon\alpha)$ samples for pure differential privacy and the Gaussian mechanism requires $n\geq O(\sqrt{d\cdot\log(1/\delta)\cdot\log d}/\varepsilon\alpha)$ samples for approximate differential privacy.

Techniques

Our proof uses a new, more general reduction from breaking fingerprinting codes to differentially private data release. Specifically, our reduction uses group differential privacy. This property states that if an algorithm is $(\varepsilon,\delta)$ -differentially private with respect to the change of one individual’s data, then for any $k$ , it is roughly $(k\varepsilon,e^{k\varepsilon}\delta)$ -differentially private with respect to the change of $k$ individuals’ data. Thus an $(\varepsilon,\delta)$ -differentially private algorithm provides a meaningful privacy guarantee for groups of size $k\approx\log(1/\delta)/\varepsilon$ .
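The group-privacy parameters are easy to tabulate. The sketch below uses the sharper form $\delta_{k}=\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta$ , which agrees with the expression $\frac{e^{k}-1}{e-1}\delta$ used for $\varepsilon=1$ in the proof of the main theorem; the function names are hypothetical and chosen for illustration only.

```python
import math


def group_privacy(epsilon, delta, k):
    """Privacy parameters for groups of k rows: (k*eps, (e^{k*eps}-1)/(e^eps-1) * delta)."""
    eps_k = k * epsilon
    delta_k = math.expm1(eps_k) / math.expm1(epsilon) * delta
    return eps_k, delta_k


def meaningful_group_size(epsilon, delta):
    """The heuristic group size k ~ log(1/delta)/epsilon mentioned in the text."""
    return math.floor(math.log(1.0 / delta) / epsilon)
```

For example, with $\varepsilon=1$ and $\delta=10^{-9}$ , `meaningful_group_size` returns 20: beyond roughly that many changed rows, $\delta_{k}$ is no longer small and the guarantee stops being informative.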

To use this in our reduction, we start with a mechanism $M$ that takes a database of $n$ rows and is $(\varepsilon,\delta)$ -differentially private. We design a mechanism $M_{k}$ that takes a database of $n/k$ rows, copies each of its rows $k$ times, and uses the result as input to $M$ . The resulting mechanism $M_{k}$ is roughly $(k\varepsilon,e^{k\varepsilon}\delta)$ -differentially private. For our choice of $k$ , these parameters will be small enough to apply the attack of [BUV14] to obtain a lower bound on the number of samples used by $M_{k}$ , which is $n/k$ . Thus, for larger values of $k$ (equivalently, smaller values of $\delta$ ), we obtain a stronger lower bound. The remainder of the proof is to quantify the parameters precisely.
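To make the phrase "our choice of $k$ " concrete: the proof of the main theorem works with $\varepsilon=1$ and sets $k=\lfloor\log(1/12n\delta)-1\rfloor$ . The sketch below plugs in numbers and checks the side conditions stated there. The helper name `choose_k` and the sample values of $n$ and $\gamma$ are illustrative assumptions.

```python
import math


def choose_k(n, delta):
    """k = floor(log(1/(12*n*delta)) - 1), the group size used in the lower-bound proof."""
    return math.floor(math.log(1.0 / (12 * n * delta)) - 1)


n, gamma = 10_000, 1.0
delta = n ** -(1 + gamma)          # delta = 1/n^(1+gamma)
k = choose_k(n, delta)

# Group-privacy parameters of M_k when M is (1, delta)-differentially private.
eps_k = k
delta_k = (math.exp(k) - 1) / (math.e - 1) * delta

# Total additive slack e^{eps_k} * delta + delta_k that the attack must beat.
slack = math.exp(eps_k) * delta + delta_k
```

With these values the slack is below $1/(12n)$ , and $k$ lies between $\log(1/\delta)/(1+1/\gamma)-5$ and $n/200$ , as the proof requires. Smaller $\delta$ allows a larger $k$ , hence a smaller database for $M_{k}$ and a stronger lower bound.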

Upper Bounds:

Preliminaries

We define a database $D\in\{\pm 1\}^{n\times d}$ to be a matrix of $n$ rows, where each row corresponds to an individual, and each row has dimension $d$ (consists of $d$ binary attributes). We say that two databases $D,D^{\prime}\in\{\pm 1\}^{n\times d}$ are adjacent if they differ only by a single row, and we denote this by $D\sim D^{\prime}$ . In particular, we can replace the $i$ th row of a database $D$ with some fixed element of $\{\pm 1\}^{d}$ to obtain another database $D_{-i}\sim D$ .

Let $M:\{\pm 1\}^{n\times d}\to\mathcal{R}$ be a randomized mechanism. We say that $M$ is $(\varepsilon,\delta)$ -differentially private if for every two adjacent databases $D\sim D^{\prime}$ and every subset $S\subseteq\mathcal{R}$ ,

$$\mathbb{P}\left[M(D)\in S\right]\leq e^{\varepsilon}\cdot\mathbb{P}\left[M(D^{\prime})\in S\right]+\delta.$$

A well known fact about differential privacy is that it generalizes smoothly to databases that differ on more than a single row. We say that two databases $D,D^{\prime}\in\{\pm 1\}^{n\times d}$ are $k$ -adjacent if they differ by at most $k$ rows, and we denote this by $D\sim_{k}D^{\prime}$ .
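Returning to the definition of $(\varepsilon,\delta)$ -differential privacy, i.e. the requirement $\mathbb{P}[M(D)\in S]\leq e^{\varepsilon}\,\mathbb{P}[M(D^{\prime})\in S]+\delta$ for every event $S$ : when the output range is finite, the condition can be checked directly for a given pair of output distributions. The sketch below is illustrative only, with hypothetical names, and uses one-bit randomized response as a toy mechanism that does not appear in the paper.

```python
import math


def smallest_delta(p, q, epsilon):
    """Smallest delta with P[S] <= e^eps * Q[S] + delta for every event S.

    p and q map outcomes to probabilities. The worst event is
    S = {x : p(x) > e^eps * q(x)}, so it suffices to sum the positive parts.
    """
    outcomes = set(p) | set(q)
    bound = math.exp(epsilon)
    return sum(max(0.0, p.get(x, 0.0) - bound * q.get(x, 0.0)) for x in outcomes)


def is_dp_pair(p, q, epsilon, delta, tol=1e-12):
    """Check the inequality in both directions for one pair of adjacent inputs."""
    return (smallest_delta(p, q, epsilon) <= delta + tol
            and smallest_delta(q, p, epsilon) <= delta + tol)


# Randomized response on one bit: report the true bit with probability 3/4.
on_input_plus = {+1: 0.75, -1: 0.25}
on_input_minus = {+1: 0.25, -1: 0.75}
```

For this toy pair, `is_dp_pair(on_input_plus, on_input_minus, math.log(3), 0.0)` holds, while lowering $\varepsilon$ to $0.5$ forces a strictly positive $\delta$ . A full verification of a mechanism would repeat the check over all adjacent pairs of databases.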

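For every $k\geq 1$ , if $M:\{\pm 1\}^{n\times d}\to\mathcal{R}$ is $(\varepsilon,\delta)$ -differentially private, then for every two $k$ -adjacent databases $D\sim_{k}D^{\prime}$ , and every subset $S\subseteq\mathcal{R}$ ,

$$\mathbb{P}\left[M(D)\in S\right]\leq e^{k\varepsilon}\cdot\mathbb{P}\left[M(D^{\prime})\in S\right]+\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\cdot\delta.$$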

All of the upper and lower bounds for one-way marginals have a multiplicative $1/\alpha\varepsilon$ dependence on the accuracy $\alpha$ and the privacy loss $\varepsilon$ . This is no coincidence, as there is a generic reduction:

Let $p\in[1,\infty]$ and $\alpha,\varepsilon,\delta\in[0,1/10]$ .

Suppose there exists a $(\varepsilon,\delta)$ -differentially private mechanism $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ such that for every database $D\in\{\pm 1\}^{n\times d}$ ,

Then there exists a $(1,\delta/\varepsilon)$ -differentially private mechanism $M^{\prime}:\{\pm 1\}^{n^{\prime}\times d}\to[\pm 1]^{d}$ for $n^{\prime}=\Theta(\alpha\varepsilon n)$ such that for every database $D^{\prime}\in\{\pm 1\}^{n^{\prime}\times d}$ ,

Lower Bounds for Approximate Differential Privacy

Our main theorem can be stated as follows.

Let $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ be a $(1,\delta)$ -differentially private mechanism that answers one-way marginals such that

where $\overline{D}$ is the true answer vector. If $2^{-\Omega(n)}\leq\delta\leq 1/n^{1+\Omega(1)}$ and $n$ is sufficiently large, then

Theorem 1.1 in the introduction follows by rearranging terms, and applying Fact 2.3. The statement above is more convenient technically, but the statement in the introduction is more consistent with the literature.

First we must introduce fingerprinting codes. The following definition is tailored to the application to privacy. Fingerprinting codes were originally defined by Boneh and Shaw [BS98] with a worst-case accuracy guarantee. Subsequent works [BUV14, SU14] have altered the accuracy guarantee to an average-case one, which we use here.

An $\varepsilon$ -complete $\delta$ -sound $\alpha$ -robust $L_{1}$ fingerprinting code for $n$ users with length $d$ is a pair of random variables $D\in\{\pm 1\}^{n\times d}$ and $\mathit{Trace}:[\pm 1]^{d}\to 2^{[n]}$ such that the following hold.

Completeness: For any fixed $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ ,

Soundness: For any $i\in[n]$ and fixed $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ ,

where $D_{-i}$ denotes $D$ with the $i^{\text{th}}$ row replaced by some fixed element of $\{\pm 1\}^{d}$ .

Fingerprinting codes with optimal length were first constructed by Tardos [Tar08] (for worst-case error) and subsequent works [BUV14, SU14] have adapted Tardos’ construction to work for average-case error guarantees, which yields the following theorem.

For every $n\geq 1$ , $\delta>0$ , and $d\geq d_{n,\delta}=O(n^{2}\log(1/\delta))$ , there exists a $1/100$ -complete $\delta$ -sound $1/8$ -robust $L_{1}$ fingerprinting code for $n$ users with length $d$ .

We now show how the existence of fingerprinting codes implies our lower bound.

Let $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ be a $(1,\delta)$ -differentially private mechanism such that

Let $k$ be a parameter to be chosen later. Let $n_{k}=\lfloor n/k\rfloor$ . Let $M_{k}:\{\pm 1\}^{n_{k}\times d}\to[\pm 1]^{d}$ be the following mechanism. On input $D^{*}\in\{\pm 1\}^{n_{k}\times d}$ , $M_{k}$ creates $D\in\{\pm 1\}^{n\times d}$ by taking $k$ copies of $D^{*}$ and filling the remaining entries with 1s. Then $M_{k}$ runs $M$ on $D$ and outputs $M(D)$ .
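The construction of $M_{k}$ is mechanical enough to write down directly. The following sketch takes a database as a list of rows; the names `copy_rows` and `make_M_k` are hypothetical, and `M` stands for any function that maps an $n$ -row database to an answer vector.

```python
def copy_rows(D_star, n, k):
    """Build the n-row input of M from the (n // k)-row database D_star."""
    n_k = n // k
    assert len(D_star) == n_k, "D_star must have floor(n/k) rows"
    d = len(D_star[0])
    # k copies of D_star ...
    D = [list(row) for _ in range(k) for row in D_star]
    # ... followed by n - k*n_k (fewer than k) rows filled with 1s.
    D += [[1] * d for _ in range(n - k * n_k)]
    return D


def make_M_k(M, n, k):
    """Wrap a mechanism M on n rows into the mechanism M_k on n // k rows."""
    return lambda D_star: M(copy_rows(D_star, n, k))
```

Changing one row of $D^{*}$ changes $k$ rows of the copied database, which is why group privacy is the right tool for analysing $M_{k}$ . The padding rows shift each marginal by at most $2(n-kn_{k})/n\leq 2k/n$ .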

By group privacy (Fact 2.2), $M_{k}$ is a $\left(\varepsilon_{k}=k,\delta_{k}=\frac{e^{k}-1}{e-1}\delta\right)$ -differentially private mechanism. By the triangle inequality,

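Since $D$ consists of $k$ copies of $D^{*}$ together with $n-kn_{k}<k$ rows of 1s, each coordinate of $\overline{D}$ differs from the corresponding coordinate of $\overline{D^{*}}$ by at most $2(n-kn_{k})/n\leq 2k/n$ . Thus $\left|\left|\overline{D}-\overline{D^{*}}\right|\right|_{\infty}\leq 2k/n$ . Assume $k\leq n/200$ . Thus $\left|\left|\overline{D}-\overline{D^{*}}\right|\right|_{1}\leq d/100$ and, by (2) and (3),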

Assume $d\geq d_{n_{k},\delta}$ , where $d_{n_{k},\delta}=O(n_{k}^{2}\log(1/\delta))$ is as in Theorem 3.3. We will show by contradiction that this cannot be, that is, $d\leq O(n_{k}^{2}\log(1/\delta))$ . Let $D^{*}\in\{\pm 1\}^{n_{k}\times d}$ and $\mathit{Trace}:[\pm 1]^{d}\to 2^{[n_{k}]}$ be a $1/100$ -complete $\delta$ -sound $1/8$ -robust $L_{1}$ fingerprinting code for $n_{k}$ users of length $d$ .

By the completeness of the fingerprinting code,

In particular, there exists $i^{*}\in[n_{k}]$ such that

We have that $\mathit{Trace}(M_{k}(D^{*}))$ is an $(\varepsilon_{k},\delta_{k})$ -differentially private function of $D^{*}$ , as it is only a postprocessing of $M_{k}(D^{*})$ . Thus

where the second inequality follows from the soundness of the fingerprinting code.

If $k\leq\log(1/12n_{k}\delta)-1$ , then (8) gives a contradiction. Let $k=\lfloor\log(1/12n\delta)-1\rfloor$ . Assuming $\delta\geq e^{-n/200}$ ensures $k\leq n/200$ , as required. Assuming $\delta\leq 1/n^{1+\gamma}$ implies $k\geq\log(1/\delta)/(1+1/\gamma)-5\geq\Omega(\log(1/\delta))$ . This setting of $k$ gives a contradiction, which implies that $d<d_{n_{k},\delta}=O(n_{k}^{2}\log(1/\delta))=O(n^{2}/\log(1/\delta))$ , where the last step uses $n_{k}\leq n/k$ and $k\geq\Omega(\log(1/\delta))$ .

Adding independent noise seems very natural for one-way marginals, but it is suboptimal if one is interested in worst-case (i.e. $L_{\infty}$ ) error bounds, rather than average-case (i.e. $L_{1}$ ) error bounds.

Theorem 1.2 follows from Theorem 4.1. In particular, the mechanism $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ in Theorem 1.2 is given by $M(D)=\overline{D}+Y$ , where $Y\sim\mathcal{D}$ and $\mathcal{D}$ is the distribution from Theorem 4.1 with $\Delta=2/n$ . (Note that we must truncate the output of $M$ to ensure that $M(D)$ is always in $[\pm 1]^{d}$ .)

Efficiency: $\mathcal{D}$ can be efficiently sampled.

The distribution $\mathcal{D}$ is simply an instantiation of the exponential mechanism [MT07]. In particular, the probability density function is given by

Firstly, this is clearly a well-defined distribution as long as $\varepsilon/\Delta>0$ .

Define $\mathcal{D}^{*}$ to be the distribution on $[0,\infty)$ of $Z=\left|\left|Y\right|\right|_{\infty}$ for $Y\sim\mathcal{D}$ . To prove accuracy, we must give a tail bound on $\mathcal{D}^{*}$ . The probability density function of $\mathcal{D}^{*}$ is given by

which is obtained by integrating the probability density function of $\mathcal{D}$ over the surface of the infinity-ball of radius $z$ , which has surface area $d2^{d}z^{d-1}\propto z^{d-1}$ . Thus $\mathcal{D}^{*}$ is precisely the gamma distribution with shape $d$ and mean $d\Delta/\varepsilon$ . The moment generating function is therefore

for all $t<\varepsilon/\Delta$ . By Markov’s inequality

Setting $t=\varepsilon/\Delta-d/\alpha$ gives the required bound.
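For reference, the computation can be written out in full using only the standard moment generating function of the gamma distribution with shape $d$ and mean $d\Delta/\varepsilon$ . The final expression is a reconstruction from these facts, offered as a worked check rather than as the paper's exact statement, and it assumes $\alpha>d\Delta/\varepsilon$ so that the chosen $t$ is positive.

$$
\mathbb{E}\left[e^{tZ}\right]=\left(1-\frac{t\Delta}{\varepsilon}\right)^{-d}\quad\text{for }t<\frac{\varepsilon}{\Delta},
\qquad
\mathbb{P}\left[Z\geq\alpha\right]\leq e^{-t\alpha}\,\mathbb{E}\left[e^{tZ}\right]=e^{-t\alpha}\left(1-\frac{t\Delta}{\varepsilon}\right)^{-d}.
$$

$$
t=\frac{\varepsilon}{\Delta}-\frac{d}{\alpha}
\;\Longrightarrow\;
1-\frac{t\Delta}{\varepsilon}=\frac{d\Delta}{\varepsilon\alpha},
\quad
e^{-t\alpha}=e^{d-\varepsilon\alpha/\Delta},
\quad
\mathbb{P}\left[Z\geq\alpha\right]\leq\left(\frac{\varepsilon\alpha}{d\Delta}\right)^{d}e^{d-\varepsilon\alpha/\Delta}.
$$

With $\Delta=2/n$ the exponent $\varepsilon\alpha/\Delta$ equals $\varepsilon\alpha n/2$ , so the bound decays exponentially in $d$ once $n$ exceeds a sufficiently large constant multiple of $d/\varepsilon\alpha$ .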

It is easy to verify that $Y\sim\mathcal{D}$ can be sampled by first sampling a radius $R$ from a gamma distribution with shape $d+1$ and mean $(d+1)\Delta/\varepsilon$ and then sampling $Y\in[\pm R]^{d}$ uniformly at random. To sample $R$ we can set $R=-\frac{\Delta}{\varepsilon}\sum_{i=0}^{d}\log U_{i}$ , where each $U_{i}\in(0,1]$ is uniform and independent; the minus sign is needed because $\log U_{i}\leq 0$ . This gives an algorithm (in the form of an explicit circuit) to sample $\mathcal{D}$ that uses only $O(d)$ real arithmetic operations, $d+1$ logarithms, and $2d+1$ independent uniform samples from $[0,1]$ .
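The sampling procedure just described translates almost line for line into code. The sketch below is a minimal illustration with hypothetical function names. It takes the radius to be the negated sum of logarithms so that it is nonnegative, and it uses $\Delta=2/n$ with truncation to $[\pm 1]^{d}$ , as in the mechanism $M(D)=\overline{D}+Y$ .

```python
import math
import random


def sample_radius(d, Delta, epsilon, rng):
    """Gamma(shape d+1, mean (d+1)*Delta/epsilon) as a sum of d+1 exponentials."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return -(Delta / epsilon) * sum(math.log(1.0 - rng.random()) for _ in range(d + 1))


def sample_noise(d, Delta, epsilon, rng=None):
    """Draw Y: first the radius R, then a uniform point of the cube [-R, R]^d."""
    rng = rng or random.Random()
    R = sample_radius(d, Delta, epsilon, rng)
    return [rng.uniform(-R, R) for _ in range(d)]


def linf_mechanism(D, epsilon, rng=None):
    """M(D) = mean vector + Y with Delta = 2/n, truncated to [-1, 1]^d."""
    n, d = len(D), len(D[0])
    Y = sample_noise(d, 2.0 / n, epsilon, rng)
    means = [sum(row[j] for row in D) / n for j in range(d)]
    return [max(-1.0, min(1.0, m + y)) for m, y in zip(means, Y)]
```

Each call to `sample_noise` consumes $2d+1$ uniform samples and $d+1$ logarithms, matching the operation count above. A quick sanity check is that the empirical mean of $\left|\left|Y\right|\right|_{\infty}$ should be close to $d\Delta/\varepsilon$ , the mean of $\mathcal{D}^{*}$ .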

Approximate Differential Privacy

Our algorithm for approximate differential privacy makes use of a powerful tool from the literature [DNR+09, HR10, DNPR10, RR10] called the sparse vector algorithm:

For every $c,k\geq 1$ , $\varepsilon,\delta,\alpha,\beta>0$ , and

there exists a mechanism $\mathit{SV}$ with the following properties.

$\mathit{SV}$ takes as input a database $D\in\mathcal{X}^{n}$ and provides answers $a_{1},\cdots,a_{k}\in[\pm 1]$ to $k$ (adaptive) linear queries $q_{1},\cdots,q_{k}:\mathcal{X}\to[\pm 1]$ .

$\mathit{SV}$ is $(\varepsilon,\delta)$ -differentially private.

A proof of this theorem can be found in [DR13, Theorem 3.28]. (Note that the algorithms in the literature are designed to sometimes output $\perp$ as an answer or halt prematurely. To modify these algorithms into the form given by Theorem 4.2 simply output in these cases.) We now describe our approximately differentially private mechanism.

Now we must prove accuracy. Suppose that $|\hat{a}_{j}-q_{j}(D)|\leq\alpha_{\mathit{SV}}=\alpha/2$ for all $j\in[d]$ . Then

as required. So we need only show that $|\hat{a}_{j}-q_{j}(D)|\leq\alpha_{\mathit{SV}}$ for all $j\in[d]$ , which sparse vector guarantees will happen with probability at least $1-\beta_{\mathit{SV}}$ as long as

Now we verify that (9) holds with high probability.

By our setting of parameters, we have $q_{j}(D)=-z_{j}/2$ . This means

Let $E_{j}\in\{0,1\}$ be the indicator of the event $|q_{j}(D)|>\alpha_{\mathit{SV}}/2$ . Since the $z_{j}$ are independent, so are the $E_{j}$ . Thus we can apply a Chernoff bound:

The failure probability of $M$ is bounded by the failure probability of $\mathit{SV}$ plus (10), which is dominated by $\beta_{\mathit{SV}}=\exp(-\log^{4}d)$ .

Finally we consider the sample complexity. The accuracy is bounded by

for sparse vector to work, which is also satisfied. ∎

We remark that we have not attempted to optimize the constant factors in this analysis.

References

Appendix A Alternative Lower Bound for Pure Differential Privacy

It is known [HT10] that any $\varepsilon$ -differentially private mechanism that answers $d$ one-way marginals requires $n\geq\Omega(d/\varepsilon)$ samples. Our techniques yield an alternative simple proof of this fact.

Let $M:\{\pm 1\}^{n\times d}\to[\pm 1]^{d}$ be an $\varepsilon$ -differentially private mechanism. Suppose

The proof uses a special case of Hoeffding’s Inequality:

Let $x,x^{\prime}\in\{\pm 1\}^{d}$ be independent and uniform. Let $D\in\{\pm 1\}^{n\times d}$ be $n$ copies of $x$ and, likewise, let $D^{\prime}\in\{\pm 1\}^{n\times d}$ be $n$ copies of $x^{\prime}$ . Let $Z=\langle M(D),x\rangle$ and $Z^{\prime}=\langle M(D^{\prime}),x\rangle$ .

Now we give conflicting tail bounds for $Z$ and $Z^{\prime}$ , which we can relate by privacy.

By our hypothesis and Markov’s inequality,

Since $M(D^{\prime})$ is independent from $x$ , we have

Now $D$ and $D^{\prime}$ are databases that differ in $n$ rows, so privacy implies that

Rearranging $1/20<e^{n\varepsilon}e^{-d/800}$ gives $n>(d/800-\log 20)/\varepsilon$ , which is $\Omega(d/\varepsilon)$ for sufficiently large $d$ .