Differentially Private Release and Learning of Threshold Functions

Mark Bun, Kobbi Nissim, Uri Stemmer, Salil Vadhan

Introduction

The line of work on differential privacy [DMNS06] is aimed at enabling useful statistical analyses on privacy-sensitive data while providing strong privacy protections for individual-level information. Privacy is achieved in differentially private algorithms through randomization and the introduction of “noise” to obscure the effect of each individual, and thus differentially private algorithms can be less accurate than their non-private analogues. Nevertheless, by now a rich literature has shown that many data analysis tasks of interest are compatible with differential privacy, and generally the loss in accuracy vanishes as the number nn of individuals tends to infinity. However, in many cases, there is still is a price of privacy hidden in these asymptotics — in the rate at which the loss in accuracy vanishes, and in how large nn needs to be to start getting accurate results at all (the “sample complexity”).

We recall the definition of differential privacy. We think of a dataset as consisting of nn rows from a data universe XX, where each row corresponds to one individual. Differential privacy requires that no individual’s data has a significant effect on the distribution of what we output.

A randomized algorithm M:XnYM:X^{n}\rightarrow Y is (ε,δ)(\varepsilon,\delta) differentially private if for every two datasets x,xXnx,x^{\prime}\in X^{n} that differ on one row, and every set TYT\subseteq Y, we have

The original definition from [DMNS06] had δ=0\delta=0, and is sometimes referred to as pure differential privacy. However, a number of subsequent works have shown that allowing a small (but negligible) value of δ\delta, referred to as approximate differential privacy, can provide substantial gains over pure differential privacy [DL09, HT10, DRV10, De12, BNS13b].

The common setting of parameters is to take ε\varepsilon to be a small constant and δ\delta to be negligible in nn (or a given security parameter). To simplify the exposition, we fix ε=0.1\varepsilon=0.1 and δ=1/nlogn\delta=1/n^{\log n} throughout the introduction (but precise dependencies on these parameters are given in the main body).

2 Private Query Release

A special case of interest is the case where QQ consists of counting queries. In this case, we are given a set QQ of predicates q:X{0,1}q:X\rightarrow\{0,1\} on individual rows, and then extend them to databases by averaging. That is, q(D)=(1/n)i=1Dq(Di)q(D)=(1/n)\sum_{i=1}^{D}q(D_{i}) counts the fraction of the population that satisfies predicate qq.

The sample complexity of releasing threshold functions over a data universe XX with differential privacy is at least Ω(logX)\Omega(\log^{*}|X|). In particular, there is no differentially private algorithm for releasing threshold functions over an infinite data universe.

In addition, inspired by the ideas in our lower bound, we present a simplification of the algorithm of [BNS13b] and improve the sample complexity to 2(1+o(1))logX2^{(1+o(1))\log^{*}|X|} (from roughly 8logX8^{\log^{*}|X|}). Closing the gap between the lower bound of logX\approx\log^{*}|X| and the upper bound of 2logX\approx 2^{\log^{*}|X|} remains an intriguing open problem.

We remark that in the case of pure differential privacy (δ=0\delta=0), a sample complexity lower bound of n=Ω(logX)n=\Omega(\log|X|) follows from a standard “packing argument” [HT10, BKN10]. For point functions, this is matched by the standard Laplace mechanism [DMNS06]. For threshold functions, a matching upper bound was recently obtained [RR14], building on a construction of [DNPR10]. We note that these algorithms have a slightly better dependence on the accuracy parameter α\alpha than our algorithm (linear rather than nearly linear in 1/α1/\alpha). In general, while packing arguments often yield tight lower bounds for pure differential privacy, they fail badly for approximate differential privacy, for which much less is known.

3 Private Distribution Learning

The well-known Dvoretzky-Kiefer-Wolfowitz inequality [DKW56] implies that without privacy, any distribution over XX can be learned to within constant error with O(1)O(1) samples. On the other hand, we show that with approximate differential privacy, the task of releasing thresholds is essentially equivalent to distribution learning. As a consequence, with approximate differential privacy, distribution learning instead requires sample complexity that grows with the size of the domain.

The sample complexity of learning arbitrary distributions on a domain XX with differential privacy is at least Ω(logX)\Omega(\log^{*}|X|).

4 Private PAC Learning

Kasiviswanathan et al. [KLN+11] defined private PAC learning as a combination of probably approximately correct (PAC) learning [Val84] and differential privacy. Recall that a PAC learning algorithm takes some nn labeled examples (xi,c(xi))X×{0,1}(x_{i},c(x_{i}))\in X\times\{0,1\} where the xix_{i}’s are i.i.d. samples of an arbitrary and unknown distribution on a data universe XX and c:X{0,1}c:X\rightarrow\{0,1\} is an unknown concept from some concept class CC. The goal of the learning algorithm is to output a hypothesis h:X{0,1}h:X\rightarrow\{0,1\} that approximates cc well on the unknown distribution. We are interested in PAC learning algorithms L:(X×{0,1})nHL:(X\times\{0,1\})^{n}\rightarrow H that are also differentially private. Here HH is the hypothesis class; if HCH\subseteq C, then LL is called a proper learner.

As with query release and distribution learning, a natural problem is to characterize the sample complexity — the minimum number nn of samples in order to achieve differentially private PAC learning for a given concept class CC. Without privacy, it is well-known that the sample complexity of (proper) PAC learning is proportional to the Vapnik–Chervonenkis (VC) dimension of the class CC [VC71, BEHW89, EHKV89]. In the initial work on differentially private learning, Kasiviswanathan et al. [KLN+11] showed that O(logC)O(\log|C|) labeled examples suffice for privately learning any concept class CC.As with the query release discussion, we omit the dependency on all parameters except for C|C|, X|X| and VC(C)\operatorname{\rm VC}(C). The VC dimension of a concept class CC is always at most logC\log|C|, but is significantly lower for many interesting classes. Hence, the results of [KLN+11] left open the possibility that the sample complexity of private learning may be significantly higher than that of non-private learning.

In the case of pure differential privacy (δ=0\delta=0), this gap in the sample complexity was shown to be unavoidable in general. Beimel, Kasiviswanathan, and Nissim [BKN10] considered the concept class CC of point functions over a data universe XX, which have VC dimension 1 and hence can be (properly) learned without privacy with O(1)O(1) samples. In contrast, they showed that proper PAC learning with pure differential privacy requires sample complexity Ω(logX)=Ω(logC)\Omega(\log|X|)=\Omega(\log|C|). Feldman and Xiao [FX14] showed a similar separation even for improper learning — the class CC of threshold functions over XX also has VC dimension 1, but PAC learning with pure differential privacy requires sample complexity Ω(logX)=Ω(logC)\Omega(\log|X|)=\Omega(\log|C|).

For approximate differential privacy (δ>0\delta>0), however, it was still open whether there is an asymptotic gap between the sample complexity of private learning and non-private learning. Indeed, Beimel et al. [BNS13b] showed that point functions can be properly learned with approximate differential privacy using O(1)O(1) samples (i.e. with no dependence on X|X|). For threshold functions, they exhibited a proper learner with sample complexity 2O(logX)2^{O(\log^{*}|X|)}, but it was conceivable that the sample complexity could also be reduced to O(1)O(1).

We prove that the sample complexity of proper PAC learning with approximate differential privacy can be asymptotically larger than the VC dimension:

The sample complexity of properly learning threshold functions over a data universe XX with differential privacy is at least Ω(logX)\Omega(\log^{*}|X|).

Based on these results, it would be interesting to fully characterize the difference between the sample complexity of proper non-private learners and of proper learners with (approximate) differential privacy. Furthermore, our results still leave open the possibility that improper PAC learning with (approximate) differential privacy has sample complexity O(VC(C))O(\operatorname{\rm VC}(C)). We consider this to be an important question for future work.

5 Techniques

Our results for query release and proper learning of threshold functions are obtained by analyzing the sample complexity of a related but simpler problem, which we call the interior-point problem. Here we want a mechanism M:XnXM:X^{n}\rightarrow X (for a totally ordered data universe XX) such that for every database DXnD\in X^{n}, with high probability we have miniDiM(D)maxiDi\min_{i}D_{i}\leq M(D)\leq\max_{i}D_{i}. We give reductions showing that the sample complexity of this problem is equivalent to the other ones we study:

Over every totally ordered data universe XX, the following four problems have the same sample complexity (up to constant factors) under differential privacy:

Distribution learning (with respect to Kolmogorov distance).

Proper PAC learning of threshold functions.

Thus we obtain our lower bounds and our simplified and improved upper bounds for query release and proper learning by proving such bounds for the interior-point problem, such as:

The sample complexity for solving the interior-point problem over a data universe XX with differential privacy is Ω(logX)\Omega(\log^{*}|X|).

Note that for every fixed distribution D\mathcal{D} over XX there exists a simple differentially private algorithm for solving the interior-point problem (w.h.p.) over databases sampled i.i.d. from D\mathcal{D} – simply output a point zz s.t. PrxD[xz]=1/2\Pr_{x\sim\mathcal{D}}[x\geq z]=1/2. Hence, in order to prove Theorem 1.8, we show a (correlated) distribution D\mathcal{D} over databases of size nlogXn\approx\log^{*}|X| on which privately solving the interior-point problem is impossible. The construction is recursive: we use a hard distribution over databases of size (n1)(n-1) over a data universe of size logarithmic in X|X| to construct the hard distribution over databases of size nn over XX.

By another reduction to the interior-point problem, we show an impossibility result for the following undominated-point problem:

Note that for the above problem, one cannot hope to construct a single distribution over databases that every private mechanism fails on. The reason is that for any such distribution D\mathcal{D}, and any desired failure probability β\beta, there is some number KK for which PrDD[maxD>K]β\Pr_{D\sim\mathcal{D}}[\max{D}>K]\leq\beta, and hence that the mechanism that always outputs KK solves the problem. Hence, given a mechanism M\mathcal{M} we must tailor a hard distribution DM\mathcal{D}_{\mathcal{M}}. We use a similar mechanism-dependent approach to prove Theorem 1.6.

Preliminaries

Throughout this work, we use the convention that [n]={0,1,,n1}[n]=\{0,1,\dots,n-1\} and write log\log for log2\log_{2}. We use THRESHX\operatorname*{\tt THRESH}_{X} to denote the set of all threshold functions over a totally ordered domain XX. That is,

We will present algorithms that access their input database using (several) differentially private mechanisms and use the following composition theorem to prove their overall privacy guarantee.

Let M1:XnR1\mathcal{M}_{1}:X^{n}\to\mathcal{R}_{1} be (ε1,δ1)(\varepsilon_{1},\delta_{1})-differentially private. Let M2:Xn×R1R2\mathcal{M}_{2}:X^{n}\times\mathcal{R}_{1}\to\mathcal{R}_{2} be (ε2,δ2)(\varepsilon_{2},\delta_{2})-differentially private for any fixed value of its second argument. Then the composition M(D)=M2(D,M1(D))M(D)=\mathcal{M}_{2}(D,\mathcal{M}_{1}(D)) is (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-differentially private.

The Interior Point Problem

In this work we exhibit a close connection between the problems of privately learning and releasing threshold queries, distribution learning, and solving the interior point problem as defined below.

An algorithm A:XnXA:X^{n}\to X solves the interior point problem on XX with error probability β\beta if for every DXnD\in X^{n},

where the probability is taken over the coins of AA. The sample complexity of the algorithm AA is the database size nn.

We call a solution xx with minDxmaxD\min D\leq x\leq\max D an interior point of DD. Note that xx need not be a member of the database DD.

2 Lower Bound

We now prove our lower bound on the sample complexity of private algorithms for solving the interior point problem.

Fix any constant 0<ε<1/40<\varepsilon<1/4. Let δ(n)1/(50n2)\delta(n)\leq 1/(50n^{2}). Then for every positive integer nn, solving the interior point problem on XX with probability at least 3/43/4 and with (ε,δ(n))(\varepsilon,\delta(n))-differential privacy requires sample complexity nΩ(logX)n\geq\Omega(\log^{*}|X|).

The inductive lemma we prove depends on a number of parameters we now define. Fix 14>ε,β>0\frac{1}{4}>\varepsilon,\beta>0. Let δ(n)\delta(n) be any positive non-increasing sequence for which

for every nn. In particular, it suffices that

Let b(n)=1/δ(n)b(n)=1/\delta(n) and define the function SS recursively by

For every positive integer nn, there exists a distribution Dn\mathcal{D}_{n} over databases D[S(n)]n={0,1,,S(n)1}nD\in[S(n)]^{n}=\{0,1,\dots,S(n)-1\}^{n} such that for every (ε,δ(n))(\varepsilon,\delta(n))-differentially private mechanism M\mathcal{M},

where the probability is taken over D\mboxRDnD\leftarrow_{\mbox{\tiny R}}\mathcal{D}_{n} and the coins of M\mathcal{M}.

In this section, we give a direct proof of the lemma and in Appendix B, we show how the lemma follows from the construction of a new combinatorial object we call an “interior point fingerprinting code.” This is a variant on traditional fingerprinting codes, which have been used recently to show nearly optimal lower bounds for other problems in approximate differential privacy [BUV14, DTTZ14, BST14].

The proof is by induction on nn. We first argue that the claim holds for n=1n=1 by letting D1\mathcal{D}_{1} be uniform over the singleton databases (0)(0) and (1)(1). To that end let x\mboxRD1x\leftarrow_{\mbox{\tiny R}}\mathcal{D}_{1} and note that for any (ε,δ(1))(\varepsilon,\delta(1))-differentially private mechanism M0:{0,1}{0,1}\mathcal{M}_{0}:\{0,1\}\rightarrow\{0,1\} it holds that

giving the desired bound on Pr[M0(x)=x]\Pr[\mathcal{M}_{0}(x)=x].

Now inductively suppose we have a distribution Dn\mathcal{D}_{n} that satisfies the claim. We construct a distribution Dn+1\mathcal{D}_{n+1} on databases (y0,y1,,yn)[S(n+1)]n+1(y_{0},y_{1},\dots,y_{n})\in[S(n+1)]^{n+1} that is sampled as follows:

Sample (x1,,xn)\mboxRDn(x_{1},\dots,x_{n})\leftarrow_{\mbox{\tiny R}}\mathcal{D}_{n}.

Sample a uniformly random y0\mboxR[S(n+1)]y_{0}\leftarrow_{\mbox{\tiny R}}[S(n+1)]. We write the base b(n)b(n) representation of y0y_{0} as y0(1)y0(2)y0(S(n))y_{0}^{(1)}y_{0}^{(2)}\dots y_{0}^{(S(n))}.

For each i=1,,ni=1,\dots,n let yiy_{i} be a base b(n)b(n) number (written yi(1)yi(2)yi(S(n))y_{i}^{(1)}y_{i}^{(2)}\dots y_{i}^{(S(n))}) that agrees with the base b(n)b(n) representation of y0y_{0} on the first xix_{i} digits and contains a random sample from [b(n)][b(n)] in every index thereafter.

Suppose for the sake of contradiction that there were an (ε,δ(n+1))(\varepsilon,\delta(n+1))-differentially private mechanism M^\hat{\mathcal{M}} that could solve the interior point problem on Dn+1\mathcal{D}_{n+1} with probability greater than Pn+1P_{n+1}. We use M^\hat{\mathcal{M}} to construct the following private mechanism M\mathcal{M} for solving the interior point problem on Dn\mathcal{D}_{n}, giving the desired contradiction:

The mechanism M\mathcal{M} is also (ε,δ(n+1))(\varepsilon,\delta(n+1))-differentially private, since for all pairs of adjacent databases DDD\sim D^{\prime} and every T[S(n)]T\subseteq[S(n)],

where T^\hat{T} is the set of yy that agree with y0y_{0} in exactly the first xx entries for some xTx\in T.

Now we argue that M\mathcal{M} solves the interior point problem on Dn\mathcal{D}_{n} with probability greater than PnP_{n}. First we show that xminDx\geq\min D with probability greater than Pn+1P_{n+1}. Observe that by construction, all the elements of D^\hat{D} agree in at least the first minD\min D digits, and hence so does any interior point of D^\hat{D}. Therefore, if M\mathcal{M}^{\prime} succeeds in outputting an interior point yy of D^\hat{D}, then yy must in particular agree with y0y_{0} in at least minD\min D digits, so xminDx\geq\min D.

Now we use the privacy that M^\hat{\mathcal{M}} provides to y0y_{0} to show that xmaxDx\leq\max D except with probability at most eε/b(n)+δ(n+1)e^{\varepsilon}/b(n)+\delta(n+1). Fix a database DD. Let w=maxDw=\max D, and fix all the randomness of M\mathcal{M} but the (w+1)(w+1)st entry of y0y_{0} (note that since w=maxDw=\max D, this fixes y1,,yny_{1},\ldots,y_{n}). Since the (w+1)(w+1)st entry of y0y_{0} is still a uniformly random element of [b(n)][b(n)], the privately produced entry yw+1y^{w+1} should not be able to do much better than randomly guessing y0(w+1)y_{0}^{(w+1)}. Formally, for each z[b(n)]z\in[b(n)], let D^z\hat{D}_{z} denote the database D^\hat{D} with y0(w+1)y_{0}^{(w+1)} set to zz and everything else fixed as above. Then by the differential privacy of M^\hat{\mathcal{M}},

where all probabilities are also taken over the coins of M^\hat{\mathcal{M}}. Thus xmaxDx\leq\max D except with probability at most eε/b(n)+δ(n+1)e^{\varepsilon}/b(n)+\delta(n+1). By a union bound, minDxmaxD\min D\leq x\leq\max D with probability greater than

We now prove Theorem 3.2 by estimating the S(n)S(n) guaranteed by Lemma 3.3.

Let S(n)S(n) be as in Lemma 3.3. We introduce the following notation for iterated exponentials:

Observe that for k1k\geq 1, x>0x>0, and M>16M>16,

By induction on nn we get an upper bound of

This immediately shows that solving the interior point problem on X=[S(n)]X=[S(n)] requires sample complexity

To get a lower bound for solving the interior point problem on XX when X|X| is not of the form S(n)S(n), note that a mechanism for XX is also a mechanism for every XX^{\prime} s.t. XX|X^{\prime}|\leq|X|. The lower bound follows by setting X=S(n)|X^{\prime}|=S(n) for the largest nn such that S(n)XS(n)\leq|X|. ∎

3 Upper Bound

We now present a recursive algorithm, RecPrefix, for privately solving the interior point problem.

RecPrefix is (ϵ,δ)(\epsilon,\delta)-differentially private;

With probability at least (1β)(1-\beta), the output xx satisfies min{xi:xiS}xmax{xi:xiS}\min\{x_{i}:x_{i}\in S\}\leq x\leq\max\{x_{i}:x_{i}\in S\}.

The idea of the algorithm is that on each level of recursion, RecPrefix takes an input database SS over XX and constructs a database SS^{\prime} over a smaller universe XX^{\prime}, where X=logX|X^{\prime}|=\log|X|, in which every element is the length of the longest prefix of a pair of elements in SS (represented in binary). In a sense, this reverses the construction presented in Section 3.2.

The exponential mechanism is ϵ\epsilon-differentially private.

Let qq be a quality function with sensitivity at most 11. Fix a database SXnS\in X^{n} and let OPT=maxfF{q(S,f)}\operatorname{\rm OPT}=\max_{f\in\mathcal{F}}\{q(S,f)\}. Let t>0t>0. Then exponential mechanism outputs a solution ff with q(S,f)OPTtnq(S,f)\leq\operatorname{\rm OPT}-tn with probability at most Fexp(ϵtn/2)|\mathcal{F}|\cdot\exp(-\epsilon tn/2).

We will also use an (ε,δ)(\varepsilon,\delta)-differentially private variant of the exponential mechanism called the choosing mechanism, introduced in [BNS13b].

A quality function with sensitivity at most 11 is of kk-bounded-growth if adding an element to a database can increase (by 1) the score of at most kk solutions, without changing the scores of other solutions. Specifically, it holds that

q(,f)=0q(\emptyset,f)=0 for all fFf\in\mathcal{F},

If S2=S1{x}S_{2}=S_{1}\cup\{x\}, then q(S1,f)+1q(S2,f)q(S1,f)q(S_{1},f)+1\geq q(S_{2},f)\geq q(S_{1},f) for all fFf\in\mathcal{F}, and

There are at most kk values of ff for which q(S2,f)=q(S1,f)+1q(S_{2},f)=q(S_{1},f)+1.

The choosing mechanism is a differentially private algorithm for approximately solving bounded-growth choice problems. Step 1 of the algorithm checks whether a good solution exists (otherwise any solution is approximately optimal) and Step 2 invokes the exponential mechanism, but with the small set G(S)G(S) of good solutions instead of F\mathcal{F}.

The following lemmas give the privacy and utility guarantees of the choosing mechanism. We give a slightly improved utility result over [BNS13b], and the analysis is presented in Appendix A.

Fix δ>0\delta>0, and 0<ϵ20<\epsilon\leq 2. If qq is a kk-bounded-growth quality function, then the choosing mechanism is (ϵ,δ)(\epsilon,\delta)-differentially private.

Let the choosing mechanism be executed on a kk-bounded-growth quality function, and on a database SS s.t. there exists a solution f^\hat{f} with quality q(S,f^)16ϵln(4kβϵδ)q(S,\hat{f})\geq\frac{16}{\epsilon}\ln(\frac{4k}{\beta\epsilon\delta}). With probability at least (1β)(1-\beta), the choosing mechanism outputs a solution ff with quality q(S,f)1q(S,f)\geq 1.

Let the choosing mechanism be executed on a kk-bounded-growth quality function, and on a database SS containing mm elements. With probability at least (1β)(1-\beta), the choosing mechanism outputs a solution ff with quality q(S,f)OPT16ϵln(4kmβϵδ)q(S,f)\geq\operatorname{\rm OPT}-\frac{16}{\epsilon}\ln(\frac{4km}{\beta\epsilon\delta}).

3.2 The RecPrefix algorithm

We are now ready to present and analyze the algorithm RecPrefix.

We start the analysis of RecPrefix with the following two simple observations.

There are at most logX\log^{*}|X| recursive calls throughout the execution of RecPrefix on a database SXS\in X^{*}.

Let RecPrefix be executed on a database SXnS\in X^{n}, where n2logX2312ϵln(4βϵδ)n\geq 2^{\log^{*}|X|}\cdot\frac{2312}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}). Every recursive call throughout the execution operates on a database containing at least 1540ϵln(4βϵδ)\frac{1540}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}) elements.

We now analyze the utility guarantees of RecPrefix by proving the following lemma.

Let β,ϵ,δ\beta,\epsilon,\delta, and SXnS\in X^{n} be inputs on which RecPrefix performs at most NN recursive calls, all of which are on databases of at least 1540ϵln(4βϵδ)\frac{1540}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}) elements. With probability at least (13βN)(1-3\beta N), the output xx is s.t.

{i:xix}k386ϵln(4βϵδ)|\{i:x_{i}\geq x\}|\geq k\triangleq\lfloor\frac{386}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta})\rfloor.

Before proving the lemma, we make a combinatorial observation that motivates the random shuffling in Step 2 of RecPrefix. A pair of elements y,ySy,y^{\prime}\in S is useful in Algorithm RecPrefix if many of the values in SS lie between yy and yy^{\prime} – a prefix on which y,yy,y^{\prime} agree is also a prefix of every element between yy and yy^{\prime}. A prefix common to a useful pair can hence be identified privately via stability-based techniques. Towards creating useful pairs, the set SS is shuffled randomly. We will use the following lemma:

Let (Π1,Π2,,Πn)(\Pi_{1},\Pi_{2},\ldots,\Pi_{n}) be a random permutation of (1,2,,n)(1,2,\ldots,n). Then for all r1r\geq 1,

We need to show that w.h.p. there are at most rr “bad” pairs (Π2i1,Π2i)(\Pi_{2i-1},\Pi_{2i}) within distance r12\frac{r}{12}. For each ii, we call Π2i1\Pi_{2i-1} the left side of the pair, and Π2i\Pi_{2i} the right side of the pair. Let us first choose rr elements to be placed on the left side of rr bad pairs (there are (nr){n\choose r} such choices). Once those are fixed, there are at most (r6)r(\frac{r}{6})^{r} choices for placing elements on the right side of those pairs. Now we have rr pairs and n2rn-2r unpaired elements that can be shuffled in (nr)!(n-r)! ways. Overall, the probability of having at least rr bad pairs is at most

where we have used Stirling’s approximation for the first inequality. ∎

Suppose we have paired random elements in our input database SS, and constructed a database SS^{\prime} containing lengths of the prefixes for those pairs. Moreover, assume that by recursion we have identified a length zz which is the length at least rr random pairs. Although those prefixes may be different for each pair, Claim 3.12 guarantees that (w.h.p.) at least one of these prefixes is the prefix of at least r12\frac{r}{12} input elements. This will help us in (privately) identifying such a prefix.

The proof is by induction on the number of recursive calls, denoted as tt. For t=1t=1 (i.e., X32|X|\leq 32), the claim holds as long as the exponential mechanism outputs an xx with q(S,x)kq(S,x)\geq k except with probability at most β\beta. By Proposition 3.5, it suffices to have n1540ϵln(4βϵδ)n\geq\frac{1540}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}), since 32exp(ε(n/2k)/2)β32\exp(-\varepsilon(n/2-k)/2)\leq\beta.

Assume that the stated lemma holds whenever RecPrefix performs at most t1t-1 recursive calls. Let β,ϵ,δ\beta,\epsilon,\delta and S=(xi)i=1nXnS=(x_{i})_{i=1}^{n}\in X^{n} be inputs on which algorithm RecPrefix performs tt recursive calls, all of which are on databases containing at least 1540ϵln(4βϵδ)\frac{1540}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}) elements. Consider the first call in the execution on those inputs, and let y1,,yn2ky_{1},\ldots,y_{n-2k} be the random permutation chosen on Step 2. We say that a pair y2j1,y2jy_{2j-1},y_{2j} is close if

By Claim 3.12, except with probability at most 2(k1)<β2^{-(k-1)}<\beta, there are at most (k1)(k-1) close pairs. We continue the proof assuming that this is the case.

Let S=(zi)i=1(n2k)/2S^{\prime}=(z_{i})_{i=1}^{(n-2k)/2} be the database constructed in Step 3. By the inductive assumption, with probability at least (13β(t1))(1-3\beta(t-1)), the value zz obtained in Step 4 is s.t. (1) ziS\exists z_{i}\in S^{\prime} s.t. zizz_{i}\leq z; and (2) {ziS:ziz}k|\{z_{i}\in S^{\prime}:z_{i}\geq z\}|\geq k. We proceed with the analysis assuming that this event happened.

By (2), there are at least kk pairs y2j1,y2jy_{2j-1},y_{2j} that agree on a prefix of length at least zz. At least one of those pairs, say y2j1,y2jy_{2j^{*}-1},y_{2j^{*}}, is not close. Note that every yy between y2j1y_{2j^{*}-1} and y2jy_{2j^{*}} agrees on the same prefix of length zz, and that there are at least k112\frac{k-1}{12} such elements in SS. Moreover, as the next bit is either 0 or 1, at least half of those elements agree on a prefix of length (z+1)(z+1). Thus, when using the choosing mechanism on Step 5 (to choose a prefix of length (z+1)(z+1)), there exists at least one prefix with quality at least k12416ϵln(4βϵδ)\frac{k-1}{24}\geq\frac{16}{\epsilon}\cdot\ln(\frac{4}{\beta\epsilon\delta}). By Lemma 3.8, the choosing mechanism ensures, therefore, that with probability at least (1β)(1-\beta), the chosen prefix LL is the prefix of at least one yiSy_{i^{\prime}}\in S, and, hence, this yiy_{i^{\prime}} satisfies L0yiL1L_{0}\leq y_{i^{\prime}}\leq L_{1} (defined in Step 6). We proceed with the analysis assuming that this is the case.

Let zj^Sz_{\hat{j}}\in S^{\prime} be s.t. zj^zz_{\hat{j}}\leq z. By the definition of zj^z_{\hat{j}}, this means that y2j^1y_{2\hat{j}-1} and y2j^y_{2\hat{j}} agree on a prefix of length at most zz. Hence, as LL is of length z+1z+1, we have that either min{y2j^1,y2j^}<L0\min\{y_{2\hat{j}-1},y_{2\hat{j}}\}<L_{0} or max{y2j^1,y2j^}>L1\max\{y_{2\hat{j}-1},y_{2\hat{j}}\}>L_{1}. If min{y2j^1,y2j^}<L0\min\{y_{2\hat{j}-1},y_{2\hat{j}}\}<L_{0}, then L0L_{0} satisfies Condition 1 of being a good output. It also satisfies Condition 2 because yiL0y_{i^{\prime}}\geq L_{0} and yiYy_{i^{\prime}}\in Y, which we took to be the smallest n2kn-2k elements of SS. Similarly, L1L_{1} is a good output if max{y2j^1,y2j^}>L1\max\{y_{2\hat{j}-1},y_{2\hat{j}}\}>L_{1}. In any case, at least one out of L0,L1L_{0},L_{1} is a good output.

If both L0L_{0} and L1L_{1} are good outputs, then Step 8 cannot fail. We have already established the existence of L0yiL1L_{0}\leq y_{i^{\prime}}\leq L_{1}. Hence, if L1L_{1} is not a good output, then there are at most (k1)(k{-}1) elements xiSx_{i}\in S s.t. xiL1x_{i}\geq L_{1}. Hence, the probability of big^3k/2\widehat{big}\geq 3k/2 and Step 8 failing is at most exp(ϵk2)β\exp(-\frac{\epsilon k}{2})\leq\beta. It remains to analyze the case where L0L_{0} is not a good output (and L1L_{1} is).

If L0L_{0} is not a good output, then every xjSx_{j}\in S satisfies xj>L0x_{j}>L_{0}. In particular, min{y2j^1,y2j^}>L0\min\{y_{2\hat{j}-1},y_{2\hat{j}}\}>L_{0}, and, hence, max{y2j^1,y2j^}>L1\max\{y_{2\hat{j}-1},y_{2\hat{j}}\}>L_{1}. Recall that there are at least 2k2k elements in SS which are bigger than max{y2j^1,y2j^}\max\{y_{2\hat{j}-1},y_{2\hat{j}}\}. As k2ϵln(1β)k\geq\frac{2}{\epsilon}\ln(\frac{1}{\beta}), the probability that big^<3k/2\widehat{big}<3k/2 and RecPrefix fails to return L1L_{1} in this case is at most β\beta.

All in all, RecPrefix fails to return an appropriate xx with probability at most 3βt3\beta t. ∎

We now proceed with the privacy analysis.

When executed for NN recursive calls, RecPrefix is (2ϵN,2δN)(2\epsilon N,2\delta N)-differentially private.

The proof is by induction on the number of recursive calls, denoted by tt. For t=1t=1 (i.e., X32|X|\leq 32), then by Proposition 3.5 the exponential mechanism ensures that RecPrefix is (ϵ,0)(\epsilon,0)-differentially private. Assume that the stated lemma holds whenever RecPrefix performs at most t1t-1 recursive calls, and let S1,S2XS_{1},S_{2}\in X^{*} be two neighboring databases on which RecPrefix performs tt recursive calls.The recursion depth is determined by X|X|, which is identical in S1S_{1} and in S2S_{2}. Let B\mathcal{B} denote an algorithm consisting of steps 1-4 of RecPrefix (the output of B\mathcal{B} is the value zz from Step 4). Consider the executions of B\mathcal{B} on S1S_{1} and on S2S_{2}, and denote by Y1,S1Y_{1},S^{\prime}_{1} and by Y2,S2Y_{2},S^{\prime}_{2} the elements Y,SY,S^{\prime} as they are in the executions on S1S_{1} and on S2S_{2}.

We show that the distributions on the databases S1S^{\prime}_{1} and S2S^{\prime}_{2} are similar in the sense that for each database in one of the distributions there exist a neighboring database in the other that have the same probability. Thus, applying the recursion (which is differentially private by the inductive assumption) preserves privacy. We now make this argument formal.

First note that as S1,S2S_{1},S_{2} differ in only one element, there is a bijection between orderings Π\Pi and Π^\widehat{\Pi} of the smallest (n2k)(n-2k) elements of S1S_{1} and of S2S_{2} respectively s.t. Y1Y_{1} and Y2Y_{2} are neighboring databases. This is because there exists a permutation of the smallest (n2k)(n-2k) elements of S1S_{1} that is a neighbor of the smallest (n2k)(n-2k) elements of S2S_{2}; composition with this fixed permutation yields the desired bijection. Moreover, note that whenever Y1,Y2Y_{1},Y_{2} are neighboring databases, the same is true for S1S^{\prime}_{1} and S2S^{\prime}_{2}. Hence, for every set of outputs FF it holds that

So when executed for tt recursive calls, the sequence of Steps 1-4 of RecPrefix is (2ϵ(t1),2δ(t1))(2\epsilon(t{-}1),2\delta(t{-}1))-differentially private. On Steps 5 and 7, algorithm RecPrefix interacts with its database through the choosing mechanism and using the Laplace mechanism, each of which is (ϵ,δ)(\epsilon,\delta)-differentially private. By composition (Lemma 2.2), we get that RecPrefix is (2tϵ,2tδ)(2t\epsilon,2t\delta)-differentially private. ∎

Combining Lemma 3.11 and Lemma 3.13 we obtain Theorem 3.4.

3.3 Informal Discussion and Open Questions

An natural open problem is to close the gap between our (roughly) 2logX2^{\log^{*}|X|} upper bound on the sample complexity of privately solving the interior point problem (Theorem 3.4), and our logX\log^{*}|X| lower bound (Theorem 3.2). Below we describe an idea for reducing the upper bound to poly(logX)\mathop{\rm{poly}}\nolimits(\log^{*}|X|).

In our recursive construction for the lower bound, we took nn elements (x1,,xn)(x_{1},\ldots,x_{n}) and generated n+1n+1 elements where y0y_{0} is a random element (independent of the xix_{i}’s), and every xix_{i} is the length of the longest common prefix of y0y_{0} and yiy_{i}. Therefore, a change limited to one xix_{i} affects only one yiy_{i} and privacy is preserved (assuming that our future manipulations on (y0,,yn)(y_{0},\ldots,y_{n}) preserve privacy). While the representation length of domain elements grows exponentially on every step, the database size grows by 1. This resulted in the Ω(logX)\Omega(\log^{*}|X|) lower bound.

In RecPrefix on the other hand, every level of recursion shrank the database size by a factor of 12\frac{1}{2}, and hence, we required a sample of (roughly) 2logX2^{\log^{*}|X|} elements. Specifically, in each level of recursion, two input elements y2j1,y2jy_{2j-1},y_{2j} were paired and a new element zjz_{j} was defined as the length of their longest common prefix. As with the lower bound, we wanted to ensure that a change limited to one of the inputs affects only one new element, and hence, every input element is paired only once, and the database size shrinks.

If we could pair input elements twice then the database size would only be reduced additively (which will hopefully result in a poly(logX)\mathop{\rm{poly}}\nolimits(\log^{*}|X|) upper bound). However, this must be done carefully, as we are at risk of deteriorating the privacy parameter ϵ\epsilon by a factor of 22 and thus remaining with an exponential dependency in logX\log^{*}|X|. Consider the following thought experiment for pairing elements.

Input: (x1,,xn)Xn(x_{1},\ldots,x_{n})\in X^{n}. 1. Let (y10,,yn0)(y_{1}^{0},\ldots,y_{n}^{0}) denote a random permutation of (x1,,xn)(x_{1},\ldots,x_{n}). 2. For t=1t=1 to logX\log^{*}|X|: For i=1i=1 to (nt)(n{-}t), let yity_{i}^{t} be the length of the longest common prefix of yit1y_{i}^{t-1} and yi+1t1y_{i+1}^{t-1}.

As (most of the) elements are paired twice on every step, the database size reduces additively. In addition, every input element xix_{i} affects at most t+1t+1 elements at depth tt, and the privacy loss is acceptable. However, this still does not solve the problem. Recall that every iteration of RecPrefix begins by randomly shuffling the inputs. Specifically, we needed to ensure that (w.h.p.) the number of “close” pairs is limited. The reason was that if a “not close” pair agrees on a prefix LL, then LL is the prefix “a lot” of other elements as well, and we could privately identify LL. In the above process we randomly shuffled only the elements at depth . Thus we do not know if the number of “close” pairs is small at depth t>0t>0. On the other hand, if we changed the pairing procedure to shuffle at every step, then each input element xix_{i} might affect 2t2^{t} elements at depth tt, causing the privacy loss to deteriorate rapidly.

Query Release and Distribution Learning

Recall that a counting query qq is a predicate q:X{0,1}q:X\to\{0,1\}. For a database D=(x1,,xn)XnD=(x_{1},\dots,x_{n})\in X^{n}, we write q(D)q(D) to denote the average value of qq over the rows of DD, i.e. q(D)=1ni=1nq(xi).q(D)=\frac{1}{n}\sum_{i=1}^{n}q(x_{i}). In the query release problem, we seek differentially private algorithms that can output approximate answers to a family of counting queries QQ simultaneously.

We are interested in the query release problem for the class THRESHX\operatorname*{\tt THRESH}_{X} of threshold queries, which we view as a class of counting queries.

We are also interested in the following distribution learning problem, which is very closely related to the query release problem.

We highlight two important special cases of the distance measure dQd_{Q} in the distribution learning problem. First, when QQ is the collection of all counting queries on a domain XX, the distance dQd_{Q} is the total variation distance between distributions, defined by

Let D\mathcal{D} be a distribution over a totally ordered domain XX. The CDF FDF_{\mathcal{D}} of D\mathcal{D} is defined by FD(t)=PrxD[xt]F_{\mathcal{D}}(t)=\Pr_{x\sim\mathcal{D}}[x\leq t]. If XX is finite, then any function F:XF:X\to that is non-decreasing with F(maxX)=1F(\max X)=1 is a CDF.

Algorithm AA is an (α,β)(\alpha,\beta)-accurate distribution learner with respect to Kolmogorov distance with sample complexity nn if for all distributions D\mathcal{D} on a totally ordered domain XX, given an input of nn samples D=(x1,,xn)D=(x_{1},\dots,x_{n}) where each xix_{i} is drawn i.i.d. from D\mathcal{D}, algorithm AA outputs a CDF FF with supxXF(x)FD(x)\sup_{x\in X}|F(x)-F_{\mathcal{D}}(x)| with probability at least 1β1-\beta.

The query release problem for a collection of counting queries QQ is very closely related to the distribution learning problem with respect to QQ. In particular, solving the query release problem on a dataset DD amounts to learning the empirical distribution of DD. Conversely, results in statistical learning theory show that one can solve the distribution learning problem by first solving the query release problem on a sufficiently large random sample, and then fitting a distribution to approximately agree with the released answers. The requisite size of this sample (without privacy considerations) is characterized by a combinatorial measure of the class QQ called the VC dimension:

Let QQ be a collection of queries over domain XX. A set S={x1,,xk}XS=\{x_{1},\dots,x_{k}\}\subseteq X is shattered by QQ if for every T[k]T\subseteq[k] there exists qQq\in Q such that T={i:q(xi)=1}T=\{i:q(x_{i})=1\}. The Vapnik-Chervonenkis (VC) dimension of QQ, denoted VC(Q)\operatorname{\rm VC}(Q), is the cardinality of the largest set SXS\subseteq X that is shattered by QQ.

It is known [AB09] that solving the query release problem on 256VC(Q)ln(48/αβ)/α2256\operatorname{\rm VC}(Q)\ln(48/\alpha\beta)/\alpha^{2} random samples yields an (α,β)(\alpha,\beta)-accurate distribution learner for a query class QQ.

2 Equivalences with the Interior Point Problem

We show that the problems of privately releasing thresholds and solving the interior point problem are equivalent.

Let XX be a totally ordered domain. Then,

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm that is able to release threshold queries on XX with (α,β)(\alpha,\beta)-accuracy and sample complexity n/(8α)n/(8\alpha), then there is an (ε,δ)(\varepsilon,\delta)-differentially private algorithm that solves the interior point problem on XX with error β\beta and sample complexity nn.

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm solving the interior point problem on XX with error αβ/24\alpha\beta/24 and sample complexity mm, then there is a (5ε,(1+eε)δ)(5\varepsilon,(1+e^{\varepsilon})\delta)-differentially private algorithm for releasing threshold queries with (α,β)(\alpha,\beta)-accuracy and sample complexity

For the first direction, observe that an algorithm for releasing thresholds could easily be used for solving the interior point problem. Formally,

Suppose A\mathcal{A} is a private (α,β)(\alpha,\beta)-accurate algorithm for releasing thresholds over XX for databases of size n8α\frac{n}{8\alpha}. Define A\mathcal{A}^{\prime} on databases of size nn to pad the database with an equal number of min{X}\min\{X\} and max{X}\max\{X\} entries, and run A\mathcal{A} on the result. We can now return any point tt for which the approximate answer to the query ctc_{t} is (12±α)(\frac{1}{2}\pm\alpha) on the (padded) database. ∎

We now show the converse, i.e., that the problem of releasing thresholds can be reduced to the interior point problem. Specifically, we reduce the problem to a combination of solving the interior point problem, and of releasing thresholds on a much smaller data universe. The latter task is handled by the following algorithm.

In Appendix C, we describe another reduction that, up to constant factors, gives the same sample complexity.

Let R:XXR:X^{*}\to X be an (ε,δ)(\varepsilon,\delta)-differentially private algorithm solving the interior point problem on XX with error αβ/24\alpha\beta/24 and sample complexity mm. We may actually assume that RR is differentially private in the sense that if DXD\in X^{*} and DD^{\prime} differs from DD up to the addition or removal of a row, then for every SXS\subseteq X, Pr[R(D)S]eεPr[R(D)S]+δ\Pr[R(D)\in S]\leq e^{\varepsilon}\Pr[R(D^{\prime})\in S]+\delta, and that RR solves the interior point problem with probability at least 1αβ/241-\alpha\beta/24 whenever its input is of size at least mm. This is because we can pad databases of size less than mm with an arbitrary fixed element, and subsample the first mm entries from any database with size greater than mm.

Consider the following algorithm for answering thresholds on databases DXnD\in X^{n} for n>mn>m:

Let D=(x1,,xn)D=(x_{1},\dots,x_{n}) where x1x2xnx_{1}\leq x_{2}\leq\dots\leq x_{n}, and consider a neighboring database D=(x1,,xi,,xn)D^{\prime}=(x_{1},\dots,x_{i}^{\prime},\dots,x_{n}). Assume without loss of generality that xixi+1x_{i}^{\prime}\geq x_{i+1}, and suppose

Moreover, under the bijection we constructed between ν\nu and ν\nu^{\prime}, noise vector ν\nu^{\prime} is sampled with density at most e2εe^{2\varepsilon} times the density of ν\nu, so for every set SXmS\subseteq X^{m},

Finally, the execution of AA at the end of the algorithm is ε\varepsilon-differentially private, so by composition (Lemma 2.2), we obtain the asserted level of privacy.

We can produce α\alpha-accurate answers to every threshold function as long as

The partitioning exhausts the database, i.e. every element of DD is in some DiD_{i},

Every execution of RR succeeds at finding an interior point,

Every database DiD_{i} has size at most 5αn/125\alpha n/12 (ensuring that we have error at most 5α/65\alpha/6 from interpolation),

The answers obtained from executing AA all have error at most α/6\alpha/6.

We now estimate the probabilities of each event. For each ii we have νiαn/6\nu_{i}\geq-\alpha n/6 with probability at least

So by a union bound, every νi\nu_{i} is at least (αn/6)(-\alpha n/6) with probability at least 1β/41-\beta/4. If this is the case, then item 1 holds because tk=kαn/3+ν1++νk(6/α)(αn/3)+(6/α)(αn/6)nt_{k}=k\cdot\alpha n/3+\nu_{1}+\dots+\nu_{k}\geq(6/\alpha)(\alpha n/3)+(6/\alpha)(-\alpha n/6)\geq n. Moreover, if every νiαn/6\nu_{i}\geq-\alpha n/6, then item 2 also holds with probability at least 1β/41-\beta/4. This is because every Diαn/3αn/6m|D_{i}|\geq\alpha n/3-\alpha n/6\geq m, and hence every execution of RR on a subdatabase DiD_{i} succeeds with probability 1αβ/241-\alpha\beta/24.

By a similar argument, property 3 holds as long as each noise value νi\nu_{i} is at most αn/12\alpha n/12, which happens with probability at least 1β/41-\beta/4. Finally, property 4 holds with probability at least 1β/41-\beta/4 since

A union bound over the four properties completes the proof. ∎

2.2 Releasing Thresholds vs. Distribution Learning

Query release and distribution learning are very similar tasks: A distribution learner can be viewed as an algorithm for query release with small error w.r.t. the underlying distribution (rather than the fixed input database). We show that the two tasks are equivalent under differential privacy.

Let QQ be a collection of counting queries over a domain XX.

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm for releasing QQ with (α,β)(\alpha,\beta)-accuracy and sample complexity n256VC(Q)ln(48/αβ)/α2n\geq 256\operatorname{\rm VC}(Q)\ln(48/\alpha\beta)/\alpha^{2}, then there is an (ε,δ)(\varepsilon,\delta)-differentially private (3α,2β)(3\alpha,2\beta)-accurate distribution learner w.r.t. QQ with sample complexity nn.

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private (α,β)(\alpha,\beta)-accurate distribution learner w.r.t. QQ with sample complexity nn, then there is an (ε,δ)(\varepsilon,\delta)-differentially private query release algorithm for QQ with (α,β)(\alpha,\beta)-accuracy and sample complexity 9n9n.

The first direction follows from a standard generalization bound, showing that if a given database DD contains (enough) i.i.d. samples from a distribution D\mathcal{D}, then (w.h.p.) accuracy with respect to DD implies accuracy with respect to D\mathcal{D}. We remark that the sample complexity lower bound on nn required to apply item 1 of Theorem 4.8 does not substantially restrict its applicability: It is known that an (ε,δ)(\varepsilon,\delta)-differentially private algorithm for releasing QQ always requires sample complexity Ω(VC(Q)/αε)\Omega(\operatorname{\rm VC}(Q)/\alpha\varepsilon) anyway [BLR08].

We now use the following generalization theorem to show that (w.h.p.) q(D)q(D) is close to q(D)q(\mathcal{D}) for every qQq\in Q.

Let QQ be a collection of counting queries over a domain XX. Let D=(x1,,xn)D=(x_{1},\dots,x_{n}) consist of i.i.d. samples from a distribution D\mathcal{D} over XX. If d=VC(Q)d=\operatorname{\rm VC}(Q), then

Using the above theorem, together with the fact that n256VC(Q)ln(48/αβ)/α2n\geq 256\operatorname{\rm VC}(Q)\ln(48/\alpha\beta)/\alpha^{2}, we see that except with probability at least 1β1-\beta we have that q(D)q(D))α|q(D)-q(\mathcal{D}))|\leq\alpha for every qQq\in Q. By a union bound (and the triangle inequality) we get that A\mathcal{A} is (3α,2β)(3\alpha,2\beta)-accurate. ∎

In the special case where Q=THRESHXQ=\operatorname*{\tt THRESH}_{X} for a totally ordered domain XX, corresponding to distribution learning under Kolmogorov distance, the above theorem holds as long as n2ln(2/β)/α2n\geq 2\ln(2/\beta)/\alpha^{2}. This follows from using the Dvoretzky-Kiefer-Wolfowitz inequality [DKW56, Mas90] in place of Theorem 4.9.

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm for releasing THRESHX\operatorname*{\tt THRESH}_{X} over a totally ordered domain XX with (α,β)(\alpha,\beta)-accuracy and sample complexity n2ln(2/β)/α2n\geq 2\ln(2/\beta)/\alpha^{2}, then there is an (ε,δ)(\varepsilon,\delta)-differentially private (2α,2β)(2\alpha,2\beta)-accurate distribution learner under Kolmogorov distance with sample complexity nn.

We now show the other direction of the equivalence.

Note that drawing nn i.i.d. samples from D\mathcal{D} is equivalent to subsampling nn rows of DD (with replacement). Then with probability at least 1β1-\beta, the distribution D\mathcal{D}^{\prime} returned by A\mathcal{A} is such that for every xXx\in X

Here, DTD|_{T} denotes the subsample of DD consisting of the indices in TT, and similarly for DTD^{\prime}|_{T}. Note that q0=q0q_{0}=q^{\prime}_{0}, since DT=DTD|_{T}=D^{\prime}|_{T} if index nn is not sampled. Our goal is to show that

To do this, observe that by privacy, qkeεqk1+δq_{k}\leq e^{\varepsilon}q_{k-1}+\delta, so

Combining inequalities 1 and 2 we get that

PAC Learning

A concept c:X{0,1}c:X\rightarrow\{0,1\} is a predicate that labels examples taken from the domain XX. A concept class CC over XX is a set of concepts over the domain XX. A learner is given examples sampled from an unknown probability distribution D\mathcal{D} over XX that are labeled according to an unknown target concept cCc\in C and outputs a hypothesis hh that approximates the target concept with respect to the distribution D\mathcal{D}. More precisely,

The generalization error of a hypothesis h:X{0,1}h:X\rightarrow\{0,1\} (with respect to a target concept cc and distribution D\mathcal{D}) is defined by errorD(c,h)=PrxD[h(x)c(x)].{\rm error}_{\mathcal{D}}(c,h)=\Pr_{x\sim\mathcal{D}}[h(x)\neq c(x)]. If errorD(c,h)α{\rm error}_{\mathcal{D}}(c,h)\leq\alpha we say that hh is an α\alpha-good hypothesis for cc on D\mathcal{D}.

Algorithm AA is an (α,β)(\alpha,\beta)-accurate PAC learner for a concept class CC over XX using hypothesis class HH with sample complexity mm if for all target concepts cCc\in C and all distributions D\mathcal{D} on XX, given an input of mm samples D=((xi,c(xi)),,(xm,c(xm)))D=((x_{i},c(x_{i})),\ldots,(x_{m},c(x_{m}))), where each xix_{i} is drawn i.i.d. from D\mathcal{D}, algorithm AA outputs a hypothesis hHh\in H satisfying Pr[errorD(c,h)α]1β.\Pr[{\rm error}_{\mathcal{D}}(c,h)\leq\alpha]\geq 1-\beta.

The probability is taken over the random choice of the examples in DD and the coin tosses of the learner AA. If HCH\subseteq C then AA is called proper, otherwise, it is called improper.

Classical results in statistical learning theory show that a sample of size Θ(VC(C))\Theta(\operatorname{\rm VC}(C)) is both necessary and sufficient for PAC learning a concept class CC. That O(VC(C))O(\operatorname{\rm VC}(C)) samples suffice follows from a “generalization” argument: for any concept cc and distribution D\mathcal{D}, with probability at least 1β1-\beta over m>Oα,β(VC(C))m>O_{\alpha,\beta}(\operatorname{\rm VC}(C)) random labeled examples, every concept hCh\in C that agrees with cc on the examples has error at most α\alpha on D\mathcal{D}. Therefore, CC can be properly learned by finding any hypothesis hCh\in C that agrees with the given examples.

Recall the class of threshold functions, which are concepts defined over a totally ordered domain XX by THRESHX={cx:xX}\operatorname*{\tt THRESH}_{X}=\{c_{x}:x\in X\} where cx(y)=1c_{x}(y)=1 iff yxy\leq x. The class of threshold functions has VC dimension VC(THRESHX)=1\operatorname{\rm VC}(\operatorname*{\tt THRESH}_{X})=1, and hence can be learned with Oα,β(1)O_{\alpha,\beta}(1) samples.

A private learner is a PAC learner that is differentially private. Following [KLN+11], we consider algorithms A:(X×{0,1})mHA:(X\times\{0,1\})^{m}\to H, where HH is a hypothesis class, and require that

AA is an (α,β)(\alpha,\beta)-accurate PAC learner for a concept class CC with sample complexity mm, and

AA is (ε,δ)(\varepsilon,\delta)-differentially private.

Note that while we require utility (PAC learning) to hold only when the database DD consists of random labeled examples from a distribution, the requirement of differential privacy applies to every pair of neighboring databases DDD\sim D^{\prime}, including those that do not correspond to examples labeled by any concept.

Recall the relationship between distribution learning and releasing thresholds, where accuracy is measured w.r.t. the underlying distribution in the former and w.r.t. the fixed input database in the later. Analogously, we now define the notion of an empirical learner which is similar to a PAC learner where accuracy is measured w.r.t. the fixed input database.

Algorithm AA is an (α,β)(\alpha,\beta)-accurate empirical learner for a concept class CC over XX using hypothesis class HH with sample complexity mm if for every cCc\in C and for every database D=((xi,c(xi)),,(xm,c(xm)))(X×{0,1})mD=((x_{i},c(x_{i})),\ldots,(x_{m},c(x_{m})))\in(X\times\{0,1\})^{m} algorithm AA outputs a hypothesis hHh\in H satisfying Pr[errorD(c,h)α]1β.\Pr[{\rm error}_{D}(c,h)\leq\alpha]\geq 1-\beta.

The probability is taken over the coin tosses of AA.

Note that without privacy (and ignoring computational efficiency) identifying a hypothesis with small empirical error is trivial for every concept class CC and for every database of size at least 11. This is not the case with (ε,δ)(\varepsilon,\delta)-differential privacy,The lower bound in Theorem 5.5 also holds for label private empirical learners, that are only required to provide privacy for the labels in the database. and the sample complexity of every empirical learner for a concept class CC is at least Ω(VC(C))\Omega(\operatorname{\rm VC}(C)):

For every α,β1/8\alpha,\beta\leq 1/8, every δ18n\delta\leq\frac{1}{8n} and ϵ>0\epsilon>0, if A\mathcal{A} is an (ε,δ)(\varepsilon,\delta)-differentially private (α,β)(\alpha,\beta)-accurate empirical learner for a class CC with sample complexity nn, then n=Ω(1αϵVC(C))n=\Omega\left(\frac{1}{\alpha\epsilon}\operatorname{\rm VC}(C)\right).

The proof of Theorem 5.5 is very similar the analysis of [BLR08] for lower bounding the sample complexity of releasing approximated answers for queries in the class CC. As we will see in the next section, at least in some cases (namely, for threshold functions) the sample complexity must also have some dependency in the size of the domain XX.

Observe that xx is a random element of SS that is labeled as 1 in DcD_{c}, and that an α\alpha-consistent hypothesis for DcD_{c} must label at least (118)d(1-\frac{1}{8})d such elements as 1. Hence, by the utility properties of A\mathcal{A}, we have that

Similarly, xx is a random elements of SS that is labeled as 0 in DcD_{c^{\prime}}, and an α\alpha-consistent hypothesis for DcD_{c^{\prime}} must not label more than d/8d/8 such elements as 1. Hence,

Finally, as DcD_{c} and DcD_{c^{\prime}} differ in at most 16αn/d16\alpha n/d entries, differential privacy ensures that

showing that nd40αϵn\geq\frac{d}{40\alpha\epsilon}. ∎

2 Private Learning of Thresholds vs. the Interior Point Problem

We show that with differential privacy, there is a Θ(1/α)\Theta(1/\alpha) multiplicative relationship between the sample complexities of properly PAC learning thresholds with (α,β)(\alpha,\beta)-accuracy and of solving the interior point problem with error probability Θ(β)\Theta(\beta). Specifically, we show

Let XX be a totally ordered domain. Then,

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm solving the interior point problem on XX with error probability β\beta and sample complexity nn, then there is a (2ε,(1+eε)δ)(2\varepsilon,(1+e^{\varepsilon})\delta)-differentially private (2α,2β)(2\alpha,2\beta)-accurate proper PAC learner for THRESHX\operatorname*{\tt THRESH}_{X} with sample complexity max{n2α,4log(2/β)α}\max\left\{\frac{n}{2\alpha},\frac{4\log(2/\beta)}{\alpha}\right\}.

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private (α,β)(\alpha,\beta)-accurate proper PAC learner for THRESHX\operatorname*{\tt THRESH}_{X} with sample complexity nn, then there is a (2ε,(1+eε)δ)(2\varepsilon,(1+e^{\varepsilon})\delta)-differentially private algorithm that solves the interior point problem on XX with error β\beta and sample complexity 27αn27\alpha n.

We show this equivalence in two phases. In the first, we show a Θ(1/α)\Theta(1/\alpha) relationship between the sample complexity of solving the interior point problem and the sample complexity of empirically learning thresholds. We then use generalization and resampling arguments to show that with privacy, this latter task is equivalent to learning with samples from a distribution.

Let XX be a totally ordered domain. Then,

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm solving the interior point problem on XX with error probability β\beta and sample complexity nn, then there is a (2ε,(1+eε)δ)(2\varepsilon,(1+e^{\varepsilon})\delta)-differentially private algorithm for properly and empirically learning thresholds with (α,β)(\alpha,\beta)-accuracy and sample complexity n/(2α)n/(2\alpha).

If there exists an (ε,δ)(\varepsilon,\delta)-differentially private algorithm that is able to properly and empirically learn thresholds on XX with (α,β)(\alpha,\beta)-accuracy and sample complexity n/(3α)n/(3\alpha), then there is a (2ε,(1+eε)δ)(2\varepsilon,(1+e^{\varepsilon})\delta)-differentially private algorithm that solves the interior point problem on XX with error β\beta and sample complexity nn.

For the first direction, let A\mathcal{A} be a private algorithm for the interior point problem on databases of size nn. Consider the algorithm A\mathcal{A}^{\prime} that, on input a database DD of size n/(2α)n/(2\alpha), runs A\mathcal{A}^{\prime} on a database DD^{\prime} consisting of the largest n/2n/2 elements of DD that are labeled 11 and the smallest n/2n/2 elements of DD that are labeled . If there are not enough of either such element, pad DD^{\prime} with min{X}\min\{X\}’s or max{X}\max\{X\}’s respectively. Note that if xx is an interior point of DD^{\prime} then cxc_{x} is a threshold function with error at most n/2n/(2α)\frac{n/2}{n/(2\alpha)} on DD, and is hence α\alpha-consistent with DD. For privacy, note that changing one row of DD changes at most two rows of DD^{\prime}. Hence, applying algorithm A\mathcal{A} preserves (2ε,(eε+1)δ)(2\varepsilon,(e^{\varepsilon}+1)\delta)-differential privacy.

For the reverse direction, suppose A\mathcal{A}^{\prime} privately finds an α\alpha-consistent threshold functions for databases of size n/(3α)n/(3\alpha). Define A\mathcal{A} on a database DXnD^{\prime}\in X^{n} to label the smaller n/2n/2 points 1 and the larger n/2n/2 points 0 to obtain a labeled database D(X×{0,1})nD\in(X\times\{0,1\})^{n}, pad DD with an equal number of (min{X},1)(\min\{X\},1) and (max{X},0)(\max\{X\},0) entries to make it of size n/(3α)n/(3\alpha), and run A\mathcal{A}^{\prime} on the result. Note that if cxc_{x} is a threshold function with error at most α\alpha on DD then xx is an interior point of DD^{\prime}, as otherwise cxc_{x} has error at least n/2n/(3α)>α\frac{n/2}{n/(3\alpha)}>\alpha on DD. For privacy, note that changing one row of DD^{\prime} changes at most two rows of DD. Hence, applying algorithm A\mathcal{A}^{\prime} preserves (2ε,(eε+1)δ)(2\varepsilon,(e^{\varepsilon}+1)\delta)-differential privacy. ∎

Now we show that the task of privately outputting an almost consistent hypothesis on any fixed database is essentially equivalent to the task of private (proper) PAC learning. One direction follows immediately from a standard generalization bound for learning thresholds:

Any algorithm A\mathcal{A} for empirically learning THRESHX\operatorname*{\tt THRESH}_{X} with (α,β)(\alpha,\beta)-accuracy is also a (2α,β+β)(2\alpha,\beta+\beta^{\prime})-accurate PAC learner for THRESHX\operatorname*{\tt THRESH}_{X} when given at least max{n,4ln(2/β)/α}\max\{n,4\ln(2/\beta^{\prime})/\alpha\} samples.

Let D\mathcal{D} be a distribution over a totally ordered domain XX and fix a target concept c=qxTHRESHXc=q_{x}\in\operatorname*{\tt THRESH}_{X}. It suffices to show that for a sample S=((xi,c(xi)),(xm,c(xm)))S=((x_{i},c(x_{i})),\dots(x_{m},c(x_{m}))) where m4ln(2/β)/αm\geq 4\ln(2/\beta^{\prime})/\alpha and the xix_{i} are drawn i.i.d. from D\mathcal{D}, it holds that

Let xxx^{-}\leq x be the largest point with errorD(qx,c)2α{\rm error}_{\mathcal{D}}(q_{x^{-}},c)\geq 2\alpha. If some yxy\leq x has errorD(qy,c)2α{\rm error}_{\mathcal{D}}(q_{y},c)\geq 2\alpha then yxy\leq x^{-}, and hence for any sample SS, errorS(qx)errorS(qy){\rm error}_{S}(q_{x^{-}})\leq{\rm error}_{S}(q_{y}). Similarly let x+xx^{+}\geq x be the smallest point with errorD(qx+,c)2α{\rm error}_{\mathcal{D}}(q_{x^{+}},c)\geq 2\alpha. Let c=qxc^{-}=q_{x^{-}} and c+=qx+c^{+}=q_{x^{+}}. Then it suffices to show that

Concentrating first on cc^{-}, we define the error region R=(x,x]XR^{-}=(x^{-},x]\cap X as the interval where cc^{-} disagrees with cc. By a Chernoff bound, the probability that after mm independent samples from D\mathcal{D}, fewer than αm\alpha m appear in RR^{-} is at most exp(αm/4)β/2\exp(-\alpha m/4)\leq\beta^{\prime}/2. The same argument holds for c+c^{+}, so the result follows by a union bound. ∎

In general, an algorithm that can output an α\alpha-consistent hypothesis from concept class C\mathcal{C} can also be used to learn C\mathcal{C} using max{n,64VC(C)log(512/αβ)/α}\max\{n,64\operatorname{\rm VC}(\mathcal{C})\log(512/\alpha\beta^{\prime})/\alpha\} samples [BEHW89]. The concept class of thresholds has VC dimension 11, so the generalization bound for thresholds saves an O(log(1/α))O(\log(1/\alpha)) factor over the generic statement.

For the other direction, we note that a distribution-free learner must perform well on the uniform distribution on the rows of any fixed database, and thus must be useful for outputting a consistent hypothesis on such a database.

Suppose A\mathcal{A} is an (ϵ,δ)(\epsilon,\delta)-differentially private (α,β)(\alpha,\beta)-accurate PAC learner for a concept class C\mathcal{C} with sample complexity mm. Then there is an (ϵ,δ)(\epsilon,\delta)-differentially private (α,β)(\alpha,\beta)-accurate empirical learner for C\mathcal{C} with sample complexity n=9mn=9m. Moreover, if A\mathcal{A} is proper, then so is the resulting empirical learner.

Thresholds in High Dimension

First observe that SolveD{Solve}_{\mathcal{D}} is (ε,δ)(\varepsilon,\delta)-differentially private. To see this, note that a change limited to one input entry affects only one entry of the multiset SS. Hence, applying the (ε,δ)(\varepsilon,\delta)-differentially private algorithm A\mathcal{A} on SS preserves privacy.

Now the proof of Theorem 6.2 follows from the lower bound on the sample complexity of privately finding an α\alpha-consistent threshold function (see Section 3.2):

There exists a constant λ>0\lambda>0 s.t. the following holds. For every totally ordered data universe XX there exists a distribution D\mathcal{D} over databases containing at most n=λαlogXn=\frac{\lambda}{\alpha}\log^{*}|X| labeled examples from XX such that every (12,150n2)(\frac{1}{2},\frac{1}{50n^{2}})-differentially private algorithm fails to find an α\alpha-consistent threshold function for DDD\sim\mathcal{D} with probability at least 14\frac{1}{4}.

Let CC be a class of predicates. If there exists a (1150,δ)(\frac{1}{150},\delta)-differentially private algorithm capable of releasing queries from CC with (1150,1150)(\frac{1}{150},\frac{1}{150})-accuracy and sample complexity nn, then there exists a (15,5δ)(\frac{1}{5},5\delta)-differentially private (15,15)(\frac{1}{5},\frac{1}{5})-accurate PAC learner for CC with sample complexity O(n)O(n).

Similar arguments show that acc(D)1/150a_{c}\geq c(D)-1/150, proving that A\mathcal{A} is (1/150,1/150)(1/150,1/150)-accurate and contradicting the lower bound on the sample complexity of such algorithms. ∎

Mechanism-Dependent Lower Bounds

By a reduction to the interior point problem problem, we can prove an impossibility result for the problem of privately outputting something that is at least the minimum of a database on an unbounded domain. Specifically, we show

2 Properly Learning Point Functions with Pure Differential Privacy

We resolve this question in the negative. Specifically, we show that it is impossible to learn (even improperly) point functions over an infinite domain with pure differential privacy using a countable hypothesis class.

Let XX be an infinite domain, let HH be a countable collection of hypotheses {h:X{0,1}}\{h:X\to\{0,1\}\}, and let ε0\varepsilon\geq 0. Then there is no ε\varepsilon-differentially private (1/3,1/3)(1/3,1/3)-accurate PAC learner for points over XX using the hypothesis class HH.

A learner implemented by an algorithm (i.e. a probabilistic Turing machine) must use a hypothesis class where each hypothesis has a finite description. Note that the standard proper learner for POINTX\operatorname{\tt POINT}_{X} can be implemented by an algorithm. However, a consequence of our result is that there is no algorithm for privately learning POINTX\operatorname{\tt POINT}_{X}.

There is an infinite sequence of points x1,x2,x3,x_{1},x_{2},x_{3},\dots together with distributions Di:=Dxi\mathcal{D}_{i}:=\mathcal{D}_{x_{i}} such that the sets Gi:=GxiG_{i}:=G_{x_{i}} are all disjoint.

It is impossible for this to hold for infinitely many disjoint sets GiG_{i}. ∎

and hence Gi+1G_{i+1} is disjoint from the preceding GjG_{j}’s. ∎

We thank Amos Beimel, Adam Smith, Jonathan Ullman, and anonymous reviewers for helpful conversations and suggestions that helped guide our work. We also thank Gautam Kamath for pointing us to references on distribution learning.

References

Appendix A The Choosing Mechanism

We supply the proofs of privacy and utility for the choosing mechanism.

Let AA denote the choosing mechanism (Algorithm 2). Let S,SS,S^{\prime} be neighboring databases of mm elements. We need to show that Pr[A(S)R]exp(ϵ)Pr[A(S)R]+δ\Pr[A(S)\in R]\leq\exp(\epsilon)\cdot\Pr[A(S^{\prime})\in R]+\delta for every set of outputs RF{}R\subseteq\mathcal{F}\cup\{\bot\}. Note first that OPT(S)=maxfF{q(S,f)}\operatorname{\rm OPT}(S)=\max_{f\in\mathcal{F}}\{q(S,f)\} has sensitivity at most 11, so by the properties of the Laplace Mechanism,

Similarly, we have Pr[A(S)]exp(ε/4)Pr[A(S)]\Pr[A(S)\neq\bot]\leq\exp(\varepsilon/4)\Pr[A(S^{\prime})\neq\bot]. Thus, we my assume below that ∉R\bot\not\in R. (If R\bot\in R, then we can write Pr[A(S)R]=Pr[A(S)=]+Pr[A(S)R{}]\Pr[A(S)\in R]=\Pr[A(S)=\bot]+\Pr[A(S)\in R\setminus\{\bot\}], and similarly for SS^{\prime}.)

Let G(S)G(S) and G(S)G(S^{\prime}) be the sets used in step 2 in the execution SS and on SS^{\prime} respectively. We will show that the following two facts hold:

Fact  1:Fact\;1: For every fG(S)G(S)f\in G(S)\setminus G(S^{\prime}), it holds that Pr[A(S)=f]δk\Pr[A(S)=f]\leq\frac{\delta}{k}.

Fact  2:Fact\;2: For every possible output fG(S)G(S)f\in G(S)\cap G(S^{\prime}), it holds that Pr[A(S)=f]eϵPr[A(S)=f]\Pr[A(S)=f]\leq e^{\epsilon}\cdot\Pr[A(S^{\prime})=f].

We first show that the two facts imply that the lemma holds for Case (b). Let BG(S)G(S)B\triangleq G(S)\setminus G(S^{\prime}), and note that as qq is of kk-bounded growth, Bk|B|\leq k. Using the above two facts, for every set of outputs RFR\subseteq\mathcal{F} we have

To prove Fact 1, let fG(S)G(S)f\in G(S)\setminus G(S^{\prime}). That is, q(S,f)1q(S,f)\geq 1 and q(S,f)=0q(S^{\prime},f)=0. As qq has sensitivity at most 11, it must be that q(S,f)=1q(S,f)=1. As there exists f^S\hat{f}\in S with q(S,f^)4ϵln(4kβϵδ)q(S,\hat{f})\geq\frac{4}{\epsilon}\ln(\frac{4k}{\beta\epsilon\delta}), we have that

which is at most δ/k\delta/k for ϵ2\epsilon\leq 2.

To prove Fact 2, let fG(S)G(S)f\in G(S)\cap G(S^{\prime}) be a possible output of A(S)A(S). We use the following Fact 3, proved below.

Fact  3:Fact\;3: hG(S)exp(ϵ4q(S,h))eϵ/2hG(S)exp(ϵ4q(S,h))\sum\limits_{h\in G(S^{\prime})}\exp(\frac{\epsilon}{4}q(S^{\prime},h))\leq e^{\epsilon/2}\cdot\sum\limits_{h\in G(S)}\exp(\frac{\epsilon}{4}q(S,h)).

Using Fact 3, for every possible output fG(S)G(S)f\in G(S)\cap G(S^{\prime}) we have that

We now prove Fact 3. Let XhG(S)exp(ε4q(S,h))\mathcal{X}\triangleq\sum_{h\in G(S)}\exp(\frac{\varepsilon}{4}q(S,h)). Since there exists a solution f^\hat{f} s.t. q(S,f^)4ϵln(4kβϵδ)q(S,\hat{f})\geq\frac{4}{\epsilon}\ln(\frac{4k}{\beta\epsilon\delta}), we have Xexp(ϵ44ϵln(4kβϵδ))4kϵ\mathcal{X}\geq\exp(\frac{\epsilon}{4}\cdot\frac{4}{\epsilon}\ln(\frac{4k}{\beta\epsilon\delta}))\geq\frac{4k}{\epsilon}.

Now, recall that qq is of kk-bounded growth, so G(S)G(S)k|G(S^{\prime})\setminus G(S)|\leq k, and every h(G(S)G(S))h\in(G(S^{\prime})\setminus G(S)) satisfies q(S,h)=1q(S^{\prime},h)=1. Hence,

where the last inequality follows from the fact that X4k/ε\mathcal{X}\geq 4k/\varepsilon. This concludes the proof of Fact 3, and completes the proof of the lemma. ∎

The utility analysis for the choosing mechanism is rather straightforward:

Recall that the mechanism defines OPT~(S)\widetilde{\operatorname{\rm OPT}}(S) as OPT(S)+Lap(4ϵ)\operatorname{\rm OPT}(S)+\operatorname{\rm Lap}(\frac{4}{\epsilon}). Note that the mechanism succeeds whenever OPT~(S)8ϵln(4kβϵδ)\widetilde{\operatorname{\rm OPT}}(S)\geq\frac{8}{\epsilon}\ln(\frac{4k}{\beta\epsilon\delta}). This happens provided the Lap(4ε)\operatorname{\rm Lap}\left(\frac{4}{\varepsilon}\right) random variable is at most 8εln(4kβεδ)\frac{8}{\varepsilon}\ln(\frac{4k}{\beta\varepsilon\delta}), which happens with probability at least (1β)(1-\beta). ∎

Note that if OPT(S)<16ϵln(4kmβϵδ)\operatorname{\rm OPT}(S)<\frac{16}{\epsilon}\ln(\frac{4km}{\beta\epsilon\delta}), then every solution is a good output, and the mechanism cannot fail. Assume, therefore, that there exists a solution ff s.t. q(f,S)16ϵln(4kmβϵδ)q(f,S)\geq\frac{16}{\epsilon}\ln(\frac{4km}{\beta\epsilon\delta}), and recall that the mechanism defines OPT~(S)\widetilde{\operatorname{\rm OPT}}(S) as OPT(S)+Lap(4ϵ)\operatorname{\rm OPT}(S)+\operatorname{\rm Lap}(\frac{4}{\epsilon}). As in the proof of Lemma 3.7, with probability at least 1β/21-\beta/2, we have OPT~(S)8εln(4kβεδ)\widetilde{\operatorname{\rm OPT}}(S)\geq\frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right). Assuming this event occurs, we will show that with probability at least 1β/21-\beta/2, the exponential mechanism chooses a solution ff s.t. q(S,f)opt(S)16ϵln(4kmβϵδ)q(S,f)\geq\mathop{\rm{opt}}\nolimits(S)-\frac{16}{\epsilon}\ln(\frac{4km}{\beta\epsilon\delta}).

By the growth-boundedness of qq, and as SS is of size mm, there are at most kmkm possible solutions ff with q(S,f)>0q(S,f)>0. That is, G(S)km|G(S)|\leq km. By the properties of the Exponential Mechanism, we obtain a solution as desired with probability at least

By a union bound, we get that the choosing mechanism outputs a good solution with probability at least (1β)(1-\beta). ∎

Appendix B Interior Point Fingerprinting Codes

Fingerprinting codes were introduced by Boneh and Shaw [BS98] to address the problem of watermarking digital content. Suppose a content distributor wishes to distribute a piece of digital content to nn legitimate users in such a way that any pirated copy of that content can be traced back to any user who helped in producing the copy. A fingerprinting code is a scheme for assigning each nn users a codeword that can be hidden in their copy of the content, and then be uniquely traced back to the identity of that user. Informally, a finger printing code is fully collusion-resistant if when an arbitrary coalition TT of users combine their codewords to produce a new pirate codeword the pirate codeword can still be successfully traced to a member of TT, provided the pirate codeword satisfies a certain marking assumption. Traditionally, this marking assumption requires that if all users in TT see the same bit bb at index jj of their codewords, then index jj of their combined codeword must also be bb.

Recent work has shown how to use fingerprinting codes to obtain lower bounds in differential privacy [BUV14, DTTZ14, BST14]. Roughly speaking, these works show how any algorithm with nontrivial accuracy for a given task can be used to create a pirate algorithm that satisfies the marking assumption for a fingerprinting code. The security of the fingerprinting code means that the output of this algorithm can be traced back to one of its inputs. This implies that the algorithm is not differentially private.

We show how our lower bound for privately solving the interior point problem can also be proved by the construction of an object we call an interior point fingerprinting code. The difference between this object and a traditional fingerprinting code lies in the marking assumption. Thinking of our codewords as being from an ordered domain XX, our marking assumption is that the codeword produced by a set of TT users must be an interior point of their codewords. The full definition of the code is as follows.

For a totally ordered domain XX, an interior point fingerprinting code over XX consists of a pair of randomized algorithms (Gen,Trace)(\operatorname{Gen},\operatorname{Trace}) with the following syntax.

Genn\operatorname{Gen}_{n} samples a codebook C=(x1,,xn)XnC=(x_{1},\dots,x_{n})\in X^{n}

Tracen(x)\operatorname{Trace}_{n}(x) takes as input a “codeword” xXx\in X and outputs either a user i[n]i\in[n] or a failure symbol \bot.

The algorithms Gen\operatorname{Gen} and Trace\operatorname{Trace} are allowed to share a common state (e.g. their random coin tosses).

The adversary to a fingerprinting code consists of a subset T[n]T\subseteq[n] of users and a pirate algorithm A:XTX\mathcal{A}:X^{|T|}\to X. The algorithm A\mathcal{A} is given CTC|_{T}, i.e. the codewords xix_{i} for iTi\in T, and its output x\mboxRA(CT)x\leftarrow_{\mbox{\tiny R}}\mathcal{A}(C|_{T}) is said to be “feasible” if x[miniTxi,maxiTxi]x\in[\min_{i\in T}x_{i},\max_{i\in T}x_{i}]. The security guarantee of a fingerprinting code is that for all coalitions T[n]T\subseteq[n] and all pirate algorithms A\mathcal{A}, if x=A(CT)x=\mathcal{A}(C|_{T}), then we have

Completeness: Pr[Trace(x)=x feasible]γ\Pr[\operatorname{Trace}(x)=\bot\land x\text{ feasible}]\leq\gamma, where γ\gamma\in is the completeness error.

Soundness: Pr[Trace(x)[n]T]ξ\Pr[\operatorname{Trace}(x)\in[n]\setminus T]\leq\xi, where ξ\xi\in is the soundness error.

The probabilities in both cases are taken over the coins of Gen,Trace\operatorname{Gen},\operatorname{Trace}, and A\mathcal{A}.

We note that an interior point fingerprinting code could also be interpreted as an ordinary fingerprinting code (using the traditional marking assumption) with codewords of length X|X| of the form 000011111000011111. As an example for using such a code, consider a vendor interested in fingerprinting movies. Using an interior point fingerprinting code, the vendor could produce fingerprinted copies by simply splicing two versions of the movie.

We now argue as in [BUV14] that the existence of an interior point fingerprinting code yields a lower bound for privately solving the interior point problem.

Let ε1\varepsilon\leq 1, δ1/(12n)\delta\leq 1/(12n), γ1/2\gamma\leq 1/2 and ξ1/(33n)\xi\leq 1/(33n). If there is an interior point fingerprinting code on domain XX for nn users with completeness error γ\gamma and soundness error ξ\xi, then there is no (ε,δ)(\varepsilon,\delta)-differentially private algorithm that, with probability at least 2/32/3, solves the interior point problem on XX for databases of size n1n-1.

Suppose for the sake of contradiction that there were a differentially private A\mathcal{A} for solving the interior point problem on Xn1X^{n-1}. Let T=[n1]T=[n-1], and let x=A(CT)x=\mathcal{A}(C|_{T}) for a codebook C\mboxRGenC\leftarrow_{\mbox{\tiny R}}\operatorname{Gen}.

Therefore, there exists some i[n]i^{*}\in[n] such that

Now consider the coalition TT^{\prime} obtained by replacing user ii^{*} with user nn. Let x=A(CT)x^{\prime}=\mathcal{A}(C|_{T^{\prime}}), again for a random codebook C\mboxRGenC\leftarrow_{\mbox{\tiny R}}\operatorname{Gen}. Since A\mathcal{A} is differentially private,

contradicting the soundness of the interior point fingerprinting code. ∎

We now show how to construct an interior point fingerprinting code, using similar ideas as in the proof of Lemma 3.3. For nn users, the codewords lie in a domain with size an exponential tower in nn, allowing us to recover the logX\log^{*}|X| lower bound for interior point queries.

Let b(n)=2n2/ξb(n)=2n^{2}/\xi, and define the function SS recursively by S(1)=1S(1)=1 and S(n+1)=b(n)S(n)S(n+1)=b(n)^{S(n)}. By induction on nn, we will construct codes for nn users over a domain of size S(n)S(n) with perfect completeness and soundness at most j=1n1b(j)<ξ\sum_{j=1}^{n}\frac{1}{b(j)}<\xi. First note that there is a code with perfect completeness and perfect soundness for n=1n=1 user over a domain of size S(1)=1S(1)=1. Suppose we have defined the behavior of (Genn,Tracen)(\operatorname{Gen}_{n},\operatorname{Trace}_{n}) for nn users. Then we define

Genn+1\operatorname{Gen}_{n+1} samples C=(x1,,xn)\mboxRGennC^{\prime}=(x_{1}^{\prime},\dots,x_{n}^{\prime})\leftarrow_{\mbox{\tiny R}}\operatorname{Gen}_{n} and xn+1\mboxR[S(n+1)]x_{n+1}\leftarrow_{\mbox{\tiny R}}[S(n+1)]. For each i=1,,ni=1,\dots,n, let xix_{i} be a base-b(n)b(n) number (written xi(0)xi(1)xi(S(n)1)x_{i}^{(0)}x_{i}^{(1)}\dots x_{i}^{(S(n)-1)}, where xi(0)x_{i}^{(0)} is the most significant digit) that agrees with xn+1x_{n+1} in the xix_{i}^{\prime} most-significant digits, and has random entries from [b(n)][b(n)] at every index thereafter. The output codebook is C=(x1,,xn+1)C=(x_{1},\dots,x_{n+1}).

Tracen+1(x)\operatorname{Trace}_{n+1}(x) retrieves the codebook CC from its shared state with Genn+1\operatorname{Gen}_{n+1}. Let MM be the maximum number of digits to which any xix_{i} (for i=1,,ni=1,\dots,n) agrees with xn+1x_{n+1}. If xx agrees with xn+1x_{n+1} on more than MM digits, accuse user n+1n+1. Otherwise, let xx^{\prime} be the number of indices on which xx agrees with xn+1x_{n+1}, and run Tracen(x)\operatorname{Trace}_{n}(x^{\prime}) with respect to codebook C=(x1,,xn)C^{\prime}=(x_{1}^{\prime},\dots,x_{n}^{\prime}).

We reduce the security of this scheme to that of (Genn,Tracen)(\operatorname{Gen}_{n},\operatorname{Trace}_{n}). To check completeness, let T[n+1]T\subseteq[n+1] be a pirate coalition and let A\mathcal{A} be a pirate algorithm. Consider the pirate algorithm A\mathcal{A}^{\prime} for codes on nn users that, given a set of codewords CTC^{\prime}|_{T^{\prime}} where T=T{n+1}T^{\prime}=T\setminus\{n+1\}, simulates Genn+1\operatorname{Gen}_{n+1} to produce a set of codewords CTC|_{T} and outputs the number xx^{\prime} of indices on which x=A(CT)x=\mathcal{A}(C|_{T}) agrees with xn+1x_{n+1}.

If xx is feasible for CTC|_{T} and xM+1xn+1M+1x^{M+1}\neq x_{n+1}^{M+1}, then xx^{\prime} is feasible for CTC^{\prime}|_{T^{\prime}}. Therefore,

by induction, proving perfect completeness.

To prove soundness, let M=maxxiM^{\prime}=\max x_{i}^{\prime}. Then

Combining Lemmas B.3 and B.4 yields Theorem 1.8.

Appendix C Another Reduction from Releasing Thresholds to the Interior Point Problem

This reduction computes approximate (α/3)(\alpha/3)-quantiles of its input, which can then be used to release thresholds with α\alpha-accuracy. To do so, it uses the strategy of [DNPR10] of using a complete binary tree to generate a sequence of k=3/αk=3/\alpha noise values. The tree has kk leaves and depth logk\log k, and at each node in the tree we sample a Laplace random variable. The noise value corresponding to a leaf is the sum of the samples along the path from that leaf to the root.

We take the sorted input database and divide it into equal-size blocks around the kk (α/3)(\alpha/3)-quantiles, and perturb the boundaries of the blocks by the kk noise values. Solving the interior point problem on these buckets then gives approximate (α/3)(\alpha/3)-quantiles. Moreover, the noisy bucketing step ensures that the final algorithm is differentially private.

We formally describe this algorithm as Thresh2Thresh_{2} below. Let RR be an (ε,δ)(\varepsilon,\delta)-differentially private mechanism for solving the interior point problem on XX that succeeds with probability at least 1αβ/61-\alpha\beta/6 on databases of size mm. In the algorithm below, let P(i)P(i) denote the set of prefixes of the binary representation of ii (including the empty prefix).

We will show that this algorithm satisfies (3ε,(1+eε)δ)(3\varepsilon,(1+e^{\varepsilon})\delta)-differential privacy, and is able to release approximate k(=3/α)k(=3/\alpha)-quantiles with (α/3,β)(\alpha/3,\beta)-accuracy, and hence (α,β)(\alpha,\beta)-accurate answers to threshold queries, as long as

Let D=(x1,,xn)D=(x_{1},\dots,x_{n}) where x1x2xnx_{1}\leq x_{2}\leq\dots\leq x_{n}, and let D=(x1,,xi,,xn)D^{\prime}=(x_{1},\dots,x_{i}^{\prime},\dots,x_{n}). Assume without loss of generality that xixi+1x_{i}^{\prime}\geq x_{i+1}, and suppose

Consider vectors of noise values ν=(ν1,ν2,,νm)\nu=(\nu_{1},\nu_{2},\dots,\nu_{m}). Then there is a bijection between noise vectors ν\nu and noise vectors ν\nu^{\prime} such that DD partitioned according to ν\nu and DD^{\prime} partitioned according to ν\nu^{\prime} differ on at most 2 blocks (cf. [DNPR10]). Moreover, this bijection changes at most 2logm2\log m values νs\nu_{s} by at most 11. Thus under this mapping, noise vector ν\nu^{\prime} is sampled with probability at most eεe^{\varepsilon} times the probability ν\nu is sampled. We get that for any set SS,

We can produce (α/3)(\alpha/3)-accurate estimates of every quantile as long as

Every noise value has magnitude at most αn/3\alpha n/3

By the analysis of Lemma 4.7 in [DNPR10], with probability at least 1β/21-\beta/2, every noise value ηi\eta_{i} is bounded by 11log2.5(1/α)/εαn/1211\log^{2.5}(1/\alpha)/\varepsilon\leq\alpha n/12. This suffices to achieve item 1. Moreover, conditioned on the noise values being so bounded, each Diαn/6m|D_{i}|\geq\alpha n/6\geq m, so each execution of RR individually succeeds with probability 1αβ/61-\alpha\beta/6. Hence they all succeed simultaneously with probability at least 1β/21-\beta/2, giving item 2.