The Geometry of Differential Privacy: the Sparse and Approximate Cases

Aleksandar Nikolov, Kunal Talwar, Li Zhang

Introduction

Differential privacy [DMNS06] is a recent privacy definition that has quickly become the standard notion of privacy in statistical databases. Informally, a mechanism (a randomized function on databases) satisfies differential privacy if the distribution of the outcome of the mechanism does not change noticeably when one individual’s input to the database is changed. Privacy is measured by how small this change must be: an ε\varepsilon-differentially private (ε\varepsilon-DP) mechanism M\mathcal{M} satisfies Pr[M(x)S]exp(ε)Pr[M(x)S]{\rm Pr}[\mathcal{M}(x)\in S]\leq\exp(\varepsilon){\rm Pr}[\mathcal{M}(x^{\prime}{})\in S] for any pair x,xx,x^{\prime}{} of neighboring databases, and for any measurable subset SS of the range. A relaxation of this definition is approximate differential privacy. A mechanism M\mathcal{M} is (ε,δ)(\varepsilon,\delta)-differentially private ((ε,δ)(\varepsilon,\delta)-DP) if Pr[M(x)S]exp(ε)Pr[M(x)S]+δ{\rm Pr}[\mathcal{M}(x)\in S]\leq\exp(\varepsilon){\rm Pr}[\mathcal{M}(x^{\prime}{})\in S]+\delta with x,x,Sx,x^{\prime}{},S as before. Here δ\delta is thought of as negligible in the size of the database. Both these definitions satisfy several desirable properties such as composability, and are resistant to post-processing of the output of the mechanism.

We think of the database as being given by a multiset of database rows, one for each individual. We will let NN denote the size of the universe that these rows come from, and we will denote by nn the number of individuals in the database. We can represent the database as its histogram x\mathdsRNx\in{\mathds{R}}^{N} with xix_{i} denoting the number of occurrences of the iith element of the universe. Thus xx would in fact be a vector of non-negative integers with x1=n\|x\|_{1}=n. We will be concerned with reporting reasonably accurate answers to a given set of dd linear queries over this histogram xx. This set of queries can naturally be represented by a matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N} with the vector Ax\mathdsRdAx\in{\mathds{R}}^{d} giving the correct answers to the queries. When A{0,1}d×NA\in\{0,1\}^{d\times N}, we call such queries counting queries. We are interested in the (practical) regime where NdnN\gg d\gg n, although our results hold for all settings of the parameters.

Specific sets of counting queries of interest can however admit much better mechanisms than adversarially chosen queries for which the lower bounds are shown. Indeed several classes of specific queries have attracted attention. Some, such as range queries, are “easier”, and asymptotically better mechanisms can be designed for them. Others, such as constant dimensional contingency tables, are nearly as hard as general counting queries, and asymptotically better mechanisms can be ruled out in some ranges of the parameters. These query-specific upper bounds are usually proved by carefully exploiting the structure of the query, and query-specific lower bounds have been proved by reconstruction attacks that exploit a lower bound on the smallest singular value of an appropriately chosen AA [DN03, DMT07, DY08, KRSU10, De12, KRS13]. It is natural to address this question in a competitive analysis framework: can we design an efficient algorithm that given any query AA, computes (even approximately) the minimum error differentially private mechanism for AA?

Hardt and Talwar [HT10] answered this question in the affirmative for ε\varepsilon-DP mechanisms, and gave a mechanism that has error within factor O(log3d)O(\log^{3}d) of the optimal assuming a conjecture from convex geometry known as the hyperplane conjecture or the slicing conjecture. Bhaskara et al. [BDKT12] removed the dependence on the hyperplane conjecture and improved the approximation ratio to O(log2d)O(\log^{2}d). Can relaxing the privacy requirement to (ε,δ)(\varepsilon,\delta)-DP help with accuracy? In many settings, (ε,δ)(\varepsilon,\delta)-DP mechanisms can be simpler and more accurate than the best known ε\varepsilon-DP mechanisms. This motivates the first question we address.

Given AA, can we efficiently approximate the optimal error (ε,δ)(\varepsilon,\delta)-DP mechanism for it?

Hardt and Talwar [HT10] showed that for some AA, the lower bound for ε\varepsilon-DP mechanism can be Ω(logN)\Omega(\log N) larger than known (ε,δ)(\varepsilon,\delta)-DP mechanisms. For non-linear Lipschitz queries, De [De12] showed that this gap can be as large as Ω(d)\Omega(\sqrt{d}) (even when N=dN=d). This leads us to ask:

How large can the gap between the optimal ε\varepsilon-DP mechanism and the optimal (ε,δ)(\varepsilon,\delta)-DP mechanism be for linear queries?

Given AA and nn, can we approximate the optimal sparse case error (ε,δ)(\varepsilon,\delta)-DP mechanism for AA when restricted to databases of size at most nn?

Given AA and nn, can we approximate the optimal sparse case error ε\varepsilon-DP mechanism for AA when restricted to databases of size at most nn?

In this paper, we are interested in both the case when X=\mathdsRNX={\mathds{R}}^{N}, called the dense case, and when X=nB1NX=nB_{1}^{N} for n<dn<d, called the sparse case. We also write errM(A)errM(A,\mathdsRN)\operatorname{err}_{\mathcal{M}}(A)\triangleq\operatorname{err}_{\mathcal{M}}(A,{\mathds{R}}^{N}) and errM(A,n)errM(A,nB1N)\operatorname{err}_{\mathcal{M}}(A,n)\triangleq\operatorname{err}_{\mathcal{M}}(A,nB_{1}^{N}).

Our first result is a simple and efficient mechanism that for query matrix AA gives an O(log2d)O(\log^{2}d) approximation to the optimal error.

Given a query matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N}, there is an efficient (ε,δ)(\varepsilon,\delta)-DP mechanism M\mathcal{M} and an efficiently computable lower bound LAL_{A} such that

errM(A)O(log2dlog1/δ)LA\operatorname{err}_{M}(A)\leq O(\log^{2}d\log 1/\delta)\cdot L_{A}, and

for any (ε,δ)(\varepsilon,\delta)-DP mechanism M\mathcal{M}^{\prime}{}, errM(A,d)LA\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,d)\geq L_{A}.

We also show that the gap of Ω(log(N/d))\Omega(\log(N/d)) between ε\varepsilon-DP and (ε,δ)(\varepsilon,\delta)-DP mechanisms shown in [HT10] is essentially the worst possible, within polylog(d)\operatorname{polylog}(d) factor, for linear queries. More precisely, the lower bound on ε\varepsilon-DP mechanisms used in [HT10] is always within O(log(N/d)polylog(d))O(\log(N/d)\operatorname{polylog}(d)) of the lower bound LAL_{A} computed by our algorithm above. Let M\mathcal{M}^{*} denote the ε\varepsilon-DP generalized KK-norm mechanism in [HT10].

For any (ε,δ)(\varepsilon,\delta)-DP mechanism M\mathcal{M}, errM(A)=Ω(1/(logO(1)(d)log(N/d)))errM(A)\operatorname{err}_{\mathcal{M}}(A)=\Omega(1/(\log^{O(1)}(d)\log(N/d)))\operatorname{err}_{\mathcal{M}^{*}}(A).

We next move to the sparse case. Here we give results analogous to the dense case with a slightly worse approximation ratio.

Given A\mathdsRd×NA\in{\mathds{R}}^{d\times N} and a bound nn, there is an efficient (ε,δ)(\varepsilon,\delta)-DP mechanism M\mathcal{M} and an efficiently computable lower bound LA,nL_{A,n} such that

errM(A,n)O(log3/2dlogNlog1/δ+log2dlog1/δ)LA,n\operatorname{err}_{\mathcal{M}}(A,n)\leq O(\log^{3/2}d\cdot\sqrt{\log N\log 1/\delta}+\log^{2}d\log 1/\delta)\cdot L_{A,n}, and

For any (ε,δ)(\varepsilon,\delta)-DP mechanism M\mathcal{M}^{\prime}{}, errM(A,n)LA,n\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,n)\geq L_{A,n}.

Given A\mathdsRd×NA\in{\mathds{R}}^{d\times N} and a bound nn, there is an efficient ε\varepsilon-DP mechanism M\mathcal{M} and an efficiently computable lower bound LA,nL_{A,n} such that

errM(A,n)O(logO(1)dlog3/2N)LA,n\operatorname{err}_{\mathcal{M}}(A,n)\leq O(\log^{O(1)}d\cdot\log^{3/2}N)\cdot L_{A,n}, and

For any ε\varepsilon-DP mechanism M\mathcal{M}^{\prime}{}, errM(A,n)LA,n\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,n)\geq L_{A,n}.

We remark that in these theorems, our upper bounds hold for all xx with x1n\|x\|_{1}\leq n, whereas the lower bounds hold even when xx is an integer vector.

The (ε,δ)(\varepsilon,\delta)-DP mechanism of Theorem 3 when run on any counting query has error no larger than the best known bounds [GRU12] for counting queries, up to constants (not ignoring logarithmic factors). The ε\varepsilon-DP mechanism of Theorem 4 when run on any counting query can be shown to have nearly the same asymptotics, answering question 5 in the affirmative.

We will summarize some key ideas we use to achieve these results. More details will follow in Section 1.2.

For the upper bounds, the first crucial step is to decompose AA into “geometrically nice” components and then add Gaussian noise to each component. This is similar to the approach in [HT10, BDKT12] but we use the minimum volume enclosing ellipsoid, rather than the MM-ellipsoid used in those works, to facilitate the decomposition process. This allows us to handle the approximate and the sparse cases. In addition, it simplifies the mechanism as well as the analysis. For the sparse case, we further couple the mechanism with least squares estimation of the noisy answer with respect to nAB1NnAB_{1}^{N}. By utilizing techniques from statistical estimation, we can show that this process can reduce the error when n<dn<d, and prove an error upper bound dependent on the size of the smallest projection of nAB1NnAB_{1}^{N}.

For the lower bounds, we first lower bound the accuracy of (ϵ,δ)(\epsilon,\delta)-DP mechanism by the hereditary discrepancy of the query matrix AA, which we in turn lower bound in terms of the least singular values of submatrices of AA. Finally, we close the loop by utilizing the restricted invertibility principle by Bourgain and Tzafriri [BT87] and its extension by Vershynin [Ver01] which, informally, shows that if there does not exist a “small” projection of nAB1NnAB_{1}^{N} then AA has a “large” submatrix with a “large” least singular value.

The discrepancy of a matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N} is defined to be disc(A)=minx{1,+1}NAx\operatorname{disc}(A)=\min_{x\in\{-1,+1\}^{N}}\|Ax\|_{\infty}. The hereditary discrepancy of a matrix is defined as herdisc(A)=maxS[N]disc(AS)\operatorname{herdisc}(A)=\max_{S\subseteq[N]}\operatorname{disc}(A|_{S}), where ASA|_{S} denotes the matrix AA restricted to the columns indexed by SS.

As hereditary discrepancy is a maximum over exponentially many submatrices, it is not a priori clear if there even exists a polynomial-time verifiable certificate for low hereditary discrepancy. Additionally, we can show that it is NP\mathsf{NP}-hard to approximate hereditary discrepancy to within a factor of 3/23/2. Bansal [Ban10] gave a pseudo-approximation algorithm for hereditary discrepancy, which efficiently computes a coloring of discrepancy at most a factor of O(logdN)O(\log dN) larger than herdisc(A)\operatorname{herdisc}(A) for a d×Nd\times N matrix AA. His algorithm allows efficiently computing a lower bound on herdisc\operatorname{herdisc} for any restriction ASA|_{S}; however, such a lower bound may be arbitrarily loose, and before our work it was not known how to efficiently compute nearly matching lower and upper bounds on herdisc\operatorname{herdisc}.

2 Techniques

In addition to known techniques from the differential privacy literature, our work borrows tools from discrepancy theory, convex geometry and statistical estimation. We next briefly describe how they fit in.

Central to designing a provably good approximation algorithm is an efficiently computable lower bound on the optimum. Muthukrishnan and Nikolov [MN12] proved that (a slight variant of) the hereditary discrepancy of AA leads to a lower bound for the error of any (ε,δ)(\varepsilon,\delta)-DP mechanism. Lovász, Spencer and Vesztergombi [LSV86] showed that hereditary discrepancy itself can be lower bounded by a quantity called the determinant lower bound. Geometrically, this lower bound corresponds to picking the dd columns of AA that (along with the origin) give us a simplex with the largest possible volume. The volume or this simplex, appropriately normalized, gives us a lower bound on OPT. More precisely for any simplex SS, d3vol(S)2dlog2dd^{3}\cdot\operatorname{vol}(S)^{\frac{2}{d}}\log^{2}d gives a lower bound on the error. The log2d\log^{2}d factor can be removed by using a lower bound based on the least singular values of submatrices of AA. Geometrically, for the least singular value lower bound we need to find a simplex of large volume whose dd non-zero vertices are also nearly pairwise orthogonal.

If the NN columns of AA all lie in a unit ball of radius RR, it can be shown that adding Gaussian noise proportional to RR suffices to guarantee (ε,δ)(\varepsilon,\delta)-DP, resulting in a mechanism having total squared error dR2dR^{2}. Can we relate this quantity to the lower bound? It turns out that if the unit ball of radius RR is the minimum volume ellipsoid containing the columns of AA, this can be done. In this case, a result of Vershynin [Ver01], building on the restricted invertability results by Bourgain and Tzafriri [BT87], tells us that one can find Ω(d)\Omega(d) vertices of KK that touch the minimum containing ellipsoid, and are nearly orthogonal. The simplex formed by these vertices therefore has large volume, giving us a (ε,δ)(\varepsilon,\delta)-DP lower bound of Ω(dR2)\Omega(dR^{2}). In this case, the Gaussian mechanism with the optimal RR is within a constant factor of the lower bound. When the minimum volume enclosing ellipsoid is not a ball, we need to project the query along the d2\frac{d}{2} shortest axes of this ellipsoid, answer this projection using the Gaussian mechanism, and recurse on the orthogonal projection. Using the full power of the restricted invertability result by Vershynin allows us to construct a large simplex and prove our competitive ratio.

Hardt and Talwar [HT10] also used a volume based lower bound, but for ε\varepsilon-DP mechanisms, one can take KK, the symmetric convex hull of all the columns of AA and use its volume instead of the volume of SS in the lower bound above. How do these lower bounds compare? By a result of Bárány and Füredi [BF88] and Gluskin [Glu07], one can show that the volume of the convex hull of NN points can be bounded by (logN)d/2dd/2(\log N)^{d/2}d^{-d/2} times that of the minimum enclosing ellipsoid. This, along with the aforementioned restricted invertability results, allows us to prove that the ε\varepsilon-DP lower bound is within O((logN)polylogd)O((\log N)\operatorname{polylog}d) of the (ε,δ)(\varepsilon,\delta)-DP lower bound.

How do we handle sparse queries? The first observation is that the lower bounding technique gives us dd columns of AA and the resulting lower bound holds not just for AA but even for the d×dd\times d submatrix of AA corresponding to the maximum volume simplex SS; moreover, the lower bound holds even when all databases are restricted to O(d)O(d) individuals. Thus the lower bound holds when n=O(d)n=O(d) and this value marks the transition between the sparse and the dense cases. Moreover, when the minimum volume ellipsoid containing the columns of AA is a ball, the restricted invertibility principle of Bourgain and Tzafriri and Vershynin gives us a dd-dimensional simplex with nearly pairwise orthogonal vertices, and, therefore any nn-dimensional face of this simplex is another simplex of large volume. The large nn-dimensional simplex gives a lower bound on error when databases are restricted to have at most nn individuals.

To get an ε\varepsilon-DP mechanism, we use the KK-norm mechanism [HT10] instead of Gaussian noise. To bound the shadow of nAB1NnAB_{1}^{N} on ww, where ww is the noise vector generated by the KK-norm mechanism, we first analyze the expectation of ai,w\langle a_{i},w\rangle for any column of AA, and we use the log concavity of the noise distribution to prove concentration of this random variable. A union bound helps complete the argument as in the Gaussian case.

3 Related Work

Dwork et al. [DMNS06] showed that any query can be released while adding noise proportional to the total sensitivity of the query. This motivated the question of designing mechanisms with good guarantees for any set of low sensitivity queries. Nissim, Raskhodnikova and Smith [NRS07] showed that adding noise proportional to (a smoothed version of) the local sensitivity of the query suffices for guaranteeing differential privacy; this may be much smaller than the worst case sensitivity for non-linear queries. Lower bounds on the amount of noise needed for general low sensitivity queries have been shown in [DN03, DMT07, DY08, DMNS06, RHS07, HT10, De12]. Kasiviswathan et al. [KRSU10] showed upper and lower bounds for contingency table queries and more recently [KRS13] showed lower bounds on publishing error rates of classifiers or even M-estimators. Muthukrishnan and Nikolov [MN12] showed that combinatorial discrepancy lower bounds the noise for answering any set of linear queries.

Using learning theoretic techniques, Blum, Ligett and Roth [BLR08] first showed that one can exploit sparsity of the database, and answer a large number of counting queries with error small compared to the number of individuals in the database. This line of work has been further extended and improved in terms of error bounds, efficiency, generality and interactivity in several subsequent works [DNR+09, DRV10, RR10, HR10, GHRU11, HLM12].

Ghosh, Roughgarden and Sundarajan [GRS09] showed that for any one dimensional counting query, a discrete version of the Laplacian mechanism is optimal for pure privacy in a very general utilitarian framework and Gupte and Sundararajan [GS10] extended this to risk averse agents. Brenner and Nissim [BN10] showed that such universally optimal private mechanisms do not exist for two counting queries or for a single non-binary sum query. As mentioned above, Hardt and Talwar [HT10], and Bhaskara et al. [BDKT12] gave relative guarantees for multi-dimensional queries under pure privacy with respect to total squared error. De [De12] unified and strengthened these bounds and showed stronger lower bounds for the class of non-linear low sensitivity queries.

For specific queries of interest, improved upper bounds are known. Barak et al. [BCD+07] studied low dimensional marginals and showed that by running the Laplace mechanism on a different set of queries, one can reduce error. Using a similar strategy, improved mechanisms were given by [XWG10, CSS10] for orthogonal counting queries, and near optimal mechanisms were given by Muthukrishnan and Nikolov [MN12] for halfspace counting queries. The approach of answering a set of queries different from the target query set has also been studied in more generality and for other sets of queries by [LHR+10, DWHL11, RHS07, XWG10, XXY10, YZW+12]. Li and Miklau [LM12a, LM12b] study a class of mechanisms called extended matrix mechanisms and show that one can efficiently find the best mechanisms from this class. Hay et al. [HRMS10] show that in certain settings such as unattributed histograms, correcting noisy answers to enforce a consistency constraint can improve accuracy.

Very recently, Fawaz et al. [FMN] used the hereditary discrepancy lower bounds of Muthukrishnan and Nikolov, as well as the determinant lower bound on discrepancy of Lovasz, Spencer, and Vesztergombi, to prove that a certain Gaussian noise mechanism is nearly optimal (in the dense setting) for computing any given convolution map. Like our algorithms, their algorithm adds correlated Gaussian noise; however, they always use the Fourier basis to correlate the noise.

We refer the reader to texts by Chazelle [Cha00] and Matoušek [Mat99] and the chapter by Beck and Sós [BS95] for an introduction to discrepancy theory. Bansal [Ban10] showed that a semidefinite relaxation can be used to design a pseudo-approximation algorithm for hereditary discrepancy. Matoušek [Mat11] showed that the determinant based lower bound of Lovász, Spencer and Vesztergombi [LSV86] is tight up to polylogarithmic factors. Larsen [Lar11] showed applications of hereditary discrepancy to data structure lower bounds, and Chandrasekaran and Vempala [CV11] recently showed applications of hereditary discrepancy to problems in integer programming.

Preliminaries

We start by introducing some basic notation.

For a d×Nd\times N matrix AA and a set S[N]S\subseteq[N], we denote by ASA|_{S} the submatrix of AA consisting of those columns of AA indexed by elements of SS. Occasionally we refer to a matrix VV whose columns form an orthonormal basis for some subspace of interest V\mathcal{V} as the orthonormal basis of V\mathcal{V}. Pk\mathcal{P}_{k} is the set of orthogonal projections onto kk-dimensional subspaces of \mathdsRd{\mathds{R}}^{d}.

By σmin(A)\sigma_{\min}(A) and σmax(A)\sigma_{\max}(A) we denote, respectively, the smallest and largest singular value of AA. I.e., σmin(A)=minx:x2=1Ax2\sigma_{\min}(A)=\min_{x:\|x\|_{2}=1}{\|Ax\|_{2}} and σmax(A)=maxx:x2=1Ax2\sigma_{\max}(A)=\max_{x:\|x\|_{2}=1}{\|Ax\|_{2}}. In general, σi(A)\sigma_{i}(A) is the ii-th largest singular value of AA, and λi(A)\lambda_{i}(A) is the ii-th largest eigenvalue of AA. We recall the minimax characterization of eigenvalues for symmetric matrices:

For a matrix AA (and the corresponding linear operator), we denote by A2=σmax(A)\|A\|_{2}=\sigma_{\max}(A) the spectral norm of AA and AF=iσi2(A)=i,jai,j2\|A\|_{F}=\sqrt{\sum_{i}\sigma_{i}^{2}(A)}=\sqrt{\sum_{i,j}a_{i,j}^{2}} the Frobenius norm of AA. By kerA\ker A we denote the kernel of AA, i.e. the subspace of vectors xx for which Ax=0Ax=0.

For a set K\mathdsRdK\subseteq{\mathds{R}}^{d}, we denote by vold(K)\operatorname{vol}_{d}(K) its dd-dimensional volume. Often we use instead the volume radius

For a convex body K\mathdsRdK\subseteq{\mathds{R}}^{d}, the polar body KK^{\circ} is defined by K={y:y,x1 xK}K^{\circ}=\{y:\langle y,x\rangle\leq 1~{}\forall x\in K\}. The fundamental fact about polar bodies we use is that for any two convex bodies KK and LL

In the remainder of this paper, when we claim that a fact follows “by convex duality,” we mean that it is implied by (1).

A convex body KK is (centrally) symmetric if K=K-K=K. The Minkowski norm xK\|x\|_{K} induced by a symmetric convex body KK is defined as xKmin{r\mathdsR:xrK}\|x\|_{K}\triangleq\min\{r\in{\mathds{R}}:x\in rK\}. The Minkowski norm induced by the polar body KK^{\circ} of KK is the dual norm of xK\|x\|_{K} and also has the form yK=maxxKx,y\|y\|_{K^{\circ}}=\max_{x\in K}{\langle x,y\rangle}. For convex symmetric KK, the induced norm and dual norm satisfy Hölder’s inequality:

An ellipsoid in \mathdsRd{\mathds{R}}^{d} is the image of B2dB_{2}^{d} under an affine map. All ellipsoids we consider are symmetric, and therefore, are equal to an image FB2dFB_{2}^{d} of the ball B2dB_{2}^{d} under a linear map FF. A full dimensional ellipsoid E=FB2dE=FB_{2}^{d} can be equivalently defined as E={x:xT(FFT)1x1}E=\{x:x^{T}(FF^{T})^{-1}x\leq 1\}. The polar body of a symmetric ellipsoid E=FB2dE=FB_{2}^{d} is the ellipsoid (or cylinder with an ellipsoid as its base in case FF is not full dimensional) E={x:xTFFTx1}E^{\circ}=\{x:x^{T}FF^{T}x\leq 1\}.

We repeatedly use a classical theorem of Fritz John, characterizing the (unique) minimum volume enclosing ellipsoid (MEE) of any convex body KK. We note that John’s theorem is frequently stated in terms of the maximum volume enclosed ellipsoid in KK; the two variants of the theorem are equivalent by convex duality. The MEE of KK is also known as a the Löwner or Löwner-John ellipsoid of KK.

Any convex body K\mathdsRdK\subseteq{\mathds{R}}^{d} is contained in a unique ellipsoid of minimal volume. This ellipsoid is B2dB_{2}^{d} if and only if there exist unit vectors u1,,umKB2du_{1},\ldots,u_{m}\in K\cap B_{2}^{d} and positive reals c1,,cmc_{1},\ldots,c_{m} such that

According to John’s characterization, when the MEE of KK is the ball B2dB_{2}^{d}, the contact points of KK and B2dB_{2}^{d} satisfy a structural property — the identity decomposes into a linear combination of the projection matrices onto the lines of the contact points. Intuitively, this means that KK “hits” B2dB_{2}^{d} in all directions — it has to, or otherwise B2dB_{2}^{d} can be “pinched” in order to produce a smaller ellipsoid that still contains KK. This intuition is formalized by a theorem of Vershynin, which generalizes the work of Bourgain and Tzafriri on restricted invertibility [BT87]. Vershynin ([Ver01] Theorem 3.1) shows that there exist Ω(d)\Omega(d) contact points of KK and B2dB_{2}^{d} which are approximately pairwise orthogonal.

Let K\mathdsRdK\subseteq{\mathds{R}}^{d} be a symmetric convex body whose minimum volume enclosing ellipsoid is the unit ball B2dB_{2}^{d}. Let TT be a linear map with spectral norm T21\|T\|_{2}\leq 1. Then for any β\beta, there exist constant C1(β)C_{1}(\beta), C2(β)C_{2}(\beta) and contact points x1,,xkx_{1},\ldots,x_{k} with k(1β)TF2k\geq(1-\beta)\|T\|_{F}^{2} such that the matrix TX=(Txi)i=1kTX=(Tx_{i})_{i=1}^{k} satisfies

2 Statistical Estimation

A key element in our algorithms for the sparse case is the use of least squares estimation to reduce error. Below we present a bound on the error of least squares estimation with respect to symmetric convex bodies. This analysis appears to be standard in the statistics literature; a special case of it appears for example in [RWY11].

First we show the easier bound y^y22w2\|\hat{y}-y\|_{2}\leq 2\|w\|_{2}, which follows by the triangle inequality:

The second bound is based on Hölder’s inequality and the following simple but very useful fact, illustrated schematically in Figure 1:

3 Differential Privacy

Following recent work in differential privacy, we model private data as a database DD of nn rows, where each row of DD contains information about an individual. Formally, a database DD is a multiset of size nn of elements of the universe U={t1,,tN}U=\{t_{1},\ldots,t_{N}\} of possible user types. Our algorithms take as input a histogram x\mathdsRNx\in{\mathds{R}}^{N} of the database DD, where the ii-th component xix_{i} of xx encodes the number of individuals in DD of type tit_{i}. Notice that in this histogram representation, we have x1=n\|x\|_{1}=n when DD is a database of size nn. Also, two neighboring databases DD and DD^{\prime} that differ in the presence or absence of a single individual correspond to two histograms xx and xx^{\prime} satisfying xx1=1\|x-x^{\prime}\|_{1}=1.

Through most of this paper, we work under the notion of approximate differential privacy. The definition follows.

When δ=0\delta=0, we are in the regime of pure differential privacy.

An important basic property of differential privacy is that the privacy guarantees degrade smoothly under composition and are not affected by post-processing.

Let M1\mathcal{M}_{1} and M2\mathcal{M}_{2} satisfy (ε1,δ1)(\varepsilon_{1},\delta_{1})- and (ε2,δ2)(\varepsilon_{2},\delta_{2})-differential privacy, respectively. Then the algorithm which on input x\mathbf{x} outputs the tuple (M1(x),M2(M1(x),x))(\mathcal{M}_{1}(\mathbf{x}),\mathcal{M}_{2}(\mathcal{M}_{1}(\mathbf{x}),\mathbf{x})) satisfies (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-differential privacy.

In this paper we study the necessary and sufficient error incurred by differentially private algorithms for approximating linear queries. A set of dd linear queries is given by a d×Nd\times N query matrix or workload AA; the exact answers to the queries on a histogram xx are given by the dd-dimensional vector y=Axy=Ax.

We define error as total squared error. More precisely, for an algorithm M\mathcal{M} and a subset X\mathdsRNX\subseteq{\mathds{R}}^{N}, we define

We also write errM(A,nB1N)\operatorname{err}_{\mathcal{M}}(A,nB_{1}^{N}) as errM(A,n)\operatorname{err}_{\mathcal{M}}(A,n). The optimal error achievable by any (ε,δ)(\varepsilon,\delta)-differentially private algorithm for queries AA and databases of size up to nn is

where the infimum is taken over all (ε,δ)(\varepsilon,\delta)-differentially private algorithms. When no restrictions are placed on the size nn of the database, the appropriate notion of optimal error is optε,δ(A)supnoptε,δ(A,n)\operatorname{opt}_{\varepsilon,\delta}(A)\triangleq\sup_{n}\operatorname{opt}_{\varepsilon,\delta}(A,n). Similarly, for an algorithm M\mathcal{M}, the error when database size is not bounded is errM(A)supnerrM(A,n)\operatorname{err}_{\mathcal{M}}(A)\triangleq\sup_{n}\operatorname{err}_{\mathcal{M}}(A,n). A priori it is not clear that these quantities are necessarily finite, but we will show that this is the case.

In order to get tight dependence on the privacy parameter ε\varepsilon in our analyses, we will use the following relationship between optε,δ(A,n)\operatorname{opt}_{\varepsilon,\delta}(A,n) and optε,δ(A,n)\operatorname{opt}_{\varepsilon^{\prime},\delta^{\prime}}(A,n).

For any ε\varepsilon, any δ<1\delta<1, any integer kk and for δekε1eε1δ\delta^{\prime}\geq\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta,

Let M\mathcal{M} be an (ε,δ)(\varepsilon,\delta)-differentially private algorithm achieving optε,δ(A,n)\operatorname{opt}_{\varepsilon,\delta}(A,n). We will use M\mathcal{M} as a black box to construct a (kε,δ)(k\varepsilon,\delta^{\prime})-differentially private algorithm M\mathcal{M}^{\prime} which satisfies the error guarantee errM(A,n/k)1k2errM(A,n)\operatorname{err}_{\mathcal{M}^{\prime}}(A,n/k)\leq\frac{1}{k^{2}}\operatorname{err}_{\mathcal{M}}(A,n).

The algorithm M\mathcal{M}^{\prime} on input xx satisfying x1n/k\|x\|_{1}\leq n/k outputs 1kM(kx)\frac{1}{k}\mathcal{M}(kx). We need to show that M\mathcal{M}^{\prime} satisfies (kε,δ)(k\varepsilon,\delta^{\prime})-differential privacy. Let xx and xx^{\prime} be two neighboring inputs to M\mathcal{M}^{\prime}, i.e. xx11\|x-x^{\prime}\|_{1}\leq 1, and let SS be a measurable subset of the output M\mathcal{M}^{\prime}. Denote p1=Pr[M(x)S]p_{1}={\rm Pr}[\mathcal{M}^{\prime}(x)\in S] and p2=Pr[M(x)S]p_{2}={\rm Pr}[\mathcal{M}^{\prime}(x^{\prime})\in S]. We need to show that p1ekεp2+δp_{1}\leq e^{k\varepsilon}p_{2}+\delta^{\prime}. To that end, define x0=kxx_{0}=kx, x1=kx+(xx)x_{1}=kx+(x^{\prime}-x), x2=kx+2(xx)x_{2}=kx+2(x^{\prime}-x), \ldots, xk=kxx_{k}=kx^{\prime}. Applying the (ε,δ)(\varepsilon,\delta)-privacy guarantee of M\mathcal{M} to each of the pairs of neighboring inputs x0,x1x_{0},x_{1}, x1,x2x_{1},x_{2}, \ldots, xk1,xkx_{k-1},x_{k} in sequence gives us

This finishes the proof of privacy for M\mathcal{M}^{\prime}. It is straightforward to verify that errM(A,n/k)1k2errM(A,n)\operatorname{err}_{\mathcal{M}^{\prime}}(A,n/k)\leq\frac{1}{k^{2}}\operatorname{err}_{\mathcal{M}}(A,n). ∎

3.2 Gaussian Noise Mechanism

A basic mechanism for achieving (ε,δ)(\varepsilon,\delta)-differential privacy for linear queries is adding appropriately scaled independent Gaussian noise to each query. This approach goes back to the work of Blum et al. [BDMN05], predating the definition of differential privacy. Next we define this basic mechanism formally and give a privacy guarantee. The privacy analysis of the Gaussian mechanism in the context of (ε,δ)(\varepsilon,\delta)-differential privacy was first given in [DKM+06b]. We give the full proof here for completeness.

Let A=(ai)i=1NA=(a_{i})_{i=1}^{N} be a d×Nd\times N matrix such that i:ai2σ\forall i:\|a_{i}\|_{2}\leq\sigma. Then a mechanism which on input x\mathdsRNx\in{\mathds{R}}^{N} outputs Ax+wAx+w, where wN(0,σ1+2ln(1/δ)ε)dw\sim N(0,\sigma\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d}, satisfies (ε,δ)(\varepsilon,\delta)-differential privacy.

Let C=1+2ln(1/δ)εC=\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon} and let pp be the probability density function of N(0,Cσ)dN(0,C\sigma)^{d}. Let also K=AB1K=AB_{1}, so xx1B1\|x-x^{\prime}\|_{1}\in B_{1} implies A(xx)KB2dA(x-x^{\prime})\in K\subseteq B_{2}^{d}. Define

We will prove that when wN(0,Cσ)w\sim N(0,C\sigma), for all vKv\in K, Pr[Dv(w)>ε]δ{\rm Pr}[|D_{v}(w)|>\varepsilon]\leq\delta. This suffices to prove (ε,δ)(\varepsilon,\delta)-differential privacy. Indeed, let the algorithm output Ax+wAx+w and fix any xx^{\prime} s.t. xx11\|x-x^{\prime}\|_{1}\leq 1. Let v=A(xx)Kv=A(x-x^{\prime})\in K and S={w:Dv(w)>ε}S=\{w:|D_{v}(w)|>\varepsilon\}. For any measurable T\mathdsRdT\subseteq{\mathds{R}}^{d} we have

Note that to bound Dv(w)|D_{v}(w)| we simply need to bound 1C2σ2vTw\frac{1}{C^{2}\sigma^{2}}v^{T}w from above and below. Since 1C2σ2vTwN(0,vCσ)\frac{1}{C^{2}\sigma^{2}}v^{T}w\sim N(0,\frac{\|v\|}{C\sigma}), we can apply a Chernoff bound and we get

Substituting C1+2ln(1/δ)εC\geq\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon} completes the proof. ∎

The following corollary is a useful geometric generalization of Lemma 4.

Let A=(ai)i=1NA=(a_{i})_{i=1}^{N} be a d×Nd\times N matrix of rank dd and let K=sym{a1,,aN}K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}. Let E=FB2dE=FB_{2}^{d} (FF is a linear map) be an ellipsoid containing KK. Then a mechanism that outputs Ax+FwAx+Fw where wN(0,1+2ln(1/δ)ε)dw\sim N(0,\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d} satisfies (ε,δ)(\varepsilon,\delta)-differential privacy.

Since KK is full dimensional (by rankA=d{\rm rank}A=d) and EE contains KK, EE is full dimensional as well, and, therefore, FF is an invertible linear map. Define G=F1AG=F^{-1}A. For each column gig_{i} of GG, we have gi21\|g_{i}\|_{2}\leq 1. Therefore, by Lemma 4, a mechanism that outputs Gx+wGx+w (where ww is distributed as in the statement of the corollary) satisfies (ε,δ)(\varepsilon,\delta)-differential privacy. Therefore, FGx+Fw=Ax+FwFGx+Fw=Ax+Fw is (ε,δ)(\varepsilon,\delta)-differentially private by the post-processing property of differential privacy. ∎

We present a composition theorem, specific to composing Gaussian noise mechanisms. We note that a similar composition result in a much more general setting but with slightly inferior dependence on the parameters is proven in [DRV10].

Let V1,,Vk\mathcal{V}_{1},\ldots,\mathcal{V}_{k} be vector spaces of respective dimensions d1,,dkd_{1},\ldots,d_{k}, such that ik1\forall i\leq k-1, Vi+1Vi\mathcal{V}_{i+1}\subseteq\mathcal{V}_{i}^{\perp} and d1++dk=dd_{1}+\ldots+d_{k}=d . Let A=(ai)i=1NA=(a_{i})_{i=1}^{N} be a d×Nd\times N matrix of rank dd and let K=sym(a1,,aN)K=\operatorname{sym}(a_{1},\ldots,a_{N}). Let Πi\Pi_{i} be the projection matrix for Vi\mathcal{V}_{i} and let Ei=FiB2diViE_{i}=F_{i}B_{2}^{d_{i}}\subseteq\mathcal{V}_{i} be an ellipsoid such that ΠiKEi\Pi_{i}K\subseteq E_{i}. Then the mechanism that outputs Ax+ki=1kFiwiAx+\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}} where for each ii, wiN(0,1+2ln(1/δ)ε)diw_{i}\sim N(0,\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d_{i}}, satisfies (ε,δ)(\varepsilon,\delta)-differential privacy.

Let c(ε,δ)=1+2ln(1/δ)εc(\varepsilon,\delta)=\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon}. Since the random variables F1w1,,FkwkF_{1}w_{1},\ldots,F_{k}w_{k} are pairwise independent Gaussian random variables, and FiwiF_{i}w_{i} has covariance matrix c(ε,δ)2FiFiTc(\varepsilon,\delta)^{2}F_{i}F_{i}^{T}, we have that w=ki=1kFiwiw=\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}} is a Gaussian random variable with covariance c(ε,δ)2Gc(\varepsilon,\delta)^{2}G, whee G=ki=1kFiFiTG=k\sum_{i=1}^{k}{F_{i}F_{i}^{T}}. By Corollary 8, it is sufficient to show that the ellipsoid E=GB2dE=GB_{2}^{d} contains KK. By convex duality, this is equivalent to showing EKE^{\circ}\subseteq K^{\circ}, which is in turn equivalent to x:xKxE\forall x:\|x\|_{K^{\circ}}\leq\|x\|_{E^{\circ}}. Recalling that xE2=xTGGTx\|x\|^{2}_{E^{\circ}}=x^{T}GG^{T}x and xK=maxyKy,x=maxj=1Naj,x\|x\|_{K^{\circ}}=\max_{y\in K}{\langle y,x\rangle}=\max_{j=1}^{N}{\langle a_{j},x\rangle}, we need to establish

We proceed by establishing (4). Since for all ii, ΠiKEi\Pi_{i}K\subseteq E_{i}, by duality and the same reasoning as above, we have that for all ii and jj, Πiaj,x2xTFiFiTx\langle\Pi_{i}a_{j},x\rangle^{2}\leq x^{T}F_{i}F_{i}^{T}x. Therefore, by the Cauchy-Schwarz inequality,

3.3 Noise Lower Bounds

We will make extensive use of a lower bound on the noise complexity of (ε,δ)(\varepsilon,\delta)-differentially private mechanisms in terms of combinatorial discrepancy. First we need to define the notion of hereditary α\alpha-discrepancy:

Next we present the lower bound, which is a simple extension of the discrepancy lower bound on noise recently proved by Muthukrishnan and Nikolov [MN12].

Let AA be an d×Nd\times N real matrix. For any constant α\alpha and sufficiently small constant εε(α)\varepsilon\leq\varepsilon(\alpha) and δδ(α)\delta\leq\delta(\alpha),

Substituting (5) into Theorem 10, we have that there exist constants c1c_{1} and c2c_{2} such that

For the remainder of this paper we fix some constants c1c_{1} and c2c_{2} for which (6) holds. Similarly to the notation for opt\operatorname{opt}, we will also sometimes denote specLB(A)=maxnspecLB(A,n)\operatorname{specLB}(A)=\max_{n}\operatorname{specLB}(A,n). We will use primarily the spectral lower bound (6) for arguing the optimality of our algorithms.

To show the small gap between the approximate and pure privacy (Theorem 2), we next develop a determinant based lower bound. We first switch from herdiscα\operatorname{herdisc}_{\alpha} to the classical notion of hereditary discrepancy, equivalent to herdisc1\operatorname{herdisc}_{1}, by observing the following relation between herdiscα\operatorname{herdisc}_{\alpha} and herdisc1\operatorname{herdisc}_{1} from [MN12]:

We then use an extension of the classical determinant lower bound for hereditary discrepancy, due to Lovász, Spencer, and Vesztergombi.

Let us first define linear discrepancy. For a d×kd\times k matrix MM and ckc\in^{k}, let discc(M)\operatorname{disc}^{c}(M) be defined as

The linear discrepancy of MM is then defined as lindisc(M)maxckdiscc(M)\operatorname{lindisc}(M)\triangleq\max_{c\in^{k}}{\operatorname{disc}^{c}(M)}. We claim that for any MM,

We complete the proof of the theorem by proving a lower bound on lindisc\operatorname{lindisc} in terms of detLB\operatorname{detLB}. We note that a similar lower bound can be proved for any variant of lindisc\operatorname{lindisc} defined in terms of any norm. The exact lower bound will depend on the volume radius of the unit ball of the norm. Since the proof of (8) also works for any norm, we get a determinant lower bound for hereditary discrepancy defined in terms of any norm as well.

We show that for any d×kd\times k matrix MM

Letting kk range over [n][n], MM range over all d×kd\times k submatrices of AA, and applying the bounds (8) and (9) implies the theorem.

We proceed to prove (9). Note that if rank(M)<k{\rm rank}(M)<k, (9) is trivially true; therefore, we may assume that rank(M)=k{\rm rank}(M)=k. Note also that without loss of generality we can take Π\Pi to be the orthogonal projection onto the range of MM, since this is the projection operator that maximizes detΠM|\det\Pi M|. Let EE be the ellipsoid E={x:Mx221}E=\{x:\|Mx\|_{2}^{2}\leq 1\}. The inequality lindisc(M)D\operatorname{lindisc}(M)\leq D is equivalent to

Thus 2k=vol(d)2kvol(DE)2^{k}=\operatorname{vol}(^{d})\leq 2^{k}\operatorname{vol}(D\cdot E), and therefore Dk1vol(E)D^{k}\geq\frac{1}{\operatorname{vol}(E)}. On the other hand, the volume of EE is equal to

Applying the standard estimate vol(B2k)1/k=Θ(k1/2)\operatorname{vol}(B_{2}^{k})^{1/k}=\Theta(k^{-1/2}) completes the proof. ∎

By the determinant lower bound, and (7), we get our determinant lower bound on the noise necessary for privacy. For some constant c1,c2>0c_{1},c_{2}>0,

Finally, we recall the stronger volume lower bound against (ε,0)(\varepsilon,0)-differential privacy from [HT10, BDKT12]. This lower bound is nearly optimal for (ε,0)(\varepsilon,0)-differential privacy, but does not hold for (ε,δ)(\varepsilon,\delta)-differential privacy when δ\delta is 2o(d)2^{-o(d)}.

For any d×Nd\times N real matrix A=(ai)i=1NA=(a_{i})_{i=1}^{N},

where K=sym{ai}i=1dK=\operatorname{sym}\{a_{i}\}_{i=1}^{d}.

Furthermore, there exists an efficient mechanism MK\mathcal{M}_{K} (the generalized KK-norm mechanism) which is (ε,0)(\varepsilon,0)-differentially private and satisfies errMK(A)=O(log3d)volLB(A,ε)\operatorname{err}_{\mathcal{M}_{K}}(A)=O(\log^{3}d)\operatorname{volLB}(A,\varepsilon).

Algorithms for Approximate Privacy

In this section we present our main results: efficient nearly optimal algorithms for approximate privacy in the cases of dense databases (n>d/εn>d/\varepsilon) and sparse databases (n=o(d/ε)n=o(d/\varepsilon)). Both algorithms rely on recursively computing an orthonormal basis for \mathdsRd{\mathds{R}}^{d}, based on the minimum volume enclosing ellipsoid of the columns of the query matrix AA. We first present the algorithm for computing this basis, together with a property essential for the analyses of the two algorithms presented next.

We first present an algorithm (Algorithm 1) that, given a matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N}, computes a set of orthonormal matrices U1,,UkU_{1},\ldots,U_{k}, where k1+logdk\leq\lceil 1+\log d\rceil. For each iji\neq j, UiTUj=0U_{i}^{T}U_{j}=0, and the union of the columns of U1,,UkU_{1},\ldots,U_{k} forms an orthonormal basis for \mathdsRd{\mathds{R}}^{d}. Thus, Algorithm 1 computes a basis for \mathdsRd{\mathds{R}}^{d}, and partitions (“decomposes”) it into k=O(logd)k=O(\log d) bases of mutually orthogonal subspaces. This set of bases also induces a decomposition of AA into A=A1++AkA=A_{1}+\ldots+A_{k}, where Ai=UiUiTAA_{i}=U_{i}U_{i}^{T}A. The base decomposition of Algorithm 1 is essential to both our dense case and sparse case algorithms. Intuitively, for both cases we can show that the error of a simple mechanism applied to AiA_{i} can be matched by an error lower bound for Ai+1+AkA_{i+1}+\ldots A_{k}. The error lower bounds are based on the spectral lower bound specLB\operatorname{specLB} on discrepancy; the geometric properties of the minimum enclosing ellipsoid of a convex body together with the restricted invertibility principle of Bourgain and Tzafriri are key in deriving the lower bounds.

The next lemma captures the technical property of the decomposition of Algorithm 1 that allows us to prove matching upper and lower error bounds for our dense and sparse case algorithms.

Let did_{i} be the dimension of the span of UiU_{i}. Furthermore, for i<ki<k, let Wi=j>iUjW_{i}=\sum_{j>i}{U_{j}}, and let Wk=UkW_{k}=U_{k}. For every iki\leq k, there exists a set Si[N]S_{i}\subseteq[N], such that Si=Ω(di)|S_{i}|=\Omega(d_{i}) and σmin2(WiWiTASi)=Ω(1)maxj=1NUiTaj22\sigma_{\min}^{2}(W_{i}W_{i}^{T}A|_{S_{i}})=\Omega(1)\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}^{2}}.

Let us, for ease of notation, assume that dd is a power of 22. We prove that there exists a set SS, S=Ω(d)|S|=\Omega(d), such that

By applying an appropriate unitary transformation to the columns of AA, we may assume that the major axes of EE are co-linear with the standard basis vectors of \mathdsRd{\mathds{R}}^{d}, and, therefore, FF is a diagonal matrix with Fii=σiF_{ii}=\sigma_{i}. This transformation comes without loss of generality, since it applies a unitary transformation to the columns of AA and VV and does not affect the singular values of any matrix VVTASVV^{T}A|_{S} for any S[N]S\subseteq[N]. For the rest of the proof we assume that FF is diagonal.

Since FF is diagonal, uiu_{i} is equal to eie_{i}, the ii-th standard basis vector. Therefore U1U_{1} is diagonal and equal to the projection onto ed/2+1,,ede_{d/2+1},\ldots,e_{d}, and VV is also diagonal and equal to the projection onto e1,,ed/2e_{1},\ldots,e_{d/2}. Consider L=F1K=F1AB1dL=F^{-1}K=F^{-1}AB_{1}^{d} (recall that we assumed that rankA=d{\rm rank}A=d and therefore FF is non-singular). Since the minimum enclosing ellipsoid of KK is E=FB2dE=FB_{2}^{d}, we have that the minimum enclosing ellipsoid of LL is B2dB_{2}^{d}. Let T=VVTT=VV^{T} be the projection onto e1,,ed/2e_{1},\ldots,e_{d/2}. Then, by Theorem 7, and because TF2=d/2\|T\|_{F}^{2}=d/2, we have that there exists a set SS of size S=Ω(d)|S|=\Omega(d) such that σmin2(TF1AS)=Ω(1)\sigma_{\min}^{2}(TF^{-1}A|_{S})=\Omega(1). We chose FF, and therefore F1F^{-1}, as well as TT to be diagonal matrices, so they all commute. Then, since TT is a projection matrix,

Observe that, since KEK\subseteq E, we have

Therefore, maxj=1NU1Taj22σd/2+12σd/22\max_{j=1}^{N}{\|U_{1}^{T}a_{j}\|_{2}^{2}}\leq\sigma_{d/2+1}^{2}\leq\sigma_{d/2}^{2}. Substituting for σd/22\sigma_{d/2}^{2} into (14) completes the proof. ∎

2 The Dense Case: Correlated Gaussian Noise

Our first result is an efficient algorithm whose expected error matches the spectral lower bound specLB\operatorname{specLB} up to polylogarithmic factors and is therefore nearly optimal. This proves Theorem 1. The algorithm adds correlated unbiased Gaussian noise to the exact answer AxAx. The noise distribution is computed based on the decomposition algorithm of the previous subsection.

Let Mg(A,x)\mathcal{M}_{g}(A,x) be the output of Algorithm 2 on input a d×Nd\times N query matrix AA and private input xx. Mg(A,x)\mathcal{M}_{g}(A,x) is (ε,δ)(\varepsilon,\delta)-differentially private and for all small enough ε\varepsilon and all δ\delta small enough with respect to ε\varepsilon satisfies

We start the proof of Theorem 13 with the privacy analysis. For ease of notation, we assume throughout the analysis that dd is a power of 2.

Mg(A,x)\mathcal{M}_{g}(A,x) satisfies (ε,δ)(\varepsilon,\delta)-differential privacy.

The lemma follows from Corollary 9. Next we describe in detail why the corollary applies.

Let U1,,UkU_{1},\ldots,U_{k} be the base decomposition computed by Algorithm 1 on input AA. Let Vi\mathcal{V}_{i} be the subspace spanned by the columns of UiU_{i} and let did_{i} be the dimension of Vi\mathcal{V}_{i}. The projection matrix onto Vi\mathcal{V}_{i} is UiUiTU_{i}U_{i}^{T}. Let EiE_{i} be the ellipsoid Ui(riB2di)=FiB2diU_{i}(r_{i}B_{2}^{d_{i}})=F_{i}B_{2}^{d_{i}} (FiF_{i} is riUir_{i}U_{i}). By the definition of rir_{i}, UiTKriB2diU_{i}^{T}K\subseteq r_{i}B_{2}^{d_{i}}, and therefore, ΠiKEi\Pi_{i}K\subseteq E_{i}. Mg(A,x)\mathcal{M}_{g}(A,x) is distributed identically to Ax+ki=1kFiwiAx+\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}}. Therefore, by Corollary 9, Mg(A,x)\mathcal{M}_{g}(A,x) satisfies (ε,δ)(\varepsilon,\delta)-differential privacy. ∎

For all small enough ε\varepsilon and all δ\delta small enough with respect to ε\varepsilon, for all ii,

where rir_{i}, UiU_{i} and wiw_{i} are as defined in Algorithm 2.

By (15), it is enough to lower bound optε,δ(A)\operatorname{opt}_{\varepsilon,\delta}(A) by Ω(1ε2dri2)\Omega(\frac{1}{\varepsilon^{2}}dr_{i}^{2}). As a first step we lower bound specLB(A)\operatorname{specLB}(A). Then the lower bound on optε,δ\operatorname{opt}_{\varepsilon,\delta} will follow from (6) and Lemma 3.

To lower bound specLB(A)\operatorname{specLB}(A), we invoke Lemma 5. It follows from the lemma that for every ii there exists a projection matrix Πi=WiWiT\Pi_{i}=W_{i}W_{i}^{T} and a set SiS_{i} such that σmin2(ΠiASi)=Ω(ri2)\sigma_{\min}^{2}(\Pi_{i}A|_{S_{i}})=\Omega(r_{i}^{2}), and, furthermore, Si=Ω(di)|S_{i}|=\Omega(d_{i}). Substituting into the the definition of specLB(A,di)\operatorname{specLB}(A,d_{i}), we have that for all ii.

Therefore, by (6), optc1,c2(A,di)=Ω(diri2)\operatorname{opt}_{c_{1},c_{2}}(A,d_{i})=\Omega(d_{i}r_{i}^{2}) for all ii. Finally, by Lemma 3, there exists a small enough δ=δ(ε)\delta=\delta(\varepsilon), for which optε,δ(A,diε)=Ω(1ε2diri2)\operatorname{opt}_{\varepsilon,\delta}(A,\frac{d_{i}}{\varepsilon})=\Omega(\frac{1}{\varepsilon^{2}}d_{i}r_{i}^{2}), and this completes the proof. ∎

Proof of Theorem 13: The proof of the theorem follows from Lemma 6 and Lemma 7. The privacy guarantee is direct from Lemma 6. Next we prove that the error of Mg\mathcal{M}_{g} is near optimal.

3 The Sparse Case: Least Squares Estimation

In this subsection we present an algorithm with stronger accuracy guarantees than Algorithm 2: it is optimal for any query matrix AA and any database size bound nn (Theorem 3). The algorithm combines the noise distribution of Algorithm 2 with a least squares estimation step. Privacy is guaranteed by noise addition, while the least squares estimation step reduces the error significantly when n=o(d/ε)n=o(d/\varepsilon). The algorithm is shown as Algorithm 3.

To finish the proof of the lemma, we need to lower bound optε,δ(A,n)\operatorname{opt}_{\varepsilon,\delta}(A,n) by Ω(1εnri2)\Omega(\frac{1}{\varepsilon}nr_{i}^{2}). We will use Lemma 5 to lower bound specLB(A,εn)\operatorname{specLB}(A,\varepsilon n) by Ω(εnri2)\Omega(\varepsilon nr_{i}^{2}) and then we will invoke Lemma 3 to get the right dependence on ε\varepsilon.

By Lemma 5, for every ii there exists a projection matrix Πi=WiWiT\Pi_{i}=W_{i}W_{i}^{T} and a set SiS_{i} such that σmin2(ΠiASi)=Ω(ri2)\sigma_{\min}^{2}(\Pi_{i}A|_{S_{i}})=\Omega(r_{i}^{2}), and, furthermore, Si=Ω(di)|S_{i}|=\Omega(d_{i}). By the definition of tt, for all iti\leq t, diεnd_{i}\geq\varepsilon n, and, therefore, Si=Ω(εn)|S_{i}|=\Omega(\varepsilon n). Take TiSiT_{i}\subseteq S_{i} to be an arbitrary subset of SiS_{i} of size Ω(εn)\Omega(\varepsilon n). The smallest singular value of ΠiASi\Pi_{i}A|_{S_{i}} is a lower bound on the smallest singular value of ΠiATi\Pi_{i}A|_{T_{i}}:

where supp(x)\operatorname{supp}(x) is the subset of coordinates on which xx is nonzero. Therefore, σmin2(ΠiATi)=Ω(ri2)\sigma_{\min}^{2}(\Pi_{i}A|_{T_{i}})=\Omega(r_{i}^{2}). Substituting into the definition of specLB(A,εn)\operatorname{specLB}(A,\varepsilon n), we have

Therefore, by (6), optc1,c2(A,εn)=Ω(εnri2)\operatorname{opt}_{c_{1},c_{2}}(A,\varepsilon n)=\Omega(\varepsilon nr_{i}^{2}) for all iti\leq t. Finally, by Lemma 3, there exists a small enough δ=δ(ε)\delta=\delta(\varepsilon), for which optε,δ(A,n)=Ω(nεri2)\operatorname{opt}_{\varepsilon,\delta}(A,n)=\Omega(\frac{n}{\varepsilon}r_{i}^{2}), and this completes the proof. ∎

We show that the first term on the right hand side of (16) is at most O(log3/2dlogNlog(1/δ))optε,δ(A,n)O(\log^{3/2}d\sqrt{\log N\log(1/\delta)})\operatorname{opt}_{\varepsilon,\delta}(A,n) and the second term is O(log2dlog(1/δ))optε,δ(A,n)O(\log^{2}d\log(1/\delta))\operatorname{opt}_{\varepsilon,\delta}(A,n) .

Since, by the definition of tt, dt+1ε<n\frac{d_{t+1}}{\varepsilon}<n, we have the desired bound.

The last equality follows from the definition of the dual norm L\|\cdot\|_{L^{\circ}} and from the fact that LL is a polytope with vertices {aj}j=1N\{a_{j}\}_{j=1}^{N}, so any linear functional on LL is maximized at one of the vertices. From the fact that for all iji\neq j we have UiTUj=0U_{i}^{T}U_{j}=0, from the triangle inequality, and from Lemma 9, we derive

4 Computational Complexity

In this subsection we consider the computational complexity of our algorithms. We pay special attention to approximating the minimum enclosing ellipsoid of a polytope and computing least squares estimators. For both problems we need to go into the properties of known approximation algorithms in order to verify that the approximations are sufficient to guarantee that our algorithms can be implemented in polynomial time without hurting their near-optimality.

Computationally the most expensive step of the base decomposition algorithm (Algorithm 1) is computing the minimum enclosing ellipsoid EE of KK. Computing the exact MEE can be costly: the fastest known algorithms have complexity on the order of dO(d)Nd^{O(d)}N [AS93]. However, for our purposes it is enough to compute an approximation of EE in Banach-Mazur distance, i.e. some ellipsoid EE^{\prime} satisfying 1CEEE\frac{1}{C}E^{\prime}\subseteq E\subseteq E^{\prime} for an absolute constant CC. Known approximation algorithms for MME guarantee that their output is an enclosing ellipsoid with volume approximately equal to that of the MEE [Kha96, TY07]. It is not immediately clear whether such an approximation is also a Banach-Mazur approximation. However, we can use the fact that the algorithms in [Kha96, KY05, TY07] output an ellipsoid EE^{\prime} satisfying approximate complimentary slackness conditions and show that ΠE\Pi E^{\prime} approximates ΠE\Pi E in Banach-Mazur sense for some projection Π\Pi onto a subspace of dimension Ω(d)\Omega(d). This suffices for a slightly weaker version of Lemma 5.

We begin with a definition. Let’s define a vector pNp\in^{N} to be CC-optimal for A=(ai)i=1NA=(a_{i})_{i=1}^{N} if the following conditions are satisfied:

for all i[N]i\in[N], aiT(APAT)1aiCda_{i}^{T}(APA^{T})^{-1}a_{i}\leq C\cdot d where P=diag(p)P=\operatorname{diag}(p) (we use this notation throughout this section).

CC-optimality implies the following property of the the ellipsoid E(p)E(p) which is key to our analysis.

Let E=FB2dE^{*}=F^{*}B_{2}^{d} be the minimum enclosing ellipsoid of K=sym{ai}i=1NK=\operatorname{sym}\{a_{i}\}_{i=1}^{N}, and let pp be CC-optimal for A=(ai)i=1NA=(a_{i})_{i=1}^{N}. Let also E(p)={x:xT(APAT)1xCd}=F(p)B2dE(p)=\{x:x^{T}(APA^{T})^{-1}x\leq Cd\}=F(p)B_{2}^{d}. Then,

Let G=(F)1G=(F^{*})^{-1}. Since GE=B2dGE^{*}=B_{2}^{d}, we have that the MEE of GKGK is the unit ball, and, therefore, Gai221\|Ga_{i}\|_{2}^{2}\leq 1 for all i[N]i\in[N]. Since F(p)F(p)T=CdAPATF(p)F(p)^{T}=Cd\cdot APA^{T}, we have

By Markov’s inequality, σd/42(GF(p))4C\sigma_{d/4}^{2}(GF(p))\leq 4C. Let Π1\Pi_{1} be the projection operator onto the subspace spanned by the left singular vectors of GF(p)GF(p) corresponding to σd/4(GF(p)),,σd(GF(p))\sigma_{d/4}(GF(p)),\ldots,\sigma_{d}(GF(p)). We have Π1GE(p)2C1/2Π1B2d\Pi_{1}GE(p)\subseteq 2C^{1/2}\Pi_{1}B_{2}^{d}. Multiplying on both sides by FF^{*}, we get

Let Π2\Pi_{2} be the matrix Π2=G1Π1G=FΠ1G\Pi_{2}=G^{-1}\Pi_{1}G=F^{*}\Pi_{1}G. Since Π2\Pi_{2} is similar to Π1\Pi_{1}, it is also a projection matrix onto a 3d/43d/4 dimensional subspace. We have that FΠ1=Π2FF^{*}\Pi_{1}=\Pi_{2}F^{*}, and therefore

Define H=4CF(F)TF(p)F(p)TH=4CF^{*}(F^{*})^{T}-F(p)F(p)^{T}. The inclusion above is equivalent to the positive semidefiniteness of the matrix Π2HΠ2T\Pi_{2}H\Pi_{2}^{T}. As Π2\Pi_{2} is a projection onto a 3d/43d/4 dimensional subspace, by the standard minimax characterization of eigenvalues we have λ3d/4(H)0\lambda_{3d/4}(H)\geq 0. We recall the (dual) Weyl inequalities for symmetric d×dd\times d matrices XX and YY:

The inequalities are standard and follow from the minimax characterization of eigenvalues and dimension counting arguments — see, e.g. Chapter 1 in [Tao12]. Substituting X=HX=H and Y=F(p)F(p)TY=F(p)F(p)^{T}, i=3d/4i=3d/4 and j=d/2j=d/2, we have the inequality

Finally we give an analogue of Lemma 5 for the variant of the base decomposition algorithm that uses an approximate MEE. The proof follows from Lemma 10 and the arguments used to prove Lemma 5. We omit a full proof here.

Consider a variant of Algorithm 1 that, at each step, uses E(p)={x:xT(APAT)1xCd}=F(p)B2dE(p)=\{x:x^{T}(APA^{T})^{-1}x\leq Cd\}=F(p)B_{2}^{d}, where pp is O(1)O(1)-optimal for AA, rather than the minimum enclosing ellipsoid E=FB2dE=FB_{2}^{d}. Let did_{i} be the dimension of the span of UiU_{i}. For any ii there exists a subspace WiW_{i} of dimension Ω(di)\Omega(d_{i}) and a set Si[N]S_{i}\subseteq[N] of size Si=Ω(di)|S_{i}|=\Omega(d_{i}), such that σmin2(WiWiTASi)=Ω(1)maxj=1NUiTaj22\sigma_{\min}^{2}(W_{i}W_{i}^{T}A|_{S_{i}})=\Omega(1)\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}^{2}}.

One can verify that in all our proofs we can substitute Lemma 11 for Lemma 5 without changing the asymptotics of our lower and upper bounds. Therefore, in all our algorithms we can use the variant of Algorithm 1 from Lemma 11 without compromising near-optimality. This variant of Algorithm 1 runs in time dO(1)Nd^{O(1)}N.

Notice that the base decomposition can be reused for different databases, as long as the query matrix AA stays unchanged; once the decomposition is computed the rest of the algorithm is very efficient: it involves some standard algebraic computations and sampling from an O(d)O(d)-dimensional gaussian distribution. Furthermore, any ellipsoid EE^{\prime} containing KK suffices for privacy, and one may use heuristic approximations to the MEE problem.

4.2 Least Squares Estimator

Except for base decomposition, the other potentially computationally expensive step in Algorithm 3 is the computation of a least squares estimator y^1\hat{y}_{1}. This is a quadratic minimization problem, and can be approximated by the simple Frank-Wolfe gradient descent algorithm [FW56]. In particular, for a point y^\hat{y}^{\prime} such that y^y22miny^Ly^y22+α\|\hat{y}^{\prime}-y\|_{2}^{2}\leq\min_{\hat{y}\in L}{\|\hat{y}^{\prime}-y\|_{2}^{2}}+\alpha, Lemma 1 holds to within an additive approximation factor α\alpha, i.e. y^y224wL+α\|\hat{y}^{\prime}-y\|_{2}^{2}\leq 4\|w\|_{L^{\circ}}+\alpha. We call such a point y^\hat{y}^{\prime} an α\alpha additive approximation to the least squares estimator problem. By the analysis of Clarkson [Cla10], TT iterations of the Frank-Wolfe algorithm give an additive approximation where α4C(L)/(T+3)\alpha\leq 4C(L)/(T+3), for C(L)supu,vLu,uvC(L)\leq\sup_{u,v\in L}{|\langle u,u-v\rangle|}. In our case L=nXXTKL=nXX^{T}K. In order to have near optimality for Algorithm 3, an additive approximation αi=1tnri2\alpha\leq\sum_{i=1}^{t}{nr_{i}^{2}} suffices. Using the triangle inequality and Cauchy-Schwarz, we can bound C(L)C(L) for L=nXXTKL=nXX^{T}K as

Therefore, T=O(n)T=O(n) iterations of the Frank-Wolfe algorithm suffice. Since each iteration of the algorithm involves NN dot product computations and solving a homogeneous linear system in at most dd variables and at most dd equations, it follows that an approximate version of Algorithm 3 with unchanged asymptotic optimality guarantees can be implemented in time dO(1)Nnd^{O(1)}Nn.

We note that the approximation algorithm of Khachiyan for the MEE problem, as well as its modification in [TY07], can also be interpreted as instances of the Frank-Wolfe algorithm (see [TY07] for details).

Results for Pure Privacy

Our geometric approach to approximate privacy allows us to better understand the optimal error required for approximate privacy vs. that required for pure privacy. Our first result bounds the gap between the optimal error bounds for the two notions of privacy in the dense case. Then we extend these ideas and give a (ε,0)(\varepsilon,0)-differentially private algorithm which nearly matches the guarantees of Algorithm 3 for sparse databases.

In this subsection we investigate the worst-case gap between optε,δ(A)\operatorname{opt}_{\varepsilon,\delta}(A) (for small enough δ>0\delta>0) and optε,0(A)\operatorname{opt}_{\varepsilon,0}(A) over all query matrices AA. At the core of our analysis is a natural geometric fact: for any symmetric polytope KK with NN vertices in dd-dimensional space we can find a subset of dd vertices of KK whose symmetric convex hull has volume radius at most a factor O(log(N/d))O(\sqrt{\log(N/d)}) smaller than the volume radius of KK. Our proof of this fact goes through analyzing the contact points of KK with its minimum enclosing volume ellipsoid, and a bound on the volume of polytopes with few vertices.

Let K=sym{a1,,aN}\mathdsRdK=\operatorname{sym}\{a_{1},\ldots,a_{N}\}\subseteq{\mathds{R}}^{d} and let EE be an ellipsoid of minimal volume containing KK. There exists a set S[N]S\subseteq[N] of size dd such that the matrix AS=(ai)iSA|_{S}=(a_{i})_{i\in S} satisfies det(AS)1/d=Ω(vrad(E))\det(A|_{S})^{1/d}=\Omega(\operatorname{vrad}(E)).

For the proof of 15 we will use John’s theorem (Theorem 6) and the following elementary algebraic result.

Let u1,,umu_{1},\ldots,u_{m} be dd-dimensional unit vectors and let c1,,cmc_{1},\ldots,c_{m} be positive reals such that

Then there exists a set S[m]S\subseteq[m] of size dd such that the matrix U=(ui)iSU=(u_{i})_{i\in S} satisfies det(U)1/d=Ω(1)|\det(U)|^{1/d}=\Omega(1).

Notice that tr(uiuiT)=ui22=1{\rm tr}(u_{i}u_{i}^{T})=\|u_{i}\|_{2}^{2}=1 for all ii. By taking traces of both sides of (17), we have ci=d\sum{c_{i}}=d.

Let U=(ui)i=1mU=(u_{i})_{i=1}^{m} and let CC be the m×mm\times m diagonal matrix with (ci)i=1m(c_{i})_{i=1}^{m} on the diagonal. Then we can write I=ciuiuiT=UCUTI=\sum{c_{i}u_{i}u_{i}^{T}}=UCU^{T}. By the Binet-Cauchy formula for the determinant,

The inequality (18) follows since each term iSci\prod_{i\in S}{c_{i}} appears d!d! times in the expansion of (ci)d(\sum{c_{i}})^{d} and all other terms in the expansion are positive. Using the inequality d!(d/e)dd!\geq(d/e)^{d}, we have that maxS[m]:S=ddet(US)2/d1/e\max_{S\subseteq[m]:|S|=d}{\det(U|_{S})^{2/d}}\geq 1/e, and this completes the proof. ∎

Proof of Theorem 15: We can write the minimum enclosing ellipsoid EE as vrad(E)FB2\operatorname{vrad}(E)FB_{2} where FF is a linear map with determinant 11. Since F1F^{-1} does not change volumes, B2B_{2} is a minimal volume ellipsoid of vrad(E)1F1K\operatorname{vrad}(E)^{-1}F^{-1}K. Also, for any AS=(ai)iSA|_{S}=(a_{i})_{i\in S}, where S[N]S\subseteq[N], we have

Therefore, it is sufficient to show that for L=sym{u1,,uN}L=\operatorname{sym}\{u_{1},\ldots,u_{N}\} such that B2B_{2} is the minimal volume ellipsoid of LL, there exists a set S[N]S\subseteq[N] such that the matrix US=(ui)iSU|_{S}=(u_{i})_{i\in S} satisfies det(US)1/d=Ω(1)\det(U|_{S})^{1/d}=\Omega(1). Since, by convexity, the contact points LB2L\cap B_{2} of LL are a subset of u1,,uNu_{1},\ldots,u_{N}, the statement follows from Theorem 6 and Lemma 12. \blacksquare

Combined with the following theorem of Bárány and Füredi [BF88], and also Gluskin [Glu07] (with sharper bounds), Theorem 15 implies the corollary that for any dd-dimensional symmetric polytope one can find a set of dd vertices whose symmetric convex hull captures a significant fraction of the volume of the polytope.

Let K=sym{a1,,aN}K=\operatorname{sym}\{a_{1},\ldots,a_{N}\} and let EE be an ellipsoid containing KK. Then vrad(K)O(log(N/d)d)vrad(E)\operatorname{vrad}(K)\leq O\left(\sqrt{\frac{\log(N/d)}{d}}\right)\operatorname{vrad}(E).

For any K=sym{a1,,aN}K=\operatorname{sym}\{a_{1},\ldots,a_{N}\} there exists a set S[N]S\subseteq[N] such that

Finally, we describe the application to differential privacy. By Corollary 1, volLB(A,ε)=O(1ε2log(N/d))detLB(A,d)\operatorname{volLB}(A,\varepsilon)=O(\frac{1}{\varepsilon^{2}}\log(N/d))\operatorname{detLB}(A,d). Also, by (11), detLB(A,d)=O(log2d)optc1,c2(A,d)\operatorname{detLB}(A,d)=O(\log^{2}d)\operatorname{opt}_{c_{1},c_{2}}(A,d). Finally, Lemma 3 implies that optc1,c2(A,d)ε2optε,δ(A,d/ε)\operatorname{opt}_{c_{1},c_{2}}(A,d)\leq{\varepsilon^{2}}\operatorname{opt}_{\varepsilon,\delta}(A,d/\varepsilon) for δ\delta small enough with respect to ε\varepsilon. Putting all this together and using Theorem12, we have the following theorem (Theorem 2).

For small enough ε\varepsilon and all δ\delta small enough with respect to ε\varepsilon, for any d×Nd\times N real matrix AA we have

2 Sparse Case under Pure Privacy

We further extend our results from Section 3.3 and show an efficient (ε,0)(\varepsilon,0)-differentially private algorithm which, on input any query matrix AA and any database size bound nn, nearly matches optε,0(A,n)\operatorname{opt}_{\varepsilon,0}(A,n). This proves our main Theorem 4. In fact, our result is stronger: we show an (ε,0)(\varepsilon,0)-differentially private mechanism whose error nearly matches optε,δ(A,n)\operatorname{opt}_{\varepsilon,\delta}(A,n) for all δ\delta small enough with respect to ε\varepsilon. Thus, the result of this subsection can be seen as a generalization of Theorem 17 to the sparse databases regime.

Our algorithm for sparse databases under pure privacy closely follows Algorithm 3: we add noise from a distribution that is tailored to AA but oblivious to the database xx; then we use least squares estimation to reduce error on sparse databases. However, Gaussian noise does not preserve (ε,0)(\varepsilon,0)-differential privacy, and we need to use a different noise distribution. Intuitively, one expects that adding noise sampled from a near-optimal distribution [HT10, BDKT12] and then computing a least squares estimator would be nearly optimal, analogously to Algorithm 3. We are not currently able to analyze the error of this algorithm, but instead we analyze a variant of Algorithm 3 where the Gaussian distribution is simply substituted with the generalized KK-norm distribution from [HT10]. Intuitively, we are able to show that the generalized KK-norm distribution “approximates a Gaussian” well enough for our analysis to go through. A main tool in our analysis is a classical concentration of measure inequality from convex geometry.

We begin with a slight generalization of the main upper bound result of Hardt and Talwar [HT10]. This generalization follows directly from the methods used in [HT10] with only minor modifications in the proofs. We omit a full derivation here. Also, while the methods of Hardt and Talwar will lead to a proof conditional on the truth of the Hyperplane conjecture from convex geometry, using the ideas of Bhaskara et al. [BDKT12] the result can be made unconditional.

Let A=(ai)i=1NA=(a_{i})_{i=1}^{N} be an d×Nd\times N real matrix and let K=sym{ai}i=1NK=\operatorname{sym}\{a_{i}\}_{i=1}^{N}. There exists an efficiently computable and efficiently sampleable distribution W(A,ε)\mathcal{W}(A,\varepsilon) such that the following claims hold:

the algorithm MK\mathcal{M}_{K} which on input xx outputs Ax+wAx+w for wW(A,ε)w\sim\mathcal{W}(A,\varepsilon) satisfies (ε,0)(\varepsilon,0)-differential privacy;

Using the distribution W(A,ε)\mathcal{W}(A,\varepsilon), we define our near optimal sparse-case algorithm satisfying pure differential privacy as Algorithm 4.

Let Mp(A,x,n)\mathcal{M}_{p}(A,x,n) be the output of Algorithm 4 on input a d×Nd\times N query matrix AA, database size bound nn and private input xx. Mp(A,x,n)\mathcal{M}_{p}(A,x,n) is (ε,0)(\varepsilon,0)-differentially private and for all small enough ε\varepsilon and all δ\delta small enough with respect to ε\varepsilon satisfies

Once again privacy follows by a straightforward argument from the privacy of the underlying noise-adding mechanism, in this case the generalized KK-norm mechanism.

Mp(A,x,n)\mathcal{M}_{p}(A,x,n) satisfies (ε,0)(\varepsilon,0)-differential privacy.

Next we prove the main technical lemma we need in order to show near optimality. The analysis is very similar to that of Lemma 9. The main technical challenge is to show that the distribution W\mathcal{W} has all the properties we needed from the Gaussian distribution: covariance with bounded operator norm and exponential concentration. We use ideas from Section 4.1 and the following variant of a classical concentration of measure inequality, due to Borell [Bor75] (proved in the appendix).

Let μ\mu be a log-concave distribution over \mathdsRd{\mathds{R}}^{d}. Assume that AA is a symmetric convex subset of \mathdsRd{\mathds{R}}^{d} such that μ(A)=θ23\mu(A)=\theta\geq\frac{2}{3}. Then, for every t>1t>1 we have

We are now ready to prove the counterpart of Lemma 9 for Mp\mathcal{M}_{p}.

As in Algorithm 3, we define ri=maxj=1NUiTaj2r_{i}=\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}}. Equivalently rir_{i} is the radius of the smallest did_{i}-dimensional ball which contains UiTKU_{i}^{T}K. In the proof of Lemma 9 we argued that optε,δ(A,n)=Ω(nεri2)\operatorname{opt}_{\varepsilon,\delta}(A,n)=\Omega(\frac{n}{\varepsilon}r_{i}^{2}) for all small enough ε\varepsilon and all δ\delta small enough with respect to ε\varepsilon.

Therefore, for any aja_{j}, we can derive the following bound:

Thus applying Cauchy Schwarz once again, we conclude

For any jj, the set {w:UiTaj,iwiT}\{w:|\langle U_{i}^{T}a_{j},\sum_{i}w_{i}\rangle|\leq T\} is symmetric and convex for any bound TT. Then, by Chebyshev’s inequality, and Theorem 20 there exists a constant CC such that for any i,ji,j and α>2\alpha>2

Using a union bound and taking expectations completes the proof. ∎

Proof of Theorem 19: The privacy guarantee follows from Lemmas 13. By Lemma 14 analogously to the proof of Theorem 14, we can conclude that

Pythagoras theorem then implies the result. \blacksquare

Universal bounds

The mechanism Ms\mathcal{M}_{s} of Algorithm 5 is (ε,δ)(\varepsilon,\delta)-differentially private and satisfies

The privacy of the mechanism is immediate from Lemma 4. To analyze the error, we use Lemma 1 and the fact that L=nKL=nK to bound

where {aj}j=1N\{a_{j}\}_{j=1}^{N} are columns of AA. We have used the fact that the wK=maxaKa,w\|w\|_{K^{\circ}}=\max_{a\in K}\langle a,w\rangle is attained at one of the vertices of KK. Since each aj,w\langle a_{j},w\rangle is a Gaussian with variance r4c(ε,δ)2r^{4}c(\varepsilon,\delta)^{2}, aj,w|\langle a_{j},w\rangle| exceeds r2c(ε,δ)tlogNr^{2}c(\varepsilon,\delta)\sqrt{t\log N} with probability at most 1Nt\frac{1}{N^{t}}. Taking a union bound, we conclude that this expectation of the maximum is O(r2c(ε,δ)logN)O(r^{2}c(\varepsilon,\delta)\sqrt{\log N}). Recall that r=maxj=1Naj2dr=\max_{j=1}^{N}{\|a_{j}\|_{2}}\leq\sqrt{d}. It follows that

For getting pure ε\varepsilon-DP, we simply substitute the generalized KK-norm distribution guaranteed by Theorem 18 instead of the Gaussian noise.

We first observe an upper bound on the volume radius of projections of KK.

Let Ad×NA\in^{d\times N} and let K=sym{a1,,aN}K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}. Let Π(k)\Pi^{(k)} be a rank kk orthogonal projection that maps \mathdsRd{\mathds{R}}^{d} to \mathdsRk{\mathds{R}}^{k}. Then

Now we show that Algorithm 6 achieves the bound claimed in Theorem 5.

The mechanism Msp\mathcal{M}_{sp} of Algorithm 6 is (ε,0)(\varepsilon,0)-differentially private and satisfies

Extensions

In this section we describe a couple of extensions to our results. We show how to translate our optimality guarantees for total squared error in the dense case regime to worst case error over queries using the minimax theorem. We further show that our nearly optimal efficient mechanism in the dense case regime implies a polylogarithmic approximation to hereditary discrepancy.

Let A\mathdsRd×NA\in{\mathds{R}}^{d\times N}, ε,δ\varepsilon,\delta be given. There is an (ε,δ)(\varepsilon,\delta)-DP mechanism Mwe\mathcal{M}_{we}, and a non-negative diagonal matrix PP with tr(P2)=1{\rm tr}(P^{2})=1 such that for all i[d]i\in[d],

Thus the mechanism MgPA\mathcal{M}_{g}^{PA} satisfies

Recall that the mechanism MgPA\mathcal{M}_{g}^{PA} is of the form Ax+wAx+w where ww is a noise vector whose distribution is independent of xx. Thus the pair (D,P)(\mathcal{D},P) can be computed without looking at the database xx. Therefore, the mechanism Mwe\mathcal{M}_{we} that samples a M\mathcal{M} from D\mathcal{D} and runs it on xx is itself (ε,δ)(\varepsilon,\delta)-DP. The theorem follows. ∎

Using the above result, we can now construct a mechanism that has small expected worse case error.

Let A\mathdsRd×NA\in{\mathds{R}}^{d\times N}, ε,δ\varepsilon,\delta be given. There is an (ε,δ)(\varepsilon,\delta)-DP mechanism Mew\mathcal{M}_{ew}, and a non-negative diagonal matrix PP with tr(P2)=1{\rm tr}(P^{2})=1 such that

The resulting mechanism is not necessarily (ε,δ)(\varepsilon,\delta)-differentially private any more. However, since its outcome can be computed by postprocessing the outcome of LL (ε,δ)(\varepsilon,\delta)-differentially private mechanisms, it is still (Lε,Lδ)(L\varepsilon,L\delta)-differentially private. Scaling ε\varepsilon and δ\delta by a factor of LL, and substituting for β\beta, we get the result. ∎

2 Approximating Hereditary Discrepancy

Muthukrishnan and Nikolov [MN12] show that hereditary discrepancy gives a lower bound on the error of any mechanism.

There exist constants ε,δ\varepsilon,\delta such that the following holds: Let AA be a d×Nd\times N matrix and M\mathcal{M} be an (ϵ,δ)(\epsilon,\delta)-differentially private mechanism. Then

On the other hand, the mechanism of Theorem 24 satisfies

The description of this mechanism Mew\mathcal{M}_{ew} (which is specified by a distribution over O(logd)O(\log d) Gaussian noise addition mechanisms) thus serves as an efficiently computable and verifiable witness to an upper bound on the hereditary discrepancy of AA.

We next give a constrcutive version of this result, by appealing to a result of Bansal [Ban10] that shows that the discrepancy of any set system can be constructively upper bounded by O(logdN)O(\log dN) times the hereditary vector discrepancy. The vector discrepancy of a matrix AA, denoted vecdisc(A)\operatorname{vecdisc}(A) is defined as the smallest λ\lambda such that the following semidefinite program is feasible:

The hereditary vector discrepancy is simply hervecdisc(A)maxS[N]vecdisc(AS)\operatorname{hervecdisc}(A)\triangleq\max_{S\subseteq[N]}\operatorname{vecdisc}(A|_{S}). We will show that given the mechanism Mew\mathcal{M}_{ew} of Theorem 24, we can construct for any SS a solution to the SDP corresponding to SS. We note that the above SDP is a slight relaxation of the SDP used by Bansal: instead of the constraint Vjj=1V_{jj}=1 for all jj, we simply require each VjjV_{jj} to be at most one, and that the trace of VV is Ω(m)\Omega(m). It is easy to verify that [Ban10] implies that:

Suppose that for a matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N} and a λ0\lambda\geq 0, for every S[N]S\subseteq[N] with rank(AS)=Srank(A|_{S})=|S|, it is the case that the SDP 20 is feasible for ASA|_{S}. Them there is polynomial time algorithm that finds a coloring χ\chi of [N][N] with discrepancy at most O(λlogdN)O(\lambda\log dN).

We consider a variant of the above SDP, where we drop the Vjj1V_{jj}\leq 1 constraint and require the trace of VV to be slightly larger:

We first show that it suffices to satisfy the SDP 21.

Suppose that for a matrix A\mathdsRd×NA\in{\mathds{R}}^{d\times N}, for every S[N]S\subseteq[N] with rank(AS)=Srank(A|_{S})=|S|, it is the case that the SDP 21 is feasible for ASA|_{S} and λ\lambda. Then for any S[N]S\subseteq[N] with rank(AS)=Srank(A|_{S})=|S|, the SDP 20 is feasible for ASA|_{S} and 2λ2\lambda.

We construct a feasible solution to 20 by repeatedly using solutions to 21 for different values of the restriction ASA|_{S}. Fix an S[N]S\subseteq[N] with S=m|S|=m and let S0=SS_{0}=S. Let W0W^{0} be the d×md\times m zero matrix indexed by S[N]S\subseteq[N]. Let V0V^{0} be a solution to SDP 21 for AS0A|_{S_{0}}. Let γ0=maxjVjj0\gamma_{0}=\max_{j}V^{0}_{jj}, and let Vˉ0=V0/2γ0\bar{V}^{0}=V^{0}/2\gamma_{0}. For each j,jS0j,j^{\prime}\in S_{0}, set Wjj1=Wjj0+Vˉjj0W^{1}_{jj^{\prime}}=W^{0}_{jj^{\prime}}+\bar{V}^{0}_{jj^{\prime}}. Finally we set S1=S0{j:Wjj114}S_{1}=S_{0}\setminus\{j:W^{1}_{jj}\geq\frac{1}{4}\}.

Given SiS_{i} such that Sim/2|S_{i}|\geq m/2, we let ViV^{i} be a feasible solution to the SDP for ASiA|_{S_{i}} and set γi=maxjVjji\gamma_{i}=\max_{j}V^{i}_{jj}, and Vˉi=Vi/2γi\bar{V}^{i}=V^{i}/2\gamma_{i}. For each j,jSij,j^{\prime}\in S_{i}, set Wjji+1=Wjji+VˉjjiW^{i+1}_{jj^{\prime}}=W^{i}_{jj^{\prime}}+\bar{V}^{i}_{jj^{\prime}}. We update Si+1=Si{j:Wjji+114}S_{i+1}=S_{i}\setminus\{j:W^{i+1}_{jj}\geq\frac{1}{4}\}. We stop once Si|S_{i}| falls below m/2m/2, say in iteration LL. It is easy to see that we delete at least one jj from SiS_{i} in each step, so that this process converges.

We claim that WLW^{L} is a feasible solution to SDP 20. Observe that WLW^{L} is simply a a non-negative linear combination of ViV^{i}’s and hence is positive semidefinite. By definition of Vˉ\bar{V}, each diagonal entry is at most 12\frac{1}{2}, so that the definition of SiS_{i} ensures that WjjL34<1W^{L}_{jj}\leq\frac{3}{4}<1 for all jj. Moreover, for each jSSLj\in S\setminus S_{L}, the entry WjjL14W^{L}_{jj}\geq\frac{1}{4}, and there are at least m/2m/2 such jj, so that the trace of WLW^{L} is at least m/8m/8 as required. Finally, a(i)WLa(i)Ta^{(i)}W^{L}a^{(i)T} is simply the sum t=0L1a(i)Vta(i)T2γt\sum_{t=0}^{L-1}\frac{a^{(i)}V^{t}a^{(i)T}}{2\gamma_{t}}. Thus it suffices to show that t=1L1(2γt)12\sum_{t=1}^{L-1}(2\gamma_{t})^{-1}\leq 2. Note that tr(Wt+1Wt)=tr(Vˉt)(2γt)1St(2γt)1m/2{\rm tr}(W^{t+1}-W^{t})={\rm tr}(\bar{V}^{t})\geq(2\gamma_{t})^{-1}|S_{t}|\geq(2\gamma_{t})^{-1}m/2. It follows that mtr(WL)t=1L1(2γt)1m/2m\geq{\rm tr}(W^{L})\geq\sum_{t=1}^{L-1}(2\gamma_{t})^{-1}m/2 so that t=1L1(2γt)12\sum_{t=1}^{L-1}(2\gamma_{t})^{-1}\leq 2 as needed. The claim follows. ∎

Finally, we describe how to use the mechanism Mew\mathcal{M}_{ew} of Theorem 24 to construct a feasible solution to SDP 21.

Conclusions

We have presented near optimal mechanisms for any linear query for dense and sparse databases, under both pure and approximate differential privacy. Our mechanisms are simple and efficient, and it would be instructive to implement them so as to compare them with existing techniques.

Acknowledgements

The first and second named authors would like to thank Moritz Hardt and Daniel Dadush for several helpful discussions. The first named author was partially supported by a Simons Graduate Fellowship, award number 252861.

References

Appendix A Concentration of Log concave measures

The following measure concentration inequality is standard. We include a proof below for completeness. We start by stating the Brunn-Minkowski inequality.

Let μ\mu be a log-concave measure on d\Re^{d}, let α,β0\alpha,\beta\geq 0 be numbers such that α+β=1\alpha+\beta=1, and let A,BdA,B\subseteq\Re^{d} be measurable sets such that the set αA+βB\alpha A+\beta B is measurable. Then

This can be used to prove Borell’s lemma for arbitrary log concave distributions. We use the proof approach presented in [Gia03].

Applying the Brunn-Minkowski inequality and rearranging proves the result. \blacksquare

Appendix B From Concentration to Expectation

We repeatedly use the fact that a exponential or better bound on the upper tail implies a bound on the expectation. For completeness, we give a quantitative version with a proof.

The claim follows by linearity of expectation. ∎

Appendix C Hardness of Approximating Hereditary Discrepancy

The proof of Theorem 29 is a straight-forward reduction from the 2-colorability problem for 3-uniform hypergraphs. A maximization version of this problem (i.e. maximize the number of bichromatic edges) is also known as Max-E3-Set Splitting and is equivalent to NotAllEqual-SAT restricted to inputs with no negated variables.

A hypergraph H=(V,E)H=(V,E), where E2VE\subseteq 2^{V}, is 2-colorable if and only if there exists a set TVT\subseteq V such that for all eEe\in E, TeeT\cap e\neq e and TeT\cap e\neq\emptyset. The set TT is called a transversal of HH.

There exists a family of 3-uniform hypergraphs such that deciding whether a hypergraph in the family is 2-colorable is NP\mathsf{NP}-complete.

Proof of Theorem 29: The reduction simply maps a 3-uniform hypergraph to its incidence matrix. I.e. for a hypergraph H=(V,E)H=(V,E), where V={v1,,vn}V=\{v_{1},\ldots,v_{n}\} and E={e1,,em}E=\{e_{1},\ldots,e_{m}\}, we create a m×nm\times n matrix AA, where Aij=1A_{ij}=1 if vjeiv_{j}\in e_{i} and Aij=0A_{ij}=0 otherwise. Observe that if HH is 2-colorable, and this is witnessed by a transversal TT, then (AS)x2\|(A|_{S})x\|_{\infty}\leq 2 for all S[n]S\subseteq[n] and for xx defined by xi=+1viTx_{i}=+1\Leftrightarrow v_{i}\in T. On the other hand, if HH is not 2-colorable, for any x{+1,1}nx\in\{+1,-1\}^{n} we have Ax3\|Ax\|_{\infty}\geq 3, since otherwise the set T={vi:xi=+1}T=\{v_{i}:x_{i}=+1\} would be a transversal. \blacksquare

We note that Guruswami proved constant hardness of approximation for Max-E3-Set Splitting [Gur00]. In particular, he showed that it is NP\mathsf{NP}-hard to distinguish 2-colorable 3-uniform hypergraphs from hypergraphs for which any coloring with 2 colors leaves at least a 1/201/20 fraction of the edges monochromatic.