The Geometry of Differential Privacy: the Sparse and Approximate Cases

Aleksandar Nikolov, Kunal Talwar, Li Zhang

Introduction

Differential privacy [DMNS06] is a recent privacy definition that has quickly become the standard notion of privacy in statistical databases. Informally, a mechanism (a randomized function on databases) satisfies differential privacy if the distribution of the outcome of the mechanism does not change noticeably when one individual’s input to the database is changed. Privacy is measured by how small this change must be: an $\varepsilon$ -differentially private ( $\varepsilon$ -DP) mechanism $\mathcal{M}$ satisfies ${\rm Pr}[\mathcal{M}(x)\in S]\leq\exp(\varepsilon){\rm Pr}[\mathcal{M}(x^{\prime}{})\in S]$ for any pair $x,x^{\prime}{}$ of neighboring databases, and for any measurable subset $S$ of the range. A relaxation of this definition is approximate differential privacy. A mechanism $\mathcal{M}$ is $(\varepsilon,\delta)$ -differentially private ( $(\varepsilon,\delta)$ -DP) if ${\rm Pr}[\mathcal{M}(x)\in S]\leq\exp(\varepsilon){\rm Pr}[\mathcal{M}(x^{\prime}{})\in S]+\delta$ with $x,x^{\prime}{},S$ as before. Here $\delta$ is thought of as negligible in the size of the database. Both these definitions satisfy several desirable properties such as composability, and are resistant to post-processing of the output of the mechanism.

We think of the database as being given by a multiset of database rows, one for each individual. We will let $N$ denote the size of the universe that these rows come from, and we will denote by $n$ the number of individuals in the database. We can represent the database as its histogram $x\in{\mathds{R}}^{N}$ with $x_{i}$ denoting the number of occurrences of the $i$ th element of the universe. Thus $x$ would in fact be a vector of non-negative integers with $\|x\|_{1}=n$ . We will be concerned with reporting reasonably accurate answers to a given set of $d$ linear queries over this histogram $x$ . This set of queries can naturally be represented by a matrix $A\in{\mathds{R}}^{d\times N}$ with the vector $Ax\in{\mathds{R}}^{d}$ giving the correct answers to the queries. When $A\in\{0,1\}^{d\times N}$ , we call such queries counting queries. We are interested in the (practical) regime where $N\gg d\gg n$ , although our results hold for all settings of the parameters.
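The histogram representation and the query matrix can be made concrete with a small sketch. The universe, the rows, and the two counting queries below are hypothetical values chosen only for illustration, and the helper names `histogram` and `answer_queries` are not from the paper.

```python
from collections import Counter

def histogram(rows, universe):
    """Return x in R^N, where x[i] counts occurrences of universe[i] in rows."""
    counts = Counter(rows)
    return [counts[u] for u in universe]

def answer_queries(A, x):
    """Exact answers Ax for a d x N query matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

universe = ["a", "b", "c", "d"]   # N = 4 hypothetical user types
rows = ["a", "c", "c"]            # a database with n = 3 individuals
x = histogram(rows, universe)     # [1, 0, 2, 0], so ||x||_1 = n

A = [[1, 1, 0, 0],                # counting query: individuals of type a or b
     [0, 1, 1, 1]]                # counting query: individuals of type b, c or d
y = answer_queries(A, x)          # [1, 2]
```

Each row of `A` is one linear query, so `y` has one entry per query and `sum(x)` recovers the database size.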

Specific sets of counting queries of interest can however admit much better mechanisms than adversarially chosen queries for which the lower bounds are shown. Indeed several classes of specific queries have attracted attention. Some, such as range queries, are “easier”, and asymptotically better mechanisms can be designed for them. Others, such as constant dimensional contingency tables, are nearly as hard as general counting queries, and asymptotically better mechanisms can be ruled out in some ranges of the parameters. These query-specific upper bounds are usually proved by carefully exploiting the structure of the query, and query-specific lower bounds have been proved by reconstruction attacks that exploit a lower bound on the smallest singular value of an appropriately chosen $A$ [DN03, DMT07, DY08, KRSU10, De12, KRS13]. It is natural to address this question in a competitive analysis framework: can we design an efficient algorithm that given any query $A$ , computes (even approximately) the minimum error differentially private mechanism for $A$ ?

Hardt and Talwar [HT10] answered this question in the affirmative for $\varepsilon$ -DP mechanisms, and gave a mechanism that has error within factor $O(\log^{3}d)$ of the optimal assuming a conjecture from convex geometry known as the hyperplane conjecture or the slicing conjecture. Bhaskara et al. [BDKT12] removed the dependence on the hyperplane conjecture and improved the approximation ratio to $O(\log^{2}d)$ . Can relaxing the privacy requirement to $(\varepsilon,\delta)$ -DP help with accuracy? In many settings, $(\varepsilon,\delta)$ -DP mechanisms can be simpler and more accurate than the best known $\varepsilon$ -DP mechanisms. This motivates the first question we address.

Given $A$ , can we efficiently approximate the optimal error $(\varepsilon,\delta)$ -DP mechanism for it?

Hardt and Talwar [HT10] showed that for some $A$ , the lower bound on the error of $\varepsilon$ -DP mechanisms can be a factor of $\Omega(\log N)$ larger than the error of known $(\varepsilon,\delta)$ -DP mechanisms. For non-linear Lipschitz queries, De [De12] showed that this gap can be as large as $\Omega(\sqrt{d})$ (even when $N=d$ ). This leads us to ask:

How large can the gap between the optimal $\varepsilon$ -DP mechanism and the optimal $(\varepsilon,\delta)$ -DP mechanism be for linear queries?

Given $A$ and $n$ , can we approximate the optimal sparse case error $(\varepsilon,\delta)$ -DP mechanism for $A$ when restricted to databases of size at most $n$ ?

Given $A$ and $n$ , can we approximate the optimal sparse case error $\varepsilon$ -DP mechanism for $A$ when restricted to databases of size at most $n$ ?

In this paper, we are interested in both the case when $X={\mathds{R}}^{N}$ , called the dense case, and when $X=nB_{1}^{N}$ for $n<d$ , called the sparse case. We also write $\operatorname{err}_{\mathcal{M}}(A)\triangleq\operatorname{err}_{\mathcal{M}}(A,{\mathds{R}}^{N})$ and $\operatorname{err}_{\mathcal{M}}(A,n)\triangleq\operatorname{err}_{\mathcal{M}}(A,nB_{1}^{N})$ .

Our first result is a simple and efficient mechanism that for query matrix $A$ gives an $O(\log^{2}d)$ approximation to the optimal error.

Given a query matrix $A\in{\mathds{R}}^{d\times N}$ , there is an efficient $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_{A}$ such that

$\operatorname{err}_{\mathcal{M}}(A)\leq O(\log^{2}d\log 1/\delta)\cdot L_{A}$ , and

for any $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}^{\prime}{}$ , $\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,d)\geq L_{A}$ .

We also show that the gap of $\Omega(\log(N/d))$ between $\varepsilon$ -DP and $(\varepsilon,\delta)$ -DP mechanisms shown in [HT10] is essentially the worst possible, within $\operatorname{polylog}(d)$ factor, for linear queries. More precisely, the lower bound on $\varepsilon$ -DP mechanisms used in [HT10] is always within $O(\log(N/d)\operatorname{polylog}(d))$ of the lower bound $L_{A}$ computed by our algorithm above. Let $\mathcal{M}^{*}$ denote the $\varepsilon$ -DP generalized $K$ -norm mechanism in [HT10].

For any $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}$ , $\operatorname{err}_{\mathcal{M}}(A)=\Omega(1/(\log^{O(1)}(d)\log(N/d)))\operatorname{err}_{\mathcal{M}^{*}}(A)$ .

We next move to the sparse case. Here we give results analogous to the dense case with a slightly worse approximation ratio.

Given $A\in{\mathds{R}}^{d\times N}$ and a bound $n$ , there is an efficient $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_{A,n}$ such that

$\operatorname{err}_{\mathcal{M}}(A,n)\leq O(\log^{3/2}d\cdot\sqrt{\log N\log 1/\delta}+\log^{2}d\log 1/\delta)\cdot L_{A,n}$ , and

for any $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}^{\prime}{}$ , $\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,n)\geq L_{A,n}$ .

Given $A\in{\mathds{R}}^{d\times N}$ and a bound $n$ , there is an efficient $\varepsilon$ -DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_{A,n}$ such that

$\operatorname{err}_{\mathcal{M}}(A,n)\leq O(\log^{O(1)}d\cdot\log^{3/2}N)\cdot L_{A,n}$ , and

for any $\varepsilon$ -DP mechanism $\mathcal{M}^{\prime}{}$ , $\operatorname{err}_{\mathcal{M}^{\prime}{}}(A,n)\geq L_{A,n}$ .

We remark that in these theorems, our upper bounds hold for all $x$ with $\|x\|_{1}\leq n$ , whereas the lower bounds hold even when $x$ is an integer vector.

The $(\varepsilon,\delta)$ -DP mechanism of Theorem 3 when run on any counting query has error no larger than the best known bounds [GRU12] for counting queries, up to constants (not ignoring logarithmic factors). The $\varepsilon$ -DP mechanism of Theorem 4 when run on any counting query can be shown to have nearly the same asymptotics, answering question 5 in the affirmative.

We will summarize some key ideas we use to achieve these results. More details will follow in Section 1.2.

For the upper bounds, the first crucial step is to decompose $A$ into “geometrically nice” components and then add Gaussian noise to each component. This is similar to the approach in [HT10, BDKT12] but we use the minimum volume enclosing ellipsoid, rather than the $M$ -ellipsoid used in those works, to facilitate the decomposition process. This allows us to handle the approximate and the sparse cases. In addition, it simplifies the mechanism as well as the analysis. For the sparse case, we further couple the mechanism with least squares estimation of the noisy answer with respect to $nAB_{1}^{N}$ . By utilizing techniques from statistical estimation, we can show that this process can reduce the error when $n<d$ , and prove an error upper bound dependent on the size of the smallest projection of $nAB_{1}^{N}$ .

For the lower bounds, we first lower bound the error of any $(\varepsilon,\delta)$ -DP mechanism by the hereditary discrepancy of the query matrix $A$ , which we in turn lower bound in terms of the least singular values of submatrices of $A$ . Finally, we close the loop by utilizing the restricted invertibility principle of Bourgain and Tzafriri [BT87] and its extension by Vershynin [Ver01] which, informally, shows that if there does not exist a “small” projection of $nAB_{1}^{N}$ then $A$ has a “large” submatrix with a “large” least singular value.

The discrepancy of a matrix $A\in{\mathds{R}}^{d\times N}$ is defined to be $\operatorname{disc}(A)=\min_{x\in\{-1,+1\}^{N}}\|Ax\|_{\infty}$ . The hereditary discrepancy of a matrix is defined as $\operatorname{herdisc}(A)=\max_{S\subseteq[N]}\operatorname{disc}(A|_{S})$ , where $A|_{S}$ denotes the matrix $A$ restricted to the columns indexed by $S$ .
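For very small matrices, both quantities can be computed directly from the definitions by exhaustive search. The following is a minimal sketch intended only to illustrate the definitions; it takes time exponential in $N$ and is not the algorithm used in this paper. The function names and the example matrices are illustrative assumptions.

```python
from itertools import combinations, product

def disc(A):
    """min over x in {-1,+1}^N of ||Ax||_inf, by trying every coloring x."""
    n_cols = len(A[0])
    best = None
    for signs in product((-1, 1), repeat=n_cols):
        value = max(abs(sum(a * s for a, s in zip(row, signs))) for row in A)
        best = value if best is None else min(best, value)
    return best

def herdisc(A):
    """max over nonempty column subsets S of disc(A|_S)."""
    n_cols = len(A[0])
    best = 0
    for size in range(1, n_cols + 1):
        for S in combinations(range(n_cols), size):
            restricted = [[row[j] for j in S] for row in A]
            best = max(best, disc(restricted))
    return best

# Every pair of columns shares a row, so any coloring leaves some row at +-2.
triangle = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]]

# A single row can be perfectly balanced, but a single column cannot.
pair = [[1, 1]]
```

The second example shows why the hereditary version is needed: `disc(pair)` is 0 while `herdisc(pair)` is 1, because restricting to one column removes the possibility of cancellation.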

As hereditary discrepancy is a maximum over exponentially many submatrices, it is not a priori clear if there even exists a polynomial-time verifiable certificate for low hereditary discrepancy. Additionally, we can show that it is $\mathsf{NP}$ -hard to approximate hereditary discrepancy to within a factor of $3/2$ . Bansal [Ban10] gave a pseudo-approximation algorithm for hereditary discrepancy, which efficiently computes a coloring of discrepancy at most a factor of $O(\log dN)$ larger than $\operatorname{herdisc}(A)$ for a $d\times N$ matrix $A$ . His algorithm allows efficiently computing a lower bound on $\operatorname{herdisc}$ for any restriction $A|_{S}$ ; however, such a lower bound may be arbitrarily loose, and before our work it was not known how to efficiently compute nearly matching lower and upper bounds on $\operatorname{herdisc}$ .

1.2 Techniques

In addition to known techniques from the differential privacy literature, our work borrows tools from discrepancy theory, convex geometry and statistical estimation. We next briefly describe how they fit in.

Central to designing a provably good approximation algorithm is an efficiently computable lower bound on the optimum. Muthukrishnan and Nikolov [MN12] proved that (a slight variant of) the hereditary discrepancy of $A$ leads to a lower bound for the error of any $(\varepsilon,\delta)$ -DP mechanism. Lovász, Spencer and Vesztergombi [LSV86] showed that hereditary discrepancy itself can be lower bounded by a quantity called the determinant lower bound. Geometrically, this lower bound corresponds to picking the $d$ columns of $A$ that (along with the origin) give us a simplex with the largest possible volume. The volume of this simplex, appropriately normalized, gives us a lower bound on OPT. More precisely for any simplex $S$ , $d^{3}\cdot\operatorname{vol}(S)^{\frac{2}{d}}\log^{2}d$ gives a lower bound on the error. The $\log^{2}d$ factor can be removed by using a lower bound based on the least singular values of submatrices of $A$ . Geometrically, for the least singular value lower bound we need to find a simplex of large volume whose $d$ non-zero vertices are also nearly pairwise orthogonal.

If the $N$ columns of $A$ all lie in a ball of radius $R$ , it can be shown that adding Gaussian noise proportional to $R$ suffices to guarantee $(\varepsilon,\delta)$ -DP, resulting in a mechanism having total squared error $dR^{2}$ . Can we relate this quantity to the lower bound? It turns out that if the ball of radius $R$ is the minimum volume ellipsoid containing the columns of $A$ , this can be done. In this case, a result of Vershynin [Ver01], building on the restricted invertibility results by Bourgain and Tzafriri [BT87], tells us that one can find $\Omega(d)$ vertices of $K$ that touch the minimum containing ellipsoid, and are nearly orthogonal. The simplex formed by these vertices therefore has large volume, giving us an $(\varepsilon,\delta)$ -DP lower bound of $\Omega(dR^{2})$ . In this case, the Gaussian mechanism with the optimal $R$ is within a constant factor of the lower bound. When the minimum volume enclosing ellipsoid is not a ball, we need to project the query along the $\frac{d}{2}$ shortest axes of this ellipsoid, answer this projection using the Gaussian mechanism, and recurse on the orthogonal projection. Using the full power of the restricted invertibility result by Vershynin allows us to construct a large simplex and prove our competitive ratio.

Hardt and Talwar [HT10] also used a volume based lower bound, but for $\varepsilon$ -DP mechanisms, one can take $K$ , the symmetric convex hull of all the columns of $A$ , and use its volume instead of the volume of $S$ in the lower bound above. How do these lower bounds compare? By a result of Bárány and Füredi [BF88] and Gluskin [Glu07], one can show that the volume of the convex hull of $N$ points can be bounded by $(\log N)^{d/2}d^{-d/2}$ times that of the minimum enclosing ellipsoid. This, along with the aforementioned restricted invertibility results, allows us to prove that the $\varepsilon$ -DP lower bound is within $O((\log N)\operatorname{polylog}d)$ of the $(\varepsilon,\delta)$ -DP lower bound.

How do we handle sparse queries? The first observation is that the lower bounding technique gives us $d$ columns of $A$ and the resulting lower bound holds not just for $A$ but even for the $d\times d$ submatrix of $A$ corresponding to the maximum volume simplex $S$ ; moreover, the lower bound holds even when all databases are restricted to $O(d)$ individuals. Thus the lower bound holds when $n=O(d)$ and this value marks the transition between the sparse and the dense cases. Moreover, when the minimum volume ellipsoid containing the columns of $A$ is a ball, the restricted invertibility principle of Bourgain and Tzafriri and Vershynin gives us a $d$ -dimensional simplex with nearly pairwise orthogonal vertices, and, therefore any $n$ -dimensional face of this simplex is another simplex of large volume. The large $n$ -dimensional simplex gives a lower bound on error when databases are restricted to have at most $n$ individuals.

To get an $\varepsilon$ -DP mechanism, we use the $K$ -norm mechanism [HT10] instead of Gaussian noise. To bound the shadow of $nAB_{1}^{N}$ on $w$ , where $w$ is the noise vector generated by the $K$ -norm mechanism, we first analyze the expectation of $\langle a_{i},w\rangle$ for any column of $A$ , and we use the log concavity of the noise distribution to prove concentration of this random variable. A union bound helps complete the argument as in the Gaussian case.

1.3 Related Work

Dwork et al. [DMNS06] showed that any query can be released while adding noise proportional to the total sensitivity of the query. This motivated the question of designing mechanisms with good guarantees for any set of low sensitivity queries. Nissim, Raskhodnikova and Smith [NRS07] showed that adding noise proportional to (a smoothed version of) the local sensitivity of the query suffices for guaranteeing differential privacy; this may be much smaller than the worst case sensitivity for non-linear queries. Lower bounds on the amount of noise needed for general low sensitivity queries have been shown in [DN03, DMT07, DY08, DMNS06, RHS07, HT10, De12]. Kasiviswanathan et al. [KRSU10] showed upper and lower bounds for contingency table queries and more recently [KRS13] showed lower bounds on publishing error rates of classifiers or even M-estimators. Muthukrishnan and Nikolov [MN12] showed that combinatorial discrepancy lower bounds the noise for answering any set of linear queries.

Using learning theoretic techniques, Blum, Ligett and Roth [BLR08] first showed that one can exploit sparsity of the database, and answer a large number of counting queries with error small compared to the number of individuals in the database. This line of work has been further extended and improved in terms of error bounds, efficiency, generality and interactivity in several subsequent works [DNR+09, DRV10, RR10, HR10, GHRU11, HLM12].

Ghosh, Roughgarden and Sundararajan [GRS09] showed that for any one dimensional counting query, a discrete version of the Laplacian mechanism is optimal for pure privacy in a very general utilitarian framework, and Gupte and Sundararajan [GS10] extended this to risk averse agents. Brenner and Nissim [BN10] showed that such universally optimal private mechanisms do not exist for two counting queries or for a single non-binary sum query. As mentioned above, Hardt and Talwar [HT10], and Bhaskara et al. [BDKT12] gave relative guarantees for multi-dimensional queries under pure privacy with respect to total squared error. De [De12] unified and strengthened these bounds and showed stronger lower bounds for the class of non-linear low sensitivity queries.

For specific queries of interest, improved upper bounds are known. Barak et al. [BCD+07] studied low dimensional marginals and showed that by running the Laplace mechanism on a different set of queries, one can reduce error. Using a similar strategy, improved mechanisms were given by [XWG10, CSS10] for orthogonal counting queries, and near optimal mechanisms were given by Muthukrishnan and Nikolov [MN12] for halfspace counting queries. The approach of answering a set of queries different from the target query set has also been studied in more generality and for other sets of queries by [LHR+10, DWHL11, RHS07, XWG10, XXY10, YZW+12]. Li and Miklau [LM12a, LM12b] study a class of mechanisms called extended matrix mechanisms and show that one can efficiently find the best mechanisms from this class. Hay et al. [HRMS10] show that in certain settings such as unattributed histograms, correcting noisy answers to enforce a consistency constraint can improve accuracy.

Very recently, Fawaz et al. [FMN] used the hereditary discrepancy lower bounds of Muthukrishnan and Nikolov, as well as the determinant lower bound on discrepancy of Lovász, Spencer, and Vesztergombi, to prove that a certain Gaussian noise mechanism is nearly optimal (in the dense setting) for computing any given convolution map. Like our algorithms, their algorithm adds correlated Gaussian noise; however, they always use the Fourier basis to correlate the noise.

We refer the reader to texts by Chazelle [Cha00] and Matoušek [Mat99] and the chapter by Beck and Sós [BS95] for an introduction to discrepancy theory. Bansal [Ban10] showed that a semidefinite relaxation can be used to design a pseudo-approximation algorithm for hereditary discrepancy. Matoušek [Mat11] showed that the determinant based lower bound of Lovász, Spencer and Vesztergombi [LSV86] is tight up to polylogarithmic factors. Larsen [Lar11] showed applications of hereditary discrepancy to data structure lower bounds, and Chandrasekaran and Vempala [CV11] recently showed applications of hereditary discrepancy to problems in integer programming.

Preliminaries

We start by introducing some basic notation.

For a $d\times N$ matrix $A$ and a set $S\subseteq[N]$ , we denote by $A|_{S}$ the submatrix of $A$ consisting of those columns of $A$ indexed by elements of $S$ . Occasionally we refer to a matrix $V$ whose columns form an orthonormal basis for some subspace of interest $\mathcal{V}$ as the orthonormal basis of $\mathcal{V}$ . $\mathcal{P}_{k}$ is the set of orthogonal projections onto $k$ -dimensional subspaces of ${\mathds{R}}^{d}$ .

By $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ we denote, respectively, the smallest and largest singular value of $A$ . I.e., $\sigma_{\min}(A)=\min_{x:\|x\|_{2}=1}{\|Ax\|_{2}}$ and $\sigma_{\max}(A)=\max_{x:\|x\|_{2}=1}{\|Ax\|_{2}}$ . In general, $\sigma_{i}(A)$ is the $i$ -th largest singular value of $A$ , and $\lambda_{i}(A)$ is the $i$ -th largest eigenvalue of $A$ . We recall the minimax characterization of eigenvalues for symmetric matrices:

$\lambda_{i}(A)=\max_{\mathcal{V}:\dim\mathcal{V}=i}\ \min_{x\in\mathcal{V}:\|x\|_{2}=1}x^{T}Ax$ .

For a matrix $A$ (and the corresponding linear operator), we denote by $\|A\|_{2}=\sigma_{\max}(A)$ the spectral norm of $A$ and $\|A\|_{F}=\sqrt{\sum_{i}\sigma_{i}^{2}(A)}=\sqrt{\sum_{i,j}a_{i,j}^{2}}$ the Frobenius norm of $A$ . By $\ker A$ we denote the kernel of $A$ , i.e. the subspace of vectors $x$ for which $Ax=0$ .

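For a set $K\subseteq{\mathds{R}}^{d}$ , we denote by $\operatorname{vol}_{d}(K)$ its $d$ -dimensional volume. Often we use instead the volume radius

$\operatorname{vrad}(K)\triangleq\left(\operatorname{vol}_{d}(K)/\operatorname{vol}_{d}(B_{2}^{d})\right)^{1/d}$ .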

For a convex body $K\subseteq{\mathds{R}}^{d}$ , the polar body $K^{\circ}$ is defined by $K^{\circ}=\{y:\langle y,x\rangle\leq 1~{}\forall x\in K\}$ . The fundamental fact about polar bodies we use is that for any two convex bodies $K$ and $L$

$K\subseteq L\Leftrightarrow L^{\circ}\subseteq K^{\circ}$ . (1)

In the remainder of this paper, when we claim that a fact follows “by convex duality,” we mean that it is implied by (1).

A convex body $K$ is (centrally) symmetric if $-K=K$ . The Minkowski norm $\|x\|_{K}$ induced by a symmetric convex body $K$ is defined as $\|x\|_{K}\triangleq\min\{r\in{\mathds{R}}:x\in rK\}$ . The Minkowski norm induced by the polar body $K^{\circ}$ of $K$ is the dual norm of $\|x\|_{K}$ and also has the form $\|y\|_{K^{\circ}}=\max_{x\in K}{\langle x,y\rangle}$ . For convex symmetric $K$ , the induced norm and dual norm satisfy Hölder’s inequality:

$|\langle x,y\rangle|\leq\|x\|_{K}\|y\|_{K^{\circ}}$ .
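As a concrete instance of these definitions, take $K=B_{1}^{d}$ , for which $\|x\|_{K}=\|x\|_{1}$ and the dual norm is $\|y\|_{K^{\circ}}=\|y\|_{\infty}$ , since a linear function on $B_{1}^{d}$ is maximized at a vertex $\pm e_{i}$ . The sketch below evaluates the dual norm from the definition by enumerating these vertices; the function names and the vectors are illustrative choices, not part of the paper.

```python
def l1_norm(x):
    """Minkowski norm induced by K = B_1^d."""
    return sum(abs(v) for v in x)

def dual_norm_b1(y):
    """max over x in B_1^d of <x, y>, attained at a vertex +e_i or -e_i."""
    best = 0.0
    for y_i in y:
        for sign in (1.0, -1.0):
            best = max(best, sign * y_i)
    return best

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

x = [0.5, -2.0, 1.0]
y = [3.0, 1.0, -4.0]

lhs = abs(inner(x, y))              # |1.5 - 2 - 4| = 4.5
rhs = l1_norm(x) * dual_norm_b1(y)  # 3.5 * 4.0 = 14.0
```

Here `lhs <= rhs` is the inequality $|\langle x,y\rangle|\leq\|x\|_{1}\|y\|_{\infty}$ , the special case of Hölder’s inequality for this choice of $K$ .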

An ellipsoid in ${\mathds{R}}^{d}$ is the image of $B_{2}^{d}$ under an affine map. All ellipsoids we consider are symmetric, and therefore, are equal to an image $FB_{2}^{d}$ of the ball $B_{2}^{d}$ under a linear map $F$ . A full dimensional ellipsoid $E=FB_{2}^{d}$ can be equivalently defined as $E=\{x:x^{T}(FF^{T})^{-1}x\leq 1\}$ . The polar body of a symmetric ellipsoid $E=FB_{2}^{d}$ is the ellipsoid (or cylinder with an ellipsoid as its base in case $F$ is not full dimensional) $E^{\circ}=\{x:x^{T}FF^{T}x\leq 1\}$ .

We repeatedly use a classical theorem of Fritz John, characterizing the (unique) minimum volume enclosing ellipsoid (MEE) of any convex body $K$ . We note that John’s theorem is frequently stated in terms of the maximum volume enclosed ellipsoid in $K$ ; the two variants of the theorem are equivalent by convex duality. The MEE of $K$ is also known as the Löwner or Löwner-John ellipsoid of $K$ .

Any convex body $K\subseteq{\mathds{R}}^{d}$ is contained in a unique ellipsoid of minimal volume. This ellipsoid is $B_{2}^{d}$ if and only if there exist unit vectors $u_{1},\ldots,u_{m}\in K\cap B_{2}^{d}$ and positive reals $c_{1},\ldots,c_{m}$ such that

$\sum_{i=1}^{m}c_{i}u_{i}=0$ and $\sum_{i=1}^{m}c_{i}u_{i}u_{i}^{T}=I$ .

According to John’s characterization, when the MEE of $K$ is the ball $B_{2}^{d}$ , the contact points of $K$ and $B_{2}^{d}$ satisfy a structural property — the identity decomposes into a linear combination of the projection matrices onto the lines of the contact points. Intuitively, this means that $K$ “hits” $B_{2}^{d}$ in all directions — it has to, or otherwise $B_{2}^{d}$ can be “pinched” in order to produce a smaller ellipsoid that still contains $K$ . This intuition is formalized by a theorem of Vershynin, which generalizes the work of Bourgain and Tzafriri on restricted invertibility [BT87]. Vershynin ([Ver01] Theorem 3.1) shows that there exist $\Omega(d)$ contact points of $K$ and $B_{2}^{d}$ which are approximately pairwise orthogonal.

Let $K\subseteq{\mathds{R}}^{d}$ be a symmetric convex body whose minimum volume enclosing ellipsoid is the unit ball $B_{2}^{d}$ . Let $T$ be a linear map with spectral norm $\|T\|_{2}\leq 1$ . Then for any $\beta$ , there exist constants $C_{1}(\beta)$ , $C_{2}(\beta)$ and contact points $x_{1},\ldots,x_{k}$ with $k\geq(1-\beta)\|T\|_{F}^{2}$ such that the matrix $TX=(Tx_{i})_{i=1}^{k}$ satisfies

2 Statistical Estimation

A key element in our algorithms for the sparse case is the use of least squares estimation to reduce error. Below we present a bound on the error of least squares estimation with respect to symmetric convex bodies. This analysis appears to be standard in the statistics literature; a special case of it appears for example in [RWY11].

First we show the easier bound $\|\hat{y}-y\|_{2}\leq 2\|w\|_{2}$ , which follows by the triangle inequality:

The second bound is based on Hölder’s inequality and the following simple but very useful fact, illustrated schematically in Figure 1:

3 Differential Privacy

Following recent work in differential privacy, we model private data as a database $D$ of $n$ rows, where each row of $D$ contains information about an individual. Formally, a database $D$ is a multiset of size $n$ of elements of the universe $U=\{t_{1},\ldots,t_{N}\}$ of possible user types. Our algorithms take as input a histogram $x\in{\mathds{R}}^{N}$ of the database $D$ , where the $i$ -th component $x_{i}$ of $x$ encodes the number of individuals in $D$ of type $t_{i}$ . Notice that in this histogram representation, we have $\|x\|_{1}=n$ when $D$ is a database of size $n$ . Also, two neighboring databases $D$ and $D^{\prime}$ that differ in the presence or absence of a single individual correspond to two histograms $x$ and $x^{\prime}$ satisfying $\|x-x^{\prime}\|_{1}=1$ .

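Through most of this paper, we work under the notion of approximate differential privacy. The definition follows.

A (randomized) algorithm $\mathcal{M}$ satisfies $(\varepsilon,\delta)$ -differential privacy if for every two histograms $x,x^{\prime}$ with $\|x-x^{\prime}\|_{1}\leq 1$ and every measurable subset $S$ of the range of $\mathcal{M}$ , ${\rm Pr}[\mathcal{M}(x)\in S]\leq e^{\varepsilon}{\rm Pr}[\mathcal{M}(x^{\prime})\in S]+\delta$ .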

When $\delta=0$ , we are in the regime of pure differential privacy.

An important basic property of differential privacy is that the privacy guarantees degrade smoothly under composition and are not affected by post-processing.

Let $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ satisfy $(\varepsilon_{1},\delta_{1})$ - and $(\varepsilon_{2},\delta_{2})$ -differential privacy, respectively. Then the algorithm which on input $\mathbf{x}$ outputs the tuple $(\mathcal{M}_{1}(\mathbf{x}),\mathcal{M}_{2}(\mathcal{M}_{1}(\mathbf{x}),\mathbf{x}))$ satisfies $(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})$ -differential privacy.

In this paper we study the necessary and sufficient error incurred by differentially private algorithms for approximating linear queries. A set of $d$ linear queries is given by a $d\times N$ query matrix or workload $A$ ; the exact answers to the queries on a histogram $x$ are given by the $d$ -dimensional vector $y=Ax$ .

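We define error as total squared error. More precisely, for an algorithm $\mathcal{M}$ and a subset $X\subseteq{\mathds{R}}^{N}$ , we define

$\operatorname{err}_{\mathcal{M}}(A,X)\triangleq\sup_{x\in X}\mathbb{E}\|\mathcal{M}(x)-Ax\|_{2}^{2}$ ,

where the expectation is over the randomness of $\mathcal{M}$ .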

We also write $\operatorname{err}_{\mathcal{M}}(A,nB_{1}^{N})$ as $\operatorname{err}_{\mathcal{M}}(A,n)$ . The optimal error achievable by any $(\varepsilon,\delta)$ -differentially private algorithm for queries $A$ and databases of size up to $n$ is

$\operatorname{opt}_{\varepsilon,\delta}(A,n)\triangleq\inf_{\mathcal{M}}\operatorname{err}_{\mathcal{M}}(A,n)$ ,

where the infimum is taken over all $(\varepsilon,\delta)$ -differentially private algorithms. When no restrictions are placed on the size $n$ of the database, the appropriate notion of optimal error is $\operatorname{opt}_{\varepsilon,\delta}(A)\triangleq\sup_{n}\operatorname{opt}_{\varepsilon,\delta}(A,n)$ . Similarly, for an algorithm $\mathcal{M}$ , the error when database size is not bounded is $\operatorname{err}_{\mathcal{M}}(A)\triangleq\sup_{n}\operatorname{err}_{\mathcal{M}}(A,n)$ . A priori it is not clear that these quantities are necessarily finite, but we will show that this is the case.

In order to get tight dependence on the privacy parameter $\varepsilon$ in our analyses, we will use the following relationship between $\operatorname{opt}_{\varepsilon,\delta}(A,n)$ and $\operatorname{opt}_{\varepsilon^{\prime},\delta^{\prime}}(A,n)$ .

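For any $\varepsilon$ , any $\delta<1$ , any integer $k$ and for $\delta^{\prime}\geq\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta$ ,

$\operatorname{opt}_{\varepsilon,\delta}(A,n)\geq k^{2}\operatorname{opt}_{k\varepsilon,\delta^{\prime}}(A,n/k)$ .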

Let $\mathcal{M}$ be an $(\varepsilon,\delta)$ -differentially private algorithm achieving $\operatorname{opt}_{\varepsilon,\delta}(A,n)$ . We will use $\mathcal{M}$ as a black box to construct a $(k\varepsilon,\delta^{\prime})$ -differentially private algorithm $\mathcal{M}^{\prime}$ which satisfies the error guarantee $\operatorname{err}_{\mathcal{M}^{\prime}}(A,n/k)\leq\frac{1}{k^{2}}\operatorname{err}_{\mathcal{M}}(A,n)$ .

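The algorithm $\mathcal{M}^{\prime}$ on input $x$ satisfying $\|x\|_{1}\leq n/k$ outputs $\frac{1}{k}\mathcal{M}(kx)$ . We need to show that $\mathcal{M}^{\prime}$ satisfies $(k\varepsilon,\delta^{\prime})$ -differential privacy. Let $x$ and $x^{\prime}$ be two neighboring inputs to $\mathcal{M}^{\prime}$ , i.e. $\|x-x^{\prime}\|_{1}\leq 1$ , and let $S$ be a measurable subset of the output range of $\mathcal{M}^{\prime}$ . Denote $p_{1}={\rm Pr}[\mathcal{M}^{\prime}(x)\in S]$ and $p_{2}={\rm Pr}[\mathcal{M}^{\prime}(x^{\prime})\in S]$ . We need to show that $p_{1}\leq e^{k\varepsilon}p_{2}+\delta^{\prime}$ . To that end, define $x_{0}=kx$ , $x_{1}=kx+(x^{\prime}-x)$ , $x_{2}=kx+2(x^{\prime}-x)$ , $\ldots$ , $x_{k}=kx^{\prime}$ . Note that $\mathcal{M}^{\prime}(x)\in S$ if and only if $\mathcal{M}(kx)\in kS$ . Applying the $(\varepsilon,\delta)$ -privacy guarantee of $\mathcal{M}$ to each of the pairs of neighboring inputs $x_{0},x_{1}$ , $x_{1},x_{2}$ , $\ldots$ , $x_{k-1},x_{k}$ in sequence gives us

$p_{1}={\rm Pr}[\mathcal{M}(x_{0})\in kS]\leq e^{\varepsilon}{\rm Pr}[\mathcal{M}(x_{1})\in kS]+\delta\leq e^{2\varepsilon}{\rm Pr}[\mathcal{M}(x_{2})\in kS]+(1+e^{\varepsilon})\delta\leq\ldots\leq e^{k\varepsilon}{\rm Pr}[\mathcal{M}(x_{k})\in kS]+(1+e^{\varepsilon}+\ldots+e^{(k-1)\varepsilon})\delta=e^{k\varepsilon}p_{2}+\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta\leq e^{k\varepsilon}p_{2}+\delta^{\prime}$ .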

This finishes the proof of privacy for $\mathcal{M}^{\prime}$ . It is straightforward to verify that $\operatorname{err}_{\mathcal{M}^{\prime}}(A,n/k)\leq\frac{1}{k^{2}}\operatorname{err}_{\mathcal{M}}(A,n)$ . ∎
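The reduction in this proof is short enough to sketch in code. The block below wraps an arbitrary mechanism `M` (any function from a histogram to a list of answers) as $\mathcal{M}^{\prime}(x)=\frac{1}{k}\mathcal{M}(kx)$ and computes the additive privacy term both in closed form and by unrolling the $k$ applications of the $(\varepsilon,\delta)$ guarantee. The function names and the toy deterministic `M` are assumptions made for illustration; the toy `M` is not private and serves only to show the rescaling.

```python
import math

def scaled_mechanism(M, k):
    """Return M'(x) = (1/k) * M(k * x) for a mechanism M acting on lists."""
    def M_prime(x):
        return [v / k for v in M([k * xi for xi in x])]
    return M_prime

def delta_prime(eps, delta, k):
    """Closed form (e^{k eps} - 1) / (e^eps - 1) * delta, for eps > 0."""
    return (math.exp(k * eps) - 1.0) / (math.exp(eps) - 1.0) * delta

def delta_by_chaining(eps, delta, k):
    """Additive term accumulated over the k steps x_0, x_1, ..., x_k."""
    acc = 0.0
    for _ in range(k):
        acc = math.exp(eps) * acc + delta
    return acc

# Toy stand-in for M with a fixed additive error of 1 per coordinate.
M = lambda x: [xi + 1.0 for xi in x]
M_prime = scaled_mechanism(M, 4)
out = M_prime([1.0, 2.0])   # [1.25, 2.25]: the error shrinks from 1 to 1/k
```

Since the per-coordinate error shrinks by a factor of $k$ , the squared error shrinks by $k^{2}$ , matching the error guarantee stated for $\mathcal{M}^{\prime}$ .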

3.2 Gaussian Noise Mechanism

A basic mechanism for achieving $(\varepsilon,\delta)$ -differential privacy for linear queries is adding appropriately scaled independent Gaussian noise to each query. This approach goes back to the work of Blum et al. [BDMN05], predating the definition of differential privacy. Next we define this basic mechanism formally and give a privacy guarantee. The privacy analysis of the Gaussian mechanism in the context of $(\varepsilon,\delta)$ -differential privacy was first given in [DKM+06b]. We give the full proof here for completeness.

Let $A=(a_{i})_{i=1}^{N}$ be a $d\times N$ matrix such that $\forall i:\|a_{i}\|_{2}\leq\sigma$ . Then a mechanism which on input $x\in{\mathds{R}}^{N}$ outputs $Ax+w$ , where $w\sim N(0,\sigma\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d}$ , satisfies $(\varepsilon,\delta)$ -differential privacy.

Let $C=\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon}$ and let $p$ be the probability density function of $N(0,C\sigma)^{d}$ . Let also $K=AB_{1}^{N}$ , so $\|x-x^{\prime}\|_{1}\leq 1$ implies $A(x-x^{\prime})\in K\subseteq\sigma B_{2}^{d}$ . Define

We will prove that when $w\sim N(0,C\sigma)^{d}$ , for all $v\in K$ , ${\rm Pr}[|D_{v}(w)|>\varepsilon]\leq\delta$ . This suffices to prove $(\varepsilon,\delta)$ -differential privacy. Indeed, let the algorithm output $Ax+w$ and fix any $x^{\prime}$ s.t. $\|x-x^{\prime}\|_{1}\leq 1$ . Let $v=A(x-x^{\prime})\in K$ and $S=\{w:|D_{v}(w)|>\varepsilon\}$ . For any measurable $T\subseteq{\mathds{R}}^{d}$ we have

Note that to bound $|D_{v}(w)|$ we simply need to bound $\frac{1}{C^{2}\sigma^{2}}v^{T}w$ from above and below. Since $\frac{1}{C^{2}\sigma^{2}}v^{T}w\sim N(0,\frac{\|v\|}{C\sigma})$ , we can apply a Chernoff bound and we get

Substituting $C\geq\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon}$ completes the proof. ∎
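A minimal sketch of this mechanism follows, under one stated assumption: the second parameter of $N(0,\cdot)$ in the lemma is read as a standard deviation, which is the reading consistent with the covariance computation used later for composing Gaussian mechanisms. The function names and the example inputs are illustrative.

```python
import math
import random

def noise_scale(sigma, eps, delta):
    """Per-coordinate noise scale sigma * (1 + sqrt(2 ln(1/delta))) / eps."""
    return sigma * (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps

def gaussian_mechanism(A, x, eps, delta, rng=random):
    """Output Ax + w with i.i.d. Gaussian noise in each of the d coordinates.

    A is a d x N matrix given as a list of rows. sigma is taken to be the
    largest Euclidean norm of a column of A.
    """
    sigma = max(math.sqrt(sum(v * v for v in col)) for col in zip(*A))
    scale = noise_scale(sigma, eps, delta)   # used as a standard deviation
    exact = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [y + rng.gauss(0.0, scale) for y in exact]

A = [[1, 1, 0, 0],
     [0, 1, 1, 1]]
x = [1, 0, 2, 0]
noisy = gaussian_mechanism(A, x, eps=1.0, delta=1e-6, rng=random.Random(0))
```

Under this reading, the expected total squared error of the mechanism is $d$ times the square of `noise_scale(sigma, eps, delta)`, independent of the input $x$ .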

The following corollary is a useful geometric generalization of Lemma 4.

Let $A=(a_{i})_{i=1}^{N}$ be a $d\times N$ matrix of rank $d$ and let $K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}$ . Let $E=FB_{2}^{d}$ ( $F$ is a linear map) be an ellipsoid containing $K$ . Then a mechanism that outputs $Ax+Fw$ where $w\sim N(0,\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d}$ satisfies $(\varepsilon,\delta)$ -differential privacy.

Since $K$ is full dimensional (by ${\rm rank}A=d$ ) and $E$ contains $K$ , $E$ is full dimensional as well, and, therefore, $F$ is an invertible linear map. Define $G=F^{-1}A$ . For each column $g_{i}$ of $G$ , we have $\|g_{i}\|_{2}\leq 1$ . Therefore, by Lemma 4, a mechanism that outputs $Gx+w$ (where $w$ is distributed as in the statement of the corollary) satisfies $(\varepsilon,\delta)$ -differential privacy. Therefore, $FGx+Fw=Ax+Fw$ is $(\varepsilon,\delta)$ -differentially private by the post-processing property of differential privacy. ∎
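As an illustration of the corollary, the sketch below handles the special case of an axis-aligned ellipsoid, so that $F$ is diagonal. The restriction to diagonal $F$ , the function names, and the list-of-rows matrix format are assumptions made for brevity. Because $E$ is convex and symmetric, $K\subseteq E$ holds exactly when every column $a_{j}$ lies in $E$ , which is what the containment check tests.

```python
import math
import random

def c_eps_delta(eps, delta):
    """The constant (1 + sqrt(2 ln(1/delta))) / eps used throughout this section."""
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps

def columns_in_ellipsoid(A, axes):
    """True if every column a_j satisfies sum_r (a_j[r] / axes[r])^2 <= 1."""
    d, N = len(A), len(A[0])
    return all(
        sum((A[r][j] / axes[r]) ** 2 for r in range(d)) <= 1.0 + 1e-12
        for j in range(N)
    )

def ellipsoid_gaussian_mechanism(A, x, axes, eps, delta, rng=random):
    """Return Ax + Fw for F = diag(axes) and w with i.i.d. N(0, c) coordinates."""
    if not columns_in_ellipsoid(A, axes):
        raise ValueError("ellipsoid does not contain all columns of A")
    c = c_eps_delta(eps, delta)
    exact = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [v + axes[r] * rng.gauss(0.0, c) for r, v in enumerate(exact)]
```

Coordinates along short axes of the ellipsoid receive proportionally less noise, which is the source of the improvement over spherical noise.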

We present a composition theorem, specific to composing Gaussian noise mechanisms. We note that a similar composition result in a much more general setting but with slightly inferior dependence on the parameters is proven in [DRV10].

Let $\mathcal{V}_{1},\ldots,\mathcal{V}_{k}$ be vector spaces of respective dimensions $d_{1},\ldots,d_{k}$ , such that $\forall i\leq k-1$ , $\mathcal{V}_{i+1}\subseteq\mathcal{V}_{i}^{\perp}$ and $d_{1}+\ldots+d_{k}=d$ . Let $A=(a_{i})_{i=1}^{N}$ be a $d\times N$ matrix of rank $d$ and let $K=\operatorname{sym}(a_{1},\ldots,a_{N})$ . Let $\Pi_{i}$ be the projection matrix for $\mathcal{V}_{i}$ and let $E_{i}=F_{i}B_{2}^{d_{i}}\subseteq\mathcal{V}_{i}$ be an ellipsoid such that $\Pi_{i}K\subseteq E_{i}$ . Then the mechanism that outputs $Ax+\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}}$ where for each $i$ , $w_{i}\sim N(0,\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon})^{d_{i}}$ , satisfies $(\varepsilon,\delta)$ -differential privacy.

Let $c(\varepsilon,\delta)=\frac{1+\sqrt{2\ln(1/\delta)}}{\varepsilon}$ . Since the random variables $F_{1}w_{1},\ldots,F_{k}w_{k}$ are pairwise independent Gaussian random variables, and $F_{i}w_{i}$ has covariance matrix $c(\varepsilon,\delta)^{2}F_{i}F_{i}^{T}$ , we have that $w=\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}}$ is a Gaussian random variable with covariance $c(\varepsilon,\delta)^{2}G$ , where $G=k\sum_{i=1}^{k}{F_{i}F_{i}^{T}}$ . By Corollary 8, it is sufficient to show that the ellipsoid $E=G^{1/2}B_{2}^{d}$ contains $K$ . By convex duality, this is equivalent to showing $E^{\circ}\subseteq K^{\circ}$ , which is in turn equivalent to $\forall x:\|x\|_{K^{\circ}}\leq\|x\|_{E^{\circ}}$ . Recalling that $\|x\|^{2}_{E^{\circ}}=x^{T}Gx$ and $\|x\|_{K^{\circ}}=\max_{y\in K}{\langle y,x\rangle}=\max_{j=1}^{N}{|\langle a_{j},x\rangle|}$ , we need to establish

We proceed by establishing (4). Since for all $i$ , $\Pi_{i}K\subseteq E_{i}$ , by duality and the same reasoning as above, we have that for all $i$ and $j$ , $\langle\Pi_{i}a_{j},x\rangle^{2}\leq x^{T}F_{i}F_{i}^{T}x$ . Therefore, by the Cauchy-Schwarz inequality,

3.3 Noise Lower Bounds

We will make extensive use of a lower bound on the noise complexity of $(\varepsilon,\delta)$ -differentially private mechanisms in terms of combinatorial discrepancy. First we need to define the notion of hereditary $\alpha$ -discrepancy:

Next we present the lower bound, which is a simple extension of the discrepancy lower bound on noise recently proved by Muthukrishnan and Nikolov [MN12].

Let $A$ be a $d\times N$ real matrix. For any constant $\alpha$ and sufficiently small constants $\varepsilon\leq\varepsilon(\alpha)$ and $\delta\leq\delta(\alpha)$ ,

Substituting (5) into Theorem 10, we have that there exist constants $c_{1}$ and $c_{2}$ such that

For the remainder of this paper we fix some constants $c_{1}$ and $c_{2}$ for which (6) holds. Similarly to the notation for $\operatorname{opt}$ , we will also sometimes denote $\operatorname{specLB}(A)=\max_{n}\operatorname{specLB}(A,n)$ . We will use primarily the spectral lower bound (6) for arguing the optimality of our algorithms.

To show the small gap between the approximate and pure privacy (Theorem 2), we next develop a determinant based lower bound. We first switch from $\operatorname{herdisc}_{\alpha}$ to the classical notion of hereditary discrepancy, equivalent to $\operatorname{herdisc}_{1}$ , by observing the following relation between $\operatorname{herdisc}_{\alpha}$ and $\operatorname{herdisc}_{1}$ from [MN12]:

We then use an extension of the classical determinant lower bound for hereditary discrepancy, due to Lovász, Spencer, and Vesztergombi.

Let us first define linear discrepancy. For a $d\times k$ matrix $M$ and $c\in[-1,1]^{k}$ , let $\operatorname{disc}^{c}(M)$ be defined as

The linear discrepancy of $M$ is then defined as $\operatorname{lindisc}(M)\triangleq\max_{c\in[-1,1]^{k}}{\operatorname{disc}^{c}(M)}$ . We claim that for any $M$ ,

We complete the proof of the theorem by proving a lower bound on $\operatorname{lindisc}$ in terms of $\operatorname{detLB}$ . We note that a similar lower bound can be proved for any variant of $\operatorname{lindisc}$ defined in terms of any norm. The exact lower bound will depend on the volume radius of the unit ball of the norm. Since the proof of (8) also works for any norm, we get a determinant lower bound for hereditary discrepancy defined in terms of any norm as well.
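For very small matrices, the quantity $\operatorname{disc}^{c}(M)$ can be evaluated by brute force. The sketch below assumes the standard definition $\operatorname{disc}^{c}(M)=\min_{x\in\{-1,1\}^{k}}\|Mx-Mc\|_{2}$ with $c\in[-1,1]^{k}$ ; the displayed definition is not reproduced in this text, so this form (suggested by the ellipsoid $\{x:\|Mx\|_{2}^{2}\leq 1\}$ used in the proof) is an assumption, as is the function name. The running time is exponential in $k$ , so this is only a way to experiment with the definition.

```python
import itertools
import math

def disc_c(M, c):
    """Brute-force min over x in {-1, 1}^k of ||M x - M c||_2 (assumed definition)."""
    d, k = len(M), len(c)
    best = math.inf
    for x in itertools.product((-1.0, 1.0), repeat=k):
        diff = [xi - ci for xi, ci in zip(x, c)]
        norm = math.sqrt(
            sum(sum(M[r][i] * diff[i] for i in range(k)) ** 2 for r in range(d))
        )
        best = min(best, norm)
    return best
```

For the $2\times 2$ identity matrix and $c=0$ , every sign vector is at distance $\sqrt{2}$ from $c$ , while any $c$ that is itself a sign vector gives discrepancy $0$ .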

We show that for any $d\times k$ matrix $M$

Letting $k$ range over $[n]$ , $M$ range over all $d\times k$ submatrices of $A$ , and applying the bounds (8) and (9) implies the theorem.

We proceed to prove (9). Note that if ${\rm rank}(M)<k$ , (9) is trivially true; therefore, we may assume that ${\rm rank}(M)=k$ . Note also that without loss of generality we can take $\Pi$ to be the orthogonal projection onto the range of $M$ , since this is the projection operator that maximizes $|\det\Pi M|$ . Let $E$ be the ellipsoid $E=\{x:\|Mx\|_{2}^{2}\leq 1\}$ . The inequality $\operatorname{lindisc}(M)\leq D$ is equivalent to

Thus $2^{k}=\operatorname{vol}([-1,1]^{k})\leq 2^{k}\operatorname{vol}(D\cdot E)$ , and therefore $D^{k}\geq\frac{1}{\operatorname{vol}(E)}$ . On the other hand, the volume of $E$ is equal to

Applying the standard estimate $\operatorname{vol}(B_{2}^{k})^{1/k}=\Theta(k^{-1/2})$ completes the proof. ∎

By the determinant lower bound and (7), we get our determinant lower bound on the noise necessary for privacy. For some constants $c_{1},c_{2}>0$ ,

Finally, we recall the stronger volume lower bound against $(\varepsilon,0)$ -differential privacy from [HT10, BDKT12]. This lower bound is nearly optimal for $(\varepsilon,0)$ -differential privacy, but does not hold for $(\varepsilon,\delta)$ -differential privacy when $\delta$ is $2^{-o(d)}$ .

For any $d\times N$ real matrix $A=(a_{i})_{i=1}^{N}$ ,

where $K=\operatorname{sym}\{a_{i}\}_{i=1}^{N}$ .

Furthermore, there exists an efficient mechanism $\mathcal{M}_{K}$ (the generalized $K$ -norm mechanism) which is $(\varepsilon,0)$ -differentially private and satisfies $\operatorname{err}_{\mathcal{M}_{K}}(A)=O(\log^{3}d)\operatorname{volLB}(A,\varepsilon)$ .

Algorithms for Approximate Privacy

In this section we present our main results: efficient nearly optimal algorithms for approximate privacy in the cases of dense databases ( $n>d/\varepsilon$ ) and sparse databases ( $n=o(d/\varepsilon)$ ). Both algorithms rely on recursively computing an orthonormal basis for ${\mathds{R}}^{d}$ , based on the minimum volume enclosing ellipsoid of the columns of the query matrix $A$ . We first present the algorithm for computing this basis, together with a property essential for the analyses of the two algorithms presented next.

We first present an algorithm (Algorithm 1) that, given a matrix $A\in{\mathds{R}}^{d\times N}$ , computes a set of matrices $U_{1},\ldots,U_{k}$ with orthonormal columns, where $k\leq\lceil 1+\log d\rceil$ . For each $i\neq j$ , $U_{i}^{T}U_{j}=0$ , and the union of the columns of $U_{1},\ldots,U_{k}$ forms an orthonormal basis for ${\mathds{R}}^{d}$ . Thus, Algorithm 1 computes a basis for ${\mathds{R}}^{d}$ , and partitions (“decomposes”) it into $k=O(\log d)$ bases of mutually orthogonal subspaces. This set of bases also induces a decomposition of $A$ into $A=A_{1}+\ldots+A_{k}$ , where $A_{i}=U_{i}U_{i}^{T}A$ . The base decomposition of Algorithm 1 is essential to both our dense case and sparse case algorithms. Intuitively, for both cases we can show that the error of a simple mechanism applied to $A_{i}$ can be matched by an error lower bound for $A_{i+1}+\ldots+A_{k}$ . The error lower bounds are based on the spectral lower bound $\operatorname{specLB}$ on discrepancy; the geometric properties of the minimum enclosing ellipsoid of a convex body together with the restricted invertibility principle of Bourgain and Tzafriri are key in deriving the lower bounds.
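The induced decomposition $A=A_{1}+\ldots+A_{k}$ with $A_{i}=U_{i}U_{i}^{T}A$ can be checked numerically once the blocks $U_{i}$ are available. The sketch below does not compute the blocks (that is the job of Algorithm 1, which is not reproduced here); it assumes they are given as lists of rows with orthonormal, mutually orthogonal columns, and the helper names are hypothetical.

```python
def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def block_component(U, A):
    """Return A_i = U U^T A for a d x d_i matrix U with orthonormal columns."""
    return matmul(U, matmul(transpose(U), A))

def decompose(blocks, A):
    """Return the list [A_1, ..., A_k] induced by the blocks U_1, ..., U_k."""
    return [block_component(U, A) for U in blocks]
```

When the columns of all blocks together form an orthonormal basis of ${\mathds{R}}^{d}$ , the projections $U_{i}U_{i}^{T}$ sum to the identity, so the components returned by `decompose` add up to $A$ entrywise.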

The next lemma captures the technical property of the decomposition of Algorithm 1 that allows us to prove matching upper and lower error bounds for our dense and sparse case algorithms.

Let $d_{i}$ be the dimension of the span of $U_{i}$ . Furthermore, for $i<k$ , let $W_{i}=\sum_{j>i}{U_{j}}$ , and let $W_{k}=U_{k}$ . For every $i\leq k$ , there exists a set $S_{i}\subseteq[N]$ , such that $|S_{i}|=\Omega(d_{i})$ and $\sigma_{\min}^{2}(W_{i}W_{i}^{T}A|_{S_{i}})=\Omega(1)\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}^{2}}$ .

Let us, for ease of notation, assume that $d$ is a power of $2$ . We prove that there exists a set $S$ , $|S|=\Omega(d)$ , such that

By applying an appropriate unitary transformation to the columns of $A$ , we may assume that the major axes of $E$ are co-linear with the standard basis vectors of ${\mathds{R}}^{d}$ , and, therefore, $F$ is a diagonal matrix with $F_{ii}=\sigma_{i}$ . This transformation comes without loss of generality, since it applies a unitary transformation to the columns of $A$ and $V$ and does not affect the singular values of any matrix $VV^{T}A|_{S}$ for any $S\subseteq[N]$ . For the rest of the proof we assume that $F$ is diagonal.

Since $F$ is diagonal, $u_{i}$ is equal to $e_{i}$ , the $i$ -th standard basis vector. Therefore $U_{1}$ is diagonal and equal to the projection onto $e_{d/2+1},\ldots,e_{d}$ , and $V$ is also diagonal and equal to the projection onto $e_{1},\ldots,e_{d/2}$ . Consider $L=F^{-1}K=F^{-1}AB_{1}^{d}$ (recall that we assumed that ${\rm rank}A=d$ and therefore $F$ is non-singular). Since the minimum enclosing ellipsoid of $K$ is $E=FB_{2}^{d}$ , we have that the minimum enclosing ellipsoid of $L$ is $B_{2}^{d}$ . Let $T=VV^{T}$ be the projection onto $e_{1},\ldots,e_{d/2}$ . Then, by Theorem 7, and because $\|T\|_{F}^{2}=d/2$ , we have that there exists a set $S$ of size $|S|=\Omega(d)$ such that $\sigma_{\min}^{2}(TF^{-1}A|_{S})=\Omega(1)$ . We chose $F$ , and therefore $F^{-1}$ , as well as $T$ to be diagonal matrices, so they all commute. Then, since $T$ is a projection matrix,

Observe that, since $K\subseteq E$ , we have

Therefore, $\max_{j=1}^{N}{\|U_{1}^{T}a_{j}\|_{2}^{2}}\leq\sigma_{d/2+1}^{2}\leq\sigma_{d/2}^{2}$ . Substituting for $\sigma_{d/2}^{2}$ into (14) completes the proof. ∎

2 The Dense Case: Correlated Gaussian Noise

Our first result is an efficient algorithm whose expected error matches the spectral lower bound $\operatorname{specLB}$ up to polylogarithmic factors and is therefore nearly optimal. This proves Theorem 1. The algorithm adds correlated unbiased Gaussian noise to the exact answer $Ax$ . The noise distribution is computed based on the decomposition algorithm of the previous subsection.

Let $\mathcal{M}_{g}(A,x)$ be the output of Algorithm 2 on input a $d\times N$ query matrix $A$ and private input $x$ . $\mathcal{M}_{g}(A,x)$ is $(\varepsilon,\delta)$ -differentially private and for all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ satisfies

We start the proof of Theorem 13 with the privacy analysis. For ease of notation, we assume throughout the analysis that $d$ is a power of 2.

$\mathcal{M}_{g}(A,x)$ satisfies $(\varepsilon,\delta)$ -differential privacy.

The lemma follows from Corollary 9. Next we describe in detail why the corollary applies.

Let $U_{1},\ldots,U_{k}$ be the base decomposition computed by Algorithm 1 on input $A$ . Let $\mathcal{V}_{i}$ be the subspace spanned by the columns of $U_{i}$ and let $d_{i}$ be the dimension of $\mathcal{V}_{i}$ . The projection matrix onto $\mathcal{V}_{i}$ is $U_{i}U_{i}^{T}$ . Let $E_{i}$ be the ellipsoid $U_{i}(r_{i}B_{2}^{d_{i}})=F_{i}B_{2}^{d_{i}}$ ( $F_{i}$ is $r_{i}U_{i}$ ). By the definition of $r_{i}$ , $U_{i}^{T}K\subseteq r_{i}B_{2}^{d_{i}}$ , and therefore, $\Pi_{i}K\subseteq E_{i}$ . $\mathcal{M}_{g}(A,x)$ is distributed identically to $Ax+\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}}$ . Therefore, by Corollary 9, $\mathcal{M}_{g}(A,x)$ satisfies $(\varepsilon,\delta)$ -differential privacy. ∎
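The noise distribution described in this proof, $Ax+\sqrt{k}\sum_{i=1}^{k}{F_{i}w_{i}}$ with $F_{i}=r_{i}U_{i}$ and $r_{i}=\max_{j}\|U_{i}^{T}a_{j}\|_{2}$ , can be sketched as follows. The blocks $U_{i}$ are assumed to be supplied by the base decomposition (not computed here), matrices are lists of rows, and all function names are hypothetical; this is an illustration of the sampling step only, not a reproduction of Algorithm 2.

```python
import math
import random

def c_eps_delta(eps, delta):
    """The constant (1 + sqrt(2 ln(1/delta))) / eps."""
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps

def block_radius(U, A):
    """r_i = max_j ||U^T a_j||_2 over the columns a_j of A."""
    d, N, m = len(A), len(A[0]), len(U[0])
    best = 0.0
    for j in range(N):
        proj = [sum(U[r][t] * A[r][j] for r in range(d)) for t in range(m)]
        best = max(best, math.sqrt(sum(p * p for p in proj)))
    return best

def correlated_gaussian_mechanism(A, x, blocks, eps, delta, rng=random):
    """Return Ax + sqrt(k) * sum_i r_i U_i w_i with w_i ~ N(0, c)^{d_i}."""
    d, k = len(A), len(blocks)
    c = c_eps_delta(eps, delta)
    out = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    for U in blocks:
        m = len(U[0])
        r = block_radius(U, A)
        w = [rng.gauss(0.0, c) for _ in range(m)]
        for row in range(d):
            out[row] += math.sqrt(k) * r * sum(U[row][t] * w[t] for t in range(m))
    return out
```

Each subspace receives spherical noise scaled by its own radius $r_{i}$ , and the factor $\sqrt{k}$ pays for composing the $k$ blocks.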

For all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ , for all $i$ ,

where $r_{i}$ , $U_{i}$ and $w_{i}$ are as defined in Algorithm 2.

By (15), it is enough to lower bound $\operatorname{opt}_{\varepsilon,\delta}(A)$ by $\Omega(\frac{1}{\varepsilon^{2}}dr_{i}^{2})$ . As a first step we lower bound $\operatorname{specLB}(A)$ . Then the lower bound on $\operatorname{opt}_{\varepsilon,\delta}$ will follow from (6) and Lemma 3.

To lower bound $\operatorname{specLB}(A)$ , we invoke Lemma 5. It follows from the lemma that for every $i$ there exist a projection matrix $\Pi_{i}=W_{i}W_{i}^{T}$ and a set $S_{i}$ such that $\sigma_{\min}^{2}(\Pi_{i}A|_{S_{i}})=\Omega(r_{i}^{2})$ , and, furthermore, $|S_{i}|=\Omega(d_{i})$ . Substituting into the definition of $\operatorname{specLB}(A,d_{i})$ , we have that for all $i$ ,

Therefore, by (6), $\operatorname{opt}_{c_{1},c_{2}}(A,d_{i})=\Omega(d_{i}r_{i}^{2})$ for all $i$ . Finally, by Lemma 3, there exists a small enough $\delta=\delta(\varepsilon)$ , for which $\operatorname{opt}_{\varepsilon,\delta}(A,\frac{d_{i}}{\varepsilon})=\Omega(\frac{1}{\varepsilon^{2}}d_{i}r_{i}^{2})$ , and this completes the proof. ∎

Proof of Theorem 13: The proof of the theorem follows from Lemma 6 and Lemma 7. The privacy guarantee is direct from Lemma 6. Next we prove that the error of $\mathcal{M}_{g}$ is near optimal.

3 The Sparse Case: Least Squares Estimation

In this subsection we present an algorithm with stronger accuracy guarantees than Algorithm 2: it is optimal for any query matrix $A$ and any database size bound $n$ (Theorem 3). The algorithm combines the noise distribution of Algorithm 2 with a least squares estimation step. Privacy is guaranteed by noise addition, while the least squares estimation step reduces the error significantly when $n=o(d/\varepsilon)$ . The algorithm is shown as Algorithm 3.

To finish the proof of the lemma, we need to lower bound $\operatorname{opt}_{\varepsilon,\delta}(A,n)$ by $\Omega(\frac{1}{\varepsilon}nr_{i}^{2})$ . We will use Lemma 5 to lower bound $\operatorname{specLB}(A,\varepsilon n)$ by $\Omega(\varepsilon nr_{i}^{2})$ and then we will invoke Lemma 3 to get the right dependence on $\varepsilon$ .

By Lemma 5, for every $i$ there exists a projection matrix $\Pi_{i}=W_{i}W_{i}^{T}$ and a set $S_{i}$ such that $\sigma_{\min}^{2}(\Pi_{i}A|_{S_{i}})=\Omega(r_{i}^{2})$ , and, furthermore, $|S_{i}|=\Omega(d_{i})$ . By the definition of $t$ , for all $i\leq t$ , $d_{i}\geq\varepsilon n$ , and, therefore, $|S_{i}|=\Omega(\varepsilon n)$ . Take $T_{i}\subseteq S_{i}$ to be an arbitrary subset of $S_{i}$ of size $\Omega(\varepsilon n)$ . The smallest singular value of $\Pi_{i}A|_{S_{i}}$ is a lower bound on the smallest singular value of $\Pi_{i}A|_{T_{i}}$ :

where $\operatorname{supp}(x)$ is the subset of coordinates on which $x$ is nonzero. Therefore, $\sigma_{\min}^{2}(\Pi_{i}A|_{T_{i}})=\Omega(r_{i}^{2})$ . Substituting into the definition of $\operatorname{specLB}(A,\varepsilon n)$ , we have

Therefore, by (6), $\operatorname{opt}_{c_{1},c_{2}}(A,\varepsilon n)=\Omega(\varepsilon nr_{i}^{2})$ for all $i\leq t$ . Finally, by Lemma 3, there exists a small enough $\delta=\delta(\varepsilon)$ , for which $\operatorname{opt}_{\varepsilon,\delta}(A,n)=\Omega(\frac{n}{\varepsilon}r_{i}^{2})$ , and this completes the proof. ∎

We show that the first term on the right hand side of (16) is at most $O(\log^{3/2}d\sqrt{\log N\log(1/\delta)})\operatorname{opt}_{\varepsilon,\delta}(A,n)$ and the second term is $O(\log^{2}d\log(1/\delta))\operatorname{opt}_{\varepsilon,\delta}(A,n)$ .

Since, by the definition of $t$ , $\frac{d_{t+1}}{\varepsilon}<n$ , we have the desired bound.

The last equality follows from the definition of the dual norm $\|\cdot\|_{L^{\circ}}$ and from the fact that $L$ is a polytope with vertices $\{a_{j}\}_{j=1}^{N}$ , so any linear functional on $L$ is maximized at one of the vertices. From the fact that for all $i\neq j$ we have $U_{i}^{T}U_{j}=0$ , from the triangle inequality, and from Lemma 9, we derive

4 Computational Complexity

In this subsection we consider the computational complexity of our algorithms. We pay special attention to approximating the minimum enclosing ellipsoid of a polytope and computing least squares estimators. For both problems we need to go into the properties of known approximation algorithms in order to verify that the approximations are sufficient to guarantee that our algorithms can be implemented in polynomial time without hurting their near-optimality.

Computationally the most expensive step of the base decomposition algorithm (Algorithm 1) is computing the minimum enclosing ellipsoid $E$ of $K$ . Computing the exact MEE can be costly: the fastest known algorithms have complexity on the order of $d^{O(d)}N$ [AS93]. However, for our purposes it is enough to compute an approximation of $E$ in Banach-Mazur distance, i.e. some ellipsoid $E^{\prime}$ satisfying $\frac{1}{C}E^{\prime}\subseteq E\subseteq E^{\prime}$ for an absolute constant $C$ . Known approximation algorithms for MEE guarantee that their output is an enclosing ellipsoid with volume approximately equal to that of the MEE [Kha96, TY07]. It is not immediately clear whether such an approximation is also a Banach-Mazur approximation. However, we can use the fact that the algorithms in [Kha96, KY05, TY07] output an ellipsoid $E^{\prime}$ satisfying approximate complementary slackness conditions and show that $\Pi E^{\prime}$ approximates $\Pi E$ in Banach-Mazur sense for some projection $\Pi$ onto a subspace of dimension $\Omega(d)$ . This suffices for a slightly weaker version of Lemma 5.

We begin with a definition. We say that a vector $p\in[0,1]^{N}$ is $C$ -optimal for $A=(a_{i})_{i=1}^{N}$ if the following conditions are satisfied:

for all $i\in[N]$ , $a_{i}^{T}(APA^{T})^{-1}a_{i}\leq C\cdot d$ where $P=\operatorname{diag}(p)$ (we use this notation throughout this section).

$C$ -optimality implies the following property of the ellipsoid $E(p)$ , which is key to our analysis.

Let $E^{*}=F^{*}B_{2}^{d}$ be the minimum enclosing ellipsoid of $K=\operatorname{sym}\{a_{i}\}_{i=1}^{N}$ , and let $p$ be $C$ -optimal for $A=(a_{i})_{i=1}^{N}$ . Let also $E(p)=\{x:x^{T}(APA^{T})^{-1}x\leq Cd\}=F(p)B_{2}^{d}$ . Then,

Let $G=(F^{*})^{-1}$ . Since $GE^{*}=B_{2}^{d}$ , we have that the MEE of $GK$ is the unit ball, and, therefore, $\|Ga_{i}\|_{2}^{2}\leq 1$ for all $i\in[N]$ . Since $F(p)F(p)^{T}=Cd\cdot APA^{T}$ , we have

By Markov’s inequality, $\sigma_{d/4}^{2}(GF(p))\leq 4C$ . Let $\Pi_{1}$ be the projection operator onto the subspace spanned by the left singular vectors of $GF(p)$ corresponding to $\sigma_{d/4}(GF(p)),\ldots,\sigma_{d}(GF(p))$ . We have $\Pi_{1}GE(p)\subseteq 2C^{1/2}\Pi_{1}B_{2}^{d}$ . Multiplying on both sides by $F^{*}$ , we get

Let $\Pi_{2}$ be the matrix $\Pi_{2}=G^{-1}\Pi_{1}G=F^{*}\Pi_{1}G$ . Since $\Pi_{2}$ is similar to $\Pi_{1}$ , it is also a projection matrix onto a $3d/4$ dimensional subspace. We have that $F^{*}\Pi_{1}=\Pi_{2}F^{*}$ , and therefore

Define $H=4CF^{*}(F^{*})^{T}-F(p)F(p)^{T}$ . The inclusion above is equivalent to the positive semidefiniteness of the matrix $\Pi_{2}H\Pi_{2}^{T}$ . As $\Pi_{2}$ is a projection onto a $3d/4$ dimensional subspace, by the standard minimax characterization of eigenvalues we have $\lambda_{3d/4}(H)\geq 0$ . We recall the (dual) Weyl inequalities for symmetric $d\times d$ matrices $X$ and $Y$ :

The inequalities are standard and follow from the minimax characterization of eigenvalues and dimension counting arguments — see, e.g. Chapter 1 in [Tao12]. Substituting $X=H$ and $Y=F(p)F(p)^{T}$ , $i=3d/4$ and $j=d/2$ , we have the inequality

Finally we give an analogue of Lemma 5 for the variant of the base decomposition algorithm that uses an approximate MEE. The proof follows from Lemma 10 and the arguments used to prove Lemma 5. We omit a full proof here.

Consider a variant of Algorithm 1 that, at each step, uses $E(p)=\{x:x^{T}(APA^{T})^{-1}x\leq Cd\}=F(p)B_{2}^{d}$ , where $p$ is $O(1)$ -optimal for $A$ , rather than the minimum enclosing ellipsoid $E=FB_{2}^{d}$ . Let $d_{i}$ be the dimension of the span of $U_{i}$ . For any $i$ there exists a subspace $W_{i}$ of dimension $\Omega(d_{i})$ and a set $S_{i}\subseteq[N]$ of size $|S_{i}|=\Omega(d_{i})$ , such that $\sigma_{\min}^{2}(W_{i}W_{i}^{T}A|_{S_{i}})=\Omega(1)\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}^{2}}$ .

One can verify that in all our proofs we can substitute Lemma 11 for Lemma 5 without changing the asymptotics of our lower and upper bounds. Therefore, in all our algorithms we can use the variant of Algorithm 1 from Lemma 11 without compromising near-optimality. This variant of Algorithm 1 runs in time $d^{O(1)}N$ .

Notice that the base decomposition can be reused for different databases, as long as the query matrix $A$ stays unchanged; once the decomposition is computed the rest of the algorithm is very efficient: it involves some standard algebraic computations and sampling from an $O(d)$ -dimensional Gaussian distribution. Furthermore, any ellipsoid $E^{\prime}$ containing $K$ suffices for privacy, and one may use heuristic approximations to the MEE problem.

4.2 Least Squares Estimator

Apart from the base decomposition, the other potentially computationally expensive step in Algorithm 3 is the computation of a least squares estimator $\hat{y}_{1}$ . This is a quadratic minimization problem, and can be approximated by the simple Frank-Wolfe gradient descent algorithm [FW56]. In particular, for a point $\hat{y}^{\prime}$ such that $\|\hat{y}^{\prime}-y\|_{2}^{2}\leq\min_{\hat{y}\in L}{\|\hat{y}-y\|_{2}^{2}}+\alpha$ , Lemma 1 holds to within an additive approximation factor $\alpha$ , i.e. $\|\hat{y}^{\prime}-y\|_{2}^{2}\leq 4\|w\|_{L^{\circ}}+\alpha$ . We call such a point $\hat{y}^{\prime}$ an $\alpha$ additive approximation to the least squares estimator problem. By the analysis of Clarkson [Cla10], $T$ iterations of the Frank-Wolfe algorithm give an additive approximation where $\alpha\leq 4C(L)/(T+3)$ , for $C(L)\leq\sup_{u,v\in L}{|\langle u,u-v\rangle|}$ . In our case $L=nXX^{T}K$ . In order to have near optimality for Algorithm 3, an additive approximation $\alpha\leq\sum_{i=1}^{t}{nr_{i}^{2}}$ suffices. Using the triangle inequality and Cauchy-Schwarz, we can bound $C(L)$ for $L=nXX^{T}K$ as

Therefore, $T=O(n)$ iterations of the Frank-Wolfe algorithm suffice. Since each iteration of the algorithm involves $N$ dot product computations and solving a homogeneous linear system in at most $d$ variables and at most $d$ equations, it follows that an approximate version of Algorithm 3 with unchanged asymptotic optimality guarantees can be implemented in time $d^{O(1)}Nn$ .
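The least squares step can be illustrated with a plain Frank-Wolfe iteration over the polytope $L$ . The sketch below is a simplified variant, not the version analyzed above: it uses the textbook step size $2/(t+2)$ rather than solving a linear system in each iteration, and it takes $L$ to be $n$ times the symmetric convex hull of the columns of whatever matrix is passed in (so the caller would pass the projected matrix to obtain $L=nXX^{T}K$ ). The function names are hypothetical.

```python
def symmetric_vertices(A, n):
    """Vertices {+n a_j, -n a_j} of n * sym{a_1, ..., a_N}; A is a list of d rows."""
    d, N = len(A), len(A[0])
    cols = [[n * A[r][j] for r in range(d)] for j in range(N)]
    return cols + [[-v for v in col] for col in cols]

def frank_wolfe_least_squares(vertices, y, iterations):
    """Approximately minimize ||yhat - y||_2^2 over the convex hull of `vertices`."""
    yhat = list(vertices[0])
    for t in range(iterations):
        # Gradient of the squared distance at the current iterate.
        grad = [2.0 * (a - b) for a, b in zip(yhat, y)]
        # Linear minimization oracle: one dot product per vertex.
        s = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        gamma = 2.0 / (t + 2.0)
        yhat = [(1.0 - gamma) * a + gamma * b for a, b in zip(yhat, s)]
    return yhat
```

Each iteration scans all $2N$ vertices once, which matches the $N$ dot products per iteration mentioned above; the iterate always stays inside $L$ because it is a convex combination of vertices.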

We note that the approximation algorithm of Khachiyan for the MEE problem, as well as its modification in [TY07], can also be interpreted as instances of the Frank-Wolfe algorithm (see [TY07] for details).

Results for Pure Privacy

Our geometric approach to approximate privacy allows us to better understand the optimal error required for approximate privacy vs. that required for pure privacy. Our first result bounds the gap between the optimal error bounds for the two notions of privacy in the dense case. Then we extend these ideas and give an $(\varepsilon,0)$ -differentially private algorithm which nearly matches the guarantees of Algorithm 3 for sparse databases.

In this subsection we investigate the worst-case gap between $\operatorname{opt}_{\varepsilon,\delta}(A)$ (for small enough $\delta>0$ ) and $\operatorname{opt}_{\varepsilon,0}(A)$ over all query matrices $A$ . At the core of our analysis is a natural geometric fact: for any symmetric polytope $K$ with $N$ vertices in $d$ -dimensional space we can find a subset of $d$ vertices of $K$ whose symmetric convex hull has volume radius at most a factor $O(\sqrt{\log(N/d)})$ smaller than the volume radius of $K$ . Our proof of this fact goes through analyzing the contact points of $K$ with its minimum enclosing volume ellipsoid, and a bound on the volume of polytopes with few vertices.

Let $K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}\subseteq{\mathds{R}}^{d}$ and let $E$ be an ellipsoid of minimal volume containing $K$ . There exists a set $S\subseteq[N]$ of size $d$ such that the matrix $A|_{S}=(a_{i})_{i\in S}$ satisfies $\det(A|_{S})^{1/d}=\Omega(\operatorname{vrad}(E))$ .

For the proof of Theorem 15 we will use John’s theorem (Theorem 6) and the following elementary algebraic result.

Let $u_{1},\ldots,u_{m}$ be $d$ -dimensional unit vectors and let $c_{1},\ldots,c_{m}$ be positive reals such that

Then there exists a set $S\subseteq[m]$ of size $d$ such that the matrix $U=(u_{i})_{i\in S}$ satisfies $|\det(U)|^{1/d}=\Omega(1)$ .

Notice that ${\rm tr}(u_{i}u_{i}^{T})=\|u_{i}\|_{2}^{2}=1$ for all $i$ . By taking traces of both sides of (17), we have $\sum{c_{i}}=d$ .

Let $U=(u_{i})_{i=1}^{m}$ and let $C$ be the $m\times m$ diagonal matrix with $(c_{i})_{i=1}^{m}$ on the diagonal. Then we can write $I=\sum{c_{i}u_{i}u_{i}^{T}}=UCU^{T}$ . By the Binet-Cauchy formula for the determinant,

The inequality (18) follows since each term $\prod_{i\in S}{c_{i}}$ appears $d!$ times in the expansion of $(\sum{c_{i}})^{d}$ and all other terms in the expansion are positive. Using the inequality $d!\geq(d/e)^{d}$ , we have that $\max_{S\subseteq[m]:|S|=d}{\det(U|_{S})^{2/d}}\geq 1/e$ , and this completes the proof. ∎
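A small numerical check of the lemma, as a sanity test rather than part of the proof: in ${\mathds{R}}^{2}$ , three unit vectors at angles $0$ , $2\pi/3$ and $4\pi/3$ with weights $c_{i}=2/3$ satisfy $\sum{c_{i}u_{i}u_{i}^{T}}=I$ , and the best pair has $|\det|=\sin(2\pi/3)\approx 0.866\geq 1/e$ . The example and helper names are illustrative choices, and the determinant helper is restricted to $d=2$ .

```python
import itertools
import math

def weighted_outer_sum(vectors, weights):
    """Return sum_i c_i u_i u_i^T as a list of rows."""
    d = len(vectors[0])
    return [
        [sum(c * u[r] * u[s] for u, c in zip(vectors, weights)) for s in range(d)]
        for r in range(d)
    ]

def best_pair_determinant(vectors):
    """max over pairs of |det [u_i u_j]| for 2-dimensional vectors."""
    return max(
        abs(u[0] * v[1] - u[1] * v[0])
        for u, v in itertools.combinations(vectors, 2)
    )

angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
vectors = [(math.cos(a), math.sin(a)) for a in angles]
weights = [2.0 / 3.0] * 3
identity_check = weighted_outer_sum(vectors, weights)
best = best_pair_determinant(vectors)
```

With $d=2$ the quantity $\det(U|_{S})^{2/d}$ is simply $|\det(U|_{S})|$ , so `best` can be compared directly against the bound $1/e$ .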

Proof of Theorem 15: We can write the minimum enclosing ellipsoid $E$ as $\operatorname{vrad}(E)FB_{2}$ where $F$ is a linear map with determinant $1$ . Since $F^{-1}$ does not change volumes, $B_{2}$ is a minimal volume ellipsoid of $\operatorname{vrad}(E)^{-1}F^{-1}K$ . Also, for any $A|_{S}=(a_{i})_{i\in S}$ , where $S\subseteq[N]$ , we have

Therefore, it is sufficient to show that for $L=\operatorname{sym}\{u_{1},\ldots,u_{N}\}$ such that $B_{2}$ is the minimal volume ellipsoid of $L$ , there exists a set $S\subseteq[N]$ such that the matrix $U|_{S}=(u_{i})_{i\in S}$ satisfies $\det(U|_{S})^{1/d}=\Omega(1)$ . Since, by convexity, the contact points $L\cap B_{2}$ of $L$ are a subset of $u_{1},\ldots,u_{N}$ , the statement follows from Theorem 6 and Lemma 12. $\blacksquare$

Combined with the following theorem of Bárány and Füredi [BF88], and also Gluskin [Glu07] (with sharper bounds), Theorem 15 implies the corollary that for any $d$ -dimensional symmetric polytope one can find a set of $d$ vertices whose symmetric convex hull captures a significant fraction of the volume of the polytope.

Let $K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}$ and let $E$ be an ellipsoid containing $K$ . Then $\operatorname{vrad}(K)\leq O\left(\sqrt{\frac{\log(N/d)}{d}}\right)\operatorname{vrad}(E)$ .

For any $K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}$ there exists a set $S\subseteq[N]$ such that

Finally, we describe the application to differential privacy. By Corollary 1, $\operatorname{volLB}(A,\varepsilon)=O(\frac{1}{\varepsilon^{2}}\log(N/d))\operatorname{detLB}(A,d)$ . Also, by (11), $\operatorname{detLB}(A,d)=O(\log^{2}d)\operatorname{opt}_{c_{1},c_{2}}(A,d)$ . Moreover, Lemma 3 implies that $\operatorname{opt}_{c_{1},c_{2}}(A,d)\leq{\varepsilon^{2}}\operatorname{opt}_{\varepsilon,\delta}(A,d/\varepsilon)$ for $\delta$ small enough with respect to $\varepsilon$ . Putting all this together and using Theorem 12, we have the following theorem (Theorem 2).

For small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ , for any $d\times N$ real matrix $A$ we have

2 Sparse Case under Pure Privacy

We further extend our results from Section 3.3 and show an efficient $(\varepsilon,0)$ -differentially private algorithm which, on input any query matrix $A$ and any database size bound $n$ , nearly matches $\operatorname{opt}_{\varepsilon,0}(A,n)$ . This proves our main Theorem 4. In fact, our result is stronger: we show an $(\varepsilon,0)$ -differentially private mechanism whose error nearly matches $\operatorname{opt}_{\varepsilon,\delta}(A,n)$ for all $\delta$ small enough with respect to $\varepsilon$ . Thus, the result of this subsection can be seen as a generalization of Theorem 17 to the sparse databases regime.

Our algorithm for sparse databases under pure privacy closely follows Algorithm 3: we add noise from a distribution that is tailored to $A$ but oblivious to the database $x$ ; then we use least squares estimation to reduce error on sparse databases. However, Gaussian noise does not preserve $(\varepsilon,0)$ -differential privacy, and we need to use a different noise distribution. Intuitively, one expects that adding noise sampled from a near-optimal distribution [HT10, BDKT12] and then computing a least squares estimator would be nearly optimal, analogously to Algorithm 3. We are not currently able to analyze the error of this algorithm, but instead we analyze a variant of Algorithm 3 where the Gaussian distribution is simply substituted with the generalized $K$ -norm distribution from [HT10]. Intuitively, we are able to show that the generalized $K$ -norm distribution “approximates a Gaussian” well enough for our analysis to go through. A main tool in our analysis is a classical concentration of measure inequality from convex geometry.

We begin with a slight generalization of the main upper bound result of Hardt and Talwar [HT10]. This generalization follows directly from the methods used in [HT10] with only minor modifications in the proofs. We omit a full derivation here. Also, while the methods of Hardt and Talwar will lead to a proof conditional on the truth of the Hyperplane conjecture from convex geometry, using the ideas of Bhaskara et al. [BDKT12] the result can be made unconditional.

Let $A=(a_{i})_{i=1}^{N}$ be a $d\times N$ real matrix and let $K=\operatorname{sym}\{a_{i}\}_{i=1}^{N}$ . There exists an efficiently computable and efficiently sampleable distribution $\mathcal{W}(A,\varepsilon)$ such that the following claims hold:

the algorithm $\mathcal{M}_{K}$ which on input $x$ outputs $Ax+w$ for $w\sim\mathcal{W}(A,\varepsilon)$ satisfies $(\varepsilon,0)$ -differential privacy;

Using the distribution $\mathcal{W}(A,\varepsilon)$ , we define our near optimal sparse-case algorithm satisfying pure differential privacy as Algorithm 4.

Let $\mathcal{M}_{p}(A,x,n)$ be the output of Algorithm 4 on input a $d\times N$ query matrix $A$ , database size bound $n$ and private input $x$ . $\mathcal{M}_{p}(A,x,n)$ is $(\varepsilon,0)$ -differentially private and for all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ satisfies

Once again privacy follows by a straightforward argument from the privacy of the underlying noise-adding mechanism, in this case the generalized $K$ -norm mechanism.

$\mathcal{M}_{p}(A,x,n)$ satisfies $(\varepsilon,0)$ -differential privacy.

Next we prove the main technical lemma we need in order to show near optimality. The analysis is very similar to that of Lemma 9. The main technical challenge is to show that the distribution $\mathcal{W}$ has all the properties we needed from the Gaussian distribution: covariance with bounded operator norm and exponential concentration. We use ideas from Section 4.1 and the following variant of a classical concentration of measure inequality, due to Borell [Bor75] (proved in the appendix).

Let $\mu$ be a log-concave distribution over ${\mathds{R}}^{d}$ . Assume that $A$ is a symmetric convex subset of ${\mathds{R}}^{d}$ such that $\mu(A)=\theta\geq\frac{2}{3}$ . Then, for every $t>1$ we have

We are now ready to prove the counterpart of Lemma 9 for $\mathcal{M}_{p}$ .

As in Algorithm 3, we define $r_{i}=\max_{j=1}^{N}{\|U_{i}^{T}a_{j}\|_{2}}$ . Equivalently $r_{i}$ is the radius of the smallest $d_{i}$ -dimensional ball which contains $U_{i}^{T}K$ . In the proof of Lemma 9 we argued that $\operatorname{opt}_{\varepsilon,\delta}(A,n)=\Omega(\frac{n}{\varepsilon}r_{i}^{2})$ for all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ .

Therefore, for any $a_{j}$ , we can derive the following bound:

Thus, applying the Cauchy-Schwarz inequality once again, we conclude

For any $j$, the set $\{w:|\langle U_{i}^{T}a_{j},\sum_{i}w_{i}\rangle|\leq T\}$ is symmetric and convex for any bound $T$. Then, by Chebyshev’s inequality and Theorem 20, there exists a constant $C$ such that for any $i,j$ and $\alpha>2$

Using a union bound and taking expectations completes the proof. ∎

Proof of Theorem 19: The privacy guarantee follows from Lemma 13. By Lemma 14, analogously to the proof of Theorem 14, we can conclude that

The Pythagorean theorem then implies the result. $\blacksquare$

Universal bounds

The mechanism $\mathcal{M}_{s}$ of Algorithm 5 is $(\varepsilon,\delta)$ -differentially private and satisfies

The privacy of the mechanism is immediate from Lemma 4. To analyze the error, we use Lemma 1 and the fact that $L=nK$ to bound

where $\{a_{j}\}_{j=1}^{N}$ are the columns of $A$. We have used the fact that $\|w\|_{K^{\circ}}=\max_{a\in K}\langle a,w\rangle$ is attained at one of the vertices of $K$. Since each $\langle a_{j},w\rangle$ is a Gaussian with variance $r^{4}c(\varepsilon,\delta)^{2}$, $|\langle a_{j},w\rangle|$ exceeds $r^{2}c(\varepsilon,\delta)\sqrt{t\log N}$ with probability at most $\frac{1}{N^{t}}$. Taking a union bound over the $N$ columns, we conclude that the expectation of this maximum is $O(r^{2}c(\varepsilon,\delta)\sqrt{\log N})$. Recall that $r=\max_{j=1}^{N}{\|a_{j}\|_{2}}\leq\sqrt{d}$. It follows that

To obtain pure $\varepsilon$-DP, we simply substitute the generalized $K$-norm distribution guaranteed by Theorem 18 for the Gaussian noise.
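The Gaussian maximum bound used in the error analysis above can be checked numerically. The following is a minimal sketch, not the mechanism itself: it assumes a random 0/1 query matrix, uses a generic noise scale `sigma` in place of $r\,c(\varepsilon,\delta)$, and compares a Monte Carlo estimate of $\mathbb{E}\max_{j}|\langle a_{j},w\rangle|$ against the standard bound $\sigma r\sqrt{2\log(2N)}$. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def gaussian_max_bound(columns, sigma):
    """Standard bound sigma * r * sqrt(2 log(2N)) on E max_j |<a_j, w>|,
    where r is the largest Euclidean norm of a column."""
    r = max(math.sqrt(sum(v * v for v in a)) for a in columns)
    return sigma * r * math.sqrt(2.0 * math.log(2 * len(columns)))

def empirical_max(columns, sigma, trials, rng):
    """Monte Carlo estimate of E max_j |<a_j, w>| for w ~ N(0, sigma^2 I)."""
    d = len(columns[0])
    total = 0.0
    for _ in range(trials):
        w = [rng.gauss(0.0, sigma) for _ in range(d)]
        total += max(abs(sum(a_i * w_i for a_i, w_i in zip(a, w)))
                     for a in columns)
    return total / trials

# Illustrative parameters (assumptions, not taken from the paper).
rng = random.Random(0)
d, N = 20, 200
columns = [[rng.randint(0, 1) for _ in range(d)] for _ in range(N)]
estimate = empirical_max(columns, 1.0, 300, rng)
bound = gaussian_max_bound(columns, 1.0)
```

The estimate grows like $\sqrt{\log N}$ rather than polynomially in $N$, which is the feature of the analysis that keeps the error small in the regime $N\gg d$.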

We first observe an upper bound on the volume radius of projections of $K$ .

Let $A\in{\mathds{R}}^{d\times N}$ and let $K=\operatorname{sym}\{a_{1},\ldots,a_{N}\}$. Let $\Pi^{(k)}$ be a rank $k$ orthogonal projection that maps ${\mathds{R}}^{d}$ to ${\mathds{R}}^{k}$. Then

Now we show that Algorithm 6 achieves the bound claimed in Theorem 5.

The mechanism $\mathcal{M}_{sp}$ of Algorithm 6 is $(\varepsilon,0)$ -differentially private and satisfies

Extensions

In this section we describe a couple of extensions to our results. We show how to translate our optimality guarantees for total squared error in the dense case regime to worst case error over queries using the minimax theorem. We further show that our nearly optimal efficient mechanism in the dense case regime implies a polylogarithmic approximation to hereditary discrepancy.

Let $A\in{\mathds{R}}^{d\times N}$ , $\varepsilon,\delta$ be given. There is an $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}_{we}$ , and a non-negative diagonal matrix $P$ with ${\rm tr}(P^{2})=1$ such that for all $i\in[d]$ ,

Thus the mechanism $\mathcal{M}_{g}^{PA}$ satisfies

Recall that the mechanism $\mathcal{M}_{g}^{PA}$ is of the form $Ax+w$ where $w$ is a noise vector whose distribution is independent of $x$. Thus the pair $(\mathcal{D},P)$ can be computed without looking at the database $x$. Therefore, the mechanism $\mathcal{M}_{we}$ that samples a mechanism $\mathcal{M}$ from $\mathcal{D}$ and runs it on $x$ is itself $(\varepsilon,\delta)$-DP. The theorem follows. ∎

Using the above result, we can now construct a mechanism that has small expected worst case error.

Let $A\in{\mathds{R}}^{d\times N}$ , $\varepsilon,\delta$ be given. There is an $(\varepsilon,\delta)$ -DP mechanism $\mathcal{M}_{ew}$ , and a non-negative diagonal matrix $P$ with ${\rm tr}(P^{2})=1$ such that

The resulting mechanism is not necessarily $(\varepsilon,\delta)$ -differentially private any more. However, since its outcome can be computed by postprocessing the outcome of $L$ $(\varepsilon,\delta)$ -differentially private mechanisms, it is still $(L\varepsilon,L\delta)$ -differentially private. Scaling $\varepsilon$ and $\delta$ by a factor of $L$ , and substituting for $\beta$ , we get the result. ∎

Approximating Hereditary Discrepancy

Muthukrishnan and Nikolov [MN12] show that hereditary discrepancy gives a lower bound on the error of any mechanism.

There exist constants $\varepsilon,\delta$ such that the following holds: Let $A$ be a $d\times N$ matrix and $\mathcal{M}$ be an $(\varepsilon,\delta)$-differentially private mechanism. Then

On the other hand, the mechanism of Theorem 24 satisfies

The description of this mechanism $\mathcal{M}_{ew}$ (which is specified by a distribution over $O(\log d)$ Gaussian noise addition mechanisms) thus serves as an efficiently computable and verifiable witness to an upper bound on the hereditary discrepancy of $A$ .

We next give a constructive version of this result, by appealing to a result of Bansal [Ban10] that shows that the discrepancy of any set system can be constructively upper bounded by $O(\log dN)$ times the hereditary vector discrepancy. The vector discrepancy of a matrix $A$, denoted $\operatorname{vecdisc}(A)$, is defined as the smallest $\lambda$ such that the following semidefinite program is feasible:

The hereditary vector discrepancy is simply $\operatorname{hervecdisc}(A)\triangleq\max_{S\subseteq[N]}\operatorname{vecdisc}(A|_{S})$ . We will show that given the mechanism $\mathcal{M}_{ew}$ of Theorem 24, we can construct for any $S$ a solution to the SDP corresponding to $S$ . We note that the above SDP is a slight relaxation of the SDP used by Bansal: instead of the constraint $V_{jj}=1$ for all $j$ , we simply require each $V_{jj}$ to be at most one, and that the trace of $V$ is $\Omega(m)$ . It is easy to verify that [Ban10] implies that:

Suppose that for a matrix $A\in{\mathds{R}}^{d\times N}$ and a $\lambda\geq 0$, for every $S\subseteq[N]$ with $\operatorname{rank}(A|_{S})=|S|$, it is the case that the SDP 20 is feasible for $A|_{S}$. Then there is a polynomial time algorithm that finds a coloring $\chi$ of $[N]$ with discrepancy at most $O(\lambda\log dN)$.

We consider a variant of the above SDP, where we drop the $V_{jj}\leq 1$ constraint and require the trace of $V$ to be slightly larger:

We first show that it suffices to satisfy the SDP 21.

Suppose that for a matrix $A\in{\mathds{R}}^{d\times N}$ , for every $S\subseteq[N]$ with $rank(A|_{S})=|S|$ , it is the case that the SDP 21 is feasible for $A|_{S}$ and $\lambda$ . Then for any $S\subseteq[N]$ with $rank(A|_{S})=|S|$ , the SDP 20 is feasible for $A|_{S}$ and $2\lambda$ .

We construct a feasible solution to 20 by repeatedly using solutions to 21 for different values of the restriction $A|_{S}$. Fix an $S\subseteq[N]$ with $|S|=m$ and let $S_{0}=S$. Let $W^{0}$ be the $m\times m$ zero matrix with rows and columns indexed by $S$. Let $V^{0}$ be a solution to SDP 21 for $A|_{S_{0}}$. Let $\gamma_{0}=\max_{j}V^{0}_{jj}$, and let $\bar{V}^{0}=V^{0}/2\gamma_{0}$. For each $j,j^{\prime}\in S_{0}$, set $W^{1}_{jj^{\prime}}=W^{0}_{jj^{\prime}}+\bar{V}^{0}_{jj^{\prime}}$. Finally we set $S_{1}=S_{0}\setminus\{j:W^{1}_{jj}\geq\frac{1}{4}\}$.

Given $S_{i}$ such that $|S_{i}|\geq m/2$, we let $V^{i}$ be a feasible solution to the SDP for $A|_{S_{i}}$ and set $\gamma_{i}=\max_{j}V^{i}_{jj}$, and $\bar{V}^{i}=V^{i}/2\gamma_{i}$. For each $j,j^{\prime}\in S_{i}$, set $W^{i+1}_{jj^{\prime}}=W^{i}_{jj^{\prime}}+\bar{V}^{i}_{jj^{\prime}}$. We update $S_{i+1}=S_{i}\setminus\{j:W^{i+1}_{jj}\geq\frac{1}{4}\}$. We stop once $|S_{i}|$ falls below $m/2$, say in iteration $L$. The index $j$ attaining the maximum $\gamma_{i}$ has $\bar{V}^{i}_{jj}=\frac{1}{2}\geq\frac{1}{4}$, so we delete at least one $j$ from $S_{i}$ in each step, and the process terminates.

We claim that $W^{L}$ is a feasible solution to SDP 20. Observe that $W^{L}$ is simply a non-negative linear combination of $V^{i}$’s and hence is positive semidefinite. By definition of $\bar{V}$, each diagonal entry is at most $\frac{1}{2}$, so that the definition of $S_{i}$ ensures that $W^{L}_{jj}\leq\frac{3}{4}<1$ for all $j$. Moreover, for each $j\in S\setminus S_{L}$, the entry $W^{L}_{jj}\geq\frac{1}{4}$, and there are at least $m/2$ such $j$, so that the trace of $W^{L}$ is at least $m/8$ as required. Finally, $a^{(i)}W^{L}a^{(i)T}$ is simply the sum $\sum_{t=0}^{L-1}\frac{a^{(i)}V^{t}a^{(i)T}}{2\gamma_{t}}$. Thus it suffices to show that $\sum_{t=0}^{L-1}(2\gamma_{t})^{-1}\leq 2$. Note that ${\rm tr}(W^{t+1}-W^{t})={\rm tr}(\bar{V}^{t})\geq(2\gamma_{t})^{-1}|S_{t}|\geq(2\gamma_{t})^{-1}m/2$. It follows that $m\geq{\rm tr}(W^{L})\geq\sum_{t=0}^{L-1}(2\gamma_{t})^{-1}m/2$ so that $\sum_{t=0}^{L-1}(2\gamma_{t})^{-1}\leq 2$ as needed. The claim follows. ∎
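The bookkeeping in this proof can be sketched in a few lines of code. This is only an illustration under stated assumptions: `solve_relaxed_sdp` is a hypothetical oracle standing in for a solver of SDP 21 (it is only assumed to return a positive semidefinite matrix with trace at least $|S_{i}|$), and `toy_oracle` is a made-up diagonal example that makes no attempt to satisfy the discrepancy constraints. The sketch checks the three invariants used above: diagonal entries at most $\frac{3}{4}$, trace at least $m/8$, and $\sum_{t}(2\gamma_{t})^{-1}\leq 2$.

```python
def combine_sdp_solutions(m, solve_relaxed_sdp):
    """Accumulate scaled solutions V^i into W, following the iteration above.

    solve_relaxed_sdp(S) is a hypothetical oracle: given a list S of indices,
    it returns a |S| x |S| PSD matrix (list of lists) with trace >= |S|.
    """
    W = [[0.0] * m for _ in range(m)]
    S = list(range(m))
    inv_scales = []  # records 1 / (2 * gamma_i) for each iteration
    while S and 2 * len(S) >= m:
        V = solve_relaxed_sdp(S)
        gamma = max(V[a][a] for a in range(len(S)))
        scale = 1.0 / (2.0 * gamma)
        inv_scales.append(scale)
        # Add the scaled solution, embedded into the coordinates indexed by S.
        for a, j in enumerate(S):
            for b, k in enumerate(S):
                W[j][k] += scale * V[a][b]
        # Drop every index whose accumulated diagonal mass reached 1/4.
        S = [j for j in S if W[j][j] < 0.25]
    return W, S, inv_scales

def toy_oracle(S):
    """Made-up diagonal PSD matrix with trace exactly |S| (an assumption)."""
    k = len(S)
    total = sum(range(1, k + 1))
    return [[k * (a + 1) / total if a == b else 0.0 for b in range(k)]
            for a in range(k)]

m = 10
W, S_final, inv_scales = combine_sdp_solutions(m, toy_oracle)
trace_W = sum(W[j][j] for j in range(m))
```

With a real SDP 21 solver in place of `toy_oracle`, the same loop would yield the matrix $W^{L}$ of the proof; the invariants hold for any oracle meeting the stated trace assumption.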

Finally, we describe how to use the mechanism $\mathcal{M}_{ew}$ of Theorem 24 to construct a feasible solution to SDP 21.

Conclusions

We have presented near optimal mechanisms for any linear query for dense and sparse databases, under both pure and approximate differential privacy. Our mechanisms are simple and efficient, and it would be instructive to implement them so as to compare them with existing techniques.

Acknowledgements

The first and second named authors would like to thank Moritz Hardt and Daniel Dadush for several helpful discussions. The first named author was partially supported by a Simons Graduate Fellowship, award number 252861.

References

Appendix A Concentration of Log concave measures

The following measure concentration inequality is standard. We include a proof below for completeness. We start by stating the Brunn-Minkowski inequality.

Let $\mu$ be a log-concave measure on ${\mathds{R}}^{d}$, let $\alpha,\beta\geq 0$ be numbers such that $\alpha+\beta=1$, and let $A,B\subseteq{\mathds{R}}^{d}$ be measurable sets such that the set $\alpha A+\beta B$ is measurable. Then

This can be used to prove Borell’s lemma for arbitrary log concave distributions. We use the proof approach presented in [Gia03].

Applying the Brunn-Minkowski inequality and rearranging proves the result. $\blacksquare$

Appendix B From Concentration to Expectation

We repeatedly use the fact that an exponential or better bound on the upper tail implies a bound on the expectation. For completeness, we give a quantitative version with a proof.

The claim follows by linearity of expectation. ∎
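As a small numerical illustration of this principle, the sketch below assumes a tail bound of the form ${\rm Pr}[X>t]\leq C e^{-(t-a)/b}$ for a non-negative random variable $X$, with $a\geq 0$ and $C\geq 1$. This specific form and the parameter values are assumptions made for the example, not the statement proved above. Since $\mathbb{E}[X]=\int_{0}^{\infty}{\rm Pr}[X>t]\,dt$, integrating $\min(1,Ce^{-(t-a)/b})$ gives a bound on the expectation, which is in turn at most $a+Cb$.

```python
import math

def tail_integral(a, C, b, step=1e-3, horizon=200.0):
    """Left Riemann sum of min(1, C * exp(-(t - a) / b)) over [0, horizon].

    For a non-negative X with Pr[X > t] <= C * exp(-(t - a) / b), this
    integral upper-bounds E[X] (up to discretization and truncation error).
    """
    steps = int(horizon / step)
    return sum(min(1.0, C * math.exp(-(k * step - a) / b)) * step
               for k in range(steps))

def closed_form_bound(a, C, b):
    """Cruder bound: integrate 1 on [0, a] and the exponential tail beyond a."""
    return a + C * b

# Illustrative parameters (assumptions).
value = tail_integral(2.0, 3.0, 1.5)
bound = closed_form_bound(2.0, 3.0, 1.5)
```

For $C\geq 1$ the integral has the exact value $a+b\ln C+b$, so the closed form $a+Cb$ loses only the gap between $1+\ln C$ and $C$.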

Appendix C Hardness of Approximating Hereditary Discrepancy

The proof of Theorem 29 is a straightforward reduction from the 2-colorability problem for 3-uniform hypergraphs. A maximization version of this problem (i.e. maximize the number of bichromatic edges) is also known as Max-E3-Set Splitting and is equivalent to NotAllEqual-SAT restricted to inputs with no negated variables.

A hypergraph $H=(V,E)$ , where $E\subseteq 2^{V}$ , is 2-colorable if and only if there exists a set $T\subseteq V$ such that for all $e\in E$ , $T\cap e\neq e$ and $T\cap e\neq\emptyset$ . The set $T$ is called a transversal of $H$ .

There exists a family of 3-uniform hypergraphs such that deciding whether a hypergraph in the family is 2-colorable is $\mathsf{NP}$ -complete.

Proof of Theorem 29: The reduction simply maps a 3-uniform hypergraph to its incidence matrix. That is, for a hypergraph $H=(V,E)$, where $V=\{v_{1},\ldots,v_{n}\}$ and $E=\{e_{1},\ldots,e_{m}\}$, we create an $m\times n$ matrix $A$, where $A_{ij}=1$ if $v_{j}\in e_{i}$ and $A_{ij}=0$ otherwise. Observe that if $H$ is 2-colorable, and this is witnessed by a transversal $T$, then $\|(A|_{S})x\|_{\infty}\leq 2$ for all $S\subseteq[n]$ and for $x$ defined by $x_{i}=+1\Leftrightarrow v_{i}\in T$. On the other hand, if $H$ is not 2-colorable, for any $x\in\{+1,-1\}^{n}$ we have $\|Ax\|_{\infty}\geq 3$, since otherwise the set $T=\{v_{i}:x_{i}=+1\}$ would be a transversal. $\blacksquare$
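The reduction can be illustrated on two small instances. The sketch below is a brute-force check, exponential in $n$ and intended only for tiny examples: it builds the incidence matrix and computes $\min_{x\in\{+1,-1\}^{n}}\|Ax\|_{\infty}$. The Fano plane, a standard example of a 3-uniform hypergraph that is not 2-colorable, and a two-edge 2-colorable hypergraph are chosen here for illustration and do not appear in the proof; the function names are likewise assumptions.

```python
from itertools import product

def incidence_matrix(n, edges):
    """Rows are edges, columns are vertices 0..n-1; entry 1 iff vertex in edge."""
    return [[1 if j in e else 0 for j in range(n)] for e in edges]

def min_discrepancy(A):
    """Brute-force min over sign vectors x of the largest |<row, x>|."""
    n = len(A[0])
    return min(
        max(abs(sum(a * x for a, x in zip(row, signs))) for row in A)
        for signs in product((1, -1), repeat=n)
    )

# Fano plane: 7 points, 7 lines, every pair of points lies on exactly one line.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
        {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
# A small 2-colorable 3-uniform hypergraph.
path = [{0, 1, 2}, {2, 3, 4}]

disc_fano = min_discrepancy(incidence_matrix(7, fano))
disc_path = min_discrepancy(incidence_matrix(5, path))
```

Every coloring of the Fano plane leaves some line monochromatic, so its value is 3, while the 2-colorable instance attains 1, matching the gap the reduction relies on.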

We note that Guruswami proved constant hardness of approximation for Max-E3-Set Splitting [Gur00]. In particular, he showed that it is $\mathsf{NP}$ -hard to distinguish 2-colorable 3-uniform hypergraphs from hypergraphs for which any coloring with 2 colors leaves at least a $1/20$ fraction of the edges monochromatic.