The Geometry of Differential Privacy: the Sparse and Approximate Cases
Aleksandar Nikolov, Kunal Talwar, Li Zhang
Introduction
Differential privacy [DMNS06] is a recent privacy definition that has quickly become the standard notion of privacy in statistical databases. Informally, a mechanism (a randomized function on databases) satisfies differential privacy if the distribution of the outcome of the mechanism does not change noticeably when one individual’s input to the database is changed. Privacy is measured by how small this change must be: an -differentially private (-DP) mechanism satisfies for any pair of neighboring databases, and for any measurable subset of the range. A relaxation of this definition is approximate differential privacy. A mechanism is -differentially private (-DP) if with as before. Here is thought of as negligible in the size of the database. Both these definitions satisfy several desirable properties such as composability, and are resistant to post-processing of the output of the mechanism.
We think of the database as being given by a multiset of database rows, one for each individual. We will let denote the size of the universe that these rows come from, and we will denote by the number of individuals in the database. We can represent the database as its histogram with denoting the number of occurrences of the th element of the universe. Thus would in fact be a vector of non-negative integers with . We will be concerned with reporting reasonably accurate answers to a given set of linear queries over this histogram . This set of queries can naturally be represented by a matrix with the vector giving the correct answers to the queries. When , we call such queries counting queries. We are interested in the (practical) regime where , although our results hold for all settings of the parameters.
Specific sets of counting queries of interest can however admit much better mechanisms than adversarially chosen queries for which the lower bounds are shown. Indeed several classes of specific queries have attracted attention. Some, such as range queries, are “easier”, and asymptotically better mechanisms can be designed for them. Others, such as constant dimensional contingency tables, are nearly as hard as general counting queries, and asymptotically better mechanisms can be ruled out in some ranges of the parameters. These query-specific upper bounds are usually proved by carefully exploiting the structure of the query, and query-specific lower bounds have been proved by reconstruction attacks that exploit a lower bound on the smallest singular value of an appropriately chosen [DN03, DMT07, DY08, KRSU10, De12, KRS13]. It is natural to address this question in a competitive analysis framework: can we design an efficient algorithm that given any query , computes (even approximately) the minimum error differentially private mechanism for ?
Hardt and Talwar [HT10] answered this question in the affirmative for -DP mechanisms, and gave a mechanism that has error within factor of the optimal assuming a conjecture from convex geometry known as the hyperplane conjecture or the slicing conjecture. Bhaskara et al. [BDKT12] removed the dependence on the hyperplane conjecture and improved the approximation ratio to . Can relaxing the privacy requirement to -DP help with accuracy? In many settings, -DP mechanisms can be simpler and more accurate than the best known -DP mechanisms. This motivates the first question we address.
Given , can we efficiently approximate the optimal error -DP mechanism for it?
Hardt and Talwar [HT10] showed that for some , the lower bound for -DP mechanism can be larger than known -DP mechanisms. For non-linear Lipschitz queries, De [De12] showed that this gap can be as large as (even when ). This leads us to ask:
How large can the gap between the optimal -DP mechanism and the optimal -DP mechanism be for linear queries?
Given and , can we approximate the optimal sparse case error -DP mechanism for when restricted to databases of size at most ?
Given and , can we approximate the optimal sparse case error -DP mechanism for when restricted to databases of size at most ?
In this paper, we are interested in both the case when , called the dense case, and when for , called the sparse case. We also write and .
Our first result is a simple and efficient mechanism that for query matrix gives an approximation to the optimal error.
Given a query matrix , there is an efficient -DP mechanism and an efficiently computable lower bound such that
, and
for any -DP mechanism , .
We also show that the gap of between -DP and -DP mechanisms shown in [HT10] is essentially the worst possible, within factor, for linear queries. More precisely, the lower bound on -DP mechanisms used in [HT10] is always within of the lower bound computed by our algorithm above. Let denote the -DP generalized -norm mechanism in [HT10].
For any -DP mechanism , .
We next move to the sparse case. Here we give results analogous to the dense case with a slightly worse approximation ratio.
Given and a bound , there is an efficient -DP mechanism and an efficiently computable lower bound such that
, and
For any -DP mechanism , .
Given and a bound , there is an efficient -DP mechanism and an efficiently computable lower bound such that
, and
For any -DP mechanism , .
We remark that in these theorems, our upper bounds hold for all with , whereas the lower bounds hold even when is an integer vector.
The -DP mechanism of Theorem 3 when run on any counting query has error no larger than the best known bounds [GRU12] for counting queries, up to constants (not ignoring logarithmic factors). The -DP mechanism of Theorem 4 when run on any counting query can be shown to have nearly the same asymptotics, answering question 5 in the affirmative.
We will summarize some key ideas we use to achieve these results. More details will follow in Section 1.2.
For the upper bounds, the first crucial step is to decompose into “geometrically nice” components and then add Gaussian noise to each component. This is similar to the approach in [HT10, BDKT12] but we use the minimum volume enclosing ellipsoid, rather than the -ellipsoid used in those works, to facilitate the decomposition process. This allows us to handle the approximate and the sparse cases. In addition, it simplifies the mechanism as well as the analysis. For the sparse case, we further couple the mechanism with least squares estimation of the noisy answer with respect to . By utilizing techniques from statistical estimation, we can show that this process can reduce the error when , and prove an error upper bound dependent on the size of the smallest projection of .
For the lower bounds, we first lower bound the accuracy of -DP mechanism by the hereditary discrepancy of the query matrix , which we in turn lower bound in terms of the least singular values of submatrices of . Finally, we close the loop by utilizing the restricted invertibility principle by Bourgain and Tzafriri [BT87] and its extension by Vershynin [Ver01] which, informally, shows that if there does not exist a “small” projection of then has a “large” submatrix with a “large” least singular value.
The discrepancy of a matrix is defined to be . The hereditary discrepancy of a matrix is defined as , where denotes the matrix restricted to the columns indexed by .
As hereditary discrepancy is a maximum over exponentially many submatrices, it is not a priori clear if there even exists a polynomial-time verifiable certificate for low hereditary discrepancy. Additionally, we can show that it is -hard to approximate hereditary discrepancy to within a factor of . Bansal [Ban10] gave a pseudo-approximation algorithm for hereditary discrepancy, which efficiently computes a coloring of discrepancy at most a factor of larger than for a matrix . His algorithm allows efficiently computing a lower bound on for any restriction ; however, such a lower bound may be arbitrarily loose, and before our work it was not known how to efficiently compute nearly matching lower and upper bounds on .
2 Techniques
In addition to known techniques from the differential privacy literature, our work borrows tools from discrepancy theory, convex geometry and statistical estimation. We next briefly describe how they fit in.
Central to designing a provably good approximation algorithm is an efficiently computable lower bound on the optimum. Muthukrishnan and Nikolov [MN12] proved that (a slight variant of) the hereditary discrepancy of leads to a lower bound for the error of any -DP mechanism. Lovász, Spencer and Vesztergombi [LSV86] showed that hereditary discrepancy itself can be lower bounded by a quantity called the determinant lower bound. Geometrically, this lower bound corresponds to picking the columns of that (along with the origin) give us a simplex with the largest possible volume. The volume or this simplex, appropriately normalized, gives us a lower bound on OPT. More precisely for any simplex , gives a lower bound on the error. The factor can be removed by using a lower bound based on the least singular values of submatrices of . Geometrically, for the least singular value lower bound we need to find a simplex of large volume whose non-zero vertices are also nearly pairwise orthogonal.
If the columns of all lie in a unit ball of radius , it can be shown that adding Gaussian noise proportional to suffices to guarantee -DP, resulting in a mechanism having total squared error . Can we relate this quantity to the lower bound? It turns out that if the unit ball of radius is the minimum volume ellipsoid containing the columns of , this can be done. In this case, a result of Vershynin [Ver01], building on the restricted invertability results by Bourgain and Tzafriri [BT87], tells us that one can find vertices of that touch the minimum containing ellipsoid, and are nearly orthogonal. The simplex formed by these vertices therefore has large volume, giving us a -DP lower bound of . In this case, the Gaussian mechanism with the optimal is within a constant factor of the lower bound. When the minimum volume enclosing ellipsoid is not a ball, we need to project the query along the shortest axes of this ellipsoid, answer this projection using the Gaussian mechanism, and recurse on the orthogonal projection. Using the full power of the restricted invertability result by Vershynin allows us to construct a large simplex and prove our competitive ratio.
Hardt and Talwar [HT10] also used a volume based lower bound, but for -DP mechanisms, one can take , the symmetric convex hull of all the columns of and use its volume instead of the volume of in the lower bound above. How do these lower bounds compare? By a result of Bárány and Füredi [BF88] and Gluskin [Glu07], one can show that the volume of the convex hull of points can be bounded by times that of the minimum enclosing ellipsoid. This, along with the aforementioned restricted invertability results, allows us to prove that the -DP lower bound is within of the -DP lower bound.
How do we handle sparse queries? The first observation is that the lower bounding technique gives us columns of and the resulting lower bound holds not just for but even for the submatrix of corresponding to the maximum volume simplex ; moreover, the lower bound holds even when all databases are restricted to individuals. Thus the lower bound holds when and this value marks the transition between the sparse and the dense cases. Moreover, when the minimum volume ellipsoid containing the columns of is a ball, the restricted invertibility principle of Bourgain and Tzafriri and Vershynin gives us a -dimensional simplex with nearly pairwise orthogonal vertices, and, therefore any -dimensional face of this simplex is another simplex of large volume. The large -dimensional simplex gives a lower bound on error when databases are restricted to have at most individuals.
To get an -DP mechanism, we use the -norm mechanism [HT10] instead of Gaussian noise. To bound the shadow of on , where is the noise vector generated by the -norm mechanism, we first analyze the expectation of for any column of , and we use the log concavity of the noise distribution to prove concentration of this random variable. A union bound helps complete the argument as in the Gaussian case.
3 Related Work
Dwork et al. [DMNS06] showed that any query can be released while adding noise proportional to the total sensitivity of the query. This motivated the question of designing mechanisms with good guarantees for any set of low sensitivity queries. Nissim, Raskhodnikova and Smith [NRS07] showed that adding noise proportional to (a smoothed version of) the local sensitivity of the query suffices for guaranteeing differential privacy; this may be much smaller than the worst case sensitivity for non-linear queries. Lower bounds on the amount of noise needed for general low sensitivity queries have been shown in [DN03, DMT07, DY08, DMNS06, RHS07, HT10, De12]. Kasiviswathan et al. [KRSU10] showed upper and lower bounds for contingency table queries and more recently [KRS13] showed lower bounds on publishing error rates of classifiers or even M-estimators. Muthukrishnan and Nikolov [MN12] showed that combinatorial discrepancy lower bounds the noise for answering any set of linear queries.
Using learning theoretic techniques, Blum, Ligett and Roth [BLR08] first showed that one can exploit sparsity of the database, and answer a large number of counting queries with error small compared to the number of individuals in the database. This line of work has been further extended and improved in terms of error bounds, efficiency, generality and interactivity in several subsequent works [DNR+09, DRV10, RR10, HR10, GHRU11, HLM12].
Ghosh, Roughgarden and Sundarajan [GRS09] showed that for any one dimensional counting query, a discrete version of the Laplacian mechanism is optimal for pure privacy in a very general utilitarian framework and Gupte and Sundararajan [GS10] extended this to risk averse agents. Brenner and Nissim [BN10] showed that such universally optimal private mechanisms do not exist for two counting queries or for a single non-binary sum query. As mentioned above, Hardt and Talwar [HT10], and Bhaskara et al. [BDKT12] gave relative guarantees for multi-dimensional queries under pure privacy with respect to total squared error. De [De12] unified and strengthened these bounds and showed stronger lower bounds for the class of non-linear low sensitivity queries.
For specific queries of interest, improved upper bounds are known. Barak et al. [BCD+07] studied low dimensional marginals and showed that by running the Laplace mechanism on a different set of queries, one can reduce error. Using a similar strategy, improved mechanisms were given by [XWG10, CSS10] for orthogonal counting queries, and near optimal mechanisms were given by Muthukrishnan and Nikolov [MN12] for halfspace counting queries. The approach of answering a set of queries different from the target query set has also been studied in more generality and for other sets of queries by [LHR+10, DWHL11, RHS07, XWG10, XXY10, YZW+12]. Li and Miklau [LM12a, LM12b] study a class of mechanisms called extended matrix mechanisms and show that one can efficiently find the best mechanisms from this class. Hay et al. [HRMS10] show that in certain settings such as unattributed histograms, correcting noisy answers to enforce a consistency constraint can improve accuracy.
Very recently, Fawaz et al. [FMN] used the hereditary discrepancy lower bounds of Muthukrishnan and Nikolov, as well as the determinant lower bound on discrepancy of Lovasz, Spencer, and Vesztergombi, to prove that a certain Gaussian noise mechanism is nearly optimal (in the dense setting) for computing any given convolution map. Like our algorithms, their algorithm adds correlated Gaussian noise; however, they always use the Fourier basis to correlate the noise.
We refer the reader to texts by Chazelle [Cha00] and Matoušek [Mat99] and the chapter by Beck and Sós [BS95] for an introduction to discrepancy theory. Bansal [Ban10] showed that a semidefinite relaxation can be used to design a pseudo-approximation algorithm for hereditary discrepancy. Matoušek [Mat11] showed that the determinant based lower bound of Lovász, Spencer and Vesztergombi [LSV86] is tight up to polylogarithmic factors. Larsen [Lar11] showed applications of hereditary discrepancy to data structure lower bounds, and Chandrasekaran and Vempala [CV11] recently showed applications of hereditary discrepancy to problems in integer programming.
Preliminaries
We start by introducing some basic notation.
For a matrix and a set , we denote by the submatrix of consisting of those columns of indexed by elements of . Occasionally we refer to a matrix whose columns form an orthonormal basis for some subspace of interest as the orthonormal basis of . is the set of orthogonal projections onto -dimensional subspaces of .
By and we denote, respectively, the smallest and largest singular value of . I.e., and . In general, is the -th largest singular value of , and is the -th largest eigenvalue of . We recall the minimax characterization of eigenvalues for symmetric matrices:
For a matrix (and the corresponding linear operator), we denote by the spectral norm of and the Frobenius norm of . By we denote the kernel of , i.e. the subspace of vectors for which .
For a set , we denote by its -dimensional volume. Often we use instead the volume radius
For a convex body , the polar body is defined by . The fundamental fact about polar bodies we use is that for any two convex bodies and
In the remainder of this paper, when we claim that a fact follows “by convex duality,” we mean that it is implied by (1).
A convex body is (centrally) symmetric if . The Minkowski norm induced by a symmetric convex body is defined as . The Minkowski norm induced by the polar body of is the dual norm of and also has the form . For convex symmetric , the induced norm and dual norm satisfy Hölder’s inequality:
An ellipsoid in is the image of under an affine map. All ellipsoids we consider are symmetric, and therefore, are equal to an image of the ball under a linear map . A full dimensional ellipsoid can be equivalently defined as . The polar body of a symmetric ellipsoid is the ellipsoid (or cylinder with an ellipsoid as its base in case is not full dimensional) .
We repeatedly use a classical theorem of Fritz John, characterizing the (unique) minimum volume enclosing ellipsoid (MEE) of any convex body . We note that John’s theorem is frequently stated in terms of the maximum volume enclosed ellipsoid in ; the two variants of the theorem are equivalent by convex duality. The MEE of is also known as a the Löwner or Löwner-John ellipsoid of .
Any convex body is contained in a unique ellipsoid of minimal volume. This ellipsoid is if and only if there exist unit vectors and positive reals such that
According to John’s characterization, when the MEE of is the ball , the contact points of and satisfy a structural property — the identity decomposes into a linear combination of the projection matrices onto the lines of the contact points. Intuitively, this means that “hits” in all directions — it has to, or otherwise can be “pinched” in order to produce a smaller ellipsoid that still contains . This intuition is formalized by a theorem of Vershynin, which generalizes the work of Bourgain and Tzafriri on restricted invertibility [BT87]. Vershynin ([Ver01] Theorem 3.1) shows that there exist contact points of and which are approximately pairwise orthogonal.
Let be a symmetric convex body whose minimum volume enclosing ellipsoid is the unit ball . Let be a linear map with spectral norm . Then for any , there exist constant , and contact points with such that the matrix satisfies
2 Statistical Estimation
A key element in our algorithms for the sparse case is the use of least squares estimation to reduce error. Below we present a bound on the error of least squares estimation with respect to symmetric convex bodies. This analysis appears to be standard in the statistics literature; a special case of it appears for example in [RWY11].
First we show the easier bound , which follows by the triangle inequality:
The second bound is based on Hölder’s inequality and the following simple but very useful fact, illustrated schematically in Figure 1:
3 Differential Privacy
Following recent work in differential privacy, we model private data as a database of rows, where each row of contains information about an individual. Formally, a database is a multiset of size of elements of the universe of possible user types. Our algorithms take as input a histogram of the database , where the -th component of encodes the number of individuals in of type . Notice that in this histogram representation, we have when is a database of size . Also, two neighboring databases and that differ in the presence or absence of a single individual correspond to two histograms and satisfying .
Through most of this paper, we work under the notion of approximate differential privacy. The definition follows.
When , we are in the regime of pure differential privacy.
An important basic property of differential privacy is that the privacy guarantees degrade smoothly under composition and are not affected by post-processing.
Let and satisfy - and -differential privacy, respectively. Then the algorithm which on input outputs the tuple satisfies -differential privacy.
In this paper we study the necessary and sufficient error incurred by differentially private algorithms for approximating linear queries. A set of linear queries is given by a query matrix or workload ; the exact answers to the queries on a histogram are given by the -dimensional vector .
We define error as total squared error. More precisely, for an algorithm and a subset , we define
We also write as . The optimal error achievable by any -differentially private algorithm for queries and databases of size up to is
where the infimum is taken over all -differentially private algorithms. When no restrictions are placed on the size of the database, the appropriate notion of optimal error is . Similarly, for an algorithm , the error when database size is not bounded is . A priori it is not clear that these quantities are necessarily finite, but we will show that this is the case.
In order to get tight dependence on the privacy parameter in our analyses, we will use the following relationship between and .
For any , any , any integer and for ,
Let be an -differentially private algorithm achieving . We will use as a black box to construct a -differentially private algorithm which satisfies the error guarantee .
The algorithm on input satisfying outputs . We need to show that satisfies -differential privacy. Let and be two neighboring inputs to , i.e. , and let be a measurable subset of the output . Denote and . We need to show that . To that end, define , , , , . Applying the -privacy guarantee of to each of the pairs of neighboring inputs , , , in sequence gives us
This finishes the proof of privacy for . It is straightforward to verify that . ∎
3.2 Gaussian Noise Mechanism
A basic mechanism for achieving -differential privacy for linear queries is adding appropriately scaled independent Gaussian noise to each query. This approach goes back to the work of Blum et al. [BDMN05], predating the definition of differential privacy. Next we define this basic mechanism formally and give a privacy guarantee. The privacy analysis of the Gaussian mechanism in the context of -differential privacy was first given in [DKM+06b]. We give the full proof here for completeness.
Let be a matrix such that . Then a mechanism which on input outputs , where , satisfies -differential privacy.
Let and let be the probability density function of . Let also , so implies . Define
We will prove that when , for all , . This suffices to prove -differential privacy. Indeed, let the algorithm output and fix any s.t. . Let and . For any measurable we have
Note that to bound we simply need to bound from above and below. Since , we can apply a Chernoff bound and we get
Substituting completes the proof. ∎
The following corollary is a useful geometric generalization of Lemma 4.
Let be a matrix of rank and let . Let ( is a linear map) be an ellipsoid containing . Then a mechanism that outputs where satisfies -differential privacy.
Since is full dimensional (by ) and contains , is full dimensional as well, and, therefore, is an invertible linear map. Define . For each column of , we have . Therefore, by Lemma 4, a mechanism that outputs (where is distributed as in the statement of the corollary) satisfies -differential privacy. Therefore, is -differentially private by the post-processing property of differential privacy. ∎
We present a composition theorem, specific to composing Gaussian noise mechanisms. We note that a similar composition result in a much more general setting but with slightly inferior dependence on the parameters is proven in [DRV10].
Let be vector spaces of respective dimensions , such that , and . Let be a matrix of rank and let . Let be the projection matrix for and let be an ellipsoid such that . Then the mechanism that outputs where for each , , satisfies -differential privacy.
Let . Since the random variables are pairwise independent Gaussian random variables, and has covariance matrix , we have that is a Gaussian random variable with covariance , whee . By Corollary 8, it is sufficient to show that the ellipsoid contains . By convex duality, this is equivalent to showing , which is in turn equivalent to . Recalling that and , we need to establish
We proceed by establishing (4). Since for all , , by duality and the same reasoning as above, we have that for all and , . Therefore, by the Cauchy-Schwarz inequality,
3.3 Noise Lower Bounds
We will make extensive use of a lower bound on the noise complexity of -differentially private mechanisms in terms of combinatorial discrepancy. First we need to define the notion of hereditary -discrepancy:
Next we present the lower bound, which is a simple extension of the discrepancy lower bound on noise recently proved by Muthukrishnan and Nikolov [MN12].
Let be an real matrix. For any constant and sufficiently small constant and ,
Substituting (5) into Theorem 10, we have that there exist constants and such that
For the remainder of this paper we fix some constants and for which (6) holds. Similarly to the notation for , we will also sometimes denote . We will use primarily the spectral lower bound (6) for arguing the optimality of our algorithms.
To show the small gap between the approximate and pure privacy (Theorem 2), we next develop a determinant based lower bound. We first switch from to the classical notion of hereditary discrepancy, equivalent to , by observing the following relation between and from [MN12]:
We then use an extension of the classical determinant lower bound for hereditary discrepancy, due to Lovász, Spencer, and Vesztergombi.
Let us first define linear discrepancy. For a matrix and , let be defined as
The linear discrepancy of is then defined as . We claim that for any ,
We complete the proof of the theorem by proving a lower bound on in terms of . We note that a similar lower bound can be proved for any variant of defined in terms of any norm. The exact lower bound will depend on the volume radius of the unit ball of the norm. Since the proof of (8) also works for any norm, we get a determinant lower bound for hereditary discrepancy defined in terms of any norm as well.
We show that for any matrix
Letting range over , range over all submatrices of , and applying the bounds (8) and (9) implies the theorem.
We proceed to prove (9). Note that if , (9) is trivially true; therefore, we may assume that . Note also that without loss of generality we can take to be the orthogonal projection onto the range of , since this is the projection operator that maximizes . Let be the ellipsoid . The inequality is equivalent to
Thus , and therefore . On the other hand, the volume of is equal to
Applying the standard estimate completes the proof. ∎
By the determinant lower bound, and (7), we get our determinant lower bound on the noise necessary for privacy. For some constant ,
Finally, we recall the stronger volume lower bound against -differential privacy from [HT10, BDKT12]. This lower bound is nearly optimal for -differential privacy, but does not hold for -differential privacy when is .
For any real matrix ,
where .
Furthermore, there exists an efficient mechanism (the generalized -norm mechanism) which is -differentially private and satisfies .
Algorithms for Approximate Privacy
In this section we present our main results: efficient nearly optimal algorithms for approximate privacy in the cases of dense databases () and sparse databases (). Both algorithms rely on recursively computing an orthonormal basis for , based on the minimum volume enclosing ellipsoid of the columns of the query matrix . We first present the algorithm for computing this basis, together with a property essential for the analyses of the two algorithms presented next.
We first present an algorithm (Algorithm 1) that, given a matrix , computes a set of orthonormal matrices , where . For each , , and the union of the columns of forms an orthonormal basis for . Thus, Algorithm 1 computes a basis for , and partitions (“decomposes”) it into bases of mutually orthogonal subspaces. This set of bases also induces a decomposition of into , where . The base decomposition of Algorithm 1 is essential to both our dense case and sparse case algorithms. Intuitively, for both cases we can show that the error of a simple mechanism applied to can be matched by an error lower bound for . The error lower bounds are based on the spectral lower bound on discrepancy; the geometric properties of the minimum enclosing ellipsoid of a convex body together with the restricted invertibility principle of Bourgain and Tzafriri are key in deriving the lower bounds.
The next lemma captures the technical property of the decomposition of Algorithm 1 that allows us to prove matching upper and lower error bounds for our dense and sparse case algorithms.
Let be the dimension of the span of . Furthermore, for , let , and let . For every , there exists a set , such that and .
Let us, for ease of notation, assume that is a power of . We prove that there exists a set , , such that
By applying an appropriate unitary transformation to the columns of , we may assume that the major axes of are co-linear with the standard basis vectors of , and, therefore, is a diagonal matrix with . This transformation comes without loss of generality, since it applies a unitary transformation to the columns of and and does not affect the singular values of any matrix for any . For the rest of the proof we assume that is diagonal.
Since is diagonal, is equal to , the -th standard basis vector. Therefore is diagonal and equal to the projection onto , and is also diagonal and equal to the projection onto . Consider (recall that we assumed that and therefore is non-singular). Since the minimum enclosing ellipsoid of is , we have that the minimum enclosing ellipsoid of is . Let be the projection onto . Then, by Theorem 7, and because , we have that there exists a set of size such that . We chose , and therefore , as well as to be diagonal matrices, so they all commute. Then, since is a projection matrix,
Observe that, since , we have
Therefore, . Substituting for into (14) completes the proof. ∎
2 The Dense Case: Correlated Gaussian Noise
Our first result is an efficient algorithm whose expected error matches the spectral lower bound up to polylogarithmic factors and is therefore nearly optimal. This proves Theorem 1. The algorithm adds correlated unbiased Gaussian noise to the exact answer . The noise distribution is computed based on the decomposition algorithm of the previous subsection.
Let be the output of Algorithm 2 on input a query matrix and private input . is -differentially private and for all small enough and all small enough with respect to satisfies
We start the proof of Theorem 13 with the privacy analysis. For ease of notation, we assume throughout the analysis that is a power of 2.
satisfies -differential privacy.
The lemma follows from Corollary 9. Next we describe in detail why the corollary applies.
Let be the base decomposition computed by Algorithm 1 on input . Let be the subspace spanned by the columns of and let be the dimension of . The projection matrix onto is . Let be the ellipsoid ( is ). By the definition of , , and therefore, . is distributed identically to . Therefore, by Corollary 9, satisfies -differential privacy. ∎
For all small enough and all small enough with respect to , for all ,
where , and are as defined in Algorithm 2.
By (15), it is enough to lower bound by . As a first step we lower bound . Then the lower bound on will follow from (6) and Lemma 3.
To lower bound , we invoke Lemma 5. It follows from the lemma that for every there exists a projection matrix and a set such that , and, furthermore, . Substituting into the the definition of , we have that for all .
Therefore, by (6), for all . Finally, by Lemma 3, there exists a small enough , for which , and this completes the proof. ∎
Proof of Theorem 13: The proof of the theorem follows from Lemma 6 and Lemma 7. The privacy guarantee is direct from Lemma 6. Next we prove that the error of is near optimal.
3 The Sparse Case: Least Squares Estimation
In this subsection we present an algorithm with stronger accuracy guarantees than Algorithm 2: it is optimal for any query matrix and any database size bound (Theorem 3). The algorithm combines the noise distribution of Algorithm 2 with a least squares estimation step. Privacy is guaranteed by noise addition, while the least squares estimation step reduces the error significantly when . The algorithm is shown as Algorithm 3.
To finish the proof of the lemma, we need to lower bound by . We will use Lemma 5 to lower bound by and then we will invoke Lemma 3 to get the right dependence on .
By Lemma 5, for every there exists a projection matrix and a set such that , and, furthermore, . By the definition of , for all , , and, therefore, . Take to be an arbitrary subset of of size . The smallest singular value of is a lower bound on the smallest singular value of :
where is the subset of coordinates on which is nonzero. Therefore, . Substituting into the definition of , we have
Therefore, by (6), for all . Finally, by Lemma 3, there exists a small enough , for which , and this completes the proof. ∎
We show that the first term on the right hand side of (16) is at most and the second term is .
Since, by the definition of , , we have the desired bound.
The last equality follows from the definition of the dual norm and from the fact that is a polytope with vertices , so any linear functional on is maximized at one of the vertices. From the fact that for all we have , from the triangle inequality, and from Lemma 9, we derive
4 Computational Complexity
In this subsection we consider the computational complexity of our algorithms. We pay special attention to approximating the minimum enclosing ellipsoid of a polytope and computing least squares estimators. For both problems we need to go into the properties of known approximation algorithms in order to verify that the approximations are sufficient to guarantee that our algorithms can be implemented in polynomial time without hurting their near-optimality.
Computationally the most expensive step of the base decomposition algorithm (Algorithm 1) is computing the minimum enclosing ellipsoid of . Computing the exact MEE can be costly: the fastest known algorithms have complexity on the order of [AS93]. However, for our purposes it is enough to compute an approximation of in Banach-Mazur distance, i.e. some ellipsoid satisfying for an absolute constant . Known approximation algorithms for MME guarantee that their output is an enclosing ellipsoid with volume approximately equal to that of the MEE [Kha96, TY07]. It is not immediately clear whether such an approximation is also a Banach-Mazur approximation. However, we can use the fact that the algorithms in [Kha96, KY05, TY07] output an ellipsoid satisfying approximate complimentary slackness conditions and show that approximates in Banach-Mazur sense for some projection onto a subspace of dimension . This suffices for a slightly weaker version of Lemma 5.
We begin with a definition. Let’s define a vector to be -optimal for if the following conditions are satisfied:
for all , where (we use this notation throughout this section).
-optimality implies the following property of the the ellipsoid which is key to our analysis.
Let be the minimum enclosing ellipsoid of , and let be -optimal for . Let also . Then,
Let . Since , we have that the MEE of is the unit ball, and, therefore, for all . Since , we have
By Markov’s inequality, . Let be the projection operator onto the subspace spanned by the left singular vectors of corresponding to . We have . Multiplying on both sides by , we get
Let be the matrix . Since is similar to , it is also a projection matrix onto a dimensional subspace. We have that , and therefore
Define . The inclusion above is equivalent to the positive semidefiniteness of the matrix . As is a projection onto a dimensional subspace, by the standard minimax characterization of eigenvalues we have . We recall the (dual) Weyl inequalities for symmetric matrices and :
The inequalities are standard and follow from the minimax characterization of eigenvalues and dimension counting arguments — see, e.g. Chapter 1 in [Tao12]. Substituting and , and , we have the inequality
Finally we give an analogue of Lemma 5 for the variant of the base decomposition algorithm that uses an approximate MEE. The proof follows from Lemma 10 and the arguments used to prove Lemma 5. We omit a full proof here.
Consider a variant of Algorithm 1 that, at each step, uses , where is -optimal for , rather than the minimum enclosing ellipsoid . Let be the dimension of the span of . For any there exists a subspace of dimension and a set of size , such that .
One can verify that in all our proofs we can substitute Lemma 11 for Lemma 5 without changing the asymptotics of our lower and upper bounds. Therefore, in all our algorithms we can use the variant of Algorithm 1 from Lemma 11 without compromising near-optimality. This variant of Algorithm 1 runs in time .
Notice that the base decomposition can be reused for different databases, as long as the query matrix stays unchanged; once the decomposition is computed the rest of the algorithm is very efficient: it involves some standard algebraic computations and sampling from an -dimensional gaussian distribution. Furthermore, any ellipsoid containing suffices for privacy, and one may use heuristic approximations to the MEE problem.
4.2 Least Squares Estimator
Except for base decomposition, the other potentially computationally expensive step in Algorithm 3 is the computation of a least squares estimator . This is a quadratic minimization problem, and can be approximated by the simple Frank-Wolfe gradient descent algorithm [FW56]. In particular, for a point such that , Lemma 1 holds to within an additive approximation factor , i.e. . We call such a point an additive approximation to the least squares estimator problem. By the analysis of Clarkson [Cla10], iterations of the Frank-Wolfe algorithm give an additive approximation where , for . In our case . In order to have near optimality for Algorithm 3, an additive approximation suffices. Using the triangle inequality and Cauchy-Schwarz, we can bound for as
Therefore, iterations of the Frank-Wolfe algorithm suffice. Since each iteration of the algorithm involves dot product computations and solving a homogeneous linear system in at most variables and at most equations, it follows that an approximate version of Algorithm 3 with unchanged asymptotic optimality guarantees can be implemented in time .
We note that the approximation algorithm of Khachiyan for the MEE problem, as well as its modification in [TY07], can also be interpreted as instances of the Frank-Wolfe algorithm (see [TY07] for details).
Results for Pure Privacy
Our geometric approach to approximate privacy allows us to better understand the optimal error required for approximate privacy vs. that required for pure privacy. Our first result bounds the gap between the optimal error bounds for the two notions of privacy in the dense case. Then we extend these ideas and give a -differentially private algorithm which nearly matches the guarantees of Algorithm 3 for sparse databases.
In this subsection we investigate the worst-case gap between (for small enough ) and over all query matrices . At the core of our analysis is a natural geometric fact: for any symmetric polytope with vertices in -dimensional space we can find a subset of vertices of whose symmetric convex hull has volume radius at most a factor smaller than the volume radius of . Our proof of this fact goes through analyzing the contact points of with its minimum enclosing volume ellipsoid, and a bound on the volume of polytopes with few vertices.
Let and let be an ellipsoid of minimal volume containing . There exists a set of size such that the matrix satisfies .
For the proof of 15 we will use John’s theorem (Theorem 6) and the following elementary algebraic result.
Let be -dimensional unit vectors and let be positive reals such that
Then there exists a set of size such that the matrix satisfies .
Notice that for all . By taking traces of both sides of (17), we have .
Let and let be the diagonal matrix with on the diagonal. Then we can write . By the Binet-Cauchy formula for the determinant,
The inequality (18) follows since each term appears times in the expansion of and all other terms in the expansion are positive. Using the inequality , we have that , and this completes the proof. ∎
Proof of Theorem 15: We can write the minimum enclosing ellipsoid as where is a linear map with determinant . Since does not change volumes, is a minimal volume ellipsoid of . Also, for any , where , we have
Therefore, it is sufficient to show that for such that is the minimal volume ellipsoid of , there exists a set such that the matrix satisfies . Since, by convexity, the contact points of are a subset of , the statement follows from Theorem 6 and Lemma 12.
Combined with the following theorem of Bárány and Füredi [BF88], and also Gluskin [Glu07] (with sharper bounds), Theorem 15 implies the corollary that for any -dimensional symmetric polytope one can find a set of vertices whose symmetric convex hull captures a significant fraction of the volume of the polytope.
Let and let be an ellipsoid containing . Then .
For any there exists a set such that
Finally, we describe the application to differential privacy. By Corollary 1, . Also, by (11), . Finally, Lemma 3 implies that for small enough with respect to . Putting all this together and using Theorem12, we have the following theorem (Theorem 2).
For small enough and all small enough with respect to , for any real matrix we have
2 Sparse Case under Pure Privacy
We further extend our results from Section 3.3 and show an efficient -differentially private algorithm which, on input any query matrix and any database size bound , nearly matches . This proves our main Theorem 4. In fact, our result is stronger: we show an -differentially private mechanism whose error nearly matches for all small enough with respect to . Thus, the result of this subsection can be seen as a generalization of Theorem 17 to the sparse databases regime.
Our algorithm for sparse databases under pure privacy closely follows Algorithm 3: we add noise from a distribution that is tailored to but oblivious to the database ; then we use least squares estimation to reduce error on sparse databases. However, Gaussian noise does not preserve -differential privacy, and we need to use a different noise distribution. Intuitively, one expects that adding noise sampled from a near-optimal distribution [HT10, BDKT12] and then computing a least squares estimator would be nearly optimal, analogously to Algorithm 3. We are not currently able to analyze the error of this algorithm, but instead we analyze a variant of Algorithm 3 where the Gaussian distribution is simply substituted with the generalized -norm distribution from [HT10]. Intuitively, we are able to show that the generalized -norm distribution “approximates a Gaussian” well enough for our analysis to go through. A main tool in our analysis is a classical concentration of measure inequality from convex geometry.
We begin with a slight generalization of the main upper bound result of Hardt and Talwar [HT10]. This generalization follows directly from the methods used in [HT10] with only minor modifications in the proofs. We omit a full derivation here. Also, while the methods of Hardt and Talwar will lead to a proof conditional on the truth of the Hyperplane conjecture from convex geometry, using the ideas of Bhaskara et al. [BDKT12] the result can be made unconditional.
Let be an real matrix and let . There exists an efficiently computable and efficiently sampleable distribution such that the following claims hold:
the algorithm which on input outputs for satisfies -differential privacy;
Using the distribution , we define our near optimal sparse-case algorithm satisfying pure differential privacy as Algorithm 4.
Let be the output of Algorithm 4 on input a query matrix , database size bound and private input . is -differentially private and for all small enough and all small enough with respect to satisfies
Once again privacy follows by a straightforward argument from the privacy of the underlying noise-adding mechanism, in this case the generalized -norm mechanism.
satisfies -differential privacy.
Next we prove the main technical lemma we need in order to show near optimality. The analysis is very similar to that of Lemma 9. The main technical challenge is to show that the distribution has all the properties we needed from the Gaussian distribution: covariance with bounded operator norm and exponential concentration. We use ideas from Section 4.1 and the following variant of a classical concentration of measure inequality, due to Borell [Bor75] (proved in the appendix).
Let be a log-concave distribution over . Assume that is a symmetric convex subset of such that . Then, for every we have
We are now ready to prove the counterpart of Lemma 9 for .
As in Algorithm 3, we define . Equivalently is the radius of the smallest -dimensional ball which contains . In the proof of Lemma 9 we argued that for all small enough and all small enough with respect to .
Therefore, for any , we can derive the following bound:
Thus applying Cauchy Schwarz once again, we conclude
For any , the set is symmetric and convex for any bound . Then, by Chebyshev’s inequality, and Theorem 20 there exists a constant such that for any and
Using a union bound and taking expectations completes the proof. ∎
Proof of Theorem 19: The privacy guarantee follows from Lemmas 13. By Lemma 14 analogously to the proof of Theorem 14, we can conclude that
Pythagoras theorem then implies the result.
Universal bounds
The mechanism of Algorithm 5 is -differentially private and satisfies
The privacy of the mechanism is immediate from Lemma 4. To analyze the error, we use Lemma 1 and the fact that to bound
where are columns of . We have used the fact that the is attained at one of the vertices of . Since each is a Gaussian with variance , exceeds with probability at most . Taking a union bound, we conclude that this expectation of the maximum is . Recall that . It follows that
For getting pure -DP, we simply substitute the generalized -norm distribution guaranteed by Theorem 18 instead of the Gaussian noise.
We first observe an upper bound on the volume radius of projections of .
Let and let . Let be a rank orthogonal projection that maps to . Then
Now we show that Algorithm 6 achieves the bound claimed in Theorem 5.
The mechanism of Algorithm 6 is -differentially private and satisfies
Extensions
In this section we describe a couple of extensions to our results. We show how to translate our optimality guarantees for total squared error in the dense case regime to worst case error over queries using the minimax theorem. We further show that our nearly optimal efficient mechanism in the dense case regime implies a polylogarithmic approximation to hereditary discrepancy.
Let , be given. There is an -DP mechanism , and a non-negative diagonal matrix with such that for all ,
Thus the mechanism satisfies
Recall that the mechanism is of the form where is a noise vector whose distribution is independent of . Thus the pair can be computed without looking at the database . Therefore, the mechanism that samples a from and runs it on is itself -DP. The theorem follows. ∎
Using the above result, we can now construct a mechanism that has small expected worse case error.
Let , be given. There is an -DP mechanism , and a non-negative diagonal matrix with such that
The resulting mechanism is not necessarily -differentially private any more. However, since its outcome can be computed by postprocessing the outcome of -differentially private mechanisms, it is still -differentially private. Scaling and by a factor of , and substituting for , we get the result. ∎
2 Approximating Hereditary Discrepancy
Muthukrishnan and Nikolov [MN12] show that hereditary discrepancy gives a lower bound on the error of any mechanism.
There exist constants such that the following holds: Let be a matrix and be an -differentially private mechanism. Then
On the other hand, the mechanism of Theorem 24 satisfies
The description of this mechanism (which is specified by a distribution over Gaussian noise addition mechanisms) thus serves as an efficiently computable and verifiable witness to an upper bound on the hereditary discrepancy of .
We next give a constrcutive version of this result, by appealing to a result of Bansal [Ban10] that shows that the discrepancy of any set system can be constructively upper bounded by times the hereditary vector discrepancy. The vector discrepancy of a matrix , denoted is defined as the smallest such that the following semidefinite program is feasible:
The hereditary vector discrepancy is simply . We will show that given the mechanism of Theorem 24, we can construct for any a solution to the SDP corresponding to . We note that the above SDP is a slight relaxation of the SDP used by Bansal: instead of the constraint for all , we simply require each to be at most one, and that the trace of is . It is easy to verify that [Ban10] implies that:
Suppose that for a matrix and a , for every with , it is the case that the SDP 20 is feasible for . Them there is polynomial time algorithm that finds a coloring of with discrepancy at most .
We consider a variant of the above SDP, where we drop the constraint and require the trace of to be slightly larger:
We first show that it suffices to satisfy the SDP 21.
Suppose that for a matrix , for every with , it is the case that the SDP 21 is feasible for and . Then for any with , the SDP 20 is feasible for and .
We construct a feasible solution to 20 by repeatedly using solutions to 21 for different values of the restriction . Fix an with and let . Let be the zero matrix indexed by . Let be a solution to SDP 21 for . Let , and let . For each , set . Finally we set .
Given such that , we let be a feasible solution to the SDP for and set , and . For each , set . We update . We stop once falls below , say in iteration . It is easy to see that we delete at least one from in each step, so that this process converges.
We claim that is a feasible solution to SDP 20. Observe that is simply a a non-negative linear combination of ’s and hence is positive semidefinite. By definition of , each diagonal entry is at most , so that the definition of ensures that for all . Moreover, for each , the entry , and there are at least such , so that the trace of is at least as required. Finally, is simply the sum . Thus it suffices to show that . Note that . It follows that so that as needed. The claim follows. ∎
Finally, we describe how to use the mechanism of Theorem 24 to construct a feasible solution to SDP 21.
Conclusions
We have presented near optimal mechanisms for any linear query for dense and sparse databases, under both pure and approximate differential privacy. Our mechanisms are simple and efficient, and it would be instructive to implement them so as to compare them with existing techniques.
Acknowledgements
The first and second named authors would like to thank Moritz Hardt and Daniel Dadush for several helpful discussions. The first named author was partially supported by a Simons Graduate Fellowship, award number 252861.
References
Appendix A Concentration of Log concave measures
The following measure concentration inequality is standard. We include a proof below for completeness. We start by stating the Brunn-Minkowski inequality.
Let be a log-concave measure on , let be numbers such that , and let be measurable sets such that the set is measurable. Then
This can be used to prove Borell’s lemma for arbitrary log concave distributions. We use the proof approach presented in [Gia03].
Applying the Brunn-Minkowski inequality and rearranging proves the result.
Appendix B From Concentration to Expectation
We repeatedly use the fact that a exponential or better bound on the upper tail implies a bound on the expectation. For completeness, we give a quantitative version with a proof.
The claim follows by linearity of expectation. ∎
Appendix C Hardness of Approximating Hereditary Discrepancy
The proof of Theorem 29 is a straight-forward reduction from the 2-colorability problem for 3-uniform hypergraphs. A maximization version of this problem (i.e. maximize the number of bichromatic edges) is also known as Max-E3-Set Splitting and is equivalent to NotAllEqual-SAT restricted to inputs with no negated variables.
A hypergraph , where , is 2-colorable if and only if there exists a set such that for all , and . The set is called a transversal of .
There exists a family of 3-uniform hypergraphs such that deciding whether a hypergraph in the family is 2-colorable is -complete.
Proof of Theorem 29: The reduction simply maps a 3-uniform hypergraph to its incidence matrix. I.e. for a hypergraph , where and , we create a matrix , where if and otherwise. Observe that if is 2-colorable, and this is witnessed by a transversal , then for all and for defined by . On the other hand, if is not 2-colorable, for any we have , since otherwise the set would be a transversal.
We note that Guruswami proved constant hardness of approximation for Max-E3-Set Splitting [Gur00]. In particular, he showed that it is -hard to distinguish 2-colorable 3-uniform hypergraphs from hypergraphs for which any coloring with 2 colors leaves at least a fraction of the edges monochromatic.