The Power of Linear Reconstruction Attacks

Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith

Introduction

The goal of private data analysis is to provide global, statistical properties of a database of sensitive information while protecting the privacy of the individuals whose records the database contains. There is a vast body of work on this problem in statistics and computer science.

Until a few years ago, most schemes proposed in the literature lacked rigor: typically, the schemes had either no formal privacy guarantees or ensured security only against a specific suite of attacks. The seminal results of Dinur and Nissim and Dinur, Dwork and Nissim initiated a rigorous study of the tradeoff between privacy and utility. The notion of differential privacy (Dwork, McSherry, Nissim and Smith , Dwork ) that emerged from this line of work provides rigorous guarantees even in the presence of a malicious adversary with access to arbitrary side information. Differential privacy requires, roughly, that any single individual’s data have little effect on the outcome of the analysis. Recently, many techniques have been developed for designing differentially private algorithms. A typical objective is to release as accurate an approximation as possible to some high-dimensional function ff evaluated on the database DD.

A complementary line of work seeks to establish lower bounds on how much distortion is necessary for particular functions ff. Some of these bounds apply only to specific notions of privacy (e.g., lower bounds for differential privacy ). A second class of bounds rules out any reasonable notion of privacy by giving algorithms to reconstruct almost all of the data DD given sufficiently accurate approximations to f(D)f(D) . We refer to the latter results as reconstruction attacks.

We consider reconstruction attacks against attribute privacy: consider a curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the applied data privacy literature, partly since it arises with medical and retail data.

Asymptotically, we say that a mechanism is attribute nonprivate if there is an infinite sequence of nn for which M\mathcal{M} allows (o(1),o(1))(o(1),o(1))-reconstruction. Here d=d(n)d=d(n) is a function of nn. We say the attack A\mathcal{A} is efficient if it runs in time \mboxpoly(n,d)\mbox{poly}(n,d).

Instead of simply showing that a setting of UU exists, we will normally aim to show that reconstruction is possible with high probability when UU is chosen from one of a class of natural distributions.

Such attacks were first considered in the context of data privacy by Dinur and Nissim . They showed that any mechanism which answers (or allows the user to compute) Ω(nlogn)\Omega(n\log n) random inner product queries with {0,1}\{0,1\} vectors on a database s{0,1}n{\mathbf{s}}\in\{0,1\}^{n} with o(n)o(\sqrt{n}) noise per query is not private. That is, they assume that the mechanism releases y=As+e\mathbf{y}=A{\mathbf{s}}+\mathbf{e}, where AA is a random matrix in {0,1}Ω(nlogn)×n\{0,1\}^{\Omega(n\log n)\times n} and e\mathbf{e} is a noise vector with e=o(n)\|\mathbf{e}\|_{\infty}=o(\sqrt{n}). Their attack was subsequently extended to use a linear number of queries , allow a small fraction of answers to be arbitrarily distorted , and run significantly more quickly .

In their simplest form, such inner product queries require the adversary to be able to “name rows”, that is, specify a coefficient for each component of the vector s{\mathbf{s}}. Thus, the lower bound does not seem to apply to any functionality that is symmetric in the rows of the database (such as, for example, “counting queries”).​It was pointed out in that in databases with more than one entry per row, random inner product queries on the sensitive attribute vector s{\mathbf{s}} can be simulated via hashing: for example, the adversary could ask for the sum the function H(Ui)siH(U_{i})\cdot s_{i} over the whole database, where H:{0,1}d1{0,1}H:\{0,1\}^{d-1}\to\{0,1\} is an appropriate hash function. This is a symmetric statistic, but it is unlikely to come up in a typical statistical publication.

Recently, linear attacks were also considered based on range query releases (which, again, are natural, linear queries).

We greatly expand the applicability of linear attacks in “natural” settings. Specifically, we show one can mount linear reconstruction attacks based on any release that gives:

the fraction of records that satisfy a given non-degenerate boolean function (a boolean function over pp variables is non-degenerate if its multilinear representation has degree exactly pp). Such releases include contingency tables as well as more complex outputs like the error rate of certain classifiers such as decision trees; or

the MM-estimator associated with a differentiable loss function. MM-estimators are a broad class of estimators which are obtained by minimizing sums of functions of the records (they are also called empirical risk minimization estimators). MM-estimators include the standard estimators for linear and logistic regression (both these estimators are associated with differentiable loss functions). See Section 4 for definitions.

Our contributions are two-fold. First, we show how these types of releases can be transformed into a (noisy) linear release problem, making them amenable to linear reconstruction attacks. This is already perhaps surprising, since many of the above statistics (like MM-estimators) are obtained by solving highly nonlinear formulations. After performing this transformation, we can apply polynomial-time methods (like least squares or LP decoding) on this linear release problem to estimate the sensitive data.

Second, we show how to analyze these attacks under various distributional assumptions on the data. This gives lower bounds on the noise needed to release these statistics attribute privately. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of (constant) size kk (out of a total of dd) nonsensitive boolean attributes. For a subset J[d]J\subseteq[d] of size kk, let UJU|_{J} denote the submatrix of UU consisting of the columns in JJ.

The (dk)\binom{d}{k} entries for a statistic are obtained by evaluating the same statistic on (UJs)(U|_{J}|{\mathbf{s}}) for all sets JJ of size kk. Specifically:

Consider a mechanism which releases, for every set JJ of size kk, a particular MM-estimator evaluated over (UJs)(U|_{J}|{\mathbf{s}}). We show that if the mechanism adds o(1/(λn))o(1/(\lambda\sqrt{n})) noise per entry and if d=Ω(n)d=\Omega(n), then it is attribute non-private. Here, λ\lambda is the Lipschitz constant of the loss function gradient. The loss function also needs to satisfy a mild variance condition. For the case of linear and logistic regression estimators, λ=Θ(1)\lambda=\Theta(1) for bounded data, and so the noise bound is o(1/n)o(1/\sqrt{n}).

The statements above are based on the least squares attack. For most settings, we show the LP decoding attack can also handle a constant fraction of entries with arbitrarily high noise (the exception is the setting of general MM-estimators).

Casting the releases as a system of linear equations requires two simple insights which we hope will be broadly useful. First, we note that when s=(s1,,sn){\mathbf{s}}=(s_{1},\ldots,s_{n}) is boolean, then any release which allows us to derive an equation which involves a sum over database records can in fact be made linear in s{\mathbf{s}}. Specifically, suppose we know that igi(si)=t\sum_{i}g_{i}(s_{i})=t, where tt is a real number and gig_{i} is an arbitrary real-valued function that could depend on the index ii, the public record UiU_{i}, and any released information. We can rewrite gi(si)g_{i}(s_{i}) as gi(0)+si(gi(1)gi(0))g_{i}(0)+s_{i}(g_{i}(1)-g_{i}(0)); the constraint igi(si)=t\sum_{i}g_{i}(s_{i})=t can then be written as isi(gi(1)gi(0))=tigi(0)\sum_{i}s_{i}\cdot(g_{i}(1)-g_{i}(0))=t-\sum_{i}g_{i}(0), which is affine in s{\mathbf{s}}. This allows us to derive linear constraints from a variety of not-obviously-linear releases; for example, it allows us to get linear attacks based on the error rate of a given binary classifier (see Section 3).

The techniques just mentioned give rise to a variety of linear reconstruction attacks, depending on the type of released statistics. We can provide theoretical guarantees on the performance of these attacks in some settings, for example when the same statistic is released about many subsets of the data (e.g., all sets of a given size kk) and when the data records themselves are drawn i.i.d. from some underlying distribution. The main technique here is to analyze the geometry of the constraint matrix AA that arises in the attack. For the case of non-degenerate boolean functions, we do so by relating the constraint matrix to a row-wise product of a matrix with i.i.d. entries (referred to as a random row product matrix, see Section 3.2.1), which was recently analyzed by Rudelson (see also ). The results of showed that the least singular value of a random row product matrix is asymptotically the same as that of a matrix of same dimensions with i.i.d. entries, and a random row product matrix is a Euclidean section. Our results show that a much broader class of matrices with correlated rows satisfy these properties.

In Section 2, we introduce some notation and review the least squares and LP decoding techniques for solving noisy linear systems. In Section 3, we present our results on evaluating non-degenerate boolean functions. As mentioned earlier, we first reduce the release problem to a linear reconstruction problem (Section 3.1), and then the attacks works by using either least squares or LP decoding techniques. The analysis requires analyzing spectral and geometric properties of the constraint matrix that arises in these attack which we do in Section 3.2. In Section 4, we present our results on releasing MM-estimators associated with differentiable loss functions. For clarity, we first discuss the attacks for the special cases of linear and logistic regression estimators (in Section 4.1), and then discuss the attacks for the general case (in Section 4.2).

Preliminaries

Let MM be an N×nN\times n real matrix with NnN\geq n. The singular values σj(M)\sigma_{j}(M) are the eigenvalues of MM\sqrt{M^{\top}M} arranged in non-increasing order. Of particular importance in this paper is the least singular value σn(M)=infz:z=1Mz\sigma_{n}(M)=\inf_{\mathbf{z}:\|\mathbf{z}\|=1}\|M\mathbf{z}\|. The unit sphere in nn dimensions centered at origin is denoted by Sn1={z:z=1}S^{n-1}=\{\mathbf{z}\,:\,\|\mathbf{z}\|=1\}.

1 Background on Noisy Linear Systems.

Noisy linear systems arise in a wide variety of statistical and signal-processing contexts. Suppose we are given a matrix AA and vector z\mathbf{z} such that z=As+e\mathbf{z}=A{\mathbf{s}}+\mathbf{e}, where e\mathbf{e} is assumed to be “small” (in a sense defined below). A natural approach to estimating s{\mathbf{s}} is to output s^=\mboxargminsAszp\hat{\mathbf{s}}=\mbox{argmin}_{{\mathbf{s}}}\,\|A{\mathbf{s}}-\mathbf{z}\|_{p} for some p1p\geq 1. We will consider p=1p=1 and 22; we summarize the assumptions and guarantees for each method below. When it is known that s{0,1}n{\mathbf{s}}\in\{0,1\}^{n}, the attacker can then round the entries of s^\hat{\mathbf{s}} to the nearer of {0,1}\{0,1\} to improve the estimate.

Widely used in regression problems, the least squares method guarantees a good approximation to s{\mathbf{s}} when the Euclidean norm e\|\mathbf{e}\| is small and AA has no small eigenvalues. It was first used in data privacy by Dwork and Yekhanin . For completeness, we present the entire analysis of this attack (in a general setting) here.

Let A=PΣQA=P\Sigma Q^{\top} be the singular value decomposition of AA. Here, PP is an orthogonal m×mm\times m matrix, Σ\Sigma is a diagonal m×nm\times n matrix, and QQ is an orthogonal n×nn\times n matrix. Let 0n×(mn)\mathbf{0}_{n\times(m-n)} be an n×(mn)n\times(m-n) matrix with all entries zero. Define

In the context of privacy, the “LP decoding” approach was first used by Dwork et al. . (The name stems from the fact that the minimization problem can be cast as a linear program.) The LP attack is slower than the least squares attack but can handle considerably more complex error patterns at the cost of a stronger assumption on AA. Recently, De gave a simple analysis of this attack based on the geometry of the operator AA. We need the following definition of Euclidean section.

Note that by Cauchy-Schwarz, the first inequality, mAsAs1\sqrt{m}\|A{\mathbf{s}}\|\geq\|A{\mathbf{s}}\|_{1}, always holds. We remark that when we say AA is Euclidean, we simply mean that there is some constant α>0\alpha>0 such that AA is α\alpha-Euclidean.

The following theorem gives a sufficient condition under which LP decoding gives a good approximation to s{\mathbf{s}}. The time bound here was derived from the LP algorithm of Vaidya , which uses O(((N1+N2)N22+(N1+N2)1.5N2)N3)O(((N_{1}+N_{2})N_{2}^{2}+(N_{1}+N_{2})^{1.5}N_{2})N_{3}) arithmetic operations where N1N_{1} is the number of constraints, N2N_{2} is the number of variables, and N3N_{3} is a bound on the number of bits used to describe the entries of AA. In our setting, the LP has nn variables, mm constraints, and N3N_{3} could be upper bounded by mnmn, and therefore the LP could be solved in O(m2n3+m2.5n2)O(m^{2}n^{3}+m^{2.5}n^{2}) time.

Releasing Evaluations of Non-Degenerate Boolean Functions

In this section, we analyze privacy lower bounds for releasing evaluations of non-degenerate boolean functions. We use the following standard definition of representing polynomial (see Appendix A for a background about representing boolean functions as multilinear polynomials).

A polynomial P(f)P^{(f)} over the reals represents a function ff over {0,1}k+1\{0,1\}^{k+1} if f(x1,,xk+1)=P(f)(x1,,xk+1)f(x_{1},\ldots,x_{k+1})=P^{(f)}(x_{1},\ldots,x_{k+1}) for all (x1,,xk+1){0,1}k+1(x_{1},\ldots,x_{k+1})\in\{0,1\}^{k+1}.

A function f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} is non-degenerate iff it can be written as a multilinear polynomial of degree k+1k+1.

If ff is non-degenerate, then it depends on all of its k+1k+1 variables. Note that non-degenerate functions constitute a large class of functions.​A simple counting argument shows that among the 22k+12^{2^{k+1}} boolean functions over k+1k+1 variables, 22k+1(2k+12k)2^{2^{k+1}}-\binom{2^{k+1}}{2^{k}} are non-degenerate. For example, it includes widely used boolean functions like AND, OR, XOR, MAJORITY, and depth k+1k+1 decision trees .

Let f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} represent the function that we want to evaluate on a database DD. Let D=(Us)({0,1})n×(d+1)D=(U|{\mathbf{s}})\in(\{0,1\})_{n\times(d+1)}, where U({0,1})n×dU\in(\{0,1\})_{n\times d} and s{0,1}n{\mathbf{s}}\in\{0,1\}^{n}. Let U=(δi,j)U=(\delta_{i,j}), i.e., δi,j\delta_{i,j} denotes the (i,j)(i,j)th entry in UU (with 1in1\leq i\leq n and 1jd1\leq j\leq d). Let

Note that JJ allows repeated entries.​We allow repeated entries for convenience of notation. Our results also hold if we use the more natural J[d]J\subseteq[d], J=k|J|=k. Let DJD|_{J} be the submatrix of DD restricted to columns indexed by JJ. For a fixed JJ, define F(DJ)F(D|_{J}) as

Note that F(DJ)F(D|_{J}) is an integer between to nn. Let Σf(D)\Sigma_{f}(D) be the vector obtained by computing FF on all different DJD|_{J}’s: Σf(D)=(F(DJ))\mboxwhereJ{1,,d}k\Sigma_{f}(D)=(F(D|_{J}))\mbox{ where }J\in\{1,\ldots,d\}^{k}. Note that Σf(D)\Sigma_{f}(D) is a vector of length dkd^{k}. The goal is to understand how much noise is needed to attribute privately release Σf(D)\Sigma_{f}(D) (or Σf(D)/n\Sigma_{f}(D)/n) when ff is non-degenerate.

Let f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} be a non-degenerate boolean function. Then

any mechanism which for every database D({0,1})n×(d+1)D\in(\{0,1\})_{n\times(d+1)} with ndkn\ll d^{k} releases Σf(D)\Sigma_{f}(D) by adding o(n)o(\sqrt{n}) (or releases Σf(D)/n\Sigma_{f}(D)/n by adding o(1/n)o(1/\sqrt{n})) noise to each entry is attribute non-private. The attack that achieves this non-privacy violation runs in O(dkn2)O(d^{k}n^{2}) time.

there exists a constant γ>0\gamma>0 such that any mechanism which for every database D({0,1})n×(d+1)D\in(\{0,1\})_{n\times(d+1)} with ndkn\ll d^{k} releases Σf(D)\Sigma_{f}(D) by adding o(n)o(\sqrt{n}) (or releases Σf(D)/n\Sigma_{f}(D)/n by adding o(1/n)o(1/\sqrt{n})) noise to at most 1γ1-\gamma fraction of the entries is attribute non-private. The attack that achieves this non-privacy violation runs in O(d2kn3+d2.5kn2)O(d^{2k}n^{3}+d^{2.5k}n^{2}) time.

For convenience of notation, in this section, we work mostly with the transpose of UU. Let T=UT=U^{\top}. So TT is a d×nd\times n matrix.

1 Reducing to a Linear Reconstruction Problem.

In this section, we reduce the problem of releasing Σf(D)\Sigma_{f}(D) for a database DD into a linear reconstruction problem. First, we define a simple decomposition of boolean functions. Consider a non-degenerate boolean function f:{0,1}k+1{0,1}f\,:\{0,1\}^{k+1}\to\{0,1\}. Now there exists two function f0:{0,1}k{0,1}f_{0}\,:\,\{0,1\}^{k}\to\{0,1\} and f1:{0,1}k{0,1}f_{1}\,:\,\{0,1\}^{k}\to\{0,1\} such that

Define f2(δ1,,δk)=f1(δ1,,δk)f0(δ1,,δk)f_{2}(\delta_{1},\ldots,\delta_{k})=f_{1}(\delta_{1},\ldots,\delta_{k})-f_{0}(\delta_{1},\ldots,\delta_{k}). Therefore,

Note that f2f_{2} is a function from {0,1}k{1,0,1}\{0,1\}^{k}\to\{-1,0,1\}. Since both f0f_{0} and f1f_{1} are both boolean functions and can be represented as multilinear polynomials over the variables δ1,,δk\delta_{1},\ldots,\delta_{k}, therefore f2f_{2} also could be represented as a multilinear polynomial over the variables δ1,,δk\delta_{1},\ldots,\delta_{k}. Since ff is represented by a multilinear polynomial of degree k+1k+1, therefore, the multilinear polynomial representing f2f_{2} has degree kk (if it has any lower degree, then ff could be represented as multilinear polynomial of degree strictly less than k+1k+1, which is a contradiction). To aid our construction, we need to define a particular function of matrices.

Let hh be a function from {0,1}k{1,0,1}\{0,1\}^{k}\to\{-1,0,1\}. Let T(1)=(δi,j(1)),T(2)=(δi,j(2)),,T(k)=(δi,j(k))T_{(1)}=(\delta^{(1)}_{i,j}),T_{(2)}=(\delta^{(2)}_{i,j}),\ldots,T_{(k)}=(\delta^{(k)}_{i,j}) be kk matrices with {0,1}\{0,1\} entries and dimensions d×nd\times n. Define a row function matrix (of dimension dk×nd^{k}\times n) Πh(T(1),,T(k))\Pi_{h}(T_{(1)},\ldots,T_{(k)}) as follows. Any row of this matrix will correspond to a sequence

of kk numbers, so the entries of Πh(T(1),,T(k))\Pi_{h}(T_{(1)},\ldots,T_{(k)}) will be denotedThe definition assumes a certain order of the rows of the matrix Πh(T(1),,T(k))\Pi_{h}(T_{(1)},\ldots,T_{(k)}). This order, however, is not important, for our analysis. Note that changing the relative positions of rows of a matrix doesn’t affect its eigenvalues and singular values. by πJ,a\pi_{J,a}, where a{1,,n}a\in\{1,\ldots,n\}. For J=(j1,j2,,jk)J=(j_{1},j_{2},\ldots,j_{k}) the entries of the matrix Πh(T(1),,T(k))\Pi_{h}(T_{(1)},\ldots,T_{(k)}) will be defined by the relation

The row product matrices from (see Definition 6) is a particular example of this construction where the function h(δ1,δ2,,δk)=δ1δ2δk,h(\delta_{1},\delta_{2},\ldots,\delta_{k})=\delta_{1}\cdot\delta_{2}\cdot\ldots\cdot\delta_{k},, which implies that Πh(T(1),,T(k))=T(1)T(k)\Pi_{h}(T_{(1)},\ldots,T_{(k)})=T_{(1)}\odot\dots\odot T_{(k)}, where \odot is the row-product operator from Definition 6.

Let D=(Us)D=(U|{\mathbf{s}}) be a database, and let T=UT=U^{\top}. Let U=(δi,j)U=(\delta_{i,j}). Consider any fixed J=(j1,,jk){1,,d}kJ=(j_{1},\ldots,j_{k})\in\{1,\ldots,d\}^{k}. Now for this JJ, there exists an entry in Σf(D)\Sigma_{f}(D) equaling i=1nf(δi,j1,,δi,jk,si)\sum_{i=1}^{n}f(\delta_{i,j_{1}},\ldots,\delta_{i,j_{k}},s_{i}). Now consider the matrices Πf2(T,,T)\Pi_{f_{2}}(T,\ldots,T) and Πf0(T,,T)\Pi_{f_{0}}(T,\ldots,T). Consider the rows in Πf0(T,,T)\Pi_{f_{0}}(T,\ldots,T) and Πf2(T,,T)\Pi_{f_{2}}(T,\ldots,T) corresponding to this above JJ. Let this be the llth row in these matrices. Then the llth row of the matrix Πf0(T,,T)\Pi_{f_{0}}(T,\ldots,T) has nn entries equaling f0(δi,j1,,δi,jk)f_{0}(\delta_{i,j_{1}},\ldots,\delta_{i,j_{k}}) and the llth row of the matrix Πf2(T,,T)\Pi_{f_{2}}(T,\ldots,T) has nn entries equaling f2(δi,j1,,δi,jk)f_{2}(\delta_{i,j_{1}},\ldots,\delta_{i,j_{k}}) for i=1,,ni=1,\ldots,n. Since

where Πf0(T,,T)l\Pi_{f_{0}}(T,\ldots,T)_{l} and Πf2(T,,T)l\Pi_{f_{2}}(T,\ldots,T)_{l} denote the llth row of matrices Πf0(T,,T)\Pi_{f_{0}}(T,\ldots,T) and Πf2(T,,T)\Pi_{f_{2}}(T,\ldots,T) respectively and 1n\mathbf{1}_{n} denotes the vector (1)n(1)^{n}. Now define a vector Hf(D)H_{f}(D) whose llth element (1ldk1\leq l\leq d^{k}) is

The length of vector Hf(D)H_{f}(D) is dkd^{k}. The above arguments show that all the entries of Hf(D)H_{f}(D) are contained in the vector Σf(D)\Sigma_{f}(D). Since every row in these Σf(D)\Sigma_{f}(D) correspond to some JJ, it also follows that all the entries of Σf(D)\Sigma_{f}(D) are contained in the vector Hf(D)H_{f}(D), implying the following claim.

Σf(D)=Hf(D)\Sigma_{f}(D)=H_{f}(D).​Under proper ordering of both the vectors.

The privacy mechanism releases a noisy approximation to Σf(D)\Sigma_{f}(D). Let y=(y1,,ydk)\mathbf{y}=(y_{1},\ldots,y_{d^{k}}) be this noisy approximation. The adversary tries to reconstruct an approximation of s{\mathbf{s}} from y\mathbf{y}. Let bfi=Πf0(T,,T)i,1nb_{f_{i}}=\langle\Pi_{f_{0}}(T,\ldots,T)_{i},\mathbf{1}_{n}\rangle, and bf=(bf1,,bfdk)\mathbf{b}_{f}=(b_{f_{1}},\ldots,b_{f_{d^{k}}}). Given y\mathbf{y}, the adversary solves the following linear reconstruction problem:

In the setting of attribute non-privacy the adversary knows TT, and therefore can compute Πf2(T,,T)\Pi_{f_{2}}(T,\ldots,T) and Πf0(T,,T)\Pi_{f_{0}}(T,\ldots,T) (hence, bf\mathbf{b}_{f}). The goal of the adversary is to compute a large fraction of s{\mathbf{s}} given y\mathbf{y}. The definition of iterated logarithm (log(q))(\log_{(q)}) is given in Definition 8. In the below analysis, we use a random matrix T\mathbf{T} and the least singular value lower bound on a random row function matrix from Theorem 3.9. We use boldface letters to denote random matrices.

Let f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} be a non-degenerate boolean function and ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d for a constant cc depending only on k,qk,q. Any privacy mechanism which for every database D({0,1})n×(d+1)D\in(\{0,1\})_{n\times(d+1)} releases Σf(D)\Sigma_{f}(D) by adding o(n)o(\sqrt{n}) (or releases Σf(D)/n\Sigma_{f}(D)/n by adding o(1/n)o(1/\sqrt{n})) noise to each entry is attribute non-private. The attack that achieves this non-privacy violation runs in O(dkn2)O(d^{k}n^{2}) time.

Consider the least squares attack outlined in Theorem 2.1 on Equation (3). Let T\mathbf{T} be a random matrix of dimension d×nd\times n with independent Bernoulli entries taking values and 11 with probability 1/21/2, and let database D=(Ts)D=(\mathbf{T}^{\top}|{\mathbf{s}}) for some s{0,1}n{\mathbf{s}}\in\{0,1\}^{n}. For analyzing the attack in Theorem 2.1, we need a lower bound on the least singular value of Πf2(T,,T)\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T}). The following claim follows from Theorem 3.9. Note that f2f_{2} is a function over kk variables.

For function f2f_{2} defined above and ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d (where cc is the constant from Theorem 3.8), the matrix Πf2(T,,T)\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T}) satisfies

Apply Theorem 3.9 with function h=f2h=f_{2}. ∎

Claim 2 shows that with exponentially high probability σn(Πf2(T,,T))=Ω(dk)\sigma_{n}(\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T}))=\Omega(\sqrt{d^{k}}). Invoking Theorem 3.9 with m=dkm=d^{k} and β=o(n)\beta=o(\sqrt{n}) shows that with exponentially high probability the adversary fails to recover only o(n)o(n) entries of s{\mathbf{s}}. Noise bound of o(n)o(\sqrt{n}) for releasing Σf(D)\Sigma_{f}(D) translates into a noise bound of o(1/n)o(1/\sqrt{n}) for releasing Σf(D)/n\Sigma_{f}(D)/n. ∎

The LP decoding attack solves a slightly different reconstruction problem than Equation (3). The reason is because Πf2(T,,T)\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T}) is not a Euclidean sectionIf we take a matrix T\mathbf{T} of dimension d×nd\times n with independent Bernoulli entries taking values and 11 with probability 1/21/2, the resulting matrix Πf2(T,,T)\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T}) is not a Euclidean section. This is because the matrix T\mathbf{T} is non-centered (expectation of each entry in the matrix is 1/21/2) which makes the Πf2(T,,T)\left\|\Pi_{f_{2}}(\mathbf{T},\ldots,\mathbf{T})\right\| to be dk\approx d^{k} instead of dk\sqrt{d^{k}}. (a property needed for applying Theorem 2.2). However, we show that a related reconstruction problem has all the properties needed for the LP decoding attack. The analysis goes via matrices with {1,1}\{1,-1\} entries which have the desired properties. We establish the Euclidean section property in Appendix B.

Let D=(Us)D=(U|{\mathbf{s}}) be a database, and let T=UT=U^{\top}. Let V=2T1d×nV=2T-\mathbf{1}_{d\times n} where 1d×n\mathbf{1}_{d\times n} is a d×nd\times n matrix of all 11’s. Define g:{1,1}k+1{1,1}g\,:\,\{-1,1\}^{k+1}\to\{-1,1\} as

where g1:{1,1}k{1,1}g_{1}:\{-1,1\}^{k}\to\{-1,1\} and g1:{1,1}k{1,1}g_{-1}:\{-1,1\}^{k}\to\{-1,1\}. Using the notation, g1=g1(ϕ1,,ϕk)g_{1}=g_{1}(\phi_{1},\ldots,\phi_{k}) and g1=g1(ϕ1,,ϕk)g_{-1}=g_{-1}(\phi_{1},\ldots,\phi_{k}), we get

Define g2=g2(ϕ1,,ϕk)=(g1g1)/2g_{2}=g_{2}(\phi_{1},\ldots,\phi_{k})=(g_{1}-g_{-1})/2 and g3=g3(ϕ1,,ϕk)=(g1+g1)/2g_{3}=g_{3}(\phi_{1},\ldots,\phi_{k})=(g_{1}+g_{-1})/2. Let us denote δi=(1+ϕi)/2\delta_{i}=(1+\phi_{i})/2. Using the decomposition of ff from Equation (1),

Using the notation, f0=f0(δ1,,δk)f_{0}=f_{0}(\delta_{1},\ldots,\delta_{k}) and f2=f2(δ1,,δk)f_{2}=f_{2}(\delta_{1},\ldots,\delta_{k}), and substituting the decomposition of ff and gg into Equation (4) gives:

Let D=(Us)D=(U|{\mathbf{s}}), T=UT=U^{\top}, and V=2T1d×nV=2T-\mathbf{1}_{d\times n}. Then

Note that from the earlier established decomposition

The reminder of the proof follows by using Equation (5) along with the definition of row-function matrices (Definition 5). ∎

Then Σf(D)i=qgi+Πg2(V,,V)i,s\Sigma_{f}(D)_{i}=q_{g_{i}}+\langle\Pi_{g_{2}}(V,\ldots,V)_{i},{\mathbf{s}}\rangle. Let qg=(qg1,,qgdk)\mathbf{q}_{g}=(q_{g_{1}},\ldots,q_{g_{d^{k}}}). We get

Let y\mathbf{y} be the noisy approximation to Σf(D)\Sigma_{f}(D) released by the privacy mechanism. Given y\mathbf{y}, the linear program that the adversary solves is:

The following theorem analyzes this attack using a random matrix V\mathbf{V}.

Let f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} be a non-degenerate boolean function and let ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d for a constant cc depending only on k,qk,q. Then there exists a constant γ=γ(k,q)>0\gamma=\gamma(k,q)>0 such that any mechanism which for every database D({0,1})n×(d+1)D\in(\{0,1\})_{n\times(d+1)} releases Σf(D)\Sigma_{f}(D) by adding o(n)o(\sqrt{n}) (or releases Σf(D)/n\Sigma_{f}(D)/n by adding o(1/n)o(1/\sqrt{n})) noise to at most 1γ1-\gamma fraction of the entries is attribute non-private. The attack that achieves this non-privacy violation runs in O(d2kn3+d2.5kn2)O(d^{2k}n^{3}+d^{2.5k}n^{2}) time.

The proof uses the LP decoding attack outlined in Theorem 2.2 on Equation (7). Let T\mathbf{T} be a random matrix of dimension d×nd\times n with independent Bernoulli entries taking values and 11 with probability 1/21/2, and let database D=(Ts)D=(\mathbf{T}^{\top}|{\mathbf{s}}) for some s{0,1}n{\mathbf{s}}\in\{0,1\}^{n} . Let V=2T1d×n\mathbf{V}=2\mathbf{T}-\mathbf{1}_{d\times n}. To use Theorem 2.2, we need to (i) establish that Πg2(V,,V)\Pi_{g_{2}}(\mathbf{V},\ldots,\mathbf{V}) is a Euclidean section and (ii) establish a lower bound on its least singular value. Since g2:{1,1}k{1,0,1}g_{2}\,:\,\{-1,1\}^{k}\to\{-1,0,1\} has a representation as a multilinear polynomial of degree kk, Theorem B.4 shows that Πg2(V,,V)\Pi_{g_{2}}(\mathbf{V},\ldots,\mathbf{V}) is with exponentially high probability a Euclidean section. Repeating an analysis similar to Theorem 3.9 shows that the least singular value of Πg2(V,,V)\Pi_{g_{2}}(\mathbf{V},\ldots,\mathbf{V}) is with exponentially high probability at least Ω(dk)\Omega(\sqrt{d^{k}}). Hence, with exponentially high probability both of the following statements hold simultaneously: (i) Πg2(V,,V)\Pi_{g_{2}}(\mathbf{V},\ldots,\mathbf{V}) is a Euclidean section and (ii) σn(Πg2(V,,V))=Ω(dk)\sigma_{n}(\Pi_{g_{2}}(\mathbf{V},\ldots,\mathbf{V}))=\Omega(\sqrt{d^{k}}).

Invoking Theorem 2.2 with β=o(n)\beta=o(\sqrt{n}), a=na=\sqrt{n}, b=dkb=d^{k}, and σ=Ω(dk)\sigma=\Omega(\sqrt{d^{k}}) shows that with exponentially high probability the adversary fails to recover only o(n)o(n) entries of s{\mathbf{s}}. This shows that the mechanism is attribute non-private.

In the running time analysis of Theorem 2.2, mm gets replaced by dkd^{k}, and N3N_{3} by dknd^{k}n (as the input matrix can be represented using O(dkn)O(d^{k}n) bits). Noise bound of o(n)o(\sqrt{n}) for releasing Σf(D)\Sigma_{f}(D) translates into a noise bound of o(1/n)o(1/\sqrt{n}) for releasing Σf(D)/n\Sigma_{f}(D)/n. ∎

2 Spectral and Geometric Properties of Random Row Function Matrices.

Analysis of our reconstruction attacks rely on spectral and geometric properties of random row function matrices that we discuss in this section. Rudelson and Kasiviswanathan et al. analyzed certain spectral and geometric properties of a certain class of correlated matrices that they referred to as row product matrices (or conjunction matrices). Our analysis builds upon these results. We first summarize some important definitions and useful results from in Section 3.2.1, and establish our least singular value bound in Section 3.2.2. The Euclidean section property is established in Appendix B.

For two matrices with the same number of columns we define the row product as a matrix whose rows consist of entry-wise product of the rows of original matrices.

Rudelson showed that if we take entry-wise product of kk independent random matrices of dimension d×nd\times n, then the largest and the least singular values of the resulting row product matrix (which of dimension dk×nd^{k}\times n) is asymptotically the same order as that of a dk×nd^{k}\times n matrix with i.i.d. entries. To formulate this result formally, we introduce a class of uniformly bounded random variables, whose variances are uniformly bounded below.

We would also need the notion of iterated logarithm (log(q)\log_{(q)}) that is defined as:

Let k,q,n,dk,q,n,d be natural numbers. Assume that ncdk/log(q)d.n\leq cd^{k}/\log_{(q)}d. Let T(1),,T(k)\mathbf{T}_{(1)},\ldots,\mathbf{T}_{(k)} be kk matrices with independent τ\tau-random entries and dimensions d×nd\times n. Then the kk-times entry-wise product T(1)T(2)T(k)\mathbf{T}_{(1)}\odot\mathbf{T}_{(2)}\odot\dots\odot\mathbf{T}_{(k)} is a dk×nd^{k}\times n matrix satisfying

The constants c,C1,c1,C2c,C_{1},c_{1},C_{2} depends only on kk and qq.

One of the main ingredients in proving the above theorem is following fact about the norm of row product matrices.

Let T\mathbf{T} be a matrix with independent τ\tau-random entries and dimension d×nd\times n. Then the kk-times entry-wise product TT\mathbf{T}\odot\dots\odot\mathbf{T} is a dk×nd^{k}\times n matrix satisfying

The constants c3,c4c_{3},c_{4} depends only on kk.

The bound on the norm appearing in the above theorem (asymptotically) matches that for a dk×nd^{k}\times n matrix with independent τ\tau-random entries (refer for more details).

2.2 Least Singular Value of Random Row Function Matrices.

We start by proving a simple proposition about functions that can be represented as multilinear polynomials. The main step behind the following proposition proof is the following simple fact about multilinear polynomials. Let P(h)P^{(h)} be a multilinear polynomial representing function hh, and let (δ1,,δk){0,1}k(\delta_{1},\ldots,\delta_{k})\in\{0,1\}^{k} and δi{0,1}\delta_{i}^{\prime}\in\{0,1\} then

Let hh be a function from {0,1}k{1,0,1}\{0,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Let P(h)P^{(h)} denote this multilinear polynomial. Let (δ1,,δk),(δ1,,δk){0,1}k(\delta_{1},\ldots,\delta_{k}),(\delta_{1}^{\prime},\ldots,\delta_{k}^{\prime})\in\{0,1\}^{k}. For I{1,,k}I\subseteq\{1,\ldots,k\} let δ(I){0,1}k\delta(I)\in\{0,1\}^{k} be the point with coordinates δj(I)=δj if jI;\mboxandδj(I)=δj if jI\delta_{j}(I)=\delta_{j}^{\prime}\text{ if }j\in I;\mbox{ and }\delta_{j}(I)=\delta_{j}\text{ if }j\notin I. Then

where 1/ch(k)1/c_{h}(k) is the coefficient of the monomial corresponding to all kk variables in the multilinear representation of hh.

By definition, we know that for all (δ1,,δk){0,1}k(\delta_{1},\ldots,\delta_{k})\in\{0,1\}^{k}, h(δ1,,δk)=P(h)(δ1,,δk)h(\delta_{1},\ldots,\delta_{k})=P^{(h)}(\delta_{1},\ldots,\delta_{k}). Since P(h)P^{(h)} is a linear function of δ1\delta_{1},

where δ1P(h)\frac{\partial\,}{\partial\delta_{1}}P^{(h)} denotes the partial derivative of P(h)(δ1,δ2,,δk)P^{(h)}(\delta_{1},\delta_{2},\ldots,\delta_{k}) with respect to δ1\delta_{1}. Repeating this for the other coordinates, we get

The last term in the right hand side is the coefficient of the polynomial P(h)(δ1,,δk)P^{(h)}(\delta_{1},\ldots,\delta_{k}) corresponding to the monomial δ1δk\delta_{1}\cdot\ldots\cdot\delta_{k}, and we denote it by 1/ch(k)1/c_{h}(k). ∎

Let T(1),,,T(k),T(1),,T(k)T_{(1)},\ldots,,T_{(k)},T^{\prime}_{(1)},\ldots,T^{\prime}_{(k)} be 2k2k matrices with {0,1}\{0,1\} entries and dimensions d×nd\times n. For a set I[k]I\subseteq[k] denote T(j)(I)=T(j) if jI\mboxandT(j)(I)=T(j) if jIT_{(j)}(I)=T_{(j)}\text{ if }j\in I\mbox{ and }T_{(j)}(I)=T^{\prime}_{(j)}\text{ if }j\notin I. Then the following holds,

Follows by a simple extension of Proposition 3.6 and using Definitions 5 and 6. ∎

Let k,q,n,dk,q,n,d be natural numbers. Assume that ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d. Consider a d×nd\times n matrix T\mathbf{T} with independent Bernoulli entries taking values 0 and 1 with probability 1/21/2. Let hh be a function from {0,1}k{1,0,1}\{0,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Then the matrix Πh(T,,T)\Pi_{h}(\mathbf{T},\ldots,\mathbf{T}) satisfies

The constants c,C,c1,c2c,C^{\prime},c_{1},c_{2} depend only on kk and qq.

Let d=2kd+ld=2kd^{\prime}+l, where 0l<2k0\leq l<2k. For j=1,,kj=1,\ldots,k denote by Tj1\mathbf{T}_{j}^{1} the submatrix of T\mathbf{T} consisting of rows (2d(j1)+1),,(2d(j1)+d)(2d^{\prime}(j-1)+1),\ldots,(2d^{\prime}(j-1)+d^{\prime}), and by Tj0\mathbf{T}_{j}^{0} the submatrix consisting or rows (2d(j1)+d+1),,2jd(2d^{\prime}(j-1)+d^{\prime}+1),\ldots,2jd^{\prime}. For a set I[k]I\subseteq[k] denote

The last inequality follows since I\forall I Πh(T(1)(I),,T(k)(I))\Pi_{h}(\mathbf{T}_{(1)}(I),\ldots,\mathbf{T}_{(k)}(I)) is a submatrix of Πh(T,,T)\Pi_{h}(\mathbf{T},\ldots,\mathbf{T}), which implies that

The entries of the matrices (T11T10),,(Tk1Tk0)(\mathbf{T}_{1}^{1}-\mathbf{T}_{1}^{0}),\ldots,(\mathbf{T}_{k}^{1}-\mathbf{T}_{k}^{0}) are independent τ\tau-random variables. Thus, Theorem 3.4 (note that d=O(d))d^{\prime}=O(d)) yields

which along with Equation (8) proves the theorem. Note that ch(k)c_{h}(k) can be bounded as a function of kk alone. ∎

Combining Theorem 3.8 with Cauchy-Schwarz’s inequality, we obtain a lower bound on the least singular value of Πh(T,,T)\Pi_{h}(\mathbf{T},\ldots,\mathbf{T}). It is well known that the least singular value of a dk×nd^{k}\times n matrix with independent τ\tau-random entries is dk\approx d^{k}. Therefore, we get that in spite of all the correlations that exist in Πh(T,,T)\Pi_{h}(\mathbf{T},\ldots,\mathbf{T}) its least singular value is asymptotically the same order as that of an i.i.d. matrix.

The right-hand side probability could be bounded using Theorem 3.8 and the left-hand probability is exactly Pr[σn(Πh(T,,T))Cdk]\Pr\left[\sigma_{n}(\Pi_{h}(\mathbf{T},\ldots,\mathbf{T}))\leq C^{\prime}\sqrt{d^{k}}\right]. ∎

Releasing M𝑀M-estimators

Let D=(Us)D=(U|{\mathbf{s}}) be a database dimension n×d+1n\times d+1, and let UU be a real-valued matrix of dimension n×dn\times d and s{0,1}n{\mathbf{s}}\in\{0,1\}^{n} be a (secret) vector. Let D=(δi,j)D=(\delta_{i,j}). Consider the submatrix DJD|_{J} of DD, where J{1,,d}kJ\in\{1,\ldots,d\}^{k}. Let θ^J\hat{\theta}_{J} be the MM-estimator for DJD|_{J} defined as

where U(i)jU_{(i)_{j}} is the jjth row in U(i)U_{(i)}. Then MM-estimators θ^i\hat{\theta}_{i} for i[d/k]i\in[d/k] is obtained by solving

This gives a set of constraints over s{\mathbf{s}} which the adversary could use to construct s^\hat{{\mathbf{s}}}. For the case of linear and logistic regression, Equation (10) reduces to a form U(i)sr=0U_{(i)}^{\top}{\mathbf{s}}-\mathbf{r}=0, where r\mathbf{r} is a vector independent of s{\mathbf{s}}. For general loss function, we would use the fact that s{\mathbf{s}} is binary and use a decomposition similar to Equation (1). The other issue is that the adversary gets only a noisy approximation of θ^1,,θ^d/k\hat{\theta}_{1},\ldots,\hat{\theta}_{d/k} and we overcome this problem by using the Lipschitz properties of the gradient of the loss function.

In the next subsection, we focus on the standard MLE’s for linear and logistic regression. In Section 4.2, we consider general MM-estimators. Here, we would require an additional variance condition on the loss function. We would use this following standard definition of Lipschitz continuous gradient.

Remark: Also, for any twice differentiable function GG, λI2G(x)\lambda I\succeq\partial^{2}\,G(\mathbf{x}) for all x\mathbf{x} (where 2G(x)\partial^{2}\,G(\mathbf{x}) denotes the Hessian matrix) .

1 Releasing Linear and Logistic Regression Estimators.

In this section, we establish distortion lower bounds for attribute privately releasing linear and logistic regression estimators.

Logistic regression models estimate probabilities of events as functions of independent variables. Let sis_{i} be binary variable representing the value on the dependent variable for iith input, and the values of kk independent variables for this same case be represented as xi,jx_{i,j} (j=1,,kj=1,\ldots,k). Let nn denote the total sample size, and let ζi\zeta_{i} denote the probability of success (Pr[si=1](\Pr[s_{i}=1]). The design matrix of independent variables, X=(xi,j)X=(x_{i,j}), is composed of nn rows and kk columns.

The logistic regression model equates the logit transform, the log-odds of the probability of a success, to the linear component:

where the notation \mboxvert(,,)\mbox{vert}(\cdot,\ldots,\cdot) denotes vertical concatenation of the argument matrices/vectors. Our analysis will require a bound on the Lipschitz constant of the gradient of the log-likelihood function, which we bound using the following claim.

Let U=(U(1),,U(d/k))U=(U_{(1)}|,\ldots,|U_{(d/k)}), where each U(i)U_{(i)} is an n×kn\times k matrix (assume dd is a multiple of kk). Let θ^i\hat{\theta}_{i} be the solution to the MLE equation

This reduction is very similar to the linear regression case. As before let U=(U(1),,U(d/k))U=(U_{(1)}|,\ldots,|U_{(d/k)}). Let θ^i\hat{\theta}_{i} be the solution to the MLE equation

In the following analysis, we use a random matrix for UU where each entry of the random matrix is an independent τ\tau-random variable (Definition 7).

Let us first look at the least squares attack. Let e=(e1,,ed)\mathbf{e}=(e_{1},\ldots,e_{d}) (i.e., e\mathbf{e} is the error vector). We have U(i)U(i)n\mathbf{U}_{(i)}^{\top}\mathbf{U}_{(i)}\leq n for all i[d]i\in[d] (as U\mathbf{U} is a τ\tau-random matrix, all entries in the matrix are at most 11, and U(i)\mathbf{U}_{(i)} is the iith column in U\mathbf{U}). The least squares attack produces an estimate s^\hat{{\mathbf{s}}} by solving:

The remaining analysis is similar to Theorem 2.1, except that again each error term eie_{i} is scaled up by a factor (at most) nn. Since U\mathbf{U} is a τ\tau-random matrix, it is well-known that if d2nd\geq 2n, then the least singular value of U\mathbf{U} is with exponentially high probability Ω(d)\Omega(\sqrt{d}) (see, e.g., ). Therefore, if a privacy mechanism adds o(n)/no(\sqrt{n})/nThe nn in the denominator is because of the scaling of the noise e\mathbf{e}. noise to each θ^i\hat{\theta}_{i}, then the least squares attack with exponentially high probability recovers 1o(1)1-o(1) fraction of the entries in s{\mathbf{s}}. The time for executing this attack is O(dn2)O(dn^{2}) as the attack requires computing the SVD of a d×nd\times n matrix.

The LP decoding attack produces an estimate s^\hat{{\mathbf{s}}} by solving:

where Λ\mboxlog=U(i)U(i)n\Lambda_{\mbox{log}}=\mathbf{U}_{(i)}^{\top}\mathbf{U}_{(i)}\leq n (using Claim 4). Therefore,

The least squares attack produces an estimate s^\hat{{\mathbf{s}}} by solving:

The remaining analysis follows as in the linear regression case (from Theorem 2.1) and again the errors are scaled by a factor (at most) nn. Therefore, if a privacy mechanism adds o(n)/n=o(1/n)o(\sqrt{n})/n=o(1/\sqrt{n}) noise to each θ^i\hat{\theta}_{i}, then the least squares attack with exponentially high probability recovers 1o(1)1-o(1) fraction of the entries in s{\mathbf{s}}. The time for executing this attack is O(dn2)O(dn^{2}).

The LP decoding attack produces an estimate s^\hat{{\mathbf{s}}} by solving:

Again like in the linear regression case (using Theorem 2.2), we get that any mechanism that adds o(n)/n=o(1/n)o(\sqrt{n})/n=o(1/\sqrt{n}) noise to at most 1γ1-\gamma fraction of the θ^i\hat{\theta}_{i}’s is attribute non-private. The time for executing this attack is O(d2n3+d2.5n2)O(d^{2}n^{3}+d^{2.5}n^{2}). ∎

Compared to Theorems 3.2 and 3.3, this above theorem requires a much larger dd (about O(n)O(n)). However, it is possible to reduce to dependence on dnO(1/k)d\approx n^{O(1/k)} if the released statistic is a degree kk polynomial kernel of these regression functions. We defer the details to the full version.

2 Releasing General M𝑀M-estimators.

In this section, we establish distortion lower bounds for attribute privately releasing MM-estimators associated with differentiable loss functions.

Now the MM-estimator (θ^i\hat{\theta}_{i}) between U(i)U_{(i)} and s{\mathbf{s}} can be found by setting L(θ;(U(i),s))=0\partial\,\mathcal{L}(\theta;(U_{(i)},{\mathbf{s}}))=0. Therefore, θ^i\hat{\theta}_{i} is the solution to (ignoring the scaling multiplier 1/n1/n)

Let D=(Us)D=(\mathbf{U}|{\mathbf{s}}), where U\mathbf{U} is a τ\tau-random matrix of dimension n×dn\times d .

Let A1,B2,A2,B2\mathbf{A}_{1},\mathbf{B}_{2},\mathbf{A}_{2},\mathbf{B}_{2} be four matrices of dimension d×nd\times n defined as follows:

The adversary solves the following reconstruction problem to compute s^\hat{{\mathbf{s}}}:

B2\mathbf{\mathbf{B}_{2}}: Least Singular Value. Since U\mathbf{U} is a τ\tau-random matrix, B2\mathbf{B}_{2} is another random matrix. However, B2\mathbf{B}_{2} may not be centered (i.e., its entries might have non-zero means). We can re-express B2\mathbf{B}_{2} as

Pr[σn(B2)c19d]exp(c20d)\Pr[\sigma_{n}(\mathbf{B}_{2})\leq c_{19}\sqrt{d}]\leq\exp(-c_{20}d).

B2=R+uv\mathbf{B}_{2}=\mathbf{R}+\mathbf{u}\mathbf{v}^{\top}, where R\mathbf{R} is a τ\tau^{\prime}-random matrix. The rank of uv\mathbf{u}\mathbf{v}^{\top} is 11, and its operator norm can be polynomially bounded in dd. Applying Lemma D.2 implies the result. ∎

Least Squares Attack. The least squares attack produces an estimate s^\hat{{\mathbf{s}}} by solving:

The analysis is similar to Theorem 2.1, except that each error term eie_{i} is scaled up by a factor (at most) λn\lambda n. Therefore, if a privacy mechanism adds o(1/(nλ))o(1/(\sqrt{n}\lambda)) noise to each θ^i\hat{\theta}_{i}, then the least squares attack with exponentially high probability recovers 1o(1)1-o(1) fraction of the entries in s{\mathbf{s}}. The time for executing this attack is O(dn2)O(dn^{2}).

References

Appendix A Preliminaries about Boolean Functions

We start with an alternate definition of non-degeneracy and show that it is equivalent to Definition 4.

A boolean function f:{0,1}k+1{0,1}f:\{0,1\}^{k+1}\to\{0,1\} is called non-degenerate if

Consider the function g:{1,1}k+1{1,1}g:\{-1,1\}^{k+1}\to\{-1,1\} defined by

Let ϕ=(ϕ1,,ϕk+1)\phi=(\phi_{1},\ldots,\phi_{k+1}). For S{1,,k+1}S\subseteq\{1,\ldots,{k+1}\} let χS(ϕ)=jSϕj\chi_{S}(\phi)=\prod_{j\in S}\phi_{j} be the corresponding Walsh function. Then the function gg can be decomposed as

Note that deg(f)=deg(g)\text{deg}(f)=\text{deg}(g), and so deg(f)=k+1\text{deg}(f)=k+1 iff g^(1,,k+1)0\widehat{g}(1,\ldots,k+1)\neq 0. We have

where δj=(1ϕj)/2\delta_{j}=(1-\phi_{j})/2, the lemma follows. ∎

Remark: There are 10 non-degenerate functions of two boolean variables:

The remaining 66 boolean functions of two variables are degenerate.

Appendix B Euclidean Section Property of Random Row Function Matrices

In this section, we establish the Euclidean section property needed for Theorem 3.3 (LP decoding attack). Let us consider a function h:{1,1}k{1,0,1}h\,:\,\{-1,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Let P(h)P^{(h)} denote this multilinear polynomial. We first prove a proposition analogous to Proposition 3.6 for hh. For I[k]I\subseteq[k], let us define

That is PI(h)(ϕ1,,ϕk)P^{(h)}_{I}(\phi_{1},\ldots,\phi_{k}) is the multilinear polynomial Ph(ϕ1,,ϕk)P_{h}(\phi_{1},\ldots,\phi_{k}) restricted to variables only in II. Under this notation

Let hh be a function from {1,1}k{1,0,1}\{-1,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Let P(h)P^{(h)} be this multilinear polynomial. Let (ϕ1,,ϕk){1,1}k(\phi_{1},\ldots,\phi_{k})\in\{-1,1\}^{k}. Then

where 1/ch(k)1/c_{h}(k) is the coefficient of the monomial corresponding to all kk variables in P(h)P^{(h)}.

The proof is similar to Proposition 3.6. Since P(h)P^{(h)} is a linear function in ϕ1\phi_{1},

where ϕ1P(h)\frac{\partial\,}{\partial\phi_{1}}P^{(h)} is the partial derivative of P(h)(ϕ1,,ϕk)P^{(h)}(\phi_{1},\ldots,\phi_{k}) with respect to ϕ1\phi_{1}. Repeating this for other coordinates, we get

so the above equation could be re-expressed as

The last term in the right hand side is the coefficient of the polynomial P(h)(ϕ1,,ϕk)P^{(h)}(\phi_{1},\ldots,\phi_{k}) corresponding to the monomial ϕ1ϕk\phi_{1}\cdot\ldots\cdot\phi_{k}, and we denote it by 1/ch(k)1/c_{h}(k). ∎

Let V(1)=(ϕi,j(1)),,V(k)=(ϕi,j(k))V_{(1)}=(\phi_{i,j}^{(1)}),\ldots,V_{(k)}=(\phi_{i,j}^{(k)}) be kk matrices with {1,1}\{-1,1\} entries and dimensions d×nd\times n. Let us define a dk×nd^{k}\times n matrix ΠPI(h)(V(1),,V(k))\Pi_{P^{(h)}_{I}}(V_{(1)},\ldots,V_{(k)}) as in Definition 5 using the multilinear polynomial PI(h)P^{(h)}_{I}. More concretely, for J=(j1,j2,,jk){1,,d}kJ=(j_{1},j_{2},\ldots,j_{k})\in\{1,\ldots,d\}^{k} the (J,a)(J,a) entry of the matrix ΠPI(h)(V(1),,V(k))\Pi_{P^{(h)}_{I}}(V_{(1)},\ldots,V_{(k)}) will be defined by the relation

Using this definition, the following corollary follows immediately from Proposition B.1.

Let V(1),,,V(k)V_{(1)},\ldots,,V_{(k)} be kk matrices with {1,1}\{-1,1\} entries and dimensions d×nd\times n. Then

Let k,n,dk,n,d be natural numbers. Consider a d×nd\times n matrix V\mathbf{V} with independent Bernoulli entries taking values 1-1 and 11 with probability 1/21/2. Let hh be a function from {1,1}k{1,0,1}\{-1,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Then the matrix Πh(V,,V)\Pi_{h}(\mathbf{V},\ldots,\mathbf{V}) satisfies

The constants c6,c7,c8c_{6},c_{7},c_{8} depend only on kk.

To prove the theorem, we use induction over the size of II. Our inductive claim is that for every I[k]I\subseteq[k],

where constants c6,c7,c8c_{6},c_{7},c_{8} depend only on I|I|.

Since V\mathbf{V} is a random {1,1}\{-1,1\} matrix, it is well known that (see e.g., ) there exists constant c10,c11c_{10},c_{11} such that

Therefore, there exists constants c6,c7,c8c_{6},c_{7},c_{8} such that

Step 2. Let I=k|I|=k, i.e., I={1,,k}=[k]I=\{1,\ldots,k\}=[k]. We want to bound ΠPI(h)(V,,V)\left\|\Pi_{P^{(h)}_{I}}(\mathbf{V},\ldots,\mathbf{V})\right\|. By inductive hypothesis, we assume that for every L[k]L\subset[k],

where constants c6,c7,c8c_{6},c_{7},c_{8} depend only on L|L|. From Theorem 3.5, we know that the kk-times entry wise product VV\mathbf{V}\odot\dots\odot\mathbf{V} satisfies the following norm condition (as V\mathbf{V} is matrix with independent τ\tau-random entries)

where again the constants c3,c4c_{3},c_{4} depend only on kk. From Equations (18) and (19), we get that there exists constants c6,c7,c8c_{6},c_{7},c_{8} (depending only on kk) such that

The last inequality comes by applying triangle inequality and then using from Equation (20). Note that for I=[k]I=[k], ΠPI(h)(V,,V)=Πh(V,,V)\left\|\Pi_{P^{(h)}_{I}}(\mathbf{V},\ldots,\mathbf{V})\right\|=\left\|\Pi_{h}(\mathbf{V},\ldots,\mathbf{V})\right\|. This completes the proof of the theorem. ∎

Let k,q,n,dk,q,n,d be natural numbers. Assume that ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d. Consider a d×nd\times n matrix V\mathbf{V} with independent Bernoulli entries taking values 1-1 and 11 with probability 1/21/2. Let hh be a function {1,1}k{1,0,1}\{-1,1\}^{k}\to\{-1,0,1\} having a representation as a multilinear polynomial of degree kk. Then the matrix Πh(V,,V)\Pi_{h}(\mathbf{V},\ldots,\mathbf{V}) is with exponentially high probability a Euclidean section.

Note that the proof idea of Theorem 3.8 also works for hh. That is if ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d then

If ncdk/log(q)dn\leq cd^{k}/\log_{(q)}d, then there exists a constant c13c_{13} (depending only on kk) such that

Therefore, with probability at least 1c1exp(c2d)c7exp(c8(n112k))1-c_{1}\exp\left(-c_{2}d\right)-c_{7}\exp\left(-c_{8}\left(n^{\frac{1}{12k}}\right)\right), there exists a α\alpha (depending only on kk and qq) such that

This shows that the matrix Πh(V,,V)\Pi_{h}(\mathbf{V},\ldots,\mathbf{V}) is with exponentially high probability a Euclidean section. ∎

Appendix C MLE’s for Linear and Logistic Regression

Consider the linear regression problem s=Xθ+ε{\mathbf{s}}=X\theta+\varepsilon. The likelihood function for linear regression (under the assumption that the entries in s{\mathbf{s}} are normally distributed) is:

where XiX_{i} is the iith row in XX. The log-likelihood is:

For the logistic regression problem, the likelihood function (under the assumption that the entries in s{\mathbf{s}} are binomially distributed) is:

Appendix D Least Singular Value of Perturbed Random Matrices

In this section, we bound the least singular value of a random matrix perturbed by a low rank matrix. We need the following fact.

The constants c14,c15c_{14},c_{15}, and c16c_{16} are all independent of nn and dd.

For z=0\mathbf{z}=0 this follows from Proposition 2.5 and the standard estimate of the norm of a subgaussian matrix (see e.g., Proposition 2.3 ). The proof for a general z\mathbf{z} follows the same lines.

The lemma below gives an estimate of the smallest singular value of a perturbed random matrix.

Let R\mathbf{R} be a d×nd\times n random matrix with independent centered subgaussian entries with variances uniformly bounded below and dc14nd\geq c_{14}n. Let DD be a deterministic d×nd\times n matrix such that rank(D)=K, Dda\text{rank}(D)=K,\ \left\|D\right\|\leq d^{a}, where a>0a>0 is a constant. If

The constants c17,c18,c19c_{17},c_{18},c_{19}, and c20c_{20} are all independent of nn and dd.

for a>1/2a>1/2 and Nc22K|\mathcal{N}|\leq c_{22}^{K} for a1/2a\leq 1/2. Assume that there exists xSn1\mathbf{x}\in S^{n-1} such that (R+D)xc15d/2\left\|(\mathbf{R}+D)\mathbf{x}\right\|\leq c_{15}\sqrt{d}/2. Choose zN\mathbf{z}\in\mathcal{N} so that Dxz<ϵ\left\|D\mathbf{x}-\mathbf{z}\right\|<\epsilon. Then

The last inequality follows from the assumption on KK. Here, c21c_{21} and c22c_{22} are constants independent of nn and dd. ∎