Private Learning and Sanitization: Pure vs. Approximate Differential Privacy

Amos Beimel, Kobbi Nissim, Uri Stemmer

Introduction

Learning is often applied to collections of sensitive data of individuals and it is important to protect the privacy of these individuals. We examine the sample complexity of private learning and a related task – sanitization – while preserving differential privacy . We show striking differences between the required sample complexity for these tasks under ϵ\epsilon-differential privacy (also called pure differential privacy) and its variant (ϵ,δ)(\epsilon,\delta)-differential privacy (also called approximate differential privacy).

Differential privacy protects the privacy of individuals by requiring that the information of an individual does not significantly affect the output. More formally, an algorithm AA satisfies the requirement of Pure Differential Privacy if for every two databases that differ on exactly one entry, and for every event defined over the output set of AA, the probability of this event is close up to a multiplicative factor of eϵ1+ϵe^{\epsilon}\approx 1+\epsilon whether AA is applied on one database or on the other. Approximate Differential Privacy is a relaxation of pure differential privacy where the above guarantee needs to be satisfied only for events whose probability is at least δ\approx\delta. We show that even negligible δ>0\delta>0 can have a significant effect on sample complexity of private learning and sanitization.

Private learning was introduced in as a combination of Valiant’s PAC learning model and differential privacy. For now, we can think of a private learner as a differentially private algorithm that operates on a set of classified random examples, and outputs a hypothesis that misclassifies fresh examples with probability at most (say) 110\frac{1}{10}. The work on private learning has mainly focused on pure privacy. On the one hand, Blum et al. and Kasiviswanathan et al. have showed, via generic constructions, that every finite concept class CC can be learned privately, using sample complexity proportional to poly(logC)\mathop{\rm{poly}}\nolimits(\log|C|) (often efficiently). On the other hand, a significant difference was shown between the sample complexity of traditional (non-private) learners (crystallized in terms of VC(C)\operatorname{\rm VC}(C) and smaller than logC\log|C| in many interesting cases) and private learners. As an example, let POINTd\operatorname{\tt POINT}_{d} be the class of point functions over the domain {0,1}d\{0,1\}^{d} (these are the functions that evaluate to one on exactly one point of the domain and to zero elsewhere). Consider the task of properly learning POINTd\operatorname{\tt POINT}_{d} where, after consulting its sample, the learner outputs a hypothesis that is by itself in POINTd\operatorname{\tt POINT}_{d}. Non-privately, learning POINTd\operatorname{\tt POINT}_{d} requires merely a constant number of examples (as VC(POINTd)=1\operatorname{\rm VC}(\operatorname{\tt POINT}_{d})=1). Privately, Ω(d)\Omega(d) examples are required . Curiously, the picture changes when the private learner is allowed to output a hypothesis not in POINTd\operatorname{\tt POINT}_{d} (such learners are called improper), as the sample complexity can be reduced to O(1)O(1) . This, however, comes with a price, as it was shown in that such learners must return hypotheses that evaluate to one on exponentially many points in {0,1}d\{0,1\}^{d} and, hence, are very far from all functions in POINTd\operatorname{\tt POINT}_{d}.

A complete characterization for the sample complexity of pure-private learners was recently given in , in terms of a new dimension – the Representation Dimension, that is, given a class CC, the number of samples needed and sufficient for privately learning CC is Θ(RepDim(C))\Theta(\operatorname{\rm RepDim}(C)). Following that, Feldman and Xiao showed an equivalence between the representation dimension of a concept CC and the randomized one-way communication complexity of the evaluation problem for concepts from CC. Using this equivalence they separated the sample complexity of pure-private learners from that of non-private ones. For example, they showed a lower bound of Ω(d)\Omega(d) on the sample complexity of every pure-private (proper or improper) learner for the class THRESHd\operatorname*{\tt THRESH}_{d} of threshold functions over the interval [0,2d1][0,2^{d}-1]. This is a strong separation from the non-private sample complexity, which is O(1)O(1) (as the VC dimension of this class is constant).

We show that the sample complexity of proper learning with approximate differential privacy can be significantly lower than that satisfying pure differential privacy. Our starting point for this work is an observation that with approximate (ϵ,δ)(\epsilon,\delta)-differential privacy, sample complexity of O(log(1/δ))O(\log(1/\delta)) suffices for learning points properly. This gives a separation between pure and approximate proper private learning for δ=2o(d)\delta=2^{-o(d)}.

The notion of differentially private sanitization was introduced in the work of Blum et al. . A sanitizer for a class of predicates CC is a differentially private mechanism translating an input database SS to an output database S^\hat{S} such that S^\hat{S} (approximately) agrees with SS on the fraction of the entries satisfying φ\varphi for all φC\varphi\in C, where every predicate φC\varphi\in C is a function from XX to {0,1}\{0,1\}. Blum et al. gave a generic construction of pure differentially private sanitizers exhibiting sample complexity O(VC(C)logX)O(\operatorname{\rm VC}(C)\log|X|). Lower bounds partially supporting this sample complexity were given by . As with private learning, we show significant differences between the sample complexity required for sanitization of simple predicate classes under pure and approximate differential privacy. We note that the construction of sanitizers is not generally computationally feasible .

1 Our Contributions

To simplify the exposition, we omit in this section dependency on all variables except for dd, corresponding to the representation length of domain elements.

A recent instantiation of the Propose-Test-Release (PTR) framework by Smith and Thakurta results, almost immediately, with a proper learner for points, exhibiting O(1)O(1) sample complexity while preserving approximate differential privacy. This simple technique does not suffice for our other constructions of learners and sanitizers, and we, hence, introduce new tools for coping with proper private learning of thresholds and axis-aligned rectangles, and sanitization for point functions and thresholds:

Choosing mechanism: Given a low-sensitivity quality function, one can use the exponential mechanism to choose an approximately maximizing solution. This requires, in general, a database of size logarithmic in the number of possible solutions. We identify a sub family of low-sensitivity functions, called bounded-growth functions, for which it is possible to significantly reduce the necessary database size when using the exponential mechanism.

Recursive algorithm for quasi-concave promise problems: We define a family of optimization problems, which we call Quasi-Concave Promise Problems. The possible solutions are ordered, and quasi-concavity means that if two solutions fhf\leq h have quality of at least X\mathcal{X}, then any solution fghf\leq g\leq h also has quality of at least X\mathcal{X}. The optimization goal is, when there exists a solution with a promised quality of (at least) rr, to find a solution with quality r\approx r. We observe that a quasi-concave promise problem can be privately approximated using a solution to a smaller instance of a quasi-concave promise problem. This allows us to construct an efficient recursive algorithm solving such problems privately. We show that the task of learning THRESHd\operatorname*{\tt THRESH}_{d} is, in fact, a quasi-concave promise problem, and it can be privately solved using our algorithm with sample size roughly 2O(logd)2^{O(\log^{*}d)}. Sanitization for THRESHd\operatorname*{\tt THRESH}_{d} does not exactly fit the model of quasi-concave promise problems but can still be solved by iteratively defining and solving a small number of quasi-concave promise problems.

We give new private proper-learning algorithms for the classes POINTd\operatorname{\tt POINT}_{d} and THRESHd\operatorname*{\tt THRESH}_{d}. We also construct a new private proper-learner for (a discrete version of) the class of all axis-aligned rectangles over nn dimensions. Our algorithms exhibit sample complexity that is significantly lower than bounds given in prior work, separating pure and approximate private learning. Similarly, we construct sanitizers for POINTd\operatorname{\tt POINT}_{d} and THRESHd\operatorname*{\tt THRESH}_{d}, again with sample complexity that is significantly lower than bounds given in prior work, separating sanitization in the pure and approximate privacy cases. Our algorithms are time-efficient.

Gupta et al. have given reductions in both directions between agnostic learning of a concept class CC, and the sanitization task for the same class CC. The learners and sanitizers they consider are limited to access their data via statistical queries (such algorithm can be easily transformed to satisfy differential privacy ). In Section 5 we show a similar reduction from the task of privately learning a concept class CC to the sanitization task of CC, where the sanitizer’s access to the database is unrestricted. This allows us to exploit lower bounds on the sample complexity of private learners and show an explicit class of predicates CC over a domain XX for which every private sanitizer requires databases of size Ω(VC(C)logX)\Omega(\operatorname{\rm VC}(C)\log|X|). A similar lower bound was shown by Hardt and Rothblum , achieving tighter results in terms of the approximation parameter. Their work proves the existence of such a concept class, but does not give an explicit one.

In Section 6 we examine private learning under a relaxation of differential privacy called label privacy (see and references therein), where the learner is required to only protect the privacy of the labels in the sample. Chaudhuri et al. have proved lower bounds for label-private learners in terms of the doubling dimension of the target concept class. We show that the VC dimension completely characterizes the sample complexity of such learners, that is, the sample complexity of learning with label privacy is equal (up to constants) to learning without privacy.

2 Open Questions

This work raises two kinds of research directions. First, this work presents (time and sample efficient) private learners and sanitizers for relatively simple concept classes. It would be natural to try and construct private learners and sanitizers for more complex concept classes. In particular, constructing a (time and sample efficient) private learner for hyperplanes would be very interesting to the community.

Another very interesting research direction is to try and understand the sample complexity of approximate-private learners. Currently, no lower bounds are known on the sample complexity of such learners. On the other hand, no generic construction for such learners is known to improve the sample complexity achieved by the generic construction of Kasiviswanathan et al. for pure-private learners. Characterizing the sample complexity of approximate-private learners is a very interesting open question.

3 Other Related Work

Most related to our work is the work on private learning and its sample complexity and the early work on sanitization mentioned above. Another related work is the work of De , who proved a separation between pure ϵ\epsilon-differential privacy and approximate (ϵ,δ)(\epsilon,\delta)-differential privacy. Specifically, he demonstrated that there exists a query where it is sufficient to add noise O(nlog(1/δ))O(\sqrt{n\log(1/\delta)}) when δ>0\delta>0 and Ω(n)\Omega(n) noise is required when δ=0\delta=0. Earlier work by Hardt and Talwar separated pure from approximate differential privacy for δ=nO(1)\delta=n^{-O(1)}.

Another interesting gap between pure and approximate differential privacy is the following. Blum et al. have given a generic construction of pure-private sanitizers, in which the sample complexity grows as 1α3\frac{1}{\alpha^{3}} (where α\alpha is the approximation parameter). Following that, Hardt and Rothblum showed that with approximate-privacy, the sample complexity can be reduce to grow as 1α2\frac{1}{\alpha^{2}}. Currently, it is unknown whether this gap is essential.

Preliminaries

We use Oγ(f(t))O_{\gamma}(f(t)) as a shorthand for O(h(γ)f(t))O(h(\gamma)\cdot f(t)) for some non-negative function hh. In informal discussions, we sometimes use O~(f(t))\widetilde{O}(f(t)) instead of O(f(t)polylog(f(t)))O(f(t)\cdot{\rm polylog}(f(t))). For example, 2log(d)log(d)=O~(2log(d))2^{\log^{*}(d)}\cdot\log^{*}(d)=\widetilde{O}\left(2^{\log^{*}(d)}\right).

We use XX to denote an arbitrary domain, and XdX_{d} for the domain {0,1}d\{0,1\}^{d}. We use XmX^{m} (and respectively XdmX_{d}^{m}) for the cartesian mthm^{\text{t}h} power of XX, i.e., Xm=(X)mX^{m}=(X)^{m}, and use X=m=0XmX^{*}=\bigcup_{m=0}^{\infty}{X^{m}}.

Given a distribution D\mathcal{D} over a domain XX, we denote D(j)PrxD[x=j]\mathcal{D}(j)\triangleq\Pr_{x\sim\mathcal{D}}[x=j] for jXj\in X, and D(J)PrxD[xJ]\mathcal{D}(J)\triangleq\Pr_{x\sim\mathcal{D}}[x\in J] for JXJ\subseteq X.

1 Differential Privacy

Differential privacy aims at protecting information of individuals. We consider a database, where each entry contains information pertaining to an individual. An algorithm operating on databases is said to preserve differential privacy if a change of a single record of the database does not significantly change the output distribution of the algorithm. Intuitively, this means that whatever is learned about an individual could also be learned with her data arbitrarily modified (or without her data). Formally:

Databases S1XmS_{1}\in X^{m} and S2XmS_{2}\in X^{m} over a domain XX are called neighboring if they differ in exactly one entry.

A randomized algorithm AA is (ϵ,δ)(\epsilon,\delta)-differentially private if for all neighboring databases S1,S2XmS_{1},S_{2}\in X^{m}, and for all sets F\mathcal{F} of outputs,

The probability is taken over the random coins of AA. When δ=0\delta=0 we omit it and say that AA preserves ϵ\epsilon-differential privacy.

We use the term pure differential privacy when δ=0\delta=0 and the term approximate differential privacy when δ>0\delta>0, in which case δ\delta is typically a negligible function of the database size mm.

We will later present algorithms that access their input database using (several) differentially private mechanisms. We will use the following composition theorems.

If A1A_{1} and A2A_{2} satisfy (ϵ1,δ1)(\epsilon_{1},\delta_{1}) and (ϵ2,δ2)(\epsilon_{2},\delta_{2}) differential privacy, respectively, then their concatenation A(S)=A1(S),A2(S)A(S)=\langle A_{1}(S),A_{2}(S)\rangle satisfies (ϵ1+ϵ2,δ1+δ2)(\epsilon_{1}+\epsilon_{2},\delta_{1}+\delta_{2})-differential privacy.

Moreover, a similar theorem holds for the adaptive case, where a mechanism interacts with kk adaptively chosen differentially private mechanisms.

A mechanism that permits kk adaptive interactions with mechanisms that preserves (ϵ,δ)(\epsilon,\delta)-differential privacy (and does not access the database otherwise) ensures (kϵ,kδ)(k\epsilon,k\delta)-differential privacy.

Note that the privacy guaranties of the above bound deteriorates linearly with the number of interactions. By bounding the expected privacy loss in each interaction (as opposed to worst-case), Dwork et al. showed the following stronger composition theorem, where privacy deteriorates (roughly) as kϵ+kϵ2\sqrt{k}\epsilon+k\epsilon^{2} (rather than kϵk\epsilon).

Let 0<ϵ,δ10<\epsilon,\delta^{\prime}\leq 1, and let δ\delta\in. A mechanism that permits kk adaptive interactions with mechanisms that preserves (ϵ,δ)(\epsilon,\delta)-differential privacy (and does not access the database otherwise) ensures (ϵ,kδ+δ)(\epsilon^{\prime},k\delta+\delta^{\prime})-differential privacy, for ϵ=2kln(1/δ)ϵ+2kϵ2\epsilon^{\prime}=\sqrt{2k\ln(1/\delta^{\prime})}\cdot\epsilon+2k\epsilon^{2}.

2 Preliminaries from Learning Theory

A concept c:X{0,1}c:X\rightarrow\{0,1\} is a predicate that labels examples taken from the domain XX by either 0 or 1. A concept class CC over XX is a set of concepts (predicates) mapping XX to {0,1}\{0,1\}. A learning algorithm is given examples sampled according to an unknown probability distribution D\mathcal{D} over XX, and labeled according to an unknown target concept cCc\in C. The learning algorithm is successful when it outputs a hypothesis hh that approximates the target concept over samples from D\mathcal{D}. More formally:

The generalization error of a hypothesis h:X{0,1}h:X\rightarrow\{0,1\} is defined as

If errorD(c,h)α{\rm error}_{\mathcal{D}}(c,h)\leq\alpha we say that hh is α\alpha-good for cc and D\mathcal{D}.

Algorithm AA is an (α,β,m)(\alpha,\beta,m)-PAC learner for a concept class CC over XX using hypothesis class HH if for all concepts cCc\in C, all distributions D\mathcal{D} on XX, given an input of mm samples S=(z1,,zm)S=(z_{1},\ldots,z_{m}), where zi=(xi,c(xi))z_{i}=(x_{i},c(x_{i})) and each xix_{i} is drawn i.i.d. from D\mathcal{D}, algorithm AA outputs a hypothesis hHh\in H satisfying

The probability is taken over the random choice of the examples in SS according to D\mathcal{D} and the coin tosses of the learner AA. If HCH\subseteq C then AA is called a proper PAC learner; otherwise, it is called an improper PAC learner.

For a labeled sample S=(xi,yi)i=1mS=(x_{i},y_{i})_{i=1}^{m}, the empirical error of hh is

2.2 The Vapnik-Chervonenkis Dimension

The Vapnik-Chervonenkis (VC) Dimension is a combinatorial measure of concept classes, which characterizes the sample size of PAC learners.

That is, BB is shattered by CC if CC realizes all possible dichotomies over BB.

The VC-Dimension of a concept class CC (over a domain XX), denoted as VC(C)\operatorname{\rm VC}(C), is the cardinality of the largest set BXB\subseteq X shattered by CC. If arbitrarily large finite sets can be shattered by CC, then VC(C)=\operatorname{\rm VC}(C)=\infty.

Observe that as ΠC(B)C\Pi_{C}(B)\leq|C| a set BB can be shattered only if BlogC|B|\leq\log|C| and hence VC(C)logC\operatorname{\rm VC}(C)\leq\log|C|.

2.3 VC Bounds

Classical results in computational learning theory state that a sample of size θ(VC(C))\theta(\operatorname{\rm VC}(C)) is both necessary and sufficient for the PAC learning of a concept class CC. The following two theorems give upper and lower bounds on the sample complexity.

Any algorithm for PAC learning a concept class CC must have sample complexity Ω(VC(C)α)\Omega(\frac{\operatorname{\rm VC}(C)}{\alpha}), where α\alpha is the approximation parameter.

Let CC and D\mathcal{D} be a concept class and a distribution over a domain XX. Let α,β>0\alpha,\beta>0, and m8α(VC(C)ln(16α)+ln(2β))m\geq\frac{8}{\alpha}(\operatorname{\rm VC}(C)\ln(\frac{16}{\alpha})+\ln(\frac{2}{\beta})). Fix a concept cCc\in C, and suppose that we draw a sample S=(xi,yi)i=1mS=(x_{i},y_{i})_{i=1}^{m}, where xix_{i} are drawn i.i.d. from D\mathcal{D} and yi=c(xi)y_{i}=c(x_{i}). Then,

So, for any concept class CC, any algorithm that takes a sample of m=Ωα,β(VC(C))m=\Omega_{\alpha,\beta}(\operatorname{\rm VC}(C)) labeled examples and produces as output a concept hCh\in C that agrees with the sample is a PAC learner for CC. Such an algorithm is a PAC learner for CC using CC (that is, both the target concept and the returned hypotheses are taken from the same concept class CC), and, therefore, there always exist a hypothesis hCh\in C with errorS(h)=0{\rm error}_{S}(h)=0 (e.g., the target concept itself).

The next theorem handles (in particular) the agnostic case, in which a learning algorithm for a concept class CC is using a hypotheses class HCH\neq C, and given a sample SS (labeled by some cCc\in C), a hypothesis hh with errorS(h)=0{\rm error}_{S}(h)=0 might not exist in HH.

Let D\mathcal{D} and HH be a distribution and a concept class over a domain XX, and let f:X{0,1}f:X\rightarrow\{0,1\} be some concept, not necessarily in HH. For a sample S=(xi,f(xi))i=1mS=(x_{i},f(x_{i}))_{i=1}^{m} where m50VC(H)α2ln(1αβ)m\geq\frac{50\operatorname{\rm VC}(H)}{\alpha^{2}}\ln(\frac{1}{\alpha\beta}) and {xi}\{x_{i}\} are drawn i.i.d. from D\mathcal{D}, it holds that

Notice that in the agnostic case the sample complexity is proportional to 1α2\frac{1}{\alpha^{2}}, as opposed to 1α\frac{1}{\alpha} when learning a class CC using CC.

3 Private Learning

In private learning, we would like to accomplish the same goal as in non-private learning, while protecting the privacy of the input database.

Let AA be an algorithm that gets an input S={z1,,zm}S=\{z_{1},\ldots,z_{m}\}. Algorithm AA is an (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-PPAC learner for a concept class CC over XX using hypothesis class HH if

Privacy. Algorithm AA is (ϵ,δ)(\epsilon,\delta)-differentially private (as in Definition 2.2);

Utility. Algorithm AA is an (α,β,m)(\alpha,\beta,m)-PAC learner for CC using HH (as in Definition 2.7).

When δ=0\delta=0 (pure privacy) we omit it from the list of parameters.

Note that the utility requirement in the above definition is an average-case requirement, as the learner is only required to do well on typical samples (i.e., samples drawn i.i.d. from a distribution D\mathcal{D} and correctly labeled by a target concept cCc\in C). In contrast, the privacy requirement is a worst-case requirement, and Inequality (1) must hold for every pair of neighboring databases (no matter how they were generated, even if they are not consistent with any concept in CC).

4 Sanitization

Given a database S=(x1,,xm)S=(x_{1},\ldots,x_{m}) containing elements from some domain XX, the goal of sanitization mechanisms is to output (while preserving differential privacy) another database S^\hat{S} that is in some sense similar to SS. This returned database S^\hat{S} is called a sanitized database.

Let c:X{0,1}c:X\rightarrow\{0,1\} be a concept. The counting query Qc:XQ_{c}:X^{*}\rightarrow is

That is, Qc(S)Q_{c}(S) is the fraction of the entries in SS that satisfy the concept cc. Given a database SS, a sanitizer for a concept class CC is required to output a sanitized database S^\hat{S} s.t. Qc(S)Qc(S^)Q_{c}(S)\approx Q_{c}(\hat{S}) for every cCc\in C. For computational reasons, sanitizers are sometimes allowed not to return an actual database, but rather a data structure capable of approximating Qc(S)Q_{c}(S) for every cCc\in C.

Let CC be a concept class and let SS be a database. A function Est:C\operatorname{\rm Est}:C\rightarrow is called α\alpha-close to SS if Qc(S)Est(c)α|Q_{c}(S)-\operatorname{\rm Est}(c)|\leq\alpha for every cCc\in C. If, furthermore, Est\operatorname{\rm Est} is defined in terms of a database S^\hat{S}, i.e., Est(c)=Qc(S^)\operatorname{\rm Est}(c)=Q_{c}(\hat{S}), we say that S^\hat{S} is α\alpha-close to SS.

Let CC be a class of concepts mapping XX to {0,1}\{0,1\}. Let AA be an algorithm that on an input database SXS\in X^{*} outputs a description of a function Est:C\operatorname{\rm Est}:C\rightarrow. Algorithm AA is an (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-improper-sanitizer for predicates in the class CC, if

AA is (ϵ,δ)(\epsilon,\delta)-differentially private;

For every input SXmS\in X^{m}, it holds that \Pr\limits_{A}\left[\mbox{\rm\operatorname{\rm Est}isis\alphacloseto-close toS}\right]\geq 1-\beta.

The probability is over the coin tosses of algorithm AA. If on an input database SS algorithm AA outputs another database S^X\hat{S}\in X^{*}, and Est()\operatorname{\rm Est}(\cdot) is defined as Est(c)=Qc(S^)\operatorname{\rm Est}(c)=Q_{c}(\hat{S}), then algorithm AA is called a proper-sanitizer (or simply a sanitizer). As before, when δ=0\delta=0 (pure privacy) we omit it from the set of parameters.

Note that without the privacy requirements sanitization is a trivial task as it is possible to simply output the input database SS. Furthermore, ignoring computational complexity, an (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-improper-sanitizer can always be transformed into a (2α,β,ϵ,δ,m)(2\alpha,\beta,\epsilon,\delta,m)-sanitizer, by finding a database S^\hat{S} of mm entries that is α\alpha-close to Est\operatorname{\rm Est}. Such a database must exist except with probability β\beta (as in particular SS is α\alpha-close to Est\operatorname{\rm Est}), and is 2α2\alpha-close to SS (by the triangle inequality).

The following theorems state some of the known results on the sample complexity of pure-privacy sanitizers. We start with an upper bound on the necessary sample complexity.

There exists a constant Γ\Gamma such that for any class of predicates CC over a domain XX, and any parameters α,β,ϵ\alpha,\beta,\epsilon, there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for CC, provided that the size of the database, denoted mm, is at least

The above theorem states that, in principle, data sanitization is possible. The input database may be required to be as big as the representation size of elements in XX. The next theorem states a general lower bound (far from the above upper bound) on the sample complexity of any concept class CC. Better bounds are known for specific concept classes .

Let CC be a class of predicates, and let mVC(C)2m\leq\frac{\operatorname{\rm VC}(C)}{2}. For any 0<β<10<\beta<1 bounded away from 1 by a constant, for any ϵ1\epsilon\leq 1, if AA is an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for CC, then α14+16ϵ\alpha\geq\frac{1}{4+16\epsilon}.

Recall that a proper sanitizer operates on an input database SXmS\in X^{m}, and outputs a sanitized database S^X\hat{S}\in X^{*}. The following is a simple corollary of Theorem 2.14, stating that the size of S^\hat{S} does not necessarily depend on the size of the input database SS.

Let CC be a concept class. For any database SS there exists a database S^\hat{S} of size n=O(VC(C)α2log(1α))n=O(\frac{\operatorname{\rm VC}(C)}{\alpha^{2}}\log(\frac{1}{\alpha})) such that maxhCQh(S)Qh(S^)α\max_{h\in C}|Q_{h}(S)-Q_{h}(\hat{S})|\leq\alpha.

In particular, the above theorem implies that an (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-sanitizer AA can always be transformed into a (2α,β,ϵ,δ,m)(2\alpha,\beta,\epsilon,\delta,m)-sanitizer AA^{\prime} s.t. the sanitized databases returned by AA^{\prime} are always of fixed size n=O(VC(C)α2log(1α))n=O(\frac{\operatorname{\rm VC}(C)}{\alpha^{2}}\log(\frac{1}{\alpha})). This can be done by finding a database S^\hat{S} of nn entries that is α\alpha-close to the sanitized database returned by AA. Using the triangle inequality, S^\hat{S} is (w.h.p.) 2α2\alpha-close to the input database.

5 Basic Differentially-Private Mechanisms

The most basic constructions of differentially private algorithms are via the Laplace mechanism as follows.

A random variable has probability distribution Lap(b)\mathop{\rm{Lap}}\nolimits(b) if its probability density function is f(x)=12bexp(xb)f(x)=\frac{1}{2b}\exp(-\frac{|x|}{b}), where xRx\in\R.

A function f:Xmnf:X^{m}\rightarrow{}^{n} has sensitivity kk if for every neighboring D,DXmD,D^{\prime}\in X^{m}, it holds that f(D)f(D)1k||f(D)-f(D^{\prime})||_{1}\leq k.

Let f:Xmnf:X^{m}\rightarrow{}^{n} be a sensitivity kk function. The mechanism AA that on input DXmD\in X^{m} adds independently generated noise with distribution Lap(kϵ)\mathop{\rm{Lap}}\nolimits(\frac{k}{\epsilon}) to each of the nn output terms of f(D)f(D) preserves ϵ\epsilon-differential privacy. Moreover,

where Ai(D)A_{i}(D) and fi(D)f_{i}(D) are the ithi^{\text{t}h} coordinates of A(D)A(D) and f(D)f(D).

5.2 The Exponential Mechanism

We next describe the exponential mechanism of McSherry and Talwar . Let XX be a domain and HH a set of solutions. Given a quality function q:X×HNq:X^{*}\times H\rightarrow\N, and a database SXS\in X^{*}, the goal is to chooses a solution hHh\in H approximately maximizing q(S,h)q(S,h). The mechanism chooses a solution probabilistically, where the probability mass that is assigned to each solution hh increases exponentially with its quality q(S,h)q(S,h):

Input: parameter ϵ\epsilon, finite solution set HH, database SXmS\in X^{m}, and a sensitivity 1 quality function qq. 1. Randomly choose hHh\in H with probability exp(ϵq(S,h)/2)fHexp(ϵq(S,f)/2).\frac{\exp\left(\epsilon\cdot q(S,h)/2\right)}{\sum_{f\in H}\exp\left(\epsilon\cdot q(S,f)/2\right)}. 2. Output hh.

(i) The exponential mechanism is ϵ\epsilon- differentially private. (ii) Let e^maxfH{q(S,f)}\hat{e}\triangleq\max_{f\in H}\{q(S,f)\} and Δ>0\Delta>0. The exponential mechanism outputs a solution hh such that q(S,h)(e^Δm)q(S,h)\leq(\hat{e}-\Delta m) with probability at most Hexp(ϵΔm/2)|H|\cdot\exp(-\epsilon\Delta m/2).

Kasiviswanathan et al. showed in 2008 that the exponential mechanism can be used as a generic private learner – when used with the quality function q(S,h)={i:h(xi)=yi}q(S,h)=|\{i:h(x_{i})=y_{i}\}|, the probability that the exponential mechanism outputs a hypothesis hh such that errorS(h)>minfH{errorS(f)}+Δ{\rm error}_{S}(h)>\min_{f\in H}\{{\rm error}_{S}(f)\}+\Delta is at most Hexp(ϵΔm/2)|H|\cdot\exp(-\epsilon\Delta m/2). This results in a generic private proper-learner for every finite concept class CC, with sample complexity Oα,β,ϵ(logC)O_{\alpha,\beta,\epsilon}(\log|C|).

We restate a simplified variant of algorithm Adist\mathcal{A}_{\rm dist} by Smith and Thakurta , which is an instantiation of the Propose-Test-Release framework . Let q:X×HNq:X^{*}\times H\rightarrow\N be a sensitivity-1 quality function over a domain XX and a set of solutions HH. Given a database SXS\in X^{*}, the goal is to choose a solution hHh\in H maximizing q(S,h)q(S,h), under the assumption that the optimal solution hh scores much better than any other solution in HH.

Algorithm Adist\mathcal{A}_{\rm dist} Input: parameters ϵ,δ\epsilon,\delta, database SXS\in X^{*}, sensitivity-1 quality function qq. 1. Let h1h2h_{1}\neq h_{2} be two highest score solutions in HH, where q(S,h1)q(S,h2)q(S,h_{1})\geq q(S,h_{2}). 2. Let gap=q(S,h1)q(S,h2){\rm gap}=q(S,h_{1})-q(S,h_{2}) and gap=gap+Lap(1ϵ){\rm gap}^{*}={\rm gap}+\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon}). 3. If gap<1ϵlog(1δ){\rm gap}^{*}<\frac{1}{\epsilon}\log(\frac{1}{\delta}) then output \bot and halt. 4. Output h1h_{1}.

(i) Algorithm Adist\mathcal{A}_{\rm dist} is (ϵ,δ)(\epsilon,\delta)- differentially private. (ii) When given an input database SS for which gap1ϵlog(1βδ){\rm gap}\geq\frac{1}{\epsilon}\log(\frac{1}{\beta\delta}), algorithm Adist\mathcal{A}_{\rm dist} outputs h1h_{1} maximizing q(h,S)q(h,S) with probability at least (1β)(1-\beta).

6 Concentration Bounds

Learning with Approximate Privacy

We present proper (ϵ,δ)(\epsilon,\delta)-private learners for two simple concept classes, POINTd\operatorname{\tt POINT}_{d} and THRESHd\operatorname*{\tt THRESH}_{d}, demonstrating separations between pure and approximate private proper learning.

For jXdj\in X_{d} let cj:Xd{0,1}c_{j}:X_{d}\rightarrow\{0,1\} be defined as cj(x)=1c_{j}(x)=1 if x=jx=j and cj(x)=0c_{j}(x)=0 otherwise. Define the concept class POINTd={cj}jXd\operatorname{\tt POINT}_{d}=\{c_{j}\}_{j\in X_{d}}.

Note that the VC dimension of POINTd\operatorname{\tt POINT}_{d} is 1, and, therefore, there exists a proper non-private learner for POINTd\operatorname{\tt POINT}_{d} with sample complexity Oα,β(1)O_{\alpha,\beta}(1). Beimel et al. proved that every proper ϵ\epsilon-private learner for POINTd\operatorname{\tt POINT}_{d} must have sample complexity Ω(d)=Ω(logPOINTd)\Omega(d)=\Omega(\log|\operatorname{\tt POINT}_{d}|). They also showed that there exists an improper ϵ\epsilon-private learner for this class, with sample complexity Oα,β,ϵ(1)O_{\alpha,\beta,\epsilon}(1). An alternative private learner for this class was presented in .

As we will now see, algorithm Adist\mathcal{A}_{\rm dist} (defined in Section 2.5) can be used as a proper (ϵ,δ)(\epsilon,\delta)-private learner for POINTd\operatorname{\tt POINT}_{d} with sample complexity Oα,β,ϵ,δ(1)O_{\alpha,\beta,\epsilon,\delta}(1). This is our first (and simplest) example separating the sample complexity of pure and approximate private proper-learners. Consider the following algorithm.

Input: parameters α,β,ϵ,δ\alpha,\beta,\epsilon,\delta, and a database S(Xd+1)mS\in(X_{d+1})^{m}. 1. For every xXdx\in X_{d}, define q(S,x)q(S,x) as the number of appearances of (x,1)(x,1) in SS. 2. Execute Adist\mathcal{A}_{\rm dist} on SS with the quality function qq and parameters α2,β2,ϵ,δ\frac{\alpha}{2},\frac{\beta}{2},\epsilon,\delta. 3. If the output was jj then return cjc_{j}. 4. Else, if the output was \bot then return a random ciPOINTdc_{i}\in\operatorname{\tt POINT}_{d}.

Let α,β,ϵ,δ\alpha,\beta,\epsilon,\delta be s.t. 1αβ2d\frac{1}{\alpha\beta}\leq 2^{d}. The above algorithm is an efficient (α,β,ϵ,δ)(\alpha,\beta,\epsilon,\delta)-PPAC proper learner for POINTd\operatorname{\tt POINT}_{d} using a sample of m=O(1αϵln(1βδ))m=O\left(\frac{1}{\alpha\epsilon}\ln(\frac{1}{\beta\delta})\right) labeled examples.

For intuition, consider a target concept cjc_{j} and an underlying distribution D\mathcal{D}. Whenever D(j)\mathcal{D}(j) is noticeable, a typical sample SS contains many copies of the point jj labeled as 11. As every other point iji\neq j will be labeled as , we expect q(S,j)q(S,j) to be significantly higher than any other q(S,i)q(S,i), and we can use algorithm Adist\mathcal{A}_{\rm dist} to identify jj.

Recall that the above algorithm outputs a random ciPOINTdc_{i}\in\operatorname{\tt POINT}_{d} whenever Adist\mathcal{A}_{\rm dist} outputs \bot. In order for this random cic_{i} to be good (w.h.p.) we needed 2d2^{d} (i.e., the number of possible concepts) to be at least 1αβ\frac{1}{\alpha\beta}. This requirement could be avoided by outputting the all zero hypothesis c00c_{0}\equiv 0 whenever Adist\mathcal{A}_{\rm dist} outputs \bot. However, this approach results in a proper learner only if we add the all zero concept to POINTd\operatorname{\tt POINT}_{d}.

For 0j2d0\leq j\leq 2^{d} let cj:Xd{0,1}c_{j}:X_{d}\rightarrow\{0,1\} be defined as cj(x)=1c_{j}(x)=1 if x<jx<j and cj(x)=0c_{j}(x)=0 otherwise. Define the concept class THRESHd={cj}0j2d\operatorname*{\tt THRESH}_{d}=\{c_{j}\}_{0\leq j\leq 2^{d}}.

Note that VC(THRESHd)=1\operatorname{\rm VC}(\operatorname*{\tt THRESH}_{d})=1, and, therefore, there exists a proper non-private learner for THRESHd\operatorname*{\tt THRESH}_{d} with sample complexity Oα,β(1)O_{\alpha,\beta}(1). As THRESHd=2d+1|\operatorname*{\tt THRESH}_{d}|=2^{d}+1, one can use the generic construction of Kasiviswanathan et al. and get a proper ϵ\epsilon-private learner for this class with sample complexity Oα,β,ϵ(d)O_{\alpha,\beta,\epsilon}(d). Feldman and Xiao showed that this is in fact optimal, and every ϵ\epsilon-private learner for this class (proper or improper) must have sample complexity Ω(d)\Omega(d).

Our learner for POINTd\operatorname{\tt POINT}_{d} relied on a strong “stability” property of the problem: Given a labeled sample, either a random concept is (w.h.p.) a good output, or, there is exactly one consistent concept in the class, and every other concept has large empirical error. This, however, is not the case when dealing with THRESHd\operatorname*{\tt THRESH}_{d}. In particular, many hypotheses can have low empirical error, and changing a single entry of a sample SS can significantly affect the set of hypotheses consistent with it.

In Section 3.3, we present a proper (ϵ,δ)(\epsilon,\delta)-private learner for THRESHd\operatorname*{\tt THRESH}_{d} with sample complexity (roughly) 2O(log(d))2^{O(\log^{*}(d))}. We use this section for motivating the construction. We start with two simplifying assumptions. First, when given a labeled sample SS, we aim at choosing a hypothesis hTHRESHdh\in\operatorname*{\tt THRESH}_{d} approximately minimizing the empirical error (rather than the generalization error). Second, we assume that we are given a “diverse” sample SS that contains many points labeled as 11 and many points labeled as . Those two assumptions (and any other informalities made hereafter) will be removed in Section 3.3.

We refer to points labeled as 11 in SS as ones, and to points labeled as as zeros. Imagine for a moment that we already have a differentially private algorithm that given SS outputs an interval GXdG\subseteq X_{d} with the following two properties:

The interval GG contains “a lot” of ones, and “a lot” of zeros in SS.

Every interval IXdI\subseteq X_{d} of length Gk\leq\frac{|G|}{k} does not contain, simultaneously, “too many” ones and “too many” zeros in SS, where kk is some constant.

Given such a 44-good interval GG, we can (without using the sample SS) define a set HH of five hypotheses s.t. at least one of them has small empirical error. To see this, consider Figure 2, where GG is divided into four equal intervals g1,g2,g3,g4g_{1},g_{2},g_{3},g_{4}, and five hypotheses h1,,h5h_{1},\ldots,h_{5} are defined s.t. the points where they switch from one to zero are located at the edges of g1,g2,g3,g4g_{1},g_{2},g_{3},g_{4}.

After defining such a set HH, we could use the exponential mechanism to choose a hypothesis hHh\in H with small empirical error on SS. As the size of HH is constant, this requires only a constant number of samples. To conclude, finding a 44-good interval GG (while preserving privacy) is sufficient for choosing a good hypothesis. We next explain how to find such an interval.

Assume, for now, that we have a differentially private algorithm that given a sample SS, returns an interval length JJ s.t. there exists a 2-good interval GXdG\subseteq X_{d} of length G=J|G|=J. This length JJ is used to find an explicit 4-good interval as follows. Divide XdX_{d} into intervals {Ai}\{A_{i}\} of length 2J2J, and into intervals {Bi}\{B_{i}\} of length 2J2J right shifted by JJ as in Figure 3.

As the promised 22-good interval GG is of length JJ, at least one of the above intervals contains GG. We next explain how to privately choose such interval. If, e.g., GA2G\subseteq A_{2} then A2A_{2} contains both a lot of zeros and a lot of ones. The target concept must switch inside A2A_{2}, and, therefore, every other AiA2A_{i}\neq A_{2} cannot contain both zeros and ones. For every interval AiA_{i}, define its quality q(Ai)q(A_{i}) to be the minimum between the number of zeros in AiA_{i} and the number of ones in AiA_{i}. Therefore, q(A2)q(A_{2}) is large, while q(Ai)=0q(A_{i})=0 for every AiA2A_{i}\neq A_{2}. That is, A2A_{2} scores much better than any other AiA_{i} under this quality function qq. The sensitivity of q()q() is one and we can use algorithm Adist\mathcal{A}_{\rm dist} to privately identify A2A_{2}. It suffices, e.g., that q(A2)14αmq(A_{2})\geq\frac{1}{4}\alpha m, and we can, therefore, set our “a lot” bound to be 14αm\frac{1}{4}\alpha m. Recall that GA2G\subseteq A_{2} is a 22-good interval, and that A2=2G|A_{2}|=2|G|. The identified A2A_{2} is, therefore, a 44-good interval.

To conclude, if we could indeed find (while preserving privacy) a length JJ s.t. there exists a 22-good interval GG of that length, then our task would be completed.

At first attempt, one might consider preforming a binary search for such a length 0J2d0\leq J\leq 2^{d}, in which every comparison will be made using the Laplace mechanism. More specifically, for every length 0J2d0\leq J\leq 2^{d}, define

If, e.g., Q(J)=100Q(J)=100 for some JJ, then there exists an interval [a,b]Xd[a,b]\subseteq X_{d} of length JJ that contains at least 100100 ones and at least 100100 zeros. Moreover, every interval of length J\leq J either contains at most 100100 ones, or, contains at most 100100 zeros.

Note that Q()Q(\cdot) is a monotonically non-decreasing function, and that Q(0)=0Q(0)=0 (as in a correctly labeled sample a point cannot appear both with the label 1 and with the label 0). Recall our assumption that the sample SS is “diverse” (contains many points labeled as 11 and many points labeled as ), and, therefore, Q(2d)Q(2^{d}) is large. Hence, there exists a JJ s.t. Q(J)Q(J) is “big enough” (say at least 14αm\frac{1}{4}\alpha m) while Q(J1)Q(J-1) is “small enough” (say at most 34αm\frac{3}{4}\alpha m). That is, a JJ s.t. (1) there exists an interval of length JJ containing lots of ones and lots of zeros; and (2), every interval of length <J<J cannot contain too many ones and too many zeros simultaneously. Such a JJ can easily be (privately) obtained using a (noisy) binary search. However, as there are dd noisy comparisons, this solution requires a sample of size dO(1)d^{O(1)} in order to achieve reasonable utility guarantees.

As a second attempt, one might consider preforming a binary search, not on 0J2d0\leq J\leq 2^{d}, but rather on the power jj of an interval of length 2j2^{j}. That is, preforming a search for a power 0jd0\leq j\leq d for which there exists a 22-good interval of length 2j2^{j}. Here there are only log(d)\log(d) noisy comparisons, and the sample size is reduced to logΩ(1)(d)\log^{\Omega(1)}(d). Again, a (noisy) binary search on 0jd0\leq j\leq d can (privately) yield an appropriate length J=2jJ=2^{j} s.t. Q(2j)Q(2^{j}) is “big enough”, while Q(2j1)Q(2^{j-1}) is “small enough”. Such a J=2jJ=2^{j} is, indeed, a length of a 22-good interval. Too see this, note that as Q(2j)Q(2^{j}) is “big enough”, there exists an interval of length 2j2^{j} containing lots of ones and lots of zeros. Moreover, as Q(2j1)Q(2^{j-1}) is “small enough”, every interval of length 2j1=122j2^{j-1}=\frac{1}{2}2^{j} cannot contain too many ones and too many zeros simultaneously.

A binary search as above would have to operate on noisy values of Q()Q(\cdot) (as otherwise differential privacy cannot be obtained). For this reason, we set the bounds for “big enough” and “small enough” to overlap. Namely, we search for a value jj such that Q(2j)α4mQ(2^{j})\geq\frac{\alpha}{4}m and Q(2j1)3α4mQ(2^{j-1})\leq\frac{3\alpha}{4}m, where α\alpha is our approximation parameter, and mm is the sample size.

To summarize, using a binary search we find a length J=2jJ=2^{j} such that there exists a 2-good interval of length JJ. Then, using Adist\mathcal{A}_{\rm dist}, we find a 4-good interval. Finally, we partition this interval to 4 intervals, and using the exponential mechanism we choose a starting point or end point of one of these intervals as our the threshold.

We will apply recursion to reduce the costs of computing J=2jJ=2^{j} to 2O(log(d))2^{O(\log^{*}(d))}. The tool performing the recursion would be formalized and analyzed in the next section. This tool will later be used in our construction of a proper (ϵ,δ)(\epsilon,\delta)-private learner for THRESHd\operatorname*{\tt THRESH}_{d}.

3 Privately Approximating Quasi-Concave Promise Problems

We next define the notions that enable our recursive algorithm.

A Quasi-Concave Promise Problem consists of an ordered set of possible solutions [0,T]={0,1,,T}[0,T]=\{0,1,\ldots,T\}, a database SXmS\in X^{m}, a sensitivity-1 quality function Q:X×[0,T]RQ:X^{*}\times[0,T]\rightarrow\R, an approximation parameter α\alpha, and another parameter rr (called a quality promise).

If Q(S,)Q(S,\cdot) is quasi-concave and if there exists a solution p[0,T]p\in[0,T] for which Q(S,p)rQ(S,p)\geq r then a good output for the problem is a solution k[0,T]k\in[0,T] satisfying Q(S,k)(1α)rQ(S,k)\geq(1-\alpha)r. The outcome is not restricted otherwise.

Consider a sample S=(xi,yi)i=1mS=(x_{i},y_{i})_{i=1}^{m}, labeled by some target function cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}. The goal of choosing a hypothesis with small empirical error can be viewed as a quasi-concave promise problem as follows. Set the range of possible solutions to [0,2d][0,2^{d}], the approximation parameter to α\alpha and the quality promise to mm. Define Q(S,k)={i:ck(xi)=yi}Q(S,k)=|\{i:c_{k}(x_{i})=y_{i}\}|; i.e., Q(S,k)Q(S,k) is the number of points in SS correctly classified by ckTHRESHdc_{k}\in\operatorname*{\tt THRESH}_{d}. Note that the target concept cjc_{j} satisfies Q(S,j)=mQ(S,j)=m. Our task is to find a hypothesis hkTHRESHdh_{k}\in\operatorname*{\tt THRESH}_{d} s.t. errorS(hk)α{\rm error}_{S}(h_{k})\leq\alpha, which is equivalent to finding k[0,2d]k\in[0,2^{d}] s.t. Q(S,k)(1α)mQ(S,k)\geq(1-\alpha)m.

To see that Q(S,)Q(S,\cdot) is quasi-concave, let uvwu\leq v\leq w be s.t. Q(S,u),Q(S,w)λQ(S,u),Q(S,w)\geq\lambda. Consider jj, the index of the target concept, and assume w.l.o.g. that jvj\leq v (the other case is symmetric). That is, j vwj\leq\ v\leq w. Note that cvc_{v} errs only on points in between jj and vv, and cwc_{w} errs on all these points. That is, errorS(cv)errorS(cw){\rm error}_{S}(c_{v})\leq{\rm error}_{S}(c_{w}), and, therefore, Q(S,v)λQ(S,v)\geq\lambda. See Figure 4 for an illustration.

Note that if the sample SS in the above example is not consistent with any cTHRESHdc\in\operatorname*{\tt THRESH}_{d}, then there is no jj s.t. Q(S,j)=mQ(S,j)=m, and the quality promise is void. Moreover, in such a case Q(S,)Q(S,\cdot) might not be quasi-concave.

We are interested in solving quasi-concave promise problems while preserving differential privacy. As motivated by Remark 3.9, privacy must be preserved even when Q(S,)Q(S,\cdot) is not quasi-concave or Q(S,p)<rQ(S,p)<r for all p[0,T]p\in[0,T]. Our algorithm RecConcaveRecConcave is presented in Figure 5 (see inline comments for some of the underlying intuition).

We start the analysis of Algorithm RecConcaveRecConcave by bounding the number of recursive calls.

On a range [0,T][0,T] there could be at most log(T)=log(T)\log^{\lceil*\rceil}(T)=\log^{*}(T) recursive calls throughout the execution of RecConcaveRecConcave.

Before proceeding to the privacy analysis, we make the following simple observation.

Let {f1,f2,,fN}\{f_{1},f_{2},\ldots,f_{N}\} be a set of sensitivity-1 functions mapping XX^{*} to . Then fmax(S)=maxi{fi(S)}f_{max}(S)=\max_{i}\{f_{i}(S)\} and fmin(S)=mini{fi(S)}f_{min}(S)=\min_{i}\{f_{i}(S)\} are sensitivity-1 functions.

We now proceed with the privacy analysis of algorithm RecConcaveRecConcave.

When executed on a sensitivity-1 quality function QQ, parameters ϵ,δ\epsilon,\delta, and a bound on the recursion depth NN, algorithm RecConcaveRecConcave preserves (3Nϵ,3Nδ)(3N\epsilon,3N\delta)-differential privacy.

Note that since QQ is a sensitivity-1 function, all of the quality functions defined throughout the execution of RecConcaveRecConcave are of sensitivity 1 (see Observation 3.11). In each recursive call algorithm RecConcaveRecConcave invokes at most three differentially private mechanisms – once with the Exponential Mechanism (on Step 1 or on Step 11), and at most twice with algorithm Adist\mathcal{A}_{\rm dist} (on Step 9). As there are at most NN recursive calls, we conclude that throughout the entire execution algorithm RecConcaveRecConcave invokes most 3N3N mechanisms, each (ϵ,δ)(\epsilon,\delta)-differentially private. Hence, using Theorem 2.4, algorithm RecConcaveRecConcave is (3Nϵ,3Nδ)(3N\epsilon,3N\delta)-differentially private. ∎

We now turn to proving the correctness of algorithm RecConcaveRecConcave. As the proof is by induction (on the number of recursive calls), we need to show that each of the recursive calls to RecConcaveRecConcave is made with appropriate inputs. We first claim that the function q(S,)q(S,\cdot) constructed in Step 4 is quasi-concave. Note that for this claim we do not need to assume that Q(S,)Q(S,\cdot) is quasi-concave.

Let Q:X×[0,T]RQ:X^{*}\times[0,T]\rightarrow\R be a quality function, and let the functions L(,)L(\cdot,\cdot) and q(,)q(\cdot,\cdot) be as in steps 3, 4 of algorithm RecConcaveRecConcave. Then, for every SXS\in X^{*}, it holds that q(S,)q(S,\cdot) is quasi-concave.

Fix SXS\in X^{*}. First observe that the function

is monotonically non-increasing (as a function of jj). To see this, note that if L(S,j)=XL(S,j)=\mathcal{X}, then there exists an interval of length 2j2^{j} in which every point has quality at least X\mathcal{X}. In particular, there exists such an interval of length 122j\frac{1}{2}2^{j}, and L(S,j1)XL(S,j-1)\geq\mathcal{X}.

We use logN()\log^{\lceil N\rceil}(\cdot) to denote the outcome of NN iterative applications of the function log()\lceil\log(\cdot)\rceil, i.e., logN(n)=logloglogN times(n)\log^{\lceil N\rceil}(n)=\underbrace{\lceil\log\lceil\log\lceil\cdots\lceil\log}_{N\text{ times}}(n)\rceil\cdots\rceil\rceil\rceil. Observe that logN(n)2+logloglogN times(n)\log^{\lceil N\rceil}(n)\leq 2+\underbrace{\log\log\cdots\log}_{N\text{ times}}(n) for every Nlog(n)N\leq\log^{*}(n). For example logloglog(n)loglog(2+log(n))loglog(2log(n))=log1+loglog(n)log(2+loglog(n))log(2loglog(n))=1+logloglog(n)2+logloglog(n)\lceil\log\lceil\log\lceil\log(n)\rceil\rceil\rceil\leq\lceil\log\lceil\log(2+\log(n))\rceil\rceil\leq\lceil\log\lceil\log(2\log(n))\rceil\rceil=\lceil\log\lceil 1+\log\log(n)\rceil\rceil\leq\lceil\log(2+\log\log(n))\rceil\leq\lceil\log(2\log\log(n))\rceil=\lceil 1+\log\log\log(n)\rceil\leq 2+\log\log\log(n).

Let Q:X×[0,T]RQ:X^{*}\times[0,T]\rightarrow\R be a sensitivity-1 quality function, and let SXS\in X^{*} be a database s.t. Q(S,)Q(S,\cdot) is quasi-concave. Let α12\alpha\leq\frac{1}{2} and let β,ϵ,δ,r,N\beta,\epsilon,\delta,r,N be s.t.

When executed on S,[0,T],r,α,ϵ,δ,NS,[0,T],r,\alpha,\epsilon,\delta,N, algorithm RecConcaveRecConcave fails to outputs an index jj s.t. Q(S,j)(1α)rQ(S,j)\geq(1-\alpha)r with probability at most 2βN2\beta N.

The proof is by induction on the number of recursive calls, denoted as tt. For t=1t=1 (i.e., T32T\leq 32 or N=1N=1), the exponential mechanism ensures that for r2αϵlog(Tβ)r\geq\frac{2}{\alpha\epsilon}\log(\frac{T}{\beta}), the probability of algorithm RecConcaveRecConcave failing to output a jj s.t. Q(S,j)(1α)rQ(S,j)\geq(1-\alpha)r is at most β\beta.

Assume that the stated lemma holds whenever algorithm RecConcaveRecConcave performs at most t1t-1 recursive calls, and let S,[0,T],r,α,ϵ,δ,NS,[0,T],r,\alpha,\epsilon,\delta,N be inputs (satisfying the conditions of Lemma 3.14) on which algorithm RecConcaveRecConcave preforms tt recursive calls. Consider the first call in the execution of RecConcaveRecConcave on those inputs, and denote by TT^{\prime} the smallest power of 2 s.t. TTT^{\prime}\geq T. In order to apply the inductive assumption, we need to show that for the recursive call in step 6, all the conditions of Lemma 3.14 hold.

We first note that by Claim 3.13, the quality function q(S,)q(S,\cdot) defined of step 4 is quasi-concave. We next show that the recursive call is preformed with an appropriate quality promise R=α2rR=\frac{\alpha}{2}r. The conditions of the lemma ensure that L(S,0)rL(S,0)\geq r, and, by definition, we have that L(S,log(T)+1)0L(S,\log(T^{\prime})+1)\leq 0. There exists therefore a j[0,log(T)]j\in[0,\log(T^{\prime})] for which L(S,j)(1α2)rL(S,j)\geq(1-\frac{\alpha}{2})r, and L(S,j+1)<(1α2)rL(S,j+1)<(1-\frac{\alpha}{2})r. Plugging these inequalities in the definition of q(S,j)q(S,j) we get that q(S,j)α2rq(S,j)\geq\frac{\alpha}{2}r. Therefore, there exists an index j[0,log(T)]j\in[0,\log(T^{\prime})] with quality q(S,j)Rq(S,j)\geq R. Moreover, the recursive call of step 6 executes RecConcaveRecConcave on the range [0,log(T)]=[0,log(T)][0,\log(T^{\prime})]=[0,\lceil\log(T)\rceil] with (N1)(N-1) as the bound on the recursion depth, with α~14\widetilde{\alpha}\triangleq\frac{1}{4} as the approximation parameter, and with a quality promise RR satisfying

We next show that w.h.p. at least one of the two intervals A,BA,B chosen on Step 9, contains a lot of points with high score. Denote the index returned by the recursive call of step 6 as kk. By the inductive assumption, with probability at least (12β(N1))(1-2\beta(N-1)), the index kk is s.t. q(S,k)(114)R=3α8rq(S,k)\geq(1-\frac{1}{4})R=\frac{3\alpha}{8}r; we proceed with the analysis assuming that this event happened. By the definition of q(S,k)q(S,k), this means that L(S,k)q(S,k)+(1α)r(15α8)rL(S,k)\geq q(S,k)+(1-\alpha)r\geq(1-\frac{5\alpha}{8})r and that L(S,k+1)rq(S,k)(13α8)rL(S,k+1)\leq r-q(S,k)\leq(1-\frac{3\alpha}{8})r. That is, there exists an interval GG of length 2k2^{k} s.t. iG\forall i\in G it holds that Q(S,i)(15α8)rQ(S,i)\geq(1-\frac{5\alpha}{8})r, and every interval of length 22k2\cdot 2^{k} contains at least one point ii s.t. Q(S,i)(13α8)rQ(S,i)\leq(1-\frac{3\alpha}{8})r.

Note that if P1P_{1} (or P2P_{2}) is trimmed, then there are no points on the left of (or on the right of) PP. So, the interval PP contains the point pp with quality Q(S,p)rQ(S,p)\geq r and every point i[0,T]Pi\in[0,T]\setminus P has quality of at most (13α8)r(1-\frac{3\alpha}{8})r. Moreover, PP is of length 42k14\cdot 2^{k}-1. As the intervals of the partitions {Ai}\{A_{i}\} and {Bi}\{B_{i}\} are of length 82k8\cdot 2^{k}, and the {Bi}\{B_{i}\}’s are shifted by 42k4\cdot 2^{k}, there must exist an interval C{Ai}{Bi}C\in\{A_{i}\}\cup\{B_{i}\} s.t. PCP\subseteq C. Assume without loss of generality that C{Ai}C\in\{A_{i}\}.

Recall that the quality u(S,)u(S,\cdot) of an interval II is defined as the maximal quality Q(S,i)Q(S,i) of a point iIi\in I. Therefore, as pCp\in C, the quality of CC is at least rr. On the other hand, the quality of every AiCA_{i}\neq C is at most (13α8)r(1-\frac{3\alpha}{8})r. That is, the interval CC scores better (under uu) than any other interval in {Ai}\{A_{i}\} by at least an additive factor of 3α8r1ϵlog(1βδ)\frac{3\alpha}{8}r\geq\frac{1}{\epsilon}\log(\frac{1}{\beta\delta}). By the properties of Adist\mathcal{A}_{\rm dist}, with probability at least (1β)(1-\beta), the chosen interval AA in step 9 is s.t. PAP\subseteq A. We proceed with the analysis assuming that this is the case.

Consider again the interval PP containing the point pp, and recall that there exists an interval GG of length 2k2^{k} containing only points with quality Q(S,)Q(S,\cdot) of at least (15α8)r(1-\frac{5\alpha}{8})r. Such an interval must be contained in PP. Otherwise, by the quasi-concavity of Q(S,)Q(S,\cdot), all the points between GG and the point pp must also have quality at least (15α8)r(1-\frac{5\alpha}{8})r, and, in particular, PP must indeed contain such an interval.

So, the chosen interval AA in step 9 is of length 82k8\cdot 2^{k}, and it contains a sub interval of length 2k2^{k} in which every point has quality at least (15α8)r(1-\frac{5\alpha}{8})r. That is, at least 116\frac{1}{16} out of the points in (AB)(A\cup B) has quality at least (15α8)r(1-\frac{5\alpha}{8})r. Therefore, as r4αϵlog(16β)r\geq\frac{4}{\alpha\epsilon}\log(\frac{16}{\beta}), the exponential mechanism ensures that the probability of step 10 failing to return a point h(AB)h\in(A\cup B) with Q(S,h)(1α)rQ(S,h)\geq(1-\alpha)r is at most β\beta. As there are at least 116AB\frac{1}{16}|A\cup B| solutions with quality at least (15α8)r(1-\frac{5\alpha}{8})r, the probability that the exponential mechanism outputs a specific solution h(AB)h\in(A\cup B) with Q(S,h)(1α)rQ(S,h)\geq(1-\alpha)r is at most exp(ϵ2(1α)r)116ABexp(ϵ2(15α8)r)\frac{\exp(\frac{\epsilon}{2}(1-\alpha)r)}{\frac{1}{16}|A\cup B|\exp(\frac{\epsilon}{2}(1-\frac{5\alpha}{8})r)}. Hence, the probability that the exponential mechanism outputs any solution h(AB)h\in(A\cup B) with Q(S,h)(1α)rQ(S,h)\geq(1-\alpha)r is at most 16exp(ϵ2(1α)r)exp(ϵ2(15α8)r)16\frac{\exp(\frac{\epsilon}{2}(1-\alpha)r)}{\exp(\frac{\epsilon}{2}(1-\frac{5\alpha}{8})r)}, which is at most β\beta for our choice of rr.

All in all, with probability at least (12β(N1)2β)=(12βN)(1-2\beta(N-1)-2\beta)=(1-2\beta N), algorithm RecConcaveRecConcave returns an index j[0,T]j\in[0,T] s.t. Q(S,j)(1α)rQ(S,j)\geq(1-\alpha)r. ∎

Combining Lemma 3.12 and Lemma 3.14 we get the following theorem.

Let algorithm RecConcaveRecConcave be executed on a range [0,T][0,T], a sensitivity-1 quality function QQ, a database SS, a bound on the recursion depth NN, privacy parameters ϵ3N,δ3N\frac{\epsilon}{3N},\frac{\delta}{3N}, approximation parameter α\alpha, and a quality promise rr. The following two statements hold:

Algorithm RecConcaveRecConcave preserves (ϵ,δ)(\epsilon,\delta)-differential privacy.

If SS is s.t. Q(S,)Q(S,\cdot) is quasi-concave, and if

then algorithm RecConcaveRecConcave fails to outputs an index jj s.t. Q(S,j)(1α)rQ(S,j)\geq(1-\alpha)r with probability at most β\beta.

Recall that the number of recursive calls on a range [0,T][0,T] is always bounded by log(T)\log^{*}(T), and note that for N=log(T)N=\log^{*}(T) we have that logN(T)1\log^{\lceil N\rceil}(T)\leq 1. Therefore, the promise requirement in Inequality (2) can be replaced with 8^{\log^{*}(T)}\cdot\frac{36\log^{*}(T)}{\alpha\epsilon}\log\Big{(}\frac{12\log^{*}(T)}{\beta\delta}\Big{)}.

The computational efficiency of algorithm RecConcaveRecConcave depends on the quality function Q(,)Q(\cdot,\cdot). Note, however, that it suffices to efficiently implement the top level call (i.e., without the recursion). This is true because an iteration of algorithm RecConcaveRecConcave, operating on a range [0,T][0,T], can easily be implemented in time poly(T)\mathop{\rm{poly}}\nolimits(T), and the range given as input to recursive calls is logarithmic in the size of the initial range.

As we will now see, algorithm RecConcaveRecConcave can be used as a proper (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-private learner for THRESHd\operatorname*{\tt THRESH}_{d}. Recall Example 3.8 (showing that the goal of choosing a hypothesis with small empirical error can be viewed as a quasi-concave promise problem), and consider the following algorithm.

Algorithm LearnThresholdsLearnThresholds Input: A labeled sample S=(xi,yi)i=1mS=(x_{i},y_{i})_{i=1}^{m} and parameters α,ϵ,δ,N\alpha,\epsilon,\delta,N. 1. Denote α^=α2,    ϵ^=ϵ3N\hat{\alpha}=\frac{\alpha}{2},\;\;\hat{\epsilon}=\frac{\epsilon}{3N}, and δ^=δ3N\hat{\delta}=\frac{\delta}{3N}. 2. For every 0j2d0\leq j\leq 2^{d}, define Q(S,j)={i:cj(xi)=yi}Q(S,j)=|\{i:c_{j}(x_{i})=y_{i}\}|. 3. Execute algorithm RecConcaveRecConcave on the sample SS, the range [0,2d][0,2^{d}], the quality function Q(,)Q(\cdot,\cdot), the promise mm, and parameters α^,ϵ^,δ^,N\hat{\alpha},\hat{\epsilon},\hat{\delta},N. Denote the returned value as kk. 4. Return ckc_{k}.

For every 1Nlog(2d)1\leq N\leq\log^{*}(2^{d}), Algorithm LearnThresholdsLearnThresholds is an efficient proper (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-PPAC learner for THRESHd\operatorname*{\tt THRESH}_{d}, where the sample size is

By Theorem 3.15, algorithm LearnThresholdsLearnThresholds is (ϵ,δ)(\epsilon,\delta)-differentially private. For the utility analysis, fix a target concept cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}, and a distribution D\mathcal{D} on XdX_{d}, and let SS be a sample drawn i.i.d. from D\mathcal{D} and labeled by cjc_{j}. Define the following two good events:

  hTHRESHd,    errorD(h,cj)errorS(h)α2\forall\;h\in\operatorname*{\tt THRESH}_{d},\;\;|{\rm error}_{\mathcal{D}}(h,c_{j})-{\rm error}_{S}(h)|\leq\frac{\alpha}{2}.

Algorithm RecConcaveRecConcave returns kk s.t. errorS(ck)α2{\rm error}_{S}(c_{k})\leq\frac{\alpha}{2}

Clearly, when both E1,E2E_{1},E_{2} occur, algorithm LearnThresholdsLearnThresholds succeeds in outputting an α\alpha-good hypothesis for cjc_{j} and D\mathcal{D}. Note that as VC(THRESHd)=1\operatorname{\rm VC}(\operatorname*{\tt THRESH}_{d})=1, Theorem 2.14 ensures that for m200α2ln(4αβ)m\geq\frac{200}{\alpha^{2}}\ln(\frac{4}{\alpha\beta}), event E1E_{1} happens with probability at least (1β2)(1-\frac{\beta}{2}).

Next, note that for the target concept cjc_{j} it holds that Q(S,j)=mQ(S,j)=m, and algorithm RecConcaveRecConcave is executed on step 3 with a valid quality promise. Moreover, as shown in Example 3.8, algorithm RecConcaveRecConcave is executed with a quasi-concave quality function.

So, algorithm RecConcaveRecConcave is executed on step 3 with a valid quality promise and with a quasi-concave quality function. For

algorithm RecConcaveRecConcave ensures that with probability at least (1β2)(1-\frac{\beta}{2}), the index kk at step 2 is s.t. Q(k)(1α2)mQ(k)\geq(1-\frac{\alpha}{2})m. The empirical error of ckc_{k} is at most α2\frac{\alpha}{2} in such a case. Therefore, Event E2E_{2} happens with probability at least (1β2)(1-\frac{\beta}{2}). Overall, we conclude that LearnThresholdsLearnThresholds is a proper (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-PPAC learner for CC, where

By using N=log(2d)N=\log^{*}(2^{d}) in the above theorem, we can bound the sample complexity of LearnThresholdsLearnThresholds by

5 Axis-Aligned Rectangles in High Dimension

Consider the class of all axis-aligned rectangles (or hyperrectangles) in the Euclidean space n. A concept in this class could be thought of as the product of nn intervals, one on each axis. We briefly describe an efficient approximate-private proper-learner for a discrete version of this class.

Let Xdn=({0,1}d)nX_{d}^{n}=(\{0,1\}^{d})^{n} denote a discrete nn-dimensional domain, in which every axis consists of 2d2^{d} points {0,1,,2d1}\{0,1,\ldots,2^{d}-1\}. For every a=(a1,,an),b=(b1,,bn)Xdn\vec{a}=(a_{1},\ldots,a_{n}),\vec{b}=(b_{1},\ldots,b_{n})\in X_{d}^{n} define the concept c[a,b]:Xdn{0,1}c_{[\vec{a},\vec{b}]}:X_{d}^{n}\rightarrow\{0,1\} where c[a,b](x)=1c_{[\vec{a},\vec{b}]}(\vec{x})=1 if and only if for every 1in1\leq i\leq n it holds that aixibia_{i}\leq x_{i}\leq b_{i}. Define the concept class of all axis-aligned rectangles over XdnX^{n}_{d} as RECTANGLEdn={c[a,b]}a,bXdn\operatorname*{\tt RECTANGLE}_{d}^{n}=\{c_{[\vec{a},\vec{b}]}\}_{\vec{a},\vec{b}\in X_{d}^{n}}.

The VC dimension of this class is 2n2n, and, thus, it can be learned non-privately with sample complexity Oα,β(n)O_{\alpha,\beta}(n). Note that RECTANGLEdn=2O(nd)|\operatorname*{\tt RECTANGLE}_{d}^{n}|=2^{O(nd)}, and, therefore, the generic construction of Kasiviswanathan et al. yields an inefficient proper ϵ\epsilon-private learner for this class with sample complexity Oα,β,ϵ(nd)O_{\alpha,\beta,\epsilon}(nd).

In , Kearns gave an efficient (noise resistant) non-private learner for this class. The learning model there was a variant of the statistical queries model , in which the learner is also being given access to the underling distribution D\mathcal{D}. Every learning algorithm in the statistical queries model can be transformed to satisfy differential privacy while preserving efficiency . However, as Kearns’ algorithm assumes direct access to D\mathcal{D}, this transformation cannot be applied directly.

Kearns’ algorithm begins by sampling D\mathcal{D} and using the drawn samples to divide each axis i[n]i\in[n] into O(n/α)O(n/\alpha) intervals Ii={I}{\cal I}_{i}=\{I\} with the property that the xix_{i} component of a random point from D\mathcal{D} is approximately equally likely to fall into each of the intervals in Ii{\cal I}_{i}. The algorithm proceeds by estimating the boundary of the target rectangle separately for every dimension ii: For every interval IIiI\in{\cal I}_{i}, the algorithm uses statistical queries to estimate the probability that a positively labeled input has its xix_{i} component in II, i.e.,

The algorithm places the left boundary of the hypothesis rectangle in the ii-th dimension at the left-most interval IIiI\in{\cal I}_{i} such that pIp_{I} is significant, and analogously on the right.

Note that once the interval sets Ii{\cal I}_{i} are defined for each axis i[n]i\in[n], estimating every single pIp_{I} can be done via statistical queries, and can, therefore, be made private using the transformation of . Alternatively, estimating (simultaneously) all of the pIp_{I}’s (on the ithi^{\text{t}h} axis) could be done privately using the laplacian mechanism. This use of the laplacian mechanism is known as a histogram (see Theorem 2.24).

Thus, our task is to privately partition each axis. The straight forward approach for privately finding Ii{\cal I}_{i} is by a noisy binary search for the boundary of each of the n/αn/\alpha intervals (in each axis). This would result in Ω(d)\Omega(d) noisy comparisons, which, in turn, results in a private learner with a high sample complexity.

We now overcome this issue using a sanitizer for THRESHd\operatorname*{\tt THRESH}_{d}. Such a sanitizer will be constructed in Section 4.2; here we use it for privately finding Ii{\cal I}_{i}.

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. There exists an efficient (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-sanitizer for THRESHd\operatorname*{\tt THRESH}_{d}, where m=O~β,ϵ,δ(1α2.58log(d))m=\widetilde{O}_{\beta,\epsilon,\delta}\left(\frac{1}{\alpha^{2.5}}\cdot 8^{\log^{*}(d)}\right).

As we next explain, such a sanitizer can be used to (privately) divide the axes. Given an interval [a,b]Xd[a,b]\subseteq X_{d} and a sample SS, we denote the probability mass of [a,b][a,b] under D\mathcal{D} as D[a,b]\mathcal{D}[a,b], and the number of sample points in this interval as #S[a,b]\#_{S}[a,b]. Standard arguments in learning theory (specifically, Theorem 2.14) state that for a large enough sample (whose size is bigger than the VC dimensions of the intervals class) w.h.p. 1S#S[a,b]D[a,b]\frac{1}{|S|}\#_{S}[a,b]\approx\mathcal{D}[a,b] for every interval [a,b]Xd[a,b]\subseteq X_{d}.

On an input database S(Xd)S\in(X_{d})^{*}, such a sanitizer for THRESHd\operatorname*{\tt THRESH}_{d} outputs an alternative database S^(Xd)\hat{S}\in(X_{d})^{*} s.t. 1S^#S^[0,b]1S#S[0,b]\frac{1}{|\hat{S}|}\#_{\hat{S}}[0,b]\approx\frac{1}{|S|}\#_{S}[0,b] for every interval [0,b]Xd[0,b]\subseteq X_{d}. Hence, for every interval [a,b]Xd[a,b]\subseteq X_{d} we have that

So, in order to divide the ithi^{\text{t}h} axis we apply the above mentioned sanitizer, and divide the axis using the returned sanitized database. In order to accumulate error of up to α/n\alpha/n on each axis (as required by Kearns’ algorithm), we need to execute the above mentioned sanitizer with an approximation parameter of (roughly) α/n\alpha/n. Every such execution requires, therefore, a sample of O~α,β,ϵ,δ(n2.58log(d))\widetilde{O}_{\alpha,\beta,\epsilon,\delta}\left(n^{2.5}\cdot 8^{\log^{*}(d)}\right) elements. As there are nn such executions (one for each axis), using Theorem 2.5 (composition theorem), the described learner is of sample complexity O~α,β,ϵ,δ(n38log(d))\widetilde{O}_{\alpha,\beta,\epsilon,\delta}\left(n^{3}\cdot 8^{\log^{*}(d)}\right).

There exists an efficient (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-PPAC proper-learner for RECTANGLEdn\operatorname*{\tt RECTANGLE}_{d}^{n}, where

This should be contrasted with θα,β(n)\theta_{\alpha,\beta}(n), which is the non-private sample complexity for this class (as the VC\operatorname{\rm VC}-dimension of RECTANGLEdn\operatorname*{\tt RECTANGLE}_{d}^{n} is 2n2n), and with θα,β,ϵ(nd)\theta_{\alpha,\beta,\epsilon}(nd) which is the pure-private sample complexity for this class.The general construction of Kasiviswanathan et al. yields an (inefficient) pure-private proper-learner for this class with sample complexity Oα,β,ϵ(nd)O_{\alpha,\beta,\epsilon}(nd). Feldman and Xiao showed that this is in fact optimal, and every ϵ\epsilon-private (proper or improper) learner for this class must have sample complexity Ω(nd)\Omega(nd).

Sanitization with Approximate Privacy

In this section we present (ϵ,δ)(\epsilon,\delta)-private sanitizers for several concept classes, and separate the database size necessary for (ϵ,0)(\epsilon,0)-private sanitizers from the database size sufficient for (ϵ,δ)(\epsilon,\delta)-private sanitizers.

Recall that in our private PAC learner for POINTd\operatorname{\tt POINT}_{d}, given a typical labeled sample, there exists a unique concept in the class that stands out (we used algorithm Adist\mathcal{A}_{\rm dist} to identify it). This is not the case in the context of sanitization, as a given database SS can have many α\alpha-close sanitized databases S^\hat{S}. We will overcome this issue by using the following private tool for approximating a restricted class of choosing problems.

A function q:X×FNq:X^{*}\times\mathcal{F}\rightarrow\N defines an optimization problem over the domain XX and solution set F\mathcal{F}: Given a dataset SS over domain XX choose fFf\in\mathcal{F} that (approximately) maximizes q(S,f)q(S,f). We are interested in a subset of these optimization problems, which we call bounded-growth choice problems. In this section we consider a database SXS\subseteq X^{*} as a multiset.

Given qq and SS define optq(S)=maxfF{q(S,f)}\mathop{\rm{opt}}\nolimits_{q}(S)=\max_{f\in\mathcal{F}}\{q(S,f)\}. A solution fFf\in\mathcal{F} is called α\alpha-good for a database SS if q(S,f)optq(S)αSq(S,f)\geq\mathop{\rm{opt}}\nolimits_{q}(S)-\alpha|S|.

A quality function q:X×FNq:X^{*}\times\mathcal{F}\rightarrow\N is kk-bounded-growth if:

q(,f)=0q(\emptyset,f)=0 for all fFf\in\mathcal{F}.

If S2=S1{x}S_{2}=S_{1}\cup\{x\}, then (i) q(S1,f)+1q(S2,f)q(S1,f)q(S_{1},f)+1\geq q(S_{2},f)\geq q(S_{1},f) for all fFf\in\mathcal{F}; and (ii) there are at most kk solutions fFf\in\mathcal{F} s.t. q(S2,f)>q(S1,f)q(S_{2},f)>q(S_{1},f).

In words, the second requirement means that (i) Adding an element to the database could either have no effect on the score of a solution ff, or can increase the score by exactly 11; and (ii) There could be at most kk solutions whose scores are increased (by 11). Note that a kk-bounded-growth quality function is, in particular, a sensitivity-1 function as two neighboring S,SS,S^{\prime} must be of the form D{x1}D\cup\{x_{1}\} and D{x2}D\cup\{x_{2}\} respectively. Hence, q(S,f)q(S,f)q(D,f)+1q(D,f)=1q(S,f)-q(S^{\prime},f)\leq q(D,f)+1-q(D,f)=1 for every solution ff.

As an example of a 1-bounded growth quality function, consider the following q:X×XNq:X^{*}\times X\rightarrow\N. Given a database S=(x1,,xm)S=(x_{1},\ldots,x_{m}) containing elements from some domain XX, define q(S,a)=\big{|}\{i:x_{i}=a\}\big{|}. That is, q(S,a)q(S,a) is the number of appearances of aa in SS. Clearly, q(,f)=0q(\emptyset,f)=0 for all fXf\in X. Moreover, adding an element aXa\in X to a database SS increases by 1 the quality of q(S,a)q(S,a), and does not effect the quality of every other bab\neq a.

The choosing mechanism (in Figure 8) is a private algorithm for approximately solving bounded-growth choice problems. Step 1 of the algorithm checks whether a good solutions exist, as otherwise any solution is approximately optimal (and the mechanism returns \bot). Step 2 invokes the exponential mechanism, but with the small set G(S)G(S) instead of F\mathcal{F}.

When qq is a kk-bounded-growth quality function, the choosing mechanism preserves (ϵ,δ)(\epsilon,\delta)-differential privacy for databases of m16αϵln(16kαβϵδ)m\geq\frac{16}{\alpha\epsilon}\ln(\frac{16k}{\alpha\beta\epsilon\delta}) elements.

Let S,SS,S^{\prime} be neighboring databases of mm elements. We need to show that Pr[A(S)R]exp(ϵ)Pr[A(S)R]+δ\Pr[A(S)\in R]\leq\exp(\epsilon)\cdot\Pr[A(S^{\prime})\in R]+\delta for any set of outputs RR. Note first that by the properties of the Laplace Mechanism,

Using m16αϵln(12δ)m\geq\frac{16}{\alpha\epsilon}\ln(\frac{1}{2\delta}), we get that

Let G(S)G(S) and G(S)G(S^{\prime}) be the sets used in step 2 in the execution SS and on SS^{\prime} respectively. We will show that the following two facts hold:

Fact  1:Fact\;1: For every fG(S)G(S)f\in G(S)\setminus G(S^{\prime}), it holds that Pr[A(S)=f]δk\Pr[A(S)=f]\leq\frac{\delta}{k}.

Fact  2:Fact\;2: For every possible output fG(S)G(S)f\notin G(S)\setminus G(S^{\prime}), it holds that Pr[A(S)=f]eϵPr[A(S)=f]\Pr[A(S)=f]\leq e^{\epsilon}\Pr[A(S^{\prime})=f].

For proving Fact 1, let fG(S)G(S)f\in G(S)\setminus G(S^{\prime}). That is, q(S,f)1q(S,f)\geq 1 and q(S,f)=0q(S^{\prime},f)=0. As qq is (in particular) a sensitivity-1 function, it must be, therefore, that q(S,f)=1q(S,f)=1. As there exists f^S\hat{f}\in S with q(S,f^)αm4q(S,\hat{f})\geq\frac{\alpha m}{4}, we have that

which is at most δk\frac{\delta}{k} for m16αϵ(ϵ4+ln(kδ))m\geq\frac{16}{\alpha\epsilon}(\frac{\epsilon}{4}+\ln(\frac{k}{\delta})).

For proving Fact 2, let fG(S)G(S)f\notin G(S)\setminus G(S^{\prime}) be a possible output of A(S)A(S). If f(G(S){})f\notin(G(S)\cup\{\bot\}) then trivially Pr[A(S)=f]=0eϵPr[A(S)=f]\Pr[A(S)=f]=0\leq e^{\epsilon}\Pr[A(S^{\prime})=f]. We have already established (in Inequality (3)) that for f=f=\bot it holds that Pr[A(S)=]eϵ/4Pr[A(S)=]\Pr[A(S)=\bot]\leq e^{\epsilon/4}\Pr[A(S^{\prime})=\bot]. It remains, hence, to deal with the case where fG(S)G(S)f\in G(S)\cap G(S^{\prime}). For this case, we use the following Fact 3, proved below.

Fact  3:Fact\;3: hG(S)exp(ϵ4q(S,h))eϵ/2hG(S)exp(ϵ4q(S,h))\sum\limits_{h\in G(S^{\prime})}\exp(\frac{\epsilon}{4}q(S^{\prime},h))\leq e^{\epsilon/2}\cdot\sum\limits_{h\in G(S)}\exp(\frac{\epsilon}{4}q(S,h)).

Using Fact 3, for every possible output fG(S)G(S)f\in G(S)\cap G(S^{\prime}) we have that

We now prove Fact 3. Denote XhG(S)exp(ϵ4q(S,h))\mathcal{X}\triangleq\sum\limits_{h\in G(S)}\exp(\frac{\epsilon}{4}q(S,h)). We first show that

That is, we need to show that Xkeϵ/41\mathcal{X}\geq\frac{k}{e^{\epsilon/4}-1}. As 1+ϵ4eϵ/41+\frac{\epsilon}{4}\leq e^{\epsilon/4}, it suffices to show that X4kϵ\mathcal{X}\geq\frac{4k}{\epsilon}. Recall that there exists a solution f^\hat{f} s.t. q(S,f^)αm4q(S,\hat{f})\geq\frac{\alpha m}{4}. Therefore, Xexp(ϵ4αm4)\mathcal{X}\geq\exp(\frac{\epsilon}{4}\frac{\alpha m}{4}), which is at least 4kϵ\frac{4k}{\epsilon} for m16αϵln(4kϵ)m\geq\frac{16}{\alpha\epsilon}\ln(\frac{4k}{\epsilon}). This proves (4).

Now, recall that as qq is kk-growth-bounded, for every hFh\in\mathcal{F} it holds that q(S,h)q(S,h)1|q(S,h)-q(S^{\prime},h)|\leq 1. Moreover, G(S)G(S)k|G(S^{\prime})\setminus G(S)|\leq k, and every h(G(S)G(S))h\in(G(S^{\prime})\setminus G(S)) obeys q(S,h)=1q(S^{\prime},h)=1. Hence,

This concludes the proof of Fact 3, and completes the proof of the lemma. ∎

The utility analysis for the choosing mechanism is rather straight forward:

When qq is a kk-bounded-growth quality function, given a database SS of m16αϵln(16kαβϵδ)m\geq\frac{16}{\alpha\epsilon}\ln(\frac{16k}{\alpha\beta\epsilon\delta}) elements, the choosing mechanism outputs an α\alpha-good solution for SS with probability at least 1β1-\beta.

Note that if q(S,f)<αmq(S,f)<\alpha m for every solution ff, then every solution is an α\alpha-good solution, and the mechanism cannot fail. Assume, therefore, that there exists a solution ff s.t. q(f,S)αmq(f,S)\geq\alpha m, and recall that the mechanism defines best(S){\rm best}(S) as maxfF{q(f,S)}+Lap(4ϵ)\max_{f\in\mathcal{F}}\left\{q(f,S)\right\}+\mathop{\rm{Lap}}\nolimits(\frac{4}{\epsilon}). Now consider the following two good events:

The exponential mechanism chooses a solution ff s.t. q(S,f)opt(S)αmq(S,f)\geq\mathop{\rm{opt}}\nolimits(S)-\alpha m.

If E2E_{2} occurs then the mechanism outputs an α\alpha-good solution. Note that the event E2E_{2} is contained inside the event E1E_{1}, and, therefore, Pr[E2]=Pr[E1E2]=Pr[E1]Pr[E2E1]\Pr[E_{2}]=\Pr[E_{1}\wedge E_{2}]=\Pr[E_{1}]\cdot\Pr[E_{2}|E_{1}]. By the properties of the Laplace Mechanism, Pr[E1](112exp(ϵ4αm2))\Pr[E_{1}]\geq\left(1-\frac{1}{2}\exp(-\frac{\epsilon}{4}\frac{\alpha m}{2})\right), which is at least (1β2)(1-\frac{\beta}{2}) for m8αϵln(1β)m\geq\frac{8}{\alpha\epsilon}\ln(\frac{1}{\beta}).

By the growth-boundedness of qq, and as SS is of size mm, there are at most kmkm possible solutions ff with q(f,S)>0q(f,S)>0. That is, G(S)km|G(S)|\leq km. By the properties of the Exponential Mechanism, we have that Pr[E2E1](1kmexp(αϵm4))\Pr[E_{2}|E_{1}]\geq\left(1-km\cdot\exp(-\frac{\alpha\epsilon m}{4})\right), which is at least (1β2)(1-\frac{\beta}{2}) for m8αϵln(16kαβϵ)m\geq\frac{8}{\alpha\epsilon}\ln(\frac{16k}{\alpha\beta\epsilon}). For our choice of mm we have, therefore, that Pr[E2](1β2)(1β2)(1β)\Pr[E_{2}]\geq(1-\frac{\beta}{2})(1-\frac{\beta}{2})\geq(1-\beta).

All in all, for m16αϵln(16kαβϵδ)m\geq\frac{16}{\alpha\epsilon}\ln(\frac{16k}{\alpha\beta\epsilon\delta}) we get that with probability at least (1β)(1-\beta) it outputs an α\alpha-good solution for its input database. ∎

Beimel et al. showed that every pure ϵ\epsilon-private sanitizer for POINTd\operatorname{\tt POINT}_{d}, must operate on databases of Ω(d)\Omega(d) elements. In this section we present an (ϵ,δ)(\epsilon,\delta)-private sanitizer for POINTd\operatorname{\tt POINT}_{d} with sample complexity Oα,β,ϵ,δ(1)O_{\alpha,\beta,\epsilon,\delta}(1). This separates the database size necessary for (ϵ,0)(\epsilon,0)-private sanitizers from the database size sufficient for (ϵ,δ)(\epsilon,\delta)-private sanitizers.

Let S=(x1,x2,,xm)XdmS=(x_{1},x_{2},\ldots,x_{m})\in X_{d}^{m} be a database of dd-bit strings. For every cjPOINTdc_{j}\in\operatorname{\tt POINT}_{d}, the query Qcj:XdQ_{c_{j}}:X_{d}^{*}\rightarrow is defined to be the fraction of the strings in the database that equal jj

Our sanitizing algorithm invokes the Choosing Mechanism to choose points xXdx\in X_{d}. Consider the following q:Xd×XdNq:X_{d}^{*}\times X_{d}\rightarrow\N. Given a database SXdmS\in X_{d}^{m} and a point xXdx\in X_{d}, define q(S,x)q(S,x) to be the number of appearances of xx in SS. By Example 4.3, qq defines a 1-bounded-growth choosing problem. Moreover, given a subset RXdR\subseteq X_{d} consider the restriction of qq to the subset RR defined as qR(S,x)=q(S,x)q_{R}(S,x)=q(S,x) for xRx\in R and zero otherwise. The function qRq_{R} is a 1-bounded-growth quality function. Our sanitizer SanPointsSanPoints appears in Figure 9.

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. For mO(1α1.5ϵln(1δ)ln(1αβϵδ))m\geq O\left(\frac{1}{\alpha^{1.5}\epsilon}\sqrt{\ln(\frac{1}{\delta})}\ln(\frac{1}{\alpha\beta\epsilon\delta})\right), algorithm SanPointsSanPoints is an efficient (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-improper-sanitizer for POINTd\operatorname{\tt POINT}_{d}.

We start with the utility analysis. Fix a database S=(x1,,xm)S=(x_{1},\ldots,x_{m}), and consider the execution of algorithm SanPointsSanPoints on SS. Denote the element chosen by the Choosing Mechanism on the ithi^{\rm th} iteration of step 2 by bib_{i}, and denote the set of all such elements as B={b1,,b2/α}{}B=\{b_{1},\ldots,b_{2/\alpha}\}\setminus\{\bot\}. Moreover, let RiR_{i} denote the set RR as is was at the beginning of the ithi^{\text{th}} iteration. Consider the following two bad events:

bB\exists b\in B s.t. Qcb(S)Est(b)>α|Q_{c_{b}}(S)-\operatorname{\rm Est}(b)|>\alpha.

aB\exists a\notin B s.t. Qca(S)>αQ_{c_{a}}(S)>\alpha.

If none of these two events happen, then algorithm SanPointsSanPoints succeeds in outputting an estimation Est\operatorname{\rm Est} s.t. \forall c_{j}\in\operatorname{\tt POINT}_{d}\;\;\big{|}Q_{c_{j}}(S)-\operatorname{\rm Est}(j)\big{|}\leq\alpha. We now bound the probability of both events.

We now bound Pr[E2]\Pr[E_{2}]. By the properties of the Choosing Mechanism (Lemma 4.5), with probability at least (1αβ4)(1-\frac{\alpha\beta}{4}), an execution of the Choosing Mechanism on step 2a returns an α2\frac{\alpha}{2}-good solution bib_{i} s.t.

Using the union bound on the number of iterations, we get that with probability at least (1β2)(1-\frac{\beta}{2}), Inequality (5) holds for every iteration 1iα21\leq i\leq\frac{\alpha}{2}. We will now see that in such a case, event E2E_{2} does not occur. Assume to the contrary that there exists an aBa\notin B s.t. Qca(S)>αQ_{c_{a}}(S)>\alpha. Therefore, for every iteration ii it holds that maxxXd{qRi(S,x)}>αm\max_{x\in X_{d}}\{q_{R_{i}}(S,x)\}>\alpha m and thus qRi(S,bi)>α2mq_{R_{i}}(S,b_{i})>\frac{\alpha}{2}m. This means that there exist (at least) 2α\frac{2}{\alpha} different points biXdb_{i}\in X_{d} that appear in SS more than α2m\frac{\alpha}{2}m times, which contradicts the fact that the size of SS is mm.

All in all, Pr[E2]β2\Pr[E_{2}]\leq\frac{\beta}{2}, and the probability of algorithm SanPointsSanPoints failing to output an estimation Est\operatorname{\rm Est} s.t. \forall c_{j}\in\operatorname{\tt POINT}_{d}\;\;\big{|}Q_{c_{j}}(S)-\operatorname{\rm Est}(j)\big{|}\leq\alpha is at most β\beta.

The above algorithm SanPointsSanPoints can also be used as a sanitizer for the concept class kPOINTd\operatorname{\tt k-POINT}_{d}, defined as follows. For every AXdA\subseteq X_{d} s.t. A=k|A|=k, the concept class kPOINTd\operatorname{\tt k-POINT}_{d} contains the concept cA:Xd{0,1}c_{A}:X_{d}\rightarrow\{0,1\}, defined as cA(x)=1c_{A}(x)=1 if xAx\in A and cA(x)=0c_{A}(x)=0 otherwise.

Let S=(x1,x2,,xm)XdmS=(x_{1},x_{2},\ldots,x_{m})\in X_{d}^{m} be a database. For every cIkPOINTdc_{I}\in\operatorname{\tt k-POINT}_{d}, the query QcI:XdQ_{c_{I}}:X_{d}^{*}\rightarrow is defined as

Fix k,α,β,ϵ,δk,\alpha,\beta,\epsilon,\delta. For mO(k1.5α1.5ϵln(1δ)ln(kαβϵδ))m\geq O\left(\frac{k^{1.5}}{\alpha^{1.5}\epsilon}\sqrt{\ln(\frac{1}{\delta})}\ln(\frac{k}{\alpha\beta\epsilon\delta})\right), the above algorithm is an efficient (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-improper-sanitizer for kPOINTd\operatorname{\tt k-POINT}_{d}.

The privacy of the above algorithm is immediate. Fix a database S=(x1,x2,,xm)XdmS=(x_{1},x_{2},\ldots,x_{m})\in X_{d}^{m}. By Theorem 4.6, with probability at least (1β)(1-\beta), the estimation Est\operatorname{\rm Est} on step 1 is s.t. \forall j\in X_{d}\;\;\big{|}\frac{1}{m}\sum_{i=1}^{m}{\mathds{1}_{\{x_{i}=j\}}}-\operatorname{\rm Est}(j)\big{|}\leq\frac{\alpha}{k}. Now fix a set IXdI\subseteq X_{d} of cardinality kk. As QcI(S)=1m{i  :  xiI}Q_{c_{I}}(S)=\frac{1}{m}|\{i\;:\;x_{i}\in I\}|, we have that QcI(S)iIEst(i)kαk=α|Q_{c_{I}}(S)-\sum_{i\in I}{\operatorname{\rm Est}(i)}|\leq k\frac{\alpha}{k}=\alpha. ∎

Recall that THRESHd={c0,,c2d}\operatorname*{\tt THRESH}_{d}=\{c_{0},\ldots,c_{2^{d}}\}, where cj(x)=1c_{j}(x)=1 if and only if x<jx<j. Let S=(x1,,xm)XdmS=(x_{1},\ldots,x_{m})\in X_{d}^{m} be a database. For every cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}, the query Qcj:XdQ_{c_{j}}:X_{d}^{*}\rightarrow is defined as

As THRESHd=2d+1|\operatorname*{\tt THRESH}_{d}|=2^{d}+1, one can use the generic construction of Blum et al. , and get an ϵ\epsilon-private sanitizer for this class with sample complexity O(d)O(d). By , this is the best possible when guaranteeing pure privacy (ignoring the dependency on α,β\alpha,\beta and ϵ\epsilon). We next present a recursive sanitizer for THRESHd\operatorname*{\tt THRESH}_{d}, guaranteeing approximated privacy and exhibiting sample complexity O~α,β,ϵ,δ(8log(d))\widetilde{O}_{\alpha,\beta,\epsilon,\delta}(8^{\log^{*}(d)}).

The algorithm maintains its sanitized database S^\hat{S} as a global variable, which is initialized as the empty set. In addition, for the privacy analysis, we would need a bound on the number of recursive calls. It will be convenient to maintain another global variable, callscalls, initialized at the desired bound and decreased in every recursive call.

Every iteration of algorithm SanThresholdsSanThresholds can access its input database at most twice using the laplacian mechanism (on steps 2,11), at most once using the Choosing Mechanism (on step 9 or on step 10b), and at most once using algorithm RecConcaveRecConcave (on step 8). By the properties of the laplacian mechanism, every interaction with it preserves (ϵ,0)(\epsilon,0)-differential privacy. Note that the quality function with which we call the Choosing Mechanism is at most 2-growth-bounded. Therefore, as m1024αϵln(2048αβϵδ)m\geq\frac{1024}{\alpha\epsilon}\ln(\frac{2048}{\alpha\beta\epsilon\delta}), every such interaction with the Choosing Mechanism preserves (ϵ,δ)(\epsilon,\delta)-differential privacy. Last, for our choice of ϵ^,δ^\hat{\epsilon},\hat{\delta}, every interaction with algorithm RecConcaveRecConcave preserves (ϵ,δ)(\epsilon,\delta)-differential privacy.

We start the utility analysis of SanThresholdsSanThresholds with the following simple claim.

The function Q(S,)Q(S,\cdot), defined on step 6, is quasi-concave.

First note that the function I(S,)I(S,\cdot) defined on step 5 is non-decreasing. Now, let uvwu\leq v\leq w be s.t. Q(S,u),Q(S,w)xQ(S,u),Q(S,w)\geq x. That is,

Using the fact that I(S,)I(S,\cdot) is non-decreasing, we have that I(S,u)I(S,v)I(S,u)\leq I(S,v) and that I(S,v1)I(S,w1)I(S,v-1)\leq I(S,w-1). Therefore

Note that every iteration of algorithm SanThresholdsSanThresholds draws at most 2 random samples (on steps 2 and 11) from Lap(1ϵ)\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon}). We now proceed with the utility analysis by identifying 3 good events that occur with high probability (over the coin tosses of the algorithm).

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. Let SanThresholdsSanThresholds be executed with calls\operatorname{\rm calls} initialized to c77αc\geq\frac{77}{\alpha}, and on a database SS of m\geq 8^{\log^{*}(d)}\cdot\frac{60c}{\alpha\epsilon}\log^{*}(d)\log\big{(}\frac{12\log^{*}(d)}{\beta\epsilon\delta}\big{)} elements. With probability at least (13cβ)(1-3c\beta) the following 3 events happen:

In every random draw of Lap(1ϵ)\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon}) throughout the execution of SanThresholdsSanThresholds it holds that Lap(1ϵ)αm16c|\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon})|\leq\frac{\alpha m}{16c}.

Every interaction with algorithm RecConcaveRecConcave on step 8 succeeds in returning a value zz s.t. Q(S,z)3αm128Q(S,z)\geq\frac{3\alpha m}{128}.

Every iteration that halts after step 13, defines an interval [a,b][a,b] s.t. #S[a,b]5αm128\#_{S}[a,b]\geq\frac{5\alpha m}{128}.

First note that it suffices to lower bound the terms Pr[B1],  Pr[B2B1]\Pr[B_{1}],\;\Pr[B_{2}|B_{1}], and Pr[B3B1B2]\Pr[B_{3}|B_{1}\wedge B_{2}], as by the chain rule of conditional probability it holds that

We now bound each of those terms, starting with Pr[B1]\Pr[B_{1}]. In every single draw, the probability of Lap(1ϵ)>αm16c|\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon})|>\frac{\alpha m}{16c} is at most exp(αϵm16c)\exp(\frac{-\alpha\epsilon m}{16c}), which is at most β2\frac{\beta}{2} for m16cαϵln(2β)m\geq\frac{16c}{\alpha\epsilon}\ln(\frac{2}{\beta}). As cc (the initial value of calls\operatorname{\rm calls}) limits the number of iteration, we get that Pr[B1](1cβ)\Pr[B_{1}]\geq(1-c\beta).

So, given that event (B1B2)(B_{1}\wedge B_{2}) has occurred, in every iteration that halts after step 13, the probability of defining [a,b][a,b] s.t. #S[a,b]<5αm128\#_{S}[a,b]<\frac{5\alpha m}{128} is at most β\beta. As there are at most cc iterations, we see that Pr[B3B1B2](1cβ)\Pr[B_{3}|B_{1}\wedge B_{2}]\geq(1-c\beta).

All in all, for cβ3c\beta\leq 3 we get that

Every iteration of algorithm SanThresholdsSanThresholds that does not halt on step 1 defines an interval [a,b][a,b] (on exactly one of the steps 3,9,10b). This interval [a,b][a,b] is not part of any range that is given as input to any future recursive call. Moreover, if none of the recursive calls throughout the execution of SanThresholdsSanThresholds halts on step 1, these [a,b][a,b] intervals form a partition of the initial range. We now proceed with the utility analysis by identifying yet another 3 good events (at a somewhat higher level) that occur whenever (B1B2B3)(B_{1}\wedge B_{2}\wedge B_{3}) occur.

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. Let SanThresholdsSanThresholds be executed with calls\operatorname{\rm calls} initialized to c77αc\geq\frac{77}{\alpha}, and on a database SS of m\geq 8^{\log^{*}(d)}\cdot\frac{60c}{\alpha\epsilon}\log^{*}(d)\log\big{(}\frac{12\log^{*}(d)}{\beta\epsilon\delta}\big{)} elements. With probability at least (13cβ)(1-3c\beta) the following 3 events happen:

There are at most 77α\frac{77}{\alpha} recursive calls, none of them halts on the first step.

Every iteration defines [a,b][a,b] s.t. #S[a,b1]αm2\#_{S}[a,b-1]\leq\frac{\alpha m}{2}. That is, every iteration defines [a,b][a,b] s.t. the interval [a,b1][a,b-1] contains at most αm2\frac{\alpha m}{2} points in SS.

In every iteration #S[a,b]#^[a,b]αm4α77\left|\#_{S}[a,b]-\hat{\#}[a,b]\right|\leq\frac{\alpha m}{4}\frac{\alpha}{77}.

Consider again events B1,B2,B3B_{1},B_{2},B_{3} defined in Claim 4.10. We will show that the event (E1E2E3)(E_{1}\wedge E_{2}\wedge E_{3}) is implied by (B1B2B3)(B_{1}\wedge B_{2}\wedge B_{3}) (which happens with probability at least (13cβ)(1-3c\beta) by Claim 4.10). We, therefore, continue the proof assuming that (B1B2B3)(B_{1}\wedge B_{2}\wedge B_{3}) has occurred.

We begin by showing that event E1E_{1} occurs. Denote the number of iterations that halts on steps 1-3 as y1y_{1}, and the number of complete iterations (i.e., that halts after step 13) as y2y_{2}. Clearly, y12y2y_{1}\leq 2y_{2}. Now, as event B3B_{3} has occurred, we have that every iteration that halts after step 13 defines an interval [a,b][a,b] s.t. #S[a,b]5αm128\#_{S}[a,b]\geq\frac{5\alpha m}{128}. This interval does not intersect any range given as input to future calls, and, therefore, y21285αy_{2}\leq\frac{128}{5\alpha}. The total number of iterations is, therefore, bounded by 3y23845α<77α3y_{2}\leq\frac{384}{5\alpha}<\frac{77}{\alpha}. Thus, whenever calls\operatorname{\rm calls} is initialized to at least 77/α77/\alpha, there are at most 77α\frac{77}{\alpha} iterations, none of them halts on step 1. That is, E1E_{1} occurs.

We next show that E3E_{3} occurs. As we have seen, event B3B_{3} ensures that no iteration halts on step 1. Therefore every iteration defines #^[a,b]\hat{\#}[a,b] by adding a random draw of Lap(1ϵ)\mathop{\rm{Lap}}\nolimits(\frac{1}{\epsilon}) to #S[a,b]\#_{S}[a,b]. As event B1B_{1} has occurred, it holds that #S[a,b]#^[a,b]αm16cαm4α77\left|\#_{S}[a,b]-\hat{\#}[a,b]\right|\leq\frac{\alpha m}{16c}\leq\frac{\alpha m}{4}\frac{\alpha}{77}. So, E3E_{3} occurs.

Consider an iteration of algorithm SanThresholdsSanThresholds that defines [a,b][a,b] on step 9. In that iteration, [a,b][a,b] is defined as [a,a][a,a]. Trivially, the empty interval [a,b1]=[a,a1][a,b-1]=[a,a-1] contains at most αm2\frac{\alpha m}{2} points.

Consider an iteration of algorithm SanThresholdsSanThresholds that defines [a,b][a,b] on step 10b (of length at most 22z2\cdot 2^{z}). As event B2B_{2} has occurred, zz is s.t. Q(S,z)3αm128Q(S,z)\geq\frac{3\alpha m}{128}. In particular L(S,z1)9αm128L(S,z-1)\leq\frac{9\alpha m}{128}, and every interval of length 122z\frac{1}{2}2^{z} contains at most 9αm128\frac{9\alpha m}{128} points in SS. Therefore #S[a,b1]49αm128αm2\#_{S}[a,b-1]\leq 4\frac{9\alpha m}{128}\leq\frac{\alpha m}{2}. Note that we needed zz to be at least 1 (ensured by the If condition on step 10), as otherwise the constraint on intervals of length 122z\frac{1}{2}2^{z} has no meaning.

At any case, we have that E2E_{2} must occur.

We will now complete the utility analysis by showing that the input database SS and the sanitized database S^\hat{S} (at the end of SanThresholdsSanThresholds’ execution) are α\alpha-close whenever (E1E2E3)(E_{1}\wedge E_{2}\wedge E_{3}) occurs.

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. Let SanThresholdsSanThresholds be executed on the range XdX_{d}, a global variable calls\operatorname{\rm calls} initialize to c77αc\geq\frac{77}{\alpha}, and on a database SS of m\geq 8^{\log^{*}(d)}\cdot\frac{60c}{\alpha\epsilon}\log^{*}(d)\log\big{(}\frac{12\log^{*}(d)}{\beta\epsilon\delta}\big{)} elements. With probability at least (13cβ)(1-3c\beta), the sanitized database S^\hat{S} at the end of the execution is s.t. Qcj(S)Qcj(S^)α|Q_{c_{j}}(S)-Q_{c_{j}}(\hat{S})|\leq\alpha for every cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}.

Denote S=(x1,,xm)S=(x_{1},\ldots,x_{m}), and S^=(x1^,,xn^)\hat{S}=(\hat{x_{1}},\ldots,\hat{x_{n}}). Note that S=m|S|=m and that S^=n|\hat{S}|=n. By Claim 4.11, the event E1E2E3E_{1}\cap E_{2}\cap E_{3} occurs with probability at least (13cβ)(1-3c\beta). We will show that in such a case, the sanitized database S^\hat{S} is s.t. Qcj(S)Qcj(S^)α|Q_{c_{j}}(S)-Q_{c_{j}}(\hat{S})|\leq\alpha for every cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}.

As event E1E_{1} has occurred, the intervals [a,b][a,b] defined throughout the execution of SanThresholdsSanThresholds defines a partition of the domain XdX_{d}. Denote those intervals as [a1,b1],  [a2,b2],  ,  [aw,bw][a_{1},b_{1}],\;[a_{2},b_{2}],\;\ldots,\;[a_{w},b_{w}], where a1=0,  bw=2d1a_{1}=0,\;b_{w}=2^{d}-1, and ai+1=bi+1a_{i+1}=b_{i}+1. Now fix some cjTHRESHdc_{j}\in\operatorname*{\tt THRESH}_{d}, and let tt be s.t. j[at,bt]j\in[a_{t},b_{t}]. We have that

As event E1E_{1} has occurred, t77αt\leq\frac{77}{\alpha}, and

Similar arguments show that Qcj(S)3α4+1m#S^[0,j1]Q_{c_{j}}(S)\geq-\frac{3\alpha}{4}+\frac{1}{m}\#_{\hat{S}}[0,j-1], and so Qcj(S)1m#S^[0,j1]3α4\left|Q_{c_{j}}(S)-\frac{1}{m}\#_{\hat{S}}[0,j-1]\right|\leq\frac{3\alpha}{4}.

Recall that the sanitized database S^\hat{S} is of size nn, and that Qcj(S^)=1n#S^[0,j1]Q_{c_{j}}(\hat{S})=\frac{1}{n}\#_{\hat{S}}[0,j-1]. As event (E1E3)(E_{1}\cap E_{3}) has occurred, we have that nm+αm4=(1+α4)mn\leq m+\frac{\alpha m}{4}=(1+\frac{\alpha}{4})m. Therefore,

Similar arguments show that 1m#S^[0,j1]1n#S^[0,j1]α4\left|\frac{1}{m}\#_{\hat{S}}[0,j-1]-\frac{1}{n}\#_{\hat{S}}[0,j-1]\right|\leq\frac{\alpha}{4}. By the triangle inequality we have therefore that Qcj(S)Qcj(S^)3α4+α4=α\left|Q_{c_{j}}(S)-Q_{c_{j}}(\hat{S})\right|\leq\frac{3\alpha}{4}+\frac{\alpha}{4}=\alpha. ∎

The following theorem is an immediate consequence of Lemma 4.12 and Lemma 4.8.

Fix α,β,ϵ,δ\alpha,\beta,\epsilon,\delta. There exists an efficient (α,β,ϵ,δ,m)(\alpha,\beta,\epsilon,\delta,m)-sanitizer for THRESHd\operatorname*{\tt THRESH}_{d}, where

4 Sanitization with Pure Privacy

Here we give a general lower bound on the database size of pure private sanitizers. Beimel et al. showed that every pure ϵ\epsilon-private sanitizer for POINTd\operatorname{\tt POINT}_{d} must operate on databases of Ω(d)\Omega(d) elements. With slight modifications, their proof technique can yield a much more general result.

Given a concept class CC over a domain XX, we denote the effective size of XX w.r.t. CC as

That is, XCX_{C} is the cardinality of the biggest subset X~X\widetilde{X}\subseteq X s.t. every two different elements of X~\widetilde{X} are labeled differently by at least one concept in CC.

Let CC be a concept class over a domain XX. For every (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for CC (proper or improper) it holds that m=Ω(1ϵα(logXC+log(1/β)))m=\Omega\left(\frac{1}{\epsilon\alpha}(\log X_{C}+\log(1/\beta))\right).

Let X~X\widetilde{X}\subseteq X be s.t. X~=XC|\widetilde{X}|=X_{C} and every two different elements of X~\widetilde{X} are labeled differently by at least one concept in CC. Fix some x1X~x_{1}\in\widetilde{X}, and for every xiX~x_{i}\in\widetilde{X}, construct a database SiX~mS_{i}\in\widetilde{X}^{m} by setting (13α)m(1-3\alpha)m entries as x1x_{1} and the remaining 3αm3\alpha m entries as xix_{i} (for i=1i=1 all entries of S1S_{1} are x1x_{1}). Note that for all iji\neq j, databases SiS_{i} and SjS_{j} differ on 3αm3\alpha m entries.

Let A\mathcal{A} be an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for CC. Without loss of generality, we can assume that A\mathcal{A} is a proper sanitizer (otherwise, we could transform it into a proper one by replacing α\alpha with 2α2\alpha). See Remark 2.18.

Solving for mm, we get that m=Ω(1ϵα(logXC+log(1/β)))m=\Omega(\frac{1}{\epsilon\alpha}(\log X_{C}+\log(1/\beta))). ∎

Lemma 4.15, together with a lower bound from , yields the following result:

Let CC be a concept class over a domain XX. If A\mathcal{A} is an (18,18,12,m)(\frac{1}{8},\frac{1}{8},\frac{1}{2},m)-sanitizer for CC, then m=Ω(log(XC)+VC(C))m=\Omega(\log(X_{C})+\operatorname{\rm VC}(C)).

Immediate from Lemma 4.15 and Theorem 2.20. ∎

The above lower bound is the best possible general lower bound in terms of XCX_{C} and VC(C)\operatorname{\rm VC}(C) (up to a factor of logVC(C)\log\operatorname{\rm VC}(C)). To see this, let n<dn<d, and consider a concept class over XdX_{d} containing the following two kinds of concepts. The first kind are 2n2^{n} concepts shattering the left nn points of XdX_{d} (and zero everywhere else). The second kind are (2dn)(2^{d}-n) “point concepts” over the right (2dn)(2^{d}-n) points of XdX_{d} (and zero on the first nn). Formally, for every j=(j0,j1,,jn1){0,1}nj=(j_{0},j_{1},\ldots,j_{n-1})\in\{0,1\}^{n}, let cj:Xd{0,1}c_{j}:X_{d}\rightarrow\{0,1\} be defines as cj(x)=jxc_{j}(x)=j_{x} if x<nx<n and cj(x)=0c_{j}(x)=0 otherwise. Define the concept class CL={cj}jXnC_{L}=\{c_{j}\}_{j\in X_{n}}. For every nj<2dn\leq j<2^{d}, define fj:Xd{0,1}f_{j}:X_{d}\rightarrow\{0,1\} as fj(x)=1f_{j}(x)=1 if x=jx=j and fj(x)=0f_{j}(x)=0 otherwise. Define the concept class CR={fj}nj<2dC_{R}=\{f_{j}\}_{n\leq j<2^{d}}. Now define C=CLCRC=C_{L}\bigcup C_{R}.

We can now construct a sanitizer for CC by applying the generic construction of separately for CLC_{L} and for CRC_{R}. Given a database SS, this will result in two sanitized databases S^L,S^R\widehat{S}_{L},\widehat{S}_{R}, with which we can answer all queries in the class CC – a query for cCLc\in C_{L} is answered using S^L\widehat{S}_{L}, and a query for fCRf\in C_{R} is answered using S^R\widehat{S}_{R}. The described (improper) sanitizer for CC is of sample complexity Oα,β,ϵ(log(XC)+VC(C)logVC(C))O_{\alpha,\beta,\epsilon}(\log(X_{C})+\operatorname{\rm VC}(C)\log\operatorname{\rm VC}(C)).

Sanitization and Proper Private PAC

Similar techniques are used for both data sanitization and private learning, suggesting relationships between the two tasks. We now explore one such relationship in proving a lower bound on the sample complexity needed for sanitization (under pure differential privacy). In particular, we show a reduction from the task of private learning to the task of data sanitization, and then use a lower bound on private learners to derive a lower bound on data sanitization. A similar reduction was given by Gupta et al. , where it is stated in terms of statistical queries. They showed that the existence of a sanitizer that accesses the database using at most kk statistical queries, implies the existence of a learner that makes at most 2k2k statistical queries. We complement their proof and add the necessary details in order to show that the existence of an arbitrary sanitizer (that is not restricted to access its data via statistical queries) implies the existence of a private learner.

We will refer to an element of Xd+1X_{d+1} as xy\vec{x}\circ y, where xXd\vec{x}\in X_{d}, and y{0,1}y\in\{0,1\}.

1 Sanitization Implies Proper PPAC

We show that sanitization of a class CC implies private learning of CC. Consider an input labeled sample S=(xi,yi)i=1m(X×{0,1})mS=(x_{i},y_{i})_{i=1}^{m}\in(X\times\{0,1\})^{m}, labeled by some concept cCc\in C. The key observation is that in order to privately output a good hypothesis it is suffices to first produce a sanitization S^\hat{S} of SS (w.r.t. a slightly different concept class ClabelC^{\rm label}, to be defined) and then to output a hypothesis hCh\in C that minimizes the empirical error over the sanitized database S^\hat{S}. To complete the proof we then show that sanitization for CC implies sanitization for ClabelC^{\rm label}.

In order for the chosen hypothesis hh to have small generalization error (rather then just small empirical error), our input database SS must contain at least VC(C)α2log(1αβ)\frac{\operatorname{\rm VC}(C)}{\alpha^{2}}\log(\frac{1}{\alpha\beta}) elements. We therefore start with the following simple (technical) lemma, handling a case where our initial sanitizer operates only on smaller databases.

If there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for a class CC, then for every qNq\in\N s.t. q18βln(1/β)q\geq\frac{18}{\beta}\ln(1/\beta) there exists a ((2α+2β),β,ϵ,qm)((2\alpha+2\beta),\beta,\epsilon,qm)-sanitizer for CC.

Fix qNq\in\N and let AA be an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for a class CC over a domain XX. Note that by Theorem 2.21, there exists a (2α,32β,ϵ,m)(2\alpha,\frac{3}{2}\beta,\epsilon,m)-sanitizer AA^{\prime} s.t. the sanitized databases returned by AA^{\prime} are always of fixed sized n=O(VC(C)α2log(1αβ))n=O(\frac{\operatorname{\rm VC}(C)}{\alpha^{2}}\log(\frac{1}{\alpha\beta})). We now construct a ((2α+2β),β,ϵ,qm)((2\alpha+2\beta),\beta,\epsilon,qm)-sanitizer BB as follows.

As AA^{\prime} is ϵ\epsilon-differentially private, so is BB. Denote S^=(z^1,z^1,,z^qn)(X)qn\hat{S}=(\hat{z}_{1},\hat{z}_{1},\ldots,\hat{z}_{qn})\in(X)^{qn}. Recall that q18βln(1/β)q\geq\frac{18}{\beta}\ln(1/\beta), and, hence, using the Chernoff bound, with probability at least (1β)(1-\beta) it holds that at least (12β)q(1-2\beta)q of the Si^\hat{S_{i}}’s are 2α2\alpha-good for their matching SiS_{i}’s. In such a case S^\hat{S} is (2α+2β)(2\alpha+2\beta)-good for SS: for every fCf\in C it holds that

As at least (12β)q(1-2\beta)q of the Si^\hat{S_{i}}’s are 2α2\alpha-good for their matching SiS_{i}’s, and as trivially Qf(Si)1Q_{f}(S_{i})\leq 1 for each database Si^\hat{S_{i}} that is not 2α2\alpha-good,

Similar arguments show that Qf(S)Qf(S^)(2α+2β)Q_{f}(S)\geq Q_{f}(\hat{S})-(2\alpha+2\beta). Algorithm BB is, therefore, a ((2α+2β),β,ϵ,qm)((2\alpha+2\beta),\beta,\epsilon,qm)-sanitizer for CC, as required. ∎

As mentioned above, our first step in showing that sanitization for a class CC implies private learning for CC is to show that privately learning CC is implied by sanitization for the slightly modified class ClabelC^{\rm label}, defined as follows. For a given predicate cc over XdX_{d}, we define the predicate clabelc^{\rm label} over Xd+1X_{d+1} as

Note that clabel(xσ)=σc(x)c^{\rm label}(\vec{x}\circ\sigma)=\sigma\oplus c(\vec{x}) for σ{0,1}\sigma\in\{0,1\}. For a given class of predicates CC over XdX_{d}, we define Clabel={clabel  :  cC}C^{\rm label}=\{c^{\rm label}\;:\;c\in C\}.

VC(C)VC(Clabel)2VC(C)\operatorname{\rm VC}(C)\leq\operatorname{\rm VC}(C^{\rm label})\leq 2\cdot\operatorname{\rm VC}(C).

For the first inequality notice that if a set SXdS\subseteq X_{d} is shuttered by CC then the set S0S\circ 0 is shuttered by ClabelC^{\rm label}. For the second inequality, assume SXd+1S\subseteq X_{d+1} is shattered by ClabelC^{\rm label}. Consider the partition of SS to S0S_{0} and S1S_{1}, where Sσ={xyS  :  y=σ}S_{\sigma}=\{\vec{x}\circ y\in S\;:\;y=\sigma\}. For at least one σ{0,1}\sigma\in\{0,1\}, it holds that SσS2|S_{\sigma}|\geq\frac{|S|}{2}. Hence, the set S^={x  :  xσSσ}\hat{S}=\{\vec{x}\;:\;\vec{x}\cdot\sigma\in S_{\sigma}\} is shattered by CC and VC(Clabel)2S^2VC(C)\operatorname{\rm VC}(C^{\rm label})\leq 2\cdot|\hat{S}|\leq 2\cdot\operatorname{\rm VC}(C). ∎

The next lemma shows that for every concept class CC, a sanitizer for ClabelC^{\rm label} implies a private learner for CC. In the next lemma, this connection is made under the assumption that the given sanitizer operates on large enough databases. This assumption will be removed in the lemma that follows.

Let CC be a class of predicates over XdX_{d}. If there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer AA for ClabelC^{\rm label}, where m50VC(C)γ2ln(1γβ)m\geq\frac{50\operatorname{\rm VC}(C)}{\gamma^{2}}\ln(\frac{1}{\gamma\beta}) for some γ>0\gamma>0, then there exists a proper ((2α+γ),2β,ϵ,m)((2\alpha+\gamma),2\beta,\epsilon,m)-PPAC learner for CC.

Let AA be an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer, and consider the following algorithm LearnLearn:

As AA is ϵ\epsilon-differentially private, so is LearnLearn. For the utility analysis, fix some target concept ctCc_{t}\in C and a distribution D\mathcal{D} over XdX_{d}, and define the following two good events:

\forall h\in C,\;\;\big{|}{\rm error}_{S}(h)-{\rm error}_{\hat{S}}(h)\big{|}\leq\alpha.

hC,    errorD(h,ct)errorS(h)γ\forall h\in C,\;\;|{\rm error}_{\mathcal{D}}(h,c_{t})-{\rm error}_{S}(h)|\leq\gamma.

We first show that if these 2 good events happen, algorithm LearnLearn returns a (2α+γ)(2\alpha+\gamma)-good hypothesis. As the target concept satisfies errorS(ct)=0{\rm error}_{S}(c_{t})=0, event E1E_{1} ensures the existence of a concept fCf\in C s.t. errorS^(f)α{\rm error}_{\hat{S}}(f)\leq\alpha. Thus, algorithm LearnLearn chooses a hypothesis hCh\in C s.t. errorS^(h)α{\rm error}_{\hat{S}}(h)\leq\alpha. Using event E1E_{1} again, this hh obeys errorS(h)2α{\rm error}_{S}(h)\leq 2\alpha. Therefore, event E2E_{2} ensures that hh satisfies errorD(h,ct)2α+γ{\rm error}_{\mathcal{D}}(h,c_{t})\leq 2\alpha+\gamma.

We will now show that these 2 events happen with high probability. By the definition of ClabelC^{\rm label}, for every clabelClabelc^{\rm label}\in C^{\rm label} we have that

Therefore, as AA is an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for ClabelC^{\rm label}, event E1E_{1} happens with probability at least (1β)(1-\beta). As m50VC(C)γ2ln(1γβ)m\geq\frac{50\operatorname{\rm VC}(C)}{\gamma^{2}}\ln(\frac{1}{\gamma\beta}), Theorem 2.14 ensures that event E2E_{2} happens with probability at least (1β)(1-\beta) as well. All in all, LearnLearn is a proper ((2α+γ),2β,ϵ,m)((2\alpha+\gamma),2\beta,\epsilon,m)-PPAC learner for CC. ∎

The above lemma describes a reduction from the task of privately learning a concept class CC to the sanitization task of the slightly different concept class ClabelC^{\rm label}. We next show that given a sanitizer for a class CC, it is possible to construct a sanitizer for ClabelC^{\rm label}. Along the way we will also slightly increase the sample complexity of the starting sanitizer, in order to be able to use Lemma 5.3. This results in a reduction from the task of privately learning a concept class CC to the sanitization task of the same concept class CC.

If there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for a class CC, then there exists a ((5α+4β),5β,6ϵ,t)((5\alpha+4\beta),5\beta,6\epsilon,t)-sanitizer for ClabelC^{\rm label}, where

Let AA^{\prime} be an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for a class CC. By replacing α\alpha with 2α2\alpha, and β\beta with 2β2\beta, we can assume that the sanitized databases returned by AA^{\prime} are always of fixed size n=O(VC(C)α2log(1αβ))n=O(\frac{\operatorname{\rm VC}(C)}{\alpha^{2}}\log(\frac{1}{\alpha\beta})) (see Theorem 2.21). Moreover, we can assume that AA^{\prime} treats its input database as a multiset (as otherwise we could alter AA to first randomly shuffle its input database). Denote M=m18βln(2αβ)(1+1mϵ)M=m\left\lceil\frac{18}{\beta}\ln(\frac{2}{\alpha\beta})\cdot\left(1+\frac{1}{m\epsilon}\right)\right\rceil. By Lemma 5.1 for every qMqM (where qNq\in\N) there exists a ((4α+4β),2β,ϵ,qM)((4\alpha+4\beta),2\beta,\epsilon,qM) sanitizer AA for CC (as qM=qmqM=q^{\prime}m for an integer qq^{\prime}). Denote t=6α2Mt=\left\lceil\frac{6}{\alpha^{2}}\right\rceil M, and consider algorithm BB presented in Figure 12

Note that the output on Step 8 is just a post-processing of the 4 outputs on Step 7. We first show that each of those 4 outputs preserves differential privacy, and, hence, BB is private (with slightly bigger privacy parameter, see Theorem 2.3).

By the properties of the laplacian mechanism, m0^\hat{m_{0}} and m1^\hat{m_{1}} each preserves ϵ\epsilon-differential privacy. The analysis for S0~\widetilde{S_{0}} and S1~\widetilde{S_{1}} is symmetric, and we next give the analysis for S0~\widetilde{S_{0}}. Denote by B0B_{0} an algorithm identical to the first 7 steps of BB, except that the only output of B0B_{0} on Step 7 is S0~\widetilde{S_{0}}. We now show that B0B_{0} is private.

Fix two neighboring databases D,DD,D^{\prime}, and let FF be a set of possible outputs. Note that as D,DD,D^{\prime} are neighboring, it holds that S0[D]S_{0}[D] and S0[D]S_{0}[D^{\prime}] are identical up to an addition or a change of one entry. Therefore, whenever m0[D]=m0[D]=Lm_{0}[D]=m_{0}[D^{\prime}]=L, we have that S0[D,L]S_{0}[D,L] and S0[D,L]S_{0}[D^{\prime},L] are neighboring databases. Moreover, by the properties of the laplacian mechanism, for every value LL it holds that Pr[m0[D]=L]eϵPr[m0[D]=L]\Pr[m_{0}[D]=L]\leq e^{\epsilon}\Pr[m_{0}[D^{\prime}]=L]. Hence,

Overall (since we use two ϵ\epsilon-private algorithms and two (2ϵ)(2\epsilon)-private algorithms), algorithm BB is (6ϵ)(6\epsilon)-differentially private. As for the utility analysis, fix a database D=(xi,yi)i=1tD=(x_{i},y_{i})_{i=1}^{t} and consider the execution of BB on DD. We now show that w.h.p. the sanitized database D~\widetilde{D} is (5α+4β)(5\alpha+4\beta)-close to DD.

Fix a concept clabelClabelc^{\rm label}\in C^{\rm label}. It holds that

By the properties of algorithm AA, with probability at least (14β)(1-4\beta) we have that S0~\widetilde{S_{0}} and S1~\widetilde{S_{1}} are (4α+4β)(4\alpha+4\beta)-close to S0^\hat{S_{0}} and to S1^\hat{S_{1}} (respectively). We proceed with the analysis assuming that this is the case. Hence,

Note that as clabel(xi0)=c(xi)c^{\rm label}(x_{i}\circ 0)=c(x_{i}) and as clabel(xi1)=1c(xi)c^{\rm label}(x_{i}\circ 1)=1-c(x_{i}), it holds that

Denoting D~=(zi)i=1r(Xd+1)r\widetilde{D}=(z_{i})_{i=1}^{r}\in(X_{d+1})^{r} (where r=n(m0^+m1^)r=n(\hat{m_{0}}+\hat{m_{1}})), we get

Similar arguments show that Qclabel(D)Qclabel(D~)(5α+4β)Q_{c^{\rm label}}(D)\geq Q_{c^{\rm label}}(\widetilde{D})-(5\alpha+4\beta). Algorithm BB is, therefore, a (5α+4β),5β,6ϵ,t)(5\alpha+4\beta),5\beta,6\epsilon,t)-sanitizer for ClabelC^{\rm label}, where

Let α,ϵ18\alpha,\epsilon\leq\frac{1}{8}, and let CC be a class of predicates. If there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer AA for CC, then there exists a proper ((15α+12β),10β,6ϵ,t)((15\alpha+12\beta),10\beta,6\epsilon,t)-PPAC learner for CC, where t=Oα,β,ϵ(m)t=O_{\alpha,\beta,\epsilon}(m).

Let AA be an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-sanitizer for CC. Note that by Theorem 2.20, it must be that mVC(C)2m\geq\frac{\operatorname{\rm VC}(C)}{2}. By Lemma 5.4, there exists a ((5α+4β),5β,6ϵ,t)((5\alpha+4\beta),5\beta,6\epsilon,t)-sanitizer for ClabelC^{\rm label}, where t=Oα,β,ϵ(m)t=O_{\alpha,\beta,\epsilon}(m) and t100mα2ln(1αβ)50VC(C)α2ln(1αβ)t\geq\frac{100m}{\alpha^{2}}\ln(\frac{1}{\alpha\beta})\geq\frac{50\operatorname{\rm VC}(C)}{\alpha^{2}}\ln(\frac{1}{\alpha\beta}). By Lemma 5.3, there exists a proper ((15α+12β),10β,6ϵ,t)((15\alpha+12\beta),10\beta,6\epsilon,t)-PPAC learner for CC. ∎

Given an efficient proper-sanitizer for CC and assuming the existence of an efficient non-private learner for CC, this reduction results in an efficient private learner for CC.

Next we prove a lower bound on the database size of every sanitizer for kPOINTd\operatorname{\tt k-POINT}_{d} that preserves pure differential privacy.

Consider the following concept class over XdX_{d}. For every AXdA\subseteq X_{d} s.t. A=k|A|=k, the concept class kPOINTd\operatorname{\tt k-POINT}_{d} contains the concept cA:Xd{0,1}c_{A}:X_{d}\rightarrow\{0,1\}, defined as cA(x)=1c_{A}(x)=1 if xAx\in A and cA(x)=0c_{A}(x)=0 otherwise. The VC dimension of kPOINTd\operatorname{\tt k-POINT}_{d} is kk (assuming 2d2k2^{d}\geq 2k).

To prove a lower bound on the sample complexity of sanitization, we first prove a lower bound on the sample complexity of the related learning problem and then use the reduction (Theorem 5.5). Thus, we start by showing that every private proper learner for kPOINTd\operatorname{\tt k-POINT}_{d} requires Ω(kdαϵ)\Omega(\frac{kd}{\alpha\epsilon}) labeled examples. A similar version of this lemma appeared in Beimel et al. , where it is shown that every private proper learner for POINTd\operatorname{\tt POINT}_{d} requires Ω(dαϵ)\Omega(\frac{d}{\alpha\epsilon}) labeled examples.

Let α<15\alpha<\frac{1}{5}, and let k,dk,d be s.t. 2dk1.12^{d}\geq k^{1.1}. If LL is a proper (α,12,ϵ,m)(\alpha,\frac{1}{2},\epsilon,m)-PPAC learner for kPOINTd\operatorname{\tt k-POINT}_{d}, then m=Ω(kdαϵ)m=\Omega(\frac{kd}{\alpha\epsilon}).

Let LL be a proper (α,12,ϵ,m)(\alpha,\frac{1}{2},\epsilon,m)-PPAC learner for kPOINTd\operatorname{\tt k-POINT}_{d}. Without loss of generality, we can assume that m5ln(4)3αm\geq\frac{5\ln(4)}{3\alpha} (since LL can ignore part of the sample).

Consider a maximal cardinality subset BkPOINTdB\subseteq\operatorname{\tt k-POINT}_{d} s.t. for every cABc_{A}\in B it holds that 0dA0^{d}\notin A, and moreover, for every cA1cA2Bc_{A_{1}}\neq c_{A_{2}}\in B it holds that A1A2k2|A_{1}\cap A_{2}|\leq\frac{k}{2}. We have that B(2d14e2k)k/2|B|\geq\left(\frac{2^{d}-1}{4e^{2}k}\right)^{k/2}. To see this, we could construct such a set using the following greedy algorithm. Initiate B^=\hat{B}=\emptyset, and C=kPOINTd{cIkPOINTd  :  0dI}C=\operatorname{\tt k-POINT}_{d}\setminus\{c_{I}\in\operatorname{\tt k-POINT}_{d}\;:\;0^{d}\in I\}. While CC\neq\emptyset, arbitrarily choose a concept cACc_{A}\in C, add cAc_{A} to B^\hat{B}, and remove from CC every concept cIc_{I} s.t. AIk2|A\cap I|\leq\frac{k}{2}.

Clearly, for every two cA1cA2B^c_{A_{1}}\neq c_{A_{2}}\in\hat{B} it holds that A1A2k2|A_{1}\cap A_{2}|\leq\frac{k}{2}. Moreover, at every step, the number of concepts that are removed from CC is at most

For every cABc_{A}\in B we will now define a distribution DA\mathcal{D}_{A}, a set of hypotheses G(A)G(A), and a database SAS_{A}. The distribution DA\mathcal{D}_{A} is defined as

Define the set G(A)kPOINTdG(A)\subseteq\operatorname{\tt k-POINT}_{d} as all α\alpha-good hypothesis for (cA,DA)(c_{A},\mathcal{D}_{A}) in kPOINTd\operatorname{\tt k-POINT}_{d}. Note that for every hIkPOINTdh_{I}\in\operatorname{\tt k-POINT}_{d} s.t. errorDA(hI,cA)α{\rm error}_{\mathcal{D}_{A}}(h_{I},c_{A})\leq\alpha it holds that IA4k5|I\cap A|\geq\frac{4k}{5}. Therefore, for every cA1cA2Bc_{A_{1}}\neq c_{A_{2}}\in B it holds that G(A1)G(A2)=G(A_{1})\cap G(A_{2})=\emptyset (as A1A2k2|A_{1}\cap A_{2}|\leq\frac{k}{2}, and as A1=A2=I=k|A_{1}|=|A_{2}|=|I|=k).

By the utility properties of LL, we have that PrL,DA[L(S)G(A)]12\Pr_{L,\mathcal{D}_{A}}[L(S)\in G(A)]\geq\frac{1}{2}. We say that a database SS of mm labeled examples is good if the unlabeled example 0d0^{d} appears in SS at least (18α)m(1-8\alpha)m times. Let SS be a database constructed by taking mm i.i.d. samples from DA\mathcal{D}_{A}, labeled by cAc_{A}. By the Chernoff bound, SS is good with probability at least 1exp(3αm/5)1-\exp(-3\alpha m/5). Hence,

Note that, as 0dA0^{d}\notin A, every appearance of the example 0d0^{d} in SS is labeled by . Therefore, there exists a good database SS of mm samples that contains the entry 0d00^{d}\circ 0 at least (18α)m(1-8\alpha)m times, and PrL[L(S)G(A)]14\Pr_{L}\left[L(S)\in G(A)\right]\geq\frac{1}{4}, where the probability is only over the randomness of LL. We define SAS_{A} as such a database.

Note that all of the databases SAiS_{A_{i}} defined here are of distance at most 8αm8\alpha m from one another. The privacy of LL ensures, therefore, that for any two such SAi,SAjS_{A_{i}},S_{A_{j}} it holds that PrL[L(SAi)G(Aj)]14exp(8αϵm)\Pr_{L}[L(S_{A_{i}})\in G(A_{j})]\geq\frac{1}{4}\exp(-8\alpha\epsilon m).

Solving for mm yields m=Ω(kαϵ(dln(k)))m=\Omega(\frac{k}{\alpha\epsilon}(d-\ln(k))). Recall that 2dk1.12^{d}\geq k^{1.1}, and, hence, m=Ω(kdαϵ)m=\Omega(\frac{kd}{\alpha\epsilon})

The constant 1.11.1 in the above lemma could be replaced with any constant strictly bigger than 11. Moreover, whenever 2d=O(k)2^{d}=O(k) we have that kPOINTd=(2dk)=2O(2d)|\operatorname{\tt k-POINT}_{d}|={2^{d}\choose k}=2^{O(2^{d})} and, hence, the generic construction of Kasiviswanathan et al. yields a proper ϵ\epsilon-private learner for this class with sample complexity Oα,β,ϵ(2d)=Oα,β,ϵ(k)O_{\alpha,\beta,\epsilon}(2^{d})=O_{\alpha,\beta,\epsilon}(k).

In the next lemma we will use the last lower bound on the sample complexity of private learners, together with the reduction of Theorem 5.5, and derive a lower bound on the database size necessary for pure private sanitizers for kPOINTd\operatorname{\tt k-POINT}_{d}.

Let ϵ18\epsilon\leq\frac{1}{8}, and let kk and dd be s.t. 2dk1.12^{d}\geq k^{1.1}. Every (1150,1150,ϵ,m)(\frac{1}{150},\frac{1}{150},\epsilon,m)-sanitizer for kPOINTd\operatorname{\tt k-POINT}_{d} requires databases of size

Let AA be a (1150,1150,ϵ,m)(\frac{1}{150},\frac{1}{150},\epsilon,m)-sanitizer for kPOINTd\operatorname{\tt k-POINT}_{d}. By Theorem 5.5, there exists a proper (950,115,6ϵ,t)(\frac{9}{50},\frac{1}{15},6\epsilon,t)-PPAC learner for kPOINTd\operatorname{\tt k-POINT}_{d}, where t=O(m)t=O\left(m\right). By Lemma 5.7, t=Ω(kdϵ)t=\Omega\left(\frac{kd}{\epsilon}\right), and hence m=Ω(kdϵ)m=\Omega\left(\frac{kd}{\epsilon}\right). ∎

Recall that in the proof of Theorem 5.5, we increased the sample complexity in order to use Lemma 5.3. This causes a slackness of α2\alpha^{2} in the database size of the resulting learner, which, in turn, eliminates the dependency in α\alpha in the above lower bound. For the class kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label} it is possible to obtain a better lower bound, by using the reduction of Lemma 5.3 twice.

Let α150\alpha\leq\frac{1}{50} and ϵ18\epsilon\leq\frac{1}{8}. There exist a d0=d0(α,ϵ)d_{0}=d_{0}(\alpha,\epsilon) s.t. for every kk and dd s.t. 2dmax{k1.1  ,  2d0}2^{d}\geq\max\{k^{1.1}\;,\;2^{d_{0}}\}, it holds that every (α,150,ϵ,m)(\alpha,\frac{1}{50},\epsilon,m)-sanitizer for kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label} must operate on databases of size

Let AA be a (150,150,ϵ,m)(\frac{1}{50},\frac{1}{50},\epsilon,m)-sanitizer for a class kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label}, where ϵ18\epsilon\leq\frac{1}{8}. Note that by Theorem 2.20, it must be that mVC(kPOINTdlabel)2VC(kPOINTd)2m\geq\frac{\operatorname{\rm VC}(\operatorname{\tt k-POINT}_{d}^{\rm label})}{2}\geq\frac{\operatorname{\rm VC}(\operatorname{\tt k-POINT}_{d})}{2}. In order to use Lemma 5.3, we need a slightly stronger guarantee, and therefore use Lemma 5.1 to increase the input database size as follows.

Denote q=100503ln(502)q=\left\lceil 100\cdot 50^{3}\ln(50^{2})\right\rceil. By Lemma 5.1, there exists a (225,150,ϵ,t)(\frac{2}{25},\frac{1}{50},\epsilon,t)-sanitizer BB for kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label}, where

By Lemma 5.3, there exists a proper (950,125,ϵ,t)(\frac{9}{50},\frac{1}{25},\epsilon,t)-PPAC learner for kPOINTd\operatorname{\tt k-POINT}_{d}. By Lemma 5.7, t=Ω(kdϵ)t=\Omega\left(\frac{kd}{\epsilon}\right), and hence

Let α150\alpha\leq\frac{1}{50} and ϵ18\epsilon\leq\frac{1}{8}, and let BB be an (α,150,ϵ,m)(\alpha,\frac{1}{50},\epsilon,m)-sanitizer for kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label}. As BB is, in particular, a (150,150,ϵ,m)(\frac{1}{50},\frac{1}{50},\epsilon,m)-sanitizer for kPOINTdlabel\operatorname{\tt k-POINT}_{d}^{\rm label}, where ϵ18\epsilon\leq\frac{1}{8}, Equation 15 states that there exists a constant λ\lambda s.t. mλkdϵm\geq\lambda\frac{kd}{\epsilon}. Asserting that dd050ϵλα2ln(50α)d\geq d_{0}\triangleq\frac{50\epsilon}{\lambda\alpha^{2}}\ln(\frac{50}{\alpha}), we ensure that m50kα2ln(50α)m\geq\frac{50k}{\alpha^{2}}\ln(\frac{50}{\alpha}). By reusing Lemme 5.3, we now get that there exists a proper (3α,125,ϵ,m)(3\alpha,\frac{1}{25},\epsilon,m)-PPAC learner for kPOINTd\operatorname{\tt k-POINT}_{d}. Lemma 5.7 now states that

Label-Private Learners

In this section we consider relaxed definitions of private learners preserving pure privacy (i.e., δ=0\delta=0). We start with the model of label privacy (see and references therein). In this model, privacy must only be preserved for the labels of the elements in the database, and not necessarily for their identity. This is a reasonable privacy requirement when the identity of individuals in a population are known publicly but not their labels. In general, this is not a reasonable assumption.

We consider a database S=(xi,yi)i=1mS=(x_{i},y_{i})_{i=1}^{m} containing labeled points from some domain XX, and denote Sx=(xi)i=1mXmS_{x}=(x_{i})_{i=1}^{m}\in X^{m}, and Sy=(yi)i=1m{0,1}mS_{y}=(y_{i})_{i=1}^{m}\in\{0,1\}^{m}.

Let AA be an algorithm that gets as input a database SxXmS_{x}\in X^{m} and its labels Sy{0,1}mS_{y}\in\{0,1\}^{m}. Algorithm AA is an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-Label Private PAC Learner for a concept class CC over XX if

Privacy. SxXm\forall S_{x}\in X^{m}, algorithm A(Sx,)=ASx()A(S_{x},\cdot)=A_{S_{x}}(\cdot) is ϵ\epsilon-differentially private (as in Definition 2.2);

Utility. Algorithm AA is an (α,β,m)(\alpha,\beta,m)-PAC learner for CC (as in Definition 2.7).

Chaudhuri et al. proved lower bounds on the sample complexity of label-private learners for a class CC in terms of its doubling dimension. As we will now see, the correct measure for characterizing the sample complexity of such learners is the VC dimension, and the sample complexity of label-private learners is actually of the same order as that of non-private learners (assuming α,β\alpha,\beta, and ϵ\epsilon are constants).

Let CC be a concept class over a domain XX. For every α,β,ϵ\alpha,\beta,\epsilon, there exists an (α,β,ϵ,m)(\alpha,\beta,\epsilon,m)-Label Private PAC learner for CC, where m=Oα,β,ϵ(VC(C))m=O_{\alpha,\beta,\epsilon}(\operatorname{\rm VC}(C)). The learner might not be efficient.

In Figure 13 we describe a label-private algorithm AA. Algorithm AA constructs a set of hypotheses HH as follows: It samples an unlabeled sample S1S_{1}, and defines BB as the set of points in S1S_{1}. For every labeling of the points in BB realized by CC, add to HH an arbitrary concept consistent with this labeling. Afterwards, algorithm AA uses the exponential mechanism to choose a hypothesis out of HH.

Note that steps 1-4 of algorithm AA are independent of the labeling vector SyS_{y}. By the properties of the exponential mechanism (which is used to access SyS_{y} on Step 5), for every set of elements SxS_{x}, algorithm A(Sx,)A(S_{x},\cdot) is ϵ\epsilon-differentially private.

For the utility analysis, fix a target concept cCc\in C and a distribution D\mathcal{D} over XX, and define the following 3 good events:

The constructed set HH contains at least one hypothesis ff s.t. errorS2(f)α4{\rm error}_{S^{2}}(f)\leq\frac{\alpha}{4}.

For every hHh\in H s.t. errorS2(h)α2{\rm error}_{S^{2}}(h)\leq\frac{\alpha}{2}, it holds that errorD(c,h)α{\rm error}_{\mathcal{D}}(c,h)\leq\alpha.

The exponential mechanism chooses an hh such that errorS2(h)α4+minfH{errorS2(f)}{\rm error}_{S^{2}}(h)\leq\frac{\alpha}{4}+\min_{f\in H}\left\{{\rm error}_{S^{2}}(f)\right\}.

We first show that if these 3 good events happen, then algorithm AA returns an α\alpha-good hypothesis. Event E1E_{1} ensures the existence of a hypothesis fHf\in H s.t. errorS2(f)α4{\rm error}_{S^{2}}(f)\leq\frac{\alpha}{4}. Thus, event E1E3E_{1}\cap E_{3} ensures algorithm AA chooses (using the exponential mechanism) a hypothesis hHh\in H s.t. errorS2(h)α2{\rm error}_{S^{2}}(h)\leq\frac{\alpha}{2}. Event E2E_{2} ensures, therefore, that this hh obeys errorD(c,h)α{\rm error}_{\mathcal{D}}(c,h)\leq\alpha.

Fix a hypothesis hh s.t. errorD(c,h)>α{\rm error}_{\mathcal{D}}(c,h)>\alpha. Using the Chernoff bound, the probability that errorS2(h)α2{\rm error}_{S^{2}}(h)\leq\frac{\alpha}{2} is less than exp((mn)α/8)\exp(-(m-n)\alpha/8). As H=2B2n|H|=2^{|B|}\leq 2^{n}, the probability that there is such a hypothesis in HH is at most 2nexp((mn)α/8)2^{n}\cdot\exp(-(m-n)\alpha/8). For m8α(n+ln(4β))m\geq\frac{8}{\alpha}(n+\ln(\frac{4}{\beta})), this probability is at most β4\frac{\beta}{4}, and event E2E_{2} happens with probability at least (1β4)(1-\frac{\beta}{4}).

The exponential mechanism ensures that the probability of event E3E_{3} is at least 1Hexp(ϵαm/8)1-|H|\cdot\exp(-\epsilon\alpha m/8) (see Proposition 2.25), which is at least (1β4)(1-\frac{\beta}{4}) for m8αϵ(n+ln(4β))m\geq\frac{8}{\alpha\epsilon}(n+\ln(\frac{4}{\beta})).

All in all, by setting n=32α(VC(C)ln(64α)+ln(8β))n=\frac{32}{\alpha}(\operatorname{\rm VC}(C)\ln(\frac{64}{\alpha})+\ln(\frac{8}{\beta})) and m768α2ϵ(VC(C)ln(64α)+2ln(8β))m\geq\frac{768}{\alpha^{2}\epsilon}(\operatorname{\rm VC}(C)\ln(\frac{64}{\alpha})+2\ln(\frac{8}{\beta})), we ensure that the probability of AA failing to output an α\alpha-good hypothesis is at most β\beta. ∎

2 Label Privacy Extension

We consider a slight generalization of the label privacy model. Recall that given a labeled sample, a private learner is required to preserve the privacy of the entire sample, while a label-private learner is only required to preserve privacy for the labels of each entry.

Consider a scenario where there is no need in preserving the privacy of the distribution D\mathcal{D} (for example, D\mathcal{D} might be publicly known), but we still want to preserve the privacy of the entire sample SS. We can model this scenario as a learning algorithm AA which is given as input 2 databases – a labeled database SS, and an unlabeled database DD. For every database DD, algorithm A(D,)=AD()A(D,\cdot)=A_{D}(\cdot) must preserve differential privacy. We will refer to such a learner as a Semi-Private learner.

Clearly, Ω(VC(C))\Omega(\operatorname{\rm VC}(C)) samples are necessary in order to semi-privately learn a concept class CC, as this is the case for non-private learners.The lower bound of Ω(VC(C))\Omega(\operatorname{\rm VC}(C)) is worst case over choices of distributions D\mathcal{D}. For a specific distribution, less samples may suffice. This lower bound is tight, as the above generic learner could easily be adjusted for the semi-privacy model, and result in a generic semi-private learner with sample complexity Oα,β,ϵ(VC(C))O_{\alpha,\beta,\epsilon}(\operatorname{\rm VC}(C)). To see this, recall that in the above algorithm, the input sample SS is divided into S1S_{1} and S2S_{2}. Note that the labels in S1S_{1} are ignored, and, hence, S1S_{1} could be replaced with an unlabeled database. Moreover, note that S2S_{2} is only accessed using the exponential mechanism (on Step 5), which preserves the privacy both for the labels and for the examples in S2S_{2}.

Consider the task of learning a concept class CC, and suppose that the relevant distribution over the population is publicly known. Now, given a labeled database SS, we can use a semi-private learner and guarantee privacy both for the labellings and for the mere existence of an individual in the database. That is, in such a case, the privacy guarantee of a semi-private learner is the same as that of a private learner. Moreover, the necessary sample complexity is Oα,β,ϵ(VC(C))O_{\alpha,\beta,\epsilon}(\operatorname{\rm VC}(C)), which should be contrasted with Oα,β,ϵ(logC)O_{\alpha,\beta,\epsilon}(\log|C|) which is the sample complexity that would result from the general construction of Kasiviswanathan et al. .

We thank Salil Vadhan and Jon Ullman for helpful discussions of ideas in this work.

References