Answering n^{2+o(1)} Counting Queries with Differential Privacy is Hard

Jonathan Ullman

Introduction

Consider a database D({0,1}d)nD\in(\{0,1\}^{d})^{n}, in which each of the nn rows corresponds to an individual’s record, and each record consists of dd binary attributes. The goal of privacy-preserving data analysis is to enable rich statistical analyses on the database while protecting the privacy of the individuals. It is especially desirable to achieve differential privacy [DMNS06], which guarantees that no individual’s data has a significant influence on the information released about the database.

Some of the most basic statistics on a database are counting queries, which are queries of the form, “What fraction of individual records in DD satisfy some property qq?” In particular we would like to construct differentially private sanitizers that, given a database DD and kk counting queries q1,,qkq_{1},\dots,q_{k} from a family Q\mathcal{Q}, outputs an approximate answer to each of the queries. We would like the number of queries, kk, to be as large as possible, and the set of feasible queries, Q\mathcal{Q}, to be as general as possible. Ideally, Q\mathcal{Q}, would contain all counting queries.It may require super-polynomial time just to evaluate an arbitrary counting query, which would rule out efficiency for reasons that have nothing to do with privacy. For this discussion, we restrict attention to queries that are efficiently computable, so are not the bottleneck in the computation. Moreover, we would like the algorithm to run as efficiently as possible.

Some of the earliest work in differential privacy [DMNS06] gave an efficient sanitizer—the so-called Laplace Mechanism. The Laplace Mechanism answers any set of kk arbitrary efficiently computable counting queries by perturbing the answers with appropriately calibrated random noise, providing good accuracy (say, within ±.01\pm.01 of the true answer) as long as kn2k\lesssim n^{2}.

Unfortunately, we show that this is not the case—there is no efficient, differentially private algorithm that takes a database D({0,1}d)nD\in(\{0,1\}^{d})^{n}, and Θ~(n2)\widetilde{\Theta}(n^{2}) arbitrary, efficiently computable counting queries as input and outputs an approximate answer to each of the queries. One way to summarize our results is that, unless we restrict the set Q\mathcal{Q} of allowable queries, or allow exponential running time, then the Laplace Mechanism is essentially the best possible algorithm for answering counting queries with differential privacy.

We also show that, the same theorem holds even for queries that are computable by unbounded-fan-in circuits of depth 66 over the basis {,,¬}\{\land,\lor,\neg\} (a subset of the well-studied class AC0AC^{0}), albeit under a stronger (but still plausible) cryptographic assumption.

Theorem 1.2 should be contrasted with the results of Hardt, Rothblum, and Servedio [HRS12] as well as Thaler, Ullman, and Vadhan [TUV12], which give efficient sanitizers for answering nΩ(k)n2n^{\Omega(\sqrt{k})}\gg n^{2} monotone kk-way conjunction queries, a much simpler class than polynomial-size depth-6 circuits.A monotone kk-way conjunction query on a database D({0,1}d)D\in(\{0,1\}^{d})^{*} is specified by a set of positions S[d]S\subseteq[d], S=kd|S|=k\leq d, and asks “What fraction of records in DD have a 11 in every position in SS?”.

We prove our results by building on the connection between differentially private sanitizers for counting queries and traitor-tracing schemes discovered by Dwork et al. [DNR+09]. Traitor-tracing schemes were introduced by Chor, Fiat, and Naor [CFN94] for the purpose of identifying pirates who violate copyright restrictions. Roughly speaking, a (fully collusion-resilient) traitor-tracing scheme allows a sender to generate keys for nn users so that 1) the sender can broadcast encrypted messages that can be decrypted by any user, and 2) any efficient pirate decoder capable of decrypting messages can be traced to at least one of the users who contributed a key to it, even if an arbitrary coalition of the users combined their keys in an arbitrary efficient manner to construct the decoder.

Dwork et al. show that the existence of traitor-tracing schemes implies hardness results for one-shot sanitizers. Very informally, they argue as follows: Suppose a coalition of users takes their keys and builds a database D({0,1}d)nD\in(\{0,1\}^{d})^{n} where each record contains one of their user keys. The family Q\mathcal{Q} will contain a query qcq_{c} for each possible ciphertext cc. The query qcq_{c} asks “What fraction of the records (user keys) in DD would decrypt the ciphertext cc to the message 11?” Every user can decrypt, so if the sender encrypts a message m{0,1}m\in\{0,1\} as a ciphertext cc, then every user will decrypt cc to mm. Thus the answer to the counting query, qcq_{c}, will be mm.

Suppose there were an efficient one-shot sanitizer for Q\mathcal{Q}. Then the coalition could use it to efficiently produce a summary of the database DD that enables one to efficiently compute an approximate answer to every query qcq_{c}, which would also allow one to efficiently decrypt the ciphertext. Such a summary can be viewed as an efficient pirate decoder, and thus the tracing algorithm can use the summary to trace one of the users in the coalition. However, if there is a way to identify one of the users in the database from the summary, then the summary is not differentially private.

In order to instantiate their result, they need a traitor-tracing scheme. Since Q\mathcal{Q} contains a query for every ciphertext, the parameter to optimize is the length of the ciphertexts. Using the fully collusion-resilient traitor-tracing scheme of Boneh, Sahai, and Waters [BSW06], which has ciphertexts of length O~(n)\widetilde{O}(\sqrt{n}), they obtain a family of queries of size 2O~(n)2^{\widetilde{O}(\sqrt{n})} for which there is no efficient one-shot sanitizer. Dwork et al. also discovered a partial converse—proving hardness of one-shot sanitization for a smaller family of queries requires constructing traitor-tracing schemes with shorter ciphertexts, which is a seemingly difficult open problem.

Our Approach

In our setting of sanitization (rather than one-shot sanitization, as studied by Dwork et al. [DNR+09]), we don’t expect to answer every query in Q\mathcal{Q}, only a much smaller set of queries requested by the analyst. At first glance, this should make answering the queries much easier, and thus make it more difficult to demonstrate hardness. However, the attacker does have the power to choose the queries which he wants answered, and can choose queries that are most difficult to sanitize. Our first observation is that in the traitor-tracing scenario, the tracing algorithms only query the pirate decoder on a polynomial number of ciphertexts, which are randomly chosen and depend on the particular keys that were instantiated for the scheme. For many schemes, even O~(n2)\widetilde{O}(n^{2}) queries is sufficient. Thus it would seem that the tracing algorithm could simply decide which queries it will make, give those queries as input to the sanitizer, and then use the answers to those queries to identify a user and violate differential privacy.

However, this intuition ignores an important issue. Many traitor-tracing schemes (including [BSW06]) can only trace stateless pirate decoders, which essentially commit to a response to each possible query (or a distribution over responses) once and for all. For one-shot sanitizers, the private summary is necessarily stateless, and thus the result of Dwork et al. can be instantiated with any scheme that allows tracing of stateless pirate decoders. However, an arbitrary sanitizer might give answers that depend on the sequence of queries. Thus, in order to prove our results, we will need a traitor-tracing scheme that can trace stateful pirate decoders.

The problem of tracing stateful pirates is quite natural even without the implications for private data analysis. Indeed, this problem has been studied in the literature, originally by Kiayias and Yung [KY01]. They considered pirates that can abort and record history. However, their solution, and all others known, does not apply to our specific setting due to a certain “watermarking assumption” that doesn’t apply when proving hardness-of-sanitization (see discussion below). To address this problem, we also refine the basic connection between traitor-tracing schemes and differential privacy by showing that, in many respects, fairly weak traitor-tracing schemes suffice to establish the hardness of preserving privacy. In particular, although the pirate decoder obtained from a sanitizer may be stateful and record history, the accuracy requirement of the sanitizer means that the corresponding pirate decoder cannot abort, which will make it easier to construct a traitor-tracing scheme for these kinds of pirates. Indeed, we construct such a scheme to establish Theorem 1.1.

The scheme also has weakened requirements in other respects, having nothing to do with the statefulness of the pirate or the tracing algorithm. These weakened requirements allow us to reduce the complexity of the decryption, which means that the queries used by the attacker do not need to be arbitrary polynomial-size circuits, but instead can be circuits of constant depth, which allows us to establish Theorem 1.2. Another technical issue arises in that all kk queries must be given to the sanitizer at once, whereas tracing algorithms typically are allowed to query the pirate interactively. However, we are able to show that the scheme we construct can be traced using one round of queries. See Sections 3.1 and 4 for a precise statement of the kind of traitor-tracing scheme that suffices and Section 5 for our construction.

Our construction is based on a well-known fully collusion resilient traitor-tracing scheme [CFN94], but with a modified tracing algorithm. The tracing algorithm uses fingerprinting codes [BS98, Tar08], which have been employed before in the context of traitor-tracing and content distribution, but our tracing algorithm is different from all those we are aware of. The resulting scheme allows for tracing with a minimal number of non-adaptively chosen queries, achieves tracing without context-specific watermarking assumptions, simplifies the decryption circuit (at the expense of weakening the security parameters and functionality). The restriction to non-aborting pirates may not be so natural in the setting of content distribution, which may explain why the scheme was not previously known.

2 Related Work

In addition to the hardness results for one-shot sanitizations [DNR+09], which apply to arbitrary one-shot sanitizers, there are several hardness-of-sanitization results for restricted classes of sanitizers. Dwork et al. proved stronger hardness results for sanitizers whose output is a “synthetic database”—roughly, a new database (of the same dimensions) that approximately preserves the answer to some set of queries. Their results were extended by Ullman and Vadhan [UV11], who showed that it is hard to generate a private synthetic database that is accurate for essentially any non-trivial family of queries, even 22-literal conjunctions.

Gupta et al. [GHRU11] considered algorithms that can only access the database by making a polynomial number of “statistical queries” (essentially counting queries). They showed that such algorithms cannot be a one-shot sanitizer (even ignoring privacy constraints) that approximately answers certain simple families of counting queries with high accuracy.

Finally, Dwork, Naor, and Vadhan [DNV12] gave information-theoretic lower bounds for stateless sanitizers, which take kk queries as input, but whose answers to each query do not depend on the other k1k-1 input queries. They showed that (even computationally unbounded) stateless sanitizers can answer at most O~(n2)\widetilde{O}(n^{2}) queries with non-trivial accuracy, while satisfying differential privacy. The Laplace Mechanism is a stateless sanitizer that answers Ω~(n2)\widetilde{\Omega}(n^{2}) queries, and thus their result is tight in this respect. Although their result is information theoretic, and considers a highly restricted type of sanitizer, their techniques are related to ours. We elaborate on this connection in the appendix.

Preliminaries

Let a database D({0,1}d)nD\in(\{0,1\}^{d})^{n} be a collection of nn rows (x(1),,x(n)){0,1}d(x^{(1)},\dots,x^{(n)})\in\{0,1\}^{d}. We say that two databases D,D({0,1}d)nD,D^{\prime}\in(\{0,1\}^{d})^{n} are adjacent if they differ only on a single row, and we denote this by DDD\sim D^{\prime}.

Let M ⁣:({0,1}d)nR\mathcal{M}\colon(\{0,1\}^{d})^{n}\to\mathcal{R} be a randomized algorithm that takes a database as input (where nn and dd are varying parameters). M\mathcal{M} is (ε,δ)(\varepsilon,\delta)-differentially private if for every two adjacent databases DDD\sim D^{\prime} and every subset SRS\subseteq\mathcal{R},

If M\mathcal{M} is (ε,δ)(\varepsilon,\delta)-differentially private for some functions ε=ε(n)=O(1)\varepsilon=\varepsilon(n)=O(1), δ=δ(n)=o(1/n)\delta=\delta(n)=o(1/n), we will drop the parameters ε\varepsilon and δ\delta and say that M\mathcal{M} is differentially private.

The choice of ε=O(1),δ=o(1/n)\varepsilon=O(1),\delta=o(1/n) is a fairly weak setting of the privacy parameters, and most known constructions of differentially private mechanisms for various tasks give quantitatively stronger privacy guarantees. These parameters are essentially the weakest possible, as (ε,δ)(\varepsilon,\delta)-differentially privacy is not a satisfactory privacy guarantee for ε=ω(1)\varepsilon=\omega(1) or δ=Ω(1/n)\delta=\Omega(1/n). That our lower bounds apply to the parameters specified in Definition 2.1 makes our results stronger.

Sanitizers for Counting Queries

Since an algorithm that always outputs \bot satisfies Definition 2.1, we also need to specify what it means for the sanitizer to be useful. In this paper we study sanitizers that give accurate answers to counting queries. A counting query on {0,1}d\{0,1\}^{d} is defined by a predicate q ⁣:{0,1}d{0,1}q\colon\{0,1\}^{d}\to\{0,1\}. Abusing notation, we define the evaluation of the query qq on a database D=(x(1),,x(n))({0,1}d)nD=(x^{(1)},\dots,x^{(n)})\in(\{0,1\}^{d})^{n} to be

Now we formally define what it means for a sanitizer to give accurate answers.

Let DD be a database and q1,,qkq_{1},\dots,q_{k} be a set of counting queries. A sequence of answers a1,,aka_{1},\dots,a_{k} is α\alpha-accurate for q1,,qkq_{1},\dots,q_{k} on DD if

If M\mathcal{M} is (α,β,Q,k)(\alpha,\beta,\mathcal{Q},k)-accurate for any (constant) α<1/2\alpha<1/2 and β=β(n)=o(1/n2)\beta=\beta(n)=o(1/n^{2}), we will drop α\alpha and β\beta and say that M\mathcal{M} is (Q,k)(\mathcal{Q},k)-accurate.

The choice of α<1/2,β=o(1/n2)\alpha<1/2,\beta=o(1/n^{2}) is also significantly weaker than what can be achieved by known constructions of sanitizers. These parameters are also essentially the weakest parameters possible, as a mechanism that answers 1/21/2 to every query achieves α=1/2\alpha=1/2, β=0\beta=0 for any number of arbitrary queries. Also, if there is a mechanism that achieves (α,β,Q,k)(\alpha,\beta,\mathcal{Q},k)-accuracy for β<1/2\beta<1/2, then there is another mechanism that achieves (α,o(1/n2),Q,k)(\alpha,o(1/n^{2}),\mathcal{Q},k)-accuracy with only an O(logn)O(\log n) loss in the privacy parameters and the efficiency of the mechanism.Given a sanitizer M\mathcal{M} that answers every query accurately with probability 1/2+Ω(1)1/2+\Omega(1), one can obtain a mechanism M\mathcal{M}^{\prime} that answers every query accurately with probability 1β1-\beta. M\mathcal{M}^{\prime} will run M\mathcal{M} independently r=O(log(1/β))r=O(\log(1/\beta)) times and answers each query with the median of the rr answers for that query. That our lower bound applies to the parameters specified in Definition 2.2 makes our results stronger.

Efficiency of Sanitizers

Simply, a sanitizer is efficient if it runs in time polynomial in the length of its input. To make the statement more precise, we need to specify how the queries are given to the sanitizer as input.

For comparison with our results, we will recall the properties of some known mechanisms, stated in our terminology and for our choice of parameters:

There exists a sanitizer MLap\mathcal{M}_{\mathsf{Lap}} that is 1) differentially private, 2) efficient, and 3) (Qall(d),Ω~(n2))(\mathcal{Q}^{(d)}_{\mathsf{all}},\widetilde{\Omega}(n^{2}))-accurate.

Traitor-Tracing Schemes

In this section we give define a traitor-tracing scheme. Throughout, we will use ATT\mathsf{A_{TT}} to denote algorithms associated with traitor-tracing schemes.

We now give a definition of a traitor-tracing scheme, heavily tailored to the task of proving hardness results for generic sanitizers. We will sacrifice some consistency with the standard definitions. See below for further discussion of the ways in which our definition departs from the standard definition of traitor-tracing. In some cases, the non-standard aspects of the definition will be necessary to establish our results, and in others it will be for convenience. Despite these differences, we will henceforth refer to schemes satisfying our definition simply as traitor-tracing schemes.

Intuitively, a traitor-tracing scheme is a form of broadcast encryption, in which a sender can broadcast an encrypted message that can be decrypted by each of a large set of users. The standard notion of security for such a scheme would require that an adversary that doesn’t have any of the keys cannot decrypt the message. A traitor-tracing scheme has the additional property that given any decoder capable of decrypting the message (which must in a very loose sense “know” at least one of the keys), there is a procedure for determining which user’s key is being used. Moreover, we want the scheme to be “collusion resilient,” in that even if a coalition of users gets together and combines their keys in some way to produce a decoder, there is still a procedure that identifies at least one member of the coalition.

The algorithm GenTT\mathsf{Gen}_{\mathsf{TT}} takes a security parameter, κ\kappa, and returns a sequence of n=n(κ)n=n(\kappa) user keys sk=(sk(1),,sk(n)){0,1}κ\vec{sk}=(sk^{(1)},\dots,sk^{(n)})\in\{0,1\}^{\kappa}. Formally, sk=(sk(1),,sk(n))\mboxRGenTT(1κ)\vec{sk}=(sk^{(1)},\dots,sk^{(n)})\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}_{\mathsf{TT}}(1^{\kappa}).

The algorithm EncTT\mathsf{Enc}_{\mathsf{TT}} takes a sequence of nn user keys sk\vec{sk} and a message bit b{0,1}b\in\{0,1\}, and generates a ciphertext cC=C(κ)c\in\mathcal{C}=\mathcal{C}^{(\kappa)}. Formally, c\mboxREncTT(sk,b)c\leftarrow_{\mbox{\tiny R}}\mathsf{Enc}_{\mathsf{TT}}(\vec{sk},b).

The algorithm TraceTT\mathsf{Trace_{TT}} takes as input a set of user keys sk({0,1}κ)n(κ)\vec{sk}\in(\{0,1\}^{\kappa})^{n(\kappa)} and an oracle P ⁣:(C(κ))kTT(κ){0,1}kTT(κ)\mathcal{P}\colon(\mathcal{C}^{(\kappa)})^{k_{\mathsf{TT}}(\kappa)}\to\{0,1\}^{k_{\mathsf{TT}}(\kappa)}, makes one kTTk_{\mathsf{TT}}-tuple of queries, (c1,,ckTT)C(κ)(c_{1},\dots,c_{k_{\mathsf{TT}}})\in\mathcal{C}^{(\kappa)} to its oracle (kTT=kTT(κ)k_{\mathsf{TT}}=k_{\mathsf{TT}}(\kappa)), and returns the name of a user i[n(κ)]i\in[n(\kappa)]. Formally, i\mboxRTraceTTP(sk)i\leftarrow_{\mbox{\tiny R}}\mathsf{Trace_{TT}^{\mathcal{P}}}(\vec{sk}).

Intuitively, think of the oracle P\mathcal{P} as being given some subset of keys skS=(sk(i))iS\vec{sk}_{S}=(sk^{(i)})_{i\in S} for a non-empty set S[n]S\subseteq[n], and TraceTT\mathsf{Trace_{TT}} is attempting to identify a user iSi\in S. Clearly, if P\mathcal{P} ignores its input and always returns , TraceTT\mathsf{Trace_{TT}} cannot have any hope of success, so we must assume that P\mathcal{P} is capable of decrypting ciphertexts.

In other words, if every user key sk(i)sk^{(i)} (for iSi\in S) decrypts cc to 11 (resp. ), then P(skS,)\mathcal{P}(\vec{sk}_{S},\cdot) decrypts cc to 11 (resp. ), with high probability.

We can now define a secure, (n,kTT)(n,k_{\mathsf{TT}})-traitor-tracing scheme:

The traitor-tracing schemes we consider are somewhat different than those previously studied in the literature.

We do not require the encryption or tracing algorithms to use public keys. In the typical application of traitor-tracing schemes to content distribution, these would be desirable features, however they are not necessary for proving hardness of sanitization.

We only require that the tracing algorithm succeeds with probability 1o(1/n)1-o(1/n). Typically one would require that the tracing algorithm succeeds with probability 1nω(1)1-n^{-\omega(1)}.

We do not give the pirate decoder access to an encryption oracle. In other words, we do not require CPA security. Most traitor-tracing schemes in the literature are public-key, making this distinction irrelevant. Here, we only need an encryption scheme that is secure for an a priori bounded number of messages.

We allow the pirate decoder to be stateful, but in an unusual way. We require (roughly) that if any of the queries are ciphertexts generated by Enc(sk,b)\mathsf{Enc}(\vec{sk},b), then the pirate decoder answers bb to those queries, regardless of the other queries issued. In many models, the pirate is allowed to abort, and answer \bot if it detects that it is being traced. However, we do allow our pirate to correlate its answers to different queries, subject to this accuracy constraint. We also allow the pirate to see all the queries made by the tracer at once, which is more power than is typically given to the pirate.

Roughly, the first three modifications will allow us to find a candidate scheme with very simple decryption and the fourth modification will allow us to trace stateful pirates even in the setting of bit-encryption.

2 Decryption Function Families

For Theorem 1.2, we are interested in traitor-tracing schemes where DecTT\mathsf{Dec}_{\mathsf{TT}} is a “simple” function of the user key (for every ciphertext cCc\in\mathcal{C}).

Let (GenTT,EncTT,DecTT)(\mathsf{Gen}_{\mathsf{TT}},\mathsf{Enc}_{\mathsf{TT}},\mathsf{Dec}_{\mathsf{TT}}) be a traitor-tracing scheme where GenTT\mathsf{Gen}_{\mathsf{TT}} produces keys in {0,1}κ\{0,1\}^{\kappa} and EncTT\mathsf{Enc}_{\mathsf{TT}} produce ciphertexts in C=C(κ)\mathcal{C}=\mathcal{C}^{(\kappa)}. For every cCc\in\mathcal{C}, we define the cc-decryption function qc ⁣:{0,1}κ{0,1}q_{c}\colon\{0,1\}^{\kappa}\to\{0,1\} to be qc(sk)=DecTT(sk,c)q_{c}(sk)=\mathsf{Dec}_{\mathsf{TT}}(sk,c). We define the decryption function family QDecTT(κ)={qc}cC(κ)\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(\kappa)}=\left\{q_{c}\right\}_{c\in\mathcal{C}^{(\kappa)}}.

In what follows, we will say that ΠTT\Pi_{\mathsf{TT}} is an traitor-tracing scheme with decryption function family QDecTT(κ)\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(\kappa)}.

Attacking Efficient Sanitizers

In this section we will prove our main result, showing that the existence of traitor-tracing schemes (as in Definition 3.2) implies that efficient sanitizers cannot answer too many counting queries while satisfying differential privacy.

The main difference between Theorem 4.1 and the result of Dwork et al. [DNR+09] is that we only assume the existence of a sanitizer for kTT(d)k_{\mathsf{TT}}(d) queries from Q(d)=QDecTT(d)\mathcal{Q}^{(d)}=\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(d)}, whereas Dwork et al. assume the existence of a one-shot sanitizer that answers every query in Q(d)\mathcal{Q}^{(d)}. To offset the weaker assumption on the sanitizer, we assume that the traitor-tracing scheme is secure against certain stateful pirate decoders (as in Definition 3.1) whereas Dwork et al. only need to trace stateless pirates. Theorem 4.1 also explicitly allows the traitor-tracing scheme to have the relaxed functionality and security properties discussed at the end of Section 3, although it is implicit in Dwork et al. that the relaxed properties are sufficient to prove hardness results.

We now sketch the proof: Every function qcQ(d)q_{c}\in\mathcal{Q}^{(d)} is viewed as a query qc(x)q_{c}(x) on a database row x{0,1}dx\in\{0,1\}^{d}. Assume there is an efficient sanitizer is that is (Q(d),kTT(d))(\mathcal{Q}^{(d)},k_{\mathsf{TT}}(d))-accurate for these queries. The fact that M\mathcal{M} is accurate for these queries will imply that (after small modifications) M\mathcal{M} is a kTTk_{\mathsf{TT}}-available pirate decoder (Definition 3.1). Here is where we differ from Dwork et al., who assume that M\mathcal{M} accurately answers all queries in Q(d)\mathcal{Q}^{(d)}, in which case M\mathcal{M} can be viewed as a stateless pirate decoder (but must solve a harder sanitization problem).

We complete the proof as in Dwork et al. Consider two experiments: In the first, we construct an nn-row database DD by running GenTT(1d)\mathsf{Gen}_{\mathsf{TT}}(1^{d}) to obtain nn user keys, and putting one in each row of DD. Then we run TraceTT\mathsf{Trace_{TT}} on M(D,)\mathcal{M}(D,\cdot) and obtain a user ii. Since M\mathcal{M} is useful, and ΠTT\Pi_{\mathsf{TT}} is secure, we will have that i[n]i\in[n] with probability close to 11, and thus there is an i[n]i^{*}\in[n] such that i=ii=i^{*} with probability 1/n\gtrsim 1/n.

In the second experiment, we construct a database DD^{\prime} exactly as in the first, however we exclude the key sk(i)sk^{(i^{*})}. Since DD and DD^{\prime} differ in only one row, differential privacy requires that TraceTT\mathsf{Trace_{TT}}, run with oracle M(D,)\mathcal{M}(D^{\prime},\cdot), still outputs ii^{*} with probability Ω(1/n)\Omega(1/n). However, in this experiment, ii^{*}, sk(i)sk^{(i^{*})} is no longer given to the pirate decoder, and thus security of ΠTT\Pi_{\mathsf{TT}} says that TraceTT\mathsf{Trace_{TT}}, run with this oracle, must output ii^{*} with probability o(1/n)o(1/n). Thus, we will obtain a contradiction.

Let ΠTT=(GenTT,EncTT,DecTT,TraceTT)\Pi_{\mathsf{TT}}=(\mathsf{Gen}_{\mathsf{TT}},\mathsf{Enc}_{\mathsf{TT}},\mathsf{Dec}_{\mathsf{TT}},\mathsf{Trace_{TT}}) be the assumed traitor-tracing scheme, and assume there exists an efficient, differentially private, (Q(d),kTT(d))(\mathcal{Q}^{(d)},k_{\mathsf{TT}}(d))-accurate sanitizer M\mathcal{M}. We define the pirate decoder PM\mathcal{P}_{\mathcal{M}} as follows:

Next, we claim that if M\mathcal{M} is accurate for Q(d)\mathcal{Q}^{(d)}, then PM\mathcal{P}_{\mathcal{M}} is a useful pirate decoder.

If M\mathcal{M} is (QDecTT,kTT)(\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}},k_{\mathsf{TT}})-accurate, then PM\mathcal{P}_{\mathcal{M}} is a kTTk_{\mathsf{TT}}-useful pirate decoder.

Let sk{0,1}d\vec{sk}\in\{0,1\}^{d} be a set of user keys for ΠTT\Pi_{\mathsf{TT}} and let S[n]S\subseteq[n] be a subset of the users such that Sn1|S|\geq n-1. Suppose cC(d)c\in\mathcal{C}^{(d)} and b{0,1}b\in\{0,1\} are such that for every iSi\in S, DecTT(sk(i),c)=b\mathsf{Dec}_{\mathsf{TT}}(sk^{(i)},c)=b. Then we have that, for DD as in PM\mathcal{P}_{\mathcal{M}},

Let c1,,ckTTc_{1},\dots,c_{k_{\mathsf{TT}}} be a set of ciphertexts, qc1,,qckTTq_{c_{1}},\dots,q_{c_{k_{\mathsf{TT}}}} and a1,,akTTa_{1},\dots,a_{k_{\mathsf{TT}}} be as in PM\mathcal{P}_{\mathcal{M}}. The accuracy of M\mathcal{M} (with constant error α<1/2\alpha<1/2) guarantees that

Since Sn1|S|\geq n-1, o(1/S2)=o(1/n2)o(1/|S|^{2})=o(1/n^{2}). Assuming a1,,akTTa_{1},\dots,a_{k_{\mathsf{TT}}} is accurate up to error α<1/2\alpha<1/2 for qc1,,qckTTq_{c_{1}},\dots,q_{c_{k_{\mathsf{TT}}}}, aja_{j} will be rounded to exactly qcjq_{c_{j}} whenever qcj(D){0,1}q_{c_{j}}(D)\in\{0,1\}. That is,

Thus, PM\mathcal{P}_{\mathcal{M}} is kTTk_{\mathsf{TT}}-useful. This completes the proof of the claim. ∎

Let S(κ)=[n(κ)]{i(κ)}S(\kappa)=[n(\kappa)]\setminus\{i^{*}(\kappa)\} Now we claim that if M\mathcal{M} is differentially private, then TraceTT\mathsf{Trace_{TT}} will output i(κ)i^{*}(\kappa) with significant probability, even PM\mathcal{P}_{\mathcal{M}} is not given the key of user i(κ)i^{*}(\kappa).

If M\mathcal{M} is differentially private (for ε=O(1)\varepsilon=O(1), δ=o(1/n)\delta=o(1/n)), then

Fix any κ\kappa and let kTT=kTT(κ)k_{\mathsf{TT}}=k_{\mathsf{TT}}(\kappa) and i=i(κ)i^{*}=i^{*}(\kappa), S=S(κ)S=S(\kappa) as above. Let D=skD=\vec{sk} and Di=skSD_{-i^{*}}=\vec{sk}_{S}. Take TT to be the set of responses b^1,,b^kTT\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}} such that TraceTT(sk)\mathsf{Trace_{TT}}(\vec{sk}), after querying its oracle on ciphertexts c1,,ckTTc_{1},\dots,c_{k_{\mathsf{TT}}} and receiving responses b^1,,b^kTT\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}}, outputs ii^{*} (TT depends on the coins of GenTT\mathsf{Gen}_{\mathsf{TT}} and TraceTT\mathsf{Trace_{TT}}). By differential privacy, we have that

Note that the queries constructed by PM\mathcal{P}_{\mathcal{M}} depends only on c1,,ckTTc_{1},\dots,c_{k_{\mathsf{TT}}}, not on skS\vec{sk}_{S}. Also note that the final rounding step does not depend on the input at all. Thus, for every T{0,1}kTTT\subseteq\{0,1\}^{k_{\mathsf{TT}}}

The claim follows by combining with (1). ∎

To complete the proof, notice that the probability in Claim 4.3 is exactly the probability that TraceTT\mathsf{Trace_{TT}} outputs the user ii^{*}, when given the oracle PM(skS)\mathcal{P}_{\mathcal{M}}(\vec{sk}_{S}), for S=[n]{i}S=[n]\setminus\left\{i^{*}\right\}. However, the fact that PM\mathcal{P}_{\mathcal{M}} is efficient, and ΠTT\Pi_{\mathsf{TT}} is a secure traitor-tracing scheme implies that this probability is o(1/n)o(1/n). Thus we have obtained a contradiction. This completes the proof of the Theorem. ∎

Constructions of Traitor-Tracing Schemes

In this section we show how to construct traitor-tracing schemes that satisfy Definition 3.2, and thus can be used to instantiate Theorem 4.1. First we will informally describe a simple construction that requires the tracing algorithm to make a sub-optimal number of queries, but will hopefully give the reader more intuition about the construction and how it differs from previous constructions of traitor-tracing schemes. Then we will give precise definitions of the encryption schemes (Section 5.2) and fingerprinting codes (Section 5.3) required for our construction. Then we will present the final construction more formally (Section 5.4) and prove its security. Finally, we will use the weakened security requirements of the encryption scheme to show that our traitor-tracing scheme can be instantiated so that decryption is computable by constant-depth circuits (Section 5.6).

Our construction is a variant of the most basic tracing traitor-tracing scheme [CFN94]. Start with any encryption scheme (Gen,Enc,Dec)(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec}). Generate an independent key sk(i)\mboxRGensk^{(i)}\leftarrow_{\mbox{\tiny R}}\mathsf{Gen} for each user (we will ignore the security parameter in the informal description). To encrypt a bit b{0,1}b\in\{0,1\}, we encrypt it under each user’s key independently and concatenate the ciphertexts. That is

Clearly each user can decrypt the ciphertext by applying Dec\mathsf{Dec}, as long as she knows which part of the ciphertext to decrypt.

Now we describe how an available pirate decoder for this scheme can be traced. As with all traitor-tracing schemes, we will form ciphertexts that different users would decrypt differently, assuming they decrypt as intended using the algorithm DecTT(sk(i),)\mathsf{Dec}_{\mathsf{TT}}(sk^{(i)},\cdot). We can do so with the following algorithm:

for i=0,1,,ni=0,1,\dots,n. The algorithm forms a ciphertext that users 1,,i1,\dots,i will decrypt to and users i+1,,ni+1,\dots,n will decrypt to 11.

The tracing algorithm generates a random sequence i1,,ikTT{0,1,,n}i_{1},\dots,i_{k_{\mathsf{TT}}}\in\left\{0,1,\dots,n\right\}, for kTT=(n+1)sk_{\mathsf{TT}}=(n+1)s, such that each element of {0,1,,n}\left\{0,1,\dots,n\right\} appears exactly ss times, where ss is a parameter to be chosen later. Then, for every jj it generates a ciphertext cj\mboxRTrEncTT(sk,ij)c_{j}\leftarrow_{\mbox{\tiny R}}\mathsf{TrEnc_{TT}}(\vec{sk},i_{j}). Next, it queries PskS(c1,,ckTT)\mathcal{P}_{\vec{sk}_{S}}(c_{1},\dots,c_{k_{\mathsf{TT}}}). Given the output of the pirate, the tracing algorithm computes

for i=0,1,,ni=0,1,\dots,n. Finally, the tracing algorithm outputs any ii^{*} such that PiPi11/nP_{i^{*}}-P_{i^{*}-1}\geq 1/n.

The tracing algorithm generates a random sequence of indices i1,,ikTT{0,1,,n}i_{1},\dots,i_{k_{\mathsf{TT}}}\in\left\{0,1,\dots,n\right\}, for kTT=(n+1)sk_{\mathsf{TT}}=(n+1)s, such that each element of {0,1,,n}\left\{0,1,\dots,n\right\} appears exactly ss times, where ss is a parameter to be chosen later. Then, for every jj it generates a ciphertext cj\mboxRTrEncTT(sk,ij)c_{j}\leftarrow_{\mbox{\tiny R}}\mathsf{TrEnc_{TT}}(\vec{sk},i_{j}). Next, it queries PskS(c1,,ckTT)\mathcal{P}_{\vec{sk}_{S}}(c_{1},\dots,c_{k_{\mathsf{TT}}}). Given the output of the pirate, the tracing algorithm computes Pi=1sj:ij=iP(sk,c1,,ckTT)jP_{i}=\frac{1}{s}\sum_{j:i_{j}=i}\mathcal{P}(\vec{sk},c_{1},\dots,c_{k_{\mathsf{TT}}})_{j} for i=0,1,,ni=0,1,\dots,n. Finally, the tracing algorithm outputs any ii^{*} such that PiPi11/nP_{i^{*}}-P_{i^{*}-1}\geq 1/n.

Now we explain why this algorithm successfully traces efficient available pirate decoders. Notice that if we choose cc according to TrEncTT(sk,0)\mathsf{TrEnc_{TT}}(\vec{sk},0), then every user decrypts cc to , so P0=0P_{0}=0. Similarly, Pn=1P_{n}=1. Thus, there exists ii^{*} such that PiPi11/nP_{i^{*}}-P_{i^{*}-1}\geq 1/n. Next, we argue that ii^{*} is in SS except with small probability. Notice that TrEncTT(sk,i)\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}) and TrEncTT(sk,i1)\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}-1) differ only in the message encrypted under key sk(i)sk^{(i^{*})}, so if i∉Si^{*}\not\in S, this key is unknown to the pirate decoder. The security of the encryption scheme (made precise in Definition 5.2) guarantees that if sk(i)sk^{(i^{*})} is unknown to an efficient pirate, then we can replace kTTk_{\mathsf{TT}} uses of Enc(sk(i),1)\mathsf{Enc}(sk^{(i^{*})},1) with Enc(sk(i),0)\mathsf{Enc}(sk^{(i^{*})},0), and this change will only affect the success probability of the pirate by o(1/n)o(1/n). But after we make this replacement, TrEncTT(sk,i)\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}) and TrEncTT(sk,i1)\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}-1) are (perfectly, information-theoretically) indistinguishable to the pirate. Since the sequence of indices i1,,ikTTi_{1},\dots,i_{k_{\mathsf{TT}}} is random, the pirate has no information about which elements iji_{j} are ii^{*} and which are i1i^{*}-1. Thus, if the pirate wants to make PiP_{i^{*}} larger than Pi1P_{i^{*}-1}, for some i∉Si^{*}\not\in S, she can do no better than to “guess”. If we take s=O~(n2)s=\widetilde{O}(n^{2}), and apply a Chernoff bound, it turns out that for every i∉Si\not\in S, PiPi1=o(1/n)P_{i}-P_{i-1}=o(1/n). This conclusion also holds after we take into account the security loss of the encryption scheme, which is o(1/n)o(1/n). Thus, the scheme we described is a secure traitor-tracing scheme in the sense of Definition 3.2.

In arguing that the scheme is secure, we used the fact that P0=0P_{0}=0 and Pn=1P_{n}=1 no matter what other queries are made to the pirate. In many applications, this assumption would not be reasonable. However, when the pirate is derived from an accurate sanitizer, this condition will be satisfied.

For this scheme, the tracer makes (n+1)s=O~(n3)(n+1)s=\widetilde{O}(n^{3}) queries. Before proceeding, we will explain how to reduce the number of queries from O~(n3)\widetilde{O}(n^{3}) to O~(n2)\widetilde{O}(n^{2}). The high-level argument that the scheme is secure used two facts:

By the availability of the pirate decoder, if every user in SS would decrypt a ciphertext cc to bb, then the pirate decrypts cc to bb (in the above, P0=0,Pn=1P_{0}=0,P_{n}=1).

Because of the encryption, a pirate decoder without user ii’s key “doesn’t know” how user ii would decrypt each ciphertext.

Systems leveraging these two properties to identify a colluding user are called fingerprinting codes [BS98], and have been studied extensively. In fact, the tracing algorithm we described is identical to the tracing algorithm we define in Section 5.4, but instantiated with the fingerprinting code of Boneh and Shaw [BS98], which has length O~(n3)\widetilde{O}(n^{3}). Tardos [Tar08] constructed shorter fingerprinting codes, with length O~(n2)\widetilde{O}(n^{2}), which we use to reduce the number of queries to trace.

Next we define the precise security requirement we need out of the underlying encryption scheme, and then we will give a formal definition of fingerprinting codes.

2 Encryption Schemes

We will build our traitor-tracing scheme from a suitable encryption scheme. An encryption scheme is tuple of efficient algorithms (Gen,Enc,Dec)(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec}). All the algorithms may be randomized except for Dec\mathsf{Dec}. The scheme has the following syntactic properties:

First we define (perfectly) correct decryptionIt would not substantially affect our results if Dec\mathsf{Dec} were allowed to fail with negligible probability, however we will assume perfect correctness for ease of presentation.

We require that our schemes have the following kEnck_{\mathsf{Enc}}-message security property.

Notice that we do not require ΠEnc\Pi_{\mathsf{Enc}} to be secure against adversaries that are given Enc(sk,)\mathsf{Enc}(sk,\cdot) as an oracle. That is, we do not require CPA security.

3 Fingerprinting Codes

As we alluded to above, our tracing algorithm will use a fingerprinting code, introduced by Boneh and Shaw [BS98]. A fingerprinting code is a pair of efficient (possibly randomized) algorithms (GenFP,TraceFP)(\mathsf{Gen}_{\mathsf{FP}},\mathsf{Trace_{FP}}) with the following syntax.

Informally, if all users in SS have a (resp. 11) in the jj-th symbol of their codeword, then they must produce a word with (resp. 11) as the jj-th symbol. We also define the critical positions to be the set of indices for which this constraint is binding. That is,

The security of a fingerprinting code asserts that an adversary who is given a subset WSW_{S} of the codewords should not be able to produce an element of F(WS)F(W_{S}) that does not trace to a user iSi\in S. More formally,

where the two executions of AFP\mathcal{A}_{\mathsf{FP}} are understood to be the same.

Tardos [Tar08] gave a construction of fingerprinting codes of essentially optimal length, improving on the original construction of Boneh and Shaw [BS98].

4 The Traitor-Tracing Scheme

We are now ready to state the construction more formally. The key generation, encryption, and decryption algorithms are as we described in the sketch (Section 5.1), and stated below.

If P\mathcal{P} is available, its output will be a feasible codeword for WSW_{S}. To see this, recall that if every user iSi\in S decrypts cjc_{j} to the same bit, then an available pirate decoder P(skS,)\mathcal{P}(\vec{sk}_{S},\cdot), decrypts cjc_{j} to that bit. However, the critical positions of WSW_{S} are exactly those for which every user iSi\in S has the same symbol in position jj. Thus, the codeword returned by the pirate is feasible, and the fingerprinting code’s tracing algorithm can identify a user in SS.

The catch in this argument is that TrEncTT\mathsf{TrEnc_{TT}} takes all of WW as input, however an attacker for the fingerprinting code is only allowed to see WSW_{S}, and thus cannot simulate TrEncTT\mathsf{TrEnc_{TT}} in a security reduction. However, if P\mathcal{P} only has keys skS\vec{sk}_{S}, and i∉Si\not\in S, then an efficient P\mathcal{P} cannot decrypt the ii-th component of a ciphertext c=(c(1),,c(n))c=(c^{(1)},\dots,c^{(n)}). But these are the only components that depend on w(i)w^{(i)}. So w(i)w^{(i)} is computationally hidden from P\mathcal{P} anyway, and we could replace that codeword with a string of zeros without significantly affecting the success probability of P\mathcal{P}. Formalizing this intuition will yield a valid attacker for the fingerprinting code, and obtain a contradiction.

the encryption scheme and fingerprinting code have sufficiently strong security,

the encryption scheme is secure for sufficiently many queries,

the encryption scheme is secure against adversaries whose running time is as long as the pirate decoder’s, for every a>0a>0,

Then ΠTT\Pi_{\mathsf{TT}} instantiated with ΠEnc\Pi_{\mathsf{Enc}} and ΠFP\Pi_{\mathsf{FP}} is an (n,kTT)(n,k_{\mathsf{TT}})-traitor-tracing scheme.

where the probability is also taken over the coins of P\mathcal{P} and TraceTT\mathsf{Trace_{TT}}. Since there are only n(κ)n(\kappa) such sets, for a randomly chosen i\mboxR[n(κ)]i\leftarrow_{\mbox{\tiny R}}[n(\kappa)], we have

Both of these probabilities are also taken over the coins of P\mathcal{P} and TraceTT\mathsf{Trace_{TT}}. We will show that such a pirate decoder must either violate the security of the encryption scheme or violate the security of the fingerprinting code.

Consider the following algorithm AFPP\mathcal{A}_{\mathsf{FP}}^{\mathcal{P}}

Since the fingerprinting code is secure, for a randomly chosen i\mboxR[n]i\leftarrow_{\mbox{\tiny R}}[n] (in fact, for every i[n]i\in[n]),

This condition could hold simply because AFP\mathcal{A}_{\mathsf{FP}} outputs an infeasible codeword with high probability, not because we are successfully tracing a user in SS. The next claim states that if P\mathcal{P} is an available pirate decoder, then this is not the case.

If P\mathcal{P} is kTTk_{\mathsf{TT}}-useful, then, by definition, for every sk=(sk(1),,sk(n))\vec{sk}=(sk^{(1)},\dots,sk^{(n)}), every i[n]i\subseteq[n], and every c1,,ckTTc_{1},\dots,c_{k_{\mathsf{TT}}}, if every user iii^{\prime}\neq i decrypts some cjc_{j} to the same bit bjb_{j}, then so does P(ski,)\mathcal{P}(\vec{sk}_{-i},\cdot) (with high probability). That is, for b^1,,b^kTT\mboxRP(ski,c1,,ckTT)\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}}\leftarrow_{\mbox{\tiny R}}\mathcal{P}(\vec{sk}_{-i},c_{1},\dots,c_{k_{\mathsf{TT}}}),

Since P\mathcal{P} outputs feasible codewords with high probability, we obtain

by combining the previous claim with (3).

There are only two differences between the success of the pirate decoder in fooling TraceTT\mathsf{Trace_{TT}} and the success of the fingerprinting adversary in fooling TraceFP\mathsf{Trace_{FP}} (in the experiment described in (5)): The first is that in the traitor-tracing security condition, P\mathcal{P} is given ski\vec{sk}_{-i} for a fixed i[n]i\in[n], whereas the fingerprinting adversary is given WiW_{-i} for a random i\mboxR[n]i\leftarrow_{\mbox{\tiny R}}[n]. This difference only affects the error by a factor of nn. That is, for every i[n]i\in[n]

The second difference is that in TraceTT\mathsf{Trace_{TT}}, the ciphertexts given to the pirate are generated by TrEncTT(sk,W)\mathsf{TrEnc_{TT}}(\vec{sk},W) whereas in AFP\mathcal{A}_{\mathsf{FP}} the ciphertexts are generated by TrEncTT(sk,W~i)\mathsf{TrEnc_{TT}}(\vec{sk},\widetilde{W}_{-i}). But these ciphertexts only differ in the ii-th component, and sk(i)sk^{(i)} is unknown to P\mathcal{P}, so this does not affect the behavior of the pirate decoder significantly. This fact is established in the following claim.

Let ΠEnc=(Gen,Enc,Dec)\Pi_{\mathsf{Enc}}=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec}) be the encryption scheme. The main observation required to prove the claim is that the two experiments we want to relate can both be simulated without sk(i)sk^{(i)}, given challenges for the encryption scheme (Definition 5.2). Fix a codebook W\mboxRGenFP(1n)W\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}_{\mathsf{FP}}(1^{n}). Now consider two distributions on ciphertexts (of ΠEnc\Pi_{\mathsf{Enc}}): In either case, generate a random key sk(i)\mboxRGen(1κ)sk^{(i)}\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}(1^{\kappa})

We now complete the proof of the theorem by combining Equation (5) and Claim 5.8.

Recall that the two goals of constructing a new traitor-tracing scheme were to trace stateful pirates and to reduce the complexity of decryption. We addressed tracing of stateful pirates in the previous section, and now we turn to the complexity of decryption. We do so by instantiating the traitor-tracing scheme with various encryption schemes and making two observations: 1) The type of encryption schemes we require are sufficiently weak that there already exist plausible candidates with a very simple decryption operation, and 2) Decryption for the traitor-tracing scheme is not much more complex than decryption for the underlying encryption scheme. We summarize the second point with the following simple lemma.

Let ΠTT\Pi_{\mathsf{TT}} be as defined, with ΠEnc\Pi_{\mathsf{Enc}} as its underlying encryption scheme. Let (sk,i)=sk{0,1}κ(\overline{sk},i)=sk\in\{0,1\}^{\kappa} and c=(c(1),,c(n))C(κ)c=(c^{(1)},\dots,c^{(n)})\in\mathcal{C}^{(\kappa)} be any user key and ciphertext for ΠTT\Pi_{\mathsf{TT}}. Then

Here, the function 1x(y)\mathbf{1}_{x}(y) takes the value 11 if y=xy=x and otherwise. The lemma follows directly from the construction of DecTT\mathsf{Dec}_{\mathsf{TT}}. Also note that the function 1i ⁣:{0,1}logn{0,1}\mathbf{1}_{i^{\prime}}\colon\{0,1\}^{\lceil\log n\rceil}\to\{0,1\} is just a conjunction of logn\lceil\log n\rceil bits (a single gate of fan-in O(logn)O(\log n)), and we need to compute nn of these functions. In addition to computing 1i\mathbf{1}_{i^{\prime}} and Decc(i)\mathsf{Dec}_{c^{(i^{\prime})}}, there are nn conjunctions and a single outer disjunction. Thus we add an additional n+1n+1 gates, compute decryption nn times, and increase the depth by 22. Hence, an intuitive summary of the lemma is that if Dec\mathsf{Dec} can be implemented by circuits of size ss and depth hh, DecTT\mathsf{Dec}_{\mathsf{TT}} can be implemented by circuits of size n(s+O(logn))=O~(ns)n\cdot(s+O(\log n))=\widetilde{O}(ns) and depth h+2h+2. This summary will be precise enough to state our main results.

By combining Lemma 5.9 with Theorem 5.6, we easily obtain the following corollary.

Theorem 1.1 in the introduction follows by combining Theorem 4.1 with Corollary 5.10.

We will now consider the possibility of constructing a traitor-tracing scheme where the decryption functionality can be implemented by circuits of constant depth, and thus obtaining hardness results for generic sanitizers that are efficient for constant-depth queries (Theorem 1.2). First, we summarize our observation that the traitor-tracing scheme almost preserves the depth of the decryption function.

The corollary is clear from Lemma 5.9 and Theorem 5.6.

The corollary is not interesting without an encryption scheme that can be decrypted by constant-depth circuits. However, we observe that such a scheme (meeting our relaxed security criteria) can be constructed from a sufficiently good local pseudorandom generator (PRG). A recent result of Applebaum [App12] gave the first plausible candidate construction of a local PRG for the range of parameters we need, giving plausibility to the assumption that such PRGs (and, as we show, traitor-tracing schemes with constant-depth decryption) exist. We note that local PRGs actually imply encryption schemes with local decryption, which is stronger than just constant-depth decryption. Although it may be significantly easier to construct encryption schemes that only have constant-depth decryption, we are not aware of any other ways of constructing such a scheme.

If, in addition, if each bit of the output depends only on some set of LL bits of the input, then G\mathsf{G} is a (εPRG,L)(\varepsilon_{\mathsf{PRG}},L)-local pseudorandom generator.

It is a well known result in Cryptography that pseudorandom generators imply encryption schemes satisfying Definition 5.2 (for certain ranges of parameters). We will use a particular construction whose decryption can be computed in constant-depth whenever the underlying PRG is locally-computable (or, more generally, computable by constant-depth circuits). The construction is the standard “computational one-time pad”, however we give a construction to verify that the decryption can be computed by constant-depth circuits.

The security follows from standard arguments: If we choose a random s\mboxR{0,1}κs\leftarrow_{\mbox{\tiny R}}\{0,1\}^{\kappa}, then G(s)\mathsf{G}(s) is indistinguishable from uniform up to error εPRG\varepsilon_{\mathsf{PRG}}. If we generate kEnck_{\mathsf{Enc}} encryptions with key ss, and no two encryptions use the same choice of rr, then the output is indistinguishable from encryptions using uniform random bits in place of G(s)\mathsf{G}(s). If we use uniform random bits in place of G\mathsf{G}, then the message is information-theoretically hidden. The probability that no two encryptions out of kEnck_{\mathsf{Enc}} use the same choice of rr is at most kEnc2/sPRGk_{\mathsf{Enc}}^{2}/s_{\mathsf{PRG}}, so we lose this term in the security of the encryption scheme.

Let 1i(j)\mathbf{1}_{i}(j) be the indicator variable for the condition j=ij=i. For every c=(r,b)Cc=(r,b)\in\mathcal{C}, we can write

Combining Corollary 5.11 with Lemma 5.13 easily yields the following corollary.

Theorem 1.2 in the introduction follows by combining Theorem 4.1 with Corollary 5.14.

Acknowledgements

We thank Cynthia Dwork for suggesting that we look further at the connection between traitor-tracing and differential privacy. We thank Salil Vadhan for helpful discussions about the connection between traitor-tracing and differential privacy, and about the presentation of this work. We also thank Dan Boneh, Moritz Hardt, Hart Montgomery, Ananth Raghunathan, Aaron Roth, Guy Rothblum, and Thomas Steinke for helpful discussions.

References

Appendix A Additional Related Work

In this appendix, we elaborate on the relationship between sanitizers, interactive sanitizers, and one-shot sanitizers, and on the relationship between our results and prior work on the complexity of differentially private sanitization.

Dwork, Naor, and Vadhan [DNV12] gave information theoretic lower bounds for stateless sanitizers. These are sanitizers that take kk queries as input, but whose answers to each query do not depend on the other k1k-1 input queries.

Another interpretation of our results, which can be used to give an alternative proof of our results, is that we construct a family of queries for which “keeping state doesn’t help”.

They consider a game where an nn-row database is chosen at random, and a random subset of n1n-1 of those rows is given to the attacker. The attacker wants to violate privacy by recovering the nn-th row. To do so, the attacker chooses n2\sim n^{2} queries (randomly, from a distribution that depends on the n1n-1 known rows) and requests answers to these queries. Using these answers, they show that there is a particular way for the attacker to (randomly) guess the missing row, that will succeed with sufficient probability to constitute a privacy violation. Their argument is in two steps: 1) The expected correlation between the answers given by a stateless sanitizer and the value of the queries on the missing row is significant. 2) A stateless sanitizer cannot give answers that are correlated with too many rows that are not in the database. Combining these two steps shows that the attacker has a significant chance of identifying the nn-th row from its correlation with the answers.

Typically, the intuition behind the analysis of traitor-tracing schemes follows roughly the same two steps: 1) There will be some correlation between the decryptions returned by the efficient pirate and the decryptions that would be returned by some member of the coalition (using only his own key). 2) There will not be significant correlation between the decryptions returned by the efficient pirate and the decryptions that would be returned by any user not a member of the coalition. This is exactly the intuition we sketched in the simpler (sub-optimal) construction. In the final construction, much of this argument is made “inside” the construction of the fingerprinting code. If we “unrolled” the analysis of the fingerprinting code directly into our construction, we would make exactly the same arguments.

Other Types of Sanitizers

The three variants we’ve described satisfy some interesting relationships. First, if we have an algorithm that runs in time TT and releases a summary that enables an analyst to answer any query in Q\mathcal{Q} in time TT, then we also have an interactive sanitizer that runs in time 2T2T per query that answers any sequence of kk queries from Q\mathcal{Q}. Secondly, if we have an interactive sanitizer that answers up to KK queries from Q\mathcal{Q} in time TT per query, then we also have a non-interactive sanitizer that answers any kKk\leq K queries from Q\mathcal{Q} in time TkTk. Thus, holding Q\mathcal{Q} fixed and assuming kn2k\gg n^{2}, the problem we consider is the easiest form of private counting query release, and the lower bounds we prove imply lower bounds for the other variants.

Indeed, in the data release problem one cannot take Q\mathcal{Q} to include all efficiently computable counting queries. Sanitizers (both interactive and non-interactive) are supposed to circumvent this problem by allowing the queries to be arbitrary, but only answering the kk queries that are needed. However our results show that they can only circumvent the problem if we allow superpolynomial computation or we take kn2k\lesssim n^{2}.