Answering n^{2+o(1)} Counting Queries with Differential Privacy is Hard

Jonathan Ullman

Introduction

Consider a database $D\in(\{0,1\}^{d})^{n}$ , in which each of the $n$ rows corresponds to an individual’s record, and each record consists of $d$ binary attributes. The goal of privacy-preserving data analysis is to enable rich statistical analyses on the database while protecting the privacy of the individuals. It is especially desirable to achieve differential privacy [DMNS06], which guarantees that no individual’s data has a significant influence on the information released about the database.

Some of the most basic statistics on a database are counting queries, which are queries of the form, “What fraction of individual records in $D$ satisfy some property $q$ ?” In particular we would like to construct differentially private sanitizers that, given a database $D$ and $k$ counting queries $q_{1},\dots,q_{k}$ from a family $\mathcal{Q}$ , outputs an approximate answer to each of the queries. We would like the number of queries, $k$ , to be as large as possible, and the set of feasible queries, $\mathcal{Q}$ , to be as general as possible. Ideally, $\mathcal{Q}$ , would contain all counting queries.It may require super-polynomial time just to evaluate an arbitrary counting query, which would rule out efficiency for reasons that have nothing to do with privacy. For this discussion, we restrict attention to queries that are efficiently computable, so are not the bottleneck in the computation. Moreover, we would like the algorithm to run as efficiently as possible.

Some of the earliest work in differential privacy [DMNS06] gave an efficient sanitizer—the so-called Laplace Mechanism. The Laplace Mechanism answers any set of $k$ arbitrary efficiently computable counting queries by perturbing the answers with appropriately calibrated random noise, providing good accuracy (say, within $\pm.01$ of the true answer) as long as $k\lesssim n^{2}$ .

Unfortunately, we show that this is not the case—there is no efficient, differentially private algorithm that takes a database $D\in(\{0,1\}^{d})^{n}$ , and $\widetilde{\Theta}(n^{2})$ arbitrary, efficiently computable counting queries as input and outputs an approximate answer to each of the queries. One way to summarize our results is that, unless we restrict the set $\mathcal{Q}$ of allowable queries, or allow exponential running time, then the Laplace Mechanism is essentially the best possible algorithm for answering counting queries with differential privacy.

We also show that, the same theorem holds even for queries that are computable by unbounded-fan-in circuits of depth $6$ over the basis $\{\land,\lor,\neg\}$ (a subset of the well-studied class $AC^{0}$ ), albeit under a stronger (but still plausible) cryptographic assumption.

Theorem 1.2 should be contrasted with the results of Hardt, Rothblum, and Servedio [HRS12] as well as Thaler, Ullman, and Vadhan [TUV12], which give efficient sanitizers for answering $n^{\Omega(\sqrt{k})}\gg n^{2}$ monotone $k$ -way conjunction queries, a much simpler class than polynomial-size depth-6 circuits.A monotone $k$ -way conjunction query on a database $D\in(\{0,1\}^{d})^{*}$ is specified by a set of positions $S\subseteq[d]$ , $|S|=k\leq d$ , and asks “What fraction of records in $D$ have a $1$ in every position in $S$ ?”.

We prove our results by building on the connection between differentially private sanitizers for counting queries and traitor-tracing schemes discovered by Dwork et al. [DNR+09]. Traitor-tracing schemes were introduced by Chor, Fiat, and Naor [CFN94] for the purpose of identifying pirates who violate copyright restrictions. Roughly speaking, a (fully collusion-resilient) traitor-tracing scheme allows a sender to generate keys for $n$ users so that 1) the sender can broadcast encrypted messages that can be decrypted by any user, and 2) any efficient pirate decoder capable of decrypting messages can be traced to at least one of the users who contributed a key to it, even if an arbitrary coalition of the users combined their keys in an arbitrary efficient manner to construct the decoder.

Dwork et al. show that the existence of traitor-tracing schemes implies hardness results for one-shot sanitizers. Very informally, they argue as follows: Suppose a coalition of users takes their keys and builds a database $D\in(\{0,1\}^{d})^{n}$ where each record contains one of their user keys. The family $\mathcal{Q}$ will contain a query $q_{c}$ for each possible ciphertext $c$ . The query $q_{c}$ asks “What fraction of the records (user keys) in $D$ would decrypt the ciphertext $c$ to the message $1$ ?” Every user can decrypt, so if the sender encrypts a message $m\in\{0,1\}$ as a ciphertext $c$ , then every user will decrypt $c$ to $m$ . Thus the answer to the counting query, $q_{c}$ , will be $m$ .

Suppose there were an efficient one-shot sanitizer for $\mathcal{Q}$ . Then the coalition could use it to efficiently produce a summary of the database $D$ that enables one to efficiently compute an approximate answer to every query $q_{c}$ , which would also allow one to efficiently decrypt the ciphertext. Such a summary can be viewed as an efficient pirate decoder, and thus the tracing algorithm can use the summary to trace one of the users in the coalition. However, if there is a way to identify one of the users in the database from the summary, then the summary is not differentially private.

In order to instantiate their result, they need a traitor-tracing scheme. Since $\mathcal{Q}$ contains a query for every ciphertext, the parameter to optimize is the length of the ciphertexts. Using the fully collusion-resilient traitor-tracing scheme of Boneh, Sahai, and Waters [BSW06], which has ciphertexts of length $\widetilde{O}(\sqrt{n})$ , they obtain a family of queries of size $2^{\widetilde{O}(\sqrt{n})}$ for which there is no efficient one-shot sanitizer. Dwork et al. also discovered a partial converse—proving hardness of one-shot sanitization for a smaller family of queries requires constructing traitor-tracing schemes with shorter ciphertexts, which is a seemingly difficult open problem.

Our Approach

In our setting of sanitization (rather than one-shot sanitization, as studied by Dwork et al. [DNR+09]), we don’t expect to answer every query in $\mathcal{Q}$ , only a much smaller set of queries requested by the analyst. At first glance, this should make answering the queries much easier, and thus make it more difficult to demonstrate hardness. However, the attacker does have the power to choose the queries which he wants answered, and can choose queries that are most difficult to sanitize. Our first observation is that in the traitor-tracing scenario, the tracing algorithms only query the pirate decoder on a polynomial number of ciphertexts, which are randomly chosen and depend on the particular keys that were instantiated for the scheme. For many schemes, even $\widetilde{O}(n^{2})$ queries is sufficient. Thus it would seem that the tracing algorithm could simply decide which queries it will make, give those queries as input to the sanitizer, and then use the answers to those queries to identify a user and violate differential privacy.

However, this intuition ignores an important issue. Many traitor-tracing schemes (including [BSW06]) can only trace stateless pirate decoders, which essentially commit to a response to each possible query (or a distribution over responses) once and for all. For one-shot sanitizers, the private summary is necessarily stateless, and thus the result of Dwork et al. can be instantiated with any scheme that allows tracing of stateless pirate decoders. However, an arbitrary sanitizer might give answers that depend on the sequence of queries. Thus, in order to prove our results, we will need a traitor-tracing scheme that can trace stateful pirate decoders.

The problem of tracing stateful pirates is quite natural even without the implications for private data analysis. Indeed, this problem has been studied in the literature, originally by Kiayias and Yung [KY01]. They considered pirates that can abort and record history. However, their solution, and all others known, does not apply to our specific setting due to a certain “watermarking assumption” that doesn’t apply when proving hardness-of-sanitization (see discussion below). To address this problem, we also refine the basic connection between traitor-tracing schemes and differential privacy by showing that, in many respects, fairly weak traitor-tracing schemes suffice to establish the hardness of preserving privacy. In particular, although the pirate decoder obtained from a sanitizer may be stateful and record history, the accuracy requirement of the sanitizer means that the corresponding pirate decoder cannot abort, which will make it easier to construct a traitor-tracing scheme for these kinds of pirates. Indeed, we construct such a scheme to establish Theorem 1.1.

The scheme also has weakened requirements in other respects, having nothing to do with the statefulness of the pirate or the tracing algorithm. These weakened requirements allow us to reduce the complexity of the decryption, which means that the queries used by the attacker do not need to be arbitrary polynomial-size circuits, but instead can be circuits of constant depth, which allows us to establish Theorem 1.2. Another technical issue arises in that all $k$ queries must be given to the sanitizer at once, whereas tracing algorithms typically are allowed to query the pirate interactively. However, we are able to show that the scheme we construct can be traced using one round of queries. See Sections 3.1 and 4 for a precise statement of the kind of traitor-tracing scheme that suffices and Section 5 for our construction.

Our construction is based on a well-known fully collusion resilient traitor-tracing scheme [CFN94], but with a modified tracing algorithm. The tracing algorithm uses fingerprinting codes [BS98, Tar08], which have been employed before in the context of traitor-tracing and content distribution, but our tracing algorithm is different from all those we are aware of. The resulting scheme allows for tracing with a minimal number of non-adaptively chosen queries, achieves tracing without context-specific watermarking assumptions, simplifies the decryption circuit (at the expense of weakening the security parameters and functionality). The restriction to non-aborting pirates may not be so natural in the setting of content distribution, which may explain why the scheme was not previously known.

2 Related Work

In addition to the hardness results for one-shot sanitizations [DNR+09], which apply to arbitrary one-shot sanitizers, there are several hardness-of-sanitization results for restricted classes of sanitizers. Dwork et al. proved stronger hardness results for sanitizers whose output is a “synthetic database”—roughly, a new database (of the same dimensions) that approximately preserves the answer to some set of queries. Their results were extended by Ullman and Vadhan [UV11], who showed that it is hard to generate a private synthetic database that is accurate for essentially any non-trivial family of queries, even $2$ -literal conjunctions.

Gupta et al. [GHRU11] considered algorithms that can only access the database by making a polynomial number of “statistical queries” (essentially counting queries). They showed that such algorithms cannot be a one-shot sanitizer (even ignoring privacy constraints) that approximately answers certain simple families of counting queries with high accuracy.

Finally, Dwork, Naor, and Vadhan [DNV12] gave information-theoretic lower bounds for stateless sanitizers, which take $k$ queries as input, but whose answers to each query do not depend on the other $k-1$ input queries. They showed that (even computationally unbounded) stateless sanitizers can answer at most $\widetilde{O}(n^{2})$ queries with non-trivial accuracy, while satisfying differential privacy. The Laplace Mechanism is a stateless sanitizer that answers $\widetilde{\Omega}(n^{2})$ queries, and thus their result is tight in this respect. Although their result is information theoretic, and considers a highly restricted type of sanitizer, their techniques are related to ours. We elaborate on this connection in the appendix.

Preliminaries

Let a database $D\in(\{0,1\}^{d})^{n}$ be a collection of $n$ rows $(x^{(1)},\dots,x^{(n)})\in\{0,1\}^{d}$ . We say that two databases $D,D^{\prime}\in(\{0,1\}^{d})^{n}$ are adjacent if they differ only on a single row, and we denote this by $D\sim D^{\prime}$ .

Let $\mathcal{M}\colon(\{0,1\}^{d})^{n}\to\mathcal{R}$ be a randomized algorithm that takes a database as input (where $n$ and $d$ are varying parameters). $\mathcal{M}$ is $(\varepsilon,\delta)$ -differentially private if for every two adjacent databases $D\sim D^{\prime}$ and every subset $S\subseteq\mathcal{R}$ ,

If $\mathcal{M}$ is $(\varepsilon,\delta)$ -differentially private for some functions $\varepsilon=\varepsilon(n)=O(1)$ , $\delta=\delta(n)=o(1/n)$ , we will drop the parameters $\varepsilon$ and $\delta$ and say that $\mathcal{M}$ is differentially private.

The choice of $\varepsilon=O(1),\delta=o(1/n)$ is a fairly weak setting of the privacy parameters, and most known constructions of differentially private mechanisms for various tasks give quantitatively stronger privacy guarantees. These parameters are essentially the weakest possible, as $(\varepsilon,\delta)$ -differentially privacy is not a satisfactory privacy guarantee for $\varepsilon=\omega(1)$ or $\delta=\Omega(1/n)$ . That our lower bounds apply to the parameters specified in Definition 2.1 makes our results stronger.

Sanitizers for Counting Queries

Since an algorithm that always outputs $\bot$ satisfies Definition 2.1, we also need to specify what it means for the sanitizer to be useful. In this paper we study sanitizers that give accurate answers to counting queries. A counting query on $\{0,1\}^{d}$ is defined by a predicate $q\colon\{0,1\}^{d}\to\{0,1\}$ . Abusing notation, we define the evaluation of the query $q$ on a database $D=(x^{(1)},\dots,x^{(n)})\in(\{0,1\}^{d})^{n}$ to be

Now we formally define what it means for a sanitizer to give accurate answers.

Let $D$ be a database and $q_{1},\dots,q_{k}$ be a set of counting queries. A sequence of answers $a_{1},\dots,a_{k}$ is $\alpha$ -accurate for $q_{1},\dots,q_{k}$ on $D$ if

If $\mathcal{M}$ is $(\alpha,\beta,\mathcal{Q},k)$ -accurate for any (constant) $\alpha<1/2$ and $\beta=\beta(n)=o(1/n^{2})$ , we will drop $\alpha$ and $\beta$ and say that $\mathcal{M}$ is $(\mathcal{Q},k)$ -accurate.

The choice of $\alpha<1/2,\beta=o(1/n^{2})$ is also significantly weaker than what can be achieved by known constructions of sanitizers. These parameters are also essentially the weakest parameters possible, as a mechanism that answers $1/2$ to every query achieves $\alpha=1/2$ , $\beta=0$ for any number of arbitrary queries. Also, if there is a mechanism that achieves $(\alpha,\beta,\mathcal{Q},k)$ -accuracy for $\beta<1/2$ , then there is another mechanism that achieves $(\alpha,o(1/n^{2}),\mathcal{Q},k)$ -accuracy with only an $O(\log n)$ loss in the privacy parameters and the efficiency of the mechanism.Given a sanitizer $\mathcal{M}$ that answers every query accurately with probability $1/2+\Omega(1)$ , one can obtain a mechanism $\mathcal{M}^{\prime}$ that answers every query accurately with probability $1-\beta$ . $\mathcal{M}^{\prime}$ will run $\mathcal{M}$ independently $r=O(\log(1/\beta))$ times and answers each query with the median of the $r$ answers for that query. That our lower bound applies to the parameters specified in Definition 2.2 makes our results stronger.

Efficiency of Sanitizers

Simply, a sanitizer is efficient if it runs in time polynomial in the length of its input. To make the statement more precise, we need to specify how the queries are given to the sanitizer as input.

For comparison with our results, we will recall the properties of some known mechanisms, stated in our terminology and for our choice of parameters:

There exists a sanitizer $\mathcal{M}_{\mathsf{Lap}}$ that is 1) differentially private, 2) efficient, and 3) $(\mathcal{Q}^{(d)}_{\mathsf{all}},\widetilde{\Omega}(n^{2}))$ -accurate.

Traitor-Tracing Schemes

In this section we give define a traitor-tracing scheme. Throughout, we will use $\mathsf{A_{TT}}$ to denote algorithms associated with traitor-tracing schemes.

We now give a definition of a traitor-tracing scheme, heavily tailored to the task of proving hardness results for generic sanitizers. We will sacrifice some consistency with the standard definitions. See below for further discussion of the ways in which our definition departs from the standard definition of traitor-tracing. In some cases, the non-standard aspects of the definition will be necessary to establish our results, and in others it will be for convenience. Despite these differences, we will henceforth refer to schemes satisfying our definition simply as traitor-tracing schemes.

Intuitively, a traitor-tracing scheme is a form of broadcast encryption, in which a sender can broadcast an encrypted message that can be decrypted by each of a large set of users. The standard notion of security for such a scheme would require that an adversary that doesn’t have any of the keys cannot decrypt the message. A traitor-tracing scheme has the additional property that given any decoder capable of decrypting the message (which must in a very loose sense “know” at least one of the keys), there is a procedure for determining which user’s key is being used. Moreover, we want the scheme to be “collusion resilient,” in that even if a coalition of users gets together and combines their keys in some way to produce a decoder, there is still a procedure that identifies at least one member of the coalition.

The algorithm $\mathsf{Gen}_{\mathsf{TT}}$ takes a security parameter, $\kappa$ , and returns a sequence of $n=n(\kappa)$ user keys $\vec{sk}=(sk^{(1)},\dots,sk^{(n)})\in\{0,1\}^{\kappa}$ . Formally, $\vec{sk}=(sk^{(1)},\dots,sk^{(n)})\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}_{\mathsf{TT}}(1^{\kappa})$ .

The algorithm $\mathsf{Enc}_{\mathsf{TT}}$ takes a sequence of $n$ user keys $\vec{sk}$ and a message bit $b\in\{0,1\}$ , and generates a ciphertext $c\in\mathcal{C}=\mathcal{C}^{(\kappa)}$ . Formally, $c\leftarrow_{\mbox{\tiny R}}\mathsf{Enc}_{\mathsf{TT}}(\vec{sk},b)$ .

The algorithm $\mathsf{Trace_{TT}}$ takes as input a set of user keys $\vec{sk}\in(\{0,1\}^{\kappa})^{n(\kappa)}$ and an oracle $\mathcal{P}\colon(\mathcal{C}^{(\kappa)})^{k_{\mathsf{TT}}(\kappa)}\to\{0,1\}^{k_{\mathsf{TT}}(\kappa)}$ , makes one $k_{\mathsf{TT}}$ -tuple of queries, $(c_{1},\dots,c_{k_{\mathsf{TT}}})\in\mathcal{C}^{(\kappa)}$ to its oracle ( $k_{\mathsf{TT}}=k_{\mathsf{TT}}(\kappa)$ ), and returns the name of a user $i\in[n(\kappa)]$ . Formally, $i\leftarrow_{\mbox{\tiny R}}\mathsf{Trace_{TT}^{\mathcal{P}}}(\vec{sk})$ .

Intuitively, think of the oracle $\mathcal{P}$ as being given some subset of keys $\vec{sk}_{S}=(sk^{(i)})_{i\in S}$ for a non-empty set $S\subseteq[n]$ , and $\mathsf{Trace_{TT}}$ is attempting to identify a user $i\in S$ . Clearly, if $\mathcal{P}$ ignores its input and always returns , $\mathsf{Trace_{TT}}$ cannot have any hope of success, so we must assume that $\mathcal{P}$ is capable of decrypting ciphertexts.

In other words, if every user key $sk^{(i)}$ (for $i\in S$ ) decrypts $c$ to $1$ (resp. ), then $\mathcal{P}(\vec{sk}_{S},\cdot)$ decrypts $c$ to $1$ (resp. ), with high probability.

We can now define a secure, $(n,k_{\mathsf{TT}})$ -traitor-tracing scheme:

The traitor-tracing schemes we consider are somewhat different than those previously studied in the literature.

We do not require the encryption or tracing algorithms to use public keys. In the typical application of traitor-tracing schemes to content distribution, these would be desirable features, however they are not necessary for proving hardness of sanitization.

We only require that the tracing algorithm succeeds with probability $1-o(1/n)$ . Typically one would require that the tracing algorithm succeeds with probability $1-n^{-\omega(1)}$ .

We do not give the pirate decoder access to an encryption oracle. In other words, we do not require CPA security. Most traitor-tracing schemes in the literature are public-key, making this distinction irrelevant. Here, we only need an encryption scheme that is secure for an a priori bounded number of messages.

We allow the pirate decoder to be stateful, but in an unusual way. We require (roughly) that if any of the queries are ciphertexts generated by $\mathsf{Enc}(\vec{sk},b)$ , then the pirate decoder answers $b$ to those queries, regardless of the other queries issued. In many models, the pirate is allowed to abort, and answer $\bot$ if it detects that it is being traced. However, we do allow our pirate to correlate its answers to different queries, subject to this accuracy constraint. We also allow the pirate to see all the queries made by the tracer at once, which is more power than is typically given to the pirate.

Roughly, the first three modifications will allow us to find a candidate scheme with very simple decryption and the fourth modification will allow us to trace stateful pirates even in the setting of bit-encryption.

2 Decryption Function Families

For Theorem 1.2, we are interested in traitor-tracing schemes where $\mathsf{Dec}_{\mathsf{TT}}$ is a “simple” function of the user key (for every ciphertext $c\in\mathcal{C}$ ).

Let $(\mathsf{Gen}_{\mathsf{TT}},\mathsf{Enc}_{\mathsf{TT}},\mathsf{Dec}_{\mathsf{TT}})$ be a traitor-tracing scheme where $\mathsf{Gen}_{\mathsf{TT}}$ produces keys in $\{0,1\}^{\kappa}$ and $\mathsf{Enc}_{\mathsf{TT}}$ produce ciphertexts in $\mathcal{C}=\mathcal{C}^{(\kappa)}$ . For every $c\in\mathcal{C}$ , we define the $c$ -decryption function $q_{c}\colon\{0,1\}^{\kappa}\to\{0,1\}$ to be $q_{c}(sk)=\mathsf{Dec}_{\mathsf{TT}}(sk,c)$ . We define the decryption function family $\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(\kappa)}=\left\{q_{c}\right\}_{c\in\mathcal{C}^{(\kappa)}}$ .

In what follows, we will say that $\Pi_{\mathsf{TT}}$ is an traitor-tracing scheme with decryption function family $\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(\kappa)}$ .

Attacking Efficient Sanitizers

In this section we will prove our main result, showing that the existence of traitor-tracing schemes (as in Definition 3.2) implies that efficient sanitizers cannot answer too many counting queries while satisfying differential privacy.

The main difference between Theorem 4.1 and the result of Dwork et al. [DNR+09] is that we only assume the existence of a sanitizer for $k_{\mathsf{TT}}(d)$ queries from $\mathcal{Q}^{(d)}=\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}}^{(d)}$ , whereas Dwork et al. assume the existence of a one-shot sanitizer that answers every query in $\mathcal{Q}^{(d)}$ . To offset the weaker assumption on the sanitizer, we assume that the traitor-tracing scheme is secure against certain stateful pirate decoders (as in Definition 3.1) whereas Dwork et al. only need to trace stateless pirates. Theorem 4.1 also explicitly allows the traitor-tracing scheme to have the relaxed functionality and security properties discussed at the end of Section 3, although it is implicit in Dwork et al. that the relaxed properties are sufficient to prove hardness results.

We now sketch the proof: Every function $q_{c}\in\mathcal{Q}^{(d)}$ is viewed as a query $q_{c}(x)$ on a database row $x\in\{0,1\}^{d}$ . Assume there is an efficient sanitizer is that is $(\mathcal{Q}^{(d)},k_{\mathsf{TT}}(d))$ -accurate for these queries. The fact that $\mathcal{M}$ is accurate for these queries will imply that (after small modifications) $\mathcal{M}$ is a $k_{\mathsf{TT}}$ -available pirate decoder (Definition 3.1). Here is where we differ from Dwork et al., who assume that $\mathcal{M}$ accurately answers all queries in $\mathcal{Q}^{(d)}$ , in which case $\mathcal{M}$ can be viewed as a stateless pirate decoder (but must solve a harder sanitization problem).

We complete the proof as in Dwork et al. Consider two experiments: In the first, we construct an $n$ -row database $D$ by running $\mathsf{Gen}_{\mathsf{TT}}(1^{d})$ to obtain $n$ user keys, and putting one in each row of $D$ . Then we run $\mathsf{Trace_{TT}}$ on $\mathcal{M}(D,\cdot)$ and obtain a user $i$ . Since $\mathcal{M}$ is useful, and $\Pi_{\mathsf{TT}}$ is secure, we will have that $i\in[n]$ with probability close to $1$ , and thus there is an $i^{*}\in[n]$ such that $i=i^{*}$ with probability $\gtrsim 1/n$ .

In the second experiment, we construct a database $D^{\prime}$ exactly as in the first, however we exclude the key $sk^{(i^{*})}$ . Since $D$ and $D^{\prime}$ differ in only one row, differential privacy requires that $\mathsf{Trace_{TT}}$ , run with oracle $\mathcal{M}(D^{\prime},\cdot)$ , still outputs $i^{*}$ with probability $\Omega(1/n)$ . However, in this experiment, $i^{*}$ , $sk^{(i^{*})}$ is no longer given to the pirate decoder, and thus security of $\Pi_{\mathsf{TT}}$ says that $\mathsf{Trace_{TT}}$ , run with this oracle, must output $i^{*}$ with probability $o(1/n)$ . Thus, we will obtain a contradiction.

Let $\Pi_{\mathsf{TT}}=(\mathsf{Gen}_{\mathsf{TT}},\mathsf{Enc}_{\mathsf{TT}},\mathsf{Dec}_{\mathsf{TT}},\mathsf{Trace_{TT}})$ be the assumed traitor-tracing scheme, and assume there exists an efficient, differentially private, $(\mathcal{Q}^{(d)},k_{\mathsf{TT}}(d))$ -accurate sanitizer $\mathcal{M}$ . We define the pirate decoder $\mathcal{P}_{\mathcal{M}}$ as follows:

Next, we claim that if $\mathcal{M}$ is accurate for $\mathcal{Q}^{(d)}$ , then $\mathcal{P}_{\mathcal{M}}$ is a useful pirate decoder.

If $\mathcal{M}$ is $(\mathcal{Q}_{\mathsf{Dec}_{\mathsf{TT}}},k_{\mathsf{TT}})$ -accurate, then $\mathcal{P}_{\mathcal{M}}$ is a $k_{\mathsf{TT}}$ -useful pirate decoder.

Let $\vec{sk}\in\{0,1\}^{d}$ be a set of user keys for $\Pi_{\mathsf{TT}}$ and let $S\subseteq[n]$ be a subset of the users such that $|S|\geq n-1$ . Suppose $c\in\mathcal{C}^{(d)}$ and $b\in\{0,1\}$ are such that for every $i\in S$ , $\mathsf{Dec}_{\mathsf{TT}}(sk^{(i)},c)=b$ . Then we have that, for $D$ as in $\mathcal{P}_{\mathcal{M}}$ ,

Let $c_{1},\dots,c_{k_{\mathsf{TT}}}$ be a set of ciphertexts, $q_{c_{1}},\dots,q_{c_{k_{\mathsf{TT}}}}$ and $a_{1},\dots,a_{k_{\mathsf{TT}}}$ be as in $\mathcal{P}_{\mathcal{M}}$ . The accuracy of $\mathcal{M}$ (with constant error $\alpha<1/2$ ) guarantees that

Since $|S|\geq n-1$ , $o(1/|S|^{2})=o(1/n^{2})$ . Assuming $a_{1},\dots,a_{k_{\mathsf{TT}}}$ is accurate up to error $\alpha<1/2$ for $q_{c_{1}},\dots,q_{c_{k_{\mathsf{TT}}}}$ , $a_{j}$ will be rounded to exactly $q_{c_{j}}$ whenever $q_{c_{j}}(D)\in\{0,1\}$ . That is,

Thus, $\mathcal{P}_{\mathcal{M}}$ is $k_{\mathsf{TT}}$ -useful. This completes the proof of the claim. ∎

Let $S(\kappa)=[n(\kappa)]\setminus\{i^{*}(\kappa)\}$ Now we claim that if $\mathcal{M}$ is differentially private, then $\mathsf{Trace_{TT}}$ will output $i^{*}(\kappa)$ with significant probability, even $\mathcal{P}_{\mathcal{M}}$ is not given the key of user $i^{*}(\kappa)$ .

If $\mathcal{M}$ is differentially private (for $\varepsilon=O(1)$ , $\delta=o(1/n)$ ), then

Fix any $\kappa$ and let $k_{\mathsf{TT}}=k_{\mathsf{TT}}(\kappa)$ and $i^{*}=i^{*}(\kappa)$ , $S=S(\kappa)$ as above. Let $D=\vec{sk}$ and $D_{-i^{*}}=\vec{sk}_{S}$ . Take $T$ to be the set of responses $\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}}$ such that $\mathsf{Trace_{TT}}(\vec{sk})$ , after querying its oracle on ciphertexts $c_{1},\dots,c_{k_{\mathsf{TT}}}$ and receiving responses $\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}}$ , outputs $i^{*}$ ( $T$ depends on the coins of $\mathsf{Gen}_{\mathsf{TT}}$ and $\mathsf{Trace_{TT}}$ ). By differential privacy, we have that

Note that the queries constructed by $\mathcal{P}_{\mathcal{M}}$ depends only on $c_{1},\dots,c_{k_{\mathsf{TT}}}$ , not on $\vec{sk}_{S}$ . Also note that the final rounding step does not depend on the input at all. Thus, for every $T\subseteq\{0,1\}^{k_{\mathsf{TT}}}$

The claim follows by combining with (1). ∎

To complete the proof, notice that the probability in Claim 4.3 is exactly the probability that $\mathsf{Trace_{TT}}$ outputs the user $i^{*}$ , when given the oracle $\mathcal{P}_{\mathcal{M}}(\vec{sk}_{S})$ , for $S=[n]\setminus\left\{i^{*}\right\}$ . However, the fact that $\mathcal{P}_{\mathcal{M}}$ is efficient, and $\Pi_{\mathsf{TT}}$ is a secure traitor-tracing scheme implies that this probability is $o(1/n)$ . Thus we have obtained a contradiction. This completes the proof of the Theorem. ∎

Constructions of Traitor-Tracing Schemes

In this section we show how to construct traitor-tracing schemes that satisfy Definition 3.2, and thus can be used to instantiate Theorem 4.1. First we will informally describe a simple construction that requires the tracing algorithm to make a sub-optimal number of queries, but will hopefully give the reader more intuition about the construction and how it differs from previous constructions of traitor-tracing schemes. Then we will give precise definitions of the encryption schemes (Section 5.2) and fingerprinting codes (Section 5.3) required for our construction. Then we will present the final construction more formally (Section 5.4) and prove its security. Finally, we will use the weakened security requirements of the encryption scheme to show that our traitor-tracing scheme can be instantiated so that decryption is computable by constant-depth circuits (Section 5.6).

Our construction is a variant of the most basic tracing traitor-tracing scheme [CFN94]. Start with any encryption scheme $(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$ . Generate an independent key $sk^{(i)}\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}$ for each user (we will ignore the security parameter in the informal description). To encrypt a bit $b\in\{0,1\}$ , we encrypt it under each user’s key independently and concatenate the ciphertexts. That is

Clearly each user can decrypt the ciphertext by applying $\mathsf{Dec}$ , as long as she knows which part of the ciphertext to decrypt.

Now we describe how an available pirate decoder for this scheme can be traced. As with all traitor-tracing schemes, we will form ciphertexts that different users would decrypt differently, assuming they decrypt as intended using the algorithm $\mathsf{Dec}_{\mathsf{TT}}(sk^{(i)},\cdot)$ . We can do so with the following algorithm:

for $i=0,1,\dots,n$ . The algorithm forms a ciphertext that users $1,\dots,i$ will decrypt to and users $i+1,\dots,n$ will decrypt to $1$ .

The tracing algorithm generates a random sequence $i_{1},\dots,i_{k_{\mathsf{TT}}}\in\left\{0,1,\dots,n\right\}$ , for $k_{\mathsf{TT}}=(n+1)s$ , such that each element of $\left\{0,1,\dots,n\right\}$ appears exactly $s$ times, where $s$ is a parameter to be chosen later. Then, for every $j$ it generates a ciphertext $c_{j}\leftarrow_{\mbox{\tiny R}}\mathsf{TrEnc_{TT}}(\vec{sk},i_{j})$ . Next, it queries $\mathcal{P}_{\vec{sk}_{S}}(c_{1},\dots,c_{k_{\mathsf{TT}}})$ . Given the output of the pirate, the tracing algorithm computes

for $i=0,1,\dots,n$ . Finally, the tracing algorithm outputs any $i^{*}$ such that $P_{i^{*}}-P_{i^{*}-1}\geq 1/n$ .

The tracing algorithm generates a random sequence of indices $i_{1},\dots,i_{k_{\mathsf{TT}}}\in\left\{0,1,\dots,n\right\}$ , for $k_{\mathsf{TT}}=(n+1)s$ , such that each element of $\left\{0,1,\dots,n\right\}$ appears exactly $s$ times, where $s$ is a parameter to be chosen later. Then, for every $j$ it generates a ciphertext $c_{j}\leftarrow_{\mbox{\tiny R}}\mathsf{TrEnc_{TT}}(\vec{sk},i_{j})$ . Next, it queries $\mathcal{P}_{\vec{sk}_{S}}(c_{1},\dots,c_{k_{\mathsf{TT}}})$ . Given the output of the pirate, the tracing algorithm computes $P_{i}=\frac{1}{s}\sum_{j:i_{j}=i}\mathcal{P}(\vec{sk},c_{1},\dots,c_{k_{\mathsf{TT}}})_{j}$ for $i=0,1,\dots,n$ . Finally, the tracing algorithm outputs any $i^{*}$ such that $P_{i^{*}}-P_{i^{*}-1}\geq 1/n$ .

Now we explain why this algorithm successfully traces efficient available pirate decoders. Notice that if we choose $c$ according to $\mathsf{TrEnc_{TT}}(\vec{sk},0)$ , then every user decrypts $c$ to , so $P_{0}=0$ . Similarly, $P_{n}=1$ . Thus, there exists $i^{*}$ such that $P_{i^{*}}-P_{i^{*}-1}\geq 1/n$ . Next, we argue that $i^{*}$ is in $S$ except with small probability. Notice that $\mathsf{TrEnc_{TT}}(\vec{sk},i^{*})$ and $\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}-1)$ differ only in the message encrypted under key $sk^{(i^{*})}$ , so if $i^{*}\not\in S$ , this key is unknown to the pirate decoder. The security of the encryption scheme (made precise in Definition 5.2) guarantees that if $sk^{(i^{*})}$ is unknown to an efficient pirate, then we can replace $k_{\mathsf{TT}}$ uses of $\mathsf{Enc}(sk^{(i^{*})},1)$ with $\mathsf{Enc}(sk^{(i^{*})},0)$ , and this change will only affect the success probability of the pirate by $o(1/n)$ . But after we make this replacement, $\mathsf{TrEnc_{TT}}(\vec{sk},i^{*})$ and $\mathsf{TrEnc_{TT}}(\vec{sk},i^{*}-1)$ are (perfectly, information-theoretically) indistinguishable to the pirate. Since the sequence of indices $i_{1},\dots,i_{k_{\mathsf{TT}}}$ is random, the pirate has no information about which elements $i_{j}$ are $i^{*}$ and which are $i^{*}-1$ . Thus, if the pirate wants to make $P_{i^{*}}$ larger than $P_{i^{*}-1}$ , for some $i^{*}\not\in S$ , she can do no better than to “guess”. If we take $s=\widetilde{O}(n^{2})$ , and apply a Chernoff bound, it turns out that for every $i\not\in S$ , $P_{i}-P_{i-1}=o(1/n)$ . This conclusion also holds after we take into account the security loss of the encryption scheme, which is $o(1/n)$ . Thus, the scheme we described is a secure traitor-tracing scheme in the sense of Definition 3.2.

In arguing that the scheme is secure, we used the fact that $P_{0}=0$ and $P_{n}=1$ no matter what other queries are made to the pirate. In many applications, this assumption would not be reasonable. However, when the pirate is derived from an accurate sanitizer, this condition will be satisfied.

For this scheme, the tracer makes $(n+1)s=\widetilde{O}(n^{3})$ queries. Before proceeding, we will explain how to reduce the number of queries from $\widetilde{O}(n^{3})$ to $\widetilde{O}(n^{2})$ . The high-level argument that the scheme is secure used two facts:

By the availability of the pirate decoder, if every user in $S$ would decrypt a ciphertext $c$ to $b$ , then the pirate decrypts $c$ to $b$ (in the above, $P_{0}=0,P_{n}=1$ ).

Because of the encryption, a pirate decoder without user $i$ ’s key “doesn’t know” how user $i$ would decrypt each ciphertext.

Systems leveraging these two properties to identify a colluding user are called fingerprinting codes [BS98], and have been studied extensively. In fact, the tracing algorithm we described is identical to the tracing algorithm we define in Section 5.4, but instantiated with the fingerprinting code of Boneh and Shaw [BS98], which has length $\widetilde{O}(n^{3})$ . Tardos [Tar08] constructed shorter fingerprinting codes, with length $\widetilde{O}(n^{2})$ , which we use to reduce the number of queries to trace.

Next we define the precise security requirement we need out of the underlying encryption scheme, and then we will give a formal definition of fingerprinting codes.

2 Encryption Schemes

We will build our traitor-tracing scheme from a suitable encryption scheme. An encryption scheme is tuple of efficient algorithms $(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$ . All the algorithms may be randomized except for $\mathsf{Dec}$ . The scheme has the following syntactic properties:

First we define (perfectly) correct decryptionIt would not substantially affect our results if $\mathsf{Dec}$ were allowed to fail with negligible probability, however we will assume perfect correctness for ease of presentation.

We require that our schemes have the following $k_{\mathsf{Enc}}$ -message security property.

Notice that we do not require $\Pi_{\mathsf{Enc}}$ to be secure against adversaries that are given $\mathsf{Enc}(sk,\cdot)$ as an oracle. That is, we do not require CPA security.

3 Fingerprinting Codes

As we alluded to above, our tracing algorithm will use a fingerprinting code, introduced by Boneh and Shaw [BS98]. A fingerprinting code is a pair of efficient (possibly randomized) algorithms $(\mathsf{Gen}_{\mathsf{FP}},\mathsf{Trace_{FP}})$ with the following syntax.

Informally, if all users in $S$ have a (resp. $1$ ) in the $j$ -th symbol of their codeword, then they must produce a word with (resp. $1$ ) as the $j$ -th symbol. We also define the critical positions to be the set of indices for which this constraint is binding. That is,

The security of a fingerprinting code asserts that an adversary who is given a subset $W_{S}$ of the codewords should not be able to produce an element of $F(W_{S})$ that does not trace to a user $i\in S$ . More formally,

where the two executions of $\mathcal{A}_{\mathsf{FP}}$ are understood to be the same.

Tardos [Tar08] gave a construction of fingerprinting codes of essentially optimal length, improving on the original construction of Boneh and Shaw [BS98].

4 The Traitor-Tracing Scheme

We are now ready to state the construction more formally. The key generation, encryption, and decryption algorithms are as we described in the sketch (Section 5.1), and stated below.

If $\mathcal{P}$ is available, its output will be a feasible codeword for $W_{S}$ . To see this, recall that if every user $i\in S$ decrypts $c_{j}$ to the same bit, then an available pirate decoder $\mathcal{P}(\vec{sk}_{S},\cdot)$ , decrypts $c_{j}$ to that bit. However, the critical positions of $W_{S}$ are exactly those for which every user $i\in S$ has the same symbol in position $j$ . Thus, the codeword returned by the pirate is feasible, and the fingerprinting code’s tracing algorithm can identify a user in $S$ .

The catch in this argument is that $\mathsf{TrEnc_{TT}}$ takes all of $W$ as input, however an attacker for the fingerprinting code is only allowed to see $W_{S}$ , and thus cannot simulate $\mathsf{TrEnc_{TT}}$ in a security reduction. However, if $\mathcal{P}$ only has keys $\vec{sk}_{S}$ , and $i\not\in S$ , then an efficient $\mathcal{P}$ cannot decrypt the $i$ -th component of a ciphertext $c=(c^{(1)},\dots,c^{(n)})$ . But these are the only components that depend on $w^{(i)}$ . So $w^{(i)}$ is computationally hidden from $\mathcal{P}$ anyway, and we could replace that codeword with a string of zeros without significantly affecting the success probability of $\mathcal{P}$ . Formalizing this intuition will yield a valid attacker for the fingerprinting code, and obtain a contradiction.

the encryption scheme and fingerprinting code have sufficiently strong security,

the encryption scheme is secure for sufficiently many queries,

the encryption scheme is secure against adversaries whose running time is as long as the pirate decoder’s, for every $a>0$ ,

Then $\Pi_{\mathsf{TT}}$ instantiated with $\Pi_{\mathsf{Enc}}$ and $\Pi_{\mathsf{FP}}$ is an $(n,k_{\mathsf{TT}})$ -traitor-tracing scheme.

where the probability is also taken over the coins of $\mathcal{P}$ and $\mathsf{Trace_{TT}}$ . Since there are only $n(\kappa)$ such sets, for a randomly chosen $i\leftarrow_{\mbox{\tiny R}}[n(\kappa)]$ , we have

Both of these probabilities are also taken over the coins of $\mathcal{P}$ and $\mathsf{Trace_{TT}}$ . We will show that such a pirate decoder must either violate the security of the encryption scheme or violate the security of the fingerprinting code.

Consider the following algorithm $\mathcal{A}_{\mathsf{FP}}^{\mathcal{P}}$

Since the fingerprinting code is secure, for a randomly chosen $i\leftarrow_{\mbox{\tiny R}}[n]$ (in fact, for every $i\in[n]$ ),

This condition could hold simply because $\mathcal{A}_{\mathsf{FP}}$ outputs an infeasible codeword with high probability, not because we are successfully tracing a user in $S$ . The next claim states that if $\mathcal{P}$ is an available pirate decoder, then this is not the case.

If $\mathcal{P}$ is $k_{\mathsf{TT}}$ -useful, then, by definition, for every $\vec{sk}=(sk^{(1)},\dots,sk^{(n)})$ , every $i\subseteq[n]$ , and every $c_{1},\dots,c_{k_{\mathsf{TT}}}$ , if every user $i^{\prime}\neq i$ decrypts some $c_{j}$ to the same bit $b_{j}$ , then so does $\mathcal{P}(\vec{sk}_{-i},\cdot)$ (with high probability). That is, for $\widehat{b}_{1},\dots,\widehat{b}_{k_{\mathsf{TT}}}\leftarrow_{\mbox{\tiny R}}\mathcal{P}(\vec{sk}_{-i},c_{1},\dots,c_{k_{\mathsf{TT}}})$ ,

Since $\mathcal{P}$ outputs feasible codewords with high probability, we obtain

by combining the previous claim with (3).

There are only two differences between the success of the pirate decoder in fooling $\mathsf{Trace_{TT}}$ and the success of the fingerprinting adversary in fooling $\mathsf{Trace_{FP}}$ (in the experiment described in (5)): The first is that in the traitor-tracing security condition, $\mathcal{P}$ is given $\vec{sk}_{-i}$ for a fixed $i\in[n]$ , whereas the fingerprinting adversary is given $W_{-i}$ for a random $i\leftarrow_{\mbox{\tiny R}}[n]$ . This difference only affects the error by a factor of $n$ . That is, for every $i\in[n]$

The second difference is that in $\mathsf{Trace_{TT}}$ , the ciphertexts given to the pirate are generated by $\mathsf{TrEnc_{TT}}(\vec{sk},W)$ whereas in $\mathcal{A}_{\mathsf{FP}}$ the ciphertexts are generated by $\mathsf{TrEnc_{TT}}(\vec{sk},\widetilde{W}_{-i})$ . But these ciphertexts only differ in the $i$ -th component, and $sk^{(i)}$ is unknown to $\mathcal{P}$ , so this does not affect the behavior of the pirate decoder significantly. This fact is established in the following claim.

Let $\Pi_{\mathsf{Enc}}=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})$ be the encryption scheme. The main observation required to prove the claim is that the two experiments we want to relate can both be simulated without $sk^{(i)}$ , given challenges for the encryption scheme (Definition 5.2). Fix a codebook $W\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}_{\mathsf{FP}}(1^{n})$ . Now consider two distributions on ciphertexts (of $\Pi_{\mathsf{Enc}}$ ): In either case, generate a random key $sk^{(i)}\leftarrow_{\mbox{\tiny R}}\mathsf{Gen}(1^{\kappa})$

We now complete the proof of the theorem by combining Equation (5) and Claim 5.8.

Recall that the two goals of constructing a new traitor-tracing scheme were to trace stateful pirates and to reduce the complexity of decryption. We addressed tracing of stateful pirates in the previous section, and now we turn to the complexity of decryption. We do so by instantiating the traitor-tracing scheme with various encryption schemes and making two observations: 1) The type of encryption schemes we require are sufficiently weak that there already exist plausible candidates with a very simple decryption operation, and 2) Decryption for the traitor-tracing scheme is not much more complex than decryption for the underlying encryption scheme. We summarize the second point with the following simple lemma.

Let $\Pi_{\mathsf{TT}}$ be as defined, with $\Pi_{\mathsf{Enc}}$ as its underlying encryption scheme. Let $(\overline{sk},i)=sk\in\{0,1\}^{\kappa}$ and $c=(c^{(1)},\dots,c^{(n)})\in\mathcal{C}^{(\kappa)}$ be any user key and ciphertext for $\Pi_{\mathsf{TT}}$ . Then

Here, the function $\mathbf{1}_{x}(y)$ takes the value $1$ if $y=x$ and otherwise. The lemma follows directly from the construction of $\mathsf{Dec}_{\mathsf{TT}}$ . Also note that the function $\mathbf{1}_{i^{\prime}}\colon\{0,1\}^{\lceil\log n\rceil}\to\{0,1\}$ is just a conjunction of $\lceil\log n\rceil$ bits (a single gate of fan-in $O(\log n)$ ), and we need to compute $n$ of these functions. In addition to computing $\mathbf{1}_{i^{\prime}}$ and $\mathsf{Dec}_{c^{(i^{\prime})}}$ , there are $n$ conjunctions and a single outer disjunction. Thus we add an additional $n+1$ gates, compute decryption $n$ times, and increase the depth by $2$ . Hence, an intuitive summary of the lemma is that if $\mathsf{Dec}$ can be implemented by circuits of size $s$ and depth $h$ , $\mathsf{Dec}_{\mathsf{TT}}$ can be implemented by circuits of size $n\cdot(s+O(\log n))=\widetilde{O}(ns)$ and depth $h+2$ . This summary will be precise enough to state our main results.

By combining Lemma 5.9 with Theorem 5.6, we easily obtain the following corollary.

Theorem 1.1 in the introduction follows by combining Theorem 4.1 with Corollary 5.10.

We will now consider the possibility of constructing a traitor-tracing scheme where the decryption functionality can be implemented by circuits of constant depth, and thus obtaining hardness results for generic sanitizers that are efficient for constant-depth queries (Theorem 1.2). First, we summarize our observation that the traitor-tracing scheme almost preserves the depth of the decryption function.

The corollary is clear from Lemma 5.9 and Theorem 5.6.

The corollary is not interesting without an encryption scheme that can be decrypted by constant-depth circuits. However, we observe that such a scheme (meeting our relaxed security criteria) can be constructed from a sufficiently good local pseudorandom generator (PRG). A recent result of Applebaum [App12] gave the first plausible candidate construction of a local PRG for the range of parameters we need, giving plausibility to the assumption that such PRGs (and, as we show, traitor-tracing schemes with constant-depth decryption) exist. We note that local PRGs actually imply encryption schemes with local decryption, which is stronger than just constant-depth decryption. Although it may be significantly easier to construct encryption schemes that only have constant-depth decryption, we are not aware of any other ways of constructing such a scheme.

If, in addition, if each bit of the output depends only on some set of $L$ bits of the input, then $\mathsf{G}$ is a $(\varepsilon_{\mathsf{PRG}},L)$ -local pseudorandom generator.

It is a well known result in Cryptography that pseudorandom generators imply encryption schemes satisfying Definition 5.2 (for certain ranges of parameters). We will use a particular construction whose decryption can be computed in constant-depth whenever the underlying PRG is locally-computable (or, more generally, computable by constant-depth circuits). The construction is the standard “computational one-time pad”, however we give a construction to verify that the decryption can be computed by constant-depth circuits.

The security follows from standard arguments: If we choose a random $s\leftarrow_{\mbox{\tiny R}}\{0,1\}^{\kappa}$ , then $\mathsf{G}(s)$ is indistinguishable from uniform up to error $\varepsilon_{\mathsf{PRG}}$ . If we generate $k_{\mathsf{Enc}}$ encryptions with key $s$ , and no two encryptions use the same choice of $r$ , then the output is indistinguishable from encryptions using uniform random bits in place of $\mathsf{G}(s)$ . If we use uniform random bits in place of $\mathsf{G}$ , then the message is information-theoretically hidden. The probability that no two encryptions out of $k_{\mathsf{Enc}}$ use the same choice of $r$ is at most $k_{\mathsf{Enc}}^{2}/s_{\mathsf{PRG}}$ , so we lose this term in the security of the encryption scheme.

Let $\mathbf{1}_{i}(j)$ be the indicator variable for the condition $j=i$ . For every $c=(r,b)\in\mathcal{C}$ , we can write

Combining Corollary 5.11 with Lemma 5.13 easily yields the following corollary.

Theorem 1.2 in the introduction follows by combining Theorem 4.1 with Corollary 5.14.

Acknowledgements

We thank Cynthia Dwork for suggesting that we look further at the connection between traitor-tracing and differential privacy. We thank Salil Vadhan for helpful discussions about the connection between traitor-tracing and differential privacy, and about the presentation of this work. We also thank Dan Boneh, Moritz Hardt, Hart Montgomery, Ananth Raghunathan, Aaron Roth, Guy Rothblum, and Thomas Steinke for helpful discussions.

References

Appendix A Additional Related Work

In this appendix, we elaborate on the relationship between sanitizers, interactive sanitizers, and one-shot sanitizers, and on the relationship between our results and prior work on the complexity of differentially private sanitization.

Dwork, Naor, and Vadhan [DNV12] gave information theoretic lower bounds for stateless sanitizers. These are sanitizers that take $k$ queries as input, but whose answers to each query do not depend on the other $k-1$ input queries.

Another interpretation of our results, which can be used to give an alternative proof of our results, is that we construct a family of queries for which “keeping state doesn’t help”.

They consider a game where an $n$ -row database is chosen at random, and a random subset of $n-1$ of those rows is given to the attacker. The attacker wants to violate privacy by recovering the $n$ -th row. To do so, the attacker chooses $\sim n^{2}$ queries (randomly, from a distribution that depends on the $n-1$ known rows) and requests answers to these queries. Using these answers, they show that there is a particular way for the attacker to (randomly) guess the missing row, that will succeed with sufficient probability to constitute a privacy violation. Their argument is in two steps: 1) The expected correlation between the answers given by a stateless sanitizer and the value of the queries on the missing row is significant. 2) A stateless sanitizer cannot give answers that are correlated with too many rows that are not in the database. Combining these two steps shows that the attacker has a significant chance of identifying the $n$ -th row from its correlation with the answers.

Typically, the intuition behind the analysis of traitor-tracing schemes follows roughly the same two steps: 1) There will be some correlation between the decryptions returned by the efficient pirate and the decryptions that would be returned by some member of the coalition (using only his own key). 2) There will not be significant correlation between the decryptions returned by the efficient pirate and the decryptions that would be returned by any user not a member of the coalition. This is exactly the intuition we sketched in the simpler (sub-optimal) construction. In the final construction, much of this argument is made “inside” the construction of the fingerprinting code. If we “unrolled” the analysis of the fingerprinting code directly into our construction, we would make exactly the same arguments.

Other Types of Sanitizers

The three variants we’ve described satisfy some interesting relationships. First, if we have an algorithm that runs in time $T$ and releases a summary that enables an analyst to answer any query in $\mathcal{Q}$ in time $T$ , then we also have an interactive sanitizer that runs in time $2T$ per query that answers any sequence of $k$ queries from $\mathcal{Q}$ . Secondly, if we have an interactive sanitizer that answers up to $K$ queries from $\mathcal{Q}$ in time $T$ per query, then we also have a non-interactive sanitizer that answers any $k\leq K$ queries from $\mathcal{Q}$ in time $Tk$ . Thus, holding $\mathcal{Q}$ fixed and assuming $k\gg n^{2}$ , the problem we consider is the easiest form of private counting query release, and the lower bounds we prove imply lower bounds for the other variants.

Indeed, in the data release problem one cannot take $\mathcal{Q}$ to include all efficiently computable counting queries. Sanitizers (both interactive and non-interactive) are supposed to circumvent this problem by allowing the queries to be arbitrary, but only answering the $k$ queries that are needed. However our results show that they can only circumvent the problem if we allow superpolynomial computation or we take $k\lesssim n^{2}$ .