The Complexity of Computing the Optimal Composition of Differential Privacy
Jack Murtagh, Salil Vadhan
Introduction
Differential privacy is a framework that allows statistical analysis of private databases while minimizing the risks to individuals in the databases. The idea is that an individual should be relatively unaffected whether he or she decides to join or opt out of a research dataset. More specifically, the probability distribution of outputs of a statistical analysis of a database should be nearly identical to the distribution of outputs on the same database with a single person’s data removed. Here the probability space is over the coin flips of the randomized differentially private algorithm that handles the queries. To formalize this, we call two databases with rows each neighboring if they are identical on at least rows, and define differential privacy as follows:
where the probabilities are over the coin flips of the algorithm .
In the practice of differential privacy, we generally think of as a small, non-negligible, constant (e.g. ). We view as a “security parameter” that is cryptographically small (e.g. ). One of the important properties of differential privacy is that if we run multiple distinct differentially private algorithms on the same database, the resulting composed algorithm is also differentially private, albeit with some degradation in the privacy parameters . In this paper, we are interested in quantifying the degradation of privacy under composition. We will denote the composition of differentially private algorithms as where
A handful of composition theorems already exist in the literature. The first basic result says:
For every , , and -differentially private algorithms , the composition satisfies -differential privacy.
This tells us that under composition, the privacy parameters of the individual algorithms “sum up,” so to speak. We care about understanding composition because in practice we rarely want to release only a single statistic about a dataset. Releasing many statistics may require running multiple differentially private algorithms on the same database. Composition is also a very useful tool in algorithm design. Often, new differentially private algorithms are created by combining several simpler algorithms. Composition theorems help us analyze the privacy properties of algorithms designed in this way.
Theorem 1.2 shows a linear degradation in global privacy as the number of algorithms in the composition grows and it is of interest to improve on this bound. If we can prove that privacy degrades more slowly under composition, we can get more utility out of our algorithms under the same global privacy guarantees. Dwork, Rothblum, and Vadhan gave the following improvement on the basic summing composition above [DRV10].
Theorem 1.3 shows that privacy under composition degrades by a function of which is an improvement if . It can be shown that a degradation function of is necessary even for the simplest differentially private algorithms, such as randomized response [War65].
Despite giving an asymptotically correct upper bound for the global privacy parameter, , Theorem 1.3 is not exact. We want an exact characterization because, beyond being theoretically interesting, constant factors in composition theorems can make a substantial difference in the practice of differential privacy. Furthermore, Theorem 1.3 only applies to “homogeneous” composition where each individual algorithm has the same pair of privacy parameters, . In practice we often want to analyze the more general case where some individual algorithms in the composition may offer more or less privacy than others. That is, given algorithms , we want to compute the best achievable privacy parameters for . Formally, we want to compute the function:
It is convenient for us to view as given and then compute the best , but the dual formulation, viewing as given, is equivalent (by binary search). Actually, we want a function that depends only on the privacy parameters of the individual algorithms:
Empirically (see Appendix A), this optimal bound provides a 30-40 savings in compared to Theorem 1.3 (and a savings compared to an improved asymptotic bound from [KOV15]). The problem remains to find the optimal composition behavior for the more general heterogeneous case. Kairouz, Oh, and Viswanath also provide an upper bound for heterogeneous composition that generalizes the degradation found in Theorem 1.3 for homogeneous composition but do not comment on how close it is to optimal.
We begin by extending the results of Kairouz, Oh, and Viswanath [KOV15] to the general heterogeneous case.
Theorem 1.5 exactly characterizes the optimal composition behavior for any arbitrary set of differentially private algorithms. It also shows that optimal composition can be computed in time exponential in by computing the sum over by brute force. Of course in practice an exponential-time algorithm is not satisfactory for large . Our next result shows that this exponential complexity is necessary:
There is a polynomial-time algorithm that given rational and , outputs satisfying
where , assuming constant-time arithmetic operations.
Note that we incur a relative error of in approximating and an additive error of in approximating . Since we always take to be non-negligible or even constant, we get a very good approximation when is polynomially small or even a constant. Thus, it is acceptable that the running time is polynomial in .
In addition to the results listed above, our proof of Theorem 1.5 also provides a somewhat simpler proof of the Kairouz-Oh-Viswanath homogeneous composition theorem (Theorem 1.4 [KOV15]). The proof in [KOV15] introduces a view of differential privacy through the lens of hypothesis testing and uses geometric arguments. Our proof relies only on elementary techniques commonly found in the differential privacy literature.
The theoretical results presented here were motivated by our work on an applied project called “Privacy Tools for Sharing Research Data”privacytools.seas.harvard.edu. We are building a system that will allow researchers with sensitive datasets to make differentially private statistics about their data available through data repositories using the Dataversedataverse.org platform [Cro11, Kin07]. Part of this system is a tool that helps both data depositors and data analysts distribute a global privacy budget across many statistics. Users select which statistics they would like to compute and are given estimates of how accurately each statistic can be computed. They can also redistribute their privacy budget according to which statistics they think are most valuable in their dataset. We implemented the approximation algorithm from Theorem 1.7 and integrated it with this tool to ensure that users get the most utility out of their privacy budget.
Technical Preliminaries
A useful notation for thinking about differential privacy is defined below.
For two discrete random variables and taking values in the same output space , the -approximate max-divergence of and is defined as:
Notice that an algorithm is differentially private if and only if for all pairs of neighboring databases, , we have . The standard fact that differential privacy is closed under “post processing” [DMNS06, DR13] now can be formulated as:
If is any randomized function, then
The composition results in our paper actually hold for a more general model of composition than the one described in the introduction. The model is called -fold adaptive composition and was formalized in [DRV10]. We generalize their formulation to the heterogeneous setting where privacy parameters may differ across different algorithms in the composition.
The idea is that instead of running differentially private algorithms chosen all at once on a single database, we can imagine an adversary adaptively engaging in a “composition game.” The game takes as input a bit and privacy parameters . A randomized adversary , tries to learn through rounds of interaction as follows: on the th round of the game, chooses an -differentially private algorithm and two neighboring databases . then receives an output where the internal randomness of is independent of the internal randomness of . The choices of and may depend on as well as the adversary’s own randomness.
The outcome of this game is called the view of the adversary, which is defined to be along with ’s coin tosses. The algorithms and databases from each round can be reconstructed from . Now we can formally define privacy guarantees under -fold adaptive composition.
We say that the sequences of privacy parameters satisfy -differential privacy under adaptive composition if for every adversary we have , where represents the view of in composition game with privacy parameter inputs .
Computing real-valued functions.
where is the true optimal parameter with full precision.
Characterization of OptComp
Following [KOV15], we show that to analyze the composition of arbitrary -DP algorithms, it suffices to analyze the composition of the following simple variant of randomized response [War65].
Given an -DP algorithm with output space and neighboring databases , let be the probability mass functions of and , respectively. The definition of differential privacy tells us that for all sets :
The left-hand side of the first inequality is maximized by for
and the left-hand side of the second inequality is maximized by
We will show how to simulate using the following algorithm.
To see that all of the terms are non-negative we need to show that the recurring terms and are non-negative and the rest follows by inspection.
For every -DP algorithm, with output space and neighboring databases and , and are non-negative where and are defined in Equations 4 and 5.
The other inequality follows by symmetry. ∎
Fix neighboring databases, and let be the probability mass functions of on , respectively. We will use and as defined above in Equations 2, 3, 4, and 5. Fix . is defined in the table below.
We need to show that is a valid probability distribution for each . All of the terms are non-negative because and are non-negative by Lemma 3.4.
The sums of and are immediate from the definitions of and , respectively:
A symmetrical argument works for . We now analyze the sum for . The sum for follows by symmetry. We use the following identities:
From here we break the calculation into the three possible cases:
Case 3:
All of the weights are non-negative because , , and is also at most , which we verify now:
Next we check the probabilities with which outputs and when .
For all ,
For the other direction, it suffices to show that for every that are -differentially private, we have
without loss of generality. Given , the set that maximizes the right-hand side is
We can further split into with
Fix an adversary . On each round , uses its coin tosses and the previous outputs to select an -differentially private algorithm and neighboring databases . Let be the view of with the given privacy parameters under composition game for and .
For let
Hardness of OptComp
is the class of all counting problems associated with decision problems in NP. It is a set of functions that count the number of solutions to some NP problem. More formally:
A function is called -hard if every function can be computed in polynomial time given oracle access to . That is, evaluations of can be done in one time step.
If a function is -hard, then there is no polynomial-time algorithm for computing it unless there is a polynomial-time algorithm for counting the number of solutions of all NP problems.
A function is called -easy if there is some function such that can be computed in polynomial time given oracle access to .
If a function is both -hard and -easy, we say it is -complete. Proving that computing OptComp is -complete can be broken into two steps: showing that it is -easy and showing that it is -hard.
For convenience we will view rational and as given arguments to OptComp and compute . Recall that the two versions of OptComp, viewing as given and computing and vice versa, are equivalent up to a polynomial factor (just run binary search over values of computing polynomially many bits of precision). So the formulation we choose for the proof will not affect whether OptComp is in P or not. Recall that in our model of computing real valued functions, we will take another input and we will output an approximation of to bits of precision in polynomial time using a oracle where satisfies the following:
Notice that the only part of the expression above that cannot be computed in polynomial time is the summation over subsets of . If we knew the sum, computing would be easy given our inputs. We show how to compute the sum in polynomial time using a P oracle and it follows that computing is P-easy .
We can now phrase a decision problem in NP: Does there exist a pair such that ? This is in NP because given a witness , we can compute and compare the output to , thereby verifying the solution, in polynomial time. Since this is an NP problem, a oracle can count the number of solutions to it in one time step. Notice that for every set , the number of solutions (pairs of the form satisfying ) is exactly because will output for . So over all possible sets , the number of solutions as counted by the oracle equals . Dividing this by gives us the sum up to an additive error of , which can be used to compute to bits of precision in polynomial time. This only required one call to a oracle. So computing OptComp is -easy. ∎
Next we show that computing OptComp is also -hard through a series of reductions. We start with a multiplicative version of the partition problem that is known to be -complete by Ehrgott [Ehr00]. The problems in the chain of reductions are defined below.
is the following problem: given a set of positive integers, count the number of partitions such that
All of the remaining problems in our chain of reductions take inputs where is the root of a positive integer for all and some positive integer . All of the reductions we present actually hold for every positive integer , including (in which case the inputs are integers). However, we will constrain to be large enough so that our inputs are in the range . This is because in the final reduction to OptComp, values in the proof are set to . We want to show that our reductions hold for reasonable values of ’s in a differential privacy setting so throughout the proofs we use ’s to correspond to ’s in the final reduction. In fact, we will later state our reductions as applying to instances where (and hence ) for any desired .
(The real numbers are specified in the input by and with the input size being the combined bit length of these integers in binary).
(The real numbers and are specified in the input by and with the input size being the combined bit length of these integers in binary).
(The real numbers are specified in the input by and with the input size being the combined bit length of these integers and the numerator and denominator of in binary).
Since the output of SUM-PARTITION is irrational, the actual computational problem is defined according to our convention in Section 2 for computing real-valued functions. That is, given an additional input , compute a number such that
We prove that computing OptComp is -hard by the following series of reductions:
For every constant , is -hard, even on instances where .
There is a one-to-one correspondence between solutions to the problem and solutions to the given instance. We can solve in polynomial time with a oracle. Therefore is -hard. ∎
For every constant , is -hard, even on instances where .
Let be a constant. We will reduce from , so consider an instance of the problem, of th roots of integers , respectively. We may assume since is also a constant greater than .
Set , where . Notice that . Set . Notice that so by setting integers and we get that
which meets the input requirement for . So we can use a oracle to count the number of partitions such that
Let . We will argue that if and only if , which completes the proof. There are two cases to consider: and .
Case 1: . In this case, we have:
So there is a one-to-one correspondence between solutions to the instance where and solutions to the original instance .
Case 2: . Solutions now look like:
One way this can be true is if for all . We can check ahead of time if our input set contains all ones. If it does, then there are partitions that yield equal products (all except and ) so we can just output as the solution and not even use our oracle. The only other way to satisfy the above expression is for which cannot happen because . So there are no solutions in the case that .
Therefore the output of the oracle on is the solution to the problem. So is -hard. ∎
For the next two proofs we will make use of the following fact to bound the amount of precision needed when approximating irrational numbers by rational ones in our reductions:
For all real numbers and functions that are differentiable on the interval :
For every constant , SUM-PARTITION is -hard even on instances where and where there are no partitions such that .
We will use a SUM-PARTITION oracle to solve given a set of th roots of positive integers , respectively, and a positive real number for integers given in the input. Notice that for every :
Above, must be a positive integer greater than , which tells us that the gap in products from every partition must take a particular form. This means that for a given and , can only be non-zero on a discrete set of possible values of . So given our T-PARTITION instance we can find a such that the above has no solutions for in the interval . Specifically, solve the above quadratic for . If is not an integer, then we know the answer to the T-PARTITION instance is 0, so assume is an integer and set . We can also find an interval just below where no value of in the interval can yield a solution above by setting . We use these discreteness properties twice in the proof. Also notice that these intervals are not too small:
and where is the input length (i.e. the bit lengths of the integers ).
where the last inequality follows from Fact 4.11. This final value is only exponentially small because is upper bounded by , which is at most exponentially large in the bit length of the ’s. A very similar proof shows that is only exponentially small. ∎
This means that we can always find such that is rational and can be fully specified with a bit length that is polynomial in the input length. Fix such a quantity . For all , define . Then, since -PARTITION has no solutions for :
We now show how to compute the two sums in the final term using the SUM-PARTITION oracle. We will give the procedure for computing and the case with will follow by symmetry. The oracle returns a real number, so by our model of computing real valued functions, we will also give the oracle an additional input that specifies the number of bits of precision in its output. Ultimately we only need to approximate each sum to within . This will give an approximation to the problem to within , thereby solving it by rounding the approximation because the solution will be an integer. We want to set the input to the SUM-PARTITION oracle to be such that for all , we have:
Taking and thinking of , it suffices that all positive solutions to each of the following two inequalities are the same:
The positive solutions to the left one are , and to the right one are . Setting the right-hand sides equal gives
Since might be irrational and SUM-PARTITION takes as input rational values of , we need to find a rational that approximates and preserves the set of solutions . Recall from Claim 4.13 that there is an (only) exponentially small interval below such that for all , . This translates to a corresponding interval such that for all , equivalence (6) holds. Furthermore, this interval is also only exponentially small.
where is the input length (i.e. the bit lengths of the integers ).
To see this, view from Equation 7 as a function of , and calculate the derivative:
(Recall that ). This is only exponentially small in the input length by Claim 4.13. ∎
So we can choose a rational that can be specified with a number of bits that is polynomial in the input length and preserves . However the SUM-PARTITION oracle gives us
whereas we want to compute the right-hand side without the coefficient. To get this we just pick another rational such that . If precision were not an issue, we could run our SUM-PARTITION oracle for and and receive the output:
Then the following linear combination of and gives us what we want:
Computing and to within yields an approximation of to within .
We just need to approximate and to within to get the desired precision. This additive error is only exponentially small by Claim 4.14. ∎
Running this whole procedure again for , which we fixed above gives us all the information we need to count the number of solutions to the instance we were given. We can solve in polynomial time with four calls to a SUM-PARTITION oracle. Therefore SUM-PARTITION is -hard. ∎
Now we prove that computing OptComp is -complete.
We have already shown that computing OptComp is -easy. Here we prove that it is also -hard, thereby proving -completeness.
This last expression is exactly the solution to the instance of SUM-PARTITION we were given. Taking precision into account, the input SUM-PARTITION instance has an additional input that specifies the desired number of bits of precision in the output and we can only pass OptComp rational values so we will have to approximate for all and . Again there is a worry that when we approximate these values the set of partitions that make might change. We want to get enough precision in our inputs so that the set of partitions over which we sum does not change and enough precision so that the output is accurate to bits. We will calculate the approximations required for each of these two goals separately and the final precision that we use will just be the maximum of the two. We prove that we can achieve both of these goals with the next two claims.
There exists a polynomial in the length of the input (the bit lengths of , and the numerator and denominator of ) such that if for each , then the set of partitions satisfying
is the same as the set of partitions satisfying
Recall that SUM-PARTITION is P-hard even on instances where there are no partitions such that so we may assume our input instance of SUM-PARTITION has no such partitions and still prove the hardness of OptComp. So to ensure that we have enough precision such that the set over which we sum does not change, we must make the error smaller than the minimum possible (in absolute value) nonzero outcome of . We now bound this quantity. Let
Since is rational, for two integers and . Let and . Then:
Where the last line follows from Fact 4.11 applied to the function . is only exponentially small because is at most exponentially large in the bit length of the integers . We claim that is at least for all . Fix :
Where the last implication follows because is just a difference of integers so the closest nonzero value it can take on is . ∎
There exists a polynomial in the length of the input (the bit lengths of , and the numerator and denominator of ) such that if for each , then
We will choose where is the polynomial that exists from Claim 4.16 and will be determined later. Define
Bounding each term in the final expression above by then gives us the accuracy we want. We will show directly how to bound the second term and the argument for the first term follows symmetrically. By hypothesis we have that for all :
Since and for all we get:
Picking such that then suffices to bound the absolute value of the sum by . Repeating the same calculation for will yield the same approximation except without the factor of . So we can bound both terms by (and therefore their sum by ) by approximating each to a precision that is polynomial in , which proves the claim. ∎
Approximation of OptComp
There is a polynomial-time algorithm that given rational and , outputs satisfying
where , assuming constant-time arithmetic operations.
We prove Theorem 1.7 using the following three lemmas:
In other words, if the values we are given are all integer multiples of some where is rational, we can determine whether or not the composition of those privacy parameters is -DP in pseudo-polynomial time, for every positive integer . Running binary search over integers , we can find the minimum such integer. When is small, this gives us a good overestimate of the optimal composition of the discrete input privacy parameters. This means that given any inputs to OptComp, we can discretize and polynomially bound the values to new values for all and use Lemma 5.2 to approximate OptComp. The next lemma tells us that this is also a good approximation of OptComp.
For all and :
Next we prove the three lemmas and then show that Theorem 1.7 follows.
Each cell in the table can be computed in constant time given earlier cells where . Thus filling the entire table takes time . ∎
Given a rational and for positive integers and rational Theorem 1.5 tells us that answering whether or not
is equivalent to answering whether or not the following inequality holds:
The right-hand side and are easy to compute given the inputs (note that is rational for all because each is an integer power of ). So in order to check the inequality, we will show how to compute the sum. Define
and observe that by setting , we have
Multiplying both sides by gives:
The above inequality together with Theorem 1.5 means that showing the following will complete the proof:
Since for every , it suffices to show:
This inequality holds term by term. If a right-hand term is zero , then so is the corresponding left-hand term . For the nonzero terms, the factor of ensures that the right-hand terms are larger than the left-hand terms. ∎
Lemma 5.2 tells us that we can determine whether a set of privacy parameters satisfies some differential privacy guarantee if the values and are all positive integer multiples of some where is rational. We are given rational and . Let be the arithmetic mean of the values. Let , set , and for all set and . We will use the following bounds on in the proof:
With these settings, the ’s are non-negative integers, the values are all integer multiples of and is rational. So for every positive integer we can apply Lemma 5.2 to determine whether or not OptComp in time . Running binary search over integers , we can find the minimum such integer, which we will call . The algorithm’s estimate of OptComp will be . However since this number is irrational, we will use the Taylor approximation of the natural logarithm to output satisfying . Since we only need to calculate a few terms of the Taylor expansion of to achieve this approximation, this step will not affect our running time.
Since we choose to be the minimum integer satisfying composition we have:
can range from to so the binary search can be done in iterations. This gives us a total running time of:
Now we argue that is a good approximation of OptComp. For all we have:
So all of the values are overestimates of their corresponding values and therefore
satisfying one of the inequalities in the theorem. We also have for all :
Let for all and let . Now we get
by Lemma 5.3. Noting that and are both at most completes the proof. ∎
References
Appendix A Comparison of Composition Theorems
The figures below compare the performances of four homogeneous composition theorems. In all figures, “Summing” refers to basic composition - Theorem 1.2 [DKMMN06], “DRV” refers to advanced composition - Theorem 1.3 [DRV10], “KOV Bound” refers to a bound in [KOV15] that is a closed form approximation of the optimal composition theorem, and “Optimal” refers to the optimal composition theorem - Theorem 1.4 [KOV15]. Here we are composing mechanisms that are differentially private to obtain an differentially private mechanism as guaranteed by one of the composition theorems.