Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions

Vadym Doroshenko, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Introduction

Differential privacy (DP) has become widely adopted as a notion of privacy in analytics and machine learning applications, leading to numerous practical deployments including in industry and government agencies . The DP guarantee of a (randomized) algorithm is parameterized by two real numbers ε>0\varepsilon>0 and δ\delta\in; the smaller these values, the more private the algorithm.

The appeal of DP stems from the strong privacy that it guarantees (which holds even if the adversary controls the inputs of all other users in the database), and from its nice mathematical properties. These include composition, whose basic form says that executing an (ε1,δ1)(\varepsilon_{1},\delta_{1})-DP algorithm and an (ε2,δ2)(\varepsilon_{2},\delta_{2})-DP algorithm and returning their results gives an algorithm that is (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-DP. While basic composition can be used to bound the DP properties of kk algorithms, it is known to not be tight, in particular for large values of kk. In fact, advanced composition yields a general improvement, often translating to k\approx\sqrt{k} reduction in the ε\varepsilon privacy bound of the composition of kk mechanisms each of which being (ε0,δ0)(\varepsilon_{0},\delta_{0})-DP. Such a reduction can be sizeable in practical deployments, and therefore much research has been focusing on obtaining tighter composition bounds in various settings.

In the aforementioned setting where each mechanism has the same DP parameters, Kairouz et al. derived the optimal composition bound. For the more general case of composing kk mechanisms whose privacy parameters are possibly different, i.e., the iith mechanism is guaranteed to be (εi,δi)(\varepsilon_{i},\delta_{i})-DP for some parameters εi,δi\varepsilon_{i},\delta_{i}, computing the (exact) DP parameters of the composed mechanism is known to be #P-complete .

While the results of provide a complete picture of privacy accounting when we assume only that the iith mechanism is (εi,δi)(\varepsilon_{i},\delta_{i})-DP, we can often arrive at tighter bounds when taking into account some additional information about the privacy loss of the mechanisms. For example, the Moments Accountant and Rényi DP methods keep track of (upper bounds on) the Renyi divergences of the output distributions on two adjacent databases; this allows one to compute upper bounds on the privacy parameters. These tools were originally introduced in the context of deep learning with DP (where the composition is over multiple iterations of the learning algorithm) in which they provide significant improvements over simply using the DP parameters of each mechanism. Other known tools that can also be used to upper-bound the privacy parameters of composed mechanisms include concentrated DP and its truncated variant . These methods are however all known not to be tight, and do not allow a high-accuracy estimation of the privacy parameters.

A numerical method for estimating the privacy parameters of a DP mechanism to an arbitrary accuracy, which has been the subject of several recent works starting with , relies on the privacy loss distribution (PLD). This is the probability mass function of the so-called privacy loss random variable in the case of discrete mechanisms, and its probability density function in the case of continuous mechanisms. From the PLD of a mechanism, one can easily obtain its (tight) privacy parameters. Moreover, a crucial property is that the PLD of a composition of multiple mechanisms is the convolution of their individual PLDs. Thus, used the Fast Fourier Transform (FFT) in order to speed up the computation of the PLD of the composition. Furthermore, explicit bounds on the approximation error for the resulting algorithm were derived in . The PLD has been the basis of multiple open-source implementations from both industry and academia including . We note that the PLD can be applied to mechanisms whose privacy loss random variables do not have bounded moments, and thus for which composition cannot be analyzed using the Moments Accountant or Rényi DP methods. An example such mechanism is DP-SGD-JL from .

A crucial step in previous papers that use PLDs is in approximating the distribution so that it has finite support; this is especially needed in the case where the PLD is continuous or has a support of a very large size, as otherwise the FFT cannot be performed efficiently. With the exception of Gopi et al. uses an estimator that is neither pessimistic nor optimistic, and instead derive their final values of δ\delta using concentration bound-based error estimates., previous PLD-based accounting approaches employ pessimistic estimators and optimistic estimators of PLDs. Roughly speaking, the former overestimate (i.e., give upper bounds on) δ\delta, whereas the latter underestimate δ\delta. For efficiency reasons, we would like the support of the approximate PLDs to be as small as possible, while retaining the accuracy of the estimates.

Our main contributions are the following:

We obtain a new pessimistic estimator for a PLD and a given desired support set. Our pessimistic estimator is simple to construct and is based on the idea of “connecting the dots” of the hockey-stick curve at the discretization intervals. Interestingly, we show that this is the best possible pessimistic estimator (and therefore is at least as good as previous estimators).

We complement the above result by obtaining a new optimistic estimator that underestimates the PLD. This estimator is based on the combination of a greedy algorithm and a convex hull computation. In contrast to the pessimistic case, we prove that there is no “best” possible optimistic estimator.

We conduct an experimental evaluation showing that our estimators can work with much larger discretization intervals while keeping a similar error bound compared to previous approaches and yet give a better approximation than existing methods.

Preliminaries

We use supp(P)\operatorname*{supp}(P) to denote the support of a probability distribution PP. For two distributions P,QP,Q, we use PQP\otimes Q to denote the product distribution of the two. Furthermore, when P,QP,Q are over an additive group, we use PQP\ast Q to denote the convolution of the two distributions.

Let α0\alpha\geq 0. The α\alpha-hockey-stick divergence between two probability distributions PP and QQ over a domain Ω\Omega is given as

where supS\sup_{S} is over all measurable sets SΩS\subseteq\Omega.

The following characterization of hockey-stick curves, due to , is helpful:

2 Differential Privacy

The definition of differential privacy (DP) For more background on differential privacy, we refer the reader to the monograph . can be stated in terms of the hockey-stick divergence as follows.

For a notion of adjacent datasets, a mechanism M\mathcal{M} is said to satisfy (ε,δ)(\varepsilon,\delta)-differential privacy (denoted, (ε,δ)(\varepsilon,\delta)-DP) if for all adjacent datasets SSS\sim S^{\prime}, it holds that Deε(M(S)M(S))δD_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime}))\leq\delta.

We point out that the techniques developed in this paper are general and do not depend on the specific adjacency relation. For the rest of the paper, for convenience, we always use α\alpha to denote eεe^{\varepsilon}.

3 Dominating Pairs

A central notion in our work is that of a dominating pair for a mechanism, defined by Zhu et al. .

A pair (P,Q)(P,Q) of distributions dominates a pair (A,B)(A,B) of distributions if it holds that

A pair (P,Q)(P,Q) of distributions is a dominating pair for a mechanism M\mathcal{M} if for all adjacent datasets SSS\sim S^{\prime}, it holds that (M(S),M(S))(P,Q)(\mathcal{M}(S),\mathcal{M}(S^{\prime}))\preceq(P,Q); we denote this as M(P,Q)\mathcal{M}\preceq(P,Q).

A pair (P,Q)(P,Q) of distributions is a tightly dominating pair for M\mathcal{M} if for every α0\alpha\geq 0, it holds that Dα(PQ)=supSSDα(M(S)M(S))D_{\alpha}(P||Q)=\sup_{S\sim S^{\prime}}D_{\alpha}(\mathcal{M}(S)||\mathcal{M}(S^{\prime})).

The following result highlights the importance of dominating pairs.

If M(P,Q)\mathcal{M}\preceq(P,Q) and M(P,Q)\mathcal{M}^{\prime}\preceq(P^{\prime},Q^{\prime}), then MM(PP,QQ)\mathcal{M}\circ\mathcal{M}^{\prime}\preceq(P\otimes P^{\prime},Q\otimes Q^{\prime}), where MM\mathcal{M}\circ\mathcal{M}^{\prime} is the composition of M\mathcal{M} and M\mathcal{M}^{\prime}. Furthermore, this holds even for adaptive composition.In adaptive composition of MM\mathcal{M}^{\prime}\circ\mathcal{M}, M\mathcal{M}^{\prime} can also take the output of M\mathcal{M} as an auxiliary input. Here the M(P,Q)\mathcal{M}^{\prime}\preceq(P^{\prime},Q^{\prime}) has to hold for all possible auxiliary input.

Thus, in order to upper bound the privacy loss profile δM(ε):=supSSDeε(M(S)M(S))\delta_{\mathcal{M}}(\varepsilon):=\sup_{S\sim S^{\prime}}D_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime})), it suffices to compute Deε(PQ)D_{e^{\varepsilon}}(P||Q) for a dominating (P,Q)(P,Q) pair for M\mathcal{M}.

4 Privacy Loss Distribution

Privacy Loss Distribution (PLD) is yet another way to represent the privacy loss. For simplicity, we give a definition below specific to discrete distributions P,QP,Q; it can be extended, e.g., to continuous distributions by replacing the probability masses P(o),Q(o)P(o),Q(o) with probability densities of P,QP,Q at oo.

The privacy loss distribution (PLD) of a pair (P,Q)(P,Q) of discrete distributions, denoted by PLD(P,Q)\mathsf{PLD}_{(P,Q)}, is the distribution of the privacy loss random variable LL generated by drawing oPo\sim P and let L=P(o)/Q(o)L=P(o)/Q(o).

As alluded to earlier, PLD can be used to compute the hockey-stick divergence (proof provided in Appendix A for completeness):

Note that the RHS term above depends only on PLD(P,Q)\mathsf{PLD}_{(P,Q)} and not directly on P,QP,Q themselves. For convenience, we will abbreviate the RHS term as Deε(PLD(P,Q))D_{e^{\varepsilon}}(\mathsf{PLD}_{(P,Q)}).

The main advantage in dealing with PLDs is that composition simply corresponds to convolution of PLDs :

Let P,Q,P,QP,Q,P^{\prime},Q^{\prime} be discrete distributions. Then we have

5 Accounting Framework via Dominating Pairs and PLDs

Dominating pairs and PLDs form a powerful set of building blocks to perform privacy accounting. Recall that in privacy accounting, we typically have a mechanism M=M1Mk\mathcal{M}=\mathcal{M}_{1}\circ\cdots\circ\mathcal{M}_{k} where each Mi\mathcal{M}_{i} is a “simple” mechanism (e.g., Laplace or Gaussian mechanisms) and we would like to understand the privacy profile of M\mathcal{M}.

The approach taken in previous works can be summarized as follows.Note that their results are not phrased in terms of dominating pairs, since the latter is only defined and studied in . Nonetheless, these previous works use similar (but more restricted) notions for “worst case” distributions.

Identify a dominating pair (Ai,Bi)(A_{i},B_{i}) for each Mi\mathcal{M}_{i}.

Find a pessimistic estimateWe remark that this is slightly inaccurate as the “pessimistic estimate” in previous works may actually not be a valid PLD; please see Section 4.1 for a more detailed explanation. (Pi,Qi)(Ai,Bi)(P^{\uparrow}_{i},Q^{\uparrow}_{i})\succeq(A_{i},B_{i}) such that PLD(Pi,Qi)\mathsf{PLD}_{(P^{\uparrow}_{i},Q^{\uparrow}_{i})} is supported on a certain set of prespecified values.

Compute PLD=PLD(P1,Q1)PLD(Pk,Qk)\mathsf{PLD}^{\uparrow}=\mathsf{PLD}_{(P^{\uparrow}_{1},Q^{\uparrow}_{1})}\ast\cdots\ast\mathsf{PLD}_{(P^{\uparrow}_{k},Q^{\uparrow}_{k})}.

Compute δ(ε)\delta^{\uparrow}(\varepsilon) from PLD\mathsf{PLD}^{\uparrow} using the formula from Lemma 2.6.

By Theorem 2.4 and Lemmas 2.7 and 2.6, we can conclude that δ(ε)δM(ε)\delta^{\uparrow}(\varepsilon)\geq\delta_{\mathcal{M}}(\varepsilon); in other words, the mechanism M\mathcal{M} is (ε,δ(ε))(\varepsilon,\delta^{\uparrow}(\varepsilon))-DP as desired.

Note that the reason that one needs PLD(Pi,Qi)\mathsf{PLD}_{(P^{\uparrow}_{i},Q^{\uparrow}_{i})} to have finite support in the second step is so that it can be computed efficiently via the Fast Fourier Transform (FFT). Currently, there is only one approach used in previous works, called Privacy Buckets . Roughly speaking, this amounts simply to rounding the PLD up to the nearest point in the specified support set. (See Section 4.1 for a more formal description.)

While the above method gives us an upper bound δ(ε)\delta^{\uparrow}(\varepsilon) of δM(ε)\delta_{\mathcal{M}}(\varepsilon), there are scenarios where we would like to find a lower bound on δM(ε)\delta_{\mathcal{M}}(\varepsilon); for example, this can be helpful in determining how tight our upper bound is. Computing such a lower bound is also possible under the similar framework, except that we need to know a list of tightly dominating pairs (A1,B1),,(Ak,Bk)(A^{*}_{1},B^{*}_{1}),\dots,(A^{*}_{k},B^{*}_{k}) such that there exists an adjacent datasets S,SS,S^{\prime} for which Deε(M(S)M(S))=Deε(A1AkB1Bk)D_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime}))=D_{e^{\varepsilon}}(A^{*}_{1}\otimes\cdots\otimes A^{*}_{k}||B^{*}_{1}\otimes\cdots\otimes B^{*}_{k}). If such tightly dominating pairs can be identified, then we can follow the same blueprint as above except we replace a pessimistic estimate with an optimistic estimate (P,Q)(Ai,Bi)(P^{\downarrow},Q^{\downarrow})\preceq(A^{*}_{i},B^{*}_{i}). This would indeed gives us a lower bound δ(ε)\delta^{\downarrow}(\varepsilon) of δM(ε)\delta_{\mathcal{M}}(\varepsilon).

The described framework is illustrated in Figure 1.

Finitely-Supported PLDs

As alluded to in the previous section, to take full advantage of FFT, it is important that a PLD is discretized in a way such that its support is finite. For exposition purposes, we will assume that the discretization points include -\infty and ++\infty. We will use E\mathcal{E} for discretization points for the PLD and A\mathcal{A} for the corresponding discretization points for the hockey-stick curve:

For the remaining of this work, we will operate under the above assumption and we will not state this explicitly for brevity.

Using the characterization in Lemma 2.1, we can also characterize the hockey-stick curve of PLDs whose support is on a prespecified finite set E\mathcal{E}, stated more precisely in the lemma below. Furthermore, the “inverse” part of the lemma yields an algorithm (Algorithm 1) that can construct A,BA,B given (h(αi))αiA(h(\alpha_{i}))_{\alpha_{i}\in\mathcal{A}} which we will use in the sequel.

h(αi)[1αi]+h(\alpha_{i})\geq[1-\alpha_{i}]_{+} for all αiA\alpha_{i}\in\mathcal{A},

For all i[k1]i\in[k-1], the curve hh restricted to [αi1,αi][\alpha_{i-1},\alpha_{i}] is linear: i.e., for all α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}), we have h(α)=ααiαi1αih(αi1)+αi1ααi1αih(αi)h(\alpha)=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i}).

For all α>αk1\alpha>\alpha_{k-1}, h(α)=h(+)h(\alpha)=h(+\infty).

A consequence of Lemma 3.2 is that hh is completely specified by h(A)h(\mathcal{A}). More formally, given f:Af:\mathcal{A}\to, the only possible extension of ff to a hockey-stick curve is its piecewise-linear extension f\overline{f} defined by

()(\Leftarrow) We start with the converse direction by describing an algorithm that, given h(α0),,h(αk)h(\alpha_{0}),\dots,h(\alpha_{k}), can construct the desired P,QP,Q. In fact, we will construct distributions PP and QQ with supports contained in A\mathcal{A} satisfying the following:

P(α)=αQ(α)P(\alpha)=\alpha\cdot Q(\alpha) for all αA{+}\alpha\in\mathcal{A}\setminus\{+\infty\},

Dα(PQ)=h(α)D_{\alpha}(P||Q)=h(\alpha) for all αA\alpha\in\mathcal{A}.

The first two conditions imply that supp(PLD(P,Q))E\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E} and the last condition implies that h(P,Q)=hh_{(P,Q)}=h as desired.

The construction of P,QP,Q is described in Algorithm 1.

Let us now verify that both PP and QQ are valid probability distributions. First, notice that Q(αi)0Q(\alpha_{i})\geq 0 due to the convexity of hh. Furthermore,

where the last inequality follows from (iii). Thus, QQ is indeed a probability distribution. As for PP, notice that P(αi)0P(\alpha_{i})\geq 0 for all αiA\alpha_{i}\in\mathcal{A}. Finally, we also have

meaning that PP is a probability distribution as desired.

Properties 1 and 2 are immediate from the construction. We will now check Property 3, based on two cases whether α>αk1\alpha>\alpha_{k-1}.

Case I: ααk1\alpha\geq\alpha_{k-1}. In this case, we have Dα(PQ)=P(+)eεQ(+)=h(+)D_{\alpha}(P||Q)=P(+\infty)-e^{\varepsilon}Q(+\infty)=h(+\infty).

Case II: α<αk1\alpha<\alpha_{k-1}. Suppose that α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}) for i[k1]i\in[k-1]. We have

Similarly, we also have j=ik(αjαi)Q(αj)=h(αi)\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i})\cdot Q(\alpha_{j})=h(\alpha_{i}). Combining the three equalities, we arrive at

which is equal to h(α)h(\alpha) due to assumption (iv).

()(\Rightarrow) We will now prove this direction. (i), (ii), and (iii) follow immediately from Lemma 2.1. As a result, it suffices to only prove (iv) and (v). Suppose that h(P,Q)=hh_{(P,Q)}=h for some pair (P,Q)(P,Q) such that supp(PLD(P,Q))E\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E}. Let RR be a shorthand for the distribution of exp(PLD(P,Q))\exp(\mathsf{PLD}_{(P,Q)}). To prove (iv), consider any α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}) for some i[k1]i\in[k-1]. We then have

Next, we prove (v). Consider any ααk1\alpha\geq\alpha_{k-1}. We have

Pessimistic PLDs with Finite Support

As we have described in Figure 1, pessimistic estimates of PLDs with finite supports are crucial in the PLD-based privacy accounting framework. The better these pessimistic estimates approximate the true PLD, the more accurate is the resulting upper bound δ(ε)\delta^{\uparrow}(\varepsilon).

Equipped with tools developed in the previous section, we will now describe our finite-support pessimistic estimate of a PLD. Specifically, given a pair (A,B)(A,B) of distributions, we would like to compute (P,Q)(P^{\uparrow},Q^{\uparrow}) such that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\succeq(A,B) with supp(PLD(P,Q))E\operatorname*{supp}(\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})})\subseteq\mathcal{E}. In fact, as we will show below (Lemma 4.1), our choice of PLD(P,Q)\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})} “best approximates” PLD(A,B)\mathsf{PLD}_{(A,B)}.

Our construction of the pair (P,Q)(P^{\uparrow},Q^{\uparrow}) is simple: run DiscretizePLD (Algorithm 1) on the input h(A,B)(α0),,h(A,B)(αk)h_{(A,B)}(\alpha_{0}),\dots,h_{(A,B)}(\alpha_{k}).

Recall from the proof of Lemma 3.2 that this construction simply gives h(P,Q)h_{(P^{\uparrow},Q^{\uparrow})}, which is a piecewise-linear extension of h(A,B)(α0),,h(A,B)(αk)h_{(A,B)}(\alpha_{0}),\dots,h_{(A,B)}(\alpha_{k}). In other words, we simply “connect the dots” to construct the hockey-stick curve of our pessimistic estimate. Note that this, together with the convexity of h(A,B)h_{(A,B)} (Lemma 2.1), implies that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\succeq(A,B) as desired.

Additionally, it is not hard to observe that our choice of pessimistic PLD is the best possible, in sense that (P,Q)(P^{\uparrow},Q^{\uparrow}) is the least element (under the domination partial order) among all pairs that dominate (A,B)(A,B):

Let P,QP,Q be any pair of distributions such that PLD(P,Q)\mathsf{PLD}_{(P,Q)} is supported on A\mathcal{A} and (P,Q)(A,B)(P,Q)\succeq(A,B). Then, we must have (P,Q)(P,Q)(P,Q)\succeq(P^{\uparrow},Q^{\uparrow})

Case I: ααk1\alpha\geq\alpha_{k-1}. From Lemma 3.2, we simply have h(P,Q)(α)=h(P,Q)(+)h(A,B)(+)=h(P,Q)(α)h_{(P,Q)}(\alpha)=h_{(P,Q)}(+\infty)\geq h_{(A,B)}(+\infty)=h_{(P^{\uparrow},Q^{\uparrow})}(\alpha).

Case II: αk1>α0\alpha_{k-1}>\alpha\geq 0. Suppose that α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}). From Lemma 3.2(ii), we have

where the first inequality follows from (P,Q)(A,B)(P,Q)\succeq(A,B) and the last equality follows from our construction of (P,Q)(P^{\uparrow},Q^{\uparrow}). ∎

We remark that PLD(P,Q)\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})} also has a simple form, due to the properties 1 and 2:

The primary previous work that also derived a pessimistic estimate of PLD is that of Meiser and Mohammadi , which has also been used (implicitly) in later works . In our terminology, the Privacy Buckets (PB) algorithm of Meiser and Mohammadi This is referred to as grid approximation in . can be restated as follows: let the pessimistic-PB estimate PLD~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} be the probability distribution where

for all i[k]i\in[k]. In other words, PLD~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is a probability distribution on E\mathcal{E} that stochastically dominates PLD(A,B)\mathsf{PLD}_{(A,B)}; furthermore, PLD~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is the least such distribution under stochastic dominant (partial) ordering. In previous works , such an estimate is then used in place of the true (non-discretized) PLD for accounting and computing δ\delta’s (via Lemma 2.7 and Lemma 2.6).

A priori, it is not clear whether PLD~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is even a valid PLD (for some pair of distributions). However, it not hard to prove that this is indeed the case:

Let PPBP_{\text{PB}}^{\uparrow} be defined by

for all i{0,,k}i\in\{0,\dots,k\}. It is clear that PPBP_{\text{PB}}^{\uparrow} is a valid distribution.

Then, define QPBQ_{\text{PB}}^{\uparrow} by

for all αA{0}\alpha\in\mathcal{A}\setminus\{0\} and let QPB(0)=1αA{0}QPB(α)Q_{\text{PB}}^{\uparrow}(0)=1-\sum_{\alpha\in\mathcal{A}\setminus\{0\}}Q_{\text{PB}}^{\uparrow}(\alpha). To check that QPBQ_{\text{PB}}^{\uparrow} is a valid distribution, it suffices to show that QPB(0)0Q_{\text{PB}}^{\uparrow}(0)\geq 0. This is true because

Finally, it is obvious from the definitions of PPB,QPBP_{\text{PB}}^{\uparrow},Q_{\text{PB}}^{\uparrow} that PLD(PPB,QPB)=PLD~(A,B)\mathsf{PLD}_{(P_{\text{PB}}^{\uparrow},Q_{\text{PB}}^{\uparrow})}=\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}. ∎

Combining the above lemma and the fact that PLD~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} dominates PLD(A,B)\mathsf{PLD}_{(A,B)} with Lemma 4.1, we can conclude that our estimate is no worse than the PB estimate:

Let (P,Q)(P^{\uparrow},Q^{\uparrow}) be as defined above. Then, for all α0\alpha\geq 0, we have h(P,Q)(α)Dα(PLD~(A,B))h_{(P^{\uparrow},Q^{\uparrow})}(\alpha)\leq D_{\alpha}(\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}).

Illustrations of the exact hockey-stick divergence and its pessimistic estimates from our approach and PB approach can be found in Figure 2. A more detailed evaluation of the error from the two approaches (after compositions) can be found in Section 6.

Optimistic PLDs with Finite Support

We next consider optimistic PLDs, i.e., PLD(P,Q)\mathsf{PLD}_{(P,Q)} dominated by a given  PLD(A,B)\ \mathsf{PLD}_{(A,B)}. We start by showing that, unlike pessimistic PLDs for which there is the “best” possible choice (Lemma 4.1), there is no such a choice for optimistic PLDs:

There exists a pair (A,B)(A,B) of distributions and a finite set A\mathcal{A} such that, for any pair (P,Q)(A,B)(P,Q)\preceq(A,B) such that supp(PLD(P,Q))E\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E}, there exists a pair (P,Q)(A,B)(P^{\prime},Q^{\prime})\preceq(A,B) such that PLD(P,Q)\mathsf{PLD}_{(P^{\prime},Q^{\prime})} is supported on E\mathcal{E} and (P,Q)(P,Q)(P^{\prime},Q^{\prime})\npreceq(P,Q).

Let (A,B)(A,B) be the result of the ε\varepsilon-DP binary randomized response, i.e.,

Let A\mathcal{A} be {0,α1,α2,+}\{0,\alpha_{1},\alpha_{2},+\infty\} where α1=eεγ,α2=eε+γ\alpha_{1}=e^{-\varepsilon}-\gamma,\alpha_{2}=e^{-\varepsilon}+\gamma for any γ<min{eεeε,eε}\gamma<\min\{e^{\varepsilon}-e^{-\varepsilon},e^{-\varepsilon}\}.

Let h1:A{+}h_{1}:\mathcal{A}\cup\{+\infty\}\to be defined as

and let h1\overline{h}_{1} be its piecewise-linear extension. It is again simple to verify that h1\overline{h}_{1} satisfies the conditions in Lemma 3.2 and therefore h1=h(P1,Q1)\overline{h}_{1}=h_{(P_{1},Q_{1})} for some P1,Q1P_{1},Q_{1} such that PLD(P1,Q1)\mathsf{PLD}_{(P_{1},Q_{1})} is supported on E\mathcal{E}. Furthermore, it can be checked from our definition that (A,B)(P1,Q1)(A,B)\succeq(P_{1},Q_{1}).

Similarly, let h2:A{+}h_{2}:\mathcal{A}\cup\{+\infty\}\to be defined as

and let h2\overline{h}_{2} be its piecewise-linear extension. Again, h2=h(P2,Q2)\overline{h}_{2}=h_{(P_{2},Q_{2})} for some P2,Q2P_{2},Q_{2} such that PLD(P2,Q2)\mathsf{PLD}_{(P_{2},Q_{2})} is supported on E\mathcal{E} and (A,B)(P2,Q2)(A,B)\succeq(P_{2},Q_{2}).

Now, consider any (P,Q)(A,B)(P,Q)\preceq(A,B) such that PLD(P,Q)\mathsf{PLD}_{(P,Q)} is supported on E\mathcal{E}. We claim that (P,Q)(P^1,Q^1)(P,Q)\nsucceq(\hat{P}_{1},\hat{Q}_{1}) or (P,Q)(P2,Q2)(P,Q)\nsucceq(P_{2},Q_{2}). To prove this, assume for the sake of contradiction that (P,Q)(P1,Q1)(P,Q)\nsucceq(P_{1},Q_{1}) and (P,Q)(P2,Q2)(P,Q)\nsucceq(P_{2},Q_{2}). This means that

From piecewise-linearity of h(P,Q)h_{(P,Q)} restricted to [α1,α2][\alpha_{1},\alpha_{2}] (Lemma 3.2), we then have h(P,Q)(eε)=12(h(P,Q)(α1)+h(P,Q)(α2))>h(A,B)(eε)h_{(P,Q)}(e^{\varepsilon})=\frac{1}{2}\left(h_{(P,Q)}(\alpha_{1})+h_{(P,Q)}(\alpha_{2})\right)>h_{(A,B)}(e^{\varepsilon}), a contradiction to the assumption that (P,Q)(A,B)(P,Q)\preceq(A,B). ∎

The previous lemma shows that, unfortunately, there is no canonical choice for an optimistic estimate for a given PLD. Due to this, we propose a simple greedy algorithm to construct an optimistic estimate for the PLD of a given pair (A,B)(A,B). Similar to before, it will be more convenient to deal directly with the hockey-stick curve. Here we would like to construct f:Af:\mathcal{A}\to such that its piecewise-linear extension f\overline{f} point-wise lower bounds h(A,B)h_{(A,B)}. The distribution P,QP,Q (and PLD(P,Q)\mathsf{PLD}_{(P,Q)}) can then be computed using Algorithm 1.

First Greedy Attempt. Before describing our algorithm, let us describe an approach that does not work; this will demonstrate the hurdles we have to overcome. Consider the following simple greedy algorithm: start with f(α0)=0f(\alpha_{0})=0 and, if we are currently at f(αi)f(\alpha_{i}), then find the largest possible f(αi+1)f(\alpha_{i+1}) such that the line f(αi),f(αi+1)f(\alpha_{i}),f(\alpha_{i+1}) is below h(A,B)h_{(A,B)}. (In other words, the line f(αi),f(αi+1)f(\alpha_{i}),f(\alpha_{i+1}) is tangent to h(A,B)h_{(A,B)}.)

While this is a natural approach, there are two issues with this algorithm:

First and more importantly, it is possible that at some discretization point f(αi+1)f(\alpha_{i+1}) becomes negative! Obviously this invalidates the construction as ff will not correspond to a hockey-stick curve.

Secondly, each computation of tangent line requires several (and sequential) computations of the derivative hh^{\prime}—rendering the algorithm inefficient—and is also subject to possible numerical instability.

We remark that if we instead start from right (i.e., f(αk)f(\alpha_{k})) and proceed greedily to the left (in decreasing order of ii), then the first issue will become that f(αi)f(\alpha_{i}) can be smaller than [1αi]+[1-\alpha_{i}]_{+}, which also makes it an invalid hockey-stick curve due to Lemma 2.1(iii). As will be explained below, we will combine these two directions of greedy together with a convex hull algorithm in our revised approach.

An Additional Assumption. For our algorithm, we will also need a couple of assumptions. The first one is that 1 belongs to the discretization set:

1A1\in\mathcal{A} (or equivalently 0E0\in\mathcal{E}).

For the remainder of this section, we will use i[k]i^{*}\in[k] to denote the index for which αi=1\alpha_{i^{*}}=1 (i.e., εi=0\varepsilon_{i^{*}}=0).

We show below that this assumption is necessary. When it does not hold, then it may simply be impossible to find an optimistic estimate at all:

There exists a pair (A,B)(A,B) of distributions such that any (P,Q)(A,B)(P,Q)\preceq(A,B) satisfies 0supp(PLD(P,Q))0\in\operatorname*{supp}(\mathsf{PLD}_{(P,Q)}).

Let A=BA=B. We simply have h(A,B)(α)=[1α]+h_{(A,B)}(\alpha)=[1-\alpha]_{+}. Due to Lemma 2.1(iii), we must have h(P,Q)(α)=[1α]+h_{(P,Q)}(\alpha)=[1-\alpha]_{+}. This simply implies PLD(P,Q)(0)=1\mathsf{PLD}_{(P,Q)}(0)=1 as desired. ∎

Our Greedy + Convex Hull Algorithm. We are now ready to describe our final algorithm. The main idea is to not attempt to create the curve in one left-to-right or right-to-left sweep, but rather to simply generate a “candidate set” FiF_{i} for each f(αi)f(\alpha_{i}). (Such a set will in fact be a singleton for all ii except i=ii=i^{*}, for which Fi2|F_{i}|\leq 2, but we will refer to FiF_{i}’s as sets here for simplicity of notation.) We then compute the convex hull of these points {(αi,fi)}i[k1],fiFi\{(\alpha_{i},f_{i})\}_{i\in[k-1],f_{i}\in F_{i}} and take it (or more precisely its lower curve) as our optimistic estimate. This last step immediately ensures the convexity of our curve, which is required for it to be a valid hockey-stick curve (Lemma 2.1(i)).

To construct the candidate set FiF_{i}, we combine the left-to-right and right-to-left greedy approaches. Specifically, for i={0,,i1}i=\{0,\dots,i^{*}-1\}, we draw the tangent line of the true curve hh at h(αi)h(\alpha_{i}) and let its intersection with the vertical line α=αi+1\alpha=\alpha_{i+1} be (αi+1,fi+1)(\alpha_{i+1},f^{\rightarrow}_{i+1}); then we add fi+1f^{\rightarrow}_{i+1} into Fi+1F_{i+1}. Similarly, for i={k1,,i+1}i=\{k-1,\dots,i^{*}+1\}, we draw the tangent line of the true curve hh at h(αi)h(\alpha_{i}) and let its intersection with the vertical line α=αi1\alpha=\alpha_{i-1} be (αi1,fi1)(\alpha_{i-1},f^{\leftarrow}_{i-1}); then we add fi1f^{\leftarrow}_{i-1} into Fi1F_{i-1}. At the very end points i=0i=0 and i=k1i=k-1, we also add 11 and respectively to FiF_{i}.

Notice that this algorithm, unlike the previous (failed) greedy approach, only requires a calculation of hh^{\prime} at each αiA{1,+}\alpha_{i}\in\mathcal{A}\setminus\{1,+\infty\}, which can be done in parallel. Furthermore, efficient algorithms for convex hull are well known in the literature and can be used directly.

The complete and more precise description of our algorithm is given in Algorithm 2. We also note here that Fi=1|F_{i}|=1 for all iii\neq i^{*} and Fi2|F_{i^{*}}|\leq 2 (as the point constructed from the left may be different from the point from the right). Nonetheless, we write FiF_{i}’s as sets for simplicity of notation. An illustration of the algorithm can be found in Figure 4.

Having described our algorithm, we will now proof its correctness, i.e., that it outputs a pair of distributions dominated by the input pair.

Let (P,Q)(P^{\downarrow},Q^{\downarrow}) denote the output of \textscOptimisticPLD(h,A)\textsc{OptimisticPLD}(h,\mathcal{A}) where h=h(A,B)h=h_{(A,B)}. Then, under 5.2, we have (P,Q)(A,B)(P^{\downarrow},Q^{\downarrow})\preceq(A,B).

To prove Theorem 5.4, it will be crucial to have the following lower bounds on the candidate points.

For all i{0,,i}i\in\{0,\dots,i^{*}\}, f(αi)1αif^{\rightarrow}(\alpha_{i})\geq 1-\alpha_{i}.

For all i{i,,k1}i\in\{i^{*},\dots,k-1\}, f(αi)0f^{\leftarrow}(\alpha_{i})\geq 0.

For all i{0,,i}i\in\{0,\dots,i^{*}\}, f(αi)h(αi)f^{\rightarrow}(\alpha_{i})\leq h(\alpha_{i}).

For all i{i,,k1}i\in\{i^{*},\dots,k-1\}, f(αi)h(αi)f^{\leftarrow}(\alpha_{i})\leq h(\alpha_{i}).

The statement obviously holds for i=0i=0. Next, consider i[i]i\in[i^{*}]. From (ii) and (iii) of Lemma 2.1, we have h(0)1h^{\prime}(0)\geq-1. Furthermore, the convexity of hh (Lemma 2.1(i)) implies that h(αk1)h(0)1h^{\prime}(\alpha_{k-1})\geq h^{\prime}(0)\geq-1. Therefore, we have

where the second inequality is due to Lemma 2.1(iii).

The statement obviously holds for i=k1i=k-1. Next, consider i{i,,k2}i\in\{i^{*},\dots,k-2\}. Since hh is non-increasing (from Lemma 2.1(i)), we have h(αi+1)0h^{\prime}(\alpha_{i+1})\leq 0. Therefore, f(αi)h(αi+1)0f^{\leftarrow}(\alpha_{i})\geq h(\alpha_{i+1})\geq 0.

The statement obviously holds for i=0i=0. For i[i]i\in[i^{*}], the convexity of hh (Lemma 2.1(i)) immediately implies that

The statement obviously holds for i=k1i=k-1. For i{i,,k2}i\in\{i^{*},\dots,k-2\}, the convexity of hh (Lemma 2.1(i)) immediately implies that

To see that (i) holds, observe that the second-rightmost point in the convex hull must be (αj,fj)(\alpha_{j},f_{j}) for some j{0,,k2}j\in\{0,\dots,k-2\} where fj=fjf_{j}=f^{\rightarrow}_{j} or fj=fjf_{j}=f^{\leftarrow}_{j}. In either case, Lemma 5.5 implies that fj0=f(αk1)f_{j}\geq 0=f^{\leftarrow}(\alpha_{k-1}). Since f\overline{f} is the lower curve of the convex hull HH and (αj,fj),(αk1,0)(\alpha_{j},f_{j}),(\alpha_{k-1},0) is its rightmost segment, we can conclude that f\overline{f} is non-increasing in the range [α0,αk1][\alpha_{0},\alpha_{k-1}]. Finally, since we simply have f(α)=0\overline{f}(\alpha)=0 for all α>αk1\alpha>\alpha_{k-1}, it is also non-increasing in the range [αk1,+)[\alpha_{k-1},+\infty), thereby proving (i).

As for (ii), since f[α0,αk1]\overline{f}|_{[\alpha_{0},\alpha_{k-1}]} forms the lower boundary of the convex hull HH, f\overline{f} is convex in the range [α0,αk1][\alpha_{0},\alpha_{k-1}]. Again, since we simply have f(α)=0\overline{f}(\alpha)=0 for all α>αk1\alpha>\alpha_{k-1}, we can conclude that it is convex for the entire range [0,+)[0,+\infty).

Finally, for (iii), Lemma 5.5 states that all the points {(αi,fi)}i{0,,i}{(αi,fi)}i{i,,k1}\{(\alpha_{i},f^{\rightarrow}_{i})\}_{i\in\{0,\dots,i^{*}\}}\cup\{(\alpha_{i},f^{\leftarrow}_{i})\}_{i\in\{i^{*},\dots,k-1\}} is above the curve α[1α]+\alpha\mapsto[1-\alpha]_{+}. Since f\overline{f} is in the convex hull HH, we can conclude that f\overline{f} also lies above this curve, as desired.

Case I: ααk1\alpha\geq\alpha_{k-1}. In this case, f(α)=0h(α)\overline{f}(\alpha)=0\leq h(\alpha).

Case II: α[1,αk1)\alpha\in[1,\alpha_{k-1}). Suppose that α[αi,αi+1)\alpha\in[\alpha_{i},\alpha_{i+1}). Let L1L_{1} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\leftarrow}(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\leftarrow}(\alpha_{i+1})), and L2L_{2} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\leftarrow}(\alpha_{i})) to (αi+1,h(αi+1))(\alpha_{i+1},h(\alpha_{i+1})). From Lemma 5.5(iv), L1L_{1} is below L2L_{2}. Furthermore, since f[αi,αi+1)\overline{f}|_{[\alpha_{i},\alpha_{i+1})} is a lower boundary of the convex hull HH containing L1L_{1}, it must also be below L1L_{1}. Therefore, we have

where the last inequality follows from convexity of hh.

Case III: α[0,1)\alpha\in[0,1). Suppose that α[αi,αi+1)\alpha\in[\alpha_{i},\alpha_{i+1}). Similarly to the previous case, let L3L_{3} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\rightarrow}(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\rightarrow}(\alpha_{i+1})), and L4L_{4} denote the line segment from (αi,h(αi))(\alpha_{i},h(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\rightarrow}(\alpha_{i+1})). From Lemma 5.5(iii), L3L_{3} is below L4L_{4}. Furthermore, since f[αi,αi+1)\overline{f}|_{[\alpha_{i},\alpha_{i+1})} is a lower boundary of the convex hull HH containing L3L_{3}, it must also be below L3L_{3}. Therefore, we have

where the last inequality follows from convexity of hh.

As a result, we can conclude that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\preceq(A,B), completing our proof. ∎

2 Comparison to Privacy Loss Buckets

Similar to Section 4.1, PB can also be applied for optimistic-estimate: let PLD~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} be the probability distribution where

for all i[k]i\in[k]. That is, PLD~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is a probability distribution on E\mathcal{E} that is stochastically dominated by PLD(A,B)\mathsf{PLD}_{(A,B)}; furthermore, PLD~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is the greatest such distribution under stochastic dominant (partial) ordering. It is important to note that, unlike the pessimistic-PB estimate, the optimistic-PB estimate PLD~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is not necessarily a valid PLD for some pair of distributions. This can easily be seen by, e.g., taking a PLD of any ε\varepsilon-DP mechanism for finite ε\varepsilon and let E={,+}\mathcal{E}=\{-\infty,+\infty\}; the optimistic-PB estimate puts all of its mass at , which is clearly not a valid PLD.

We present illustrations of our optimistic PLD and optimistic-PB estimates in Figure 3. Recall that in the pessimistic case, we can show that the pessimistic-PB estimate is no better than our approach (Corollary 4.3). Although we observe similar behaviours in the optimistic case in simple examples (e.g. Figure 3) and also in our experiments in Section 6, this unfortunately does not hold in general. Indeed, if PLD(A,B)\mathsf{PLD}_{(A,B)} has a non-zero mass at ++\infty (or equivalently h(+)0h(+\infty)\neq 0), then the optimistic-PB estimate still keeps this mass while our does not. The latter is because we set f(+)=0f(+\infty)=0 in Algorithm 2. Note here that we cannot set f(+)=h(+)f(+\infty)=h(+\infty) here because the monotonicity may not hold anymore; it is possible that f(αi)<h(+)f^{\rightarrow}(\alpha_{i})<h(+\infty) for some i[i]i\in[i^{*}]. Such examples highlight the challenge in finding a good optimistic estimate (especially in light of the non-existence of the best one, i.e., Lemma 5.1), and we provide further discussion regarding this in Section 7.

Evaluation

We compare our algorithm with the Privacy Buckets algorithm as implemented in the Google DP libraryEven though there are several other papers that build on PB, all of them still use the same PB-based approximation, with the differences being how the truncation is computed for FFT. We use the implementation in the Google DP library github.com/google/differential-privacy/tree/main/python, and the algorithm of Gopi et al. implemented in Microsoft PRV Accountant.Implementation from github.com/microsoft/prv_accountant Gopi et al.’s algorithm does not fit into the pessimistic/optimistic framework as described in Section 2.5. Instead, their algorithm uses an approximation of PLD that is neither optimistic nor pessimistic and uses a concentration bound to derive pessimistic and optimistic estimates. Indeed, their approximate distribution maintains the same expectation as the true PLD, which is the main ingredient in their improvement over previous work.

As a first cut, we evaluate pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ=105\delta=10^{-5}, for varying number of compositions of the Gaussian mechanism with noise scale 8080, while comparing our approach to the two other implementations mentioned above. We use the same discretization interval to evaluate each algorithm. The reason for choosing the Gaussian mechanism is that the exact value of ε\varepsilon can be computed explicitly. We find that the estimates given by our approach are the tightest.

For any specified discretization interval, each algorithm has a different choice of how many discretization points are included in E\mathcal{E}. Our implementation uses the same set of discretization points as used by the Google DP implementation. The number of discretization points increases with the number of self compositions (we use the Google DP implementation to perform self compositionWe found that the Google DP implementation has a significantly worse running time when computing optimistic estimates, due to lack of truncation. We modify the self-composition method in the Google DP library to incorporate truncation when computing optimistic estimates, and evaluate both ours and Google DP implementation with this minor modification. These do not change the estimates significantly, but drastically reduce the running time.). On the other hand, Microsoft PRV Accountant chooses a number of discretization points, depending on the number of compositions desired, and this number does not change after self composition. In all the evaluation experiments mentioned in this paper, we find that the number of discretization points in our approach are lower than the number of discretization points in the PRV Accountant (even after composition).

Our main evaluation involves computing pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ\delta, for varying number of compositions of the Poisson sub-sampled Gaussian mechanism and comparing our approach to each of the two other implementations. Note that this particular mechanism is quite popular in that it captures the privacy analysis of DP-SGD where the number of compositions is equal to the number of iterations of the training algorithm, and the subsampling rate is equal to the fraction of the batch size divided by the total number of training examples . In particular, we consider the Gaussian mechanism with noise scale 11, Poisson-subsampled with probability 0.010.01. We compare against each competing algorithm twice, once where both algorithms use the same discretization interval, and once where our approach uses a larger discretization interval than the competing algorithm. We additionally plot the running time required for this computation for each number of compositions; we ran the evaluation for each number of compositions 20×20\times and plot the mean running time along with a shaded region indicating 25th–75th percentiles of running time.

The comparison with Google DP is presented in Figure 6. Figures 6(a) and 6(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate for a mildly larger running time. Figures 6(b) and 6(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 66.66×66.66\times larger, our method gives comparable estimates, with a drastic speed-up (300×\sim 300\times).

The comparison with Microsoft PRV Accountant is presented in Figure 7. Figures 7(a) and 7(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate with already shorter running time. Figures 7(b) and 7(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 6.66×6.66\times larger, our method gives comparable estimates, with an even larger speed-up.

In Appendix B, we perform a similar evaluation for the Poisson-subsampled Laplace mechanism.

Discussion & Open Problems

In this work, we have proposed a novel approach to pessimistic and optimistic estimates for PLDs, which outperforms previous approaches under similar discretization intervals, and allows for a more compact representation of PLDs while retaining similar error guarantees. There are still several interesting future directions that one could consider.

As we have proved in Lemma 5.1, there is no unique “best” way to pick an optimistic estimate, and we proposed a greedy algorithm (Algorithm 2) for this task. However, it is difficult to determine how good this greedy algorithm is in general. Instead, it might also be interesting to find (P,Q)(A,B)(P^{\downarrow},Q^{\downarrow})\preceq(A,B) that minimizes a certain objective involving h(P,Q)h_{(P^{\downarrow},Q^{\downarrow})} and h(A,B)h_{(A,B)}. For example, one could consider the area between the two curves, or the Fréchet distance between them. An intriguing direction here is to determine (i) which objective captures the notion of “good approximation” better in terms of composition, and (ii) for a given objective, whether there is an efficient algorithm to compute such (P,Q)(P^{\downarrow},Q^{\downarrow}). We remark that for some objectives, such as the area between the two curves, it is possible to discretize the candidate values for each f(αi)f(\alpha_{i}) and use dynamic programming in an increasing order of ii (with the state being f(αi1),f(αi)f(\alpha_{i-1}),f(\alpha_{i})). Even for these objectives, it remains interesting to determine whether such discretization is necessary and whether more efficient algorithms exist.

Also related is the question of how to theoretically explain our experimental findings (Section 6). Although we see significant numerical improvements, it is intriguing to understand theoretically where these improvements come from and which properties of PLDs govern how big such improvements are. More broadly, given an estimate of a PLD, how can we quantify how “good” it is? Previous work (e.g., ) has obtained certain theoretical bounds on the errors; it would be interesting to investigate whether these bounds can help answer the aforementioned question.

Furthermore, the entire line of work on PLD-based accounting , including this paper, has so far considered only non-interactive compositions, meaning that the mechanisms that are run in subsequent steps cannot be changed based on the outputs from the previous steps. This is not a coincidence: interactive composition is highly non-trivial and in fact it is known that the advanced composition theorem (which is even a more specific form of PLD) does not hold in this regime . Several solutions have been proposed here, such as modified formulae for advanced compositions and Renyi DP . However, as discussed earlier, these methods may be loose even in the non-interactive setting, which is the original motivation for PLD-based accounting. Therefore, it would be interesting to understand whether there is a tighter method similar to PLDs that also works in the interactive setting.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

Appendix A Proofs

We have from the definition of hockey-stick divergence that

where the last line follows from the fact that

Appendix B Evaluation of Poisson-Subsampled Laplace Mechanism

Similar to Section 6, we compute pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ\delta, for varying number of compositions of the Poisson sub-sampled Laplace mechanism and comparing our approach to the Google DP implementation.we were unable to compare against Microsoft PRV Accountant, since their implementation does not have support for the Laplace mechanism yet. In particular, we consider the Laplace mechanism with noise scale 55, Poisson-subsampled with probability 0.010.01. We compare against Google DP implementation twice, once where both algorithms use the same discretization interval, and once where our approach uses a larger discretization interval than the competing algorithm. We additionally plot the running time required for this computation for each number of compositions; we ran the evaluation for each number of compositions 20×20\times and plot the mean running time along with a shaded region indicating 25th–75th percentiles of running time.

The comparison with Google DP is presented in Figure 8. Figures 6(a) and 8(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate for a mildly larger running time. Figures 8(b) and 8(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 100×100\times larger, our method gives comparable estimates, with a significant running time speed-up (75×\sim 75\times).

Appendix C Inaccuracies from Floating-Point Arithmetic

We briefly discuss the errors due to floating-point arithmetic. In our implementation, we use the default float datatype in python, which conforms to IEEE-754 “double precision”. Roughly speaking, this means that the resolution of each floating-point number of 2531.110162^{-53}\approx 1.1\cdot 10^{-16}. The number of operations performed in our algorithms scales linearly with the support size of the (discretized) PLD, which is less than 10410^{4} in all of our experiments. Therefore, a rough heuristic suggests that the numerical error for δ\delta here would be less than 101110^{-11}. We stress however that this is just a heuristic and is not a formal guarantee: achieving a formal guarantee is much more complicated, e.g., our optimistic algorithm requires computing a convex hull and one would have to formalize how the numerical error from convex hull computation affects the final δ\delta.

Finally, we also remark that, while Gopi et al. [15, Appendix A] note that they experience numerical issues around δ109\delta\approx 10^{-9}, we do not experience the same issues in our algorithm even for similar setting of parameters even for δ\delta as small as 101210^{-12}.