Numerical Composition of Differential Privacy

Sivakanth Gopi, Yin Tat Lee, Lukas Wutschitz

Introduction

Differential privacy (DP) introduced by [DMNS06] provides a provable and quantifiable guarantee of privacy when the results of an algorithm run on private data are made public. Formally, we can define an (ε,δ)(\varepsilon,\delta)-differentially private algorithm as follows.

An algorithm M\mathcal{M} is (ε,δ)(\varepsilon,\delta)-DP if for any two neighboring databases D,DD,D^{\prime} differing in exactly one user and any subset SS of outputs, we have Pr[M(D)S]eεPr[M(D)S]+δ.\Pr[\mathcal{M}(D)\in S]\leq e^{\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta.

Intuitively, it says that looking at the outcome of M\mathcal{M}, we cannot tell whether it was run on DD or DD^{\prime}. Hence an adversary cannot infer the existence of any particular user in the input database, and therefore cannot learn any personal data of any particular user.

DP algorithms have an important property called composition. Suppose M1M_{1} and M2M_{2} are DP algorithms and say M(D)=(M1(D),M2(D))M(D)=(M_{1}(D),M_{2}(D)), i.e., MM runs both the algorithms on DD and outputs their results. Then MM is also a DP algorithm.

If M1M_{1} is (ε1,δ1)(\varepsilon_{1},\delta_{1})-DP and M2M_{2} is (ε2,δ2)(\varepsilon_{2},\delta_{2})-DP, then M(D)=(M1(D),M2(D))M(D)=(M_{1}(D),M_{2}(D)) is (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-DP.

This also holds under adaptive composition (denoted by M=M2M1M=M_{2}\circ M_{1}), where M2M_{2} can look at both the database and the output of M1M_{1}.Here M(D)=(M1(D),M2(D,M1(D)))M(D)=(M_{1}(D),M_{2}(D,M_{1}(D))). It turns out that both compositions enjoy much better DP guarantees than this simple composition rule. Let MM be an (ε,δ)(\varepsilon,\delta)-DP algorithm and let MkM^{\circ k} denote the (adaptive) composition of MM with itself kk times. The naive composition rule shows that MkM^{\circ k} is (kε,kδ)(k\varepsilon,k\delta)-DP. This was significantly improved in [DRV10].

If MM is (ε,δ)(\varepsilon,\delta)-DP, then MkM^{\circ k} is (ε,kδ+δ)(\varepsilon^{\prime},k\delta+\delta^{\prime})-DP where

Note that if ε=O(1k)\varepsilon=O\left(\frac{1}{\sqrt{k}}\right) and δ=o(1k)\delta=o\left(\frac{1}{k}\right), then MkM^{\circ k} satisfies (Oδ(1),δ)(O_{\delta^{\prime}}(1),\delta^{\prime})-DP. Using simple composition (Proposition 1.2), we can only claim that MkM^{\circ k} is (O(k),o(1))(O(\sqrt{k}),o(1))-DP. Thus advanced composition often results in k\sqrt{k}-factor savings in privacy which is significant in practice. The optimal DP guarantees for kk-fold composition of an (ε,δ)(\varepsilon,\delta)-DP algorithm were finally obtained by [KOV15]. For composing different algorithms, the situation is more complicated. If M1,M2,,MkM_{1},M_{2},\dots,M_{k} are DP algorithms such that MiM_{i} is (εi,δi)(\varepsilon_{i},\delta_{i})-DP, then it is shown by [MV16] that computing the exact DP guarantees for M=M1M2MkM=M_{1}\circ M_{2}\circ\dots\circ M_{k} is #P-complete. They also give an algorithm to approximate the DP guarantees of MM to desired accuracy η\eta which runs in

In most situations, DP algorithms come with a collection of (ε,δ)(\varepsilon,\delta)-DP guarantees, i.e., for each value of ε\varepsilon, there exists δ\delta such that the algorithm is (ε,δ)(\varepsilon,\delta)-DP.

For example the privacy curve of a Gaussian mechanism (with sensitivity 11 and noise scale σ\sigma) is given by δ(ε)=Φ(εσ+1/2σ)eεΦ(εσ1/2σ)\delta(\varepsilon)=\Phi\left(-\varepsilon\sigma+1/2\sigma\right)-e^{\varepsilon}\Phi\left(-\varepsilon\sigma-1/2\sigma\right) where Φ()\Phi(\cdot) is the Gaussian CDF [BW18]. Suppose we want to compose several Gaussian mechanisms, which (ε,δ)(\varepsilon,\delta)-DP guarantee should we choose for each mechanism? Any choice will lead to suboptimal DP guarantees for the final composition. Instead, we need a way to compose the privacy curves directly. This was suggested through the use of privacy region in [KOV15] and explicitly studied in the ff-DP framework of [DRS19]. ff-DP is a dual way (and equivalent) to look at the privacy curve δ(ε).\delta(\varepsilon).

In an influential paper where they introduce Differentially Private Deep Learning, [ACG+16] proposed a method called the Moments Accountant (MA) for giving an upper bound the privacy curve of a composition of DP algorithms. They applied their method to bound the privacy loss of differentially private Stochastic Gradient Descent (DP-SGD) algorithm which they introduced. Analyzing the privacy loss of DP-SGD involves composing the privacy curve of each iteration of training with itself kk times, where kk is the total of number of training iterations. Typical values of kk range from 10001000 to 300000300000 (such as when training large models like GPT3). The Moments Accountant was subsumed into the framework of Renyi Differential Privacy (RDP) introduced by [Mir17]. The running time of these accountants are independent of kk, but they only give an upper bound and cannot approximate the privacy curve to arbitrary accuracy.

DP-SGD is one of the most important DP algorithms in practice, because one can use it to train neural networks to achieve good privacy-vs-utility tradeoffs. Therefore obtaining accurate and tight privacy guarantees for DP-SGD is important. For example reducing ε\varepsilon from 22 to 11, can mean that one can train the network for 4 times more epochs while staying within the same privacy budget. Therefore DP-SGD is one of the main motivations for this work.

There are also situations when the PRVs do not have bounded moments and so Moments Accountant or Renyi DP cannot be applied for analyzing privacy. An example of such an algorithm is the DP-SGD-JL algorithm of [BGK+21] which uses numerical composition of PRVs to analyze privacy.

GDP Accountant

[DRS19, BDLS19] introduced the notion of Gaussian Differential Privacy (GDP) and used it to develop an accountant for DP-SGD. The accountant is based on central limit theorem and only gives an approximation to the true privacy curve, where the approximation gets better with kk. But as we show in Figure 1, GDP accountant can significantly underreport the true epsilon value.

Several different notions of privacy were introduced for obtaining good upper bounds on the privacy curve of composition of DP algorithms such as Concentrated DP (CDP) [DR16, BS16], Truncated CDP [BDRS18] etc. None of these methods can approximate the privacy curve of compositions to arbitrary accuracy. The notion of ff-DP introduced by [DRS19], allows for a lossless composition theorem, but computing the privacy curve of composition seems computationally hard and they do not give any algorithms for doing it.

1 Our Contributions

The main contribution of this work is a new algorithm with an improved analysis for computing the privacy curve of the composition of a large number of DP algorithms.

Suppose M1,M2,,MkM_{1},M_{2},\dots,M_{k} are DP algorithms. Then the privacy curve δM(ε)\delta_{M}(\varepsilon) of adaptive composition M=M1M2MkM=M_{1}\circ M_{2}\circ\dots\circ M_{k} can be approximated in time

Suppose MM is a DP algorithm. Then the privacy curve δMk(ε)\delta_{M^{\circ k}}(\varepsilon) of MM (adaptively) composed with itself kk times can be approximated in time

Thus we improve the state-of-the-art by at least a factor of kk in running time. We also note that our algorithm improves the memory required by a factor of k.k. See Figure 1 for a comparison of our algorithm with that of [KJPH21]. Also note that RDP Accountant (equivalent to the Moments Accountant) significantly overestimates the true ε\varepsilon, while the GDP Accountant significantly underestimates the true ε.\varepsilon. In contrast, the upper and lower bounds provided by our algorithm lie very close to each other.

Our algorithm (also the prior work of [KJH+20]) proceeds by approximating the privacy loss random variables (PRVs) by truncating and discretizing them. We then use Fast Fourier Transform (FFT) to convolve the distributions efficiently. The main difference is in the approximation procedure and the error analysis. In the approximation procedure, we correct the approximation so that the expected value of the discretization matches with the expected value of the PRV.

DP Preliminaries

Therefore an algorithm M\mathcal{M} is (ε,δ)(\varepsilon,\delta)-DP iff δ(M(D)M(D))(ε)δ\delta\left(\mathcal{M}(D)||\mathcal{M}(D^{\prime})\right)(\varepsilon)\leq\delta for all neighboring databases D,D.D,D^{\prime}.

Let δ1δ(X1Y1)\delta_{1}\equiv\delta(X_{1}||Y_{1}) and δ2δ(X2Y2)\delta_{2}\equiv\delta(X_{2}||Y_{2}) be any two privacy curves. The composition of the privacy curves, denoted by δ1δ2\delta_{1}\otimes\delta_{2}, is defined as

where X1,X2X_{1},X_{2} are independently sampled and Y1,Y2Y_{1},Y_{2} are independently sampled.

Note that there can be many pairs of random variables which have the same privacy curve, but the above operation is well-defined. If δ(X1Y1)δ(X1Y1)\delta(X_{1}||Y_{1})\equiv\delta(X_{1}^{\prime}||Y_{1}^{\prime}) and δ(X2Y2)δ(X2Y2)\delta(X_{2}||Y_{2})\equiv\delta(X_{2}^{\prime}||Y_{2}^{\prime}), then it was shown by [DRS19] that

[DRS19] also show that \otimes is a commutative and associative operation.

Given two DP algorithms M1M_{1} and M2M_{2}, the adaptive composition (M2M1)(D)(M_{2}\circ M_{1})(D) is an algorithm which outputs (M1(D),M2(D,M1(D))(M_{1}(D),M_{2}(D,M_{1}(D)), i.e., M2M_{2} can look at the database DD and also the output of the previous algorithm M1(D)M_{1}(D). Adaptive composition of more than two algorithms is similarly defined. Suppose M1M_{1} has privacy curve δ1\delta_{1} and M2M_{2} has privacy curve δ2\delta_{2} (i.e., M2(,y)M_{2}(\cdot,y) is a DP algorithm with privacy curve δ2\delta_{2} for any fixed y.y.). The following composition theorem shows how to get the privacy curve of M2M1.M_{2}\circ M_{1}.

Let M1,M2,,MkM_{1},M_{2},\dots,M_{k} be DP algorithms with privacy curves given by δ1,δ2,,δk\delta_{1},\delta_{2},\dots,\delta_{k} respectively. The privacy curve of the adaptive composition MkMk1M1M_{k}\circ M_{k-1}\circ\dots\circ M_{1} is given by δ1δ2δk.\delta_{1}\otimes\delta_{2}\otimes\cdots\otimes\delta_{k}.

Privacy Loss Random Variables (PRVs)

The notion of privacy loss random variables (PRVs) is a unique way to assign a pair (X,Y)(X,Y) for any privacy curve δ\delta such that δδ(XY).\delta\equiv\delta(X||Y). PRVs allow us to compute composition of two algorithms via summing random variables (Theorem 3.5) (equivalently, convolving their distributions). Thus PRVs can be thought of as a reparametrization of privacy curves where composition becomes convolution. In this paper, we differ from the usual definition of PRVs given in [DR16, KJH+20], which are tied to a specific algorithm. Instead we think of them as a reparametrization of privacy curves and study them directly. This allows us to succinctly prove many useful properties of PRVs.

where X(t),Y(t)X(t),Y(t) are probability density functions of X,YX,Y respectively.

The following theorem shows that the PRVs for a privacy curve δ=δ(PQ)\delta=\delta(P||Q) are given by the log-likelihood random variables of P,Q.P,Q.

The following theorem provides a formula for computing the privacy curve δ\delta in terms of the PRVs and conversely a formula for PRVs in terms of the privacy curve. A similar statement appears in [SMM19, KJH+20].

The privacy curve δ\delta can be expressed in terms of PRVs (X,Y)(X,Y) as:

PRVs are useful in computing privacy curves because the composition of two privacy curves can be computed by adding the corresponding pairs of PRVs. A similar statement appears in [DR16].

Let δ1,δ2\delta_{1},\delta_{2} be two privacy curves with PRVs (X1,Y1)(X_{1},Y_{1}) and (X2,Y2)(X_{2},Y_{2}) respectively. Then the PRVs for δ1δ2=δ(X1,X2Y1,Y2)\delta_{1}\otimes\delta_{2}=\delta(X_{1},X_{2}||Y_{1},Y_{2}) are given by (X1+X2,Y1+Y2)(X_{1}+X_{2},Y_{1}+Y_{2}). In particular,

Let (X,Y)(X,Y) be the privacy random variables for δ(X1,X2Y1,Y2)\delta(X_{1},X_{2}||Y_{1},Y_{2}). By Theorem 3.2,

In Appendix B, we provide a proof of Theorems 3.2 and 3.3. We also discuss how to compute the PRVs for a subsampled mechanism given the PRVs for the original mechanism and give examples of PRVs for few standard mechanisms. These are used in our experiments to calculate the PRVs for DP-SGD.

Numerical composition of privacy curves

In this section, we present an efficient and numerically accurate method, ComposePRV (Algorithm 1), for composing privacy guarantees by utilizing the notion of PRVs.

The subroutine DiscretizePRV (Algorithm 2) is used to truncate and discretize PRVs. In this subroutine, we shift the discretized random variables such that it has the same mean as the original variables. This is one of main differences between our algorithm and the algorithm in [KJPH21, KH21]. We show that this significantly decreases the discretization error and allow us to use much coarser mesh h1/kh\approx 1/\sqrt{k} instead of h1/kh\approx 1/k.

For simplicity, throughout this paper, we will assume that the PRVs Y1,Y2,,YkY_{1},Y_{2},\dots,Y_{k} do not have any mass at \infty. This is with out loss of generality. Suppose Pr[Yi=]=δi\Pr[Y_{i}=\infty]=\delta_{i} for each i.i. Let Yi=YiYiY_{i}^{\prime}=Y_{i}|_{Y_{i}\neq\infty}. Then

Therefore we can use Algorithm 1 to approximate the distribution of Y1+Y2++YkY_{1}^{\prime}+Y_{2}^{\prime}+\dots+Y_{k}^{\prime}, and use it to approximate the distribution of Y1+Y2++YkY_{1}+Y_{2}+\dots+Y_{k}.

Error analysis

To analyze the discretization error, we introduce the notion of coupling approximation, a variant of Wasserstein distance. Intuitively, a good coupling approximation is a coupling where the two random variables are close to each other with high probability.

Given two random variables Y1,Y2Y_{1},Y_{2}, we write Y1Y2ηh|Y_{1}-Y_{2}|\leq_{\eta}h if there exists a coupling between Y1,Y2Y_{1},Y_{2} such that Pr[Y1Y2>h]η.\Pr[|Y_{1}-Y_{2}|>h]\leq\eta.

The following lemma shows that if we have a good coupling approximation Y~\widetilde{Y} to a PRV YY, then the privacy curves δY(ε)\delta_{Y}(\varepsilon) and δY~(ε)\delta_{\widetilde{Y}}(\varepsilon) should be close.

By Theorem 3.2, δY(ε)=Pr[Yε+Z]\delta_{Y}(\varepsilon)=\Pr[Y\geq\varepsilon+Z] and hence

Therefore the goal of our analysis is to show that the ComposePRV algorithm finds a good coupling approximation Y~\widetilde{Y} to Y=i=1kYi.Y=\sum_{i=1}^{k}Y_{i}. We first show that the DiscretizePRV algorithm computes a good coupling approximation to the PRVs and crucially, it preserves the expected value after truncation. Lemma C.5 shows that Y~YL0h|\widetilde{Y}-Y^{L}|\leq_{0}h where Y~\widetilde{Y} is the approximation of a PRV YY output by Algorithm 2 and YLY^{L} is the truncation of YY to [L,L].[-L,L].

We then use the following key lemma which shows that when we add independent coupling approximations (where expected values match), we get a much better coupling approximation than what the triangle inequality predicts.

Let Xi=YiY~iX_{i}=Y_{i}-\widetilde{Y}_{i} where (Yi,Y~i)(Y_{i},\widetilde{Y}_{i}) are coupled such that YiY~ih|Y_{i}-\widetilde{Y}_{i}|\leq h w.p. 11. Then Xi[h,h]X_{i}\in[-h,h] w.p. 11. Note that X1,X2,,XkX_{1},X_{2},\dots,X_{k} are independent of each other. By Hoeffding’s inequality,

if we set t=h2klog2ηt=h\sqrt{2k\log{\frac{2}{\eta}}}. ∎

This lemma shows that the error of kk times composition is around kh\sqrt{k}\cdot h and hence setting h1/kh\approx 1/\sqrt{k} gives small enough error. Next, we bound the domain size LL. Naively, the domain size LL should be of the order of k\sqrt{k} because YY is the sum of kk independent random variables with each bounded by a constant. In the appendix, we give a tighter tail bound of YY.

Let (X,Y)(X,Y) be the privacy random variables for a (ε,δ)(\varepsilon,\delta)-DP algorithm, then for any t0t\geq 0, we have

This shows that Pr[Yε+2]43δ\Pr[|Y|\geq\varepsilon+2]\leq\frac{4}{3}\delta and hence truncating the domain with L=2+εL=2+\varepsilon only introduces an additive δ\delta error in the privacy curve. Therefore, if the composition satisfies a good privacy guarantee (namely ε=O(1)\varepsilon=O(1) for small enough δ\delta), we can truncate the domain at L=Θ(1)L=\Theta(1). Together with the fact that mesh size is 1/k1/\sqrt{k}, this gives a O(k)O(\sqrt{k})-time algorithm for computing the privacy curve when we compose the same mechanism with itself kk times. The following theorem gives a formal statement of the error bounds of our algorithm, it is proved in Appendix 5.

Let Y~\widetilde{Y} be the approximation of Y=i=1kYiY=\sum_{i=1}^{k}Y_{i} produced by ComposePRV algorithm with mesh size

Furthermore, our algorithm takes O(bLhlog(Lh))O\left(b\frac{L}{h}\log\left(\frac{L}{h}\right)\right) time where bb is the number of distinct algorithms among M1,M2,,Mk\mathcal{M}_{1},\mathcal{M}_{2},\dots,\mathcal{M}_{k}.

A simple way to set LL such that the condition (7) holds is by choosing an LL such that:

where εA(δ)\varepsilon_{\mathcal{A}}(\delta) is the inverse of δA(ε)\delta_{\mathcal{A}}(\varepsilon). To set the value of LL, we do not need the exact value of εM\varepsilon_{\mathcal{M}} (or εMi\varepsilon_{\mathcal{M}_{i}}). We only need an upper bound on εM\varepsilon_{\mathcal{M}}, which can often by obtained by using the RDP Accountant or any other method to derive upper bounds on privacy.

Experiments

In this section, we demonstrate the utility of our composition method by computing the privacy curves for the DP-SGD algorithm which is one of the most important algorithms in differential privacy.

The DP-SGD algorithm [ACG+16] is a variant of stochastic gradient descent with kk steps. In each step, the algorithm selects a pp fraction of training examples uniformly at random. The algorithm adds a Gaussian vector with variance σ2\propto\sigma^{2} to the clipped gradient of the selected batch. Then it performs a gradient step (or any other iterative methods) using the noisy gradient computed. The privacy loss of DP-SGD involves composing the privacy curve of each iteration with itself kk times. The PRVs for each iteration have a closed form and depend only p,σp,\sigma (see Appendix). Our algorithms use this closed form of PRVs.

See Figure 1(b) for the comparison between our algorithm and the GDP and RDP Accountant. Our method provides a lower and upper bound of the privacy curve according to (8). In Figure 1(a), we compare our algorithm with [KJPH21] (implemented in [KP21]). Under the same mesh size, our algorithm computes a much closer upper and lower bound.

We validate our program for the case p=1p=1. When p=1p=1, we have an exact formula for

Note that our error analysis in Section 5 ignores floating point errors. This is because they are negligible compared to the discretization and truncation errors we analyzed in Section 5 for the range of δ\delta we are interested in. Our implementation uses ong } {\verb doube floating point format which is platform dependent, however, it guarantees a precision at least as good as double precision which has a resolution of 101510^{-15}. Computations involving δ\delta of these orders of magnitude suffer from floating point inaccuracies. Our implementation therefore only allows δ\delta values which are greater than 101010^{-10} which sufficies for practical use cases. See Appendix A for more details.

1 Comparison with [KJPH21]

In this section, we provide more results demonstrating the practical use of our algorithm. We compare runtimes of our algorithm with [KJPH21], which is the state-of-the-art, for typical values of privacy parameters (σ=0.8\sigma=0.8, p=4×103p=4\times 10^{-3}, ε=1.5\varepsilon=1.5).

See Figure 3 for the effect of the number of discretisation points nn on the accuracy of δ\delta. Our algorithm requires about a few orders of magnitude smaller number of discretization points to converge compared to the algorithm of [KJPH21]. A similar picture can be seen in Figure 4. While for a small number of compositions, the algorithm of [KJPH21] gives reasonable estimates, for a large number of compositions, their error bounds worsen quickly.

We note that runtimes are directly proportional to the memory required by the algorithms and so a separate memory analysis is not required; the runtime and memory are dominated by the number of points in the discretization of PRV. All experiments are performed on a Intel Xeon W-2155 CPU with 3.30GHz with 128GB of memory.

In order to compare runtimes, we align the accuracy of both FFT algorithms. We find sets of numerical parameters (number of discretization bins and domain length) such that both algorithms give similarly accurate bounds and verify it visually (see Figure 9 (b)). Figure 9 illustrates the runtimes for varying numbers of DPSGD steps. We observe a significant reduction in the runtime using our algorithms.

Acknowledgements

We would like to thank Janardhan Kulkarni and Sergey Yekhanin for several useful discussions and encouraging us to work on this problem. L.W. would like to thank Daniel Jones and Victor Rühle for fruitful discussions and helpful guidance.

References

Appendix A Effect of floating point arithmetic

In this section, we demonstrate the effect of floating point inaccuracies on the computed privacy parameters. Figure 6 compares lower and upper bounds of the privacy curve with the analytical solution for small values of δ\delta. As mentioned in section 6, we use a floating point representation with a resolution of at least 101510^{-15}. The number of discretization points in this examples are on the order of 10410^{4}. Consequently, we expect floating point inaccuracies to become dominant for values on the order of 101110^{-11}. This can be also seen in the illustration, where the lower and upper bound fail to produce meaningful results for δ<2×1011\delta<2\times 10^{-11}.

Appendix B Privacy Loss Random Variables

In this section, we continue the discussion on privacy random variables in Section 3. First, we give the proof of the formula for PRVs of δ(PQ)\delta(P||Q) and the formula for a privacy curve given its PRVs (Theorem 3.2).

We will now prove that δ(XY)=δ(PQ).\delta(X||Y)=\delta(P||Q). We have

Therefore Pr[QSε]=Pr[Y>ε]\Pr[Q\in S_{\varepsilon}]=\Pr[Y>\varepsilon] and Pr[PSε]=Pr[X>ε]\Pr[P\in S_{\varepsilon}]=\Pr[X>\varepsilon]. To complete the proof, note that

Since the PDFs of PRVs (X,Y)(X,Y) satisfy the relation Y(t)=etX(t)Y(t)=e^{t}X(t), we can rewrite the equation 5 in terms of just YY or just XX.

To get the other form for δ(ε)\delta(\varepsilon), we use the integration by parts formula.

We now prove the converse relation by differentiating the expression for δ(ε)\delta(\varepsilon) twice. We have:

In this section, we state the PRVs for a few standard mechanisms.

The PRVs for δ(N(μ,1)N(0,1))\delta(\mathcal{N}(\mu,1)||\mathcal{N}(0,1)) are:

Let P=N(μ,1)P=\mathcal{N}(\mu,1) and Q=N(0,1)Q=\mathcal{N}(0,1). By Theorem 3.2,

A similar calculation shows that X=N(μ22,μ2)X=\mathcal{N}\left(-\frac{\mu^{2}}{2},\mu^{2}\right)

The PRVs for the privacy curve δ(Lap(μ,1)Lap(0,1))\delta\left(\mathsf{Lap}\left(\mu,1\right)||\mathsf{Lap}\left(0,1\right)\right) are:

Let P=Lap(μ,1)P=\mathsf{Lap}(\mu,1) and Q=Lap(0,1)Q=\mathsf{Lap}(0,1). By Theorem 3.2,

A similar calculation shows that X=ZZμX=|Z|-|Z-\mu| where ZLap(0,1).Z\sim\mathsf{Lap}(0,1).

The PRVs for the privacy curve of a (ε,δ)(\varepsilon,\delta)-DP algorithm are

Morever X=YX=-Y, therefore the privacy curve δ(XY)\delta(X||Y) is symmetric by Proposition C.9, i.e., δ(XY)=δ(YX)\delta(X||Y)=\delta(Y||X). These conditions together imply that X,YX,Y are PRVs for the (ε,δ)(\varepsilon,\delta)-DP curve. ∎

Note that in the all the above examples, we have X=YX=-Y as the privacy curves are symmetric.

B.2 Subsampling

In this section, we calculate the PRVs for a subsampled mechanism given the PRVs for the original mechanism. Given two random variables P,QP,Q and a sampling probability pp\in, pP+(1p)Qp\cdot P+(1-p)\cdot Q denotes the mixture where we sample PP w.p. pp and QQ w.p. 1p.1-p.

Let (X,Y)(X,Y) be the PRVs for a privacy curve δ(PQ)\delta(P||Q). Let (Xp,Yp)(X_{p},Y_{p}) be the PRVs for δp=δ(P pP+(1p)Q)\delta_{p}=\delta(P||\ p\cdot P+(1-p)\cdot Q). Then

The CDFs of XpX_{p} and YpY_{p} are given by:

Appendix C Missing Proofs in Error Analysis

Here we collect some useful properties of coupling approximations. The following lemma shows that the coupling approximations satisfy a triangle inequality.

Suppose X,Y,ZX,Y,Z are random variables such that XYη1h1|X-Y|\leq_{\eta_{1}}h_{1} and YZη2h2|Y-Z|\leq_{\eta_{2}}h_{2}. Then XZη1+η2h1+h2.|X-Z|\leq_{\eta_{1}+\eta_{2}}h_{1}+h_{2}.

There exists couplings (X,Y)(X,Y) and (Y,Z)(Y,Z) such that

From these two couplings, we can construct a coupling between (X,Z)(X,Z): sample XX, sample YY from YXY|X (given by coupling (X,Y)(X,Y)) and finally sample ZZ from ZYZ|Y (given by coupling (Y,Z)(Y,Z)). Therefore for this coupling, we have:

The following lemma shows that small total variation distance implies good coupling approximation.

If the total variation distance dTV(X,Y)ηd_{TV}(X,Y)\leq\eta, then XYη0|X-Y|\leq_{\eta}0.

It is well known that for any two random variables X,YX,Y, there exists a coupling such that dTV(X,Y)=Pr[XY]d_{TV}(X,Y)=\Pr[X\neq Y]. This immediately implies what we want. ∎

C.2 Bounding the error using tail bounds of PRVs

The goal of this section is to bound the error of ComposePRV in terms of the tail bounds of the underlying PRVs.

Let Y1,Y2,,YkY_{1},Y_{2},\dots,Y_{k} be PRVs and let Y~\widetilde{Y} be the approximation produced by the ComposePRV algorithm (Algorithm 1) for Y=i=1kYiY=\sum_{i=1}^{k}Y_{i} with truncation parameter LL and mesh size

We can directly bound Pr[i=1kY~iL]\Pr\left[\left|\sum_{i=1}^{k}\widetilde{Y}_{i}\right|\geq L\right] using moment generating functions as

The following key lemma shows that the DiscretizePRV algorithm (Algorithm 2) produces a good coupling approximation to the PRV and preserves the mean.

We will now construct the coupling between (YL,Y~)(Y^{L},\widetilde{Y}) such that YL(Y~μ)h2|Y^{L}-(\widetilde{Y}-\mu)|\leq\frac{h}{2}. The coupling is as follows: First sample yYLy\sim Y^{L}. Suppose y(ihh2,ih+h2]y\in(ih-\frac{h}{2},ih+\frac{h}{2}] for some integer ii such that nin-n\leq i\leq n, then return y~=μ+ih\widetilde{y}=\mu+ih. Clearly, the distribution of y~\widetilde{y} matches with Y~\widetilde{Y} and y(y~μ)=yihh2|y-(\widetilde{y}-\mu)|=|y-ih|\leq\frac{h}{2}.

Since our error bound on Y~\widetilde{Y} is slightly different from the assumption in Lemma 5.3, we need the following generalization using the same proof.

In the algorithm, we only calculate the distribution of Y1Y2YkY_{1}\oplus Y_{2}\oplus\dots\oplus Y_{k} instead of Y1+Y2++YkY_{1}+Y_{2}+\dots+Y_{k}. The following simple lemma shows that this is still a good approximation as long as iYi\sum_{i}Y_{i} stays within [L,L][-L,L] with high probability.

Let Y1,Y2,,YkY_{1},Y_{2},\dots,Y_{k} be random variables supported on (L,L](-L,L]. Then

Combining all the above lemmas, we get the following corollary.

Let Y1,Y2,,YkY_{1},Y_{2},\dots,Y_{k} be random variables supported on and let Y~i\widetilde{Y}_{i} be the discretization of YiY_{i} produced by DiscretizePRV algorithm with mesh size h=h0k2log2η0h=\frac{h_{0}}{\sqrt{\frac{k}{2}\log\frac{2}{\eta_{0}}}} and truncation parameter LL. Then

Let Y^{L}\equiv Y_{i}\big{|}_{|Y_{i}|\leq L} be the truncation of Yi.Y_{i}. By Lemma C.5, YiL(Y~iμi)0h2|Y_{i}^{L}-(\widetilde{Y}_{i}-\mu_{i})|\leq_{0}\frac{h}{2} for some μi\mu_{i} and YiLYiξi0|Y_{i}^{L}-Y_{i}|_{\xi_{i}}\leq 0 where ξi=Pr[YiL].\xi_{i}=\Pr[|Y_{i}|\geq L]. Now applying the triangle inequality for coupling approximations (Lemma C.1), we have

where η1=iξi=iPr[YiL].\eta_{1}=\sum_{i}\xi_{i}=\sum_{i}\Pr[|Y_{i}|\geq L]. By Lemma C.6, we have

where η2=Pr[i=1kY~iL].\eta_{2}=\Pr\left[\left|\sum_{i=1}^{k}\widetilde{Y}_{i}\right|\geq L\right]. Finally applying triangle inequality (Lemma C.1) once again, we get:

where η=η0+η1+η2\eta=\eta_{0}+\eta_{1}+\eta_{2}. We can bound Pr[i=1kY~iL]\Pr\left[\left|\sum_{i=1}^{k}\widetilde{Y}_{i}\right|\geq L\right] as:

C.3 Tail Bound for PRVs

To finish the proof of our main theorem (Theorem 5.5, we need a tail bound on PRVs in terms of their privacy curves. First, we need a lemma relating the PRVs of a privacy curve δ(PQ)\delta(P||Q) with the PRVs of δ(QP)\delta(Q||P).

Let (X,Y)(X,Y) be the PRVs for a privacy curve δ(PQ)\delta(P||Q). Then the PRVs for the privacy curve δ(QP)\delta(Q||P) are (Y,X).(-Y,-X).

Let (X~,Y~)(\widetilde{X},\widetilde{Y}) be the PRVs for δ(QP)\delta(Q||P). We know that δ(PQ)=δ(XY)\delta(P||Q)=\delta(X||Y). So δ(QP)=δ(YX).\delta(Q||P)=\delta(Y||X). Then by Theorem 3.2,

Now, we show our tail bound, which shows the PRVs (X,Y)(X,Y) for a (ε,δ)(\varepsilon,\delta)-DP algorithm satisfies roughly that Pr(Yε+2)2δ\Pr(|Y|\geq\varepsilon+2)\leq 2\delta.

We have δ(XY)fε,δ\delta(X||Y)\leq f_{\varepsilon,\delta} and δ(YX)fε,δ\delta(Y||X)\leq f_{\varepsilon,\delta} where fε,δf_{\varepsilon,\delta} is the privacy curve of a (ε,δ)(\varepsilon,\delta)-DP algorithm. By Theorem 3.2, we have

By Proposition C.9, the PRVs for δ(YX)\delta(Y||X) are (Y,X)(-Y,-X). Therefore by a similar argument, we have

C.4 Proof of Theorem 5.5

Now, we can prove our main theorem. See 5.5

For the runtime, we note that the bottleneck of our algorithm is to compute the convolution, which can be done using FFT. In total, we need to compute b+1b+1 many FFT for bb distinct algorithms, one for each for computing the Fourier transform and one of computing the inverse Fourier transform. Since the length of the array for the FFT is bounded by O(L/h)O(L/h), this costs O(bL/hlog(L/h))O(bL/h\log(L/h)) in total.