Rényi Differential Privacy of the Sampled Gaussian Mechanism

Ilya Mironov, Kunal Talwar, Li Zhang

Introduction and Definitions

The notion of differential privacy grounds a guarantee of individual privacy for input of a statistical procedure in non-determinism of the procedure’s output. The uncertainty, or randomness, can be achieved either explicitly, by injecting noise, or implicitly, by leveraging randomness intrinsic to the mechanism or its input. Developing a general framework that accounts for all sources of output stochasticity and distills them into a guarantee of differential privacy remains a significant open problem.

Following introduction of (pure) differential privacy , many of its relaxations, which we recall shortly, were motivated by the need to capture properties of the Gaussian additive noise and SGM.

The notion of approximate differential privacy, which includes an additive δ\delta term, appeared in the work by Dwork et al. in order to support analysis of the Gaussian noise mechanism. Concentrated Differential Privacy (CDP) and its reformulation, zero-CDP, due to Bun and Steinke were developed to refine composition theorems using the Gaussian mechanism as their main motivating examples.

Another family of definitions and analytical techniques emerged as tools for handling the Sampled Gaussian mechanism. To track privacy loss budget over multiple applications of SGM, Abadi et al. developed a moments accountant, implemented a numerically stable and efficient algorithm for computing it, and analyzed the accountant’s asymptotic properties. A relaxation of CDP well-suited for analysis of SGM, called truncated CDP (tCDP), was introduced by Bun et al. . We compare these closely related notions in Section 4.

In this paper we revisit the moments accountant due to Abadi et al., restate it using the notion of Rényi differential privacy (RDP) , relax certain assumptions, and strengthen upper bounds. An open-source implementation of the RDP accountant for SGM is available from https://github.com/tensorflow/privacy.

We recall the definitions of Rényi divergence, Rényi differential privacy and the Sampled Gaussian mechanism.

Let PP and QQ be two distributions on X\mathcal{X} defined over the same probability space, and let pp and qq be their respective densities. The Rényi divergence of a finite order α1\alpha\neq 1 between PP and QQ is defined as

Rényi divergence at orders α=1,\alpha=1,\infty are defined by continuity.

We refer the reader to van Erven and Harremoës for an introduction to the theory of Rényi divergence, including proof of its continuity and many other useful properties.

We say that a randomized mechanism M ⁣:SR\mathcal{M}\colon\mathcal{S}\to\mathcal{R} satisfies (α,ε)(\alpha,\varepsilon)-Rényi differential privacy (RDP) if for any two adjacent inputs S,SSS,S^{\prime}\in\mathcal{S} it holds that

The notion of adjacency between input datasets is domain-specific and is usually taken to mean that the inputs differ in contributions of a single individual. In this work, we will use this definition and call two datasets S,SS,S^{\prime} to be adjacent if S=S{x}S^{\prime}=S\cup\{x\} for some xx (or vice versa).

The SGM has been studied in several incomparable settings. Broadly speaking, there are three approaches for deriving privacy bounds on the SGM: (1) asymptotic analyses that target tight bounds for q0q\to 0; (2) approaches that lead to numerically accurate estimates; and (3) a closed-form analysis that captures the right dependency on the order α\alpha. These results, along with our contributions are summarized in Table 1.

Reducing to a simpler case

We first reduce the problem of proving the RDP bound for the Sampled Gaussian mechanism to a particularly simple special case of a mixture of single-dimensional Gaussians.

under the assumption f(S)f(S)21\|f(S)-f(S^{\prime})\|_{2}\leq 1 for any adjacent S,SSS,S^{\prime}\in\mathcal{S}.

Let TT denote a set-valued random variable defined by taking a random subset of S\mathcal{S}, where each element of SS is independently placed in TT with probability qq. Conditioned on TT, the mechanism M(S)\mathcal{M}(S) samples from a Gaussian with mean f(T)f(T). Thus

where the sum here denotes mixing of the distributions with the weights pTp_{T}. Similarly,

Rényi divergence is quasi-convex , allowing us to bound

where we have used the translation invariance of Rényi divergence. Since the covariances are symmetric, we can, by applying a rotation, assume that f(T{x})f(T)=cTe1f(T\cup\{x\})-f(T)=c_{T}\mathbf{e}_{1} for some constant cT1c_{T}\leq 1. The two distributions at hand are then both product distributions that are identical in all coordinates except the first. By additivity of Rényi divergence for product distributions, we have that

For any c1c\leq 1, the noise N(0,(σ/c)2)\mathcal{N}(0,(\sigma/c)^{2}) can be obtained from N(0,σ2)\mathcal{N}(0,\sigma^{2}) by adding noise from N(0,(σ/c)2σ2)\mathcal{N}(0,(\sigma/c)^{2}-\sigma^{2}), and the same operation allows us to to obtain (1q)N(0,(σ/c)2)+qN(1,(σ/c)2)(1-q)\mathcal{N}(0,(\sigma/c)^{2})+q\mathcal{N}(1,(\sigma/c)^{2}) from (1q)N(0,σ2)+qN(1,σ2)(1-q)\mathcal{N}(0,\sigma^{2})+q\mathcal{N}(1,\sigma^{2}). Thus by the data processing inequality for Rényi divergence, we conclude

In the next section, we bound these simpler one-dimensional Rényi divergences between mixtures of Gaussian distributions.

RDP Analysis of Single-Dimensional SGM

Let μ0\mu_{0} denote the pdf of N(0,σ2)\mathcal{N}(0,\sigma^{2}) and let μ1\mu_{1} denote the pdf of N(1,σ2)\mathcal{N}(1,\sigma^{2}). Let

We introduce the following notation used through the rest of this section. Define

In our first result (Section 3.1), we demonstrate that AαBαA_{\alpha}\geq B_{\alpha}. In fact, we prove a more general statement about centrally-symmetric distributions, from which the case of μ0=N(0,σ2)\mu_{0}=\mathcal{N}(0,\sigma^{2}) and μ1=N(1,σ2)\mu_{1}=\mathcal{N}(1,\sigma^{2}) follows as a corollary.

To upper bound AαA_{\alpha} we pursue two complementary approaches. Section 3.2 derives a closed-form bound that is valid and reasonably tight within a wide range of parameters. Section 3.3 describes a numerically stable computational procedure for computing AαA_{\alpha} exactly (to within any desired precision).

Looking ahead, AαA_{\alpha} admits decomposition into a finite sum or a convergent series. By comparison, manipulating BαB_{\alpha} is a similar manner is considerably more difficult, since we have a composite term μ\mu in the denominator. Fortunately, it is not necessary as we demonstrate that BαAαB_{\alpha}\leq A_{\alpha}. In fact, we prove a more general statement, which may be of independent interest.

Let PP and QQ be two differentiable distributions on X\mathcal{X} such that there exists a differentiable mapping ν ⁣:XX\nu\colon\mathcal{X}\mapsto\mathcal{X} satisfying ν(ν(x))=x\nu(\nu(x))=x and P(x)=Q(ν(x))P(x)=Q(\nu(x)). Then the following holds for all α1\alpha\geq 1 and qq\in:

We will repeatedly use the fact that Q(x)=Q(ν(ν(x))=P(ν(x))Q(x)=Q(\nu(\nu(x))=P(\nu(x)). Furthermore, since the inverse of ν\nu is ν\nu, it is continuously differentiable and its Jacobian satisfies detJν=±1\det\mathbf{J}_{\nu}=\pm 1.

Let Pq=Δ(1q)P+qQP_{q}\stackrel{{\scriptstyle\Delta}}{{=}}(1-q)P+qQ and Qq=Δ(1q)Q+qPQ_{q}\stackrel{{\scriptstyle\Delta}}{{=}}(1-q)Q+qP. Then, substituting x=ν(y)x=\nu(y), we have

We proceed by arguing a stronger statement: the integrand of the right-hand side of (1) dominates the integrand of (2) pointwise. In other words, we prove the following lemma, from which the theorem claim follows:

For all xXx\in\mathcal{X}, and any α>1\alpha>1, qq\in:

where Pq(x)=(1q)P(x)+qQ(x)P_{q}(x)=(1-q)P(x)+qQ(x) and Qq(x)=(1q)Q(x)+qP(x)Q_{q}(x)=(1-q)Q(x)+qP(x).

Let u=ΔP(x)u\stackrel{{\scriptstyle\Delta}}{{=}}P(x) and v=ΔQ(x)v\stackrel{{\scriptstyle\Delta}}{{=}}Q(x). Assume wlog uvu\geq v (the claim is symmetric with respect to PP and QQ). Then the two sides become, respectively,

Dividing by vv and letting y=Δ(1q)+quvy\stackrel{{\scriptstyle\Delta}}{{=}}(1-q)+q\frac{u}{v} and z=Δ(1q)+qvuz\stackrel{{\scriptstyle\Delta}}{{=}}(1-q)+q\frac{v}{u}, we need to compare

subject to y1/z1zy\geq 1/z\geq 1\geq z. (The bound y1y\geq 1 follows from uvu\geq v and y1/zy\geq 1/z from yz=(1q)2+q(1q)(uv+vu)+q2(1q)2+2q(1q)+q2=1yz=(1-q)^{2}+q(1-q)(\frac{u}{v}+\frac{v}{u})+q^{2}\geq(1-q)^{2}+2q(1-q)+q^{2}=1.)

Collecting the yy terms on the left and the zz terms on the right we have

The left-hand side dominates, since the two expressions are equal for y=1/zy=1/z and the left expression is monotonically increasing in yy over [1,)[1,\infty):

which holds due to (yαyα)/α(yα1y1α)/(α1)(y^{\alpha}-y^{-\alpha})/\alpha\geq(y^{\alpha-1}-y^{1-\alpha})/(\alpha-1), in turn implied by monotonicity of sinh(αlny)/α\sinh(\alpha\ln y)/\alpha in α\alpha for y1y\geq 1. ∎

This concludes the proofs of the lemma and of the theorem. ∎

AαBαA_{\alpha}\geq B_{\alpha} for any α1\alpha\geq 1.

Let ν(x)=Δ1x\nu(x)\stackrel{{\scriptstyle\Delta}}{{=}}1-x. Then, the pdf of μ0\mu_{0} is exp(x2/2σ2)=exp((ν(x)1)2/2σ2)\propto\exp(-x^{2}/2\sigma^{2})=\exp(-(\nu(x)-1)^{2}/2\sigma^{2}) as required. The claim follows. ∎

Theorem 5 holds for any additive noise whose distribution is centrally symmetric, which also includes Laplace, sinh-normal , their discretized and multi-dimensional variants.

2 Closed-Form Bound

We write AαA_{\alpha} as an integral over the real line, break it into two parts (at z0z_{0} to be chosen shortly) and bound them separately as follows:

We define z0=Δ12+σ2ln(1+1q(α1))z_{0}\stackrel{{\scriptstyle\Delta}}{{=}}\frac{1}{2}+\sigma^{2}\ln\left(1+\frac{1}{q(\alpha-1)}\right), chosen to satisfy (1q)+qμ1(z0)μ0(z0)=α/(α1)(1-q)+q\frac{\mu_{1}(z_{0})}{\mu_{0}(z_{0})}=\alpha/(\alpha-1). For notational convenience we also introduce r0=Δμ1(z0)μ0(z0)=1+1q(α1)r_{0}\stackrel{{\scriptstyle\Delta}}{{=}}\frac{\mu_{1}(z_{0})}{\mu_{0}(z_{0})}=1+\frac{1}{q(\alpha-1)}. Note that z0>12z_{0}>\frac{1}{2} and r0>1r_{0}>1. We repeatedly use the facts that μ0(z)=μ1(1z)\mu_{0}(z)=\mu_{1}(1-z) everywhere and the ratio μ0(z)/μ1(z)\mu_{0}(z)/\mu_{1}(z) is monotonically decreasing in zz. It follows that for all zz0z\leq z_{0} the ratio μ1(z)/μ0(z)r0\mu_{1}(z)/\mu_{0}(z)\leq r_{0} and for z1z0z\geq 1-z_{0} the ratio μ0(z)/μ1(z)r0\mu_{0}(z)/\mu_{1}(z)\leq r_{0}. In particular, for z[1z0,z0]z\in[1-z_{0},z_{0}] the ratios μ1(z)/μ0(z)\mu_{1}(z)/\mu_{0}(z) and μ1(z)/μ0(z)\mu_{1}(z)/\mu_{0}(z) are confined to [1/r0,r0][1/r_{0},r_{0}].

For any α1\alpha\geq 1, qq\in and positive x,yx,y such that 1/r0x/yr01/r_{0}\leq x/y\leq r_{0} where r0=1+1q(α1)r_{0}=1+\frac{1}{q(\alpha-1)}:

Recall the following Lemma 15 in Bun et al. :

Let r=x/y[1/r0,r0]r=x/y\in[1/r_{0},r_{0}]. By setting w=Δq(1/r1)w\stackrel{{\scriptstyle\Delta}}{{=}}q(1/r-1) and w=Δq(r1)w\stackrel{{\scriptstyle\Delta}}{{=}}q(r-1) we obtain, respectively,

The claim follows by simple addition and multiplication of both sides by yy. ∎

A useful fact that facilitates application of Lemma 8 is that for positive xx and yy

It implies that all terms in the claim of the lemma are non-negative.

Similarly to the argument in the previous section, we “double over” the integral, by using the symmetry between μ0\mu_{0} and μ1\mu_{1}:

Note that (1q)+qμ1(z)μ0(z)<1(1-q)+q\frac{\mu_{1}(z)}{\mu_{0}(z)}<1 for z1z0z\leq 1-z_{0} and (1q)+qμ0(z)μ1(z)<1(1-q)+q\frac{\mu_{0}(z)}{\mu_{1}(z)}<1 for zz0z\geq z_{0}. Furthermore, 1/r0μ1(z)μ0(z)r01/r_{0}\leq\frac{\mu_{1}(z)}{\mu_{0}(z)}\leq r_{0} for z[1z0,z0]z\in[1-z_{0},z_{0}].

Applying Lemma 8 and inequality (3), we bound Aα(1)A_{\alpha}^{(1)}:

as claimed (we use the identities μ0(z)2μ1(z)\mboxdz=μ1(z)2μ0(z)\mboxdz=exp(1/σ2)\int_{-\infty}^{\infty}\frac{\mu_{0}(z)^{2}}{\mu_{1}(z)}\,\mbox{d}z=\int_{-\infty}^{\infty}\frac{\mu_{1}(z)^{2}}{\mu_{0}(z)}\,\mbox{d}z=\exp(1/\sigma^{2}) elaborated in Section 3.3). ∎

Lemma 9 is unconditional, mildly behaved, and it yields an asymptotically right dependency of the bound on the parameters qq, α\alpha, and σ\sigma (after taking the logarithm, it is non-tight by a factor close to 2). We next consider the integral to the right of z0z_{0} that is responsible for the conditions on α\alpha.

If q15q\leq\frac{1}{5}, σ4\sigma\geq 4, and α\alpha satisfy

where L=Δln(1+1q(α1))L\stackrel{{\scriptstyle\Delta}}{{=}}\ln\left(1+\frac{1}{q(\alpha-1)}\right), then

We observe (similarly to [2, Lemma 16]) that for zz0z\geq z_{0}, it holds that

The inequality (6) follows from the Gaussian tail bound tμ0(x)\mboxdx<exp(t2/(2σ2))\int_{t}^{\infty}\mu_{0}(x)\,\mbox{d}x<\exp(-t^{2}/(2\sigma^{2})) for t0t\geq 0. To apply the bound it is necessary to check that the lower limit of the integral t=z0α0t=z_{0}-\alpha\geq 0. Recall that z0=12+σ2ln(1+1q(α1))=12+σ2Lz_{0}=\frac{1}{2}+\sigma^{2}\ln\left(1+\frac{1}{q(\alpha-1)}\right)=\frac{1}{2}+\sigma^{2}L. Together, the definition of z0z_{0} and condition (4) imply that

which, given that σ>1\sigma>1, guarantees that z0>2αz_{0}>2\alpha.

It is convenient to take the logarithm of both sides of Eq. (7) resulting in the following:

We will argue that the right-hand side of the above is less than ln(0.9q2α(α1)/σ2)\ln(0.9\cdot q^{2}\alpha(\alpha-1)/\sigma^{2}). Subtracting the two quantities, we need to demonstrate that

The rest of the proof proceeds by case analysis.

Observing that lnq+ln(α1)+L=ln(q(α1)+1)>0\ln q+\ln(\alpha-1)+L=\ln(q(\alpha-1)+1)>0, it suffices to verify that

From q<1/5q<1/5 and 1<α<2<1/q1<\alpha<2<1/q we conclude that the first summand is positive. Since L>ln(1+1q)ln6L>\ln(1+\frac{1}{q})\geq\ln 6 and σ4\sigma\geq 4, the second summand is larger than 1ln(0.9)1-\ln(0.9), by direct computation.

Case IIa: α≥2𝛼2\alpha\geq 2 and q​(α−1)<1/3𝑞𝛼113q(\alpha-1)<1/3

The inequality holds since the first term is non-negative (qα=q(α1)αα1<2/3q\alpha=q(\alpha-1)\cdot\frac{\alpha}{\alpha-1}<2/3), ln(α1)lnαln2\ln(\alpha-1)-\ln\alpha\geq-\ln 2, L>ln4L>\ln 4, and σ4\sigma\geq 4.

Case IIb: α≥2𝛼2\alpha\geq 2 and q​(α−1)≥1/3𝑞𝛼113q(\alpha-1)\geq 1/3

Continue Eq. (7), taking the logarithm of both sides:

The last inequality follows from Eq. (5) and by the fact that L+ln(qα)=ln(qα+αα1)>0L+\ln(q\alpha)=\ln(q\alpha+\frac{\alpha}{\alpha-1})>0.

Our main theorem holds under the conditions of Lemmas 9 and 10:

If q15q\leq\frac{1}{5}, σ4\sigma\geq 4, and α\alpha satisfy

Recall that to state an RDP guarantee on SGM it is sufficient to bound AαA_{\alpha}. Applying the results of Lemmas 9 and 10,

(The last inequality due to exp(1/σ2)11.1/σ2\exp(1/\sigma^{2})-1\leq 1.1/\sigma^{2} for σ4\sigma\geq 4.)

Finally, since ln(1+x)<x\ln(1+x)<x for x0x\geq 0, we conclude that SGM satisfies (α,ε)(\alpha,\varepsilon)-RDP where ε=1α1lnAα2q2α/σ2\varepsilon=\frac{1}{\alpha-1}\ln A_{\alpha}\leq 2q^{2}\alpha/\sigma^{2} as needed. ∎

3 Numerically Stable Computation

Naïvely, AαA_{\alpha} can be approximated as an integral using standard numerical libraries. It however leads to the problem of computing an integral over the whole real line of a quantity that can vary a lot. We sidestep this difficulty by expressing AαA_{\alpha} as a finite sum (or a convergent series), swap the order of the integration and summation operators, and compute the integrals analytically.

Applying the binomial expansion to (11), we have

Thus it suffices to compute for k{0,,α}k\in\{0,\dots,\alpha\} the expectation

Case II. Fractional α𝛼\alpha.

To rewrite (11) as a convergent series, consider two cases depending on how 1q1-q compares with qμ1(z)/μ0(z)q\mu_{1}(z)/\mu_{0}(z). The inflection point is z1z_{1} where the two quantities are equal:

Analogously to the case of integer α\alpha, we compute the expectations of both series under zμ0z\sim\mu_{0}, where the integrals are taken over the half lines (,z1](-\infty,z_{1}] and [z1,+)[z_{1},+\infty):

The computation done in the privacy accountant proceeds by plugging in these quantities into the series (12), and carrying out the summation to convergence.

Upper bounds (Theorem 11) and exact computations are compared in Figure 1.

Discussion

The notions of CDP, zCDP, Moments accountant and Rényi DP are closely related in that they control the moments of the privacy loss random variable. In this section, we clarify their differences.

For two adjacent datasets SS and SS^{\prime}, the privacy loss of a mechanism M\mathcal{M} at an outcome zz is defined as

For continuous output spaces, the probability above is replaced by the probability density function. The various definitions deal with moment generating function of the privacy loss random variable. Define

Recalling the definition of Rényi divergence between distributions:

The following are equivalent definitions of CDP, zCDP, tCDP and RDP:

A mechanism M\mathcal{M} satisfies (μ,τ)(\mu,\tau)-CDP if for all adjacent datasets S,SS,S^{\prime} and for all α1\alpha\geq 1

A mechanism M\mathcal{M} satisfies (ξ,ρ)(\xi,\rho)-zCDP if for all adjacent datasets S,SS,S^{\prime},

A shorthand for (0,ρ)(0,\rho)-zCDP is ρ\rho-zCDP. Thus a mechanism M\mathcal{M} satisfies ρ\rho-zCDP if for all adjacent datasets S,SS,S^{\prime},

A mechanism M\mathcal{M} satisfies (ρ,ω)(\rho,\omega)-tCDP if for all adjacent datasets S,SS,S^{\prime},

A mechanism M\mathcal{M} satisfies (α,ε)(\alpha,\varepsilon)-RDP if for all adjacent datasets S,SS,S^{\prime},

While here we have stated the definitions in terms of MαM_{\alpha}, using (14), one can translate these to bounds on the Rényi divergence; in some cases that leads to cleaner looking definitions. Going down the list, the definitions get less restrictive and have more parameters. While zCDP suffices for many purposes, SGM is an important mechanism that does not satisfy zCDP, but satisfies RDP for a suitable range of α\alpha.

While the above were proposed as standalone privacy definitions, the moments accountant was proposed as an accounting mechanism that tracks (the logarithm of) MαM_{\alpha} directly and converts the resulting bound to an (ε,δ)(\varepsilon,\delta)-DP bound.

References