Renyi Differential Privacy

Ilya Mironov

I Introduction

Differential privacy, introduced by Dwork et al. , has been embraced by multiple research communities as a commonly accepted notion of privacy for algorithms on statistical databases. As applications of differential privacy begin to emerge, practical concerns of tracking and communicating privacy guarantees are coming to the fore.

Informally, differential privacy bounds a shift in the output distribution of a randomized algorithm that can be induced by a small change in its input. The standard definition of ϵ\epsilon-differential privacy puts a multiplicative upper bound on the worst-case change in the distribution’s density.

Several relaxations of differential privacy explored other measures of closeness between two distributions. The most common such relaxation, the (ϵ,δ)(\epsilon,\delta) definition, has been a method of choice for expressing privacy guarantees of a variety of differentially private algorithms, especially those that rely on the Gaussian additive noise mechanism or whose analysis follows from composition theorems. The additive δ\delta parameter allows suppressing the long tails of the mechanism’s distribution where pure ϵ\epsilon-differential privacy guarantees may not hold.

Compared to the standard definition, (ϵ,δ)(\epsilon,\delta)-differential privacy offers asymptotically smaller cumulative loss under composition and allows greater flexibility in the selection of privacy-preserving mechanisms.

Despite its notable advantages and numerous applications, the definition of (ϵ,δ)(\epsilon,\delta)-differential privacy is an imperfect fit for its two most common use cases: the Gaussian mechanism and a composition rule. We briefly sketch them here and elaborate on these points in the next section.

The first application of (ϵ,δ)(\epsilon,\delta)-differential privacy was the analysis of the Gaussian noise mechanism . In contrast with the Laplace mechanism, whose privacy guarantee is characterized tightly and accurately by ϵ\epsilon-differential privacy, a single Gaussian mechanism satisfies a curve of (ϵ(δ),δ)(\epsilon(\delta),\delta)-differential privacy definitions. Picking any one point on this curve leaves out important information about the mechanism’s actual behavior.

The second common use of (ϵ,δ)(\epsilon,\delta)-differential privacy is due to applications of advanced composition theorems. The central feature of ϵ\epsilon-differential privacy is that it is closed under composition; moreover, the ϵ\epsilon parameters of composed mechanisms simply add up, which motivates the concept of a privacy budget. By relaxing the guarantee to (ϵ,δ)(\epsilon,\delta)-differential privacy, advanced composition allows tighter analyses for compositions of (pure) differentially private mechanisms. Iterating this process, however, quickly leads to a combinatorial explosion of parameters, as each application of an advanced composition theorem leads to a wide selection of possibilities for (ϵ(δ),δ)(\epsilon(\delta),\delta)-differentially private guarantees.

In part to address the shortcomings of (ϵ,δ)(\epsilon,\delta)-differential privacy, several recent works, surveyed in the next section, explored the use of higher-order moments as a way of bounding the tails of the privacy loss variable.

Inspired by these theoretical results and their applications, we propose Rényi differential privacy as a natural relaxation of differential privacy that is well-suited for expressing guarantees of privacy-preserving algorithms and for composition of heterogeneous mechanisms. Compared to (ϵ,δ)(\epsilon,\delta)-differential privacy, Rényi differential privacy is a strictly stronger privacy definition. It offers an operationally convenient and quantitatively accurate way of tracking cumulative privacy loss throughout execution of a standalone differentially private mechanism and across many such mechanisms. Most significantly, Rényi differential privacy allows combining the intuitive and appealing concept of a privacy budget with application of advanced composition theorems.

The paper presents a self-contained exposition of the new definition, unifying current literature and demonstrating its applications. The organization of the paper is as follows. Section II reviews the standard definition of differential privacy, its (ϵ,δ)(\epsilon,\delta) relaxation and its most common uses. Section III introduces the definition of Rényi differential privacy and proves its basic properties that parallel those of ϵ\epsilon-differential privacy, summarizing the results in Table I. Section IV demonstrates a reduction from Rényi differential privacy to (ϵ,δ)(\epsilon,\delta)-differential privacy, followed by a proof of an advanced composition theorem in Section V. Section VI applies Rényi differential privacy to analysis of several basic mechanisms: randomized response for predicates, Laplace and Gaussian (see Table II for a brief summary). Section VII discusses assessment of risk due to application of a Rényi differentially private mechanism and use of Rényi differential privacy as a privacy loss tracking tool. Section VIII concludes with open questions.

II Differential Privacy and Its Flavors

ϵ\epsilon-Differential privacy . We first recall the standard definition of ϵ\epsilon-differential privacy.

A randomized mechanism f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} satisfies ϵ\epsilon-differential privacy (ϵ\epsilon-DP) if for any adjacent D,DDD,D^{\prime}\in\mathcal{D} and SRS\subset\mathcal{R}

The above definition is contingent on the notion of adjacent inputs DD and DD^{\prime}, which is domain-specific and is typically chosen to capture the contribution to the mechanism’s input by a single individual.

taken over all adjacent inputs DD and DD^{\prime}.

The basic composition theorem states that if ff and gg are, respectively, ϵ1\epsilon_{1}- and ϵ2\epsilon_{2}-DP, then the simultaneous release of f(D)f(D) and g(D)g(D) satisfies (ϵ1+ϵ2)(\epsilon_{1}+\epsilon_{2})-DP. Moreover, the mechanism gg may be selected adaptively, after seeing the output of f(D)f(D).

(ϵ,δ)(\epsilon,\delta)-Differential privacy . A relaxation of ϵ\epsilon-differential privacy allows a δ\delta additive term in its defining inequality:

A randomized mechanism f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} offers (ϵ,δ)(\epsilon,\delta)-differential privacy if for any adjacent D,DDD,D^{\prime}\in\mathcal{D} and SRS\subset\mathcal{R}

The common interpretation of (ϵ,δ)(\epsilon,\delta)-DP is that it is ϵ\epsilon-DP “except with probability δ\delta”. Formalizing this statement runs into difficulties similar to the ones addressed by Mironov et al. for a different (computational) relaxation. For any two adjacent inputs, D1D_{1} and D2D_{2}, it is indeed possible to define an ϵ\epsilon-DP mechanism that agrees with ff with all but δ\delta probability. Extending this argument to domains of exponential sizes (for instance, to a boolean hypercube) cannot be done without diluting the guarantee exponentially . We conclude that (ϵ,δ)(\epsilon,\delta)-differential privacy is a qualitatively different definition than pure ϵ\epsilon-DP (unless, of course, δ=0\delta=0, which we assume not to be the case through the rest of this section).

Even for the simple case of exactly two input databases (such as when the adversary knows the entire dataset except whether it contains a particular record), the δ\delta additive term encompasses two very different modes in which privacy may fail. In both scenarios ϵ\epsilon-DP holds with probability 1δ1-\delta, they differ in what happens with the remaining probability δ\delta. In the first scenario privacy degrades gracefully, such as to ϵ1\epsilon_{1}-DP with probability δ/2\delta/2, to ϵ2\epsilon_{2}-DP with probability δ/4\delta/4, etc. In the other scenario, with probability δ\delta the secret—whether the record is part of the database or not—becomes completely exposed. The difference between the two failure modes can be quite stark. In the former, there is always some residual deniability; in the latter, the adversary occasionally learns the secret with certainty. Depending on the adversary’s tolerance to false positives, plausible deniability may offer adequate protection, but a single (ϵ,δ)(\epsilon,\delta)-DP privacy statement cannot differentiate between the two alternatives. For a lively parable of the different guarantees offered by the ϵ\epsilon-DP and (ϵ,δ)(\epsilon,\delta)-DP definitions see McSherry .

The definition of (ϵ,δ)(\epsilon,\delta)-differential privacy was initially proposed to capture privacy guarantees of the Gaussian mechanism, defined as follows:

taken over all adjacent inputs DD and DD^{\prime}.

Another common reason for bringing in (ϵ,δ)(\epsilon,\delta)-differential privacy is application of advanced composition theorems. Consider the case of kk-fold adaptive composition of an (ϵ,δ)(\epsilon,\delta)-DP mechanism. For any δ>0\delta^{\prime}>0 it holds that the composite mechanism is (ϵ,kδ+δ)(\epsilon^{\prime},k\delta+\delta^{\prime})-DP, where ϵ2kln(1/δ)ϵ+kϵ(eϵ1)\epsilon^{\prime}\triangleq\sqrt{2k\ln(1/\delta^{\prime})}\epsilon+k\epsilon(e^{\epsilon}-1) . Note that, similarly to our discussion of the Gaussian mechanism, a single mechanism satisfies a continuum of incomparable (ϵ,δ)(\epsilon,\delta)-DP guarantees.

Kairouz et al. give a procedure for computing an optimal kk-fold composition of an (ϵ,δ)(\epsilon,\delta)-DP mechanism . Murtagh and Vadhan demonstrate that generalizing this result to composition of heterogeneous mechanisms (i.e., satisfying (ϵi,δi)(\epsilon_{i},\delta_{i})-DP for different ϵi\epsilon_{i}’s) is #P-hard; they describe a PTAS for an approximate solution. None of these works tackles the problem of composing mechanisms that satisfy several (ϵ,δ)(\epsilon,\delta)-DP guarantees simultaneously.

(zero)-Concentrated Differential Privacy and the moments accountant. The closely related work by Dwork and Rothblum , followed by Bun and Steinke , explore privacy definitions—Concentrated Differential Privacy and zero-Concentrated Differential Privacy—that are framed using the language of, respectively, subgaussian tails and the Rényi divergence. The main difference between our approaches is that both Concentrated and zero-Concentrated DP require a linear bound on all positive moments of a privacy loss variable. In contrast, our definition applies to one moment at a time. Although less restrictive, it allows for more accurate numerical analyses.

The work by Abadi et al. on differentially private stochastic gradient descent introduced the moments accountant as an internal tool for tracking privacy loss across multiple invocations of the Gaussian mechanism applied to random subsets of the input dataset. The paper’s results are expressed via a necessarily lossy translation of the accountant’s output (bounds on select moments of the privacy loss variable) to the language of (ϵ,δ)(\epsilon,\delta)-differential privacy.

Taken together, the works on Concentrated DP, zero-Concentrated DP, and the moments accountant point towards adopting Rényi differential privacy as an effective and flexible mechanism for capturing privacy guarantees of a wide variety of algorithms and their combinations.

Other relaxations. We briefly mention other relaxations and generalizations of differential privacy.

Under the indistinguishability-based Computational Differential Privacy (IND-CDP) definition , the test of closeness between distributions on adjacent inputs is computationally bounded (all other definitions considered in this paper hold against an unbounded, information-theoretic adversary). The IND-CDP notion allows much more accurate functionalities in the two-party setting ; in the traditional client-server setup there is a natural class of functionalities where the gap between IND-CDP and (ϵ,δ)(\epsilon,\delta)-DP is minimal , and there are (contrived) examples where the computational relaxation permits tasks that are infeasible under information-theoretic definitions .

Several other works, most notably the Pufferfish and the coupled-worlds frameworks , propose different stability constraints on the output distribution of privacy-preserving mechanisms. Although they differ in what distributions are compared, their notion of closeness is the same as in (ϵ,δ)(\epsilon,\delta)-DP.

III Rényi differential privacy

We describe a generalization of the notion of differential privacy based on the concept of the Rényi divergence. Connection between the two notions has been pointed out before (mostly for one extreme order, known as the Kullback-Leibler divergence ); our contribution is in systematically exploring the relationship and its practical implications.

The (parameterized) Rényi divergence is classically defined as follows :

For two probability distributions PP and QQ defined over R\mathcal{R}, the Rényi divergence of order α>1\alpha>1 is

(All logarithms are natural; P(x)P(x) is the density of PP at xx.)

For the endpoints of the interval (1,)(1,\infty) the Rényi divergence is defined by continuity. Concretely, D1(PQ)D_{1}(P\|Q) is set to be limα1Dα(PQ)\lim_{\alpha\to 1}D_{\alpha}(P\|Q) and can be verified to be equal to the Kullback-Leibler divergence (also known as relative entropy):

Note that the expectation is taken over PP, rather than over QQ as in the previous definition. It is possible, though, that D1(PQ)D_{1}(P\|Q) thus defined is finite whereas Dα(PQ)=+D_{\alpha}(P\|Q)=+\infty for all α>1\alpha>1.

For completeness, we reproduce in the Appendix properties of the Rényi divergence important to the sequel: non-negativity, monotonicity, probability preservation, and a weak triangle inequality (Propositions 8–11).

The relationship between the Rényi divergence with α=\alpha=\infty and differential privacy is immediate. A randomized mechanism ff is ϵ\epsilon-differentially private if and only if its distribution over any two adjacent inputs DD and DD^{\prime} satisfies

It motivates exploring a relaxation of differential privacy based on the Rényi divergence.

A randomized mechanism f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} is said to have ϵ\epsilon-Rényi differential privacy of order α\alpha, or (α,ϵ)(\alpha,\epsilon)-RDP for short, if for any adjacent D,DDD,D^{\prime}\in\mathcal{D} it holds that

Similarly to the definition of differential privacy, a finite value for ϵ\epsilon-RDP implies that feasible outcomes of f(D)f(D) for some DDD\in\mathcal{D} are feasible, i.e., have a non-zero density, for all inputs from D\mathcal{D} except for a set of measure 0. Assuming that this is the case, we let the event space be the support of the distribution.

The Rényi divergence can be defined for α\alpha smaller than 1, including negative orders. We are not using these orders in our definition of Rényi differential privacy.

The standard definition of differential privacy has been successful as a privacy measure because it simultaneously meets several important criteria. We verify that the relaxed definition inherits many of the same properties. The results of this section are summarized in Table I.

“Bad outcomes” guarantee. A privacy definition is only as useful as its guarantee for data contributors. The simplest such assurance is the “bad outcomes” interpretation. Consider a person, concerned about some adverse consequences, deliberating whether to withhold her record from the database. Let us say that some outputs of the mechanism are labeled as “bad.” The differential privacy guarantee asserts that the probability of observing a bad outcome will not change (either way) by more than a factor of eϵe^{\epsilon} whether anyone’s record is part of the input or not (for appropriately defined “adjacent” inputs). This is an immediate consequence of the definition of differential privacy, where the subset SS is the union of bad outcomes.

This guarantee is relaxed for Rényi differential privacy. Concretely, if ff is (α,ϵ)(\alpha,\epsilon)-RDP, then by Proposition 10:

We discuss consequences of this relaxation in Section VII.

Robustness to auxiliary information. Critical to the adoption of differential privacy as an operationally useful definition is its lack of assumptions on the adversary’s knowledge. More formally, the property is captured by the Bayesian interpretation of privacy guarantees, which compares the adversary’s prior with the posterior.

Assume that the adversary has a prior p(D)p(D) over the set of possible inputs DDD\in\mathcal{D}, and observes an output XX of an ϵ\epsilon-differentially private mechanism ff. Its posterior satisfies the following guarantee for all pairs of adjacent inputs D,DDD,D^{\prime}\in\mathcal{D} and all XRX\in\mathcal{R}:

In other words, evidence obtained from an ϵ\epsilon-differentially private mechanism does not move the relative probabilities assigned to adjacent inputs (the odds ratio) by more than eϵe^{\epsilon}.

The guarantee implied by RDP is a probabilistic statement about the change in the Bayes factor. Let the random variable R(D,D)R(D,D^{\prime}) be defined as follows:

It follows immediately from definition that the Rényi divergence of order α\alpha between P=f(D)P=f(D^{\prime}) and Q=f(D)Q=f(D) bounds the α\alpha-th moment of the change in RR:

By taking the logarithm of both sides and applying Jensen’s inequality we obtain that

(This can also be derived by observing that

and by monotonicity of the Rényi divergence.)

Compare (1) with the guarantee of pure differential privacy, which states that logRpost(D,D)logRprior(D,D)ϵ\log R_{\textrm{post}}(D,D^{\prime})-\log R_{\textrm{prior}}(D,D^{\prime})\leq\epsilon everywhere, not just in expectation.

Post-processing. A privacy guarantee that can be diminished by manipulating output is unlikely to be useful. Consider a randomized mapping g ⁣:RRg\colon\mathcal{R}\mapsto\mathcal{R}^{\prime}. We observe that Dα(PQ)Dα(g(P)g(Q))D_{\alpha}(P\|Q)\geq D_{\alpha}(g(P)\|g(Q)) by the analogue of the data processing inequality [19, Theorem 9]. It means that if f()f(\cdot) is (α,ϵ)(\alpha,\epsilon)-RDP, so is g(f())g(f(\cdot)). In other words, Rényi differential privacy is preserved by post-processing.

Preservation under adaptive sequential composition. The property that makes possible modular construction of differentially private algorithms is self-composition: if f()f(\cdot) is ϵ1\epsilon_{1}-differentially private and g()g(\cdot) is ϵ2\epsilon_{2}-differentially private, then simultaneous release of f(D)f(D) and g(D)g(D) is ϵ1+ϵ2\epsilon_{1}+\epsilon_{2}-differentially private. The guarantee even extends to when gg is chosen adaptively based on ff’s output: if gg is indexed by elements of R\mathcal{R} and gX()g_{X}(\cdot) is ϵ2\epsilon_{2}-differentially private for any XRX\in\mathcal{R}, then publishing (X,Y)(X,Y), where Xf(D)X\sim f(D) and YgX(D)Y\sim g_{X}(D), is ϵ1+ϵ2\epsilon_{1}+\epsilon_{2}-differentially private.

We prove a similar statement for the composition of two RDP mechanisms.

Let f ⁣:DR1f\colon\mathcal{D}\mapsto\mathcal{R}_{1} be (α,ϵ1)(\alpha,\epsilon_{1})-RDP and g ⁣:R1×DR2g\colon\mathcal{R}_{1}\times\mathcal{D}\mapsto\mathcal{R}_{2} be (α,ϵ2)(\alpha,\epsilon_{2})-RDP, then the mechanism defined as (X,Y)(X,Y), where Xf(D)X\sim f(D) and Yg(X,D)Y\sim g(X,D), satisfies (α,ϵ1+ϵ2)(\alpha,\epsilon_{1}+\epsilon_{2})-RDP.

Let h ⁣:DR1×R2h\colon\mathcal{D}\mapsto\mathcal{R}_{1}\times\mathcal{R}_{2} be the result of running ff and gg sequentially. We write XX, YY, and ZZ for the distributions f(D)f(D), g(X,D)g(X,D), and the joint distribution (X,Y)=h(D)(X,Y)=h(D). XX^{\prime}, YY^{\prime}, and ZZ^{\prime} are similarly defined if the input is DD^{\prime}. Then

Significantly, the guarantee holds whether the releases of ff and gg are coordinated or not, or computed over the same or different versions of the input dataset. It allows us to operate with a well-defined notion of a privacy budget associated with an individual, which is a finite resource consumed with each differentially private data release.

Extending the concept of the privacy budget, we say that the Rényi differential privacy has a budget curve parameterized by the order α\alpha. We present examples illustrating this viewpoint in Section VI.

Group privacy. Although the definition of differential privacy constrains a mechanism’s outputs on pairs of adjacent inputs, its guarantee extends, in a progressively weaker form, to inputs that are farther apart. This property has two important consequences. First, the differential privacy guarantee degrades gracefully if our assumptions about one person’s influence on the input are (somewhat) wrong. For example, a single family contributing to a survey will likely share many socio-economic, demographic, and health characteristics. Rather than collapsing, the differential privacy guarantee will scale down linearly with the number of family members. Second, the group privacy property allows preprocessing input into a differentially private mechanism, possibly amplifying (in a controlled fashion) one record’s impact on the output of the computation.

We define group privacy using a notion of cc-stable transformation . We say that g ⁣:DDg\colon\mathcal{D}\mapsto\mathcal{D}^{\prime} is cc-stable if g(A)g(A) and g(B)g(B) are adjacent in DD^{\prime} implies that there exists a sequence of length c+1c+1 so that D0=A,,Dc=BD_{0}=A,\dots,D_{c}=B and all (Di,Di+1)(D_{i},D_{i+1}) are adjacent in D\mathcal{D}.

The standard notion of differential privacy satisfies the following. If ff is ϵ\epsilon-differentially private and g ⁣:DDg\colon\mathcal{D}^{\prime}\mapsto\mathcal{D} is cc-stable, then fgf\circ g is cϵc\epsilon-differentially private. A similar statement holds for Rényi differential privacy.

If f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} is (α,ϵ)(\alpha,\epsilon)-RDP, g ⁣:DDg\colon\mathcal{D}^{\prime}\mapsto\mathcal{D} is 2c2^{c}-stable and α2c+1\alpha\geq 2^{c+1}, then fgf\circ g is (α/2c,3cϵ)(\alpha/2^{c},3^{c}\epsilon)-RDP.

We prove the statement for c=1c=1, the rest follows by induction.

Define h=fgh=f\circ g. Since gg is 2-stable, it means that for any adjacent D,DDD,D^{\prime}\in\mathcal{D}^{\prime} there exist ADA\in D, so that g(D)g(D) and AA, AA and g(D)g(D^{\prime}) are adjacent in DD.

By Corollary 4 and monotonicity of the Rényi divergence, we have that h=fgh=f\circ g satisfies

IV RDP and (ϵ,δ)italic-ϵ𝛿(\epsilon,\delta)-DP

As we observed earlier, the definition of ϵ\epsilon-differential privacy coincides with (,ϵ)(\infty,\epsilon)-RDP. By monotonicity of the Rényi divergence, (,ϵ)(\infty,\epsilon)-RDP implies (α,ϵ)(\alpha,\epsilon)-RDP for all finite α\alpha.

In turn, an (α,ϵ)(\alpha,\epsilon)-RDP implies (ϵδ,δ)(\epsilon_{\delta},\delta)-differential privacy for any given probability δ>0\delta>0.

If ff is an (α,ϵ)(\alpha,\epsilon)-RDP mechanism, it also satisfies (ϵ+log1/δα1,δ)(\epsilon+\frac{\log 1/\delta}{\alpha-1},\delta)-differential privacy for any 0<δ<10<\delta<1.

Take any two adjacent inputs DD and DD^{\prime}, and a subset of ff’s range SS. To show that ff is (ϵ,δ)(\epsilon^{\prime},\delta)-differentially private, where ϵ=ϵ+1α1log1/δ\epsilon^{\prime}=\epsilon+\frac{1}{\alpha-1}\log 1/\delta, we need to demonstrate that Pr[f(D)S]eϵPr[f(D)S]+δ\Pr[f(D)\in S]\leq e^{\epsilon^{\prime}}\Pr[f(D^{\prime})\in S]+\delta. In fact, we prove a stronger statement that Pr[f(D)S]max(eϵPr[f(D)S],δ)\Pr[f(D)\in S]\leq\max(e^{\epsilon^{\prime}}\Pr[f(D^{\prime})\in S],\delta).

Denote Pr[f(D)S]\Pr[f(D^{\prime})\in S] by QQ and consider two cases.

Case I. eϵQ>δα/(α1)e^{\epsilon}Q>\delta^{\alpha/(\alpha-1)}. Continuing the above,

Case II. eϵQδα/(α1)e^{\epsilon}Q\leq\delta^{\alpha/(\alpha-1)}. This case is immediate since

A more detailed comparison between the notions of RDP and (ϵ,δ)(\epsilon,\delta)-differential privacy that goes beyond these reductions is deferred to Section VII.

V Advanced Composition Theorem

The main thesis of this section is that the Rényi differential privacy curve of a composite mechanism is sufficient to draw non-trivial conclusions about its privacy guarantees, similar to the ones given by other advanced composition theorems, such as Dwork et al. or Kairouz et al. . Although our proof is structured similarly to Dwork et al. (for instance, Lemma 1 is a direct generalization of [6, Lemma III.2]), it is phrased entirely in the language of Rényi differential privacy without making any (explicit) use of probability arguments.

If PP and QQ are such that D(PQ)ϵD_{\infty}(P\|Q)\leq\epsilon and D(QP)ϵD_{\infty}(Q\|P)\leq\epsilon, then for α1\alpha\geq 1

Consider the case when α<1+1/ϵ\alpha<1+1/\epsilon. We first observe that for any x>y>0x>y>0, λ=log(x/y)\lambda=\log(x/y), and 0β1/λ0\leq\beta\leq 1/\lambda the following inequality holds:

Since all terms of the right hand side of (2) are positive, the inequality applies if λ\lambda is an upper bound on logx/y\log x/y, which we use in the argument below.

Taking the logarithm of both sides and using that log(1+x)<x\log(1+x)<x for positive xx we find that

Plugging the bound on PQ1\|P-Q\|_{1} into (3) completes the proof.

The claim for α=1\alpha=1 follows by continuity. ∎

The constant in Lemma 1 can be improved to .5 via a substantially more involved analysis [10, Proposition 3.3] (see also )

Let f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} be an adaptive composition of nn mechanisms all satisfying ϵ\epsilon-differential privacy. Let DD and DD^{\prime} be two adjacent inputs. Then for any SRS\subset\mathcal{R}:

By applying Lemma 1 to the Rényi differential privacy curve of the underlying mechanisms and Proposition 1 to their composition, we find that for all α1\alpha\geq 1

Denote Pr[f(D)S]\Pr[f(D^{\prime})\in S] by QQ and consider two cases.

Case I: log1/Qϵ2n\log 1/Q\geq\epsilon^{2}n. Choosing with some foresight α=log1/Q/(ϵn)1\alpha=\sqrt{\log 1/Q}/(\epsilon\sqrt{n})\geq 1, we have by Proposition 10 (probability preservation):

Case II: log1/Q<ϵ2n\log 1/Q<\epsilon^{2}n. This case follows trivially, since the right hand side of the claim is larger than 1:

The notable feature of Proposition 4 is that its privacy guarantee—bounded probability gain—comes in the form that depends on the event’s probability. We discuss this type of guarantee in Section VII.

The following corollary gives a more conventional (ϵ,δ)(\epsilon,\delta) variant of advanced composition.

Let ff be the composition of the nn ϵ\epsilon-differentially private mechanisms. Let 0<δ<10<\delta<1 be such that log(1/δ)ϵ2n\log(1/\delta)\geq\epsilon^{2}n. Then ff satisfies (ϵ,δ)(\epsilon^{\prime},\delta)-differential privacy where

Let DD and DD^{\prime} be two adjacent inputs, and SS be some subset of the range of ff. To argue (ϵ,δ)(\epsilon^{\prime},\delta)-differential privacy of ff, we need to verify that

In fact, we prove a somewhat stronger statement, namely that Pr[f(D)S]max(eϵPr[f(D)S],δ).\Pr[f(D)\in S]\leq\max(e^{\epsilon^{\prime}}\Pr[f(D^{\prime})\in S],\delta).

Denote Pr[f(D)S]\Pr[f(D^{\prime})\in S] by QQ and consider two cases.

Case II: 8log1/δlog1/Q8\log 1/\delta\leq\log 1/Q. Then

The condition log(1/δ)ϵ2n\log(1/\delta)\geq\epsilon^{2}n corresponds to the so-called “high privacy” regime of the advanced composition theorem , where ϵ<(1+2)log(1/δ)\epsilon^{\prime}<(1+\sqrt{2})\log(1/\delta). Since δ\delta is typically chosen to be small, say, less than 1%, it covers the case of ϵ<11\epsilon^{\prime}<11. In other words, if log(1/δ)>ϵ2n\log(1/\delta)>\epsilon^{2}n, this and other composition theorems are unlikely to yield strong bounds.

VI Basic Mechanisms

In this section we analyze Rényi differential privacy of three basic mechanisms and their self-composition: randomized response, Laplace and Gaussian noise addition. The results are summarized in Table II and plotted for select parameters in Figures 1 and 2.

Let ff be a predicate, i.e., f ⁣:D{0,1}f\colon\mathcal{D}\mapsto\{0,1\}. The Randomize Response mechanism for ff is defined as

The following statement can be verified by direct application of the definition of Rényi differential privacy:

Randomized Response mechanism RRp(f)\operatorname{\textnormal{{RR}}}_{p}(f) satisfies

VI-B Laplace noise

Define the Laplace mechanism for ff of sensitivity 1 as

where Λ(μ,λ)\Lambda(\mu,\lambda) is Laplace distribution with mean μ\mu and scale λ\lambda, i.e., its probability density function is 12λexp(xμ/λ)\frac{1}{2\lambda}\exp(-|x-\mu|/\lambda).

To derive the RDP budget curve for the exponential mechanism we compute the Rényi divergence for Laplace distribution and its offset.

For continuous distributions PP and QQ defined over the real interval with densities pp and qq

To compute the integral for p(x)=12λexp(x/λ)p(x)=\frac{1}{2\lambda}\exp(-|x|/\lambda) and q(x)=12λexp(x1/λ)q(x)=\frac{1}{2\lambda}\exp(-|x-1|/\lambda), we evaluate it separately over the intervals (,0](-\infty,0], $andand[1,+\infty]$.

Since the Laplace mechanism is additive, the Rényi divergence between Lλf(D)\operatorname{\textbf{L}}_{\lambda}f(D) and Lλf(D)\operatorname{\textbf{L}}_{\lambda}f(D^{\prime}) depends only on α\alpha and the distance f(D)f(D)|f(D)-f(D^{\prime})|. Proposition 6 implies the following:

If real-valued function ff has sensitivity 1, then the Laplace mechanism Lλf\operatorname{\textbf{L}}_{\lambda}f satisfies (α,(\alpha,1α1log{α2α1exp(α1λ)+α12α1exp(αλ)})\frac{1}{\alpha-1}\log\left\{\frac{\alpha}{2\alpha-1}\exp(\frac{\alpha-1}{\lambda})+\frac{\alpha-1}{2\alpha-1}\exp(-\frac{\alpha}{\lambda})\right\})-RDP.

This is, of course, consistent with the Laplace mechanism satisfying 1/λ1/\lambda-differential privacy. The other extreme evaluates to the following expression limα1Dα(Λ(0,λ)Λ(1,λ))=1/λ+exp(1/λ)1\lim_{\alpha\to 1}D_{\alpha}(\Lambda(0,\lambda)\|\Lambda(1,\lambda))=1/\lambda+\exp(-1/\lambda)-1, which is well approximated by .5/λ2.5/\lambda^{2} for large λ\lambda.

VI-C Gaussian noise

Assuming, as before, that ff is a real-valued function, the Gaussian mechanism for approximating ff is defined as

where N(0,σ2)N(0,\sigma^{2}) is normally distributed random variable with standard deviation σ2\sigma^{2} and mean 0.

The following statement is a closed-form expression of the Rényi divergence between a Gaussian and its offset (for a more general version see ).

Dα(N(0,σ2)N(μ,σ2))=αμ2/(2σ2).D_{\alpha}(N(0,\sigma^{2})\|N(\mu,\sigma^{2}))=\alpha\mu^{2}/(2\sigma^{2}).

If ff has sensitivity 1, then the Gaussian mechanism Gσf\operatorname{\textbf{G}}_{\sigma}f satisfies (α,α/(2σ2))(\alpha,\alpha/(2\sigma^{2}))-RDP.

Observe that the RDP budget curve for the Gaussian mechanism is particularly simple—a straight line (Figure 1). Recall that the (adaptive) composition of RDP mechanisms satisfies Rényi differential privacy with the budget curve that is the sum of the mechanisms’ budget curves. It means that a composition of Gaussian mechanisms will behave, privacy-wise, “like” a Gaussian mechanism. Concretely, a composition of nn Gaussian mechanisms each with parameter σ\sigma will have the RDP curve of a Gaussian mechanism with parameter σ/n\sigma/\sqrt{n}.

VI-D Privacy of basic mechanisms under composition

The “bad outcomes” interpretation of Rényi differential privacy ties the probabilities of seeing the same outcome under runs of the mechanism applied to adjacent inputs. The dependency of the upper bound on the increase in probability on its initial value is complex, especially compared to the standard differential privacy guarantee. The main advantage of this more involved analysis is that for most parameters the bound becomes tighter.

In this section we compare numerical bounds for several analyses of self-composed mechanisms (see Figure 2), presented as three sets of graphs, where Pr[f(D)S]\Pr[f(D)\in S] takes values 10610^{-6}, 10310^{-3}, and 10110^{-1}.

Each of the six graphs in Figure 2 (three probability values ×\times {randomized response, Laplace}) plots bounds in logarithmic scale on the relative increase in probability of SS (i.e., Pr[f(D)S]/Pr[f(D)S]\Pr[f(D^{\prime})\in S]/\Pr[f(D)\in S]) offered by four analyses. The first, “naïve”, bound follows from the basic composition theorem for differential privacy and, as expected, is very pessimistic for all but a handful of parameters. A tighter, advanced composition theorem , gives a choice of δ\delta, from which one computes ϵ\epsilon^{\prime} so that the nn-fold composition satisfies (ϵ,δ)(\epsilon^{\prime},\delta)-differential privacy. The second curve plots the bound for the optimal (tightest) choice of (ϵ,δ)(\epsilon^{\prime},\delta). Two other bounds come from Rényi differential privacy analysis: our generic advanced composition theorem (Proposition 4) and the bound of Proposition 10 for the optimal combination of (α,ϵ)(\alpha,\epsilon) from the RDP curve of the composite mechanism.

Several observations are in order. The RDP-specific analysis for both mechanisms is tighter than all generic bounds whose only input is the mechanism’s differential privacy parameter. On the other hand, our version of the advanced composition bound (Proposition 4) is consistently outperformed by the standard (ϵ,δ)(\epsilon,\delta)-form of the composition theorem, where δ\delta is chosen optimally. We elaborate on this distinction in the next section.

VII Discussion

Rényi differential privacy is a natural relaxation of the standard notion of differential privacy that preserves many of its essential properties. It can most directly be compared with (ϵ,δ)(\epsilon,\delta)-differential privacy, with which it shares several important characteristics.

Probabilistic privacy guarantee. The standard “bad outcomes” guarantee of ϵ\epsilon-differential privacy is independent of the probability of a bad outcome: it may increase only by a factor of exp(ϵ)\exp(\epsilon). Its relaxation, (ϵ,δ)(\epsilon,\delta)-differential privacy, allows for an additional δ\delta term, which allows for a complete privacy compromise with probability δ\delta.

In stark contrast, Rényi differential privacy even with very weak parameters never allows a total breach of privacy with no residual uncertainty. The following analysis quantifies this assurance.

Let ff be (α,ϵ)(\alpha,\epsilon)-RDP with α>1\alpha>1. Recall that for any two adjacent inputs DD and DD^{\prime}, and an arbitrary prior pp the odds function R(D,D)p(D)/p(D)R(D,D^{\prime})\sim p(D)/p(D^{\prime}) satisfies E[{Rpost(D,D)Rprior(D,D)}α1]exp((α1)ϵ)\operatorname{E}\left[\left\{\frac{R_{\textrm{post}}(D,D^{\prime})}{R_{\textrm{prior}}(D,D^{\prime})}\right\}^{\alpha-1}\right]\leq\exp((\alpha-1)\epsilon). By Markov’s inequality Pr[Rpost(D,D)>βRprior(D,D)]<eϵ/β1/(α1)\Pr[R_{\textrm{post}}(D,D^{\prime})>\beta R_{\textrm{prior}}(D,D^{\prime})]<e^{\epsilon}/\beta^{1/(\alpha-1)}. For instance, if α=2\alpha=2, the probability that the ratio between two posteriors increases by more than the β\beta factor drops off as O(1/β)O(1/\beta).

Baseline-dependent guarantees. The Rényi differential privacy bound gets weaker for less likely outcomes. For instance, if ff is a (10.0,.1)(10.0,.1)-RDP mechanism, an event of probability .5 under f(D)f(D) can be as large as .586 and as small as .419 under f(D)f(D^{\prime}). For smaller events the range is (in relative terms) wider. If the probability under f(D)f(D) is .001, then Pr[f(D)S][.00042,0.00218]\Pr[f(D^{\prime})\in S]\in[.00042,0.00218]. For Pr[f(D)S]=106\Pr[f(D)\in S]=10^{-6} the range is wider still: Pr[f(D)S][.195106,4.36106]\Pr[f(D^{\prime})\in S]\in[.195\cdot 10^{-6},4.36\cdot 10^{-6}].

Contrasted with the pure ϵ\epsilon-differential privacy this type of guarantee is conceptually weaker and more onerous in application: in order to decide whether the increased risk is tolerable, one is required to estimate the baseline risk first.

However, in comparison with (ϵ,δ)(\epsilon,\delta)-DP the analysis via Rényi differential privacy is simpler and, especially for probabilities that are smaller than δ\delta, leads to stronger bounds. The reason is that (ϵ,δ)(\epsilon,\delta)-differential privacy often arises as a result of some analysis that implicitly comes with an ϵ\epsilon-δ\delta tradeoff. Finding an optimal value of (ϵ,δ)(\epsilon,\delta) given the baseline risk may be non-trivial, especially in closed form. Contrast the following two, basically equivalent, statements of advanced composition theorems (Proposition 4 and its Corollary 1):

Let f ⁣:DRf\colon\mathcal{D}\mapsto\mathcal{R} be an adaptive composition of nn mechanisms all satisfying ϵ\epsilon-differential privacy for ϵ1\epsilon\leq 1. Let DD and DD^{\prime} be two adjacent inputs. Then for any SRS\subset\mathcal{R}, by Proposition 4:

where 0<ϵ,δ<10<\epsilon,\delta<1 such that log(1/δ)ϵ2n\log(1/\delta)\geq\epsilon^{2}n.

Given some value of baseline risk Pr[f(D)S]\Pr[f(D)\in S], which formulation is easier to interpret? We argue that it is the former, since the (ϵ,δ)(\epsilon,\delta) form has a free parameter (δ\delta) that ought to be optimized in order to extract a tight bound that Proposition 4 gives directly.

The use of (ϵ,δ)(\epsilon,\delta) bounds gets even more complex if we consider a composition of heterogeneous mechanisms. It brings us to the last point of comparison between (ϵ,δ)(\epsilon,\delta)- and Rényi differential privacy measures.

Keeping track of accumulated privacy loss. A finite privacy budget associated with an individual is an intuitive and appealing concept, to which ϵ\epsilon-differential privacy gives a rigorous mathematical expression. Cumulative loss of differential privacy over the cause of a mechanism run, a protocol, or one’s lifetime can be tracked easily thanks to the additivity property of differential privacy. Unfortunately, doing so naïvely likely exaggerates privacy loss, which grows sublinearly in the number of queries with all but negligible probability (via advanced composition theorems).

Critically, applying advanced composition theorems breaks the convenient abstraction of privacy as a non-negative real number. Instead, the guarantee comes in the (ϵ,δ)(\epsilon,\delta) form that effectively corresponds to a single point on an implicitly defined curve. Composition of multiple, heterogeneous mechanisms makes applying the composition rule optimally much more challenging, as one may choose various (ϵ,δ)(\epsilon,\delta) points to represent their privacy (in the analysis, not during the mechanisms’ run time!). It begs the question of how to represent the privacy guarantee of a complex mechanism: distilling it to a single number throws away valuable information, while publishing the entire (ϵ,δ)(\epsilon,\delta) curve shifts the problem to the aggregation step. (See Kairouz et al. for an optimal bound on composition of homogeneous mechanisms and Murtagh and Vadhan for hardness results and an approximation scheme for composition of mechanisms with heterogeneous privacy guarantees.)

Rényi differential privacy restores the concept of a privacy budget, thanks to its composition rule: RDP curves for composed mechanisms simply add up. Importantly, the α\alpha’s of (α,ϵ)(\alpha,\epsilon)-Rényi differential privacy do not change. If RDP statements are reported for a common set of α\alpha’s (which includes ++\infty, to keep track of pure differential privacy), RDP of the aggregate is the sum of the reported vectors. Since the composition theorem of Proposition 4 takes as an input the mechanism’s RDP curve, it means that the sublinear loss of privacy as a function of the number of queries will still hold.

For an example of this approach we tabulate the bound on privacy loss for an iterative mechanism consisting of three basic mechanisms: randomized response, Gaussian, and Laplace. Its RDP curve is given, in the closed form, by application of the basic composition rule to RDP curves of the underlying mechanisms (Table II). The privacy guarantee is presented in Figure 3 for three values of the baseline risk: .1, .001, and 10610^{-6}. For each set of parameters two curves are plotted: one for an optimal value of α\alpha from (1,+](1,+\infty], the other for an optimal α\alpha restricted to the set of 13 values {1.5,1.75,2,2.5,3,4,5,6,8,16,32,64,+}\{1.5,1.75,2,2.5,3,4,5,6,8,16,32,64,+\infty\}. The two curves are nearly identical, which illustrates our thesis that reporting RDP curves for a restricted set of α\alpha’s preserves tightness of privacy analysis.

VIII Conclusions and Open Questions

We put forth the proposition that Rényi divergence yields useful insight into analysis of differentially private mechanisms. Among our findings

Rényi differential privacy (RDP) is a natural generalization of pure differential privacy.

RDP shares, with some adaptations, many properties that make differential privacy a useful and versatile tool.

RDP analysis of Gaussian noise is particularly simple.

A composition theorem can be proved based solely on the properties of RDP, which implies that RDP packs sufficient information about a composite mechanism as to enable its analysis without consideration of its components.

Furthermore, an RDP curve may be sampled in just a few points to provide useful guarantees for a wide range of parameters. If these points are chosen consistently across multiple mechanisms, this information can be used to estimate aggregate privacy loss.

Naturally, multiple questions remain open. Among those

As Lemma 1 demonstrates, the RDP curve of a differentially private mechanism is severely constrained. Making fuller use of these constraints is a promising direction, in particular towards formal bounds on tightness of RDP guarantees from select α\alpha values.

Proposition 10 (probability preservation) is not tight when Dα(PQ)0D_{\alpha}(P\|Q)\to 0. We expect that P(A)Q(A)P(A)\to Q(A) but the bound does not improve beyond P(A)(α1)/αP(A)^{(\alpha-1)/\alpha}.

Acknowledgments

We would like to thank Cynthia Dwork, Kunal Talwar, Salil Vadhan, and Li Zhang for numerous fruitful discussions, the CSF reviewers, Nicolas Papernot and Damien Desfontaines for their helpful comments, and Mark Bun and Thomas Steinke for sharing a draft of .

References

Appendix A Basic Properties of the Rényi Divergence

For comprehensive exposition of properties of the Rényi divergence we refer to two recent papers . Here we recall and re-prove several facts useful for our analysis.

For 1α1\leq\alpha and arbitrary distributions P,QP,Q

Assume that α>1\alpha>1. Define ϕ(x)x1α\phi(x)\triangleq x^{1-\alpha} and g(x)Q(x)/P(x)g(x)\triangleq Q(x)/P(x). Then

by the Jensen inequality applied to the convex function ϕ\phi. The case of α=1\alpha=1 follows by letting ϕ\phi to be log1/x\log 1/x. ∎

For 1α<β1\leq\alpha<\beta and arbitrary P,QP,Q

Assume that α>1\alpha>1. Observe that the function xxα1β1x\mapsto x^{\frac{\alpha-1}{\beta-1}} is concave. By Jensen’s inequality

The case of α=1\alpha=1 follows by continuity. ∎

The following proposition appears in Langlois et al. , generalizing Lyubashevsky et al. .

Let α>1\alpha>1, PP and QQ be two distributions defined over R\mathcal{R} with identical support, ARA\subset\mathcal{R} be an arbitrary event. Then

The result follows by application of Hölder’s inequality, which states that for real-valued functions ff and gg, and real p,q>1p,q>1, such that 1/p+1/q=11/p+1/q=1,

By setting pαp\triangleq\alpha, qα/(α1)q\triangleq\alpha/(\alpha-1), f(x)P(x)/Q(x)1/qf(x)\triangleq P(x)/Q(x)^{1/q}, g(x)Q(x)1/qg(x)\triangleq Q(x)^{1/q}, and applying Hölder’s, we have

The most salient feature of the bound is its (often non-monotone) dependency on α\alpha: as α\alpha approaches 1, Dα(PQ)D_{\alpha}(P\|Q) shrinks (by monotonicity of the Rényi divergence) but the power to which it is raised goes to 0, pushing the result in the opposite direction. Several our proofs proceed by finding the optimal, or approximately optimal, α\alpha minimizing the bound.

The Rényi divergence is not a metric: it is not symmetric and it does not satisfy the triangle inequality. A weaker variant of the triangle inequality tying together the Rényi divergence of different orders does hold. Its general version is presented below.

Let P,Q,RP,Q,R be distributions on R\mathcal{R}. Then for α>1\alpha>1 and for any p,q>1p,q>1 satisfying 1/p+1/q=11/p+1/q=1 it holds that

By taking the logarithm and dividing both sides by α1\alpha-1 we establish the claim. ∎

Several important special cases of the weak triangle inequality can be obtained by fixing parameters pp and qq (compare it with [25, Lemma 12] and [23, Lemma 4.1]):

Dα(PQ)α1/2α1D2α(PR)+D2α1(RQ).D_{\alpha}(P\|Q)\leq\frac{\alpha-1/2}{\alpha-1}D_{2\alpha}(P\|R)+D_{2\alpha-1}(R\|Q).

Dα(PQ)αα1D(PR)+Dα(RQ).D_{\alpha}(P\|Q)\leq\frac{\alpha}{\alpha-1}D_{\infty}(P\|R)+D_{\alpha}(R\|Q).

Dα(PQ)Dα(PR)+D(RQ).D_{\alpha}(P\|Q)\leq D_{\alpha}(P\|R)+D_{\infty}(R\|Q).

Dα(PQ)αα/βα1Dβ(PR)+Dβ(RQ)D_{\alpha}(P\|Q)\leq\frac{\alpha-\alpha/\beta}{\alpha-1}D_{\beta}(P\|R)+D_{\beta}(R\|Q), for some explicit β=2α.5+O(1/α)\beta=2\alpha-.5+O(1/\alpha).

All claims follow from the weak triangle inequality (Proposition 11) where pp and qq are chosen, respectively, as

pp\to\infty and qp/(p1)1q\triangleq p/(p-1)\to 1.

qq\to\infty and pq/(q1)1p\triangleq q/(q-1)\to 1.

such that αp=αq1\alpha p=\alpha q-1 and 1/p+1/q=11/p+1/q=1.

In the last case βpα=2α.5+O(1/α)\beta\triangleq p\alpha=2\alpha-.5+O(1/\alpha). ∎