Auditing Differentially Private Machine Learning: How Private is Private SGD?

Matthew Jagielski, Jonathan Ullman, Alina Oprea

Introduction

Differential privacy [DMNS06] has become the de facto standard for guaranteeing privacy in machine learning and statistical analysis, and is now being deployed by many organizations including Apple [TVV+17], Google [EPK14, BEM+17, PSM+18], and the US Census Bureau [HMA+17]. Now that differential privacy has moved from theory to practice, there has been considerable attention on optimizing and evaluating differentially private machine learning algorithms, notably differentially private stochastic gradient descent (henceforth, DP-SGD) [SCS13, BST14, ACG+16], which is now widely available in TensorFlow Privacy [Goo]. DP-SGD is the building block for training many widely used private classification models, including feed-forward and convolutional neural networks.

Differential privacy gives a strong worst-case guarantee of individual privacy: a differentially private algorithm ensures that, for any set of training examples, no attacker, no matter how powerful, can learn much more information about a single training example than they could have learned had that example been excluded from the training data. The amount of information is quantified by a privacy parameter ε\varepsilon.There are several common variants of differential privacy [DKM+06, DR16, BS16, Mir17, BDRS18, DRS19] that quantify the influence of a single example in slightly different ways, sometimes using more than one parameter. For this high-level discussion, we focus on the single, primary parameter ε\varepsilon. Intuitively, a smaller ε\varepsilon means stronger privacy protections, but leads to lower accuracy. As such there is often pressure to set this parameter as large as one feels still gives a reasonable privacy guarantee, and relatively large parameters such as ε=2\varepsilon=2 are not uncommon. However, this guarantee is not entirely satisfying, as such an algorithm might allow an attacker to guess a random bit of information about each training example with approximately 86% accuracy. As such there is often a gap between the strong formal protections promised by differential privacy and the specific quantitative implications of the choice of ε\varepsilon in practice.

This state-of-affairs is often justified by the fact that our analysis of the algorithm is often pessimistic. First of all, ε\varepsilon is a parameter that has to be determined by careful analysis, and often existing theoretical analysis is not tight. Indeed a big part of making differentially private machine learning practical has been the significant body of work giving progressively more refined privacy analyses specifically for DP-SGD [ACG+16, DRS19, MTZ19, YLP+19], and for all we know these bounds on ε\varepsilon will continue to shrink. Indeed, it is provably intractable to determine the tightest bound on ε\varepsilon for a given algorithm [GM18]. Second, differential privacy is a worst-case notion, as the mechanism might have stronger privacy guarantees on realistic datasets and realistic attackers. Although it is plausible that differentially private algorithms with large values of ε\varepsilon provide strong privacy in practice, it is far from certain, which makes it difficult to understand the appropriate value of ε\varepsilon for practical deployments.

Auditing DP-SGD. In this paper we investigate the extent to which DP-SGD,Although our methods are general, in this work we exclusively study the implementation and privacy analysis of DP-SGD in TensorFlow Privacy [Goo]. does or does not give better privacy in practice than what its current theoretical analysis suggests. We do so using novel data poisoning attacks. Specifically, our method starts with a dataset DD of interest (e.g. Fashion-MNIST) and some algorithm A\mathcal{A} (e.g. DP-SGD with a specific setting of hyperparameters), and produces a small poisoning set SS of kk points and a binary classifier TT such that TT distinguishes the distribution A(D)\mathcal{A}(D) from A(DS)\mathcal{A}(D\cup S) with significant advantage over random guessing. If A\mathcal{A} were ε\varepsilon-DP, then TT could have accuracy at most exp(εk)/(1+exp(εk))\exp(\varepsilon k)/(1+\exp(\varepsilon k)), so if we can estimate the accuracy of TT we can infer a lower bound on ε\varepsilon. While previous work [MZH19] proposed to use this connection between differential privacy and data poisoning as a defense against data poisoning, our use in this context of auditing the privacy of DP-SGD is new.

Specifically, for certain natural choices of hyperparameters in DP-SGD, and standard benchmark datasets (see Figure 2), our attacks give lower bounds on ε\varepsilon that are approximately 10x better than what we could obtain from previous methods, and are within approximately 10x of the worst-case, analytically derived upper bound. For context, previous theoretical improvements to the analysis have improved the worst-case upper bounds by factors of more than 1000x over the naïve analysis, and thus our results show that we cannot hope for similarly dramatic gains in the future.

Novel Data Poisoning Attacks. We find that existing data poisoning attacks, as well as membership inference attacks proposed by prior work, have poor or trivial performance not only against DP-SGD, but even against SGD with gradient clipping (i.e. rescaling gradients to have norm no larger than some CC). Gradient clipping is an important part of DP-SGD, but does not provide any formal privacy guarantees on its own. Thus, we develop a novel data poisoning attack that is more robust to gradient clipping, and also performs much better against DP-SGD.

Intuitively, data poisoning attacks introduce new points whose gradients will change the model in a certain direction, and the attack impact increases when adding poisoning points of larger gradients. Existing attacks modify the model in a random direction, and have to push far enough that the original distribution on model parameters and the new distribution become distinguishable. To be effective, these attacks use points which induce large gradients, making the attack sensitive to gradient clipping. On the other hand, our attack improves by finding the direction where the model parameters have the lowest variance, and select poisoning points that modify the model in that direction. Therefore, we achieve the same effect of model poisoning with poisoning points of smaller gradients, thereby making the attack more robust to clipping.

The Role of Auditing in DP. More generally, our work takes a quantitative, empirical approach to auditing the privacy afforded by specific implementations of differentially private algorithms. We do not advocate trying to definitively measure privacy of an algorithm empirically, since it’s hopeless to try to anticipate all future attacks. Rather, we believe this empirical approach has the potential to complement and influence analytical work on differential privacy, somewhat analogous to the way cryptanalysis informs the design and deployment of cryptography.

Specifically, we believe this approach can complement the theory in several ways:

Most directly, by advancing the state-of-art in privacy attacks, we can either demonstrate that a given algorithm with a given choice of parameters is not sufficiently private, or give some confidence that it might be sufficiently private.

Establishing strong lower bounds on ε\varepsilon gives a sense of how much more one could hope to get out of tightening the existing privacy analysis.

Observing how the performance of the attack depends on different datasets, hyperparameters, and variants of the algorithm can identify promising new phenomena to explore theoretically.

Producing concrete privacy violations can help non-experts interpret the concrete implications of specific choices of the privacy parameter.

2 Related Work

DP-SGD. Differentially private SGD was introduced in [SCS13], and an asymptotically optimal analysis of its privacy properties was given in [BST14]. Notably Abadi et al. [ACG+16] gave greatly improved concrete bounds on its privacy parameter, and showed its practicality for training neural networks, making DP-SGD one of the most promising methods for practical private machine learning. There have been several subsequent efforts to refine the privacy analysis of this specific algorithm [MTZ19, DRS19, YLP+19]. A recent work [HT19] gave a heuristic argument that SGD itself (without adding noise to the gradients) satisfies differential privacy, but even then the bounds on ε\varepsilon are quite large (e.g. ε=13.01\varepsilon=13.01) for most datasets.

Privacy Attacks. Although privacy attacks have a very long history, the history of privacy attacks against aggregate statistical information, such as machine learning models, goes back to the seminal work of Dinur and Nissim [DN03] on reconstruction attacks. A similar, but easier to implement type of attack, membership inference attacks, was first performed by Homer et al. [HSR+08], and theoretically analyzed in [SOJH09, DSS+15]. Shokri et al. [SSSS17] and Yeom et al. [YGFJ18] gave black-box membership inference algorithms for complex machine learning models. Membership inference attacks are compelling because they require relatively weak assumptions, but, as we show, state-of-the-art membership inference attacks lead to quantitatively weak privacy violations.

More directly related to our work, privacy attacks were recently used by Jayaraman and Evans [JE19] to understand the concrete privacy leakage from differentially private machine learning algorithms, specifically DP-SGD. However, the goal of their work is to compare the privacy guarantees offered by different variants of differential privacy, rather than to determine the level of privacy afforded by a given algorithm. As such, their attacks are quantitatively much less powerful than ours (as we show in Figure 2), and are much further from determining the precise privacy guarantees of DP-SGD.

Differential Privacy and Data Poisoning. Ma et al. [MZH19] and Hong et al. [HCK+20] evaluate the effectiveness of data poisoning attacks on differentially private machine learning algorithms. Ma et al. consider both the output perturbation and objective perturbation algorithms for learning ridge regression and logistic regression models, proposing attacks on differentially private algorithms, and also argue that differentially privacy can serve as a defense for poisoning attacks. Hong et al. propose differential privacy as a defense for poisoning attacks, showing that DP-SGD performs effectively at defending against existing poisoning attacks in the literature. While differential privacy provides a provable defense for poisoning attacks, our intuition is that the strong poisoning attacks we design allow us to measure a lower bound on the privacy offered by differentially private algorithms.

Automated Discovery of Privacy Parameters. Two works have focused on automatically discovering (upper or lower bounds on) privacy parameters. [GM18] showed that determining the exact privacy level using black-box access to the algorithm is prohibitively expensive. In the white-box setting, Ding et al. [DWW+18] used a clever combination of program analysis and random sampling to identify violations of ε\varepsilon-DP, although their methods are currently limited to simple algorithms. Moreover the violations of DP they identify may not correspond to realistic attacks.

(Measuring) Differential Privacy

We begin by outlining differential privacy and one of its relevant properties: group privacy. We consider machine learning classification tasks, where a dataset consists of many samples from some domain D=X×Y\mathcal{D}=\mathcal{X}\times\mathcal{Y}, where X\mathcal{X} is the feature domain, and Y\mathcal{Y} the label domain. We say two datasets D0,D1D_{0},D_{1} differ on kk rows if we can replace at most kk elements from D0D_{0} to produce D1D_{1}.

An algorithm A:DR\mathcal{A}:\mathcal{D}\mapsto\mathcal{R} is (ε,δ)(\varepsilon,\delta)-differentially private if for any two datasets D0,D1D_{0},D_{1} which differ on at most one row, and every set of outputs OR\mathcal{O}\subseteq\mathcal{R}:

where the probabilities are taken only over the randomness of A\mathcal{A}.

Let D0,D1D_{0},D_{1} be two datasets differing on at most kk rows, A\mathcal{A} is an (ε,δ)(\varepsilon,\delta)-differentially private algorithm, and O\mathcal{O} an arbitrary output set. Then

Group privacy will give guarantees for poisoning attacks that introduce multiple points.

DP-SGD. The most prominent differentially private mechanism for training machine learning models is differentially private stochastic gradient descent (DP-SGD) [SCS13, BST14, ACG+16]. DP-SGD makes two modifications to the standard SGD procedure:

clipping gradients to a fixed maximum norm CC, and adding noise to gradients with standard deviation σC\sigma C, for a given σ\sigma, as shown in Algorithm 1. Given the hyperparameters – clipping norm, noise magnitude, iteration count, and batch size – one can analyze DP-SGD to conclude that it satisfies (ε,δ)(\varepsilon,\delta)-differential privacy for some parameters ε,δ0\varepsilon,\delta\geq 0.

2 Statistically Measuring Differential Privacy

In this section we describe our main statistical procedure for obtaining lower bounds on the privacy parameter for a given algorithm A\mathcal{A}, which functions differently from the membership inference attack used in prior work ([SSSS17, JE19] and described in Appendix D). Here, we describe the procedure generally, in the case where δ=0\delta=0; in Appendix A, we show how to adapt the procedure for δ>0\delta>0, and in Section 3, we discuss how we instantiate it in our work. Given a learning algorithm A\mathcal{A}, we construct two datasets D0D_{0} and D1D_{1} differing on kk rows, and some output set O\mathcal{O}. We defer the discussion of constructing D0D_{0}, D1D_{1}, and O\mathcal{O} to Section 3. We also wish to bound the probability that we incorrectly measure εLB\varepsilon_{\mathit{LB}} by a small value α\alpha. From Equation (2), observe that by estimating the quantities p0=Pr[A(D0)O]p_{0}=\Pr[\mathcal{A}(D_{0})\in\mathcal{O}] and p1=Pr[A(D1)O]p_{1}=\Pr[\mathcal{A}(D_{1})\in\mathcal{O}], we can compute the largest εLB\varepsilon_{\mathit{LB}} such that Equation (2) holds. With δ=0\delta=0, this simplifies to εLB=ln(p0/p1)\varepsilon_{\mathit{LB}}=\ln(p_{0}/p_{1}). This serves as an estimate of the leakage of the private algorithm, but requires estimating p0p_{0} and p1p_{1} accurately.

For an arbitrary algorithm, it’s infeasible to compute p0,p1p_{0},p_{1} precisely, so we rely on Monte Carlo estimation, by training some fixed TT number of times. However, this approach incurs statistical uncertainty, which we correct for by using Clopper Pearson confidence intervals [CP34]. That is, to ensure that our estimate εLB\varepsilon_{\mathit{LB}} holds with probability >1α>1-\alpha, we find a Clopper Pearson lower bound for p1p_{1} that holds with probability 1α/21-\alpha/2, and an upper bound for p0p_{0} holding with probability 1α/21-\alpha/2. Qualitatively, we can be confident that our lower bound on privacy leakage ε\varepsilon^{\prime} holds with probability 1α1-\alpha. This procedure is outlined in Algorithm 2, and we prove its correctness in Theorem 2.

When provided with black box access to an algorithm A\mathcal{A}, two datasets D0D_{0} and D1D_{1} differing on at most kk rows, an output set O\mathcal{O}, a trial number TT and statistical confidence α\alpha, if Algorithm 2 returns εLB\varepsilon_{\mathit{LB}}, then, with probability 1α1-\alpha, A\mathcal{A} does not satisfy ε\varepsilon^{\prime}-DP for any ε<εLB\varepsilon^{\prime}<\varepsilon_{\mathit{LB}}.

We stress that when we say εLB\varepsilon_{\mathit{LB}} is a lower bound with probability 1α1-\alpha, this is only over the randomness of the Monte Carlo sampling, and is not based on any modeling or assumptions. We can always move our confidence closer to 11 by taking TT larger.

First, the guarantee of the Clopper-Pearson confidence intervals is that, with probability at least 1α1-\alpha, p^0p0\hat{p}_{0}\leq p_{0} and p^1p1\hat{p}_{1}\geq p_{1}, which implies p0/p1p^0/p^1p_{0}/p_{1}\geq\hat{p}_{0}/\hat{p}_{1}. Second, if A\mathcal{A} is ε\varepsilon-DP, then by group privacy we would have p0/p1exp(kε)p_{0}/p_{1}\leq\exp(k\varepsilon), meaning A\mathcal{A} is not ε\varepsilon^{\prime}-DP for any ε<1kln(p0/p1)\varepsilon^{\prime}<\frac{1}{k}\ln(p_{0}/p_{1}). Combining the two statements, A\mathcal{A} is not ε\varepsilon^{\prime} for any ε<1kln(p^0/p^1)=εLB\varepsilon^{\prime}<\frac{1}{k}\ln(\hat{p}_{0}/\hat{p}_{1})=\varepsilon_{\mathit{LB}}. ∎

The εLB\varepsilon_{\mathit{LB}} reported by Algorithm 2 has two fundamental upper bounds, the provable εth\varepsilon_{\mathit{th}}, and an upper bound, εOPT(T,α)\varepsilon_{OPT}(T,\alpha), imposed by Monte Carlo estimation. The first upper bound is natural: if we run the algorithm on some A\mathcal{A} for which the ε\varepsilon we can prove is εth=1\varepsilon_{\mathit{th}}=1, then εLBεth=1\varepsilon_{\mathit{LB}}\leq\varepsilon_{\mathit{th}}=1. To understand εOPT(T,α)\varepsilon_{OPT}(T,\alpha), suppose we run 500 trials, and desire α=0.01\alpha=0.01. The best possible performance is if we get perfect inference accuracy and k=1k=1, where ct0=500\text{ct}_{0}=500 and ct1=0\text{ct}_{1}=0. The Clopper Pearson confidence interval produces p^0=0.989,p^1=0.011\hat{p}_{0}=0.989,\hat{p}_{1}=0.011, which gives εLB=4.54/k=4.54\varepsilon_{\mathit{LB}}=4.54/k=4.54. Then, with 99%99\% probability, the true ε\varepsilon is at least 4.544.54, and εOPT(T,α)=4.54\varepsilon_{OPT}(T,\alpha)=4.54.

We remark that the above procedure only demonstrates that A\mathcal{A} cannot be strictly better than (εLB,0)(\varepsilon_{\mathit{LB}},0)-DP, but allows for it to be (εLB/2,δ)(\varepsilon_{\mathit{LB}}/2,\delta)-DP for very small δ\delta. However, in our work, p^0,p^1\hat{p}_{0},\hat{p}_{1} turn out never to be too close to , so these differences have little effect on our findings. In Appendix A, we formally discuss how to modify this algorithm for (ε,δ)(\varepsilon,\delta)-DP for δ>0\delta>0. We also show when we can increase εLB\varepsilon_{\mathit{LB}} by considering the maximum upper bounds of the original output set O\mathcal{O} and its complement OC\mathcal{O}^{C}.

Poisoning Attacks

We now show how to use poisoning attacks to run Algorithm 2. Intuitively, we begin with a dataset D0D_{0} and replace kk rows with poisoning points to form D1D_{1}; we then use the impact of poisoning as an output set O\mathcal{O}. We start with existing backdoor attacks [GDGG17], and then propose a more effective clipping-aware poisoning attack.

In a poisoning attack, an adversary replaces kk data points from a training dataset DD of nn points. The poisoned training dataset is provided as input to the training algorithm, which releases a model ff that minimizes a loss function L(f,D)\mathcal{L}(f,D) on its given dataset DD.

We focus on a specific type of poisoning attack, called a backdoor attack [GDGG17]. In a backdoored model, the performance on natural data is maintained, but, by adding a small perturbation to a data point xx into Pert(x)\mathit{Pert}(x), the adversary changes the predicted class of the perturbed data. These attacks have been developed for image datasets. In the original attack [GDGG17], described in Algorithm 3, the perturbation function Pert()\mathit{Pert}(\cdot) adds a pattern in the corner of an image. The poisoning attack takes natural data (x,y)(x,y), perturbs the image to Pert(x)\mathit{Pert}(x), and changes the class to some ypy_{p}. The objective is to decrease the loss on (Pert(x),yp)(\mathit{Pert}(x),y_{p}) values from the perturbed test set.

2 Clipping-Aware Poisoning

DP-SGD makes two modifications to the learning process to preserve privacy: clipping gradients and adding noise. Clipping provides no formal privacy on its own, but many poisoning attacks perform significantly worse in the presence of clipping. Indeed, the basic backdoor attack from Section 3.1 results in a fairly weak lower bound of at most εLB=0.11\varepsilon_{\mathit{LB}}=0.11 using the Fashion-MNIST dataset, even with no added noise (which has an εth=\varepsilon_{\mathit{th}}=\infty). To improve this number, we must make the poisoning attack sufficiently robust to clipping.

To understand existing backdoor attacks’ difficulty with clipping, consider clipping’s impact on logistic regression. The gradient of model parameters ww with respect to a poisoning point (xp,yp)(x_{p},y_{p}) is

We can minimize this variance with respect to the poisoning point xpx_{p} by using the singular value decomposition: selecting xpx_{p} as the singular vector corresponding to the smallest singular value (i.e. the direction of least variance), and scale xpx_{p} to a similar norm to the rest of the dataset. We select ypy_{p} to be the smallest probability class on xpx_{p}. We then insert kk copies of the poisoning point (xp,yp)(x_{p},y_{p}). We call this approach ClipBKD, detailed in Algorithm 4. We prove in Appendix B that when we run ClipBKD (modified for regression tasks) to estimate the privacy of the output perturbation algorithm, we obtain εLB\varepsilon_{\mathit{LB}} within a small factor of the upper bound εth\varepsilon_{\mathit{th}}, giving evidence that this attack is well suited to our application in differential privacy. In Appendix C, we describe how to adapt ClipBKD to transfer learning from a pre-trained model.

For both standard and clipping-aware backdoors, we generate D0,D1D_{0},D_{1} with a given poisoning size kk, using functions Backdoor or ClipBkd, respectively. Then the test statistic is whether the backdoored points are distinguishable by a threshold on their loss (i.e., output set O\mathcal{O} is whether BkdTest or ClipBkdTest return “Backdoored”). We first run an initial phase of TT trials to find a good threshold ZZ for the test functions. We then run another TT trials in Algorithm 2 to estimate p0^\hat{p_{0}} and p1^\hat{p_{1}} based on either the BkdTest or the ClipBkdTest test statistic.

Experiments and Discussion

We evaluate both membership inference (MI, as used by [YGFJ18] and [JE19] and described in Appendix D) and our algorithms on three datasets: Fashion-MNIST (FMNIST), CIFAR10, and Purchase-100 (P100). For each dataset, we consider both a logistic regression (LR) model and a two-layer feedforward neural network (FNN), trained with DP-SGD using various hyperparameters:

FMNIST [XRV17] is a dataset of 70000 28x28 pixel images of clothing from one of 10 classes, split into a train set of 60000 images and a test set of 10000 images. It is a standard benchmark dataset for differentially private machine learning. To improve training speed, we consider a simplified version, using only classes 0 and 1 (T-shirt and trouser), and downsample so each class contains 3000 training and 1000 testing points. CIFAR10 [Kri09] is a harder dataset than FMNIST, consisting of 60000 32x32x3 images of vehicles and animals, split into a train set of 50000 and a test set of 10000. For training speed, we again take only class 0 and 1 (airplane and automobile), making our train set contain 10000 samples, and the test set 2000 samples. When training on CIFAR10, we follow standard practice for differential privacy and fine-tune the last layer of a model pretrained nonprivately on the more complex CIFAR100, a similarly sized but more complex benchmark dataset [PCS+20]. P100 [SSSS17] is a modification of a Kaggle dataset [Pur], with 200000 records of 100 features, and 100 classes. The features are purchases, and the classes are user clusters. Following [JE19], we subsample the dataset so it has 10000 train records and 10000 test records.

Our techniques are general, and could be applied to any dataset-model pair to identify privacy risks for DP-SGD. Examining these six dataset-model pairs demonstrates that our technique can be used to identify new privacy risks in DP-SGD, and a comprehensive empirical study is not our focus.

Model Size. The two-layer feedforward neural networks all have a width of 32 neurons. For CIFAR10, the logistic regression model and feedforward neural network are added on top of the pretrained convolutional neural network.

Computing Thresholds. In order to run Algorithm 2, we need to specify D0,D1D_{0},D_{1} and O\mathcal{O}. We’ve described how to use poisoning to compute D0,D1D_{0},D_{1}, and how the test statistics for these attacks are constructed, assuming a known threshold. To produce this threshold, we train 500 models on the unpoisoned dataset and 500 models on the poisoned dataset, and identify which of the resulting 1000 thresholds produces the best εLB\varepsilon_{\mathit{LB}}, using Algorithm 2.

2 Results and Discussion

Figure 2 presents a direct comparison of the privacy bounds produced by ClipBKD (our attack), the standard backdoor attack, and MI. As standard backdoor attacks only exist for images, we only report results on them on FMNIST and CIFAR10. The pattern we choose for backdoor attacks is a 5x5 white square in the top-left corner of an image. For ClipBKD, we use T=500T=500 trials and confidence level α=0.01\alpha=0.01 (i.e., our Monte Carlo estimates hold with 99% confidence) and report the best result from k=1,2,4,8k=1,2,4,8 poisoning points. Results for MI use 1000 samples, and average over 10 trained models. For context, we display the best theoretical upper bound on εth\varepsilon_{\mathit{th}} and also εOPT(500,0.01)\varepsilon_{\mathit{OPT}}(500,0.01), which is the best value of εLB\varepsilon_{\mathit{LB}} that we could hope to produce using TT trials and confidence level α\alpha.

For every dataset and model, we find that ClipBKD significantly outperforms MI, by a factor of 2.5x–1500x. As a representative example, for εth=4\varepsilon_{\mathit{th}}=4 on Purchase-100 with 2-layer neural networks, ClipBKD gives an εLB\varepsilon_{\mathit{LB}} of 0.46, while MI gives εLB\varepsilon_{\mathit{LB}} of 0.04, an improvement of 12.1x. We also find ClipBKD always improves over standard backdoors: on FMNIST by an average factor of 3.84x, and standard backdoors never reach positive εLB\varepsilon_{\mathit{LB}} on CIFAR, due to the large number of points required to poison the pretrained model. We also find that ClipBKD returns εLB\varepsilon_{\mathit{LB}} that are close to εth\varepsilon_{\mathit{th}}; for finite εth\varepsilon_{\mathit{th}}, the majority of gaps are a factor of <12.3<12.3x, reaching as low as 6.6x. For example, on Purchase-100, when εth=4\varepsilon_{\mathit{th}}=4, we find that ClipBKD returns an εLB\varepsilon_{\mathit{LB}} of 0.46, a gap of 8.7x.

Sensitivity to Hyperparameters. We also give a more thorough evaluation of ClipBKD’s performance as a function of DP-SGD’s hyperparameters. We vary clipping norm between 0.5, 1, and 2 and vary the noise to ensure εth\varepsilon_{\mathit{th}} between 1, 2, 4, 8, 16, and \infty. We also vary the initialization randomness between random normal initializations with variance (fixed initialization), 0.5σ0.5\sigma, σ\sigma, and 2σ2\sigma, where σ\sigma is the variance of Glorot normal initialization. Table 2 reports the best εLB\varepsilon_{\mathit{LB}} produced by the attack over k=1,2,4,8k=1,2,4,8. Our best measured values of εLB\varepsilon_{\mathit{LB}} occur when initialization is fixed, and are within a 4.2-7.7x factor of εth\varepsilon_{\mathit{th}}, speaking to the effectiveness of ClipBKD. When εth=\varepsilon_{\mathit{th}}=\infty and the initialization is fixed, we achieve perfect inference accuracy, matching εOPT(500,0.01)=4.54\varepsilon_{OPT}(500,0.01)=4.54.

These experiments reveal three intuitive trends. First, as εth\varepsilon_{\mathit{th}} increases (equivalently, the noise level decreases), εLB\varepsilon_{\mathit{LB}} also increases. Second, as the initialization randomness decreases, εLB\varepsilon_{\mathit{LB}} increases. All existing analyses of DP-SGD give privacy upper bounds for any fixed initialization, and our results suggest that initial randomization might play a significant role. Finally, as clipping norm decreases, εLB\varepsilon_{\mathit{LB}} decreases, except when the initialization is fixed. In fact, our results show that εLB\varepsilon_{\mathit{LB}} is more sensitive to the clipping norm than the amount of noise. All existing analyses of DP-SGD consider only the noise multiplier σGD\sigma_{\mathit{GD}} but not the clipping norm, but the role of the clipping norm itself seems highly significant.

We emphasize that for every choice of hyperparameters, the training accuracy is 96–98%, so the algorithm has comparable utility, but potentially very different privacy and robustness to poisoning, as we vary these parameters. We believe these phenomena deserve further study.

Conclusion and Future Directions

We use novel poisoning attacks to establish strong limits on the privacy of specific differentially private algorithms, namely DP-SGD. We establish that the worst-case privacy bounds for this algorithm are approaching their limits. Our findings highlight several questions for future exploration:

How much can our attacks be pushed quantitatively? Can the gap between our lower bounds and the worst-case upper bounds be closed?

Can we incorporate additional features into the privacy analysis of DP-SGD, such as the specific gradient-clipping norm, and the amount of initial randomness?

How realistic are the instances produced by our attacks, and can we extend the attacks to give easily interpretable examples of privacy risks for non-experts?

Although there is no hope of determining the precise privacy level of a given algorithm in a fully empirical way, we believe our work demonstrates how a quantitative, empirical approach to privacy attacks can effectively complement analytical work on privacy in machine learning.

Acknowledgments

JU is supported by NSF grants CCF-1750640, CNS-1816028, and CNS-1916020, and a Google Faculty Research Award. This research was also sponsored by a Toyota ITC research award and the U.S. Army Combat Capabilities Development Command Army Research Laboratory under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

References

Appendix A Extending Algorithm 2.

Notice that Algorithm 2 holds for (ε,0)(\varepsilon,0)-differential privacy. However, this is only for simplicity—the group privacy guarantee of (ε,δ)(\varepsilon,\delta)-differential privacy can be converted to a similar procedure. Write x=exp(ε)x=\exp(\varepsilon) in the group privacy guarantee for Equation 2, and rearrange to provide p1xk+1(p1δ)xkp0x+(p0δ)0p_{1}x^{k+1}-(p_{1}-\delta)x^{k}-p_{0}x+(p_{0}-\delta)\geq 0. We can solve for xx here using a root-finding algorithm to find xx, and computing εLB=ln(x)\varepsilon_{\mathit{LB}}=\ln(x). Theorem 2 can be easily extended to this case.

Notice that differential privacy makes a guarantee for all output sets, including the complement OC\mathcal{O}^{C}; if Pr[A(D)O]=p\Pr[A(D)\in\mathcal{O}]=p, then Pr[A(D)OC]=1p\Pr[A(D)\in\mathcal{O}^{C}]=1-p. If, upon computing p0,p1p_{0},p_{1}, we can compute a larger εLB\varepsilon_{\mathit{LB}} by using OC\mathcal{O}^{C}, this requires no extra trials.

For example, suppose δ=0\delta=0, k=1k=1, p1=0.8p_{1}=0.8, and p0=0.4p_{0}=0.4. Here, εLB=ln(p1/p0)/1=ln(2)\varepsilon_{LB}=\ln(p_{1}/p_{0})/1=\ln(2). If, instead, we replace O\mathcal{O} by OC\mathcal{O}^{C}, εLB=ln((1p0)/(1p1))/1=ln(0.6/0.2)=ln(3)\varepsilon_{LB}=\ln((1-p_{0})/(1-p_{1}))/1=\ln(0.6/0.2)=\ln(3). In Lemma 3, we show when this technique improves εLB\varepsilon_{LB}: when p1>p0+kδp_{1}>p_{0}+k\delta and p0+p1>1p_{0}+p_{1}>1. We use this modification in all of our experiments.

If p1>p0+kδp_{1}>p_{0}+k\delta and p0+p1>1p_{0}+p_{1}>1, then the largest root of f0(x)=p1xk+1(p1δ)xkp0x+(p0δ)f_{0}(x)=p_{1}x^{k+1}-(p_{1}-\delta)x^{k}-p_{0}x+(p_{0}-\delta) is smaller than the largest root of f1(x)=(1p0)xk+1(1p0δ)xk(1p1)x+(1p1δ)f_{1}(x)=(1-p_{0})x^{k+1}-(1-p_{0}-\delta)x^{k}-(1-p_{1})x+(1-p_{1}-\delta).

Write x0x_{0} the largest root of f0(x)f_{0}(x), and x1x_{1} the largest root of f1(x)f_{1}(x). We show this in two parts: first, we show that, for all p0,p1p_{0},p_{1}, f0(x)f_{0}(x) has a root x>1x>1 when p1>p0+kδp_{1}>p_{0}+k\delta, after which it is monotonically increasing. This shape is evident in Figure 3. This provides a nonzero εLB\varepsilon_{LB}. Then we show that this εLB\varepsilon_{LB} must be smaller for f1(x)f_{1}(x) when p0+p1>1p_{0}+p_{1}>1, because f0(x)f1(x)>0f_{0}(x)-f_{1}(x)>0 when x>1x>1. This ensures that f1(x0)<0f_{1}(x_{0})<0, and so the root x1>x0x_{1}>x_{0}.

We begin by showing that f0(x)f_{0}(x) has a single root x>1x>1. First, notice that f0(1)=0f_{0}(1)=0. We then analyze the derivative, showing that it starts negative, has a root, and then is always positive. This indicates that there can only be one root.

This has a root x>1x>1 if xkp0+kδxk1p1<0x^{k}p_{0}+k\delta x^{k-1}-p_{1}<0, so we require p0+kδp1<0p_{0}+k\delta-p_{1}<0. Notice too that f0(x)f_{0}^{\prime}(x) is monotonically increasing at x>1x>1. This ensures that it has only one root x>1x>1. This argument holds, too, for f1(x)f_{1}(x), as if p0+kδp1<0p_{0}+k\delta-p_{1}<0, then (1p1)+kδ(1p0)<0(1-p_{1})+k\delta-(1-p_{0})<0.

Now that we know both f0f_{0} and f1f_{1} only have a single root, and they are both increasing at that root, we just need to show that f1(x0)<0f_{1}(x_{0})<0, as this will ensure x1>x0x_{1}>x_{0}. We do this by showing that x>1\forall x>1, f0(x)f1(x)>0f_{0}(x)-f_{1}(x)>0. First, write

The δ\delta terms cancel, and we can factor the above into

This is always positive when x>1x>1 and p0+p1>1p_{0}+p_{1}>1. ∎

Appendix B Analysis of Backdoor Poisoning-based Auditing

We now provide formal evidence for the effectiveness of backdoor poisoning attacks in auditing differentially private algorithms with a case study on linear regression. To the best of our knowledge, this is also the first formal analysis of backdoor poisoning attacks for a concrete learning algorithm.

where σd\sigma_{d} is the smallest singular value of XX.

The output perturbation algorithm for Ridge regression, with (ε,δ)(\varepsilon,\delta)-DP, adds Gaussian noise with variance σ2=2ln(1.25/δ)(2/λ)2/ε2\sigma^{2}=2\ln(1.25/\delta)(2/\lambda)^{2}/\varepsilon^{2} to the optimal parameters ww.

Letting c=0.5λ+σd+1c=\frac{0.5}{\lambda+\sigma_{d}+1}, the probability of success for this distinguisher is Pr[N(0,σ2)<c],\Pr[\mathcal{N}(0,\sigma^{2})<c], which gives an ε\varepsilon lower bound of

We can lower bound Pr[0<N(0,σ2)<c]\Pr[0<\mathcal{N}(0,\sigma^{2})<c] using the following integral approximation:

By its Maclaurin series, \ln\mathopen{}\mathclose{{}\left(\frac{0.5+x}{0.5-x}}\right)\geq 4x. Then we can compute our lower bound on ε\varepsilon to be

so the attack differs by a constant factor from the provable ε\varepsilon.

Appendix C ClipBKD with Pretrained Models

State-of-the-art differentially private CIFAR10 models use transfer learning from a fixed pretrained CIFAR100 convolutional neural network. We call the pretrained model function f0f_{0}, which is never updated during training. Training produces a f1f_{1}, such that the entire prediction model is f(x)=f1(f0(x))f(x)=f_{1}(f_{0}(x)).

This makes ClipBKD not directly applicable, as ClipBKD requires access to the input of the trained model f1f_{1}. Then we must try to produce some xx such that f0(x)=hpf_{0}(x)=h_{p}, where hph_{p} is produced by SVD in Algorithm 3. This is not in general possible, so we instead use gradient descent to optimize the combination of two loss functions on xx.

Our first loss function incentivizes decreasing hpvh_{p}\cdot v for high-variance directions vVhighv\in V_{high} from SVD. This ensures the gradient will not move in SGD’s noisy directions. Our second loss function incentivizes increasing hpvh_{p}\cdot v for low-variance directions vVlowv\in V_{low} from SVD, ensuring the gradient is distinguishable in low variance directions. Putting these together, we produce xpx_{p} by optimizing the following loss function:

We perform this optimization by projected gradient descent, running 10000 iterations with a learning rate of 1.

Appendix D Membership Inference

In membership inference [YGFJ18], an adversary is given a model ff and its training loss cc and seeks to understand whether a given data point (x,y)(x,y) was used to train the model. Although alternative formulations have been proposed [SSSS17], we focus on the one proposed by [YGFJ18]. Intuitively, the attack relies on a generalization gap: the loss on training data should be smaller than the loss on test data. The algorithm is provided a set of 2n2n elements, nn of which were used for training, and nn not used for training, and predicts that any sample with loss lower than the training loss. The accuracy of these predictions is bounded by exp(ε)1+exp(ε)\tfrac{\exp(\varepsilon)}{1+\exp(\varepsilon)} for any ε\varepsilon-differentially private algorithm, so we can produce a lower bound εLB\varepsilon_{\mathit{LB}} from it. The algorithm is provided in Algorithm 6.