Auditing Differentially Private Machine Learning: How Private is Private SGD?

Matthew Jagielski, Jonathan Ullman, Alina Oprea

Introduction

Differential privacy [DMNS06] has become the de facto standard for guaranteeing privacy in machine learning and statistical analysis, and is now being deployed by many organizations including Apple [TVV+17], Google [EPK14, BEM+17, PSM+18], and the US Census Bureau [HMA+17]. Now that differential privacy has moved from theory to practice, there has been considerable attention on optimizing and evaluating differentially private machine learning algorithms, notably differentially private stochastic gradient descent (henceforth, DP-SGD) [SCS13, BST14, ACG+16], which is now widely available in TensorFlow Privacy [Goo]. DP-SGD is the building block for training many widely used private classification models, including feed-forward and convolutional neural networks.

Differential privacy gives a strong worst-case guarantee of individual privacy: a differentially private algorithm ensures that, for any set of training examples, no attacker, no matter how powerful, can learn much more information about a single training example than they could have learned had that example been excluded from the training data. The amount of information is quantified by a privacy parameter $\varepsilon$ .There are several common variants of differential privacy [DKM+06, DR16, BS16, Mir17, BDRS18, DRS19] that quantify the influence of a single example in slightly different ways, sometimes using more than one parameter. For this high-level discussion, we focus on the single, primary parameter $\varepsilon$ . Intuitively, a smaller $\varepsilon$ means stronger privacy protections, but leads to lower accuracy. As such there is often pressure to set this parameter as large as one feels still gives a reasonable privacy guarantee, and relatively large parameters such as $\varepsilon=2$ are not uncommon. However, this guarantee is not entirely satisfying, as such an algorithm might allow an attacker to guess a random bit of information about each training example with approximately 86% accuracy. As such there is often a gap between the strong formal protections promised by differential privacy and the specific quantitative implications of the choice of $\varepsilon$ in practice.

This state-of-affairs is often justified by the fact that our analysis of the algorithm is often pessimistic. First of all, $\varepsilon$ is a parameter that has to be determined by careful analysis, and often existing theoretical analysis is not tight. Indeed a big part of making differentially private machine learning practical has been the significant body of work giving progressively more refined privacy analyses specifically for DP-SGD [ACG+16, DRS19, MTZ19, YLP+19], and for all we know these bounds on $\varepsilon$ will continue to shrink. Indeed, it is provably intractable to determine the tightest bound on $\varepsilon$ for a given algorithm [GM18]. Second, differential privacy is a worst-case notion, as the mechanism might have stronger privacy guarantees on realistic datasets and realistic attackers. Although it is plausible that differentially private algorithms with large values of $\varepsilon$ provide strong privacy in practice, it is far from certain, which makes it difficult to understand the appropriate value of $\varepsilon$ for practical deployments.

Auditing DP-SGD. In this paper we investigate the extent to which DP-SGD,Although our methods are general, in this work we exclusively study the implementation and privacy analysis of DP-SGD in TensorFlow Privacy [Goo]. does or does not give better privacy in practice than what its current theoretical analysis suggests. We do so using novel data poisoning attacks. Specifically, our method starts with a dataset $D$ of interest (e.g. Fashion-MNIST) and some algorithm $\mathcal{A}$ (e.g. DP-SGD with a specific setting of hyperparameters), and produces a small poisoning set $S$ of $k$ points and a binary classifier $T$ such that $T$ distinguishes the distribution $\mathcal{A}(D)$ from $\mathcal{A}(D\cup S)$ with significant advantage over random guessing. If $\mathcal{A}$ were $\varepsilon$ -DP, then $T$ could have accuracy at most $\exp(\varepsilon k)/(1+\exp(\varepsilon k))$ , so if we can estimate the accuracy of $T$ we can infer a lower bound on $\varepsilon$ . While previous work [MZH19] proposed to use this connection between differential privacy and data poisoning as a defense against data poisoning, our use in this context of auditing the privacy of DP-SGD is new.

Specifically, for certain natural choices of hyperparameters in DP-SGD, and standard benchmark datasets (see Figure 2), our attacks give lower bounds on $\varepsilon$ that are approximately 10x better than what we could obtain from previous methods, and are within approximately 10x of the worst-case, analytically derived upper bound. For context, previous theoretical improvements to the analysis have improved the worst-case upper bounds by factors of more than 1000x over the naïve analysis, and thus our results show that we cannot hope for similarly dramatic gains in the future.

Novel Data Poisoning Attacks. We find that existing data poisoning attacks, as well as membership inference attacks proposed by prior work, have poor or trivial performance not only against DP-SGD, but even against SGD with gradient clipping (i.e. rescaling gradients to have norm no larger than some $C$ ). Gradient clipping is an important part of DP-SGD, but does not provide any formal privacy guarantees on its own. Thus, we develop a novel data poisoning attack that is more robust to gradient clipping, and also performs much better against DP-SGD.

Intuitively, data poisoning attacks introduce new points whose gradients will change the model in a certain direction, and the attack impact increases when adding poisoning points of larger gradients. Existing attacks modify the model in a random direction, and have to push far enough that the original distribution on model parameters and the new distribution become distinguishable. To be effective, these attacks use points which induce large gradients, making the attack sensitive to gradient clipping. On the other hand, our attack improves by finding the direction where the model parameters have the lowest variance, and select poisoning points that modify the model in that direction. Therefore, we achieve the same effect of model poisoning with poisoning points of smaller gradients, thereby making the attack more robust to clipping.

The Role of Auditing in DP. More generally, our work takes a quantitative, empirical approach to auditing the privacy afforded by specific implementations of differentially private algorithms. We do not advocate trying to definitively measure privacy of an algorithm empirically, since it’s hopeless to try to anticipate all future attacks. Rather, we believe this empirical approach has the potential to complement and influence analytical work on differential privacy, somewhat analogous to the way cryptanalysis informs the design and deployment of cryptography.

Specifically, we believe this approach can complement the theory in several ways:

Most directly, by advancing the state-of-art in privacy attacks, we can either demonstrate that a given algorithm with a given choice of parameters is not sufficiently private, or give some confidence that it might be sufficiently private.

Establishing strong lower bounds on $\varepsilon$ gives a sense of how much more one could hope to get out of tightening the existing privacy analysis.

Observing how the performance of the attack depends on different datasets, hyperparameters, and variants of the algorithm can identify promising new phenomena to explore theoretically.

Producing concrete privacy violations can help non-experts interpret the concrete implications of specific choices of the privacy parameter.

2 Related Work

DP-SGD. Differentially private SGD was introduced in [SCS13], and an asymptotically optimal analysis of its privacy properties was given in [BST14]. Notably Abadi et al. [ACG+16] gave greatly improved concrete bounds on its privacy parameter, and showed its practicality for training neural networks, making DP-SGD one of the most promising methods for practical private machine learning. There have been several subsequent efforts to refine the privacy analysis of this specific algorithm [MTZ19, DRS19, YLP+19]. A recent work [HT19] gave a heuristic argument that SGD itself (without adding noise to the gradients) satisfies differential privacy, but even then the bounds on $\varepsilon$ are quite large (e.g. $\varepsilon=13.01$ ) for most datasets.

Privacy Attacks. Although privacy attacks have a very long history, the history of privacy attacks against aggregate statistical information, such as machine learning models, goes back to the seminal work of Dinur and Nissim [DN03] on reconstruction attacks. A similar, but easier to implement type of attack, membership inference attacks, was first performed by Homer et al. [HSR+08], and theoretically analyzed in [SOJH09, DSS+15]. Shokri et al. [SSSS17] and Yeom et al. [YGFJ18] gave black-box membership inference algorithms for complex machine learning models. Membership inference attacks are compelling because they require relatively weak assumptions, but, as we show, state-of-the-art membership inference attacks lead to quantitatively weak privacy violations.

More directly related to our work, privacy attacks were recently used by Jayaraman and Evans [JE19] to understand the concrete privacy leakage from differentially private machine learning algorithms, specifically DP-SGD. However, the goal of their work is to compare the privacy guarantees offered by different variants of differential privacy, rather than to determine the level of privacy afforded by a given algorithm. As such, their attacks are quantitatively much less powerful than ours (as we show in Figure 2), and are much further from determining the precise privacy guarantees of DP-SGD.

Differential Privacy and Data Poisoning. Ma et al. [MZH19] and Hong et al. [HCK+20] evaluate the effectiveness of data poisoning attacks on differentially private machine learning algorithms. Ma et al. consider both the output perturbation and objective perturbation algorithms for learning ridge regression and logistic regression models, proposing attacks on differentially private algorithms, and also argue that differentially privacy can serve as a defense for poisoning attacks. Hong et al. propose differential privacy as a defense for poisoning attacks, showing that DP-SGD performs effectively at defending against existing poisoning attacks in the literature. While differential privacy provides a provable defense for poisoning attacks, our intuition is that the strong poisoning attacks we design allow us to measure a lower bound on the privacy offered by differentially private algorithms.

Automated Discovery of Privacy Parameters. Two works have focused on automatically discovering (upper or lower bounds on) privacy parameters. [GM18] showed that determining the exact privacy level using black-box access to the algorithm is prohibitively expensive. In the white-box setting, Ding et al. [DWW+18] used a clever combination of program analysis and random sampling to identify violations of $\varepsilon$ -DP, although their methods are currently limited to simple algorithms. Moreover the violations of DP they identify may not correspond to realistic attacks.

(Measuring) Differential Privacy

We begin by outlining differential privacy and one of its relevant properties: group privacy. We consider machine learning classification tasks, where a dataset consists of many samples from some domain $\mathcal{D}=\mathcal{X}\times\mathcal{Y}$ , where $\mathcal{X}$ is the feature domain, and $\mathcal{Y}$ the label domain. We say two datasets $D_{0},D_{1}$ differ on $k$ rows if we can replace at most $k$ elements from $D_{0}$ to produce $D_{1}$ .

An algorithm $\mathcal{A}:\mathcal{D}\mapsto\mathcal{R}$ is $(\varepsilon,\delta)$ -differentially private if for any two datasets $D_{0},D_{1}$ which differ on at most one row, and every set of outputs $\mathcal{O}\subseteq\mathcal{R}$ :

where the probabilities are taken only over the randomness of $\mathcal{A}$ .

Let $D_{0},D_{1}$ be two datasets differing on at most $k$ rows, $\mathcal{A}$ is an $(\varepsilon,\delta)$ -differentially private algorithm, and $\mathcal{O}$ an arbitrary output set. Then

Group privacy will give guarantees for poisoning attacks that introduce multiple points.

DP-SGD. The most prominent differentially private mechanism for training machine learning models is differentially private stochastic gradient descent (DP-SGD) [SCS13, BST14, ACG+16]. DP-SGD makes two modifications to the standard SGD procedure:

clipping gradients to a fixed maximum norm $C$ , and adding noise to gradients with standard deviation $\sigma C$ , for a given $\sigma$ , as shown in Algorithm 1. Given the hyperparameters – clipping norm, noise magnitude, iteration count, and batch size – one can analyze DP-SGD to conclude that it satisfies $(\varepsilon,\delta)$ -differential privacy for some parameters $\varepsilon,\delta\geq 0$ .

2 Statistically Measuring Differential Privacy

In this section we describe our main statistical procedure for obtaining lower bounds on the privacy parameter for a given algorithm $\mathcal{A}$ , which functions differently from the membership inference attack used in prior work ([SSSS17, JE19] and described in Appendix D). Here, we describe the procedure generally, in the case where $\delta=0$ ; in Appendix A, we show how to adapt the procedure for $\delta>0$ , and in Section 3, we discuss how we instantiate it in our work. Given a learning algorithm $\mathcal{A}$ , we construct two datasets $D_{0}$ and $D_{1}$ differing on $k$ rows, and some output set $\mathcal{O}$ . We defer the discussion of constructing $D_{0}$ , $D_{1}$ , and $\mathcal{O}$ to Section 3. We also wish to bound the probability that we incorrectly measure $\varepsilon_{\mathit{LB}}$ by a small value $\alpha$ . From Equation (2), observe that by estimating the quantities $p_{0}=\Pr[\mathcal{A}(D_{0})\in\mathcal{O}]$ and $p_{1}=\Pr[\mathcal{A}(D_{1})\in\mathcal{O}]$ , we can compute the largest $\varepsilon_{\mathit{LB}}$ such that Equation (2) holds. With $\delta=0$ , this simplifies to $\varepsilon_{\mathit{LB}}=\ln(p_{0}/p_{1})$ . This serves as an estimate of the leakage of the private algorithm, but requires estimating $p_{0}$ and $p_{1}$ accurately.

For an arbitrary algorithm, it’s infeasible to compute $p_{0},p_{1}$ precisely, so we rely on Monte Carlo estimation, by training some fixed $T$ number of times. However, this approach incurs statistical uncertainty, which we correct for by using Clopper Pearson confidence intervals [CP34]. That is, to ensure that our estimate $\varepsilon_{\mathit{LB}}$ holds with probability $>1-\alpha$ , we find a Clopper Pearson lower bound for $p_{1}$ that holds with probability $1-\alpha/2$ , and an upper bound for $p_{0}$ holding with probability $1-\alpha/2$ . Qualitatively, we can be confident that our lower bound on privacy leakage $\varepsilon^{\prime}$ holds with probability $1-\alpha$ . This procedure is outlined in Algorithm 2, and we prove its correctness in Theorem 2.

When provided with black box access to an algorithm $\mathcal{A}$ , two datasets $D_{0}$ and $D_{1}$ differing on at most $k$ rows, an output set $\mathcal{O}$ , a trial number $T$ and statistical confidence $\alpha$ , if Algorithm 2 returns $\varepsilon_{\mathit{LB}}$ , then, with probability $1-\alpha$ , $\mathcal{A}$ does not satisfy $\varepsilon^{\prime}$ -DP for any $\varepsilon^{\prime}<\varepsilon_{\mathit{LB}}$ .

We stress that when we say $\varepsilon_{\mathit{LB}}$ is a lower bound with probability $1-\alpha$ , this is only over the randomness of the Monte Carlo sampling, and is not based on any modeling or assumptions. We can always move our confidence closer to $1$ by taking $T$ larger.

First, the guarantee of the Clopper-Pearson confidence intervals is that, with probability at least $1-\alpha$ , $\hat{p}_{0}\leq p_{0}$ and $\hat{p}_{1}\geq p_{1}$ , which implies $p_{0}/p_{1}\geq\hat{p}_{0}/\hat{p}_{1}$ . Second, if $\mathcal{A}$ is $\varepsilon$ -DP, then by group privacy we would have $p_{0}/p_{1}\leq\exp(k\varepsilon)$ , meaning $\mathcal{A}$ is not $\varepsilon^{\prime}$ -DP for any $\varepsilon^{\prime}<\frac{1}{k}\ln(p_{0}/p_{1})$ . Combining the two statements, $\mathcal{A}$ is not $\varepsilon^{\prime}$ for any $\varepsilon^{\prime}<\frac{1}{k}\ln(\hat{p}_{0}/\hat{p}_{1})=\varepsilon_{\mathit{LB}}$ . ∎

The $\varepsilon_{\mathit{LB}}$ reported by Algorithm 2 has two fundamental upper bounds, the provable $\varepsilon_{\mathit{th}}$ , and an upper bound, $\varepsilon_{OPT}(T,\alpha)$ , imposed by Monte Carlo estimation. The first upper bound is natural: if we run the algorithm on some $\mathcal{A}$ for which the $\varepsilon$ we can prove is $\varepsilon_{\mathit{th}}=1$ , then $\varepsilon_{\mathit{LB}}\leq\varepsilon_{\mathit{th}}=1$ . To understand $\varepsilon_{OPT}(T,\alpha)$ , suppose we run 500 trials, and desire $\alpha=0.01$ . The best possible performance is if we get perfect inference accuracy and $k=1$ , where $\text{ct}_{0}=500$ and $\text{ct}_{1}=0$ . The Clopper Pearson confidence interval produces $\hat{p}_{0}=0.989,\hat{p}_{1}=0.011$ , which gives $\varepsilon_{\mathit{LB}}=4.54/k=4.54$ . Then, with $99\%$ probability, the true $\varepsilon$ is at least $4.54$ , and $\varepsilon_{OPT}(T,\alpha)=4.54$ .

We remark that the above procedure only demonstrates that $\mathcal{A}$ cannot be strictly better than $(\varepsilon_{\mathit{LB}},0)$ -DP, but allows for it to be $(\varepsilon_{\mathit{LB}}/2,\delta)$ -DP for very small $\delta$ . However, in our work, $\hat{p}_{0},\hat{p}_{1}$ turn out never to be too close to , so these differences have little effect on our findings. In Appendix A, we formally discuss how to modify this algorithm for $(\varepsilon,\delta)$ -DP for $\delta>0$ . We also show when we can increase $\varepsilon_{\mathit{LB}}$ by considering the maximum upper bounds of the original output set $\mathcal{O}$ and its complement $\mathcal{O}^{C}$ .

Poisoning Attacks

We now show how to use poisoning attacks to run Algorithm 2. Intuitively, we begin with a dataset $D_{0}$ and replace $k$ rows with poisoning points to form $D_{1}$ ; we then use the impact of poisoning as an output set $\mathcal{O}$ . We start with existing backdoor attacks [GDGG17], and then propose a more effective clipping-aware poisoning attack.

In a poisoning attack, an adversary replaces $k$ data points from a training dataset $D$ of $n$ points. The poisoned training dataset is provided as input to the training algorithm, which releases a model $f$ that minimizes a loss function $\mathcal{L}(f,D)$ on its given dataset $D$ .

We focus on a specific type of poisoning attack, called a backdoor attack [GDGG17]. In a backdoored model, the performance on natural data is maintained, but, by adding a small perturbation to a data point $x$ into $\mathit{Pert}(x)$ , the adversary changes the predicted class of the perturbed data. These attacks have been developed for image datasets. In the original attack [GDGG17], described in Algorithm 3, the perturbation function $\mathit{Pert}(\cdot)$ adds a pattern in the corner of an image. The poisoning attack takes natural data $(x,y)$ , perturbs the image to $\mathit{Pert}(x)$ , and changes the class to some $y_{p}$ . The objective is to decrease the loss on $(\mathit{Pert}(x),y_{p})$ values from the perturbed test set.

2 Clipping-Aware Poisoning

DP-SGD makes two modifications to the learning process to preserve privacy: clipping gradients and adding noise. Clipping provides no formal privacy on its own, but many poisoning attacks perform significantly worse in the presence of clipping. Indeed, the basic backdoor attack from Section 3.1 results in a fairly weak lower bound of at most $\varepsilon_{\mathit{LB}}=0.11$ using the Fashion-MNIST dataset, even with no added noise (which has an $\varepsilon_{\mathit{th}}=\infty$ ). To improve this number, we must make the poisoning attack sufficiently robust to clipping.

To understand existing backdoor attacks’ difficulty with clipping, consider clipping’s impact on logistic regression. The gradient of model parameters $w$ with respect to a poisoning point $(x_{p},y_{p})$ is

We can minimize this variance with respect to the poisoning point $x_{p}$ by using the singular value decomposition: selecting $x_{p}$ as the singular vector corresponding to the smallest singular value (i.e. the direction of least variance), and scale $x_{p}$ to a similar norm to the rest of the dataset. We select $y_{p}$ to be the smallest probability class on $x_{p}$ . We then insert $k$ copies of the poisoning point $(x_{p},y_{p})$ . We call this approach ClipBKD, detailed in Algorithm 4. We prove in Appendix B that when we run ClipBKD (modified for regression tasks) to estimate the privacy of the output perturbation algorithm, we obtain $\varepsilon_{\mathit{LB}}$ within a small factor of the upper bound $\varepsilon_{\mathit{th}}$ , giving evidence that this attack is well suited to our application in differential privacy. In Appendix C, we describe how to adapt ClipBKD to transfer learning from a pre-trained model.

For both standard and clipping-aware backdoors, we generate $D_{0},D_{1}$ with a given poisoning size $k$ , using functions Backdoor or ClipBkd, respectively. Then the test statistic is whether the backdoored points are distinguishable by a threshold on their loss (i.e., output set $\mathcal{O}$ is whether BkdTest or ClipBkdTest return “Backdoored”). We first run an initial phase of $T$ trials to find a good threshold $Z$ for the test functions. We then run another $T$ trials in Algorithm 2 to estimate $\hat{p_{0}}$ and $\hat{p_{1}}$ based on either the BkdTest or the ClipBkdTest test statistic.

Experiments and Discussion

We evaluate both membership inference (MI, as used by [YGFJ18] and [JE19] and described in Appendix D) and our algorithms on three datasets: Fashion-MNIST (FMNIST), CIFAR10, and Purchase-100 (P100). For each dataset, we consider both a logistic regression (LR) model and a two-layer feedforward neural network (FNN), trained with DP-SGD using various hyperparameters:

FMNIST [XRV17] is a dataset of 70000 28x28 pixel images of clothing from one of 10 classes, split into a train set of 60000 images and a test set of 10000 images. It is a standard benchmark dataset for differentially private machine learning. To improve training speed, we consider a simplified version, using only classes 0 and 1 (T-shirt and trouser), and downsample so each class contains 3000 training and 1000 testing points. CIFAR10 [Kri09] is a harder dataset than FMNIST, consisting of 60000 32x32x3 images of vehicles and animals, split into a train set of 50000 and a test set of 10000. For training speed, we again take only class 0 and 1 (airplane and automobile), making our train set contain 10000 samples, and the test set 2000 samples. When training on CIFAR10, we follow standard practice for differential privacy and fine-tune the last layer of a model pretrained nonprivately on the more complex CIFAR100, a similarly sized but more complex benchmark dataset [PCS+20]. P100 [SSSS17] is a modification of a Kaggle dataset [Pur], with 200000 records of 100 features, and 100 classes. The features are purchases, and the classes are user clusters. Following [JE19], we subsample the dataset so it has 10000 train records and 10000 test records.

Our techniques are general, and could be applied to any dataset-model pair to identify privacy risks for DP-SGD. Examining these six dataset-model pairs demonstrates that our technique can be used to identify new privacy risks in DP-SGD, and a comprehensive empirical study is not our focus.

Model Size. The two-layer feedforward neural networks all have a width of 32 neurons. For CIFAR10, the logistic regression model and feedforward neural network are added on top of the pretrained convolutional neural network.

Computing Thresholds. In order to run Algorithm 2, we need to specify $D_{0},D_{1}$ and $\mathcal{O}$ . We’ve described how to use poisoning to compute $D_{0},D_{1}$ , and how the test statistics for these attacks are constructed, assuming a known threshold. To produce this threshold, we train 500 models on the unpoisoned dataset and 500 models on the poisoned dataset, and identify which of the resulting 1000 thresholds produces the best $\varepsilon_{\mathit{LB}}$ , using Algorithm 2.

2 Results and Discussion

Figure 2 presents a direct comparison of the privacy bounds produced by ClipBKD (our attack), the standard backdoor attack, and MI. As standard backdoor attacks only exist for images, we only report results on them on FMNIST and CIFAR10. The pattern we choose for backdoor attacks is a 5x5 white square in the top-left corner of an image. For ClipBKD, we use $T=500$ trials and confidence level $\alpha=0.01$ (i.e., our Monte Carlo estimates hold with 99% confidence) and report the best result from $k=1,2,4,8$ poisoning points. Results for MI use 1000 samples, and average over 10 trained models. For context, we display the best theoretical upper bound on $\varepsilon_{\mathit{th}}$ and also $\varepsilon_{\mathit{OPT}}(500,0.01)$ , which is the best value of $\varepsilon_{\mathit{LB}}$ that we could hope to produce using $T$ trials and confidence level $\alpha$ .

For every dataset and model, we find that ClipBKD significantly outperforms MI, by a factor of 2.5x–1500x. As a representative example, for $\varepsilon_{\mathit{th}}=4$ on Purchase-100 with 2-layer neural networks, ClipBKD gives an $\varepsilon_{\mathit{LB}}$ of 0.46, while MI gives $\varepsilon_{\mathit{LB}}$ of 0.04, an improvement of 12.1x. We also find ClipBKD always improves over standard backdoors: on FMNIST by an average factor of 3.84x, and standard backdoors never reach positive $\varepsilon_{\mathit{LB}}$ on CIFAR, due to the large number of points required to poison the pretrained model. We also find that ClipBKD returns $\varepsilon_{\mathit{LB}}$ that are close to $\varepsilon_{\mathit{th}}$ ; for finite $\varepsilon_{\mathit{th}}$ , the majority of gaps are a factor of $<12.3$ x, reaching as low as 6.6x. For example, on Purchase-100, when $\varepsilon_{\mathit{th}}=4$ , we find that ClipBKD returns an $\varepsilon_{\mathit{LB}}$ of 0.46, a gap of 8.7x.

Sensitivity to Hyperparameters. We also give a more thorough evaluation of ClipBKD’s performance as a function of DP-SGD’s hyperparameters. We vary clipping norm between 0.5, 1, and 2 and vary the noise to ensure $\varepsilon_{\mathit{th}}$ between 1, 2, 4, 8, 16, and $\infty$ . We also vary the initialization randomness between random normal initializations with variance (fixed initialization), $0.5\sigma$ , $\sigma$ , and $2\sigma$ , where $\sigma$ is the variance of Glorot normal initialization. Table 2 reports the best $\varepsilon_{\mathit{LB}}$ produced by the attack over $k=1,2,4,8$ . Our best measured values of $\varepsilon_{\mathit{LB}}$ occur when initialization is fixed, and are within a 4.2-7.7x factor of $\varepsilon_{\mathit{th}}$ , speaking to the effectiveness of ClipBKD. When $\varepsilon_{\mathit{th}}=\infty$ and the initialization is fixed, we achieve perfect inference accuracy, matching $\varepsilon_{OPT}(500,0.01)=4.54$ .

These experiments reveal three intuitive trends. First, as $\varepsilon_{\mathit{th}}$ increases (equivalently, the noise level decreases), $\varepsilon_{\mathit{LB}}$ also increases. Second, as the initialization randomness decreases, $\varepsilon_{\mathit{LB}}$ increases. All existing analyses of DP-SGD give privacy upper bounds for any fixed initialization, and our results suggest that initial randomization might play a significant role. Finally, as clipping norm decreases, $\varepsilon_{\mathit{LB}}$ decreases, except when the initialization is fixed. In fact, our results show that $\varepsilon_{\mathit{LB}}$ is more sensitive to the clipping norm than the amount of noise. All existing analyses of DP-SGD consider only the noise multiplier $\sigma_{\mathit{GD}}$ but not the clipping norm, but the role of the clipping norm itself seems highly significant.

We emphasize that for every choice of hyperparameters, the training accuracy is 96–98%, so the algorithm has comparable utility, but potentially very different privacy and robustness to poisoning, as we vary these parameters. We believe these phenomena deserve further study.

Conclusion and Future Directions

We use novel poisoning attacks to establish strong limits on the privacy of specific differentially private algorithms, namely DP-SGD. We establish that the worst-case privacy bounds for this algorithm are approaching their limits. Our findings highlight several questions for future exploration:

How much can our attacks be pushed quantitatively? Can the gap between our lower bounds and the worst-case upper bounds be closed?

Can we incorporate additional features into the privacy analysis of DP-SGD, such as the specific gradient-clipping norm, and the amount of initial randomness?

How realistic are the instances produced by our attacks, and can we extend the attacks to give easily interpretable examples of privacy risks for non-experts?

Although there is no hope of determining the precise privacy level of a given algorithm in a fully empirical way, we believe our work demonstrates how a quantitative, empirical approach to privacy attacks can effectively complement analytical work on privacy in machine learning.

Acknowledgments

JU is supported by NSF grants CCF-1750640, CNS-1816028, and CNS-1916020, and a Google Faculty Research Award. This research was also sponsored by a Toyota ITC research award and the U.S. Army Combat Capabilities Development Command Army Research Laboratory under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

References

Appendix A Extending Algorithm 2.

Notice that Algorithm 2 holds for $(\varepsilon,0)$ -differential privacy. However, this is only for simplicity—the group privacy guarantee of $(\varepsilon,\delta)$ -differential privacy can be converted to a similar procedure. Write $x=\exp(\varepsilon)$ in the group privacy guarantee for Equation 2, and rearrange to provide $p_{1}x^{k+1}-(p_{1}-\delta)x^{k}-p_{0}x+(p_{0}-\delta)\geq 0$ . We can solve for $x$ here using a root-finding algorithm to find $x$ , and computing $\varepsilon_{\mathit{LB}}=\ln(x)$ . Theorem 2 can be easily extended to this case.

Notice that differential privacy makes a guarantee for all output sets, including the complement $\mathcal{O}^{C}$ ; if $\Pr[A(D)\in\mathcal{O}]=p$ , then $\Pr[A(D)\in\mathcal{O}^{C}]=1-p$ . If, upon computing $p_{0},p_{1}$ , we can compute a larger $\varepsilon_{\mathit{LB}}$ by using $\mathcal{O}^{C}$ , this requires no extra trials.

For example, suppose $\delta=0$ , $k=1$ , $p_{1}=0.8$ , and $p_{0}=0.4$ . Here, $\varepsilon_{LB}=\ln(p_{1}/p_{0})/1=\ln(2)$ . If, instead, we replace $\mathcal{O}$ by $\mathcal{O}^{C}$ , $\varepsilon_{LB}=\ln((1-p_{0})/(1-p_{1}))/1=\ln(0.6/0.2)=\ln(3)$ . In Lemma 3, we show when this technique improves $\varepsilon_{LB}$ : when $p_{1}>p_{0}+k\delta$ and $p_{0}+p_{1}>1$ . We use this modification in all of our experiments.

If $p_{1}>p_{0}+k\delta$ and $p_{0}+p_{1}>1$ , then the largest root of $f_{0}(x)=p_{1}x^{k+1}-(p_{1}-\delta)x^{k}-p_{0}x+(p_{0}-\delta)$ is smaller than the largest root of $f_{1}(x)=(1-p_{0})x^{k+1}-(1-p_{0}-\delta)x^{k}-(1-p_{1})x+(1-p_{1}-\delta)$ .

Write $x_{0}$ the largest root of $f_{0}(x)$ , and $x_{1}$ the largest root of $f_{1}(x)$ . We show this in two parts: first, we show that, for all $p_{0},p_{1}$ , $f_{0}(x)$ has a root $x>1$ when $p_{1}>p_{0}+k\delta$ , after which it is monotonically increasing. This shape is evident in Figure 3. This provides a nonzero $\varepsilon_{LB}$ . Then we show that this $\varepsilon_{LB}$ must be smaller for $f_{1}(x)$ when $p_{0}+p_{1}>1$ , because $f_{0}(x)-f_{1}(x)>0$ when $x>1$ . This ensures that $f_{1}(x_{0})<0$ , and so the root $x_{1}>x_{0}$ .

We begin by showing that $f_{0}(x)$ has a single root $x>1$ . First, notice that $f_{0}(1)=0$ . We then analyze the derivative, showing that it starts negative, has a root, and then is always positive. This indicates that there can only be one root.

This has a root $x>1$ if $x^{k}p_{0}+k\delta x^{k-1}-p_{1}<0$ , so we require $p_{0}+k\delta-p_{1}<0$ . Notice too that $f_{0}^{\prime}(x)$ is monotonically increasing at $x>1$ . This ensures that it has only one root $x>1$ . This argument holds, too, for $f_{1}(x)$ , as if $p_{0}+k\delta-p_{1}<0$ , then $(1-p_{1})+k\delta-(1-p_{0})<0$ .

Now that we know both $f_{0}$ and $f_{1}$ only have a single root, and they are both increasing at that root, we just need to show that $f_{1}(x_{0})<0$ , as this will ensure $x_{1}>x_{0}$ . We do this by showing that $\forall x>1$ , $f_{0}(x)-f_{1}(x)>0$ . First, write

The $\delta$ terms cancel, and we can factor the above into

This is always positive when $x>1$ and $p_{0}+p_{1}>1$ . ∎

Appendix B Analysis of Backdoor Poisoning-based Auditing

We now provide formal evidence for the effectiveness of backdoor poisoning attacks in auditing differentially private algorithms with a case study on linear regression. To the best of our knowledge, this is also the first formal analysis of backdoor poisoning attacks for a concrete learning algorithm.

where $\sigma_{d}$ is the smallest singular value of $X$ .

The output perturbation algorithm for Ridge regression, with $(\varepsilon,\delta)$ -DP, adds Gaussian noise with variance $\sigma^{2}=2\ln(1.25/\delta)(2/\lambda)^{2}/\varepsilon^{2}$ to the optimal parameters $w$ .

Letting $c=\frac{0.5}{\lambda+\sigma_{d}+1}$ , the probability of success for this distinguisher is $\Pr[\mathcal{N}(0,\sigma^{2})<c],$ which gives an $\varepsilon$ lower bound of

We can lower bound $\Pr[0<\mathcal{N}(0,\sigma^{2})<c]$ using the following integral approximation:

By its Maclaurin series, $\ln\mathopen{}\mathclose{{}\left(\frac{0.5+x}{0.5-x}}\right)\geq 4x$ . Then we can compute our lower bound on $\varepsilon$ to be

so the attack differs by a constant factor from the provable $\varepsilon$ .

Appendix C ClipBKD with Pretrained Models

State-of-the-art differentially private CIFAR10 models use transfer learning from a fixed pretrained CIFAR100 convolutional neural network. We call the pretrained model function $f_{0}$ , which is never updated during training. Training produces a $f_{1}$ , such that the entire prediction model is $f(x)=f_{1}(f_{0}(x))$ .

This makes ClipBKD not directly applicable, as ClipBKD requires access to the input of the trained model $f_{1}$ . Then we must try to produce some $x$ such that $f_{0}(x)=h_{p}$ , where $h_{p}$ is produced by SVD in Algorithm 3. This is not in general possible, so we instead use gradient descent to optimize the combination of two loss functions on $x$ .

Our first loss function incentivizes decreasing $h_{p}\cdot v$ for high-variance directions $v\in V_{high}$ from SVD. This ensures the gradient will not move in SGD’s noisy directions. Our second loss function incentivizes increasing $h_{p}\cdot v$ for low-variance directions $v\in V_{low}$ from SVD, ensuring the gradient is distinguishable in low variance directions. Putting these together, we produce $x_{p}$ by optimizing the following loss function:

We perform this optimization by projected gradient descent, running 10000 iterations with a learning rate of 1.

Appendix D Membership Inference

In membership inference [YGFJ18], an adversary is given a model $f$ and its training loss $c$ and seeks to understand whether a given data point $(x,y)$ was used to train the model. Although alternative formulations have been proposed [SSSS17], we focus on the one proposed by [YGFJ18]. Intuitively, the attack relies on a generalization gap: the loss on training data should be smaller than the loss on test data. The algorithm is provided a set of $2n$ elements, $n$ of which were used for training, and $n$ not used for training, and predicts that any sample with loss lower than the training loss. The accuracy of these predictions is bounded by $\tfrac{\exp(\varepsilon)}{1+\exp(\varepsilon)}$ for any $\varepsilon$ -differentially private algorithm, so we can produce a lower bound $\varepsilon_{\mathit{LB}}$ from it. The algorithm is provided in Algorithm 6.