A General Framework for Auditing Differentially Private Machine Learning

Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro, Brian Testa

Introduction

Machine learning (ML) is increasingly deployed in contexts where the privacy of the data used to build the model is of concern. ML models can ingest enormous quantities of data in order to determine meaningful predictive patterns. When such models are built on potentially sensitive inputs such as healthcare or financial data, it becomes important to determine to what extent information from the training dataset can be inferred from the model. Such information can be of personal concern when a data contributor's identity may be compromised through their involvement in the data collection process, and it can also raise regulatory questions about whether a model trained on private data should itself be considered private.

Differentially private (DP) machine learning is a theoretically rigorous approach that provides a worst-case privacy guarantee for ML algorithms. Specifically, it provides a mathematical guarantee that the distribution of possible output models is not significantly altered whether or not an individual data point is included in the training set. This is accomplished by randomizing the output model so that the distributions of models trained with or without any individual data point are statistically indistinguishable, up to a level specified by parameters $\varepsilon,\delta$ that must be set appropriately. While DP mathematically certifies the privacy of a mechanism, the noise added is generally tailored to the worst-case scenario.

Previous attempts to assess the privacy leakage of such models, using attacks ranging from membership inference to model inversion, have found that the actual detectable risk of a model procedure is lower than the theoretical bound. While a recent study on DP neural networks indicates that the privacy guarantee of the DP-SGD algorithm is tight under the strongest adversary, the privacy risk under realistic datasets and attackers without complete access to the training process still lies below the worst-case bound. For other models, it remains open whether the privacy violation of a mechanism is genuinely lower than specified or whether the attacks used so far are simply not strong enough. Given that there are no guidelines on how to set $\varepsilon,\delta$ in practice, it would be useful for practitioners to estimate the actual risk they incur and set the parameters accordingly. This is especially important because DP for ML is still a nascent field, with limited implementations that can be hard to verify.

Recent works have shown that simple DP mechanisms can be audited effectively. These mechanisms generally map an input vector to a perturbed output vector, and are often used as building blocks of more complicated DP algorithms such as machine learning models. Examples include the Laplace and exponential mechanisms, which are used in DP Naive Bayes and random forest, respectively. The auditing procedure involves searching for optimal neighboring inputs $a,a^{\prime}$ and sampling the DP outputs $\mathcal{M}(a),\mathcal{M}(a^{\prime})$ to get a Monte Carlo estimate of $\varepsilon$. To extend these techniques to machine learning models, the vector inputs $a,a^{\prime}$ are replaced with datasets $D,D^{\prime}$ that differ by a single entry. Working in this setting raises two important challenges. First, search methods for neighboring inputs that were effective previously, such as enumeration or symbolic search, are intractable over large datasets, making it difficult to find optimal dataset pairs. Second, Monte Carlo estimation requires retraining the model thousands of times to bound $\varepsilon$ with high confidence, which is costly.
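To make the Monte Carlo step concrete, the following sketch audits a scalar Laplace mechanism on neighboring inputs $a=1$, $a^{\prime}=0$ with sensitivity 1, using the output set $S=\{z>t\}$. The mechanism, the inputs, the threshold $t=1$, and the function names are illustrative assumptions rather than the setup of the cited auditing works. For any $t\geq 1$ the true probability ratio is exactly $e^{\varepsilon}$, so the point estimate should land near $\varepsilon$. A real audit would report a confidence lower bound rather than this point estimate.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def monte_carlo_epsilon(a, a_prime, threshold, epsilon, n_samples, seed=0):
    """Point estimate of ln(Pr[M(a) in S] / Pr[M(a') in S]) for S = {z > threshold}."""
    rng = random.Random(seed)
    hits_a = sum(
        laplace_mechanism(a, 1.0, epsilon, rng) > threshold for _ in range(n_samples)
    )
    hits_a_prime = sum(
        laplace_mechanism(a_prime, 1.0, epsilon, rng) > threshold for _ in range(n_samples)
    )
    return math.log(hits_a / hits_a_prime)


if __name__ == "__main__":
    # Hypothetical neighboring scalar inputs a = 1, a' = 0 (sensitivity 1), epsilon = 1.
    print(monte_carlo_epsilon(1.0, 0.0, threshold=1.0, epsilon=1.0, n_samples=100_000))
```

Thresholding a scalar output this way is exactly what becomes hard for ML models, where the output is a whole parameter vector rather than one number.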

We propose solutions to both problems to adapt the general auditing approach to machine learning algorithms. For the first, we develop a set of data poisoning attacks that perturb a single point of dataset $D$, approximating a worst-case neighboring dataset $D^{\prime}$ specific to $\mathcal{M}$. For the second, we present a statistical framework combining elements of Bichsel et al. and Jagielski et al. to estimate $\varepsilon$ for general ML procedures from smaller sample sizes. Our techniques involve improved estimation bounds and, more significantly, learning probabilities over general model spaces, whereas previous works can only estimate probabilities over a single output (the predictive probability).

These novel contributions form our framework, ML-Audit, for auditing the DP of arbitrary machine learning models. Compared to prior works, we obtain orders-of-magnitude improvements in the estimation of $\varepsilon$ that hold across a larger range of $\varepsilon$, more datasets, and more model types. Our framework is compatible with prior model-specific approaches (small neural networks), obtaining similar or improved efficacy. Further still, we provide case studies where our framework detects potential violations in existing implementations of DP machine learning.

We review these prior approaches in section 2, and introduce key DP concepts in section 3. Then we present our framework in section 4, followed by results and discussion in section 5. Finally we conclude in section 6.

Related work

Although the differential privacy of a mechanism is established by mathematical proof, previous work has shown that holes can exist, either in theory, as with the sparse vector mechanism (of which erroneous versions have been proposed), or in implementation, for example sampling non-uniformity in pseudo-random number generators. Such occurrences point to the need for empirical verification of the guarantees promised by differential privacy mechanisms. This is critical because lapses in DP are likely to yield greater predictive performance, so choosing the model with maximal accuracy at a chosen level of privacy is likely to select these errant models.

Recent works propose efficient solutions for auditing simple differential privacy mechanisms operating on scalar or vector inputs. In such cases, the neighboring inputs can be enumerated tractably (e.g., for a vector input, by adding one to each input element in turn). For each neighboring pair, the appropriate output set is determined, and Monte Carlo probabilities are measured to determine privacy. The sampling process in such mechanisms is vectorized to greatly reduce the runtime. These works develop a rigorous statistical framework for DP auditing that serves as the basis of current ML auditing approaches, including ours.

Few studies have attempted to audit DP machine learners. We are aware of two works that measure the privacy of neural networks trained with the DP-SGD mechanism. Jagielski et al. develop a poisoning attack (ClipBKD) that constructs a data point along the axis of least variance in the dataset. This makes the gradients of the point distinguishable, mitigating the effect of the DP-SGD clipping and noise. Since this attack is model-agnostic, we adopt it as a baseline for all our models. We show that our method performs considerably better, and that the least-variance approach suffers as the sphericity of the dataset increases.

The second work analyzes DP-SGD through a series of increasingly white-box attacks. The authors note that while the DP-SGD privacy guarantee is formulated for all neighboring datasets, the mechanism itself operates purely via gradient manipulation. They are therefore able to formulate stronger attacks that directly target the gradient or make use of adaptive poisoning. While these are not usable for general-purpose ML auditing, we implement their poisoning attack, based on adversarial perturbation, for our DP-SGD experiments. We find that this attack performs comparably to ClipBKD on DP-SGD.

While their works are important progenitors to ours, they estimate the privacy loss by thresholding a single scalar output, e.g. the loss on a specific poisoning point. This is a major bottleneck limiting the ability to audit other machine learning algorithms. Instead of a scalar, our approach learns distributions over any vector representation of a model, allowing us to be the first to propose a general procedure for auditing any DP learner.

Another predecessor established the complementary relationship between data poisoning and differential privacy, using a poisoning attack based on gradient ascent of logistic regression parameters $\theta$ with respect to $x$ to reach a target $\theta^{\star}$. The result has a form similar to the perturbation influence function discussed in Section 4. However, the aim of that study is to evaluate the effect of differential privacy on poisoning strength rather than vice versa. In addition, their attack formulation requires manual gradient computation, which does not readily extend to other models. We include a form of their attack for comparison in our logistic regression experiments.

Differential privacy overview

We provide a background on key elements of differential privacy relevant to our work.

A randomized learner $\mathcal{M}:\mathcal{D}\times\mathcal{B}\to\Theta$ is a function mapping a training dataset $D\in\mathcal{D}$ and auxiliary noise $b\in\mathcal{B}$ to an output model $\theta\in\Theta$.

A randomized learner $\mathcal{M}$ is considered $(\varepsilon,\delta)$-differentially private if for all datasets $D,D^{\prime}\in\mathcal{D}$ differing in a single entry and all measurable sets $S$ over the output space $\Theta$,

$$\Pr[\mathcal{M}(D)\in S]\leq e^{\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta.\tag{1}$$

In this definition, the $\varepsilon$ term is the primary quantity of interest because it bounds the ratio between model output probabilities when the original dataset $D$ is perturbed by a single row. Setting $\varepsilon=\delta=0$ would enforce that $\mathcal{M}(D)$ equals $\mathcal{M}(D^{\prime})$ in distribution, while a large $\varepsilon$ or $\delta$ permits the distributions to be arbitrarily different. As $\varepsilon\to\infty$ the model distribution converges to $\mathcal{M}_{\infty}(D)$, which in most cases is identical to the standard non-private version of the model. The $\delta$ term permits some additive slack in the probability bound, but for most purposes is set to $o(1/n)$, where $n$ is the number of training rows.

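Given $\varepsilon$ and $\delta$, $\mathcal{M}$ calibrates the noise distribution to the sensitivity of the model function, that is, the largest change in the function's output when a single row of the training dataset is changed.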
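As a hedged illustration of calibrating noise to sensitivity, the sketch below assumes the Laplace mechanism on a scalar function with sensitivity $\Delta$ and noise scale $\Delta/\varepsilon$. It checks on a grid that the log density ratio between outputs on two neighboring datasets never exceeds $\varepsilon$. The function values 3.0 and 2.0 and the helper names are hypothetical.

```python
import math


def laplace_log_pdf(z, loc, scale):
    """Log density of Laplace(loc, scale) at z."""
    return -abs(z - loc) / scale - math.log(2.0 * scale)


def max_log_density_ratio(f_d, f_d_prime, sensitivity, epsilon, grid):
    """Largest ln f_{M(D)}(z) / f_{M(D')}(z) over the grid with noise scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return max(
        laplace_log_pdf(z, f_d, scale) - laplace_log_pdf(z, f_d_prime, scale)
        for z in grid
    )


grid = [i / 10.0 for i in range(-100, 101)]
# Hypothetical outputs of the non-private function on D and D'; |3.0 - 2.0| equals the sensitivity.
print(max_log_density_ratio(3.0, 2.0, sensitivity=1.0, epsilon=0.5, grid=grid))
```

The ratio reaches $\varepsilon$ only when the two function values differ by the full sensitivity. This is why auditing attacks try to push $D^{\prime}$ as far from $D$ as the constraint set allows.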

Our main goal is to audit the privacy of a known mechanism whose inner workings may be hidden, so we assume in our work that the user has access to the mechanism, the privacy parameters, and the dataset, but does not have control over the training process or the randomness $b$ .

The final tool which we will employ in our framework is group privacy, which iterates over Eq. 1 to bound the probabilities when $D,D^{\prime}$ differ by multiple points.

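Suppose $\mathcal{M}$ is $(\varepsilon,\delta)$-private and $D,D^{\prime}$ differ by $k$ rows. Then for all measurable $S\subset\Theta$,

$$\Pr[\mathcal{M}(D)\in S]\leq e^{k\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta\,\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}.$$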
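The group privacy bound can be evaluated with the small helper below. For $D,D^{\prime}$ differing in $k$ rows, the bound is $\Pr[\mathcal{M}(D)\in S]\leq e^{k\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta\,(e^{k\varepsilon}-1)/(e^{\varepsilon}-1)$. This is a sketch, and the names `group_privacy` and `single_row_lower_bound` are hypothetical. It also shows why, when $\delta=0$, an auditor who perturbs $k$ rows and detects $\hat{\varepsilon}$ can report $\hat{\varepsilon}/k$ as a lower bound on the single-row $\varepsilon$.

```python
import math


def group_privacy(epsilon, delta, k):
    """Privacy parameters implied for datasets differing in k rows by (epsilon, delta)-DP."""
    eps_k = k * epsilon
    if epsilon == 0:
        # Every term of delta * (1 + e^eps + ... + e^{(k-1) eps}) equals delta.
        delta_k = k * delta
    else:
        # Geometric series: delta * (e^{k eps} - 1) / (e^eps - 1).
        delta_k = delta * math.expm1(k * epsilon) / math.expm1(epsilon)
    return eps_k, delta_k


def single_row_lower_bound(eps_hat_k, k):
    """With delta = 0, a detected k-row loss eps_hat_k implies epsilon >= eps_hat_k / k."""
    return eps_hat_k / k


print(group_privacy(1.0, 1e-6, 2))
print(single_row_lower_bound(4.0, 8))
```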

Auditing framework

A DP mechanism calibrates its randomization to a user-specified $\varepsilon_{th}$ representing the desired theoretical level of privacy. The goal of DP auditing is to empirically assess the actual privacy provided by the mechanism, $\varepsilon^{\star}$. To do this, we determine a lower bound for $\varepsilon^{\star}$ with high statistical confidence. Observe that rewriting the definition of DP gives, for any neighboring $D,D^{\prime}$ and measurable $S$,

$$\varepsilon^{\star}\geq\ln\frac{\Pr[\mathcal{M}(D)\in S]-\delta}{\Pr[\mathcal{M}(D^{\prime})\in S]}.$$

For any given $(D,D^{\prime},S)$ we can estimate this quantity with Monte Carlo, retraining $\mathcal{M}$ to generate samples, and then compute a 95% confidence interval. The lower bound of the interval, $\hat{\varepsilon}_{lb}$, is the detected privacy violation: with high probability, $\varepsilon^{\star}\geq\hat{\varepsilon}_{lb}$, with $(D,D^{\prime},S)$ as a witness. By maximizing $\hat{\varepsilon}_{lb}$ over all possible witnesses, we increase the lower bound on $\varepsilon^{\star}$, which we can then compare against the promised $\varepsilon_{th}$. Given a mechanism $\mathcal{M}$, the auditing objective is then to find

$$\max_{D,D^{\prime},S}\ \ln\frac{\Pr[\mathcal{M}(D)\in S]-\delta}{\Pr[\mathcal{M}(D^{\prime})\in S]},$$

that is, the maximum privacy loss ratio over neighboring datasets $D,D^{\prime}$ and measurable output sets $S$. For all mechanisms considered in this work, $\delta<10^{-4}$, which is not detectable at our sample sizes, so we simplify by setting $\delta=0$.

A successful auditing approach requires solutions to the two separate maximizations. We maximize over $D,D^{\prime}$ by developing a toolkit of poisoning attacks which have the largest influence on the mechanism in question. Once $D$ and $D^{\prime}$ are obtained, we fit a likelihood ratio-based optimization to determine an appropriate $S$ . In the following, we discuss these procedures in detail.

Given a dataset $D=(X,Y)$, we define a valid poisoning attack to be an operation that selects an existing data point $(x,y)$ and perturbs $x$ to any new point $x^{\star}$ within a constraint set $\mathcal{C}$. As we consider classification algorithms in this work, the class label $y$ can also be switched. The constraint $\mathcal{C}$ is specific to each DP algorithm and is used to determine the maximum sensitivity bound from Def. 3.3 (otherwise a single data point could affect the model arbitrarily). Without loss of generality, we set the constraint to be the smallest such set containing the training data.

This is the only part of the framework that needs user design, as a strong attack against one mechanism may not be effective against another. However, we distill the steps for constructing such attacks into a recipe:

1. The underlying model of $\mathcal{M}$ can be abstract, requiring a summary function $\tau$ to embed $\mathcal{M}$ into a vector space. Since post-processing of DP outputs does not decrease privacy, we can select $\tau$ to preserve as much information about $\mathcal{M}$ as possible. In logistic regression the transformation is natural: we define $\tau\circ\mathcal{M}$ as the coefficient vector $\theta$ (Table 1). To reduce notation overload, we implicitly apply $\tau$ when we discuss empirical samples $z\sim\mathcal{M}(D)$.

2. Obtain a non-private algorithm $\mathcal{M}_{\infty}(D)$ as a surrogate. As long as $E_{b}[\mathcal{M}(D,b)]\approx\mathcal{M}_{\infty}(D)$, meaning the DP model is on average equivalent to the non-private version, it is sufficient to find the optimal poisoning with respect to the non-private model. Although not explored in this work, attacks can instead take into account the actual noise distribution by averaging over samples of $\mathcal{M}(D,b)$, at the cost of additional run-time.

3. Determine a poisoning $(x^{\star},y^{\star})$ from $(x,y)$ that maximizes the distance between $\tau(\mathcal{M}_{\infty}(D))$ and $\tau(\mathcal{M}_{\infty}(D^{\prime}))$. This involves a selection step and a perturbation step.

Below we detail the poisoning attack design for each mechanism:

Logistic Regression. DP logistic regression is typically achieved using one of three approaches: objective, output, or gradient perturbation. We evaluate objective and output perturbation using the widely used methods developed by Chaudhuri and Monteleoni and by Chaudhuri et al.

For the attack we adopt the influence function, a technique that estimates the change in a model when a specific data point is infinitesimally upweighted in the training set. For a general model with gradient and Hessian information, the effect of point $x^{\star}$ on the model parameters for a loss function $L$ can be approximated as $\mathcal{I}_{\theta}(x^{\star})=-H_{\theta}^{-1}\nabla_{\theta}L(x^{\star},\hat{\theta})$. In generalized linear models with a canonical link function, where $L$ is the negative log-likelihood, this has a closed form using the Fisher information as the expected Hessian. For logistic regression it evaluates to $\mathcal{I}_{\theta}(x^{\star})=(y^{\star}-\hat{y}^{\star})(X^{\top}WX+\lambda I)^{-1}x^{\star}$, where $W$ is a diagonal matrix with entries $W_{ii}=\hat{y}_{i}(1-\hat{y}_{i})$ and $\lambda$ is the regularization strength.
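The closed-form logistic regression influence can be sketched in pure Python. The sketch assumes the coefficients $\hat{\theta}$ have already been fit, uses no separate intercept (one can be folded in as a constant feature), and performs a small dense linear solve. The helper names are hypothetical, and this is not the authors' implementation.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def logistic_influence(X, theta, lam, x_star, y_star):
    """I(x*) = (y* - yhat*) (X^T W X + lam I)^{-1} x*, with W_ii = yhat_i (1 - yhat_i)."""
    d = len(theta)
    H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x in X:
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        w = p * (1.0 - p)  # Fisher-information weight of this training row
        for i in range(d):
            for j in range(d):
                H[i][j] += w * x[i] * x[j]
    y_hat_star = sigmoid(sum(t * xi for t, xi in zip(theta, x_star)))
    direction = solve(H, x_star)
    return [(y_star - y_hat_star) * v for v in direction]


# Hypothetical toy data and coefficients; the attack would maximize the L2 norm of this vector.
print(logistic_influence([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], 0.5, [1.5, 0.0], 1))
```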

We choose $\tau$ to be the coefficient vector $\theta,\theta^{\prime}$ of $\mathcal{M}_{\infty}(D),\mathcal{M}_{\infty}(D^{\prime})$ respectively. We first select $(x,y)$ closest to the corner of the hyper-rectangle containing the data (a heuristic for selecting a point far from where the separating hyperplane would likely be) and initialize $(x^{\star},y^{\star})$ as the mean point in the opposite class. Since our goal is to increase the distance $\lVert\theta-\theta^{\prime}\rVert_{2}$ , we optimize the $L_{2}$ norm of the influence function using projected gradient ascent. The constraint set $\mathcal{C}$ is the $L_{2}$ norm ball containing the training set, so after each iteration we clip $x^{\star}$ if needed.
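The projected gradient ascent step can be sketched generically. The sketch maximizes an arbitrary objective over the $L_2$ ball of radius $R$ and clips back onto the ball after each step. In the attack, the objective would be the $L_2$ norm of the influence function. Here a toy quadratic stands in for it, and the finite-difference gradient, step size, iteration count, and function names are assumptions for illustration.

```python
import math


def project_to_l2_ball(x, radius):
    """Project x onto the L2 ball of the given radius (the constraint set C in the attack)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference gradient; stands in for autodiff in this sketch."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def projected_gradient_ascent(objective, x0, radius, step=0.1, iters=200):
    """Maximize objective over the L2 ball, clipping back onto the ball after every step."""
    x = project_to_l2_ball(x0, radius)
    for _ in range(iters):
        g = numerical_gradient(objective, x)
        x = project_to_l2_ball([xi + step * gi for xi, gi in zip(x, g)], radius)
    return x


# Toy stand-in for the squared influence norm: maximized on the ball at radius * (3, 4) / 5.
x_star = projected_gradient_ascent(lambda v: (3 * v[0] + 4 * v[1]) ** 2, [0.1, 0.1], radius=2.0)
print(x_star)
```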

Naive Bayes. The Gaussian Naive Bayes model parameterizes the class-conditional distribution of features using independent Normal distributions. The parameters are a mean $\mu_{yd}$ and variance $\sigma^{2}_{yd}$ for each feature and class combination, as well as a prior probability $\pi_{y}$ for each class. To achieve differential privacy, Laplace or Geometric noise is added to the maximum likelihood estimate of each parameter. The constraint set $\mathcal{C}$ is the hyper-rectangle defined by the smallest and largest value of each feature in the dataset.

We choose $\tau$ to be the vector of all $\{\mu_{yd},\sigma^{2}_{yd},\pi_{y}\}$. The maximum influence a perturbation can have on $\hat{\mu}_{y}$ of class $y$ is achieved by selecting a point on the corner of the hyper-rectangle $\mathcal{C}$ nearest the data points of class $y$ and moving it to the corner furthest away. On the other hand, the maximum-influence attack on $\pi_{y}$ is to flip a class label. We combine the two ideas by flipping the class label of the point closest to a corner. In our experiments we compare this attack against one that targets only $\mu_{y}$.
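The Naive Bayes selection step can be sketched as follows. The sketch assumes binary labels in $\{0,1\}$, takes the constraint hyper-rectangle $\mathcal{C}$ to be the per-feature bounding box of the data, and measures Euclidean distance to the nearest corner. The distance metric and the function names are assumptions of this sketch.

```python
import math


def bounding_box(X):
    """Per-feature minima and maxima: the hyper-rectangle C."""
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    return lo, hi


def distance_to_nearest_corner(x, lo, hi):
    # Per coordinate, the nearest corner uses whichever bound is closer.
    return math.sqrt(sum(min(v - l, h - v) ** 2 for v, l, h in zip(x, lo, hi)))


def corner_label_flip(X, y):
    """Flip the label of the point closest to any corner of the bounding box."""
    lo, hi = bounding_box(X)
    idx = min(range(len(X)), key=lambda i: distance_to_nearest_corner(X[i], lo, hi))
    y_prime = list(y)
    y_prime[idx] = 1 - y_prime[idx]  # assumes binary labels in {0, 1}
    return idx, y_prime


# Hypothetical toy data: the point [1.0, 1.0] sits exactly on a corner.
print(corner_label_flip([[0.1, 0.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.3]], [0, 0, 1, 1]))
```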

Random Forest. The DP random forest mechanism uses unsupervised random splits of the features based on the domain of the inputs. Each tree is split to a predetermined depth. The training points are then percolated through the forest, and the majority label of each leaf is sampled from the Exponential mechanism. Thus a perturbed training point has no impact on the tree structure itself, only on the potential label of a single leaf in each tree. As a result we measure only the change in those leaves, choosing $\tau$ to be the prediction of each tree on $x^{\star}$.

The most detectable change in probability occurs when a leaf's majority class leads by a margin of $j=1$ point, so that a single class flip swaps the majority class. For example, if a leaf has one positive and two negative points, we flip the class of one of the negatives. Since each tree is random, there is no guarantee that any given point will be in a leaf with $j=1$. To increase this chance, we target points that are likely to be alone in their leaf. Thus we select the point most distant from the rest of the training set, as measured by $L_{1}$-distance, and flip its label. We provide further exposition and hyperparameter details in the Appendix.
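A sketch of the random forest selection step follows. It assumes binary labels in $\{0,1\}$ and interprets "most distant from the rest of the training set" as the largest $L_1$ distance to the nearest other point. Other readings, such as the largest mean distance, are possible, and the function names are hypothetical.

```python
def l1(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def most_isolated_label_flip(X, y):
    """Flip the label of the point whose nearest neighbor (in L1) is farthest away."""
    def nearest_neighbor_distance(i):
        return min(l1(X[i], X[j]) for j in range(len(X)) if j != i)

    idx = max(range(len(X)), key=nearest_neighbor_distance)
    y_prime = list(y)
    y_prime[idx] = 1 - y_prime[idx]  # assumes binary labels in {0, 1}
    return idx, y_prime


# Hypothetical toy data: [5, 5] is isolated, so it is likely to sit alone in its leaves.
print(most_isolated_label_flip([[0, 0], [0, 1], [1, 0], [5, 5]], [0, 0, 1, 1]))
```

The nearest-neighbor reading matches the stated goal of finding points that are likely to be alone in a leaf, but it costs $O(n^{2})$ distance evaluations. This is fine for small tabular datasets but would need indexing for large ones.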

DP-SGD. For DP-SGD we evaluate two existing poisoning attacks using our framework. The ClipBKD attack constructs a data point along the axis of least variance in the dataset. This can be done by taking the eigenvector of the covariance matrix with the smallest eigenvalue and scaling it to the median $L_{2}$ norm of the training set. This point is assigned the label with the smallest predicted probability. The goal of this attack is to make the gradients of the point distinguishable, mitigating the effect of the DP-SGD clipping and noise.
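The ClipBKD construction can be sketched as follows. The sketch assumes a dense feature matrix with at least two features, given as lists. It finds the smallest-eigenvalue eigenvector of the covariance $C$ by power iteration on $sI-C$ with $s=\operatorname{tr}(C)$. The power-iteration approach, the upper-median convention, and the function names are choices made for this sketch, not details taken from the ClipBKD work. Label assignment is omitted.

```python
import math


def covariance(X):
    """Population covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    mean = [sum(col) / n for col in zip(*X)]
    C = [[0.0] * d for _ in range(d)]
    for x in X:
        c = [v - m for v, m in zip(x, mean)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / n
    return C


def smallest_eigenvector(C, iters=500):
    """Power iteration on s*I - C with s = trace(C) >= largest eigenvalue of PSD C.

    The dominant eigenvector of s*I - C is the eigenvector of C with the smallest
    eigenvalue. The non-symmetric start vector avoids some degenerate starting points.
    """
    d = len(C)
    s = sum(C[i][i] for i in range(d))
    v = [float(i + 1) for i in range(d)]
    for _ in range(iters):
        w = [s * v[i] - sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


def clipbkd_point(X):
    """Least-variance direction scaled to the median L2 norm of the training rows."""
    norms = sorted(math.sqrt(sum(t * t for t in x)) for x in X)
    median_norm = norms[len(norms) // 2]  # upper median for even n (sketch simplification)
    direction = smallest_eigenvector(covariance(X))
    return [median_norm * t for t in direction]


# Hypothetical data spread along the first axis; the least-variance axis is the second one.
print(clipbkd_point([[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.1], [1.0, -0.1], [2.0, 0.1]]))
```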

The second attack selects a random datum and maximizes its loss with gradient ascent, where the loss is defined with respect to a set of trained shadow models. In both attacks, the outcome $\tau(\mathcal{M}(D))$ being measured is the predicted probability of the perturbed point, $P_{\mathcal{M}}(x^{\star}=1)$, except that ClipBKD first subtracts the prediction on the zero vector, $P_{\mathcal{M}}(\vec{0})$. We also consider an updated ClipBKD in which $\tau$ instead includes the zero-vector prediction $P_{\mathcal{M}}(\vec{0})$ as a separate coordinate, together with predictions on additional test points $P_{\mathcal{M}}(x_{i}^{test})$.

Optimizing $S$

Intuitively, we have a likelihood ratio $f_{\mathcal{M}(D)}(z)/f_{\mathcal{M}(D^{\prime})}(z)$, and we select points to fill $S$ where $D$ is more likely than $D^{\prime}$. We start by adding points where $D$ is most likely relative to $D^{\prime}$ and work downwards until $S$ is large enough that there is probability $c$ that $\mathcal{M}(D^{\prime})$ erroneously falls in $S$.

Directly obtaining these densities is intractable. Instead, given collected samples $\{\mathcal{M}(\mathbf{D})\}$ where $\mathbf{D}\in\{D,D^{\prime}\}$, we can learn the posterior probability of the dataset $D$:

$$p(D\mid z)=\Pr(\mathbf{D}=D\mid\mathcal{M}(\mathbf{D})=z).$$

Let us consider the set $L^{k}\coloneqq\{z\ |\ p(D|z)/p(D^{\prime}|z)>k\}$. From Bayes' Theorem we know $p(D|z)\propto f_{\mathcal{M}(D)}(z)\cdot\Pr(\mathbf{D}=D)$. Given equal numbers of samples from $D$ and $D^{\prime}$, $\Pr(\mathbf{D}=D)=\Pr(\mathbf{D}=D^{\prime})$, which implies $p(D|z)/p(D^{\prime}|z)=f_{\mathcal{M}(D)}(z)/f_{\mathcal{M}(D^{\prime})}(z)$. Hence each $L^{k}$ is a superlevel set of the likelihood ratio, which by the Neyman-Pearson lemma (Appendix A) is the most powerful choice of $S$ at its level.
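The greedy construction of $S$ can be sketched as a threshold on classifier scores. In this sketch the scores are assumed to be estimates of $p(D\mid z)$ from some probabilistic classifier trained on equal numbers of samples from $\mathcal{M}(D)$ and $\mathcal{M}(D^{\prime})$. The classifier itself is not shown, and `choose_threshold` and `counts_in_s` are hypothetical helpers. Because $p/(1-p)$ is increasing in $p$, thresholding the posterior is the same as thresholding the likelihood ratio.

```python
def choose_threshold(scores_d, scores_d_prime, c):
    """Smallest t such that S = {z : score(z) >= t} holds at most a fraction c of D' samples.

    Candidate thresholds are scanned from the most D-like score downwards,
    mirroring the greedy filling of S.
    """
    best_t = float("inf")  # empty S if even the top score exceeds the budget
    for t in sorted(set(scores_d) | set(scores_d_prime), reverse=True):
        false_rate = sum(s >= t for s in scores_d_prime) / len(scores_d_prime)
        if false_rate > c:
            break
        best_t = t
    return best_t


def counts_in_s(scores_d, scores_d_prime, t):
    """Counts n1, n0 of samples from M(D) and M(D') that fall in S = {score >= t}."""
    return sum(s >= t for s in scores_d), sum(s >= t for s in scores_d_prime)


# Hypothetical classifier scores for samples of M(D) and M(D').
d_scores, d_prime_scores = [0.9, 0.8, 0.7, 0.4], [0.75, 0.3, 0.2, 0.1]
t = choose_threshold(d_scores, d_prime_scores, c=0.25)
print(t, counts_in_s(d_scores, d_prime_scores, t))
```

The selected threshold should then be applied to a fresh set of samples before computing the confidence bound, since reusing the samples that chose it biases the estimate.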

Our approach introduces the following further adaptations to Bichsel et al. to enable auditing of slower, potentially expensive machine learning mechanisms. (The original work used around $10^{8}$ samples for auditing simple mechanisms, which is infeasible for training machine learning models. We use $N=10000$ samples for all mechanisms except DP-SGD, where $N=500$.)

All previous auditing works use Clopper-Pearson intervals for $\hat{p}_{1}$ and $\hat{p}_{0}$ to determine the lower bound. This method is highly suboptimal because it bounds the numerator and denominator separately. Instead, we use the Katz-log confidence interval, which directly bounds the ratio of binomial proportions (Lemma A.3). In our simulations this has far better coverage properties, as shown in the Appendix.
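Here is a minimal sketch of the Katz-log interval. It assumes equal trial counts $N$ for both binomials and the two-sided normal critical value for $\alpha=0.05$. With $\hat{p}=n/N$, it computes $\ln(n_1/n_0)\pm z_{\alpha/2}\sqrt{(1-\hat{p}_1)/n_1+(1-\hat{p}_0)/n_0}$, which expands to $\ln(n_1/n_0)\pm z_{\alpha/2}\sqrt{1/n_1-1/N+1/n_0-1/N}$. The function name is hypothetical.

```python
import math


def katz_log_interval(n1, n0, N, z=1.959963984540054):
    """Katz-log CI for ln(p1/p0) from two binomials with N trials each and counts n1, n0."""
    if n1 == 0 or n0 == 0:
        raise ValueError("the Katz-log interval needs non-zero counts")
    point = math.log(n1 / n0)
    se = math.sqrt(1.0 / n1 - 1.0 / N + 1.0 / n0 - 1.0 / N)
    return point - z * se, point + z * se


# Best case with N = 10000: every M(D) sample lands in S and exactly one M(D') sample does.
print(katz_log_interval(10_000, 1, 10_000))
```

Plugging in $n_1=N$ and $n_0=1$ gives the largest reportable lower bound. For $N=10000$ that bound is about $7.25$.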

Our modified procedure is presented in Algorithm 1 and summarized visually in Fig. 1.

Given Algo. 1 and $N$ samples, the highest detectable privacy risk is

$$\hat{\varepsilon}_{lb}^{\max}=\ln N-z_{\alpha/2}\sqrt{1-\frac{1}{N}}.$$

To avoid biasing the confidence interval, the final MC estimate must be conducted on an independent set of samples, as done in prior work. This is especially important when optimizing over the thresholds that define $S$. Since we compute a $1-\alpha$ interval over $\hat{\varepsilon}$ for each threshold, reporting the best result from the same samples would lead to biased inference from multiple testing.

Experiments and Results

We evaluate the DP mechanisms over a range of $\varepsilon_{th}$: $\{0.1,0.25,0.5,1,2,4,8,16,50\}$. At $\varepsilon_{th}=0.1$ the ratio of probabilities is bounded by $e^{0.1}\approx 1.1$, giving nearly indistinguishable distributions, whereas at $\varepsilon_{th}=50$ essentially no privacy is guaranteed. For a given dataset $D$, we perturb $k\in\{1,2,4,8\}$ points to get $D^{\prime}$ and train $N=10000$ times on each to determine the appropriate auditing set $S$. We then obtain $N$ new samples to perform the final Monte Carlo estimate and obtain the lower bound $\hat{\varepsilon}_{lb}$. We use significance level $\alpha=0.05$ throughout.

We assess Naive Bayes, logistic regression (output and objective perturbation), and random forest on common machine learning datasets: adult, credit, iris, breast-cancer, banknote, and thoracic. We use the diffprivlib library and implement output perturbation following prior work. Additionally, we test DP-SGD on FMNIST and CIFAR10 (here with $N=500$) using an existing implementation. Refer to the Appendix for more details.

Figure 2 directly compares, over three datasets, the $\hat{\varepsilon}_{lb}$ obtained by our poisoning attacks (ML-Audit), the ClipBKD attack, and a baseline where a random data point $x$ is changed to a random point from the opposite class without changing its label (Swap-X). (While the original ClipBKD work uses DP-SGD, the poisoning itself is mechanism-agnostic, so it is an appropriate baseline.) We replicated our procedure three times and show the median $\hat{\varepsilon}_{lb}$. The Katz-log interval was used for the lower bounds of the baselines as well. As a reference, we plot the theoretical privacy loss up to the maximum detectable limit (Corollary 4.0.1).

For the logistic regression experiments, we include an additional variant of our approach in which the modified point, rather than maximizing the influence, is optimized to reach a specific target coefficient vector (labeled Poisoning). The method is based on the coefficient attack discussed in Section 2, a gradient-based optimization approach similar in form to ours. Heuristically, we want the target coefficient vector to be as distinguishable as possible from the original optimal linear coefficients, so we construct a vector perpendicular to the original coefficients using cross products. This method, which only exists for logistic regression, can be considered an ablation of ML-Audit with a weaker attack.

Our attacks consistently, and often dramatically, outperform the baselines across datasets, models, and $\varepsilon_{th}$. In some cases, the detected privacy loss is within a small factor ($1$ to $2\times$) of optimal. The Poisoning attack is close to our attacks on some datasets but performs poorly on others. The Swap-X baseline shows the privacy loss expected from an attack that poisons a data point within the original data distribution (e.g., by swapping two points in the training set), rather than a worst-case perturbation. The only case where Swap-X performs similarly to our tailored attacks is output-perturbed logistic regression on separable datasets with high non-sphericity, such as iris and breast-cancer. We believe these properties enable simple strategies to work well (see Fig. 3a). We do not consider this a meaningful difference in performance, as both methods are near the limit of what can be theoretically detected. Our results confirm previous findings that estimates of $\varepsilon$ from membership inference-based bounds are weaker than those from poisoning attacks. We also tried a loss-threshold attack from prior work and found it ineffective.

On many models and datasets we obtain nearly optimal results, indicating that these datasets are close to the maximal sensitivity of the mechanism. We observe this for random forest at $\varepsilon_{th}\leq 2$, along with a plateau at higher $\varepsilon_{th}$. This is likely due to lack of resolution in the outputs: an average over $m$ binary tree decisions can take only $m+1$ distinct probability values. Our attack on Naive Bayes is also near-optimal, though its performance varies across datasets. The algorithm assigns $\varepsilon_{th}/3$ to protect each of $\mu_{yd}$, $\sigma^{2}_{yd}$, and $\pi_{y}$. A perturbation therefore needs to achieve the sensitivity bound on all three to reach the theoretical privacy risk, and that bound is taken over all pairs $D,D^{\prime}$ rather than conditioned on a starting $D$. For example, to reach the bound for $\mu$, all points of one class would have to lie in one corner of the hyper-rectangle and the poisoned point in the opposite corner, which is unrealistic. For this reason, our choice of attacking the class prior $\pi$ is generally more effective than attacking the mean $\mu$ (Fig. 4).

Lastly, our DP-SGD experiments assess the two existing poisoning methods from the prior DP-SGD auditing works. Our findings (Fig. 8) are consistent with what is reported in those works. Interestingly, we find that the shadow adversarial attack outperforms ClipBKD on FMNIST, and vice versa on CIFAR10. We also updated ClipBKD to use additional information in $\tau$, as detailed in Section 4.1, and find a moderate improvement on FMNIST. Refer to the Appendix for further discussion.

Dataset-specific attack comparisons. Previous work hypothesized that privacy attacks on realistic datasets do not shift the model as far as the worst case given by the sensitivity bound, except when the attacker can completely specify the dataset. Our observations of how the dataset distribution affects attack strength support the hypothesis that achievable privacy is dataset-dependent. We believe a theoretical analysis of how datasets affect attack risk would be useful future work.

We briefly investigate this phenomenon by comparing the performance of ClipBKD by dataset sphericity, computed as the ratio between the largest and smallest singular values. An attack along the direction of least variance is likely ineffective when the dataset is spherical. We observe a strong relationship between the non-sphericity of a dataset and the influence of the ClipBKD perturbed point in Fig. 3a. In comparison, our influence-based attack consistently finds a higher influence value (Fig. 3b). Furthermore, this difference in influence is directly linked to the auditing improvement of our approach over ClipBKD.

Detecting violations in privacy. We discuss two case studies where privacy leaks occur in practice and are detected by our method. First, the Naive Bayes mechanism in diffprivlib exposes perturbed class counts in its API. However, the class counts are constrained to sum to the dataset size. Whenever $D$ and $D^{\prime}$ differ in length by one row, the released counts therefore reveal the difference, and no privacy is guaranteed. When we include the exposed class counts as part of $\tau$, our framework detects maximal privacy loss at all $\varepsilon_{th}$ (Fig. 4a). We therefore recommend that only the class priors, and not the counts, be made accessible.

As another example, some works define neighboring datasets as differing by a single row addition or deletion, while others allow modifying an existing point (equivalently, deleting and then adding a row). Depending on the implementation, the actual privacy level may therefore be half or double what the user intends. Our auditing framework correctly detected this discrepancy in the DP random forest, which defines $\varepsilon$ under the first option. Our estimated $\hat{\varepsilon}$ under the second definition exceeded the theoretical level until the definitions were reconciled.
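The factor of two can be seen directly. Assume $\delta=0$ and a mechanism that is $\varepsilon$-DP under add/remove neighbors. Let $D^{\prime}$ replace one row of $D$, and let $D^{\prime\prime}$ be $D$ with that row deleted, so that $D^{\prime\prime}$ is an add/remove neighbor of both $D$ and $D^{\prime}$. Chaining the add/remove guarantee twice gives

$$\Pr[\mathcal{M}(D)\in S]\leq e^{\varepsilon}\Pr[\mathcal{M}(D^{\prime\prime})\in S]\leq e^{2\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S],$$

so under the replace-one definition the same mechanism only certifies $2\varepsilon$.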

We reiterate that the advantages of our approach over its precursors are (1) improved perturbation attacks on $D$, and (2) optimizing a likelihood-based set $S$ on any summary function $\tau$ of a model $\mathcal{M}$. Our final ablation in Fig. 4b highlights these strengths by demonstrating the incremental improvement to ClipBKD when either (1) or (2) is added.

Conclusion

We have proposed ML-Audit, a framework for estimating the differential privacy of an ML model. ML-Audit provides a recipe for devising audits against arbitrary models and is often orders of magnitude more effective than existing approaches, sometimes coming within $2\times$ of the theoretical optimum.

Acknowledgments

Approved for Public Release; Distribution Unlimited. PA #: AFRL-2022-3247.

References

Appendix A Mathematical claims and proofs

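Suppose $\mathcal{M}$ is $(\varepsilon,\delta)$-private and $D,D^{\prime}$ differ by $k$ rows. Then for all measurable $S\subset\Theta$,

$$\Pr[\mathcal{M}(D)\in S]\leq e^{k\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta\,\frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}.$$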

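Let $D_{1},D_{2},\ldots,D_{k-1}$ represent intermediate datasets between $D=D_{0}$ and $D^{\prime}=D_{k}$, changing one row at a time. Definition 3.2 holds for each step. For example,

$$\Pr[\mathcal{M}(D)\in S]\leq e^{\varepsilon}\Pr[\mathcal{M}(D_{1})\in S]+\delta\leq e^{\varepsilon}\left(e^{\varepsilon}\Pr[\mathcal{M}(D_{2})\in S]+\delta\right)+\delta,$$

and iterating through all $k$ steps gives

$$\Pr[\mathcal{M}(D)\in S]\leq e^{k\varepsilon}\Pr[\mathcal{M}(D^{\prime})\in S]+\delta\sum_{i=0}^{k-1}e^{i\varepsilon}.$$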

For $\varepsilon>0$ , the final sum is a geometric series. Applying the standard identity gives the final result. ∎

(Neyman-Pearson): Given observations of a variable $X$, consider a hypothesis test distinguishing $H_{0}:\theta=\theta_{0}$ and $H_{1}:\theta=\theta_{1}$ with respective densities $f_{0}(x)$ and $f_{1}(x)$. For a level of significance $\alpha$, there exists a test with rejection region $R$ and $t\geq 0$ such that

$$P_{0}(X\in R)=\alpha,\qquad x\in R\ \text{if}\ f_{1}(x)>t\,f_{0}(x),\qquad x\notin R\ \text{if}\ f_{1}(x)<t\,f_{0}(x).$$

A test satisfying these conditions is a most powerful test at level $\alpha$ (that is, $P_{1}(X\in R)$ is greater or equal to that of any other test with the same level).

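Given two independent Binomial variables with underlying probability parameters $p_{1},p_{0}$, number of trials $N$, and observed values $n_{1}$ and $n_{0}$, a $1-\alpha$ confidence interval for the log ratio $\ln(p_{1}/p_{0})$ is

$$\ln\frac{n_{1}}{n_{0}}\pm z_{\alpha/2}\sqrt{\frac{1}{n_{1}}-\frac{1}{N}+\frac{1}{n_{0}}-\frac{1}{N}},$$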

where $z_{\alpha/2}$ is the critical value of the standard normal (e.g. 1.96 for $\alpha=0.05$ ).
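The Katz-log interval for the log-ratio is $\ln(n_{1}/n_{0})\pm z_{\alpha/2}\sqrt{1/n_{1}-1/N+1/n_{0}-1/N}$, a delta-method interval on the log scale. A direct implementation, assuming strictly positive counts; the function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def katz_log_ratio_ci(n1, n0, N, alpha=0.05):
    """Katz-log 1 - alpha CI for ln(p1/p0) from two independent Binomial(N, p) counts."""
    if not (0 < n1 <= N and 0 < n0 <= N):
        raise ValueError("counts must satisfy 0 < n <= N")
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z_{alpha/2}, about 1.96 for alpha = 0.05
    center = math.log(n1 / n0)
    se = math.sqrt(1 / n1 - 1 / N + 1 / n0 - 1 / N)   # standard error of ln(n1/n0)
    return center - z * se, center + z * se


lo, hi = katz_log_ratio_ci(n1=300, n0=100, N=1000)
```

Here $n_{1}=300$ and $n_{0}=100$ out of $N=1000$ give an interval of roughly $[0.89, 1.31]$ around $\ln 3\approx 1.10$; the lower end is the kind of value an audit would report as $\hat{\varepsilon}$.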

Given Algorithm 1 and $N$ samples, the highest detectable privacy violation is $\ln N-z_{\alpha/2}\sqrt{1-1/N}$.

Given $N$ samples of distribution $M_{1}$ and $N$ of distribution $M_{0}$ with Bernoulli probabilities $p_{1},p_{0}$ of being in set $S$, the largest finite estimate of $\ln(p_{1}/p_{0})$ arises when all $N$ samples of $M_{1}$ fall in $S$ and only one sample of $M_{0}$ does (if none did, the estimated ratio would be infinite). That is, $n_{1}=N$ and $n_{0}=1$, so the maximal estimate $\ln(n_{1}/n_{0})$ is $\ln N$. Applying Lemma A.3 then gives the confidence lower bound, which is the best case reportable by our algorithm. ∎
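Numerically, assuming the Katz-log form of Lemma A.3, the best case $n_{1}=N$, $n_{0}=1$ gives the lower bound $\ln N-z_{\alpha/2}\sqrt{1-1/N}$, since the variance term reduces to $1/N-1/N+1-1/N$. The sample counts in this sketch are illustrative, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def max_detectable_epsilon(N, alpha=0.05):
    """Largest lower confidence bound on ln(p1/p0) reachable with N samples per side.

    Best finite case n1 = N, n0 = 1 plugged into the Katz-log lower bound:
    ln(N) - z * sqrt(1/N - 1/N + 1 - 1/N) = ln(N) - z * sqrt(1 - 1/N).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.log(N) - z * math.sqrt(1 - 1 / N)


ceilings = {N: max_detectable_epsilon(N) for N in (1_000, 10_000, 20_000)}
```

For instance, at $\alpha=0.05$ the ceiling is about 4.95 with $N=1000$ and about 7.94 with $N=20{,}000$, so very large theoretical $\varepsilon$ cannot be certified with samples of this size.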

with probability at least $1-\alpha$ , where $\rho=\sqrt{\frac{\ln(2/\alpha)}{2N}}$ , and $\frac{\rho}{c}\leq\frac{1}{2}$ .
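The radius $\rho$ is the two-sided Hoeffding half-width for the mean of $N$ independent $[0,1]$-valued samples, $P(|\hat{p}-p|\geq\rho)\leq 2e^{-2N\rho^{2}}=\alpha$, and the condition $\rho/c\leq\frac{1}{2}$ amounts to $N\geq 2\ln(2/\alpha)/c^{2}$. In this sketch $c$ is simply a positive input, since its definition belongs to the full statement of the claim; the helper names and simulation settings are hypothetical.

```python
import math
import random


def hoeffding_radius(N, alpha):
    """Half-width rho with P(|p_hat - p| >= rho) <= 2 exp(-2 N rho^2) = alpha."""
    return math.sqrt(math.log(2 / alpha) / (2 * N))


def min_samples(c, alpha):
    """Smallest N with rho / c <= 1/2, i.e. N >= 2 ln(2/alpha) / c^2 (c > 0 is an input)."""
    return math.ceil(2 * math.log(2 / alpha) / c**2)


def empirical_coverage(p, N, alpha, trials=2000, seed=0):
    """Fraction of simulated Bernoulli(p) experiments with |p_hat - p| < rho."""
    rng = random.Random(seed)
    rho = hoeffding_radius(N, alpha)
    hits = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < p for _ in range(N)) / N
        hits += abs(p_hat - p) < rho
    return hits / trials
```

Because Hoeffding's inequality holds for any distribution on $[0,1]$, the simulated coverage for Bernoulli data typically exceeds $1-\alpha$ by a wide margin.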

Appendix B Additional Results

We highlight some results comparing $k\in\{1,2,4,8\}$ for the same attack, model, and dataset. As shown, small values of $\varepsilon$ benefit from larger $k$, while at larger $\varepsilon$ a larger $k$ is detrimental because the final privacy estimate must be divided by $k$. In our final experiments we adopt $k=8$ for all $\varepsilon\leq 2$, $k=2$ for $2<\varepsilon\leq 8$, and $k=1$ for $\varepsilon\in\{16,50\}$.
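The role of $k$ can be made concrete. With $\delta=0$, group privacy makes a $k$-row perturbation $k\varepsilon$-private, so an estimate obtained for the $k$-row pair is divided by $k$; meanwhile, no estimate from $N$ samples per side can exceed roughly $\ln N$. In this sketch the sample count is hypothetical, and the schedule encodes the choices stated above (treating every $\varepsilon>8$ like 16 and 50 is an assumption).

```python
import math


def k_for_epsilon(eps_th):
    """Perturbation size k from the stated schedule (eps > 8 treated like 16 and 50: an assumption)."""
    if eps_th <= 2:
        return 8
    if eps_th <= 8:
        return 2
    return 1


def per_row_estimate(eps_hat_k, k):
    """With delta = 0 a k-row change is (k * eps)-private, so divide the k-row estimate by k."""
    return eps_hat_k / k


def per_row_ceiling(N, k):
    """Rough cap on the reportable per-row estimate: ln(N) (best finite case) divided by k."""
    return math.log(N) / k


N = 1000  # hypothetical number of samples per side
caps = {k: per_row_ceiling(N, k) for k in (1, 2, 4, 8)}
```

With $N=1000$, for example, $k=8$ caps the per-row estimate at about 0.86, while $k=1$ allows up to $\ln 1000\approx 6.9$: a larger $k$ strengthens the signal for small $\varepsilon$ but lowers this ceiling by a factor of $k$.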

B.2 Coverage simulation

We compare the baseline Clopper-Pearson interval with the Katz-log interval, which is specifically designed for ratios of binomial proportions. At a $95\%$ desired coverage ($\alpha=0.05$), we find the Katz-log intervals to be precise while the Clopper-Pearson intervals are highly conservative. This is to be expected, since the Clopper-Pearson approach bounds the two binomial variables separately before taking the ratio: the lower bound of the numerator divided by the upper bound of the denominator. Intuitively, this ratio bound is only tight when both separate bounds sit near their extremes at once, so at a nominal $\alpha=0.05$ level its effective miscoverage is on the order of $\alpha^{2}$ rather than $\alpha$.
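A small coverage simulation in the spirit of this comparison, using only the Python standard library. The settings ($N=100$, $p_{1}=0.4$, $p_{0}=0.2$, 2000 trials) are illustrative assumptions, not the configuration behind our reported results; the Clopper-Pearson bounds are obtained by bisection on the binomial CDF and combined into a ratio interval as described above.

```python
import math
import random
from functools import lru_cache
from statistics import NormalDist

N, P1, P0, ALPHA = 100, 0.4, 0.2, 0.05  # illustrative settings, not the paper's
Z = NormalDist().inv_cdf(1 - ALPHA / 2)


def katz(n1, n0):
    """Katz-log interval for ln(p1/p0); requires n1, n0 > 0."""
    center = math.log(n1 / n0)
    se = math.sqrt(1 / n1 - 1 / N + 1 / n0 - 1 / N)
    return center - Z * se, center + Z * se


def binom_cdf(x, p):
    """P(X <= x) for X ~ Binomial(N, p)."""
    return sum(math.comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(x + 1))


def solve_decreasing(f, target, iters=60):
    """Bisection bracket (lo, hi) for f(p) = target on [0, 1], with f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still too large: the root lies at larger p
        else:
            hi = mid
    return lo, hi


@lru_cache(maxsize=None)
def clopper_pearson(x):
    """Exact two-sided 1 - ALPHA interval for one binomial proportion with x successes."""
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, p), 1 - ALPHA / 2)[0]
    upper = 1.0 if x == N else solve_decreasing(lambda p: binom_cdf(x, p), ALPHA / 2)[1]
    return lower, upper


def cp_ratio(n1, n0):
    """Numerator lower bound over denominator upper bound (and vice versa), on the log scale."""
    l1, u1 = clopper_pearson(n1)
    l0, u0 = clopper_pearson(n0)
    lo = math.log(l1 / u0) if l1 > 0 else -math.inf
    hi = math.log(u1 / l0) if l0 > 0 else math.inf
    return lo, hi


def coverage(trials=2000, seed=0):
    """Fraction of simulated experiments whose interval contains the true ln(P1/P0)."""
    rng = random.Random(seed)
    truth = math.log(P1 / P0)
    hits = {"katz": 0, "clopper_pearson": 0}
    valid = 0
    for _ in range(trials):
        n1 = sum(rng.random() < P1 for _ in range(N))
        n0 = sum(rng.random() < P0 for _ in range(N))
        if n1 == 0 or n0 == 0:
            continue  # Katz is undefined here; vanishingly rare at these settings
        valid += 1
        for name, (lo, hi) in (("katz", katz(n1, n0)), ("clopper_pearson", cp_ratio(n1, n0))):
            hits[name] += lo <= truth <= hi
    return {name: h / valid for name, h in hits.items()}


cov = coverage()
```

With settings like these, one should expect the Katz-log coverage to land near the nominal 95% and the combined Clopper-Pearson coverage to sit well above it.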

Appendix C DP-SGD Result

Our findings here are consistent with those reported in the benchmark works. The shadow adversarial attack outperforms ClipBKD on FMNIST, and vice versa on CIFAR10. In general, the auditing performance of these attacks is weaker than for the other models. We believe this is due to a combination of small sample size and the resistance of DP-SGD to data perturbations: we found that larger perturbations ($k=4,8$) were required to induce detectable changes at high $\varepsilon_{th}$.

Appendix D Additional experiment details

Our work considers binary classification, so we restrict all datasets to classes 0 and 1. Datasets with more than 1000 points are subsampled to 1000. For evaluating random forest, which scales poorly with dataset size (the implementation is currently pure Python), we further subsample all datasets to 500 points. Results do not change significantly with dataset size, as the DP noise added is calibrated to dataset statistics. Hyperparameter details are given below.
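The preprocessing described above can be sketched as follows, assuming NumPy arrays and uniform random subsampling without replacement; the sampling scheme, the seed, and the helper name `binary_subsample` are assumptions, since the text does not specify them.

```python
import numpy as np


def binary_subsample(X, y, max_points=1000, seed=0):
    """Keep only classes 0 and 1, then subsample uniformly without replacement to max_points."""
    X, y = np.asarray(X), np.asarray(y)
    keep = np.isin(y, (0, 1))
    X, y = X[keep], y[keep]
    if len(y) > max_points:
        idx = np.random.default_rng(seed).choice(len(y), size=max_points, replace=False)
        X, y = X[idx], y[idx]
    return X, y


# Example on synthetic data: 1000 points for most models, 500 for random forest.
X_all = np.random.default_rng(1).normal(size=(5000, 3))
y_all = np.arange(5000) % 5
X_lr, y_lr = binary_subsample(X_all, y_all)
X_rf, y_rf = binary_subsample(X_all, y_all, max_points=500)
```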

Our experiments were run on a cluster of 240 CPUs over the course of a week. A single auditing run (40k retrainings, excluding DP-SGD) took 10-40 min on average depending on the dataset and model, while a single DP-SGD audit took about 3 hrs for 2k retrainings.

For FMNIST, we use a CNN with three 2D-conv layers and 2x2 max-pooling. The number of filters starts at 4 and is doubled to 8 at the third convolution. This is followed by a fully-connected layer. For CIFAR10, the overall architecture is the same but the number of filters is doubled for all conv layers.

We set the clipping norm to 0.002 similar to previous work and use learning rate 1e-3 with RMSprop. We train FMNIST for 10 epochs and CIFAR10 for 15. For all settings tested, model accuracy is over $98\%$ so we believe our settings are reasonable.
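For reference, the clipping norm enters DP-SGD through per-example gradient clipping followed by Gaussian noise on the summed gradient. The following is a minimal NumPy sketch of one such aggregation step, assuming per-example gradients are already available as rows of an array. The noise multiplier is not given in this section, so `noise_multiplier` is a hypothetical parameter, and `privatize_gradients` is an illustrative helper rather than code from our experiments. The returned vector would then be passed to the optimizer (RMSprop in our setup).

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm=0.002, noise_multiplier=1.0, rng=None):
    """One DP-SGD aggregation step on a batch of per-example gradients (rows).

    Each row is rescaled to L2 norm <= clip_norm, the rows are summed, Gaussian noise
    with standard deviation noise_multiplier * clip_norm is added, and the result is
    averaged over the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(per_example_grads, dtype=float)             # shape (batch, n_params)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    clipped = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=g.shape[1])
    return (clipped.sum(axis=0) + noise) / g.shape[0]


batch = np.random.default_rng(0).normal(size=(64, 10))        # stand-in per-example gradients
noisy_grad = privatize_gradients(batch, noise_multiplier=1.1, rng=np.random.default_rng(1))
```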

Note that prior work studies one- and two-layer fully connected networks; our models thus have higher capacity and are more realistic for the datasets studied. Our results for both baselines are consistent with those previously reported.

During training, our observations agree with prior comments about the impact of dataset normalization versus weight initialization. For example, we find by far the strongest auditing performance when FMNIST is kept in its original feature scaling, with a detected privacy risk of around 0.7 for $\varepsilon=50$. In contrast, when the features are normalized, the highest detected privacy risk drops to around 0.4. We believe this results from the interaction of a few factors, including weight initialization (affecting the initial variance of the network embeddings), the max grad norm clipping value, and the learning rate. We think this phenomenon is worthy of further study to better understand the nuances affecting privacy in networks trained with DP-SGD.

D.2 Random Forest

Derivation of attack: The class label $y(l)$ of a leaf $l$ is determined as

where $u(z,l)$ is 1 if $z$ is the majority label in $l$ and 0 otherwise, and $j$ is the number of excess points for the majority. We want a single-point change that induces a large change in this probability. As $j$ increases the probability rapidly converges to 1, so for larger $j$ the difference is statistically indistinguishable. Comparing the ratio at $j=2$ and $j=1$ verifies that the largest difference occurs when $j=1$ and the majority is flipped.

Note that this requires actively flipping the label of a point, rather than just removing or adding a point to turn the majority into a tie. If the classes are tied, the final probability becomes 0.5, which is a weaker signal than when the majority flips.

We use $m=15$ trees and depth 10 in our study.

We further note that the detectable privacy violation depends on the hyperparameters. By our analysis, growing the trees deep enough (as $n$ grows) that each data point is isolated in its own leaf would guarantee the effectiveness of the class flip attack. However, tree size grows exponentially with depth, and the runtime is too costly with the currently available implementation. As a compromise, we cap our dataset sizes at 500 and disregard categorical features in the random splits, as they cannot be used to isolate points. This provides additional evidence that practical privacy risk is dataset-specific. Given that the goal is to analyze the largest privacy risk in random forest (e.g. to assess whether there are implementation errors), it makes sense to select parameters and dataset sizes that maximize privacy risk.

D.3 Logistic regression

We found that, in the logistic regression implementations, certain combinations of $\varepsilon$ and $\lambda$ severely degraded optimization, yielding essentially degenerate coefficients. In such cases, nearly no privacy loss is detectable. Our goal is to assess the models at a reasonable performance level reflective of their actual use on each dataset, so we adjusted the regularization $\lambda$ to avoid these situations while maintaining weak regularization consistent with the statistical literature (e.g. $\lambda\approx 1/\sqrt{n}$). Our code uses scikit-learn, which uses the $C=1/(n\cdot\lambda)$ convention. We specified $C=1.0$ for objective perturbation and $C=0.01$ for output perturbation. For $\varepsilon$ around 1-8, objective perturbation required $C=10.0$ to avoid the degeneracy issue.
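To make the convention concrete: scikit-learn's documented L2-penalized logistic objective is $\frac{1}{2}\|w\|^{2}+C\sum_{i}\log(1+e^{-y_{i}x_{i}^{\top}w})$, which equals $\frac{1}{\lambda}$ times $\frac{1}{n}\sum_{i}\log(1+e^{-y_{i}x_{i}^{\top}w})+\frac{\lambda}{2}\|w\|^{2}$ exactly when $C=1/(n\lambda)$. The sketch below (intercept omitted, labels in $\{-1,+1\}$, helper names hypothetical) checks this numerically and converts between the two parameterizations.

```python
import numpy as np


def C_from_lambda(lam, n):
    return 1.0 / (n * lam)


def lambda_from_C(C, n):
    return 1.0 / (n * C)


def objectives(w, X, y, lam):
    """(mean loss + lam/2 ||w||^2, scikit-learn-style C * sum loss + 1/2 ||w||^2); y in {-1, +1}."""
    margins = y * (X @ w)
    losses = np.logaddexp(0.0, -margins)          # log(1 + exp(-margin)), numerically stable
    n = len(y)
    reg_form = losses.mean() + 0.5 * lam * (w @ w)
    sk_form = C_from_lambda(lam, n) * losses.sum() + 0.5 * (w @ w)
    return reg_form, sk_form                      # sk_form == reg_form / lam


n = 1000  # hypothetical dataset size
lambdas_used = {C: lambda_from_C(C, n) for C in (1.0, 0.01, 10.0)}
```

For a 1000-point dataset, for instance, $C=1.0$, $C=0.01$, and $C=10.0$ correspond to $\lambda=10^{-3}$, $0.1$, and $10^{-4}$, while $\lambda=1/\sqrt{n}\approx 0.032$ corresponds to $C\approx 0.032$.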

Appendix E Adapting ML-Audit for $\delta>0$

In our current study, all methods we evaluate use standard implementations with $\delta=0$, with the exception of DP-SGD, where $\delta$ is very small. Our method can be readily adapted to $\delta>0$ by retaining $\delta$ in the auditing objective of Section 4. The objective then splits into two quantities, the first of which is what we currently bound. For the second, we can use a binomial proportion CI on the denominator while setting $\delta$ to the level at which we wish to audit. Without prior information, we could also vary $\delta$, which yields a curve of $\hat{\varepsilon}$ as a function of $\delta$.
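As a simple illustration of how $\delta$ enters, consider the DP inequality $p_{1}\leq e^{\varepsilon}p_{0}+\delta$ for a single region. This is a direct consequence of the definition, not necessarily the exact split of the Section 4 objective. The point estimate of the smallest consistent $\varepsilon$ is then $\ln((p_{1}-\delta)/p_{0})$, and sweeping $\delta$ traces the kind of curve described above. The acceptance rates used here are made-up values, the function name is hypothetical, and in practice both rates would be replaced by confidence bounds.

```python
import math


def epsilon_lower_estimate(p1_hat, p0_hat, delta):
    """Point estimate of the smallest eps consistent with p1 <= e^eps * p0 + delta (p0_hat > 0)."""
    if p1_hat <= delta:
        return 0.0  # the delta slack alone explains the acceptance rate
    return max(0.0, math.log((p1_hat - delta) / p0_hat))


# Made-up acceptance rates for M(D) and M(D') on a fixed region.
curve = [(d, epsilon_lower_estimate(0.5, 0.05, d)) for d in (0.0, 1e-3, 1e-2, 0.1)]
```

In this example, $\delta=10^{-3}$ moves the estimate only from $\ln 10\approx 2.30$ to about $2.30$, consistent with the observation that very small $\delta$ has essentially no effect on the measurements.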

Nonetheless, in practical applications $\delta$ is set in $o(1/n)$, yielding values that are exceedingly difficult to measure (and that essentially have no effect on our measurements). For example, the smallest probabilities we can measure with confidence via Monte Carlo are around 0.01, while $\delta<0.001$ for a 1000-point dataset.
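The measurement difficulty can be quantified with the relative standard error of a Monte Carlo proportion estimate, $\sqrt{(1-p)/(Np)}$. The sample counts below are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def relative_std_error(p, N):
    """Relative standard error sqrt((1 - p) / (N p)) of a Monte Carlo estimate of p from N samples."""
    return math.sqrt((1 - p) / (N * p))


def samples_for_relative_error(p, target):
    """Smallest N whose relative standard error is at most target."""
    return math.ceil((1 - p) / (p * target**2))


errors = {p: relative_std_error(p, 1000) for p in (0.01, 0.001)}
```

With $N=1000$ samples, a probability of 0.01 is estimated with roughly 31% relative error, whereas a probability of 0.001 has a standard error about as large as the value itself; reaching 10% relative error at $p=0.001$ would take roughly $10^{5}$ samples.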

Appendix F Data-specific epsilon vs influence

Since the $\varepsilon$ of a DP ML mechanism holds for a worst-case pair of neighboring datasets, many real-world datasets may have a lower achievable maximum privacy leakage. We take the maximum $\hat{\varepsilon}$ estimated in our logistic regression auditing experiments as a proxy lower bound for this quantity, and let $Y$ be this value relative to the theoretical $\varepsilon=1$.

Additionally, we define $X$ to be the norm of the influence function evaluated at our perturbation point, relative to the average influence function norm over each dataset. The following figure compares $X$ and $Y$.
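One way to compute the quantity underlying $X$ for an L2-regularized logistic regression is the classical influence function $-H^{-1}\nabla_{\theta}\ell(z,\hat{\theta})$, where $H$ is the Hessian of the regularized empirical risk at the fitted parameters. The sketch below assumes a non-private fit by Newton's method, labels in $\{0,1\}$, and no intercept. It illustrates the influence-norm ratio and is not the exact code behind the figure; the helper names and synthetic data are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _hessian(feats, p, lam):
    """Hessian of (1/n) sum logistic loss + (lam/2) ||w||^2 at predicted probabilities p."""
    n, d = feats.shape
    return (feats.T * (p * (1 - p))) @ feats / n + lam * np.eye(d)


def fit_logreg(feats, labels, lam, iters=50):
    """Non-private L2-regularized logistic regression (labels in {0, 1}, no intercept), Newton's method."""
    n, d = feats.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = _sigmoid(feats @ w)
        grad = feats.T @ (p - labels) / n + lam * w
        w = w - np.linalg.solve(_hessian(feats, p, lam), grad)
    return w


def relative_influence_norms(feats, labels, lam):
    """||H^{-1} grad_w loss(z_i)|| for every training point, divided by the dataset average."""
    w = fit_logreg(feats, labels, lam)
    p = _sigmoid(feats @ w)
    H = _hessian(feats, p, lam)
    per_point_grads = (p - labels)[:, None] * feats      # gradient of each point's loss at w
    influence = np.linalg.solve(H, per_point_grads.T).T   # sign dropped: only the norm is used
    norms = np.linalg.norm(influence, axis=1)
    return norms / norms.mean()


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 3))
labels = (feats[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
ratios = relative_influence_norms(feats, labels, lam=1 / np.sqrt(200))
```

The entry of `ratios` at the index of the perturbation point would then correspond to the value plotted as $X$.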