Efficient Neural Network Robustness Certification with General Activation Functions

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, Luca Daniel

Introduction

While neural networks (NNs) have achieved remarkable performance and accomplished unprecedented breakthroughs in many machine learning tasks, recent studies have highlighted their lack of robustness against adversarial perturbations . For example, in image learning tasks such as object classification or content captioning , visually indistinguishable adversarial examples can be easily crafted from natural images to alter a NN’s prediction result. Beyond the white-box attack setting where the target model is entirely transparent, visually imperceptible adversarial perturbations can also be generated in the black-box setting by only using the prediction results of the target model . In addition, real-life adversarial examples have been made possible through the lens of realizing physical perturbations . As NNs are becoming a core technique deployed in a wide range of applications, including safety-critical tasks, certifying robustness of a NN against adversarial perturbations has become an important research topic in machine learning.

Although certifying the largest possible robustness is challenging for ReLU networks, the piece-wise linear nature of ReLUs can be exploited to efficiently compute a non-trivial certified lower bound of the minimum distortion . Beyond ReLU, one fundamental problem that remains largely unexplored is how to generalize the robustness certification technique to other popular activation functions that are not piece-wise linear, such as tanh and sigmoid, and how to motivate and certify the design of other activation functions towards improved robustness. In this paper, we tackle the preceding problem by proposing an efficient robustness certification framework for NNs with general activation functions. Our main contributions in this paper are summarized as follows:

We propose a generic analysis framework CROWN for certifying NNs using linear or quadratic upper and lower bounds for general activation functions that are not necessarily piece-wise linear.

Unlike previous work , CROWN allows flexible selections of upper/lower bounds for activation functions, enabling an adaptive scheme that helps to reduce approximation errors. Our experiments show that CROWN achieves up to 26% improvements in certified lower bounds compared to .

Our algorithm is efficient and can scale to large NNs with various activation functions. For a NN with over 10,000 neurons, we can give a certified lower bound in about 1 minute on 1 CPU core.

Background and Related Work

For ReLU networks, finding the minimum adversarial distortion for a given input data point x0x_{0} can be cast as a mixed integer linear programming (MILP) problem . Reluplex uses a satisfiable modulo theory (SMT) to encode ReLU activations into linear constraints. Similarly, Planet uses satisfiability (SAT) solvers. However, due to the NP-completeness for solving such a problem , these methods can only find minimum distortion for very small networks. It can take Reluplex several hours to find the minimum distortion of an example for a ReLU network with 5 inputs, 5 outputs and 300 neurons.

On the other hand, a computationally feasible alternative of robustness certificate is to provide a non-trivial and certified lower bound of minimum distortion. Some analytical lower bounds based on operator norms on the weight matrices or the Jacobian matrix in NNs do not exploit special property of ReLU and thus can be very loose . The bounds in are based on the local Lipschitz constant. assumes a continuous differentiable NN and hence excludes ReLU networks; a closed form lower-bound is also hard to derive for networks beyond 2 layers. applies to ReLU networks and uses Extreme Value Theory to provide an estimated lower bound (CLEVER score). Although the CLEVER score is capable of reflecting the level of robustness in different NNs and is scalable to large networks, it is not a certified lower bound. On the other hand, use the idea of a convex outer adversarial polytope in ReLU networks to compute a certified lower bound by relaxing the MILP certification problem to linear programing (LP). apply semidefinite programming for robustness certification in ReLU networks but their approach is limited to NNs with one hidden layer. exploit the ReLU property to bound the activation function (or the local Lipschitz constant) and provide efficient algorithms (Fast-Lin and Fast-Lip) for computing a certified lower bound, achieving state-of-the-art performance. A recent work uses abstract transformations to zonotopes for proving robustness property for ReLU networks. Nonetheless, there are still some applications demand non-ReLU activations, e.g. RNN and LSTM, thus a general framework that can efficiently compute non-trivial and certified lower bounds for NNs with general activation functions is of great importance. We aim at filling this gap and propose CROWN that can perform efficient robustness certification to NNs with general activation functions. Table 1 summarizes the differences of other approaches and CROWN. Note that a recent work based on solving Lagrangian dual can also handle general activation functions but it trades off the quality of robustness bound with scalability.

CROWN: A general framework for certifying neural networks

1 General framework

Input perturbation and pre-activation bounds.

Below, we first define the linear upper bounds and lower bounds of activation functions in Definition 3.1, which are the key to derive explicit output bounds for an mm-layer neural network with general activation functions. The formal statement of the explicit output bounds is shown in Theorem 3.2.

Note that the parameters αU,r(k),αL,r(k),βU,r(k),βL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}},\mathbf{\beta}^{(k)}_{U,{r}},\mathbf{\beta}^{(k)}_{L,{r}} depend on lr(k)\mathbf{l}^{(k)}_{r} and ur(k)\mathbf{u}^{(k)}_{r}, i.e. for different lr(k)\mathbf{l}^{(k)}_{r} and ur(k)\mathbf{u}^{(k)}_{r} we may choose different parameters. Also, for ease of exposition, in this paper we restrict αU,r(k),αL,r(k)0\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}}\geq 0. However, Theorem 3.2 can be easily generalized to the case of negative αU,r(k),αL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}}.

Theorem 3.2 illustrates how a NN function fj(x)f_{j}(\mathbf{x}) can be bounded by two linear functions fjU(x)f^{U}_{j}(\mathbf{x}) and fjL(x)f^{L}_{j}(\mathbf{x}) when the activation function of each neuron is bounded by two linear functions hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r} in Definition 3.1. The central idea is to unwrap the activation functions layer by layer by considering the signs of the associated (equivalent) weights of each neuron and apply the two linear bounds hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r}. As we demonstrate in the proof, when we replace the activation functions with the corresponding linear upper bounds and lower bounds at the layer m1m-1, we can then define equivalent weights and biases based on the parameters of hU,r(m1)h^{(m-1)}_{U,r} and hL,r(m1)h^{(m-1)}_{L,r} (e.g. Λ(k),Δ(k),Ω(k),Θ(k)\mathbf{\Lambda}^{(k)},\mathbf{\Delta}^{(k)},\mathbf{\Omega}^{(k)},\mathbf{\Theta}^{(k)} are related to the terms αU,r(k),βU,r(k),αL,r(k),βL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\beta}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}},\mathbf{\beta}^{(k)}_{L,{r}}, respectively) and then repeat the procedure to “back-propagate” to the input layer. This allows us to obtain fjU(x)f^{U}_{j}(\mathbf{x}) and fjL(x)f^{L}_{j}(\mathbf{x}) in (1). The formal proof of Theorem 3.2 is in Appendix A. Note that for a neuron rr in layer kk, the slopes of its linear upper and lower bounds αU,r(k),αL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}} can be different. This implies:

Fast-Lin is a special case of our framework as they require the slopes αU,r(k),αL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}} to be the same; and it only applies to ReLU networks (cf. Sec. 3.2). In Fast-Lin, Λ(0)\mathbf{\Lambda}^{(0)} and Ω(0)\mathbf{\Omega}^{(0)} are identical.

Our CROWN framework allows adaptive selections on the linear approximation when computing certified lower bounds of minimum adversarial distortion, which is the main contributor to improve the certified lower bound as demonstrated in the experiments in Section 4.

Certified lower bound of minimum distortion.

Time Complexity.

2 Case studies: CROWN for ReLU, tanh, sigmoid and arctan activations

In Section 3.1 we showed that as long as one can identify two linear functions hU(y),hL(y)h_{U}(y),h_{L}(y) to bound a general activation function σ(y)\sigma(y) for each neuron, we can use Corollary 3.3 with a binary search to obtain certified lower bounds of minimum distortion. In this section, we illustrate how to find parameters αU,r(k),αL,r(k)\mathbf{\alpha}^{(k)}_{U,{r}},\mathbf{\alpha}^{(k)}_{L,{r}} and βU,r(k),βL,r(k)\mathbf{\beta}^{(k)}_{U,{r}},\mathbf{\beta}^{(k)}_{L,{r}} of hU(y),hL(y)h_{U}(y),h_{L}(y) for four most widely used activation functions: ReLU, tanh, sigmoid and arctan. Other activations, including but not limited to leaky ReLU, ELU and softplus, can be easily incorporated into our CROWN framework following a similar procedure.

Bounding tanh/sigmoid/arctan.

For tanh activation, σ(y)=1e2y1+e2y\sigma(y)=\frac{1-e^{-2y}}{1+e^{-2y}}; for sigmoid activation, σ(y)=11+ey\sigma(y)=\frac{1}{1+e^{-y}}; for arctan activation, σ(y)=arctan(y)\sigma(y)=\arctan(y). All functions are convex on one side (y<0y<0) and concave on the other side (y>0y>0), thus the same rules can be used to find hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r}. Below we call (lr(k),σ(lr(k)))(\mathbf{l}^{(k)}_{r},\sigma(\mathbf{l}^{(k)}_{r})) as left end-point and (ur(k),σ(ur(k)))(\mathbf{u}^{(k)}_{r},\sigma(\mathbf{u}^{(k)}_{r})) as right end-point. For rSk+r\in\mathcal{S}^{+}_{k}, since σ(y)\sigma(y) is concave, we can let hU,r(k)h^{(k)}_{U,r} be any tangent line of σ(y)\sigma(y) at point d[lr(k),ur(k)]d\in[\mathbf{l}^{(k)}_{r},\mathbf{u}^{(k)}_{r}], and let hL,r(k)h^{(k)}_{L,r} pass the two end-points. Similarly, σ(y)\sigma(y) is concave for rSk+r\in\mathcal{S}^{+}_{k}, thus we can let hL,r(k)h^{(k)}_{L,r} be any tangent line of σ(y)\sigma(y) at point d[lr(k),ur(k)]d\in[\mathbf{l}^{(k)}_{r},\mathbf{u}^{(k)}_{r}] and let hU,r(k)h^{(k)}_{U,r} pass the two end-points. Lastly, for rSk±r\in\mathcal{S}^{\pm}_{k}, we can let hU,r(k)h^{(k)}_{U,r} be the tangent line that passes the left end-point and (d,σ(d))(d,\sigma(d)) where d0d\geq 0 and hU,r(k)h^{(k)}_{U,r} be the tangent line that passes the right end-point and (d,σ(d))(d,\sigma(d)) where d0d\leq 0. The value of dd for transcendental functions can be found using a binary search. The plots of upper and lower bounds for tanh and sigmoid are in Figure 1 and 3 (in Appendix). Plots for arctan\arctan are similar and so omitted.

Bounding ReLU.

For ReLU activation, σ(y)=max(0,y)\sigma(y)=\max(0,y). If rSk+r\in\mathcal{S}^{+}_{k}, we have σ(y)=y\sigma(y)=y and so we can set hU,r(k)=hL,r(k)=yh^{(k)}_{U,r}=h^{(k)}_{L,r}=y; if rSkr\in\mathcal{S}^{-}_{k}, we have σ(y)=0\sigma(y)=0, and thus we can set hU,r(k)=hL,r(k)=0h^{(k)}_{U,r}=h^{(k)}_{L,r}=0; if rSk±r\in\mathcal{S}^{\pm}_{k}, we can set hU,r(k)=ur(k)ur(k)lr(k)(ylr(k))h^{(k)}_{U,r}=\frac{\mathbf{u}^{(k)}_{r}}{\mathbf{u}^{(k)}_{r}-\mathbf{l}^{(k)}_{r}}(y-\mathbf{l}^{(k)}_{r}) and hL,r(k)=ay,0a1h^{(k)}_{L,r}=ay,\,0\leq a\leq 1. Setting a=ur(k)ur(k)lr(k)a=\frac{\mathbf{u}^{(k)}_{r}}{\mathbf{u}^{(k)}_{r}-\mathbf{l}^{(k)}_{r}} leads to the linear lower bound used in Fast-Lin . Thus, Fast-Lin is a special case under our framework. We propose to adaptively choose aa, where we set a=1a=1 when ur(k)lr(k)\mathbf{u}^{(k)}_{r}\geq|\mathbf{l}^{(k)}_{r}| and a=0a=0 when ur(k)<lr(k)\mathbf{u}^{(k)}_{r}<|\mathbf{l}^{(k)}_{r}|. In this way, the area between the lower bound hL,r(k)=ayh^{(k)}_{L,r}=ay and σ(y)\sigma(y) (which reflects the gap between the lower bound and the ReLU function) is always minimized. As shown in our experiments, the adaptive selection of hL,r(k)h^{(k)}_{L,r} based on the value of ur(k)\mathbf{u}^{(k)}_{r} and lr(k)\mathbf{l}^{(k)}_{r} helps to achieve a tighter certified lower bound. Figure 4 (in Appendix) illustrates the idea discussed here.

Summary.

We summarized the above analysis on choosing valid linear functions hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r} in Table 3 and 3. In general, as long as hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r} are identified for the activation functions, we can use Corollary 3.3 to compute certified lower bounds for general activation functions. Note that there remain many other choices of hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r} as valid upper/lower bounds of σ(y)\sigma(y), but ideally, we would like them to be close to σ(y)\sigma(y) in order to achieve a tighter lower bound of minimum distortion.

3 Extension to quadratic bounds

Experiments

Methods. For ReLU networks, CROWN-Ada is CROWN with adaptive linear bounds (Sec. 3.2), CROWN-Quad is CROWN with quadratic bounds (Sec. 3.3). Fast-Lin and Fast-Lip are state-of-the-art fast certified lower bound proposed in . Reluplex can solve the exact minimum adversarial distortion but is only computationally feasible for very small networks. LP-Full is based on the LP formulation in and we solve LPs for each neuron exactly to achieve the best possible bound. For networks with other activation functions, CROWN-general is our proposed method.

Model and Dataset. We evaluate CROWN and other baselines on multi-layer perceptron (MLP) models trained on MNIST and CIFAR-10 datasets. We denote a feed-forward network with mm layers and nn neurons per layer as m×[n]m\times[n]. For models with ReLU activation, we use pretrained models provided by and also evaluate the same set of 100 random test images and random attack targets as in (according to their released code) to make our results comparable. For training NN models with other activation functions, we search for best learning rate and weight decay parameters to achieve a similar level of accuracy as ReLU models.

Implementation and Setup. We implement our algorithm using Python (numpy with numba). Most computations in our method are matrix operations that can be automatically parallelized by the BLAS library; however, we set the number of BLAS threads to 1 for a fair comparison to other methods. Experiments were conducted on an Intel Skylake server CPU running at 2.0 GHz on Google Cloud. Our code is available at https://github.com/huanzhang12/CROWN-Robustness-Certification

Conclusion

We have presented a general framework CROWN to efficiently compute a certified lower bound of minimum distortion in neural networks for any given data point x0\mathbf{x_{0}}. CROWN features adaptive bounds for improved robustness certification and applies to general activation functions. Moreover, experiments show that (1) CROWN outperforms state-of-the-art baselines on ReLU networks and (2) CROWN can efficiently certify non-trivial lower bounds for large networks with over 10K neurons and with different activation functions.

Acknowledgement

This work was supported in part by NSF IIS-1719097, Intel faculty award, Google Cloud Credits for Research Program and GPUs donated by NVIDIA. Tsui-Wei Weng and Luca Daniel are partially supported by MIT-IBM Watson AI Lab and MIT-Skoltech program.

References

Appendix A Proof of Theorem 3.2

Assume the activation function σ(y)\sigma(y) is bounded by two linear functions hU,i(m1),hL,i(m1)h^{(m-1)}_{U,i},h^{(m-1)}_{L,i} in Definition 3.1, we have

Thus, if the associated weight Wj,i(m)\mathbf{W}^{(m)}_{j,i} to the ii-th neuron is non-negative (the terms in F1F_{1} bracket), we have

otherwise (the terms in F2F_{2} bracket), we have

Let fjU,m1(x)f_{j}^{U,m-1}(\mathbf{x}) be an upper bound of fj(x)f_{j}(\mathbf{x}). To compute fjU,m1(x)f_{j}^{U,m-1}(\mathbf{x}), (6), (7) and (8) are the key equations. Precisely, for the Wj,i(m)0\mathbf{W}^{(m)}_{j,i}\geq 0 terms in (6), the upper bound is the right-hand-side (RHS) in (7); and for the Wj,i(m)<0\mathbf{W}^{(m)}_{j,i}<0 terms in (6), the upper bound is the RHS in (8). Thus, we obtain:

From (9) to (10), we replace hU,i(m1)(yi(m1))h^{(m-1)}_{U,i}(\mathbf{y}^{(m-1)}_{i}) and hL,i(m1)(yi(m1))h^{(m-1)}_{L,i}(\mathbf{y}^{(m-1)}_{i}) by their definitions; from (10) to (11), we use variables λj,i(m1)\mathbf{\lambda}^{(m-1)}_{j,i} and Δj,i(m1)\mathbf{\Delta}^{(m-1)}_{j,i} to denote the slopes in front of yi(m1)\mathbf{y}^{(m-1)}_{i} and the intercepts in the parentheses:

and repeat again iteratively until obtain the final upper bound fjU,1(x)f_{j}^{U,1}(\mathbf{x}), where fj(x)fjU,m1(x)fjU,m2(x)fjU,1(x)f_{j}(\mathbf{x})\leq f_{j}^{U,m-1}(\mathbf{x})\leq f_{j}^{U,m-2}(\mathbf{x})\leq\ldots\leq f_{j}^{U,1}(\mathbf{x}). We let fj(x)f_{j}(\mathbf{x}) denote the final upper bound fjU,1(x)f_{j}^{U,1}(\mathbf{x}), and we have

Lower bound.

The above derivations of upper bound can be applied similarly to deriving lower bounds of fj(x)f_{j}(\mathbf{x}), and the only difference is now we need to use the LHS of (7) and (8) (rather than RHS when deriving upper bound) to bound the two terms in (6). Thus, following the same procedure in deriving the upper bounds, we can iteratively unwrap the activation functions and obtain a final lower bound fjL,1(x)f_{j}^{L,1}(\mathbf{x}), where fj(x)fjL,m1(x)fjL,m2(x)fjL,1(x)f_{j}(\mathbf{x})\geq f_{j}^{L,m-1}(\mathbf{x})\geq f_{j}^{L,m-2}(\mathbf{x})\geq\ldots\geq f_{j}^{L,1}(\mathbf{x}). Let fjL(x)=fjL,1(x)f_{j}^{L}(\mathbf{x})=f_{j}^{L,1}(\mathbf{x}), we have:

Indeed, λj,i(k)\mathbf{\lambda}^{(k)}_{j,i} and ωj,i(k)\mathbf{\omega}^{(k)}_{j,i} only differs in the conditions of selecting αU,i(k)\mathbf{\alpha}^{(k)}_{U,{i}} or αL,i(k)\mathbf{\alpha}^{(k)}_{L,{i}}; similarly for Δi,j(k)\mathbf{\Delta}^{(k)}_{i,j} and Θi,j(k)\mathbf{\Theta}^{(k)}_{i,j}.

Appendix B Proof of Corollary 3.3

Global lower bound.

Since fjL(x)=Ωj,:(0)x+k=1mΩj,:(k)(b(k)+Θ:,j(k))f_{j}^{L}(\mathbf{x})=\mathbf{\Omega}^{(0)}_{j,:}\mathbf{x}+\sum_{k=1}^{m}\mathbf{\Omega}^{(k)}_{j,:}(\mathbf{b}^{(k)}+\mathbf{\Theta}^{(k)}_{:,j}), we can derive γjL\gamma^{L}_{j} (similar to the derivation of γjU\gamma^{U}_{j}) below:

Appendix C Illustration of linear upper and lower bounds on sigmoid activation function.

Let fjU(x)f_{j}^{U}(\mathbf{x}) be an upper bound of fj(x)f_{j}(\mathbf{x}). To compute fjU(x)f_{j}^{U}(\mathbf{x}) with quadratic approximations, we can still apply (7) and (8) except that hU,r(k)(y)h^{(k)}_{U,r}(y) and hL,r(k)(y)h^{(k)}_{L,r}(y) are replaced by the following quadratic functions:

From (21) to (22), we replace hU,i(m1)(yi(m1))h^{(m-1)}_{U,i}(\mathbf{y}^{(m-1)}_{i}) and hL,i(m1)(yi(m1))h^{(m-1)}_{L,i}(\mathbf{y}^{(m-1)}_{i}) by their definitions and let

From (22) to (23), we let qU,j(m1)=Wj,:(m)τj,i(m1)\mathbf{q}^{(m-1)}_{U,{j}}=\mathbf{W}^{(m)}_{j,:}\odot\mathbf{\tau}^{(m-1)}_{j,i}, and write in the matrix form. From (23) to (24), we substitute y(m1)\mathbf{y}^{(m-1)} by its definition: y(m1)=W(m1)Φ(m2)(x)+b(m1)\mathbf{y}^{(m-1)}=\mathbf{W}^{(m-1)}\Phi_{(m-2)}(\mathbf{x})+\mathbf{b}^{(m-1)} and then collect the quadratic terms, linear terms and constant terms of Φ(m2)(x)\Phi_{(m-2)}(\mathbf{x}), where

Lower bound.

Similar to the above derivation, we can simply swap hU,r(k)h^{(k)}_{U,r} and hL,r(k)h^{(k)}_{L,r} and obtain lower bound fjL(x)f_{j}^{L}(\mathbf{x}):

Appendix E Additional Experimental Results

E.2 Results on CROWN-general