Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Introduction

Deep learning has gained unprecedented attention over the last few years, and deservedly so, as it has introduced transformative solutions across diverse scientific disciplines . Despite the ongoing success, there exist many scientific applications that have yet failed to benefit from this emerging technology, primarily due to the high cost of data acquisition. It is well known that the current state-of-the-art machine learning tools (e.g., deep/convolutional/recurrent neural networks) are lacking robustness and fail to provide any guarantees of convergence when operating in the small data regime, i.e., the regime where very few training examples are available.

In the first part of this study, we introduced physics informed neural networks as a viable solution for training deep neural networks with few training examples, for cases where the available data is known to respect a given physical law described by a system of partial differential equations. Such cases are abundant in the study of physical, biological, and engineering systems, where longstanding developments of mathematical physics have shed tremendous insight on how such systems are structured, interact, and dynamically evolve in time. We saw how the knowledge of an underlying physical law can introduce structure that effectively regularizes the training of neural networks, and enables them to generalize well even when only a few training examples are available. Through the lens of different benchmark problems, we highlighted the key features of physics informed neural networks in the context of data-driven solutions of partial differential equations .

In this second part of our study, we shift our attention to the problem of data-driven discovery of partial differential equations . To this end, let us consider parametrized and nonlinear partial differential equations of the general form

In what follows, we will provide an overview of our two main approaches to tackle this problem, namely continuous time and discrete time models, as well as a series of results and systematic studies for a diverse collection of benchmarks. In the first approach, we will assume availability of scattered and potential noisy measurements across the entire spatio-temporal domain. In the latter, we will try to infer the unknown parameters λ\lambda from only two data snapshots taken at distinct time instants. All data and codes used in this manuscript are publicly available on GitHub at https://github.com/maziarraissi/PINNs.

Continuous Time Models

We define f(t,x)f(t,x) to be given by the left-hand-side of equation (1); i.e.,

and proceed by approximating u(t,x)u(t,x) by a deep neural network. This assumption along with equation (2) result in a physics informed neural network f(t,x)f(t,x). This network can be derived by applying the chain rule for differentiating compositions of functions using automatic differentiation . It is worth highlighting that the parameters of the differential operator λ\lambda turn into parameters of the physics informed neural network f(t,x)f(t,x).

As a first example, let us consider the Burgers’ equation. This equation arises in various areas of applied mathematics, including fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flow . It is a fundamental partial differential equation and can be derived from the Navier-Stokes equations for the velocity field by dropping the pressure gradient term. For small values of the viscosity parameters, Burgers’ equation can lead to shock formation that is notoriously hard to resolve by classical numerical methods. In one space dimension the equation reads as

and proceed by approximating u(t,x)u(t,x) by a deep neural network. This will result in the physics informed neural network f(t,x)f(t,x). The shared parameters of the neural networks u(t,x)u(t,x) and f(t,x)f(t,x) along with the parameters λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) of the differential operator can be learned by minimizing the mean squared error loss

Here, {tui,xui,ui}i=1N\{t_{u}^{i},x_{u}^{i},u^{i}\}_{i=1}^{N} denote the training data on u(t,x)u(t,x). The loss MSEuMSE_{u} corresponds to the training data on u(t,x)u(t,x) while MSEfMSE_{f} enforces the structure imposed by equation (3) at a finite set of collocation points, whose number and location is taken to be the same as the training data.

To illustrate the effectiveness of our approach, we have created a training data-set by randomly generating N=2,000N=2,000 points across the entire spatio-temporal domain from the exact solution corresponding to λ1=1.0\lambda_{1}=1.0 and λ2=0.01/π\lambda_{2}=0.01/\pi. The locations of the training points are illustrated in the top panel of figure 1. This data-set is then used to train a 9-layer deep neural network with 20 neurons per hidden layer by minimizing the mean square error loss of (5) using the L-BFGS optimizer . Upon training, the network is calibrated to predict the entire solution u(t,x)u(t,x), as well as the unknown parameters λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) that define the underlying dynamics. A visual assessment of the predictive accuracy of the physics informed neural network is given in the middle and bottom panels of figure 1. The network is able to identify the underlying partial differential equation with remarkable accuracy, even in the case where the scattered training data is corrupted with 1% uncorrelated noise.

To further scrutinize the performance of our algorithm, we have performed a systematic study with respect to the total number of training data, the noise corruption levels, and the neural network architecture. The results are summarized in tables 1 and 2. The key observation here is that the proposed methodology appears to be very robust with respect to noise levels in the data, and yields a reasonable identification accuracy even for noise corruption up to 10%. This enhanced robustness seems to greatly outperform competing approaches using Gaussian process regression as previously reported in as well as approaches relying on sparse regression that require relatively clean data for accurately computing numerical gradients .

Our next example involves a realistic scenario of incompressible fluid flow as described by the ubiquitous Navier-Stokes equations. Navier-Stokes equations describe the physics of many phenomena of scientific and engineering interest. They may be used to model the weather, ocean currents, water flow in a pipe and air flow around a wing. The Navier-Stokes equations in their full and simplified forms help with the design of aircraft and cars, the study of blood flow, the design of power stations, the analysis of the dispersion of pollutants, and many other applications. Let us consider the Navier-Stokes equations in two dimensionsIt is straightforward to generalize the proposed framework to the Navier-Stokes equations in three dimensions (3D). (2D) given explicitly by

where u(t,x,y)u(t,x,y) denotes the xx-component of the velocity field, v(t,x,y)v(t,x,y) the yy-component, and p(t,x,y)p(t,x,y) the pressure. Here, λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) are the unknown parameters. Solutions to the Navier-Stokes equations are searched in the set of divergence-free functions; i.e.,

This extra equation is the continuity equation for incompressible fluids that describes the conservation of mass of the fluid. We make the assumption that

for some latent function ψ(t,x,y)\psi(t,x,y).This construction can be generalized to three dimensional problems by employing the notion of vector potentials. Under this assumption, the continuity equation (7) will be automatically satisfied. Given noisy measurements

of the velocity field, we are interested in learning the parameters λ\lambda as well as the pressure p(t,x,y)p(t,x,y). We define f(t,x,y)f(t,x,y) and g(t,x,y)g(t,x,y) to be given by

and proceed by jointly approximating [ψ(t,x,y)p(t,x,y)]\begin{bmatrix}\psi(t,x,y)&p(t,x,y)\end{bmatrix} using a single neural network with two outputs. This prior assumption along with equations (8) and (11) results into a physics informed neural network [f(t,x,y)g(t,x,y)]\begin{bmatrix}f(t,x,y)&g(t,x,y)\end{bmatrix}. The parameters λ\lambda of the Navier-Stokes operator as well as the parameters of the neural networks [ψ(t,x,y)p(t,x,y)]\begin{bmatrix}\psi(t,x,y)&p(t,x,y)\end{bmatrix} and [f(t,x,y)g(t,x,y)]\begin{bmatrix}f(t,x,y)&g(t,x,y)\end{bmatrix} can be trained by minimizing the mean squared error loss

Here we consider the prototype problem of incompressible flow past a circular cylinder; a problem known to exhibit rich dynamic behavior and transitions for different regimes of the Reynolds number Re=uD/νRe=u_{\infty}D/\nu. Assuming a non-dimensional free stream velocity u=1u_{\infty}=1, cylinder diameter D=1D=1, and kinematic viscosity ν=0.01\nu=0.01, the system exhibits a periodic steady state behavior characterized by a asymmetrical vortex shedding pattern in the cylinder wake, known as the Kármán vortex street .

To generate a high-resolution data set for this problem we have employed the spectral/hp-element solver NekTar . Specifically, the solution domain is discretized in space by a tessellation consisting of 412 triangular elements, and within each element the solution is approximated as a linear combination of a tenth-order hierarchical, semi-orthogonal Jacobi polynomial expansion . We have assumed a uniform free stream velocity profile imposed at the left boundary, a zero pressure outflow condition imposed at the right boundary located 25 diameters downstream of the cylinder, and periodicity for the top and bottom boundaries of the ×\times domain. We integrate equation (6) using a third-order stiffly stable scheme until the system reaches a periodic steady state, as depicted in figure 2(a). In what follows, a small portion of the resulting data-set corresponding to this steady state solution will be used for model training, while the remaining data will be used to validate our predictions. For simplicity, we have chosen to confine our sampling in a rectangular region downstream of cylinder as shown in figure 2(a).

Given scattered and potentially noisy data on the stream-wise u(t,x,y)u(t,x,y) and transverse v(t,x,y)v(t,x,y) velocity components, our goal is to identify the unknown parameters λ1\lambda_{1} and λ2\lambda_{2}, as well as to obtain a qualitatively accurate reconstruction of the entire pressure field p(t,x,y)p(t,x,y) in the cylinder wake, which by definition can only be identified up to a constant. To this end, we have created a training data-set by randomly sub-sampling the full high-resolution data-set. To highlight the ability of our method to learn from scattered and scarce training data, we have chosen N=5,000N=5,000, corresponding to a mere 1% of the total available data as illustrated in figure 2(b). Also plotted are representative snapshots of the predicted velocity components u(t,x,y)u(t,x,y) and v(t,x,y)v(t,x,y) after the model was trained. The neural network architecture used here consists of 9 layers with 20 neurons in each layer.

A summary of our results for this example is presented in figure 3. We observe that the physics informed neural network is able to correctly identify the unknown parameters λ1\lambda_{1} and λ2\lambda_{2} with very high accuracy even when the training data was corrupted with noise. Specifically, for the case of noise-free training data, the error in estimating λ1\lambda_{1} and λ2\lambda_{2} is 0.078%, and 4.67%, respectively. The predictions remain robust even when the training data are corrupted with 1% uncorrelated Gaussian noise, returning an error of 0.17%, and 5.70%, for λ1\lambda_{1} and λ2\lambda_{2}, respectively.

A more intriguing result stems from the network’s ability to provide a qualitatively accurate prediction of the entire pressure field p(t,x,y)p(t,x,y) in the absence of any training data on the pressure itself. A visual comparison against the exact pressure solution is presented in figure 3 for a representative pressure snapshot. Notice that the difference in magnitude between the exact and the predicted pressure is justified by the very nature of the Navier-Stokes system, as the pressure field is only identifiable up to a constant. This result of inferring a continuous quantity of interest from auxiliary measurements by leveraging the underlying physics is a great example of the enhanced capabilities that physics informed neural networks have to offer, and highlights their potential in solving high-dimensional inverse problems.

Our approach so far assumes availability of scattered data throughout the entire spatio-temporal domain. However, in many cases of practical interest, one may only be able to observe the system at distinct time instants. In the next section, we introduce a different approach that tackles the data-driven discovery problem using only two data snapshots. We will see how, by leveraging the classical Runge-Kutta time-stepping schemes, one can construct discrete time physics informed neural networks that can retain high predictive accuracy even when the temporal gap between the data snapshots is very large.

Discrete Time Models

We begin by applying the general form of Runge-Kutta methods with qq stages to equation (1) and obtain

Here, un+cj(x)=u(tn+cjΔt,x)u^{n+c_{j}}(x)=u(t^{n}+c_{j}\Delta t,x) for j=1,,qj=1,\ldots,q. This general form encapsulates both implicit and explicit time-stepping schemes, depending on the choice of the parameters {aij,bj,cj}\{a_{ij},b_{j},c_{j}\}. Equations (13) can be equivalently expressed as

We proceed by placing a multi-output neural network prior on

This prior assumption along with equations (15) result in two physics informed neural networks

Given noisy measurements at two distinct temporal snapshots {xn,un}\{\bm{x}^{n},\bm{u}^{n}\} and {xn+1,un+1}\{\bm{x}^{n+1},\bm{u}^{n+1}\} of the system at times tnt^{n} and tn+1t^{n+1}, respectively, the shared parameters of the neural networks (16), (17), and (18) along with the parameters λ\lambda of the differential operator can be trained by minimizing the sum of squared errors

Here, xn={xn,i}i=1Nn\bm{x}^{n}=\left\{x^{n,i}\right\}_{i=1}^{N_{n}}, un={un,i}i=1Nn\bm{u}^{n}=\left\{u^{n,i}\right\}_{i=1}^{N_{n}}, xn+1={xn+1,i}i=1Nn+1\bm{x}^{n+1}=\left\{x^{n+1,i}\right\}_{i=1}^{N_{n+1}}, and un+1={un+1,i}i=1Nn+1\bm{u}^{n+1}=\left\{u^{n+1,i}\right\}_{i=1}^{N_{n+1}}.

Let us illustrate the key features of this method through the lens of the Burgers’ equation. Recall the equation’s form

and notice that the nonlinear operator in equation (15) is given by

Given merely two training data snapshots, the shared parameters of the neural networks (16), (17), and (18) along with the parameters λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) of the Burgers’ equation can be learned by minimizing the sum of squared errors in equation (19). Here, we have created a training data-set comprising of Nn=199N_{n}=199 and Nn+1=201N_{n+1}=201 spatial points by randomly sampling the exact solution at time instants tn=0.1t^{n}=0.1 and tn+1=0.9t^{n+1}=0.9, respectively. The training data are shown in the top and middle panel of figure 4. The neural network architecture used here consists of 4 hidden layers with 50 neurons each, while the number of Runge-Kutta stages is empirically chosen to yield a temporal error accumulation of the order of machine precision ϵ\epsilon by settingThis is motivated by the theoretical error estimates for implicit Runge-Kutta schemes suggesting a truncation error of O(Δt2q)\mathcal{O}(\Delta{t}^{2q}) .

where the time-step for this example is Δt=0.8\Delta{t}=0.8. The bottom panel of figure 4 summarizes the identified parameters λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) for the cases of noise-free data, as well as noisy data with 1% of Gaussian uncorrelated noise corruption. For both cases, the proposed algorithm is able to learn the correct parameter values λ1=1.0\lambda_{1}=1.0 and λ2=0.01/π\lambda_{2}=0.01/\pi with remarkable accuracy, despite the fact that the two data snapshots used for training are very far apart, and potentially describe different regimes of the underlying dynamics.

A sensitivity analysis is performed to quantify the accuracy of our predictions with respect to the gap between the training snapshots Δt\Delta{t}, the noise levels in the training data, and the physics informed neural network architecture. As shown in table 3, the proposed algorithm is quite robust to both Δt\Delta{t} and the noise corruption levels, and it consistently returns reasonable estimates for the unknown parameters. This robustness is mainly attributed to the flexibility of the underlying implicit Runge-Kutta scheme to admit an arbitrarily high number of stages, allowing the data snapshots to be very far apart in time, while not compromising the accuracy with which the nonlinear dynamics of equation (20) are resolved. This is the key highlight of our discrete time formulation for identification problems, setting it apart from competing approaches . Lastly, table 4 presents the percentage error in the identified parameters, demonstrating the robustness of our estimates with respect to the underlying neural network architecture.

Our final example aims to highlight the ability of the proposed framework to handle governing partial differential equations involving higher order derivatives. Here, we consider a mathematical model of waves on shallow water surfaces; the Korteweg-de Vries (KdV) equation. This equation can also be viewed as Burgers’ equation with an added dispersive term. The KdV equation has several connections to physical problems. It describes the evolution of long one-dimensional waves in many physical settings. Such physical settings include shallow-water waves with weakly non-linear restoring forces, long internal waves in a density-stratified ocean, ion acoustic waves in a plasma, and acoustic waves on a crystal lattice. Moreover, the KdV equation is the governing equation of the string in the Fermi-Pasta-Ulam problem in the continuum limit. The KdV equation reads as

with (λ1,λ2)(\lambda_{1},\lambda_{2}) being the unknown parameters. For the KdV equation, the nonlinear operator in equations (15) is given by

and the shared parameters of the neural networks (16), (17), and (18) along with the parameters λ=(λ1,λ2)\lambda=(\lambda_{1},\lambda_{2}) of the KdV equation can be learned by minimizing the sum of squared errors (19).

To obtain a set of training and test data we simulated (22) using conventional spectral methods. Specifically, starting from an initial condition u(0,x)=cos(πx)u(0,x)=\cos(\pi x) and assuming periodic boundary conditions, we have integrated equation (22) up to a final time t=1.0t=1.0 using the Chebfun package with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge-Kutta temporal integrator with time-step Δt=106\Delta{t}=10^{-6}. Using this data-set, we then extract two solution snapshots at time tn=0.2t^{n}=0.2 and tn+1=0.8t^{n+1}=0.8, and randomly sub-sample them using Nn=199N_{n}=199 and Nn+1=201N_{n+1}=201 to generate a training data-set. We then use these data to train a discrete time physics informed neural network by minimizing the sum of squared error loss of equation (19) using L-BFGS . The network architecture used here comprises of 4 hidden layers, 50 neurons per layer, and an output layer predicting the solution at the qq Runge-Kutta stages, i.e., un+cj(x)u^{n+c_{j}}(x), j=1,,qj=1,\dots,q, where qq is computed using equation (21) by setting Δt=0.6\Delta{t}=0.6.

The results of this experiment are summarized in figure 5. In the top panel, we present the exact solution u(t,x)u(t,x), along with the locations of the two data snapshots used for training. A more detailed overview of the exact solution and the training data is given in the middle panel. It is worth noticing how the complex nonlinear dynamics of equation (22) causes dramatic differences in the form of the solution between the two reported snapshots. Despite these differences, and the large temporal gap between the two training snapshots, our method is able to correctly identify the unknown parameters regardless of whether the training data is corrupted with noise or not. Specifically, for the case of noise-free training data, the error in estimating λ1\lambda_{1} and λ2\lambda_{2} is 0.023%, and 0.006%, respectively, while the case with 1% noise in the training data returns errors of 0.057%, and 0.017%, respectively.

Summary and Discussion

We have introduced physics informed neural networks, a new class of universal function approximators that is capable of encoding any underlying physical laws that govern a given data-set, and can be described by partial differential equations. In this work, we design data-driven algorithms for discovering dynamic models described by parametrized nonlinear partial differential equations. The inferred models allow us to construct computationally efficient and fully differentiable surrogates that can be subsequently used for different applications including predictive forecasting, control, and optimization.

Although a series of promising results was presented, the reader may perhaps agree that this two-part treatise creates more questions than it answers. In a broader context, and along the way of seeking further understanding of such tools, we believe that this work advocates a fruitful synergy between machine learning and classical computational physics that has the potential to enrich both fields and lead to high-impact developments.

Acknowledgements

This work received support by the DARPA EQUiPS grant N66001-15-2-4055, the MURI/ARO grant W911NF-15-1-0562, and the AFOSR grant FA9550-17-1-0013. All data and codes used in this manuscript are publicly available on GitHub at https://github.com/maziarraissi/PINNs.

References