Interactive Differentiable Simulation

Eric Heiden, David Millard, Hejia Zhang, Gaurav S. Sukhatme

Introduction

A key ingredient to achieving intelligent behavior is physical understanding. Under the umbrella of intuitive physics, specialized models, such as interaction and graph neural networks, have been proposed to learn dynamics from data to predict the motion of objects over long time horizons. By labelling the training data given to these models by physical quantities, they are able to produce behavior that is conditioned on actual physical parameters, such as masses or friction coefficients, allowing for plausible estimation of physical properties and improved generalizability.

In this work, we introduce Interactive Differentiable Simulation (IDS), a differentiable physical simulator for rigid body dynamics. Instead of learning every aspect of such dynamics from data, our engine constrains the learning problem to the prediction of a small number of physical parameters that influence the motion and interaction of bodies.

A differentiable physics engine provides many advantages when used as part of a learning process. Physically accurate simulation obeys dynamical laws of real systems, including conservation of energy and momentum. Furthermore, joint constraints are enforced with no room outside of the model for error. The parameters of a physics engine are well-defined and correspond to properties of real systems, including multi-body geometries, masses, and inertia matrices. Learning these parameters provides a significantly interpretable parameter space, and can benefit classical control and estimation algorithms. Further, due to the high inductive bias, model parameters need not be jointly retrained for differing degrees of freedom or a reconfigured dynamics environment.

Differentiable Rigid Body Dynamics

In this work, we introduce a physical simulator for rigid-body dynamics. The motion of kinematic chains of multi-body systems can be described using the Newton-Euler equations:

Here, $\mathbf{q}$ , $\mathbf{\dot{q}}$ and $\mathbf{\ddot{q}}$ are vectors of generalized“Generalized coordinates” sparsely encode only particular degrees of freedom in the kinematic chain such that bodies connected by joints are guaranteed to remain connected. position, velocity and acceleration coordinates, and $\mathbf{\tau}$ is a vector of generalized forces. $\mathbf{M}$ is the generalized inertia matrix and depends on $\mathbf{q}$ . Coriolis forces, centrifugal forces, gravity and other forces acting on the system, are accounted for by the bias force matrix $\mathbf{C}$ that depends on $\mathbf{q}$ and $\mathbf{\dot{q}}$ . Since all bodies are connected via joints, including free-floating bodies which connect to a static world body via special joints with seven degrees of freedom (DOF), i.e. 3D position and orientation in quaternion coordinates, their positions and orientations in world coordinates $\mathbf{p}$ are computed by the forward kinematics function $\operatorname{KIN}(\cdot)$ (Fig. 2) using the joint angles and the bodies’ relative transforms to their respective joints through which they are attached to their parent body.

Forward dynamics $\operatorname{FD}(\cdot)$ (cf. Fig. 2) is the mapping from positions, velocities and forces to accelerations. We efficiently compute the forward dynamics using the Articulated Body Algorithm (ABA) . Given a descriptive model consisting of joints, bodies, and predecessor/successor relationships, we build a kinematic chain that specifies the dynamics of the system. In our simulator, bodies comprise physical entities with mass, inertia, and attached rendering and collision geometries. Joints describe constraints on the relative motion of bodies in a model. Equipped with such a graph of $n$ bodies connected via joints with forces acting on them, ABA computes the joint accelerations $\mathbf{\ddot{q}}$ in $O(n)$ operations. Following the calculation of the accelerations, we implement semi-implicit Euler integration (referred to as $\operatorname{INT}(\cdot)$ in Fig. 2) to compute the velocities and positions of the joints and bodies at the current instant $t$ given time step $\Delta_{t}$ :

In control scenarios, external forces are applied to the kinematic tree, which are then propagated through the joints and bodies of the physical model. This propagation is efficiently calculated using the Recursive Newton-Euler Algorithm . For body $i$ , let $\lambda(i)$ denote the predecessor body and $\mu(i)$ denote successor bodies. We denote the allowable motion subspace matrix of the joint by $S_{i}$ , and the spatial inertia matrix by $I_{i}$ . Given the velocity $v_{i}$ and acceleration $a_{i}$ of body $i$ , we may compute

Denoting the net force on body $i$ as $f_{i}^{B}$ , we use the physical equation of motion to relate this force to the body acceleration

We can separate this force into $f_{i}$ , the force transmitted from body $\lambda(i)$ across joint $i$ , and $f_{i}^{x}$ , the external force acting on body $i$ (such as gravity). Then

which lets us easily calculate $f_{i}$ , the force transmitted across each joint as

Finally, we may calculate the generalized force vector $\tau_{i}$ at joint $i$ as

While the analytical gradients of the rigid-body dynamics algorithms can be derived manually , we choose to implement the entire physics engine in the reverse-mode automatic differentiation framework Stan Math . Automatic differentiation allows us to compute gradients of any quantity involved in the simulation of complex systems, opening avenues to state estimation, optimal control and system design. Enabled by low-level optimization, our C++ implementation is designed to lay the foundations for real-time performance on physical robotic systems in the future.

Experiments

To act autonomously, intelligent agents need to understand the world through high-dimensional sensor inputs, like cameras. We demonstrate that our approach is able to infer the relevant physical parameters of the environment dynamics from these types of high-dimensional observations. We optimize the weights of an autoencoder network trained to predict the future visual state of a dynamical system, with our physics layer serving as the bottleneck layer. In this exemplar scenario, given an image of a three-link compound pendulum simulated in the MuJoCo physics simulator at time $t$ , the model is tasked to predict the future rendering of this pendulum $H$ time steps ahead. Compound pendula are known to exhibit chaotic behavior, i.e. given slightly different initial conditions (such as link lengths, starting angles, etc.), the trajectories drift apart significantly. Therefore, IDS must recover the true physical parameters accurately in order to generate valid motions that match the training data well into the future.

Given a dataset $D$ of ground-truth pairs of images $(o_{t}^{*},o_{t+H}^{*})$ and ground-truth joint coordinates $(\mathbf{q}_{t}^{*},\mathbf{q}_{t+1}^{*},\mathbf{\dot{q}}_{t}^{*},\mathbf{\dot{q}}_{t+1}^{*})$ , we optimize a triplet loss using the Adam optimizer that jointly trains the individual components of the autoencoder:

As a baseline from the intuitive physics literature, we train a graph neural network model based on on the first 800 frames of a 3-link pendulum motion. When we let the graph network predict 20 time steps into the future from a point after these 800 training samples, it returns false predictions where the pendulum is continuing to swing up, even though such motion would violate Newton’s laws. Such behavior is typical for fully learned models, which mostly achieve accurate predictions within the domain of the training examples. By contrast, IDS imposes a strong inductive bias, which allows the estimator to make accurate predictions far into the future (Fig. 4).

2 Automatic Robot Design

Industrial robotic applications often require a robot to follow a given tool path. In general, robotic arms with 6 or more degrees of freedom provide large workspaces and redundant configurations to reach any possible point within the workspace. However, motors are expensive to produce, maintain, and calibrate. Designing arms that contain a minimal number of motors required for a task provides economic and reliability benefits, but imposes constraints on the kinematic design of the arm.

One standard for specifying the kinematic configuration of a serial robot arm is the Denavit-Hartenberg (DH) parameterization. For each joint $i$ , the DH parameters are $(d_{i},\theta_{i},a_{i},\alpha_{i})$ . The preceding motor axis is denoted by $z_{i-1}$ and the current motor axis is denoted by $z_{i}$ . $d_{i}$ describes the distance to the joint $i$ projected onto $z_{i-1}$ and $\theta_{i}$ specifies the angle of rotation about $z_{i-1}$ . $a_{i}$ specifies the distance to joint $i$ in the direction orthogonal to $z_{i-1}$ and $\alpha_{i}$ describes the angle between $z_{i}$ and $z_{i-1}$ , rotated about the $x$ -axis of the preceding motor coordinate frame. We are primarily interested in arms with motorized revolute joints, and thus $\theta_{i}$ becomes the $q_{i}$ parameter of our joint state. We can thus fully specify the relevant kinematic properties of a serial robot arm with $N$ degrees of freedom (DOF) as $R=\{d_{0},a_{0},\alpha_{0},\ldots,d_{N},a_{N},\alpha_{N}\}$ .

where the forward kinematics function $\operatorname{KIN}(\cdot)$ maps from joint space to Cartesian tool positions conditioned on DH parameters $R$ . Since we compute $\operatorname{KIN}(\cdot)$ using our engine, we may compute derivatives to arbitrary inputs to this function (cf. Fig. 2), and use gradient-based optimization through L-BFGS to converge to arm designs which accurately perform the trajectory-following task, as shown in Fig. 5.

3 Adaptive MPC

Throughout our control experiments, we optimize a trajectory for an $n$ -link cartpole to swing up from arbitrary initial configuration of the joint angles. In the case of double cartpole, i.e. a double inverted pendulum on a cart, the state space is defined as $\mathbf{x}=\left(p,\dot{p},\sin q_{0},\cos q_{0},\sin q_{1},\cos q_{1},\dot{q}_{0},\dot{q}_{1},\ddot{q}_{0},\ddot{q}_{1}\right),$ where $p$ and $\dot{p}$ refer to the cart’s position and velocity, $q_{0},q_{1}$ to the joint angles, and $\dot{q}_{0},\dot{q}_{1},\ddot{q}_{0},\ddot{q}_{1}$ to the velocities and accelerations of the revolute joints of the poles, respectively. For a single cartpole the state space is analogously represented, excluding the second revolute joint coordinates $q_{1},\dot{q}_{1},\ddot{q}_{1}$ . The cost is defined as the norm of the control plus the Euclidean distance between the cartpole’s current state and the goal state $\mathbf{x}^{*}=\left(0,0,0,1,0,1,0,0,0,0\right),$ at which the pole is upright at zero angular velocity and acceleration, and the cart is centered at the origin with zero positional velocity.

Trajectory optimization assumes that the dynamics model is accurate w.r.t the real world and generates sequences of actions that achieve optimal behavior towards a given goal state, leading to open-loop control. Model-predictive control (MPC) leverages trajectory optimization in a feedback loop where the next action is chosen as the first control computed by trajectory optimization over a shorter time horizon with the internal dynamics model. After some actions are executed in the real world and subsequent state samples are observed, adaptive MPC (Algorithm 1) fits the dynamics model to these samples to align it closer with the real-world dynamics. In this experiment, we want to investigate how differentiable physics can help overcome the domain shift that poses an essential challenge of model-based control algorithms that are employed in a different environment. To this end, we incorporate IDS as dynamics model in such receding-horizon control algorithm to achieve swing-up motions of a single and double cartpole in the DeepMind Control Suite environments that are based on the MuJoCo simulator .

We fit the parameters $\theta$ of the system by minimizing the state-action prediction loss:

Thanks to the low dimensionality of the model parameter vector $\theta$ (for a double cartpole there are 14 parameters, cf. Fig. 6), efficient optimizers such as the quasi-Newton optimizer L-BFGS are applicable, leading to fast convergence of the fitting phase, typically within 10 optimization steps. The length $T$ of one episode is 140 time steps. During the first episode we fit the dynamics model more often, i.e. every 50 time steps, to warm-start the receding-horizon control scheme. Given a horizon size $H$ of 20 and 40 time steps, MPC is able to find the optimal swing-up trajectory for the single and double cartpole, respectively.

Within a handful of training episodes, adaptive MPC infers the correct model parameters involved in the dynamics of double cartpole (Fig. 6). As shown in Fig. 1, the models we start from in IDS do not match their counterparts from DeepMind Control Suite. For example, the poles are represented by capsules where the mass is distributed across these elongated geometries, whereas initially in our IDS model, the center of mass of the poles is at the end of them, such that they have different inertia parameters. We set the masses, lengths of the links, and 3D coordinates of the center of masses to 2, and using a few steps of the optimizer and less than 100 transition samples, converge to a much more accurate model of the true dynamics in the MuJoCo environment. On the example of a cartpole, Fig. 8 visualizes the predicted and actual dynamics for each state dimension after the first (left) and third (right) episode.

Related Work

Degrave et al. implemented a differentiable physics engine in the automatic differentiation framework Theano. IDS is implemented in C++ using Stan Math which enables reverse-mode automatic differentiation to efficiently compute gradients, even in cases where the code branches significantly. Analytical gradients of rigid-body dynamics algorithms have been implemented efficiently in the Pinnocchio library to facilitate optimal control and inverse kinematics. These are less general than our approach since they can only be used to optimize for a number of hand-engineered quantities. Simulating non-penetrative multi-point contacts between rigid bodies requires solving a linear complementarity problem (LCP), through which differentiate using the differentiable quadratic program solver OptNet . While our proposed model does not yet incorporate contact dynamics, we are able to demonstrate the scalability of our approach on versatile applications of differentiable physics to common 3D control domains.

Learning dynamics models has a tradition in the field of robotics and control theory. Early works on forward models and locally weighted regression yielded control algorithms that learn from previous experiences. More recently, a variety of novel deep learning architectures have been proposed to learn intuitive physics models. Inductive bias has been introduced through graph neural networks , particularly interaction networks that are able to learn rigid and soft body dynamics. By incorporating more structure into the learning problem, Deep Lagrangian Networks represent functions in the Lagrangian mechanics framework using deep neural networks. Besides novel architectures, vision-based machine learning approaches to predict the future outcomes of the state of the world have been proposed .

The approach of adapting the simulator to real world dynamics, which we demonstrate through our adaptive MPC algorithm in Sec. 3.3, has been less explored. While many previous works have shown to adapt simulators to the real world using system identification and state estimation , few have shown adaptive model-based control schemes that actively close the feedback loop between the real and the simulated system . Instead of using a simulator, model-based reinforcement learning is a broader field , where the system dynamics, and state-action transitions in particular, are learned to achieve higher sample efficiency compared to model-free methods. Within this framework, predominantly Gaussian Processes and neural networks have been proposed to learn the dynamics and optimize policies.

Future Work

We plan to continue this contribution in several ways. IDS can only model limited dynamics due to its lack of a contact model. Modeling collision and contact in a plausibly differentiable way is an exciting topic that will greatly expand the number of environments that can be modeled.

We are interested in exploring the loss surfaces of redundant physical parameters in IDS, where different models may have equivalent predictive power over the given task horizon. Resolving couplings between physical parameters can give rise to exploration strategies that expose properties of the physical system which allow our model to systematically calibrate itself. By examining the generalizability of models on these manifolds, we hope to establish guarantees of performance and prediction for specific tasks.

Conclusion

We introduced interactive differentiable simulation (IDS), a novel differentiable layer in the deep learning toolbox that allows for inference of physical parameters, optimal control and system design. Being constrained to the laws of physics, such as conservation of energy and momentum, our proposed model is interpretable in that its parameters have physical meaning. Combined with established learning algorithms from computer vision and receding horizon planning, we have shown how such a physics model can lead to significant improvements in sample efficiency and generalizability. Within a handful of trials in the test environment, our gradient-based representation of rigid-body dynamics allows an adaptive MPC scheme to infer the model parameters of the system thereby allowing it to make predictions and plan for actions many time steps ahead.