SAL: Sign Agnostic Learning of Shapes from Raw Data
Matan Atzmon, Yaron Lipman
Introduction
Recently, deep neural networks have been used to reconstruct, learn and generate 3D surfaces. There are two main approaches: parametric and implicit. In the parametric approach neural networks are used as parameterization mappings, while the implicit approach represents surfaces as zero level-sets of neural networks, i.e., as the set of points at which the network output vanishes.
In this paper we advocate Sign Agnostic Learning (SAL), defined by a family of loss functions that can be used directly with raw (unsigned) geometric data and produce signed implicit representations of surfaces. An important application for SAL is in generative models such as variational auto-encoders, learning shape spaces directly from the raw 3D data. Figure 1 depicts an example where collectively learning a dataset of raw human scans using SAL overcomes many imperfections and artifacts in the data (left in every gray pair) and provides high quality surface reconstructions (right in every gray pair) and shape space (interpolations of latent representations are in gold).
We have experimented with SAL for surface reconstruction from point clouds as well as for learning a human shape space from the raw scans of the D-Faust dataset. Comparing our results to current approaches and baselines, we found SAL to be the method of choice for learning shapes from raw data, and believe SAL could facilitate many computer vision and computer graphics shape learning applications, allowing the user to avoid the tedious and unsolved problem of surface reconstruction as a preprocessing step. Our code is available at https://github.com/matanatz/SAL.
Previous work
Learning collections of shapes is done using Generative Adversarial Networks (GANs), auto-encoders and variational auto-encoders, and auto-decoders. Wu et al. use a GAN on a voxel grid encoding of the shape, while Ben-Hamu et al. apply a GAN on a collection of conformal charts. Dai et al. use an encoder-decoder architecture to learn a signed distance function to a complete shape from a partial input on a volumetric grid. Stutz et al. use a variational auto-encoder to learn implicit surface representations of cars using a volumetric grid. Bagautdinov et al. use a variational auto-encoder with a constant mesh to learn parametrizations of a face shape space. Litany et al. use a variational auto-encoder to learn body shape embeddings of a template mesh. Park et al. use an auto-decoder to learn implicit neural representations of shapes, that is, they directly learn a latent vector for every shape in the dataset. In our work we also make use of a variational auto-encoder but, differently from previous work, learning is done directly from raw 3D data.
Surface reconstruction
Many surface reconstruction methods require normal or inside/outside information. Carr et al. were among the first to suggest using a parametric model to reconstruct a surface by computing its implicit representation; they use radial basis functions (RBFs) and regress at inside and outside points computed using oriented normal information. Kazhdan et al. solve a Poisson equation on a volumetric discretization to extend points and normals information to an occupancy indicator function. Walder et al. use radial basis functions and solve a variational Hermite problem (i.e., fitting gradients of the implicit to the normal data) to avoid the trivial solution. In general, our method works with a non-linear parametric model (MLP) and therefore requires neither an a priori space discretization nor a fixed linear basis such as RBFs.
More related to this paper are surface reconstruction methods that work with unsigned data such as point clouds and triangle soups. Zhao et al. use the level-set method to fit an implicit surface to an unoriented point cloud by minimizing a loss penalizing the distance of the surface to the point cloud, achieving a sort of minimal area surface interpolating the points. Walder et al. formulate a variational problem fitting an implicit RBF to unoriented point cloud data while minimizing a regularization term and maximizing the norm of the gradients; solving the variational problem is equivalent to an eigenvector problem. Mullen et al. suggest signing an unsigned distance function to a point cloud with a multi-stage algorithm that first divides the problem into near and far field sign estimation and propagates the far field estimation closer to the zero level-set, and then optimizes a convex energy fitting a smooth sign function to the estimated sign function. Takayama et al. suggested orienting triangle soups by minimizing the Dirichlet energy of the generalized winding number, noting that correct orientation yields a piecewise constant winding number. Xu et al. suggested computing a robust signed distance function to triangle soups by using an offset surface defined by the unsigned distance function. Zhiyang et al. fit an RBF implicit by optimizing a non-convex variational problem minimizing a smoothness term, an interpolation term and a unit gradient at data points term. All these methods use some linear function space; when the function space is global, e.g., when using RBFs, model fitting and evaluation are costly and limit the size of point clouds that can be handled efficiently, while local support basis functions usually suffer from inferior smoothness properties. In contrast, we use a non-linear function basis (MLP) and advocate a novel and simple sign agnostic loss to optimize it.
Evaluating the non-linear neural network model is efficient and scalable and the training process can be performed on a large number of points, e.g., with stochastic optimization techniques.
Sign agnostic learning
We introduce the Sign Agnostic Learning (SAL) defined by a loss of the form
To theoretically motivate the loss family in equation 2 we will prove that it possesses a plane reproduction property. That is, if the data is contained in a plane, there is a critical weight for which the zero level-set of the network reconstructs this plane. Plane reproduction is important for surface approximation since surfaces, by definition, have an approximate tangent plane almost everywhere.
We will explore instantiations of SAL based on different choices of unsigned distance functions, as follows.
We consider two -distance functions: For we have the standard (Euclidean) distance
Although many choices exist for the unsigned similarity function, in this paper we take
The choice of the sampling distribution depends on the particular choice of unsigned distance function. For the Euclidean distance, it is enough to make the simple choice of splatting an isotropic Gaussian at every (uniformly randomized) data point; note that the width of the Gaussian can be taken to be a function of the data point to reflect the local density of the data. In this case, the loss takes the form
For the distance however, only for and therefore a non-continuous density should be used; we opt for , where is the delta distribution measure concentrated at . The loss takes the form
Remarkably, the latter loss requires only randomizing points near the data samples without any further computations involving the input data. This allows processing of large and/or complex geometric data.
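As an illustration of how a sign agnostic loss of this kind can be evaluated on a point cloud, the following minimal Python sketch samples points from Gaussians centered at the data and averages an unsigned similarity between the function value and the unsigned Euclidean distance. It assumes the similarity | |a| - b |^ell and a brute-force nearest-neighbor distance; the function names (`unsigned_distance`, `sample_near`, `sal_loss_l2`), the value of `sigma` and the toy circle data are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random

def unsigned_distance(z, points):
    """Brute-force Euclidean distance from z to the closest data point."""
    return min(math.dist(z, x) for x in points)

def sample_near(points, sigma, n, rng):
    """Pick data points uniformly and perturb each with isotropic Gaussian noise."""
    samples = []
    for _ in range(n):
        x = rng.choice(points)
        samples.append(tuple(c + rng.gauss(0.0, sigma) for c in x))
    return samples

def sal_loss_l2(f, points, samples, ell=1):
    """Monte Carlo estimate of E[ | |f(z)| - h(z) |^ell ] over the samples."""
    total = 0.0
    for z in samples:
        total += abs(abs(f(z)) - unsigned_distance(z, points)) ** ell
    return total / len(samples)

# Toy data (assumption): 200 unoriented points on the unit circle.
rng = random.Random(0)
circle = [(math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200))
          for i in range(200)]
samples = sample_near(circle, sigma=0.1, n=500, rng=rng)

signed = lambda z: math.hypot(*z) - 1.0   # signed distance to the circle
flipped = lambda z: -signed(z)            # same zero level-set, opposite sign
zero = lambda z: 0.0                      # trivial all-zero function

loss_signed = sal_loss_l2(signed, circle, samples)
loss_flipped = sal_loss_l2(flipped, circle, samples)
loss_zero = sal_loss_l2(zero, circle, samples)
```

In this toy setting the signed distance and its negation receive exactly the same loss value, since the function enters only through its absolute value, while the all-zero function is penalized at every sample that lies off the data.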
Although SAL can work with different parametric models, in this paper we consider a multilayer perceptron (MLP) defined by
In this paper we use or , where is a parameter. Furthermore, similarly to previous work we have incorporated a skip connection layer , concatenating the input to the middle hidden layer, that is , where here is a hidden variable in .
Notice that both the unsigned distance function and its signed version are local minima of the loss in equation 2. These local minima are stable in the sense that there is an energy barrier when moving from one to the other. For example, to get to a solution as in Figure 2(b) from the solution in Figure 2(d) one needs to flip the sign in the interior or exterior of the region defined by the black line. Changing the sign continuously will result in a considerable increase in the SAL loss value.
In the next section we elaborate on our initialization method, which in practice favors the signed version of the unsigned distance function.
Geometric network initialization
A key aspect of our method is a proper, geometrically motivated initialization of the network’s parameters. For MLPs (equations 8-9) we develop an initialization of the parameters so that the network approximates the signed distance function to a sphere of a prescribed radius. The following theorem specifies how to pick the initial parameters to achieve this:
Figure 3 depicts level-sets (zero level-sets in bold) using the initialization of Theorem 1 with the same 8-layer MLP and increasing width of 100, 200, and 2000 neurons in the hidden layers. Note how the approximation improves as the layers’ width increases, while the sphere-like (in this case circle-like) zero level-set remains topologically correct at all approximation levels.
The proof of Theorem 1 is provided in the supplementary material; it is a corollary of the following theorem, showing how to choose the initial weights for a single hidden layer network:
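A single-hidden-layer initialization of this kind can be sketched numerically as follows. The constants used here are an assumption chosen so that the expected network output equals the distance to the origin minus the radius: zero hidden biases, hidden weights drawn i.i.d. from a Gaussian with standard deviation `sigma`, all output weights equal to sqrt(2*pi) / (sigma * k) for `k` hidden units, and output bias `-r`. The helper names `init_single_layer` and `evaluate` are hypothetical.

```python
import math
import random

def init_single_layer(d, k, r, sigma=1.0, seed=0):
    """Return (W, w, c) for f(x) = w * sum(relu(W x)) + c with zero hidden biases.

    Assumed constants: W entries ~ N(0, sigma^2), a shared output weight
    w = sqrt(2*pi) / (sigma * k), and output bias c = -r.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]
    w = math.sqrt(2.0 * math.pi) / (sigma * k)
    return W, w, -r

def evaluate(params, x):
    """Evaluate the single-hidden-layer ReLU network at point x."""
    W, w, c = params
    hidden = (max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in W)
    return w * sum(hidden) + c

params = init_single_layer(d=3, k=4000, r=0.5)
for x in [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.3, -0.4, 0.0), (0.1, 0.1, 0.1)]:
    target = math.sqrt(sum(c * c for c in x)) - 0.5
    # Print the network value next to the signed distance to the sphere.
    print(x, round(evaluate(params, x), 3), round(target, 3))
```

The sketch relies on the fact that the positive part of a zero-mean Gaussian with standard deviation s has expected value s / sqrt(2*pi); averaging many hidden units therefore concentrates around a multiple of the norm of the input, and the approximation tightens as the width `k` grows.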
Properties
Plane reproduction is a key property for surface approximation methods since, in essence, surfaces are locally planar, i.e., have an approximating tangent plane almost everywhere. In this section we provide a theoretical justification for SAL by proving a plane reproduction property. We first show this property for a linear model (i.e., a single layer MLP) and then show how this implies local plane reproduction for general MLPs.
This theorem can be applied locally when optimizing a general MLP (equation 8) with SAL to prove local plane reproduction. See the supplementary material for more details.
Convergence to the limit signed function
The SAL loss pushes the neural implicit function towards a signed version of the unsigned distance function. In one case it is the inside/outside indicator function of the surface, while in the other (Euclidean) case it is a signed version of the Euclidean distance to the data. Figure 4 shows advanced epochs of the 2D experiment in Figure 2; note that the learned function in these advanced epochs is indeed closer to the signed version of the respective unsigned distance function. Since the indicator function and the signed Euclidean distance are discontinuous across the surface, they potentially impose quantization errors when using standard contouring algorithms, such as Marching Cubes, to extract their zero level-set. In practice, this phenomenon is avoided with a standard choice of stopping criteria (learning rate and number of iterations). Another potential solution is to add a regularization term to the SAL loss; we mark this as future work.
Experiments
Learning shape space from raw scans
In the main experiment of this paper we trained on the D-Faust scan dataset, consisting of approximately 41k raw scans of 10 humans in multiple poses (due to the dense temporal sampling in this dataset we experimented with a 1:5 sample). Each scan is a triangle soup, where common defects include holes, ghost geometry, and noise; see Figure 1 for examples.
We use the SAL loss with the Euclidean distance, i.e., the unsigned distance to the triangle soup, and combine it with a variational auto-encoder type loss:
where , is the 1-norm, encourages the latent prediction to be close to the origin, while encourages the variances to be constant ; together, these enforce a regularization on the latent space. is a balancing weight chosen to be .
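The structure of such a regularized objective can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `mu` stands for the predicted latent mean, `eta` for a per-coordinate variance parameter, and the target value `eta_target` and the weight `lam` are placeholders, since the concrete constants are not given in the text above. The function names are hypothetical.

```python
def latent_regularizer(mu, eta, eta_target=-1.0):
    """1-norm penalty: pull the mean to the origin and eta toward a constant.

    eta_target is a placeholder value, not taken from the text.
    """
    return sum(abs(m) for m in mu) + sum(abs(e - eta_target) for e in eta)

def vae_type_loss(reconstruction_loss, mu, eta, lam):
    """Reconstruction (SAL) term plus lam times the latent regularizer."""
    return reconstruction_loss + lam * latent_regularizer(mu, eta)

# A latent prediction at the origin with eta at its target is not penalized.
no_penalty = latent_regularizer([0.0, 0.0], [-1.0, -1.0])
some_penalty = latent_regularizer([1.0, -2.0], [0.0, -1.0])
```

The 1-norm treats every latent coordinate independently, so the two terms jointly act as a regularizer on the latent space while the balancing weight controls how strongly they compete with the reconstruction term.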
We compared SAL against three baseline methods. The first is AtlasNet, one of the only existing algorithms for learning a shape collection from raw point clouds. AtlasNet uses a parametric representation of surfaces, which is straightforward to sample. On the downside, it uses a collection of patches that tend not to overlap perfectly, and its loss requires computing closest points between the generated and input point clouds, which poses a challenge for learning large point clouds. For the other two baselines, we approximate a signed distance function to the data in two different ways and regress each with an MLP as in DeepSDF; we call these methods SignReg. Note that Occupancy Networks and related methods regress a different signed distance function and perform similarly.
To approximate the signed distance function, , we first tried using a state of the art surface reconstruction algorithm to produce watertight manifold surfaces. However, only 28684 shapes were successfully reconstructed ( of the dataset), making this option infeasible to compute . We have opted to approximate the signed distance function similar to with , where is the closest point to in and is the normal at . To approximate the normal we tested two options: (i) taking directly from the original scan with its original orientation; and (ii) using local normal estimation using Jets followed by consistent orientation procedure based on minimal spanning tree using the CGAL library .
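One common way to turn closest points and normals into an approximate signed distance is to project the offset from the closest data point onto its normal. The sketch below assumes this form; the function name `approx_signed_distance` and the toy planar data are hypothetical, and a brute-force search stands in for whatever nearest-neighbor structure is used in practice.

```python
import math

def approx_signed_distance(z, points, normals):
    """Project z - x* onto the normal at x*, the data point closest to z."""
    i = min(range(len(points)), key=lambda j: math.dist(z, points[j]))
    return sum(n * (a - b) for n, a, b in zip(normals[i], z, points[i]))

# Toy oriented samples (assumption): a grid on the plane {z = 0}, upward normals.
pts = [(float(i), float(j), 0.0) for i in range(-2, 3) for j in range(-2, 3)]
nrm = [(0.0, 0.0, 1.0)] * len(pts)

above = approx_signed_distance((0.2, 0.1, 0.5), pts, nrm)
below = approx_signed_distance((0.2, 0.1, -0.5), pts, nrm)

# Flipping the normals flips the sign of the regression target.
flipped = approx_signed_distance((0.2, 0.1, 0.5), pts, [(0.0, 0.0, -1.0)] * len(pts))
```

The last line illustrates why such baselines depend on the quality of the normal orientation: an incorrectly oriented normal produces a regression target with the wrong sign.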
Table 1 and Figure 6 show the results on a random 75%-25% train-test split of the D-Faust raw scans. We report the 5%, 50% (median), and 95% percentiles of the Chamfer distances between the surface reconstructions and the raw scans (one-sided Chamfer from reconstruction to scan), and between the reconstructions and the ground truth registrations. The SAL and SignReg reconstructions were generated by a forward pass of a point cloud sampled from the raw unseen scans, yielding an implicit function. We used the Marching Cubes algorithm to mesh the zero level-set of this implicit function. Then, we uniformly sampled 30K points from the mesh and computed the Chamfer distance.
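The evaluation protocol above can be illustrated with a small Python sketch of a one-sided Chamfer distance and a percentile summary. The exact normalization is an assumption: here the distance is the mean unsquared Euclidean distance from each source point to its nearest target point, and percentiles use the nearest-rank convention. The helper names and the toy point sets are hypothetical.

```python
import math

def one_sided_chamfer(source, target):
    """Mean distance from each source point to its nearest target point."""
    return sum(min(math.dist(s, t) for t in target) for s in source) / len(source)

def percentile(values, q):
    """Nearest-rank percentile of a list, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Toy point sets: the "scan" contains one far-away outlier (ghost geometry).
recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scan = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]

d_recon_to_scan = one_sided_chamfer(recon, scan)  # unaffected by the scan outlier
d_scan_to_recon = one_sided_chamfer(scan, recon)  # dominated by the scan outlier
```

Measuring from the reconstruction to the scan, as in the one-sided variant, keeps defects that exist only in the raw scan from inflating the error of an otherwise faithful reconstruction.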
Table 2 demonstrates that the latent optimization method further improves prediction quality compared to a single forward pass. In Figures 7 and 8 we show a few representative examples, where we plot, left to right in each column: the input test scan, the SAL reconstruction with a forward pass alone, and the SAL reconstruction with latent optimization. Failure cases are shown in the bottom-right. Despite the little variability of humans in the training dataset (only 8 humans), Figure 7 shows that SAL can usually fit a fairly good human shape to an unseen human scan using a single forward pass reconstruction; using latent optimization further improves the approximation, as can be seen in the different examples in this figure.
Figure 8 shows how a single forward reconstruction is able to predict the pose correctly, while latent optimization improves the prediction in terms of shape and pose.
SAL’s limitation is mainly in capturing thin structures. Figure 9 shows reconstructions (obtained similarly to Section 6.1) of a chair and a plane from the ShapeNet dataset; note that some parts of the chair back and the plane wheel structure are missing.
Conclusions
We introduced SAL: Sign Agnostic Learning, a deep learning approach for processing raw data without any preprocessing or need for ground truth normal data or inside/outside labeling. We have developed a geometric initialization formula for MLPs to approximate the signed distance function to a sphere, and a theoretical justification proving planar reproduction for SAL. Lastly, we demonstrated the ability of SAL to reconstruct high fidelity surfaces from raw point clouds, and that SAL easily integrates into standard generative models to learn shape spaces from raw geometric data. One limitation of SAL was mentioned in Section 5, namely the stopping criteria for the optimization.
Using SAL in other generative models such as generative adversarial networks could be an interesting follow-up. Another future direction is global reconstruction from partial data. Combining SAL with image data also has potentially interesting applications. We think SAL has many exciting future work directions, progressing geometric deep learning to work with unorganized, raw data.
The research was supported by the European Research Council (ERC Consolidator Grant, “LiftMatch” 771136), the Israel Science Foundation (Grant No. 1830/17) and by a research grant from the Carolito Stiftung (WAIC).
Appendix
We train the network in the surface reconstruction experiment with the Adam optimizer, learning rate for 5000 epochs. Training was done on an Nvidia V-100 GPU, using the PyTorch deep learning framework.
Learning shape space
Our network architecture is Encoder-Decoder based, where for the encoder we have used PointNet and DeepSets layers. Each layer is composed of
where is the concat operation. Our Architecture is
Training our networks for learning the shape space of the D-Faust dataset was done with the following choices. We have used the Adam optimizer , initialized with learning rate and batch size of . We scheduled the learning rate to decrease every 500 epochs by a factor of 0.5. We stopped the training process after 2000 epochs. Training was done on 4 Nvidia V-100 GPUs, using pytorch deep learning framework .
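The step schedule described above (halving every 500 epochs, stopping at 2000) can be written as a one-line rule. In the sketch below the initial learning rate is a placeholder argument, since its value is not given in the text, and the function name `step_lr` is hypothetical.

```python
def step_lr(initial_lr, epoch, step=500, factor=0.5):
    """Learning rate at a given epoch when decayed by `factor` every `step` epochs."""
    return initial_lr * factor ** (epoch // step)

# Placeholder initial value of 1.0, so the entries read as multipliers.
schedule = [step_lr(1.0, e) for e in (0, 499, 500, 1000, 1999)]
```

In PyTorch the same behavior is available through `torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)`, stepped once per epoch.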
Additional Experiments
One of the key advantages of SAL is that it can be used for reconstructing a surface from a single input scan or incorporated into a VAE architecture for learning a shape space from an entire dataset of scans. This raises an interesting question: does learning a shape space also have an impact on the quality of the reconstructions? To answer this question, we ran SAL surface reconstruction on each of the scans used for training the main experiment of the paper (see Table 2 for more details). When comparing our SAL VAE training results on the registrations (ground truth) versus SAL single reconstruction we see differences in favor of our VAE learner, whereas the results on the original scans are comparable. That is, SAL single reconstruction results are on the registrations and scans for the percentiles respectively.
Figure 10 shows reconstructions of test scans for different stages of training on the D-Faust dataset. Given the main paper discussion on the SAL limit signed function, we additionally include reconstructions from a relatively advanced epoch, showing that no contouring errors occur.
Proofs
For simplicity, we restrict our attention to absolutely continuous measures, that is, measures defined by a continuous density function. The generalization to measures with a discrete part (such as the one containing the delta distribution measure, for example) can be proven similarly.
Denoting , the loss in equation 2 can be written as
where .
As is anti-symmetric, is symmetric, i.e., and therefore
Plugging these in the integral after the change of variables we reach
The last integral is scalar and we denote its integrand by (remember that ), i.e.,
Let us show the existence of . Let such that and . Then,
Furthermore, since we have that
the existence of is established. The case of is proven similarly. ∎
Local plane reproduction with MLP
By ”reconstructs locally” we mean that is critical for the loss if is sufficiently concentrated around any point in .
where is the Heaviside function.
and are constant diagonal matrices. Over this domain we have
Proof of Theorem 1
It is enough to consider a single layer: . For brevity let . Now,
Note that the entries of are distributed i.i.d. . Hence by the law of large numbers the last term converges to
Proof of Theorem 2
For brevity we denote . Note that plugging in we get , where is the row of . Let denote the density of multivariate normal distribution . By the law of large numbers, the first term converges to
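The kind of limit produced by the law of large numbers in arguments like this one can be made concrete with a standard Gaussian integral. As a hedged illustration (the symbols $g$, $s$, $k$, $\sigma$ and $w_i$ are introduced only for this sketch and are not necessarily the notation of the proof), for a scalar $g \sim \mathcal{N}(0, s^2)$ the expected value of its positive part is

```latex
\mathbb{E}\big[(g)_+\big]
  = \int_0^\infty \frac{t}{s\sqrt{2\pi}}\, e^{-t^2/(2s^2)}\, dt
  = \frac{s}{\sqrt{2\pi}} \Big[-e^{-t^2/(2s^2)}\Big]_0^\infty
  = \frac{s}{\sqrt{2\pi}} .
```

Taking $g = w_i^T x$ with $w_i \sim \mathcal{N}(0, \sigma^2 I)$ gives $s = \sigma \|x\|$, so the average of $k$ independent terms $(w_i^T x)_+$ converges to $\sigma \|x\| / \sqrt{2\pi}$ as $k$ grows, which is proportional to the distance of $x$ from the origin.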