Neural network identifiability for a family of sigmoidal nonlinearities

Verner Vlačić, Helmut Bölcskei

I Introduction

Deep learning has become a highly successful machine learning method employed in a wide range of applications such as optical character recognition, image classification, and speech recognition. In a typical deep learning scenario one aims to fit a parametric model, realized by a deep neural network, to match a set of training data points. In order to make the ensuing discussion more concrete, we begin with the definition of a neural network and the map it realizes under a nonlinearity.

$L$ is a positive integer, referred to as the depth of $\mathcal{N}$ ,

$(D_{0},D_{1},\dots,D_{L})$ is an $(L+1)$ -tuple of positive integers, called the layout,

where $\rho$ acts on real vectors in a componentwise fashion.
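As a minimal sketch of the map realized by a network under a nonlinearity, the following Python snippet evaluates a layered network stored as a list of weight/bias pairs. The function name `realize`, the list-of-rows storage, and the example weights are illustrative assumptions. In particular, the sketch assumes that $\rho$ is applied after every affine layer, including the last one, which should be checked against Definition 2.

```python
import math

def realize(network, rho, x):
    """Evaluate a layered network at the input vector x.

    network: list of (W, theta) pairs, with W stored as a list of rows.
    Assumption: rho is applied componentwise after every affine layer,
    including the last one.
    """
    for W, theta in network:
        x = [rho(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W, theta)]
    return x

# Hypothetical network with layout (D_0, D_1, D_2) = (2, 3, 1).
net = [
    ([[1.0, -2.0], [0.5, 0.3], [-1.0, 1.0]], [0.1, -0.2, 0.0]),
    ([[2.0, -1.0, 0.5]], [0.3]),
]
y = realize(net, math.tanh, [0.4, -0.7])
```

The layout can be read off from the shapes: each `W` has $D_{\ell}$ rows of length $D_{\ell-1}$.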

Given positive integers $D_{in}$ and $D_{out}$ , define $\mathscr{N}^{D_{in},D_{out}}$ to be the set of all neural networks whose layouts $(D_{0},\dots,D_{L})$ satisfy $D_{0}=D_{in}$ and $D_{L}=D_{out}$ , but are otherwise arbitrary. Let $\mathscr{N}$ be a subset of $\mathscr{N}^{D_{in},D_{out}}$ , $\rho$ a nonlinearity, and $\sim$ an equivalence relation on $\mathscr{N}^{D_{in},D_{out}}$ .

We say that $\sim$ is compatible with $(\mathscr{N},\rho)$ if, for all $\mathcal{N}_{1},\mathcal{N}_{2}\in\mathscr{N}$ ,

We say that $(\mathscr{N},\rho)$ is identifiable up to $\sim$ if, for all $\mathcal{N}_{1},\mathcal{N}_{2}\in\mathscr{N}$ ,

Thus, by informally saying that a neural network $\mathcal{N}_{1}$ in a certain class is identifiable, we mean that any neural network $\mathcal{N}_{2}$ in the same class giving rise to the same output map, i.e., $\langle{\mathcal{N}_{1}}\rangle^{\rho}=\langle{\mathcal{N}_{2}}\rangle^{\rho}$ , is necessarily equivalent to $\mathcal{N}_{1}$ . The role of the equivalence relation $\sim$ in the previous definition is thus to “measure the degree of non-uniqueness”, and in particular, to accommodate symmetries within the network that may arise either from symmetries induced by the network weights and biases (such as the presence of clone pairs, to be introduced in Definition 5), symmetries of the nonlinearity (e.g., $\tanh$ is odd), or both simultaneously. These abstract concepts will be incarnated momentarily when discussing the seminal work by Fefferman, and in Section II through Definitions 4 and 5, as well as in the examples leading up to the formulation of the paper’s main results.
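The symmetry induced by the oddness of $\tanh$ can be illustrated numerically. The sketch below negates the incoming weights and bias of one hidden neuron together with its outgoing weights, and the realized map is unchanged. The helper names `realize` and `flip_neuron` and the example weights are hypothetical, and the sketch assumes that the nonlinearity is applied after every layer.

```python
import math

def realize(network, rho, x):
    # Assumption: rho is applied after every affine layer.
    for W, theta in network:
        x = [rho(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W, theta)]
    return x

def flip_neuron(network, layer, j):
    """Negate the incoming weights and bias of neuron j in `layer`
    (0-based) and the weights leaving it into the next layer."""
    net = [([row[:] for row in W], theta[:]) for W, theta in network]
    W, theta = net[layer]
    W[j] = [-w for w in W[j]]
    theta[j] = -theta[j]
    W_next, _ = net[layer + 1]
    for row in W_next:
        row[j] = -row[j]
    return net

net = [
    ([[1.0, -2.0], [0.5, 0.3]], [0.1, -0.2]),
    ([[2.0, -1.0]], [0.3]),
]
flipped = flip_neuron(net, layer=0, j=1)
points = [[0.4, -0.7], [1.5, 2.0], [-3.0, 0.1]]
same_map = all(
    math.isclose(realize(net, math.tanh, x)[0],
                 realize(flipped, math.tanh, x)[0], abs_tol=1e-12)
    for x in points
)
```

The two parameter tuples differ, yet they realize the same function, which is why identifiability for $\tanh$ can only hold up to such sign changes.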

Fefferman showed that neural networks satisfying the following genericity conditions are, indeed, uniquely determined by the map they realize under the nonlinearity $\rho=\tanh$ , up to certain obvious isomorphisms of networks:

More precisely, for fixed positive integers $D_{in}$ and $D_{out}$ , Fefferman showed that $(\mathscr{N}_{A1}^{D_{in},D_{out}},\tanh)$ is identifiable up to $\sim_{\pm}$ , where $\mathscr{N}_{A1}^{D_{in},D_{out}}$ is defined as the set of all neural networks in $\mathscr{N}^{D_{in},D_{out}}$ satisfying Assumptions 1, and $\sim_{\pm}$ is defined by stipulating that $\mathcal{N}\sim_{\pm}\widetilde{\mathcal{N}}$ if and only if

$L=\widetilde{L}$ and $(D_{0},D_{1},\dots,D_{L})=(\widetilde{D}_{0},\widetilde{D}_{1},\dots,\widetilde{D}_{L})$ , and

Indeed, Fefferman remarks explicitly that it would be interesting to replace Assumptions 1 with minimal hypotheses, and to study nonlinearities other than $\tanh$ . The present paper aims to address these two issues. Characterizing the fundamental nature of conditions necessary for identifiability with respect to a fixed nonlinearity, even a simple one such as $\tanh$ , is likely a rather formidable task. In fact, the minimal identifiability conditions may generally depend on “fine” properties of the nonlinearity under consideration, and it is hence unclear how much insight can be obtained by having conditions that are specific to a given nonlinearity. We will thus be interested in an identification result with very mild conditions on the weights and biases of the neural networks to be identified, while still accommodating a broad class of nonlinearities.

II Contributions

We begin with two motivating examples. These lead up to the statements of our main contributions, whose corresponding proofs are developed in the remainder of the paper. We consider nonlinearities $\rho$ which are not necessarily odd (as $\tanh$ is), and thus need an equivalence relation which dispenses with sign changes.

We say that the neural networks $\mathcal{N}$ and $\widetilde{\mathcal{N}}$ are isomorphic, and write $\mathcal{N}\simeq\widetilde{\mathcal{N}}$ , if

$L=\widetilde{L}$ and $(D_{0},D_{1},\dots,D_{L})=(\widetilde{D}_{0},\widetilde{D}_{1},\dots,\widetilde{D}_{L})$ , and

If $\mathcal{N}$ does not have a clone pair, we say that $\mathcal{N}$ satisfies the no-clones condition.
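To see why clone pairs obstruct identifiability, consider the following sketch. It assumes that a clone pair consists of two neurons in the same layer with identical incoming weights and biases (the precise statement is Definition 5). Two such neurons always produce the same value, so only the sum of their outgoing weights matters, regardless of the nonlinearity. The helper `realize`, the logistic nonlinearity, and all numerical values are illustrative choices.

```python
import math

def realize(network, rho, x):
    # Assumption: rho is applied after every affine layer.
    for W, theta in network:
        x = [rho(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W, theta)]
    return x

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Two hidden neurons with identical incoming weights and biases.
first_layer = ([[0.7, -1.2], [0.7, -1.2]], [0.4, 0.4])
# Outgoing weights differ, but their sums agree (1.0 + 2.0 == 2.5 + 0.5).
net_a = [first_layer, ([[1.0, 2.0]], [0.1])]
net_b = [first_layer, ([[2.5, 0.5]], [0.1])]

points = [[0.0, 0.0], [1.0, -2.0], [-0.3, 4.0]]
same_map = all(
    math.isclose(realize(net_a, logistic, x)[0],
                 realize(net_b, logistic, x)[0], abs_tol=1e-12)
    for x in points
)
```

No permutation of the two hidden neurons maps the outgoing weights $(1.0, 2.0)$ to $(2.5, 0.5)$, so the two networks realize the same map without being related by a relabeling of neurons.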

As the nonlinearity $\rho$ in the example above is completely arbitrary, the no-clones condition is necessary to have any hope of obtaining identifiability up to $\simeq$ . Hence, with our program in mind, given positive integers $D_{in}$ and $D_{out}$ , we define

and seek nonlinearities $\rho$ such that $(\mathscr{N}^{D_{in},D_{out}}_{nc},\rho)$ is identifiable up to $\simeq$ . As any class strictly containing $\mathscr{N}^{D_{in},D_{out}}_{nc}$ , paired with any nonlinearity, fails identifiability up to $\simeq$ , the no-clones condition furnishes a canonical minimal assumption for identifiability up to $\simeq$ . Similarly to $\mathscr{N}^{D_{in},D_{out}}_{A1}$ , the class $\mathscr{N}^{D_{in},D_{out}}_{nc}$ , paired with any measurable nonlinearity $\rho$ such that $\displaystyle\lim_{x\to\infty}\rho(x)$ and $\displaystyle\lim_{x\to-\infty}\rho(x)$ exist and are not equal, satisfies the universal approximation property in the sense of Hornik and Cybenko. The following example demonstrates that insisting on the no-clones condition as the only assumption on the weights, biases, and layout will necessarily come at the cost of restricting the class of nonlinearities that allow for identifiability. Let $\rho(x)=\min\{1,\max\{0,x\}\}$ be the clipped rectified linear unit (ReLU) function. Note that

Now, given an arbitrary neural network $\mathcal{N}=(W^{1},\theta^{1},W^{2},\theta^{2},\dots,W^{L},\theta^{L})$ with $D_{L}=1$ satisfying the no-clones condition, the network

Concretely, we have the following main result of this paper.

The function $\bm{1}$ is included in the linearly independent set both for the sake of greater generality of the statement, and to facilitate the proof of Theorem 2.

for $m\in\{1,2,3,4\}$ . As $\mathcal{N}$ satisfies the no-clones condition, the networks $\mathcal{N}_{m}$ , $m\in\{1,2,3,4\}$ , also satisfy the no-clones condition, and are pairwise non-isomorphic.

which stands in contradiction to Theorem 2. This completes the proof of Theorem 1.

$\sigma$ is $i$ -periodic, i.e., $\sigma(z+i)=\sigma(z)$ , for all $z\in\mathcal{D}$ , and

Amalgamation. In Section III we construct a neural network $\mathcal{M}\in\mathscr{N}^{1,n}_{nc}$ , called the amalgam of $\{\mathcal{N}_{j}\}_{j=1}^{n}$ , containing each $\mathcal{N}_{j}$ as a subnetwork. In particular, we have ${(\langle{\mathcal{M}}\rangle^{\sigma})}_{j}=\langle{\mathcal{N}_{j}}\rangle^{\sigma}$ , for all $j\in\{1,\dots,n\}$ . The linear dependence of $\{\langle{\mathcal{N}_{j}}\rangle^{\sigma}\}_{j=1}^{n}\cup\{\bm{1}\}$ thus translates to

Input anchoring. We then construct a third network $\mathcal{N}\in\mathscr{M}$ , obtained by fixing $k-1$ of the $k$ inputs of $\mathcal{M}^{\prime\prime}$ to specific real numbers, and “cutting out” all the parts of the network whose contributions to the output map have become constant in the process. The resulting network $\mathcal{N}$ will be a network in $\mathscr{M}$ of size smaller than $\mathcal{M}^{\prime}$ , which contradicts the minimality of $\mathcal{M}^{\prime}$ , and thereby completes the proof.

Input anchoring. Finally, we apply an input anchoring procedure to $\mathcal{M}^{\prime\prime}$ similar to the one described above. Even though now $\mathcal{M}^{\prime\prime}$ is not a network in the sense of Definition 1, the input anchoring procedure will result in a network $\mathcal{N}\in\mathscr{M}$ which is a network in the sense of Definition 1, and is of smaller size than $\mathcal{M}^{\prime}$ , again completing the proof by contradiction.

We conclude this section by laying out the organization of the remainder of the paper. In Section III we develop a graph-theoretic framework needed to define amalgams of neural networks and several other technical concepts. In Section IV we state results from complex analysis and Kronecker’s theorem needed in arguments involving analytic continuation and input splitting, respectively. The proofs of these results are relegated to the Appendix. In Section V we discuss the fine structural properties of the function $\sigma$ constructed in the proof of Theorem 2. Finally, Section VI contains the proofs of our two main results.

III Directed acyclic graphs, general neural networks, and neural network amalgams

As already mentioned, in the proof of Theorem 2 we will work with a form of neural networks that does not fit in with Definitions 1 and 2. In order to accommodate this notion of neural networks, and to lighten the manipulations needed to formalize the aforementioned techniques of amalgamation and input anchoring, we introduce a graph-theoretic framework.

We start by introducing the concept of a directed acyclic graph (DAG), commonly encountered in the graph theory literature.

A directed graph is an ordered pair $G=(V,E)$ where $V$ is a finite set of nodes, and $E\subset V\times V$ is a set of directed edges.

A directed cycle of a directed graph $G$ is a set $\{v_{1},\dots,v_{k}\}\subset V$ such that, for every $j\in\{1,\dots,k\}$ , $(v_{j},v_{j+1})\in E$ , where we set $v_{k+1}\vcentcolon=v_{1}$ .

A directed graph $G$ is said to be a directed acyclic graph (DAG) if it has no directed cycles.

We interpret an edge $(v,\widetilde{v})$ as an arrow connecting the nodes $v$ and $\widetilde{v}$ and pointing at $\widetilde{v}$ .
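The definitions above translate directly into a short acyclicity test. The sketch below uses Kahn's algorithm, a standard technique that is not taken from the paper, to decide whether a directed graph is a DAG. As a by-product it records, for each node, the length of the longest directed path ending at that node. Treating this quantity as the level of a node is an assumption made here for illustration. The function name `dag_levels` is hypothetical.

```python
from collections import deque

def dag_levels(nodes, edges):
    """Return {node: longest path length ending at node} if the graph
    (nodes, edges) is acyclic, and None if it has a directed cycle."""
    children = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    level = {v: 0 for v in nodes}
    queue = deque(v for v in nodes if indegree[v] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in children[u]:
            # Every parent of v is processed before v leaves the queue.
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a directed cycle never reach indegree zero.
    return level if processed == len(nodes) else None

acyclic = dag_levels({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("a", "c")})
cyclic = dag_levels({"a", "b"}, {("a", "b"), ("b", "a")})
```

In the first graph the edge from `a` to `c` skips a level, which is permitted in a general DAG but not in a layered architecture.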

Since the graph $G$ in Definition 7 is assumed to be acyclic, the level is well-defined for all nodes of $G$ . We are now ready to introduce our generalized definition of a neural network.

A general feed-forward neural network (GFNN) is an ordered sextuple $\mathcal{N}=(V,E,V_{in},V_{out},\Omega,\Theta)$ , where

$G=(V,E)$ is a DAG, called the architecture of $\mathcal{N}$ ,

$V_{out}\subset V\setminus V_{in}$ is the set of outputs of $\mathcal{N}$ ,

Let $\mathcal{N}=(V,E,V_{in},V_{out},\Omega,\Theta)$ be a GFNN. A subnetwork of $\mathcal{N}$ is a GFNN $\mathcal{N}^{\prime}=(V^{\prime},E^{\prime},V_{in}^{\prime},V_{out}^{\prime},\Omega^{\prime},\Theta^{\prime})$ such that there exists a set $S\subset V$ so that

$E^{\prime}=\{(v,\widetilde{v})\in E:v,\widetilde{v}\in V^{\prime}\}$ ,

$\Omega^{\prime}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{\prime}\}$ , and

$\Theta^{\prime}=\{\theta_{v}:v\in V^{\prime}\}$ .

If additionally $V_{out}^{\prime}=S$ , then $\mathcal{N}^{\prime}$ is uniquely specified by $S$ . In this case we say that $\mathcal{N}^{\prime}$ is the ancestor subnetwork of $S$ in $\mathcal{N}$ , and write $\mathcal{N}(S)$ for this network.
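The ancestor subnetwork can be computed by a backward graph search. In the sketch below a GFNN is stored as a dictionary whose `weights` entry maps an edge $(v,\widetilde{v})$ to $\omega_{\widetilde{v}v}$. This storage format, the function names, and the choice $V_{in}^{\prime}=V_{in}\cap V^{\prime}$ are assumptions made for illustration.

```python
def ancestor_nodes(edges, S):
    """Nodes from which some node of S can be reached, together with S."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found, stack = set(S), list(S)
    while stack:
        v = stack.pop()
        for u in parents.get(v, ()):
            if u not in found:
                found.add(u)
                stack.append(u)
    return found

def ancestor_subnetwork(net, S):
    """Restrict net to the ancestors of S and declare S the output set."""
    V = ancestor_nodes(net["weights"], S)
    return {
        "nodes": V,
        "inputs": net["inputs"] & V,  # assumption: inputs are restricted to V
        "outputs": set(S),
        "weights": {e: w for e, w in net["weights"].items()
                    if e[0] in V and e[1] in V},
        "biases": {v: t for v, t in net["biases"].items() if v in V},
    }

# Hypothetical network with two inputs and two outputs.
net = {
    "nodes": {"x1", "x2", "h1", "h2", "o1", "o2"},
    "inputs": {"x1", "x2"},
    "outputs": {"o1", "o2"},
    "weights": {("x1", "h1"): 1.0, ("x2", "h2"): -2.0,
                ("h1", "o1"): 0.5, ("h1", "o2"): 1.5, ("h2", "o2"): 3.0},
    "biases": {"h1": 0.1, "h2": 0.2, "o1": 0.0, "o2": -1.0},
}
sub = ancestor_subnetwork(net, {"o1"})
```

Here the ancestor subnetwork of `{"o1"}` discards `x2`, `h2`, and `o2`, since none of them lies on a directed path into `o1`.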

We will treat nodes $v\in V$ only as “handles”, and never as variables or functions. This is relevant when dealing with several networks with shared nodes, such as depicted in Figure 2. On the other hand, the output map $\left\langle{v}\right\rangle^{\rho}$ realized by $v$ is a function.
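The distinction between a node and the map it realizes can be made concrete as follows. The sketch assumes the recursion $\left\langle{v}\right\rangle^{\rho}=\rho\big(\sum_{u}\omega_{vu}\left\langle{u}\right\rangle^{\rho}+\theta_{v}\big)$ over the parents $u$ of a non-input node $v$, with input nodes realizing the coordinate projections. The dictionary storage and the name `node_map` are hypothetical.

```python
import math

def node_map(net, rho, v, x):
    """Value at the input assignment x (a dict) of the map realized by node v."""
    if v in net["inputs"]:
        return x[v]
    total = net["biases"][v]
    for (u, target), omega in net["weights"].items():
        if target == v:
            total += omega * node_map(net, rho, u, x)
    return rho(total)

# Hypothetical non-layered network: the output reads both h and the input x.
net = {
    "nodes": {"x", "h", "o"},
    "inputs": {"x"},
    "outputs": {"o"},
    "weights": {("x", "h"): 2.0, ("h", "o"): 1.0, ("x", "o"): -1.0},
    "biases": {"h": 0.0, "o": 0.5},
}
value = node_map(net, math.tanh, "o", {"x": 0.3})
expected = math.tanh(math.tanh(2.0 * 0.3) - 0.3 + 0.5)
```

The node labels such as `"h"` serve purely as dictionary keys, whereas `node_map` returns the value of the associated function.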

It follows that the natural domain $\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}}$ of a node $u$ is open, as it is the preimage of an open set with respect to a continuous map. Moreover, the output map $\left\langle{u}\right\rangle^{\sigma}$ realized by $u$ is holomorphic on $\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}}$ , as it is given explicitly by a concatenation of affine maps and the nonlinearity $\sigma$ , which are themselves holomorphic functions.

The following definition is a straightforward generalization of Definition 5.

The following definition generalizes Definition 4 to GFNNs, and introduces two new concepts, termed extensional isomorphism and faithful isomorphism, which will play an important technical role throughout the remainder of the paper.

Let $\mathcal{N}^{1}=(V^{1},E^{1},V_{in},V_{out}^{1},\Omega^{1},\Theta^{1})$ and $\mathcal{N}^{2}=(V^{2},E^{2},V_{in},V_{out}^{2},\Omega^{2},\Theta^{2})$ be GFNNs with the same input nodes $V_{in}$ .

We say that $\mathcal{N}^{1}$ and $\mathcal{N}^{2}$ are extensionally isomorphic, and write $\mathcal{N}^{1}\stackrel{{\scriptstyle e}}{{\sim}}\mathcal{N}^{2}$ , if there exists a bijection $\pi:V^{1}\to V^{2}$ , called an extensional isomorphism, such that the following holds:

$\pi$ restricted to $V_{in}$ is the identity map,

for all $(v,\widetilde{v})\in E^{1}$ , we have $\omega^{2}_{\pi(\widetilde{v})\pi(v)}=\omega^{1}_{\widetilde{v}v}$ , and

for all $v\in V^{1}\setminus V_{in}$ , we have $\theta^{2}_{\pi(v)}=\theta^{1}_{v}$ .

We say that $\mathcal{N}^{1}$ and $\mathcal{N}^{2}$ are faithfully isomorphic, and write $\mathcal{N}^{1}\stackrel{{\scriptstyle f}}{{\sim}}\mathcal{N}^{2}$ , if they are extensionally isomorphic via $\pi:V^{1}\to V^{2}$ with the following additional property:

$V_{out}^{1}=V_{out}^{2}$ , and $\pi$ restricted to $V_{out}^{1}$ is the identity map.

In this case we call $\pi$ a faithful isomorphism.

The concept of faithful isomorphisms in Definition 14 generalizes that of isomorphisms according to Definition 4. It is easily seen that extensional isomorphism is an equivalence relation on the set of all GFNNs with the same input nodes, whereas faithful isomorphism is an equivalence relation on the set of all GFNNs with the same input and output nodes. Furthermore, if $\mathcal{N}^{1}\stackrel{{\scriptstyle e}}{{\sim}}\mathcal{N}^{2}$ via $\pi:V^{1}\to V^{2}$ , then we have $\left\langle{\pi(v)}\right\rangle^{\rho,\,\mathcal{N}^{2}}=\left\langle{v}\right\rangle^{\rho,\,\mathcal{N}^{1}}$ , for all $v\in V^{1}$ and any nonlinearity $\rho$ , and if additionally $\mathcal{N}^{1}\stackrel{{\scriptstyle f}}{{\sim}}\mathcal{N}^{2}$ , then $\left\langle{\mathcal{N}^{1}}\right\rangle^{\rho}=\left\langle{\mathcal{N}^{2}}\right\rangle^{\rho}$ .
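The conditions of Definition 14 are easy to verify mechanically for a candidate bijection. The sketch below stores networks as dictionaries and checks that a given map `pi` is an extensional or a faithful isomorphism. The storage format and function names are hypothetical, and the sketch assumes that `pi` must also map the edge set of the first network onto that of the second.

```python
def is_extensional_isomorphism(n1, n2, pi):
    """Check that the dict pi is a bijection V^1 -> V^2 that fixes the
    inputs and preserves edges, weights, and biases."""
    if set(pi) != n1["nodes"] or set(pi.values()) != n2["nodes"]:
        return False
    if len(set(pi.values())) != len(pi):  # injectivity
        return False
    if n1["inputs"] != n2["inputs"] or any(pi[v] != v for v in n1["inputs"]):
        return False
    mapped = {(pi[u], pi[v]): w for (u, v), w in n1["weights"].items()}
    if mapped != n2["weights"]:  # same edges carrying the same weights
        return False
    return all(n2["biases"][pi[v]] == n1["biases"][v]
               for v in n1["nodes"] - n1["inputs"])

def is_faithful_isomorphism(n1, n2, pi):
    return (is_extensional_isomorphism(n1, n2, pi)
            and n1["outputs"] == n2["outputs"]
            and all(pi[w] == w for w in n1["outputs"]))

# Two hypothetical networks that differ only in the label of the hidden node.
n1 = {"nodes": {"x", "h", "o"}, "inputs": {"x"}, "outputs": {"o"},
      "weights": {("x", "h"): 1.0, ("h", "o"): 2.0},
      "biases": {"h": 0.5, "o": 0.0}}
n2 = {"nodes": {"x", "g", "o"}, "inputs": {"x"}, "outputs": {"o"},
      "weights": {("x", "g"): 1.0, ("g", "o"): 2.0},
      "biases": {"g": 0.5, "o": 0.0}}
pi = {"x": "x", "h": "g", "o": "o"}
```

Finding such a bijection is a separate search problem that the sketch does not address.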

We say that a GFNN $\mathcal{N}=(V,E,V_{in},V_{out},\Omega,\Theta)$ is non-degenerate if

$V=V^{\mathcal{N}(V_{out})}$ , where $V^{\mathcal{N}(V_{out})}$ is the set of nodes of the ancestor subnetwork of $V_{out}$ in $\mathcal{N}$ . Networks that are not non-degenerate are referred to as degenerate.

Informally, a network is non-degenerate if every one of its nodes “leads up” to at least one output. This notion is best understood with the help of examples as in Figure 3.
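Non-degeneracy amounts to a reachability check. The following sketch collects all nodes that lead up to some output by a backward search and compares the result with the full node set. The function name and the example graphs are hypothetical.

```python
def is_non_degenerate(nodes, edges, outputs):
    """True if every node is an output or an ancestor of an output."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    reached, stack = set(outputs), list(outputs)
    while stack:
        v = stack.pop()
        for u in parents.get(v, ()):
            if u not in reached:
                reached.add(u)
                stack.append(u)
    return reached == set(nodes)

# The node "d" receives the input but feeds into no output.
degenerate = is_non_degenerate(
    {"x", "h", "d", "o"}, {("x", "h"), ("h", "o"), ("x", "d")}, {"o"})
healthy = is_non_degenerate(
    {"x", "h", "o"}, {("x", "h"), ("h", "o")}, {"o"})
```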

We are now ready to introduce the concept of amalgams of LFNNs.

Let $\mathcal{A}=(V^{\mathcal{A}},E^{\mathcal{A}},V_{in},V_{out}^{\mathcal{A}},\Omega^{\mathcal{A}},\Theta^{\mathcal{A}})$ be a non-degenerate LFNN with the following properties:

There exist injective maps $\pi_{1}:V^{1}\to\pi_{1}(V^{1})\subset V^{\mathcal{A}}$ and $\pi_{2}:V^{2}\to\pi_{2}(V^{2})\subset V^{\mathcal{A}}$ such that the networks $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ are extensionally isomorphic to the ancestor subnetworks $\mathcal{A}(\pi_{1}(V_{out}^{1}))$ and $\mathcal{A}(\pi_{2}(V_{out}^{2}))$ via $\pi_{1}$ and $\pi_{2}$ , respectively.

$V^{\mathcal{A}}=\pi_{1}(V^{1})\cup\pi_{2}(V^{2})$ and $V_{out}^{\mathcal{A}}=\pi_{1}(V_{out}^{1})\cup\pi_{2}(V_{out}^{2})$ .

We then say that $\mathcal{A}$ is a proto-amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ .

If $\mathcal{A}$ is a clones-free proto-amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ , we say that $\mathcal{A}$ is an amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ .

Let $\mathcal{N}_{1}=(V^{1},E^{1},V_{in},V_{out}^{1},\Omega^{1},\Theta^{1})$ and $\mathcal{N}_{2}=(V^{2},E^{2},V_{in},V_{out}^{2},\Omega^{2},\Theta^{2})$ be non-degenerate clones-free LFNNs with a shared input set $V_{in}$ . Then there exists an amalgam $\mathcal{A}$ of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ . Moreover, the amalgam is unique up to extensional isomorphisms.

As asserted in Proposition 1 (whose proof is deferred to the Appendix), an amalgam of two given non-degenerate clones-free LFNNs $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ always exists and is unique up to extensional isomorphisms. With slight abuse of notation, we will write $\mathcal{N}_{1}\vee\mathcal{N}_{2}$ for an arbitrary element of the equivalence class (induced by $\stackrel{{\scriptstyle e}}{{\sim}}$ ) of all the amalgams of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ . A concrete example of an amalgam construction is provided in Figure 4. Having defined the amalgam of two non-degenerate clones-free LFNNs, we define the amalgam of any finite collection $\mathcal{N}_{1},\dots,\mathcal{N}_{n}$ of non-degenerate clones-free LFNNs according to

By Definition 16, $\bigvee_{k=1}^{n}\mathcal{N}_{k}$ is a non-degenerate clones-free LFNN. Moreover, there exist extensional isomorphisms $\pi_{j}:\mathcal{N}_{j}\to\pi_{j}(\mathcal{N}_{j})\subset\bigvee_{k=1}^{n}\mathcal{N}_{k}$ , for $j\in\{1,\dots,n\}$ , and we have $\left\langle{\pi_{j}(v)}\right\rangle^{\rho,\,\bigvee_{k=1}^{n}\mathcal{N}_{k}}=\left\langle{v}\right\rangle^{\rho,\,\mathcal{N}_{j}}$ , for $j\in\{1,\dots,n\}$ , $v\in V^{\mathcal{N}_{j}}$ , and any nonlinearity $\rho$ .
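One way to picture the amalgam is as a merge of two networks in which nodes with identical ancestor subnetworks are identified. The sketch below implements this idea by assigning each node a recursive key built from its bias and the keys and weights of its parents. This is an illustrative construction under the assumption that clones are nodes with the same parents, incoming weights, and bias. It is not the proof of Proposition 1, and all names and the dictionary storage are hypothetical.

```python
def canonical_keys(net):
    """Map every node to a key that encodes its ancestor subnetwork."""
    parents = {}
    for (u, v), omega in net["weights"].items():
        parents.setdefault(v, []).append((u, omega))
    keys = {}

    def key(v):
        if v not in keys:
            if v in net["inputs"]:
                keys[v] = v  # shared input nodes keep their identity
            else:
                keys[v] = (net["biases"][v],
                           frozenset((key(u), w) for u, w in parents.get(v, [])))
        return keys[v]

    for v in net["nodes"]:
        key(v)
    return keys

def amalgam(net1, net2):
    """Merge two networks with shared inputs; equal keys are identified."""
    merged = {"nodes": set(), "inputs": set(net1["inputs"]),
              "outputs": set(), "weights": {}, "biases": {}}
    embeddings = []
    for net in (net1, net2):
        k = canonical_keys(net)
        embeddings.append(k)
        merged["nodes"].update(k.values())
        merged["outputs"].update(k[w] for w in net["outputs"])
        for (u, v), omega in net["weights"].items():
            merged["weights"][(k[u], k[v])] = omega
        for v, theta in net["biases"].items():
            merged["biases"][k[v]] = theta
    return merged, embeddings

n1 = {"nodes": {"x", "h", "o1"}, "inputs": {"x"}, "outputs": {"o1"},
      "weights": {("x", "h"): 1.0, ("h", "o1"): 2.0},
      "biases": {"h": 0.5, "o1": 0.0}}
n2 = {"nodes": {"x", "g", "o2"}, "inputs": {"x"}, "outputs": {"o2"},
      "weights": {("x", "g"): 1.0, ("g", "o2"): 3.0},
      "biases": {"g": 0.5, "o2": 0.0}}
merged, (pi1, pi2) = amalgam(n1, n2)
```

In this example the hidden nodes `h` and `g` have the same incoming weight and bias, so they are merged into a single node, while the two outputs remain distinct.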

We are now in a position to prove two lemmas that form the basis for the proof of Theorem 2. The first lemma formalizes the idea of combining multiple pairwise non-isomorphic single-output networks with linearly dependent output maps into one multiple-output network with linear dependency among the maps of its output nodes.

For all $w\in V_{out}^{{\mathcal{M}_{a}}}$ ,

is constant, and we denote its value by $\left\langle{w}\right\rangle^{\rho,\,\mathcal{M}}\!\left(a\right)$ .

$V^{{\mathcal{M}_{a}}}=\{v\in V^{\mathcal{M}}:\{v_{1}^{0},\dots,v_{D_{0}-1}^{0}\}\cap V^{\mathcal{M}(v)}\neq\varnothing\}$ , where $\mathcal{M}(v)$ denotes the ancestor subnetwork of $v$ ,

$E^{{\mathcal{M}_{a}}}=\{(v,\widetilde{v})\in E^{\mathcal{M}}:v,\widetilde{v}\in V^{{\mathcal{M}_{a}}}\}$ ,

$V_{in}^{{\mathcal{M}_{a}}}=\{v_{1}^{0},\dots,v_{D_{0}-1}^{0}\}$ , $V_{out}^{{\mathcal{M}_{a}}}=V_{out}^{\mathcal{M}}\cap V^{{\mathcal{M}_{a}}}$ , and

$\Omega^{{\mathcal{M}_{a}}}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{{\mathcal{M}_{a}}}\}$ .

For a node $v\in V^{\mathcal{M}}\setminus V^{{\mathcal{M}_{a}}}$ we define recursively

and set $\Theta^{{\mathcal{M}_{a}}}=\{\widetilde{\theta}_{v}:v\in V^{{\mathcal{M}_{a}}}\}$ .

The network ${\mathcal{M}_{a}}$ satisfies (IA-1) and (IA-2) by construction, and if $\mathcal{M}$ is layered, then so is ${\mathcal{M}_{a}}$ . Moreover, ${\mathcal{M}_{a}}$ is non-degenerate. To see this, let $v\in V^{\mathcal{M}_{a}}$ be arbitrary. Then, by non-degeneracy of $\mathcal{M}$ , there exists a $w\in V^{\mathcal{M}}_{out}$ such that $v\in V^{\mathcal{M}(w)}$ . As $w$ is connected directly with a node in $V^{\mathcal{M}_{a}}$ , it follows that $w\in V^{{\mathcal{M}_{a}}}$ , and so $w\in V_{out}^{\mathcal{M}_{a}}$ .
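The anchoring construction can be sketched in code as follows. One input is fixed to a real number $a$, every node that no longer depends on a free input is evaluated to a constant, and these constants are absorbed into the biases of the remaining nodes. The recursion used for the constants and the bias update are reconstructed from the description above and should be read as assumptions, as should the dictionary storage and all names.

```python
import math

def node_map(net, rho, v, x):
    if v in net["inputs"]:
        return x[v]
    total = net["biases"][v]
    for (u, target), omega in net["weights"].items():
        if target == v:
            total += omega * node_map(net, rho, u, x)
    return rho(total)

def anchor_input(net, rho, anchored, a):
    """Fix the input `anchored` to the value a and remove constant nodes."""
    parents = {}
    for (u, v), omega in net["weights"].items():
        parents.setdefault(v, []).append((u, omega))
    free_inputs = net["inputs"] - {anchored}
    depends, const = {}, {}

    def depends_on_free(v):  # does v have a free input among its ancestors?
        if v not in depends:
            depends[v] = v in free_inputs or any(
                depends_on_free(u) for u, _ in parents.get(v, []))
        return depends[v]

    def constant(v):  # constant value of a removed node
        if v not in const:
            const[v] = a if v == anchored else rho(
                net["biases"][v]
                + sum(w * constant(u) for u, w in parents.get(v, [])))
        return const[v]

    kept = {v for v in net["nodes"] if depends_on_free(v)}
    return {
        "nodes": kept,
        "inputs": free_inputs,
        "outputs": net["outputs"] & kept,
        "weights": {e: w for e, w in net["weights"].items()
                    if e[0] in kept and e[1] in kept},
        # Contributions of removed parents are folded into the biases.
        "biases": {v: net["biases"][v] + sum(
                       w * constant(u) for u, w in parents.get(v, [])
                       if u not in kept)
                   for v in kept - free_inputs},
    }

net = {"nodes": {"x1", "x2", "h1", "h2", "o"}, "inputs": {"x1", "x2"},
       "outputs": {"o"},
       "weights": {("x1", "h1"): 1.5, ("x2", "h1"): -0.8, ("x2", "h2"): 0.6,
                   ("h1", "o"): 1.0, ("h2", "o"): 2.0},
       "biases": {"h1": 0.1, "h2": -0.3, "o": 0.2}}
anchored_net = anchor_input(net, math.tanh, "x2", 0.3)
```

In this example the node `h2` depends only on the anchored input and is therefore removed. Its constant value enters the new bias of `o`, and the anchored network realizes the original map restricted to `x2 = 0.3`.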

For a pair of nodes $(c_{1},c_{2})\in V^{{\mathcal{M}}}\times V^{{\mathcal{M}}}$ define

Let $S=\{v\in V^{\mathcal{M}(\{c_{1},c_{2}\})}:V_{in}^{\mathcal{M}}\cap V^{\mathcal{M}(v)}=\{v_{D_{0}}^{0}\}\}$ , and set

$E^{{\mathcal{N}}}=\{(v,\widetilde{v})\in E^{\mathcal{M}}:\;v,\widetilde{v}\in V^{{\mathcal{N}}}\}$ ,

$V_{out}^{\mathcal{N}}=\{c_{1},c_{2}\}\cap V^{\mathcal{N}}$ ,

$\Omega^{\mathcal{N}}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{\mathcal{N}}\}$ ,

As the construction of $\mathcal{N}$ does not depend on $a$ , we can fix an arbitrary $a\in E_{(c_{1},\,c_{2})}$ , and the condition that $c_{1}$ and $c_{2}$ are clones in $\mathcal{M}_{a}$ then implies

where the real numbers $a_{u}$ are defined according to (2). This, together with (4), yields

which would say that $c_{1}$ and $c_{2}$ are clones in $\mathcal{M}$ and hence stands in contradiction to the no-clones property of $\mathcal{M}$ . This establishes the no-clones property of $\mathcal{N}$ . The non-degeneracy of $\mathcal{N}$ follows by its construction. Now, by adding $r$ to both sides of (5) and applying $\rho$ , we find

IV Auxiliary results from complex analysis and Kronecker’s theorem

We state the remaining auxiliary results needed in the proofs of our main statements. Since these results are relatively simple consequences of standard results in complex analysis and of Kronecker’s theorem, their proofs are relegated to the Appendix.

Recall the definition of the natural domain $\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}}$ of the map realized by a GFNN node $u$ with respect to a holomorphic nonlinearity as given in Definition 12.

In the proof of Theorem 2 it will be crucial that $\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}}$ be connected for all nodes $u$ of a certain GFNN with a single input. The following lemma establishes this fact.

Suppose that $D^{\circ}_{k}(\bm{a},\delta)\subset\mathcal{U}$ , and $F(z)=0$ , for all $z\in T$ . Then $F=0$ identically on $\mathcal{U}$ .

$|t^{n,\bm{s}}|\to\infty$ as $n\to\infty$ ,

V Imaginary period and the self-avoiding property

In other words, a set $S$ is self-avoiding if the union of a finite number of distinct copies of $S$ obtained by translating and scaling by an odd integer contains a real number which is an element of exactly one of the copies.

Moreover, for $r=1,2$ , we have $k_{1}^{r}<k_{2}^{r}<k_{3}^{r}$ if $\omega_{r}>0$ and $k_{1}^{r}>k_{2}^{r}>k_{3}^{r}$ if $\omega_{r}<0$ . Define the index sets

The following proposition formalizes the notion that nonlinearities $\sigma$ of the form considered at the beginning of the section are dense in the set of sigmoidal nonlinearities, even after imposing the additional constraint that $S$ be self-avoiding.

Denote $h_{\alpha}=\frac{1}{2}\left(1+\tanh(\alpha\,\cdot\,)\right)$ and consider the function $\rho_{\alpha}$ defined by

VI The main theorems

Let $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ be non-degenerate clones-free LFNNs with the same input and output sets $V_{in}$ and $V_{out}$ . Let

Let $\mathcal{N}_{j}$ , $j\in\{1,2,\dots,n\}$ , be non-degenerate clones-free LFNNs with the same input set $V_{in}$ and the same single output node $\{v_{out}\}$ . Furthermore, suppose that no two networks $\mathcal{N}_{j_{1}}$ , $\mathcal{N}_{j_{2}}$ , $j_{1}\neq j_{2}$ , are extensionally isomorphic. Consider the nonlinearity

Before embarking on the proofs of Theorems 3 and 4, we show how Theorems 1 and 2 follow from these two results together with Proposition 3.

and $S_{\alpha}$ is a discrete self-avoiding set (as the self-avoiding property is preserved under scaling by a nonzero real number), so by Theorem 3 we obtain $\mathcal{N}^{\alpha}\stackrel{{\scriptstyle f}}{{\sim}}\widetilde{\mathcal{N}}^{\alpha}$ , which implies $\mathcal{N}\simeq\widetilde{\mathcal{N}}$ . ∎

and $S_{\alpha}$ is a discrete self-avoiding set, so by Theorem 4 we obtain that $\{\langle{\mathcal{N}_{j}^{\alpha}}\rangle^{\sigma_{\alpha}}\}_{j=1}^{n}\cup\{\bm{1}\}$ is linearly independent. Now, suppose by way of contradiction that there is a linear dependency $\lambda_{0}+\sum_{j=1}^{n}\lambda_{j}\,\langle{\mathcal{N}_{j}}\rangle^{\sigma}=0$ among $\{\left\langle{\mathcal{N}_{j}}\right\rangle^{\sigma}\}_{j=1}^{n}\cup\{\bm{1}\}$ . But then

which contradicts the linear independence of $\{\langle{\mathcal{N}_{j}^{\alpha}}\rangle^{\sigma_{\alpha}}\}_{j=1}^{n}\cup\{\bm{1}\}$ . We deduce that $\{\left\langle{\mathcal{N}_{j}}\right\rangle^{\sigma}\}_{j=1}^{n}\cup\{\bm{1}\}$ must be linearly independent, as desired. ∎

Next, note that $\sigma$ is a real meromorphic function whose set of poles is

We now perform a calculation that will enable us to interpret the single input variable of $\mathcal{M}^{\prime}$ as a rational linear combination of $k$ input variables of another LFNN $\mathcal{M}^{\prime\prime}$ , to be specified below. The argument will then proceed by anchoring at all but one of the inputs of $\mathcal{M}^{\prime\prime}$ . It is this last step that uses $k\geq 2$ as a key assumption, as anchoring requires at least two input nodes to be meaningful. We thus have

$V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{u_{1},\dots,u_{k}\}$ is a set of $k$ newly-created input nodes (disjoint from $V^{\mathcal{M}^{\prime}}$ ),

$V_{out}^{{\mathcal{M}^{\prime\prime}}}\vcentcolon=V_{out}^{{\mathcal{M}^{\prime}}}$ ,

Define $\omega_{v_{p}^{1}u_{j}}:=q_{pj}\,\omega_{v_{j}^{1}v_{in}}$ , for $p\in\{1,\dots,D_{1}\}$ , $j\in\{1,\dots,k\}$ , and let

$\Theta^{{\mathcal{M}^{\prime\prime}}}:=\Theta^{\mathcal{M}^{\prime}}$ .

The procedure for constructing ${\mathcal{M}^{\prime\prime}}$ for a given $\mathcal{M}^{\prime}$ is illustrated in Figure 8.

Owing to (19) – (21) and the construction of ${\mathcal{M}^{\prime\prime}}$ , we have the following “input splitting” relationship

where $F_{w}$ corresponds to the map realized by the LFNN with nodes

Now, by definition of natural domain, for each $w\in V_{out}^{\mathcal{M}^{\prime\prime}}$ , we have

Moreover, as $\mathcal{M}^{\prime}$ and $\mathcal{M}^{\prime\prime}$ share the nodes in (23), as well as the associated edges, weights, and biases, we have

for all $w\in V_{out}^{\mathcal{M}^{\prime\prime}}$ , and thus

for all $p,j\in\{1,\dots,k\}$ . Therefore, for each $j\in\{1,\dots,k\}$ , the node $v_{j}^{1}$ will be removed when anchoring the input $u_{j}$ . A concrete example of this input anchoring procedure in the case $k\geq 2$ is shown schematically in Figure 9.

Thus, having anchored the nodes $u_{1},u_{2},\dots,u_{k-1}$ to appropriate real numbers $a_{1},\dots,a_{k-1}$ , we will be left with a non-degenerate clones-free LFNN ${\mathcal{N}}=(V^{\mathcal{N}},E^{\mathcal{N}},\{u_{k}\},V_{out}^{\mathcal{N}},\Omega^{\mathcal{N}},\Theta^{\mathcal{N}})$ such that the function $h_{out}^{\mathcal{N}}\vcentcolon=\sum_{w\in V_{out}^{\mathcal{N}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{N}}$ satisfies

Claim 1: We have $L(\mathcal{M}^{\prime})\geq 2$ and $\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}\neq\varnothing$ . Proof of Claim 1. We first show that $L(\mathcal{M}^{\prime})\geq 2$ . To this end, suppose by way of contradiction that $L(\mathcal{M}^{\prime})=1$ . Then $V_{out}^{\mathcal{M}^{\prime}}=V_{1}$ by non-degeneracy, so the function $h_{out}=\sum_{w\in V_{out}^{\mathcal{M}^{\prime}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}}$ can be written as

where $g$ is analytic in an open neighborhood of $t^{*}$ . But $\langle{v_{j^{*}}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime}}$ has a pole at $t^{*}$ , and so $h_{out}$ has a pole at $t^{*}$ , which stands in contradiction to $h_{out}\equiv c$ , and thus establishes $L(\mathcal{M}^{\prime})\geq 2$ .

Next, by way of contradiction assume that $\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}=\varnothing$ . Then, by non-degeneracy of $\mathcal{M}^{\prime}$ , we have $v_{j^{*}}^{1}\in V_{out}^{\mathcal{M}^{\prime}}$ , and $\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}}$ , for $w\in V_{out}^{\mathcal{M}^{\prime}}\setminus\{v_{j^{*}}^{1}\}$ , are real holomorphic functions of $\big{(}\langle{v_{j}^{1}}\rangle^{\sigma,\,{\mathcal{M}^{\prime}}}\big{)}_{j\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}}$ . Now, as $\langle{v_{j}^{1}}\rangle^{\sigma,\,{\mathcal{M}^{\prime}}}$ , $j\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}$ , are analytic and real-valued at $t^{*}$ , the function $h_{out}$ can again be written in the form (28) with $g$ analytic in an open neighborhood of $t^{*}$ . This again contradicts $h_{out}\equiv c$ , and thus $\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}\neq\varnothing$ , establishing the claim. We can therefore enumerate the nodes $V_{2}=\{v^{2}_{1},\dots,v^{2}_{d},v^{2}_{d+1},\dots,v^{2}_{D_{2}}\}$ so that

When $D_{1}=1$ , the functions $f_{p}$ are all identically zero. For given $p\in\{1,\dots,d\}$ , $z\in\mathcal{L}_{B}$ is a singularity of $\langle{v_{p}^{2}}\rangle^{\sigma,\,\mathcal{M}^{\prime}}$ if and only if $z$ is an element of $D(t^{*},\epsilon)$ such that

where $P$ is the set of poles of $\sigma$ , expressed in terms of $S$ by (14). But

for all $z\in D(t^{*},\epsilon)$ , so it suffices to ensure that

where in (35) we used the definition of $z^{n,\bm{s}}$ , in (36) we used the $i$ -periodicity of $\sigma$ , in (37) we used (32), and in (38) we used $\omega_{v_{p}^{2}v_{j^{*}}^{1}}=\sum_{j=1}^{\bar{k}}{\bar{q}}_{pj}\,\omega_{v_{j}^{2}v_{j^{*}}^{1}}$ and the $i$ -periodicity of $\sigma$ again. As $B$ was chosen so that the functions (29) do not have singularities in $\mathcal{L}_{B}$ , all the quantities in the calculation (35)–(38) are well-defined.
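The $i$-periodicity invoked in (36) and (38) can be checked numerically on a simple example. The sketch below uses $z\mapsto\tanh(\pi z)$, chosen here only because it is an elementary meromorphic function with period $i$. It is an illustrative stand-in and not the nonlinearity $\sigma$ of the theorem. The sample points are arbitrary and stay away from the poles at $i(k+\tfrac{1}{2})$, $k\in\mathbb{Z}$.

```python
import cmath
import math

def sigma_example(z):
    """Illustrative i-periodic function: tanh(pi * z)."""
    return cmath.tanh(math.pi * z)

samples = [0.3 + 0.2j, -1.1 + 0.05j, 0.0 - 0.3j]
# Largest deviation between the values at z and at z + i over the samples.
max_gap = max(abs(sigma_example(z + 1j) - sigma_example(z)) for z in samples)
```

Since $\tanh$ has period $i\pi$, rescaling the argument by $\pi$ yields period $i$, so `max_gap` vanishes up to floating-point rounding.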

Motivated by (35)–(38), we construct a GFNN ${\mathcal{M}^{\prime\prime}}=(V^{{\mathcal{M}^{\prime\prime}}},E^{{\mathcal{M}^{\prime\prime}}},V_{in}^{{\mathcal{M}^{\prime\prime}}},V_{out}^{{\mathcal{M}^{\prime\prime}}},\Omega^{{\mathcal{M}^{\prime\prime}}},\Theta^{{\mathcal{M}^{\prime\prime}}})$ as follows

First, $\bar{k}$ new nodes are created and enumerated as $\{u_{1},\dots,u_{\bar{k}}\}$ . Now, if $D_{1}>1$ , then let $V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{v_{in},u_{1},\dots,u_{\bar{k}}\}$ , and if $D_{1}=1$ , set $V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{u_{1},\dots,u_{\bar{k}}\}$ .

$V_{out}^{{\mathcal{M}^{\prime\prime}}}\vcentcolon=V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\}$ ,

define $\omega_{v_{p}^{2}u_{j}}:={\bar{q}}_{pj}\,\omega_{v_{j}^{2}v_{j^{*}}^{1}}$ , for $p=1,\dots,d$ , $j=1,\dots,{\bar{k}}$ , and let

The construction of $\mathcal{M}^{\prime\prime}$ for a concrete $\mathcal{M}^{\prime}$ is illustrated in Figure 10. Note that ${\mathcal{M}^{\prime\prime}}$ is not layered in the case $D_{1}>1$ , due to the presence of the node $v_{in}$ . Owing to (35)–(38) and the construction of ${\mathcal{M}^{\prime\prime}}$ , we have the following “input splitting” relationship:

We next show that ${\mathcal{M}^{\prime\prime}}$ is non-degenerate and clones-free. To establish non-degeneracy, it suffices to show $V_{in}^{{\mathcal{M}^{\prime\prime}}}\subset\bigcup_{w\in V_{out}^{{\mathcal{M}^{\prime\prime}}}}V^{\mathcal{M}^{\prime\prime}(w)}$ . First note that, in both cases $D_{1}=1$ and $D_{1}>1$ , for a given $j\in\{1,\dots,\bar{k}\}$ , there exists a $w\in V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\}$ such that $v_{j}^{2}\in V^{\mathcal{M}^{\prime}(w)}$ , by non-degeneracy of $\mathcal{M}^{\prime}$ . It follows that $v_{j}^{2}\in V^{{\mathcal{M}^{\prime\prime}}(w)}$ and thus $u_{j}\in V^{{\mathcal{M}^{\prime\prime}}(w)}$ . As $j$ was arbitrary, we have $\{u_{1},\dots,u_{\bar{k}}\}\subset\bigcup_{w\in V_{out}^{{\mathcal{M}^{\prime\prime}}}}V^{\mathcal{M}^{\prime\prime}(w)}$ , which establishes non-degeneracy of $\mathcal{M}^{\prime\prime}$ in the case $D_{1}=1$ . For $D_{1}>1$ we need to additionally show that $v_{in}\in V^{{\mathcal{M}^{\prime\prime}}(w)}$ . To this end, note that there exist an $m^{*}\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}$ and a $w\in V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\}$ such that $v_{m^{*}}^{1}\in V^{\mathcal{M}^{\prime}(w)}$ , and so $v_{in}\in V^{{\mathcal{M}^{\prime\prime}}(w)}$ , as desired. The clones-free property of ${\mathcal{M}^{\prime\prime}}$ follows by the same argument as in the case $k\geq 2$ .

Once again, we revisit the function $h_{out}(t)=\sum_{w\in V_{out}^{\mathcal{M}^{\prime}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}}(t)=c$ , for all $t\in\mathcal{D}_{h_{out}}$ , and proceed in a similar fashion as in the case $k\geq 2$ . This time, however, the output sets $V^{\mathcal{M}^{\prime}}_{out}$ and $V^{\mathcal{M}^{\prime\prime}}_{out}$ may differ by the node $v_{j^{*}}^{1}$ . This is a nuisance that will be dealt with below in Claim 2, but in the meantime, it is convenient to introduce the “truncated” linear dependency function

and proceed exactly as in the case $k\geq 2$ . By examining the structure of $\mathcal{M}^{\prime}$ , we see that, for each $w\in V_{out}^{\mathcal{M}^{\prime}}\setminus\{v^{1}_{j^{*}}\}$ , we can write

Now, by definition of natural domain, for each $w\in V_{out}^{\mathcal{M}^{\prime\prime}}$ , the natural domain $\mathcal{D}_{\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}}$ is the set of all $\bm{z}\in\bigcap_{p=1}^{d}\mathcal{D}_{\langle{v_{p}^{2}}\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}}\cap\bigcap_{j\neq j^{*}}\mathcal{D}_{\langle{v_{j}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}}$ such that

Moreover, as $\mathcal{M}^{\prime}$ and $\mathcal{M}^{\prime\prime}$ share the nodes in (41), as well as the associated edges, weights, and biases, we have

for all $w\in V_{out}^{\mathcal{M}^{\prime\prime}}$ , and thus

as $n\to\infty$ , which contradicts the fact that $\langle{v_{j^{*}}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime}}$ has a pole at $t^{*}$ . This establishes $v_{j^{*}}^{1}\notin V_{out}^{\mathcal{M}^{\prime}}$ . As a consequence we further have $h_{tr}=h_{out}$ , and so (44) reads

for all $\bm{s}\in C\cap D^{\circ}_{\bar{k}}(0,\delta)$ . Now, define the set

Let $\mathcal{N}_{j}=(V^{j},E^{j},V_{in},V_{out},\Omega^{j},\Theta^{j})$ , $j\in\{1,2\}$ , be networks as in the theorem statement. Let $\mathcal{N}=\mathcal{N}_{1}\vee\mathcal{N}_{2}$ be their amalgam and $\pi_{j}:V^{\mathcal{N}_{j}}\to\pi_{j}(V^{\mathcal{N}_{j}})\subset V^{\mathcal{N}}$ the extensional isomorphisms between $\mathcal{N}_{j}$ and the corresponding subnetworks of $\mathcal{N}$ , for $j\in\{1,2\}$ . We start by claiming that $\pi_{1}(w)=\pi_{2}(w)$ , for all $w\in V_{out}$ . Indeed, suppose to the contrary that we have $\pi_{1}(w^{\prime})\neq\pi_{2}(w^{\prime})$ , for some $w^{\prime}\in V_{out}$ , and denote $w_{j}=\pi_{j}(w^{\prime})$ , $j\in\{1,2\}$ . Since $w_{1}\neq w_{2}$ , it follows that $\mathcal{N}(w_{1})$ and $\mathcal{N}(w_{2})$ are not extensionally isomorphic, for otherwise $w_{1}$ and $w_{2}$ would be clones, contradicting the no-clones condition for $\mathcal{N}$ . Now,

by assumption. But this contradicts the conclusion of Theorem 4, and thus establishes $\pi_{1}(w)=\pi_{2}(w)$ , for all $w\in V_{out}$ . By non-degeneracy of $\mathcal{N}_{1}$ , for every $v\in V^{1}$ , there exists a $w\in V_{out}$ such that $v\in V^{\mathcal{N}_{1}(w)}$ . Then $\pi_{1}(v)\in V^{\mathcal{N}(\pi_{1}(w))}=V^{\mathcal{N}(\pi_{2}(w))}=\pi_{2}(V^{\mathcal{N}_{2}(w)})\subset\pi_{2}(V^{2})$ . Similarly, for every $v\in V^{2}$ , we have $\pi_{2}(v)\in\pi_{1}(V^{1})$ . Thus, the function $\psi:V^{1}\to V^{2}$ given by $\psi=\pi_{2}^{-1}\circ\pi_{1}$ is well-defined. This function is invertible with inverse $\pi_{1}^{-1}\circ\pi_{2}$ , so it is a bijection. Therefore $\psi$ is an extensional isomorphism between $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ , by virtue of being a composition of two extensional isomorphisms. Moreover, we have $\psi(w)=\pi_{2}^{-1}(\pi_{1}(w))=w$ , for all $w\in V_{out}$ , so $\psi$ restricted to $V_{out}$ is the identity map, and thus $\psi$ is a faithful isomorphism. ∎
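The last step of the proof, forming $\psi=\pi_{2}^{-1}\circ\pi_{1}$ and checking that it fixes the output nodes, can be illustrated on finite node sets. The sketch below represents $\pi_{1}$ and $\pi_{2}$ as Python dictionaries; the node names and the helper `compose_inverse` are hypothetical, and the sketch only checks that $\psi$ is well-defined as a map of nodes, not that edges, weights, and biases are preserved.

```python
def compose_inverse(pi1, pi2):
    """Return psi = pi2^{-1} o pi1 as a dict.

    Raises ValueError if pi2 is not injective or if some image of pi1
    is not in the image of pi2, i.e. if psi is not well-defined.
    """
    pi2_inv = {image: v for v, image in pi2.items()}
    if len(pi2_inv) != len(pi2):
        raise ValueError("pi2 is not injective")
    missing = set(pi1.values()) - set(pi2_inv)
    if missing:
        raise ValueError(f"images of pi1 outside the image of pi2: {missing}")
    return {v: pi2_inv[image] for v, image in pi1.items()}


# Toy maps into an amalgam with nodes {"x", "h", "out"} (illustrative only).
pi1 = {"x": "x", "a1": "h", "w": "out"}  # nodes of N_1 -> nodes of N
pi2 = {"x": "x", "b1": "h", "w": "out"}  # nodes of N_2 -> nodes of N
psi = compose_inverse(pi1, pi2)
psi_fixes_outputs = all(psi[w] == w for w in {"w"})
```

In this toy instance `psi` sends `a1` to `b1` and leaves the shared input `x` and output `w` fixed, mirroring the statement that $\psi$ restricted to $V_{out}$ is the identity.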

Acknowledgment

The authors would like to thank Thomas Allard for useful suggestions regarding the proof of Proposition 3 and an anonymous reviewer for proposing a clearer exposition of Lemma 6.

References

Appendix: proofs of auxiliary results

Fix $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ as in the statement of the proposition. We begin by establishing the existence of a corresponding amalgam $\mathcal{A}$ . Let $\mathscr{A}$ denote the set of all proto-amalgams of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ . To see that $\mathscr{A}$ is non-empty, consider the LFNN $\mathcal{N}=(V^{\mathcal{N}},E^{\mathcal{N}},V_{in},V_{out}^{\mathcal{N}},\Omega^{\mathcal{N}},\Theta^{\mathcal{N}})$ specified as follows:

Let $S$ be a set of cardinality $\#(V^{1}\setminus V_{in})+\#(V^{2}\setminus V_{in})$ disjoint from $V_{in}$ , and set $V^{\mathcal{N}}\vcentcolon=V_{in}\cup S$ . Furthermore, let $\pi_{j}^{\,\mathcal{N}}:V^{j}\to\pi_{j}^{\,\mathcal{N}}(V^{j})\subset V^{\mathcal{N}}$ be injective functions such that $\pi_{j}^{\,\mathcal{N}}(v)=v$ , for $v\in V_{in}$ , $j\in\{1,2\}$ , and $\pi_{1}^{\,\mathcal{N}}(V^{1}\setminus V_{in})\cap\pi_{2}^{\,\mathcal{N}}(V^{2}\setminus V_{in})=\varnothing$ , but otherwise arbitrary.

$V^{\mathcal{N}}_{out}\vcentcolon=\pi_{1}^{\mathcal{N}}(V_{out}^{1})\cup\pi_{2}^{\mathcal{N}}(V_{out}^{2})$ .

$\Omega^{\mathcal{N}}\vcentcolon=\left\{\omega_{vu}:(u,v)\in E^{\mathcal{N}}\right\}$ .

For $j=1,2$ and $v\in V^{j}\setminus V_{in}$ , let $\theta_{\pi_{j}^{\,\mathcal{N}}(v)}=\theta_{v}$ , and set $\Theta^{\mathcal{N}}\vcentcolon=\left\{\theta_{u}:u\in V^{\mathcal{N}}\setminus V_{in}\right\}$ .

Informally, the network ${\mathcal{N}}$ is obtained by putting $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ “side by side”, sharing only the input nodes $V_{in}$ . As $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ are non-degenerate, so is $\mathcal{N}$ . Moreover, Properties (i) and (ii) of Definition 16 hold for $\mathcal{N}$ with $\pi_{j}^{\,\mathcal{N}}:V^{j}\to\pi_{j}^{\,\mathcal{N}}(V^{j})\subset V^{\mathcal{N}}$ , for $j=1,2$ .
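The “side by side” construction can be sketched in code as follows. This is an illustration under stated assumptions, not the paper's definition: a network is stored as a hypothetical dictionary with keys `nodes`, `edges` (mapping a pair `(u, v)` to the weight $\omega_{vu}$), `biases`, and `outputs`; the injections $\pi_{j}^{\,\mathcal{N}}$ are realized by tagging every non-input node with its network index $j$; and edges and weights are assumed to be copied along these injections.

```python
def side_by_side(net1, net2, inputs):
    """Place two networks side by side, sharing only the input nodes.

    Non-input nodes of network j are renamed to (j, v), so the images of
    the two networks are disjoint away from the inputs.
    """
    def embed(j, v):
        # Input nodes are kept as they are; all other nodes are tagged by j.
        return v if v in inputs else (j, v)

    merged = {"nodes": set(inputs), "edges": {}, "biases": {}, "outputs": set()}
    for j, net in ((1, net1), (2, net2)):
        merged["nodes"] |= {embed(j, v) for v in net["nodes"]}
        for (u, v), weight in net["edges"].items():
            merged["edges"][(embed(j, u), embed(j, v))] = weight
        for v, bias in net["biases"].items():
            merged["biases"][embed(j, v)] = bias
        merged["outputs"] |= {embed(j, w) for w in net["outputs"]}
    return merged


# Two toy single-hidden-node networks sharing the input "x" (illustrative only).
net1 = {"nodes": {"x", "a", "w"}, "edges": {("x", "a"): 1.0, ("a", "w"): 2.0},
        "biases": {"a": 0.5, "w": 0.0}, "outputs": {"w"}}
net2 = {"nodes": {"x", "b", "w"}, "edges": {("x", "b"): -1.0, ("b", "w"): 3.0},
        "biases": {"b": 0.1, "w": 0.2}, "outputs": {"w"}}
merged = side_by_side(net1, net2, inputs={"x"})
```

The merged toy network has $1+2+2=5$ nodes, in line with $V^{\mathcal{N}}=V_{in}\cup S$ and $\#S=\#(V^{1}\setminus V_{in})+\#(V^{2}\setminus V_{in})$.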

Thus $\mathcal{N}$ is a proto-amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ , and so $\mathscr{A}\neq\varnothing$ . Now, let $\mathcal{A}=(V^{\mathcal{A}},E^{\mathcal{A}},V_{in}^{\mathcal{A}},V_{out}^{\mathcal{A}},\Omega^{\mathcal{A}},\allowbreak\Theta^{\mathcal{A}})\in\mathscr{A}$ be a network with the least possible number of nodes among all the networks in $\mathscr{A}$ , and let $\pi_{j}:V^{j}\to\pi_{j}(V^{j})\subset V^{\mathcal{A}}$ , for $j\in\{1,2\}$ , be extensional isomorphisms between $\mathcal{N}_{j}$ and the appropriate subnetworks of $\mathcal{A}$ . We now show that $\mathcal{A}$ is clones-free. To this end, suppose by way of contradiction that $c_{1},c_{2}\in V^{\mathcal{A}}$ are clones. As $\mathcal{N}_{1}$ is clones-free, $c_{1},c_{2}$ cannot both be in $\pi_{1}(V^{1})$ , for otherwise $\pi_{1}^{-1}(c_{1})$ and $\pi_{1}^{-1}(c_{2})$ would be clones in $\mathcal{N}_{1}$ . By the same token, $c_{1},c_{2}$ cannot both be in $\pi_{2}(V^{2})$ . Thus, we may write w.l.o.g. $c_{1}=\pi_{1}(v_{1})$ and $c_{2}=\pi_{2}(v_{2})$ , for some $v_{1}\in V^{1}$ and $v_{2}\in V^{2}$ . Now, let $\widetilde{\mathcal{A}}$ be the network obtained from $\mathcal{A}$ by making the following alterations:

For every edge $(c_{2},v)\in E^{\mathcal{A}}$ , where $v\in V^{\mathcal{A}}$ , introduce a new edge $(c_{1},v)$ together with the associated weight $\omega_{vc_{2}}$ , and delete the edge $(c_{2},v)$ .

Delete the edges $(v,c_{2})\in E^{\mathcal{A}}$ , as well as the node $c_{2}$ .

If $c_{2}$ was a node in $\pi_{2}(V_{out}^{2})$ , then add $c_{1}$ to the set $V_{out}^{\widetilde{\mathcal{A}}}$ .
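The three alterations above can be sketched as a single merge operation. The code below is a minimal illustration using a hypothetical dictionary layout with keys `nodes`, `edges` (mapping a pair `(u, v)` to the weight $\omega_{vu}$), `biases`, and `outputs`. It additionally assumes that $c_{1}$ and $c_{2}$ have no common child, and raises an error otherwise, since the text does not say how two weights on the same edge would be combined.

```python
def merge_clone(net, c1, c2):
    """Return a copy of `net` in which the node c2 has been merged into c1."""
    for (u, v) in net["edges"]:
        if u == c2 and (c1, v) in net["edges"]:
            raise ValueError("c1 and c2 share a child; not handled in this sketch")

    edges = {}
    for (u, v), weight in net["edges"].items():
        if v == c2:
            continue                     # delete the edges entering c2
        if u == c2:
            edges[(c1, v)] = weight      # reroute the edges leaving c2 to leave c1
        else:
            edges[(u, v)] = weight

    outputs = set(net["outputs"])
    if c2 in outputs:
        outputs.discard(c2)
        outputs.add(c1)                  # c1 takes over the output role of c2

    return {
        "nodes": set(net["nodes"]) - {c2},
        "edges": edges,
        "biases": {v: b for v, b in net["biases"].items() if v != c2},
        "outputs": outputs,
    }


# Toy network in which c1 and c2 have the same parent, weight, and bias.
toy = {"nodes": {"x", "c1", "c2", "w1", "w2"},
       "edges": {("x", "c1"): 1.0, ("x", "c2"): 1.0,
                 ("c1", "w1"): 2.0, ("c2", "w2"): 3.0},
       "biases": {"c1": 0.5, "c2": 0.5, "w1": 0.0, "w2": 0.0},
       "outputs": {"w1", "w2"}}
reduced = merge_clone(toy, "c1", "c2")
```

The reduced toy network has one node fewer than the original, which is the property used in the minimality argument.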

The network $\widetilde{\mathcal{A}}$ is a proto-amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ via the extensional isomorphisms ${\widetilde{\pi}_{1}=\pi_{1}}$ and

But $\widetilde{\mathcal{A}}$ has strictly fewer nodes than $\mathcal{A}$ , which contradicts the minimality of $\mathcal{A}$ and thereby establishes that $\mathcal{A}$ is clones-free. Hence $\mathcal{A}$ is an amalgam of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ , completing the proof of existence. To establish uniqueness of the amalgam up to extensional isomorphisms, suppose that $\mathcal{A}$ and $\mathcal{A}^{\prime}$ are both amalgams of $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ via extensional isomorphisms $\pi_{j}:V^{j}\to\pi_{j}(V^{j})\subset V^{\mathcal{A}}$ , $\pi_{j}^{\prime}:V^{j}\to\pi_{j}^{\prime}(V^{j})\subset V^{\mathcal{A}^{\prime}}$ , for $j\in\{1,2\}$ . We first show that

Now define $\psi:V^{\mathcal{A}}\to V^{\mathcal{A}^{\prime}}$ according to

It follows by (46) that this definition is consistent, in the sense that the two cases in (47) yield the same value for $\psi(v)$ when $v\in\pi_{1}(V^{1})\cap\pi_{2}(V^{2})$ . Now, Properties (i) and (ii) of Definition 14 for $\psi$ follow, so $\psi$ is an extensional isomorphism between $\mathcal{A}$ and $\mathcal{A}^{\prime}$ , finishing the proof. ∎

Let $\bm{a}$ , $\delta$ , and $T$ be as in the statement of the lemma, such that $D_{k}^{\circ}(\bm{a},\delta)\subset\mathcal{U}$ and $F|_{T}\equiv 0$ . Then the function $F_{\bm{a}}\vcentcolon=F(\,\cdot\,+\bm{a})$ is holomorphic on $\mathcal{U}-\bm{a}$ , and $F_{\bm{a}}|_{T-\bm{a}}\equiv 0$ . Thus, as $F|_{\mathcal{U}}\equiv 0$ if and only if $F_{\bm{a}}|_{\mathcal{U}-\bm{a}}\equiv 0$ , it suffices to prove the result for $\bm{a}=\bm{0}$ . Let $T_{0}\vcentcolon=T$ , $T_{k}\vcentcolon=D^{\circ}_{k}(\bm{0},\delta)$ , and, for $r=1,\dots,k-1$ , define the sets

Note that $G$ is holomorphic, and $G|_{(-\delta,\delta)}\,\equiv 0$ by the induction hypothesis. Since the zero set of a nonzero holomorphic function in one variable does not have a limit point in the domain, we deduce that $G|_{D_{1}^{\circ}(0,\delta)}\equiv 0$ . But $z_{j}$ and $s_{j}$ were arbitrary, so we have $F|_{T_{r+1}}\equiv 0$ . We have thus shown that $F$ is identically zero on an open subset $T_{k}=D^{\circ}_{k}(\bm{0},\delta)$ of its connected domain $\mathcal{U}$ , and so, by the multivariate identity theorem [19, 1.2.12], it must be identically zero on $\mathcal{U}$ . ∎

and so, by Lemma 4, we obtain $G(0,z_{1},\dots,z_{k})=0$ , for all $(0,z_{1},\dots,z_{k})\in\mathcal{V}$ . By inspection of the power series expansion of $G$ in $\mathcal{V}$ , we find that $G$ must have the form $G(z_{0},z_{1},\dots,z_{k})=z_{0}\,H(z_{0},z_{1},\dots,z_{k})$ , for some function $H$ holomorphic in $\mathcal{V}$ . We thus have that $z_{0}^{-(p+1)}F(z_{0},\dots,z_{k})=H(z_{0},\dots,z_{k})$ is holomorphic in $\mathcal{V}$ , contradicting the maximality of $p$ . Our hypothesis that $F|_{\mathcal{V}}$ is not identically zero must hence be false, i.e., we have $F|_{\mathcal{V}}\equiv 0$ . Finally, by the multivariate identity theorem [19, 1.2.12], we deduce that $F|_{\mathcal{U}}\equiv 0$ . ∎

The inclusion of $M$ in the right-hand side is clear, so we only need to show the reverse inclusion. Note that, since $M$ is closed, $T^{d}/M$ is a Lie group. We will rewrite the right-hand side of (48) by establishing a bijective correspondence between the characters $\chi:T^{d}\to S^{1}$ such that $M\subset\ker(\chi)$ and the characters $f:T^{d}/M\to S^{1}$ . To this end, let $\pi:T^{d}\to T^{d}/M$ be the projection map, and suppose that $\chi:T^{d}\to S^{1}$ is a character such that $M\subset\ker(\chi)$ . Then $\chi$ factors according to $\chi=f\circ\pi$ , for some continuous homomorphism $f:T^{d}/M\to S^{1}$ ; in other words, $f$ is a character on $T^{d}/M$ . Conversely, for any such $f$ , the composition $f\circ\pi$ is a character $\chi$ on $T^{d}$ with $M\subset\ker(\chi)$ . Therefore it suffices to show that

by definition of $M$ , which is equivalent to