Neural network identifiability for a family of sigmoidal nonlinearities

Verner Vlačić, Helmut Bölcskei

I Introduction

Deep learning has become a highly successful machine learning method employed in a wide range of applications such as optical character recognition , image classification , and speech recognition . In a typical deep learning scenario one aims to fit a parametric model, realized by a deep neural network, to match a set of training data points. In order to make the ensuing discussion more concrete, we begin with the definition of a neural network and the map it realizes under a nonlinearity.

LL is a positive integer, referred to as the depth of N\mathcal{N},

(D0,D1,,DL)(D_{0},D_{1},\dots,D_{L}) is an (L+1)(L+1)-tuple of positive integers, called the layout,

where ρ\rho acts on real vectors in a componentwise fashion.

Given positive integers DinD_{in} and DoutD_{out}, define NDin,Dout\mathscr{N}^{D_{in},D_{out}} to be the set of all neural networks whose layouts (D0,,DL)(D_{0},\dots,D_{L}) satisfy D0=DinD_{0}=D_{in} and DL=DoutD_{L}=D_{out}, but are otherwise arbitrary. Let N\mathscr{N} be a subset of NDin,Dout\mathscr{N}^{D_{in},D_{out}}, ρ\rho a nonlinearity, and \sim an equivalence relation on NDin,Dout\mathscr{N}^{D_{in},D_{out}}.

We say that \sim is compatible with (N,ρ)(\mathscr{N},\rho) if, for all N1,N2N\mathcal{N}_{1},\mathcal{N}_{2}\in\mathscr{N},

We say that (N,ρ)(\mathscr{N},\rho) is identifiable up to \sim if, for all N1,N2N\mathcal{N}_{1},\mathcal{N}_{2}\in\mathscr{N},

Thus, by informally saying that a neural network N1\mathcal{N}_{1} in a certain class is identifiable, we mean that any neural network N2\mathcal{N}_{2} in the same class giving rise to the same output map, i.e., N1ρ=N2ρ\langle{\mathcal{N}_{1}}\rangle^{\rho}=\langle{\mathcal{N}_{2}}\rangle^{\rho}, is necessarily equivalent to N2\mathcal{N}_{2}. The role of the equivalence relation \sim in the previous definition is thus to “measure the degree of non-uniqueness”, and in particular, to accommodate symmetries within the network that may arise either from symmetries induced by the network weights and biases (such as the presence of clone pairs, to be introduced in Definition 5), symmetries of the nonlinearity (e.g., tanh\tanh is odd), or both simultaneously. These abstract concepts will be incarnated momentarily when discussing the seminal work by Fefferman , and in Section II through Definitions 4 and 5, as well as in the examples leading up to the formulation of the paper’s main results.

In , Fefferman showed that neural networks satisfying the following genericity conditions are, indeed, uniquely determined by the map they realize under the nonlinearity ρ=tanh\rho=\tanh, up to certain obvious isomorphisms of networks:

More precisely, for fixed positive integers DinD_{in} and DoutD_{out}, Fefferman showed that (NA1Din,Dout,tanh)(\mathscr{N}_{A1}^{D_{in},D_{out}},\tanh) is identifiable up to ±\sim_{\pm}, where NA1Din,Dout\mathscr{N}_{A1}^{D_{in},D_{out}} is defined as the set of all neural networks in NDin,Dout\mathscr{N}^{D_{in},D_{out}} satisfying Assumptions 1, and ±\sim_{\pm} is defined by stipulating that N±N~\mathcal{N}\sim_{\pm}\widetilde{\mathcal{N}} if and only if

L=L~L=\widetilde{L} and (D0,D1,,DL)=(D~0,D~1,,D~L)(D_{0},D_{1},\dots,D_{L})=(\widetilde{D}_{0},\widetilde{D}_{1},\dots,\widetilde{D}_{L}), and

Indeed, Fefferman remarks explicitly that it would be interesting to replace Assumptions 1 with minimal hypotheses, and to study nonlinearities other than tanh\tanh. The present paper aims to address these two issues. Characterizing the fundamental nature of conditions necessary for identifiability with respect to a fixed nonlinearity, even a simple one such as tanh\tanh, is likely a rather formidable task. In fact, the minimal identifiability conditions may generally depend on “fine” properties of the nonlinearity under consideration, and it is hence unclear how much insight can be obtained by having conditions that are specific to a given nonlinearity. We will thus be interested in an identification result with very mild conditions on the weights and biases of the neural networks to be identified, while still accommodating a broad class of nonlinearities.

II Contributions

We begin with two motivating examples. These lead up to the statements of our main contributions, whose corresponding proofs are developed in the remainder of the paper. We consider nonlinearities ρ\rho which are not necessarily odd (as tanh\tanh), and thus need an equivalence relation which dispenses with sign changes.

We say that the neural networks N\mathcal{N} and N~\widetilde{\mathcal{N}} are isomorphic, and write NN~\mathcal{N}\simeq\widetilde{\mathcal{N}}, if

L=L~L=\widetilde{L} and (D0,D1,,DL)=(D~0,D~1,,D~L)(D_{0},D_{1},\dots,D_{L})=(\widetilde{D}_{0},\widetilde{D}_{1},\dots,\widetilde{D}_{L}), and

If N\mathcal{N} does not have a clone pair, we say that N\mathcal{N} satisfies the no-clones condition.

As the nonlinearity ρ\rho in the example above is completely arbitrary, the no-clones condition is necessary to have any hope of obtaining identifiability up to \simeq. Hence, with our program in mind, given positive integers DinD_{in} and DoutD_{out}, we define

and seek nonlinearities ρ\rho such that (NncDin,Dout,ρ)(\mathscr{N}^{D_{in},D_{out}}_{nc},\rho) is identifiable up to \simeq. As any class strictly containing NncDin,Dout\mathscr{N}^{D_{in},D_{out}}_{nc}, paired with any nonlinearity, fails identifiability up to \simeq, the no-clones condition furnishes a canonical minimal assumption for identifiability up to \simeq. Similarly to NA1Din,Dout\mathscr{N}^{D_{in},D_{out}}_{A1}, the class NncDin,Dout\mathscr{N}^{D_{in},D_{out}}_{nc}, paired with any measurable nonlinearity ρ\rho such that limxρ(x)\displaystyle\lim_{x\to\infty}\rho(x) and limxρ(x)\displaystyle\lim_{x\to-\infty}\rho(x) exist and are not equal, satisfies the universal approximation property in the sense of Hornik and Cybenko . The following example demonstrates that insisting on the no-clones condition as the only assumption on the weights, biases, and layout will necessarily come at the cost of restricting the class of nonlinearities that allow for identifiability. Let ρ(x)=min{1,max{0,x}}\rho(x)=\min\{1,\max\{0,x\}\} be the clipped rectified linear unit (ReLU) function. Note that

Now, given an arbitrary neural network N=(W1,θ1,W2,θ2,,WL,θL)\mathcal{N}=(W^{1},\theta^{1},W^{2},\theta^{2},\dots,W^{L},\theta^{L}) with DL=1D_{L}=1 satisfying the no-clones condition, the network

Concretely, we have the following main result of this paper.

The function 1\bm{1} is included in the linearly independent set both for the sake of greater generality of the statement, and to facilitate the proof of Theorem 2.

for m{1,2,3,4}m\in\{1,2,3,4\}. As N\mathcal{N} satisfies the no-clones condition, the networks Nm\mathcal{N}_{m}, m{1,2,3,4}m\in\{1,2,3,4\}, also satisfy the no-clones condition, and are pairwise non-isomorphic.

which stands in contradiction to Theorem 2. This completes the proof of Theorem 1.

σ\sigma is ii-periodic, i.e., σ(z+i)=σ(z)\sigma(z+i)=\sigma(z), for all zDz\in\mathcal{D}, and

Amalgamation: In Section III we construct a neural network MNnc1,n\mathcal{M}\in\mathscr{N}^{1,n}_{nc}, called the amalgam of {Nj}j=1n\{\mathcal{N}_{j}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}, containing each Nj\mathcal{N}_{j} as a subnetwork. In particular, we have (Mσ)j=Njσ{(\langle{\mathcal{M}}\rangle^{\sigma})}_{j}=\langle{\mathcal{N}_{j}}\rangle^{\sigma}, for all j{1,,n}j\in\{1,\dots,n\}. The linear dependence of {Njσ}j=1n{1}\{\langle{\mathcal{N}_{j}}\rangle^{\sigma}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}\cup\{\bm{1}\} thus translates to

Input anchoring. We then construct a third network NM\mathcal{N}\in\mathscr{M}, obtained by fixing k1k-1 of the kk inputs of M\mathcal{M}^{\prime\prime} to specific real numbers, and “cutting out” all the parts of the network whose contributions to the output map have become constant in the process. The resulting network N\mathcal{N} will be a network in M\mathscr{M} of size smaller than M\mathcal{M}^{\prime}, which contradicts the minimality of M\mathcal{M}^{\prime}, and thereby completes the proof.

Input anchoring. Finally, we apply an input anchoring procedure to M\mathcal{M}^{\prime\prime} similar to the one described above. Even though now M\mathcal{M}^{\prime\prime} is not a network in the sense of Definition 1, the input anchoring procedure will result in a network NM\mathcal{N}\in\mathscr{M} which is a network in the sense of Definition 1, and is of smaller size than M\mathcal{M}^{\prime}, again completing the proof by contradiction.

We conclude this section by laying out the organization of the remainder of the paper. In Section III we develop a graph-theoretic framework needed to define amalgams of neural networks and several other technical concepts. In Section IV we state results from complex analysis and Kronecker’s theorem needed in arguments involving analytic continuation and input splitting, respectively. The proofs of these results are relegated to the Appendix. In Section V we discuss the fine structural properties of the function σ\sigma constructed in the proof of Theorem 2. Finally, Section VI contains the proofs of our two main results.

III Directed acyclic graphs, general neural networks, and neural network amalgams

As already mentioned, in the proof of Theorem 2 we will work with a form of neural networks that does not fit in with Definitions 1 and 2. In order to accommodate this notion of neural networks, and to lighten the manipulations needed to formalize the aforementioned techniques of amalgamation and input anchoring, we introduce a graph-theoretic framework.

We start by introducing the concept of a directed acyclic graph (DAG), commonly encountered in the graph theory literature .

A directed graph is an ordered pair G=(V,E)G=(V,E) where VV is a finite set of nodes, and EV×VE\subset V\times V is a set of directed edges.

A directed cycle of a directed graph GG is a set {v1,,vk}V\{v_{1},\dots,v_{k}\}\subset V such that, for every j{1,,k}j\in\{1,\dots,k\}, (vj,vj+1)E(v_{j},v_{j+1})\in E, where we set vk+1:=v1v_{k+1}\vcentcolon=v_{1}.

A directed graph GG is said to be a directed acyclic graph (DAG) if it has no directed cycles.

We interpret an edge (v,v~)(v,\widetilde{v}) as an arrow connecting the nodes vv and v~\widetilde{v} and pointing at v~\widetilde{v}.

Since the graph GG in Definition 7 is assumed to be acyclic, the level is well-defined for all nodes of GG. We are now ready to introduce our generalized definition of a neural network.

A general feed-forward neural network (GFNN) is an ordered sextuple N=(V,E,Vin,Vout,Ω,Θ)\mathcal{N}=(V,E,V_{in},\allowbreak V_{out},\Omega,\Theta), where

G=(V,E)G=(V,E) is a DAG, called the architecture of N\mathcal{N},

VoutVVinV_{out}\subset V\setminus V_{in} is the set of outputs of N\mathcal{N},

Let N=(V,E,Vin,Vout,Ω,Θ)\mathcal{N}=(V,E,V_{in},V_{out},\Omega,\Theta) be a GFNN. A subnetwork of N\mathcal{N} is a GFNN N=(V,E,Vin,Vout,Ω,Θ)\mathcal{N}^{\prime}=(V^{\prime},E^{\prime},V_{in}^{\prime},V_{out}^{\prime},\Omega^{\prime},\Theta^{\prime}) such that there exists a set SVS\subset V so that

E={(v,v~)E:v,v~V}E^{\prime}=\{(v,\widetilde{v})\in E:v,\widetilde{v}\in V^{\prime}\},

Ω={ωv~v:(v,v~)E}\Omega^{\prime}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{\prime}\}, and

Θ={θv:vV}\Theta^{\prime}=\{\theta_{v}:v\in V^{\prime}\}.

If additionally Vout=SV_{out}^{\prime}=S, then N\mathcal{N}^{\prime} is uniquely specified by SS. In this case we say that N\mathcal{N}^{\prime} is the ancestor subnetwork of SS in N\mathcal{N}, and write N(S)\mathcal{N}(S) for this network.

We will treat nodes vVv\in V only as “handles”, and never as variables or functions. This is relevant when dealing with several networks with shared nodes, such as depicted in Figure 2. On the other hand, the output map vρ\left\langle{v}\right\rangle^{\rho} realized by vv is a function.

It follows that the natural domain Duσ\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}} of a node uu is open, as it is the preimage of an open set with respect to a continuous map. Moreover, the output map uσ\left\langle{u}\right\rangle^{\sigma} realized by uu is holomorphic on Duσ\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}}, as it is given explicitly by a concatenation of affine maps and the nonlinearity σ\sigma, which are themselves holomorphic functions.

The following definition is a straightforward generalization of Definition 5.

The following definition generalizes Definition 4 to GFNNs, and introduces two new concepts, termed extensional isomorphism and faithful isomorphism, which will play an important technical role throughout the remainder of the paper.

Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1)\mathcal{N}^{1}=(V^{1},E^{1},V_{in},V_{out}^{1},\allowbreak\Omega^{1},\Theta^{1}) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2)\mathcal{N}^{2}=(V^{2},E^{2},V_{in},V_{out}^{2},\Omega^{2},\Theta^{2}) be GFNNs with the same input nodes VinV_{in}.

We say that N1\mathcal{N}^{1} and N2\mathcal{N}^{2} are extensionally isomorphic, and write N1eN2\mathcal{N}^{1}\stackrel{{\scriptstyle e}}{{\sim}}\mathcal{N}^{2}, if there exists a bijection π:V1V2\pi:V^{1}\to V^{2}, called an extensional isomorphism, such that the following holds:

π\pi restricted to VinV_{in} is the identity map,

for all (v,v~)E1(v,\widetilde{v})\in E^{1}, we have ωπ(v~)π(v)2=ωv~v1\omega^{2}_{\pi(\widetilde{v})\pi(v)}=\omega^{1}_{\widetilde{v}v}, and

for all vV1Vinv\in V^{1}\setminus V_{in}, we have θπ(v)2=θv1\theta^{2}_{\pi(v)}=\theta^{1}_{v}.

We say that N1\mathcal{N}^{1} and N2\mathcal{N}^{2} are faithfully isomorphic, and write N1fN2\mathcal{N}^{1}\stackrel{{\scriptstyle f}}{{\sim}}\mathcal{N}^{2}, if they are extensionally isomorphic via π:V1V2\pi:V^{1}\to V^{2} with the following additional property:

Vout1=Vout2V_{out}^{1}=V_{out}^{2}, and π\pi restricted to Vout1V_{out}^{1} is the identity map.

In this case we call π\pi a faithful isomorphism.

The concept of faithful isomorphisms in Definition 14 generalizes that of isomorphisms according to Definition 4. It is easily seen that extensional isomorphism is an equivalence relation on the set of all GFNNs with the same input nodes, whereas faithful isomorphism is an equivalence relation on the set of all GFNNs with the same input and output nodes. Furthermore, if N1eN2\mathcal{N}^{1}\stackrel{{\scriptstyle e}}{{\sim}}\mathcal{N}^{2} via π:V1V2\pi:V^{1}\to V^{2}, then we have π(v)ρ,N2=vρ,N1\left\langle{\pi(v)}\right\rangle^{\rho,\,\mathcal{N}^{2}}=\left\langle{v}\right\rangle^{\rho,\,\mathcal{N}^{1}}, for all vV1v\in V^{1} and any nonlinearity ρ\rho, and if additionally N1fN2\mathcal{N}^{1}\stackrel{{\scriptstyle f}}{{\sim}}\mathcal{N}^{2}, then N1ρ=N2ρ\left\langle{\mathcal{N}^{1}}\right\rangle^{\rho}=\left\langle{\mathcal{N}^{2}}\right\rangle^{\rho}.

We say that a GFNN N=(V,E,Vin,Vout,Ω,Θ)\mathcal{N}=(V,E,V_{in},V_{out},\Omega,\Theta) is non-degenerate if

V=VN(Vout)V=V^{\mathcal{N}(V_{out})}, where VN(Vout)V^{\mathcal{N}(V_{out})} is the set of nodes of the ancestor subnetwork of VoutV_{out} in N\mathcal{N}. Networks that are not non-degenerate are referred to as degenerate.

Informally, a network is non-degenerate if its every node “leads up” to at least one output. This notion is best understood with the help of examples as in Figure 3.

We are now ready to introduce the concept of amalgams of LFNNs.

Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1)\mathcal{N}_{1}=(V^{1},E^{1},V_{in},V_{out}^{1},\Omega^{1},\Theta^{1}) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2)\mathcal{N}_{2}=(V^{2},E^{2},V_{in},V_{out}^{2},\Omega^{2},\Theta^{2}) be non-degenerate clones-free LFNNs with the same input set VinV_{in}.

Let A=(VA,EA,Vin,VoutA,ΩA,ΘA)\mathcal{A}=(V^{\mathcal{A}},E^{\mathcal{A}},V_{in},V_{out}^{\mathcal{A}},\Omega^{\mathcal{A}},\Theta^{\mathcal{A}}) be a non-degenerate LFNN with the following properties:

There exist injective maps π1:V1π1(V1)VA\pi_{1}:V^{1}\to\pi_{1}(V^{1})\subset V^{\mathcal{A}} and π2:V2π2(V2)VA\pi_{2}:V^{2}\to\pi_{2}(V^{2})\subset V^{\mathcal{A}} such that the networks N1\mathcal{N}_{1} and N2\mathcal{N}_{2} are extensionally isomorphic to the ancestor subnetworks A(π1(Vout1))\mathcal{A}(\pi_{1}(V_{out}^{1})) and A(π2(Vout2))\mathcal{A}(\pi_{2}(V_{out}^{2})) via π1\pi_{1} and π2\pi_{2}, respectively.

VA=π1(V1)π2(V2)V^{\mathcal{A}}=\pi_{1}(V^{1})\cup\pi_{2}(V^{2}) and VoutA=π1(Vout1)π2(Vout2)V_{out}^{\mathcal{A}}=\pi_{1}(V_{out}^{1})\cup\pi_{2}(V_{out}^{2}).

We then say that A\mathcal{A} is a proto-amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}.

If A\mathcal{A} is a clones-free proto-amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}, we say that A\mathcal{A} is an amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}.

Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1)\mathcal{N}_{1}=(V^{1},E^{1},V_{in},V_{out}^{1},\Omega^{1},\Theta^{1}) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2)\mathcal{N}_{2}=(V^{2},E^{2},V_{in},V_{out}^{2},\Omega^{2},\Theta^{2}) be non-degenerate clones-free LFNNs with a shared input set VinV_{in}. Then there exists an amalgam A\mathcal{A} of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}. Moreover, the amalgam is unique up to extensional isomorphisms.

As asserted in Proposition 1 (whose proof is deferred to the Appendix), an amalgam of two given non-degenerate clones-free LFNNs N1\mathcal{N}_{1} and N2\mathcal{N}_{2} always exists and is unique up to extensional isomorphisms. With slight abuse of notation, we will write N1N2\mathcal{N}_{1}\vee\mathcal{N}_{2} for an arbitrary element of the equivalence class (induced by e\stackrel{{\scriptstyle e}}{{\sim}}) of all the amalgams of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}. A concrete example of an amalgam construction is provided in Figure 4. Having defined the amalgam of two non-degenerate clones-free LFNNs, we define the amalgam of any finite collection N1,,Nn\mathcal{N}_{1},\dots,\mathcal{N}_{n} of non-degenerate clones-free LFNNs according to

By Definition 16, k=1nNk\bigvee_{k=1}^{n}\mathcal{N}_{k} is a non-degenerate clones-free LFNN. Moreover, there exist extensional isomorphisms πj:Njπj(Nj)k=1nNk\pi_{j}:\mathcal{N}_{j}\to\pi_{j}(\mathcal{N}_{j})\subset\bigvee_{k=1}^{n}\mathcal{N}_{k}, for j{1,,n}j\in\{1,\dots,n\}, and we have πj(v)ρ,k=1nNk=vρ,Nj\left\langle{\pi_{j}(v)}\right\rangle^{\rho,\,\bigvee_{k=1}^{n}\mathcal{N}_{k}}=\left\langle{v}\right\rangle^{\rho,\,\mathcal{N}_{j}}, for j{1,,n}j\in\{1,\dots,n\}, vVNjv\in V^{\mathcal{N}_{j}}, and any nonlinearity ρ\rho.

We are now in a position to prove two lemmas that form the basis for the proof of Theorem 2. The first lemma formalizes the idea of combining multiple pairwise non-isomorphic single-output networks with linearly dependent ouput maps into one multiple-output network with linear dependency among the maps of its ouput nodes.

For all wVoutMaw\in V_{out}^{{\mathcal{M}_{a}}},

is constant, and we denote its value by wρ,M ⁣(a)\left\langle{w}\right\rangle^{\rho,\,\mathcal{M}}\!\left(a\right).

VMa={vVM:{v10,,vD010}VM(v)}V^{{\mathcal{M}_{a}}}=\{v\in V^{\mathcal{M}}:\{v_{1}^{0},\dots,v_{D_{0}-1}^{0}\}\cap V^{\mathcal{M}(v)}\neq\varnothing\}, where M(v)\mathcal{M}(v) denotes the ancestor network of vv,

EMa={(v,v~),v,v~VMa}E^{{\mathcal{M}_{a}}}=\{(v,\widetilde{v}),v,\widetilde{v}\in V^{{\mathcal{M}_{a}}}\},

VinMa={v10,,vD010}V_{in}^{{\mathcal{M}_{a}}}=\{v_{1}^{0},\dots,v_{D_{0}-1}^{0}\}, VoutMa=VoutMVMaV_{out}^{{\mathcal{M}_{a}}}=V_{out}^{\mathcal{M}}\cap V^{{\mathcal{M}_{a}}}, and

ΩMa={ωv~v:(v,v~)EMa}\Omega^{{\mathcal{M}_{a}}}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{{\mathcal{M}_{a}}}\}.

For a node vVMVMav\in V^{\mathcal{M}}\setminus V^{{\mathcal{M}_{a}}} we define recursively

and set ΘMa={θ~v:vVMa}\Theta^{{\mathcal{M}_{a}}}=\{\widetilde{\theta}_{v}:v\in V^{{\mathcal{M}_{a}}}\}.

The network Ma{\mathcal{M}_{a}} satisfies (IA-1) and (IA-2) by construction, and if M\mathcal{M} is layered, then so is Ma{\mathcal{M}_{a}}. Moreover, Ma{\mathcal{M}_{a}} is non-degenerate. To see this, let vVMav\in V^{\mathcal{M}_{a}} be arbitrary. Then, by non-degeneracy of M\mathcal{M}, there exists a wVoutMw\in V^{\mathcal{M}}_{out} such that vVM(w)v\in V^{\mathcal{M}(w)}. As ww is connected directly with a node in VMaV^{\mathcal{M}_{a}}, it follows that wVMaw\in V^{{\mathcal{M}_{a}}}, and so wVoutMaw\in V_{out}^{\mathcal{M}_{a}}.

For a pair of nodes (c1,c2)VM×VM(c_{1},c_{2})\in V^{{\mathcal{M}}}\times V^{{\mathcal{M}}} define

Let S={vVM({c1,c2}):VinMVM(v)={vD00}}S=\{v\in V^{\mathcal{M}(\{c_{1},c_{2}\})}:V_{in}^{\mathcal{M}}\cap V^{\mathcal{M}(v)}=\{v_{D_{0}}^{0}\}\}, and set

EN={(v,v~),  v,v~VN}E^{{\mathcal{N}}}=\{(v,\widetilde{v}),\;v,\widetilde{v}\in V^{{\mathcal{N}}}\},

VoutN={c1,c2}VNV_{out}^{\mathcal{N}}=\{c_{1},c_{2}\}\cap V^{\mathcal{N}},

ΩN={ωv~v:(v,v~)EN}\Omega^{\mathcal{N}}=\{\omega_{\widetilde{v}v}:(v,\widetilde{v})\in E^{\mathcal{N}}\},

As the construction of N\mathcal{N} does not depend on aa, we can fix an arbitrary aE(c1,c2)a\in E_{(c_{1},\,c_{2})}, and the condition that c1c_{1} and c2c_{2} are clones in Ma\mathcal{M}_{a} then implies

where the real numbers aua_{u} are defined according to (2). This, together with (4), yields

which would say that c1c_{1} and c2c_{2} are clones in M\mathcal{M} and hence stands in contradiction to the no-clones property of M\mathcal{M}. This establishes the no-clones property of N\mathcal{N}. The non-degeneracy of N\mathcal{N} follows by its construction. Now, by adding rr to both sides of (5) and applying ρ\rho, we find

IV Auxiliary results from complex analysis and Kronecker’s theorem

We state the remaining auxiliary results needed in the proof of our main statements. Since these results are relatively simple consequences of standard results in complex analysis and of Kronecker’s theorem, their proofs are relegated to the appendix.

Recall the definition of the natural domain Duσ\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}} of the map realized by a GFNN node uu with respect to a holomorphic nonlinearity as given in Definition 12.

In the proof of Theorem 2 it will be crucial that Duσ\mathcal{D}_{\left\langle{u}\right\rangle^{\sigma}} be connected for all nodes uu of a certain GFNN with a single input. The following lemma establishes this fact.

Suppose that Dk(a,δ)UD^{\circ}_{k}(\bm{a},\delta)\subset\mathcal{U}, and F(z)=0F(z)=0, for all zTz\in T. Then F=0F=0 identically on U\mathcal{U}.

tn,s|t^{n,\bm{s}}|\to\infty as nn\to\infty,

V Imaginary period and the self-avoiding property

In other words, a set SS is self-avoiding if the union of a finite number of distinct copies of SS obtained by translating and scaling by an odd integer contains a real number which is an element of exactly one of the copies.

Moreover, for r=1,2r=1,2, we have k1r<k2r<k3rk_{1}^{r}<k_{2}^{r}<k_{3}^{r} if ωr>0\omega_{r}>0 and k1r>k2r>k3rk_{1}^{r}>k_{2}^{r}>k_{3}^{r} if ωr<0\omega_{r}<0. Define the index sets

The following proposition formalizes the notion that nonlinearities σ\sigma of the form considered at the beginning of the chapter are dense in the set of sigmoidal nonlinearities, even after imposing the additional constraint that SS be self-avoiding.

Denote hα=12(1+tanh(α))h_{\alpha}=\frac{1}{2}\left(1+\tanh(\alpha\,\cdot\,)\right) and consider the function ρα\rho_{\alpha} defined by

VI The main theorems

Let N1\mathcal{N}_{1} and N2\mathcal{N}_{2} be non-degenerate clones-free LFNNs with the same input and ouput sets VinV_{in} and VoutV_{out}. Let

Let Nj\mathcal{N}_{j}, j{1,2,,n}j\in\{1,2,\dots,n\}, be non-degenerate clones-free LFNNs with the same input set VinV_{in} and the same single output node {vout}\{v_{out}\}. Furthermore, suppose that no two networks Nj1\mathcal{N}_{j_{1}}, Nj2\mathcal{N}_{j_{2}}, j1j2j_{1}\neq j_{2}, are extensionally isomorphic. Consider the nonlinearity

Before embarking on the proofs of Theorems 3 and 4, we show how Theorems 1 and 2 follow from these two results together with Proposition 3.

and SαS_{\alpha} is a discrete self-avoiding set (as the self-avoiding property is preserved under scaling by a nonzero real number), so by Theorem 3 we obtain NαfN~α\mathcal{N}^{\alpha}\stackrel{{\scriptstyle f}}{{\sim}}\widetilde{\mathcal{N}}^{\alpha}, which implies NN~\mathcal{N}\simeq\widetilde{\mathcal{N}}. ∎

and SαS_{\alpha} is a discrete self-avoiding set, so by Theorem 4 we obtain that {Njασα}j=1n{1}\{\langle{\mathcal{N}_{j}^{\alpha}}\rangle^{\sigma_{\alpha}}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}\cup\{\bm{1}\} is linearly independent. Now, suppose by way of contradiction that there is linear dependency λ0+j=1nλjNjσ=0\lambda_{0}+\sum_{j=1}^{n}\lambda_{j}\,\langle{\mathcal{N}_{j}}\rangle^{\sigma}=0 among {Njσ}j=1n{1}\{\left\langle{\mathcal{N}_{j}}\right\rangle^{\sigma}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}\cup\{\bm{1}\}. But then

which contradicts the linear independence of {Njασα}j=1n{1}\{\langle{\mathcal{N}_{j}^{\alpha}}\rangle^{\sigma_{\alpha}}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}\cup\{\bm{1}\}. We deduce that {Njσ}j=1n{1}\{\left\langle{\mathcal{N}_{j}}\right\rangle^{\sigma}\}_{j\hskip 1.42262pt=\hskip 1.42262pt1}^{n}\cup\{\bm{1}\} must be linearly independent, as desired. ∎

Next, note that σ\sigma is a real meromorphic function whose set of poles is

We now perform a calculation that will enable us to interpret the single input variable of M\mathcal{M}^{\prime} as a rational linear combination of kk input variables of another LFNN M\mathcal{M}^{\prime\prime}, to be specified below. The argument will then proceed by anchoring at all but one of the inputs of M\mathcal{M}^{\prime\prime}. It is this last step that uses k2k\geq 2 as a key assumption, as anchoring requires at least two input nodes to be meaningful. We thus have

VinM={u1,,uk}V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{u_{1},\dots,u_{k}\} is a set of kk newly-created input nodes (disjoint from VMV^{\mathcal{M}^{\prime}}),

VoutM:=VoutMV_{out}^{{\mathcal{M}^{\prime\prime}}}\vcentcolon=V_{out}^{{\mathcal{M}^{\prime}}},

Define ωvp1uj:=qpjωvj1vin\omega_{v_{p}^{1}u_{j}}:=q_{pj}\,\omega_{v_{j}^{1}v_{in}}, for p{1,,D1}p\in\{1,\dots,D_{1}\}, j{1,,k}j\in\{1,\dots,k\}, and let

ΘM:=ΘM\Theta^{{\mathcal{M}^{\prime\prime}}}:=\Theta^{\mathcal{M}^{\prime}}.

The procedure for constructing M{\mathcal{M}^{\prime\prime}} for a given M\mathcal{M}^{\prime} is illustrated in Figure 8.

Owing to (19) – (21) and the construction of M{\mathcal{M}^{\prime\prime}}, we have the following “input splitting” relationship

where FwF_{w} corresponds to the map realized by the LFNN with nodes

Now, by definition of natural domain, for each wVoutMw\in V_{out}^{\mathcal{M}^{\prime\prime}}, we have

Moreover, as M\mathcal{M}^{\prime} and M\mathcal{M}^{\prime\prime} share the nodes in (23), as well as the associated edges, weights, and biases, we have

for all wVoutMw\in V_{out}^{\mathcal{M}^{\prime\prime}}, and thus

for all p,j{1,,k}p,j\in\{1,\dots,k\}. Therefore, for each j{1,,k}j\in\{1,\dots,k\}, the node vj1v_{j}^{1} will be removed when anchoring the input uju_{j}. A concrete example of this input anchoring procedure in the case k2k\geq 2 is shown schematically in Figure 9.

Thus, having anchored the nodes u1,u2,,uk1u_{1},u_{2},\dots,u_{k-1} to appropriate real numbers a1,,ak1a_{1},\dots,a_{k-1}, we will be left with a non-degenerate clones-free LFNN N=(VN,EN,{uk},VoutN,ΩN,ΘN){\mathcal{N}}=(V^{\mathcal{N}},E^{\mathcal{N}},\{u_{k}\},V_{out}^{\mathcal{N}},\Omega^{\mathcal{N}},\Theta^{\mathcal{N}}) such that the function houtN:=wVoutNλwwσ,Nh_{out}^{\mathcal{N}}\vcentcolon=\sum_{w\in V_{out}^{\mathcal{N}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{N}} satisfies

Claim 1: We have L(M)2L(\mathcal{M}^{\prime})\geq 2 and {v~V2:(vj1,v~)EM}\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}\neq\varnothing. Proof of Claim 1. We first show that L(M)2L(\mathcal{M}^{\prime})\geq 2. To this end, suppose by way of contradiction that L(M)=1L(\mathcal{M}^{\prime})=1. Then VoutM=V1V_{out}^{\mathcal{M}^{\prime}}=V_{1} by non-degeneracy, so the function hout=wVoutMλwwσ,Mh_{out}=\sum_{w\in V_{out}^{\mathcal{M}^{\prime}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}} can be written as

where gg is analytic in an open neighborhood of tt^{*}. But vj1σ,M\langle{v_{j^{*}}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime}} has a pole at tt^{*}, and so houth_{out} has a pole at tt^{*}, which stands in contradiction to houtch_{out}\equiv c, and thus establishes L(M)2L(\mathcal{M}^{\prime})\geq 2.

Next, by way of contradiction assume that {v~V2:(vj1,v~)EM}=\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}=\varnothing. Then, by non-degeneracy of M\mathcal{M}^{\prime}, we have vj1VoutMv_{j^{*}}^{1}\in V_{out}^{\mathcal{M}^{\prime}}, and wσ,M\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}}, for wVoutM{vj1}w\in V_{out}^{\mathcal{M}^{\prime}}\setminus\{v_{j^{*}}^{1}\}, are real holomorphic functions of \big{(}\langle{v_{j}^{1}}\rangle^{\sigma,\,{\mathcal{M}^{\prime}}}\big{)}_{j\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}}. Now, as vj1σ,M\langle{v_{j}^{1}}\rangle^{\sigma,\,{\mathcal{M}^{\prime}}}, j{1,,D1}{j}j\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}, are analytic and real-valued at tt^{*}, the function houth_{out} can again be written in the form (28) with gg analytic in an open neighborhood of tt^{*}. This again contradicts houtch_{out}\equiv c, and thus {v~V2:(vj1,v~)EM}\{\widetilde{v}\in V_{2}:(v_{j^{*}}^{1},\widetilde{v})\in E^{\mathcal{M}^{\prime}}\}\neq\varnothing, establishing the claim. We can therefore enumerate the nodes V2={v12,,vd2,vd+12,,vD22}V_{2}=\{v^{2}_{1},\dots,v^{2}_{d},v^{2}_{d+1},\dots,v^{2}_{D_{2}}\} so that

When D1=1D_{1}=1, the functions fpf_{p} are all identically zero. For given p{1,,d}p\in\{1,\dots,d\}, zLBz\in\mathcal{L}_{B} is a singularity of vp2σ,M\langle{v_{p}^{2}}\rangle^{\sigma,\,\mathcal{M}^{\prime}} if and only if zz is an element of D(t,ϵ)D(t^{*},\epsilon) such that

where PP is the set of poles of σ\sigma, expressed in terms of SS by (14). But

for all zD(t,ϵ)z\in D(t^{*},\epsilon), so it suffices to ensure that

where in (35) we used the definition of zn,sz^{n,\bm{s}}, in (36) we used the ii-periodicity of σ\sigma, in (37) we used (32), and in (38) we used ωvp2vj1=j=1kˉqˉpjωvj2vj1\omega_{v_{p}^{2}v_{j^{*}}^{1}}=\sum_{j=1}^{\bar{k}}{\bar{q}}_{pj}\,\omega_{v_{j}^{2}v_{j^{*}}^{1}} and the ii-periodicity of σ\sigma again. As BB was chosen so that the functions (29) do not have singularities in LB\mathcal{L}_{B}, all the quantities in the calculation (35)–(38) are well-defined.

Motivated by (35)–(38), we construct a GFNN M=(VM,EM,VinM,VoutM,ΩM,ΘM){\mathcal{M}^{\prime\prime}}=(V^{{\mathcal{M}^{\prime\prime}}},E^{{\mathcal{M}^{\prime\prime}}},V_{in}^{{\mathcal{M}^{\prime\prime}}},V_{out}^{{\mathcal{M}^{\prime\prime}}},\Omega^{{\mathcal{M}^{\prime\prime}}},\Theta^{{\mathcal{M}^{\prime\prime}}}) as follows

First, kˉ\bar{k} new nodes are created and enumerated as {u1,,ukˉ}\{u_{1},\dots,u_{\bar{k}}\}. Now, if D1>1D_{1}>1, then let VinM={vin,u1,,ukˉ}V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{v_{in},u_{1},\dots,u_{\bar{k}}\}, and if D1=1D_{1}=1, set VinM={u1,,ukˉ}V_{in}^{{\mathcal{M}^{\prime\prime}}}=\{u_{1},\dots,u_{\bar{k}}\}.

VoutM:=VoutM{vj1}V_{out}^{{\mathcal{M}^{\prime\prime}}}\vcentcolon=V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\},

define ωvp2uj:=qˉpjωvj2v11\omega_{v_{p}^{2}u_{j}}:={\bar{q}}_{pj}\,\omega_{v_{j}^{2}v_{1}^{1}}, for p=1,,dp=1,\dots,d, j=1,,kˉj=1,\dots,{\bar{k}}, and let

The construction of M\mathcal{M}^{\prime\prime} for a concrete M\mathcal{M}^{\prime} is illustrated in Figure 10. Note that M{\mathcal{M}^{\prime\prime}} is not layered in the case D1>1D_{1}>1, due to the presence of the node vinv_{in}. Owing to (35)–(38) and the construction of M{\mathcal{M}^{\prime\prime}}, we have the following “input splitting” relationship:

We next show that M{\mathcal{M}^{\prime\prime}} is non-degenerate and clones-free. To establish non-degeneracy, it suffices to show VinMwVoutMVM(w)V_{in}^{{\mathcal{M}^{\prime\prime}}}\subset\bigcup_{w\in V_{out}^{{\mathcal{M}^{\prime\prime}}}}V^{\mathcal{M}^{\prime\prime}(w)}. First note that, in both cases D1=1D_{1}=1 and D1>1D_{1}>1, for a given j{1,,kˉ}j\in\{1,\dots,\bar{k}\}, there exists a wVoutM{vj1}w\in V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\} such that vj2VM(w)v_{j}^{2}\in V^{\mathcal{M}^{\prime}(w)}, by non-degeneracy of M\mathcal{M}^{\prime}. It follows that vj2VM(w)v_{j}^{2}\in V^{{\mathcal{M}^{\prime\prime}}(w)} and thus ujVM(w)u_{j}\in V^{{\mathcal{M}^{\prime\prime}}(w)}. As jj was arbitrary, we have {u1,,ukˉ}wVoutMVM(w)\{u_{1},\dots,u_{\bar{k}}\}\subset\bigcup_{w\in V_{out}^{{\mathcal{M}^{\prime\prime}}}}V^{\mathcal{M}^{\prime\prime}(w)}, which establishes non-degeneracy of M\mathcal{M}^{\prime\prime} in the case D1=1D_{1}=1. For D1>1D_{1}>1 we need to additionally show that vinVM(w)v_{in}\in V^{{\mathcal{M}^{\prime\prime}}(w)}. To this end, note that there exist an m{1,,D1}{j}m^{*}\in\{1,\dots,D_{1}\}\setminus\{j^{*}\} and a wVoutM{vj1}w\in V_{out}^{{\mathcal{M}^{\prime}}}\setminus\{v_{j^{*}}^{1}\} such that vm1VM(w)v_{m^{*}}^{1}\in V^{\mathcal{M}^{\prime}(w)}, and so vinVM(w)v_{in}\in V^{{\mathcal{M}^{\prime\prime}}(w)}, as desired. The clones-free property of M{\mathcal{M}^{\prime\prime}} follows by the same argument as in the case k2k\geq 2.

Once again, we revisit the function hout(t)=wVoutMλwwσ,M(t)=ch_{out}(t)=\sum_{w\in V_{out}^{\mathcal{M}^{\prime}}}\lambda_{w}\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime}}(t)=c, for all tDhoutt\in\mathcal{D}_{h_{out}}, and proceed in a similar fashion as in the case k2k\geq 2. This time, however, the output sets VoutMV^{\mathcal{M}^{\prime}}_{out} and VoutMV^{\mathcal{M}^{\prime\prime}}_{out} may differ by the node vj1v_{j^{*}}^{1}. This is a nuisance that will be dealt with below in Claim 2, but in the meantime, it is convenient to introduce the “truncated” linear dependency function

and proceed exactly as in the case k2k\geq 2. By examining the structure of M\mathcal{M}^{\prime}, we see that, for each wVoutM{vj1}w\in V_{out}^{\mathcal{M}^{\prime}}\setminus\{v^{1}_{j^{*}}\}, we can write

Now, by definition of natural domain, for each wVoutMw\in V_{out}^{\mathcal{M}^{\prime\prime}}, the natural domain Dwσ,M\mathcal{D}_{\left\langle{w}\right\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}} is the set of all zp=1dDvp2σ,MjjDvj1σ,M\bm{z}\in\bigcap_{p=1}^{d}\mathcal{D}_{\langle{v_{p}^{2}}\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}}\cap\bigcap_{j\neq j^{*}}\mathcal{D}_{\langle{v_{j}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime\prime}}} such that

Moreover, as M\mathcal{M}^{\prime} and M\mathcal{M}^{\prime\prime} share the nodes in (41), as well as the associated edges, weights, and biases, we have

for all wVoutMw\in V_{out}^{\mathcal{M}^{\prime\prime}}, and thus

as nn\to\infty, which contradicts the fact that vj1σ,M\langle{v_{j^{*}}^{1}}\rangle^{\sigma,\,\mathcal{M}^{\prime}} has a pole at tt^{*}. This establishes vj1VoutMv_{j^{*}}^{1}\notin V_{out}^{\mathcal{M}^{\prime}}. As a consequence we further have htr=houth_{tr}=h_{out}, and so (44) reads

for all sCDkˉ(0,δ)\bm{s}\in C\cap D^{\circ}_{\bar{k}}(0,\delta). Now, define the set

for all sCDkˉ(0,δ)\bm{s}\in C\cap D^{\circ}_{\bar{k}}(0,\delta). Now, define the set

Let Nj=(Vj,Ej,Vin,Vout,Ωj,Θj)\mathcal{N}_{j}=(V^{j},E^{j},V_{in},V_{out},\Omega^{j},\Theta^{j}), j{1,2}j\in\{1,2\}, be networks as in the theorem statement. Let N=N1N2\mathcal{N}=\mathcal{N}_{1}\vee\mathcal{N}_{2} be their amalgam and πj:VNjπj(VNj)VN\pi_{j}:V^{\mathcal{N}_{j}}\to\pi_{j}(V^{\mathcal{N}_{j}})\subset V^{\mathcal{N}} the extensional isomorphisms between Nj\mathcal{N}_{j} and the corresponding subnetworks of N\mathcal{N}, for j{1,2}j\in\{1,2\}. We start by claiming that π1(w)=π2(w)\pi_{1}(w)=\pi_{2}(w), for all wVoutw\in V_{out}. Indeed, suppose to the contrary that we have π1(w)π2(w)\pi_{1}(w^{\prime})\neq\pi_{2}(w^{\prime}), for some wVoutw^{\prime}\in V_{out}, and denote wj=πj(w)w_{j}=\pi_{j}(w^{\prime}), j{1,2}j\in\{1,2\}. Since w1w2w_{1}\neq w_{2}, it follows that N(w1)\mathcal{N}(w_{1}) and N(w2)\mathcal{N}(w_{2}) are not extensionally isomorphic, for otherwise w1w_{1} and w2w_{2} would be clones, contradicting the no-clones condition for N\mathcal{N}. Now,

by assumption. But this contradicts the conclusion of Theorem 4, and thus establishes π1(w)=π2(w)\pi_{1}(w)=\pi_{2}(w), for all wVoutw\in V_{out}. By non-degeneracy of N1\mathcal{N}_{1}, for every vV1v\in V^{1}, there exists a wVoutw\in V_{out} such that vVN1(w)v\in V^{\mathcal{N}_{1}(w)}. Then π1(v)VN(π1(w))=VN(π2(w))=π2(VN2(w))π2(V2)\pi_{1}(v)\in V^{\mathcal{N}(\pi_{1}(w))}=V^{\mathcal{N}(\pi_{2}(w))}=\pi_{2}(V^{\mathcal{N}_{2}(w)})\subset\pi_{2}(V^{2}). Similarly, for every vV2v\in V^{2}, we have π2(v)π1(V1)\pi_{2}(v)\in\pi_{1}(V^{1}). Thus, the function ψ:V1V2\psi:V^{1}\to V^{2} given by ψ=π21π1\psi=\pi_{2}^{-1}\circ\pi_{1} is well-defined. This function is invertible with inverse π11π2\pi_{1}^{-1}\circ\pi_{2}, so it is a bijection. Therefore ψ\psi is an extensional isomorphism between N1\mathcal{N}_{1} and N2\mathcal{N}_{2}, by virtue of being a composition of two extensional isomorphisms. Moreover, we have ψ(w)=π21(π1(w))=w\psi(w)=\pi_{2}^{-1}(\pi_{1}(w))=w, for all wVoutw\in V_{out}, so ψ\psi restricted to VoutV_{out} is the identity map, and thus ψ\psi is a faithful isomorphism. ∎

Acknowledgment

The authors would like to thank Thomas Allard for useful suggestions regarding the proof of Proposition 3 and an anonymous reviewer for proposing a clearer exposition of Lemma 6.

References

Appendix: proofs of auxiliary results

Fix N1\mathcal{N}_{1} and N2\mathcal{N}_{2} as in the statement of the proposition. We begin by establishing the existence of a corresponding amalgam A\mathcal{A}. Let A\mathscr{A} denote the set of all proto-amalgams of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}. To see that A\mathscr{A} is non-empty, consider the LFNN N=(VN,EN,Vin,VoutN,ΩN,ΘN)\mathcal{N}=(V^{\mathcal{N}},E^{\mathcal{N}},V_{in},V_{out}^{\mathcal{N}},\Omega^{\mathcal{N}},\Theta^{\mathcal{N}}) specified as follows:

Let SS be a set of cardinality #(V1Vin)+#(V2Vin)\#(V^{1}\setminus V_{in})+\#(V^{2}\setminus V_{in}) disjoint from VinV_{in}, and set VN:=VinSV^{\mathcal{N}}\vcentcolon=V_{in}\cup S. Furthermore, let πjN:VjπjN(Vj)VN\pi_{j}^{\,\mathcal{N}}:V^{j}\to\pi_{j}^{\,\mathcal{N}}(V^{j})\subset V^{\mathcal{N}} be injective functions such that πjN(v)=v\pi_{j}^{\,\mathcal{N}}(v)=v, for vVinv\in V_{in}, j{1,2}j\in\{1,2\}, and π1N(V1Vin)π2N(V2Vin)=\pi_{1}^{\,\mathcal{N}}(V^{1}\setminus V_{in})\cap\pi_{2}^{\,\mathcal{N}}(V^{2}\setminus V_{in})=\varnothing, but otherwise arbitrary.

VoutN:=π1N(Vout1)π2N(Vout2)V^{\mathcal{N}}_{out}\vcentcolon=\pi_{1}^{\mathcal{N}}(V_{out}^{1})\cup\pi_{2}^{\mathcal{N}}(V_{out}^{2}).

ΩN:={ωvu:(u,v)EN}\Omega^{\mathcal{N}}\vcentcolon=\left\{\omega_{vu}:(u,v)\in E^{\mathcal{N}}\right\}.

For j=1,2j=1,2 and vVjVinv\in V^{j}\setminus V_{in}, let θπjN(v)=θv\theta_{\pi_{j}^{\,\mathcal{N}}(v)}=\theta_{v}, and set ΘN:={θu:uVNVin}\Theta^{\mathcal{N}}\vcentcolon=\left\{\theta_{u}:u\in V^{\mathcal{N}}\setminus V_{in}\right\}.

Informally, the network N{\mathcal{N}} is obtained by putting N1\mathcal{N}_{1} and N2\mathcal{N}_{2} “side by side”, sharing only the input nodes VinV_{in}. As N1\mathcal{N}_{1} and N2\mathcal{N}_{2} are non-degenerate, so is N\mathcal{N}. Moreover, Properties (i) and (ii) of Definition 16 hold for N\mathcal{N} with πjN:Vjπj(Vj)VN\pi_{j}^{\,\mathcal{N}}:V^{j}\to\pi_{j}(V^{j})\subset V^{\mathcal{N}}, for j=1,2j=1,2.

Thus N\mathcal{N} is a proto-amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}, and so A\mathscr{A}\neq\varnothing. Now, let A=(VA,EA,VinA,VoutA,ΩA,ΘA)A\mathcal{A}=(V^{\mathcal{A}},E^{\mathcal{A}},V_{in}^{\mathcal{A}},V_{out}^{\mathcal{A}},\Omega^{\mathcal{A}},\allowbreak\Theta^{\mathcal{A}})\in\mathscr{A} be a network with the least possible number of nodes among all the networks in A\mathscr{A}, and let πj:Vjπj(Vj)VA\pi_{j}:V^{j}\to\pi_{j}(V^{j})\subset V^{\mathcal{A}}, for j{1,2}j\in\{1,2\}, be extensional isomorphisms between Nj\mathcal{N}_{j} and the appropriate subnetworks of A\mathcal{A}. We now show that A\mathcal{A} is clones-free. To this end, suppose by way of contradiction that c1,c2VAc_{1},c_{2}\in V^{\mathcal{A}} are clones. As N1\mathcal{N}_{1} is clones-free, c1,c2c_{1},c_{2} cannot both be in π1(V1)\pi_{1}(V^{1}), for otherwise π11(c1)\pi_{1}^{-1}(c_{1}) and π11(c2)\pi_{1}^{-1}(c_{2}) would be clones in N1\mathcal{N}_{1}. By the same token, c1,c2c_{1},c_{2} cannot both be in π2(V2)\pi_{2}(V^{2}). Thus, we may write w.l.o.g. c1=π1(v1)c_{1}=\pi_{1}(v_{1}) and c2=π2(v2)c_{2}=\pi_{2}(v_{2}), for some v1V1v_{1}\in V^{1} and v2V2v_{2}\in V^{2}. Now, let A~\widetilde{\mathcal{A}} be the network obtained from A\mathcal{A} by making the following alterations:

For every edge (c2,v)EA(c_{2},v)\in E^{\mathcal{A}}, where vVAv\in V^{\mathcal{A}}, introduce a new edge (c1,v)(c_{1},v) together with the associated weight ωvc2\omega_{vc_{2}}, and delete the edge (c2,v)(c_{2},v).

Delete the edges (v,c2)EA(v,c_{2})\in E^{\mathcal{A}}, as well as the node c2c_{2}.

If c2c_{2} was a node in π2(Vout2)\pi_{2}(V_{out}^{2}), then add c1c_{1} to the set VoutA~V_{out}^{\widetilde{\mathcal{A}}}.

The network A~\widetilde{\mathcal{A}} is a proto-amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2} via the extensional isomorphisms π~1=π1{\widetilde{\pi}_{1}=\pi_{1}} and

But A~\widetilde{\mathcal{A}} has strictly fewer nodes than A\mathcal{A}, which contradicts the minimality of A\mathcal{A}, and thereby establishes that A\mathcal{A} is clones-free, and hence A\mathcal{A} is an amalgam of N1\mathcal{N}_{1} and N2\mathcal{N}_{2}, completing the proof of existence. To establish uniqueness—up to extensional isomorphisms—of the amalgam, suppose that A\mathcal{A} and A\mathcal{A}^{\prime} are both amalgams of N1\mathcal{N}_{1} and N2\mathcal{N}_{2} via extensional isomorphisms πj:Vjπj(Vj)VA\pi_{j}:V^{j}\to\pi_{j}(V^{j})\subset V^{\mathcal{A}}, πj:Vjπj(Vj)VA\pi_{j}^{\prime}:V^{j}\to\pi_{j}^{\prime}(V^{j})\subset V^{\mathcal{A}^{\prime}}, for j{1,2}j\in\{1,2\}. We first show that

Now define ψ:VAVA\psi:V^{\mathcal{A}}\to V^{\mathcal{A}^{\prime}} according to

It follows by (46) that this definition is consistent, in the sense that the two cases in (47) yield the same value for ψ(v)\psi(v) when vπ1(V1)π2(V2)v\in\pi_{1}(V_{1})\cap\pi_{2}(V_{2}). Now, Properties (i) and (ii) of Definition 14 for ψ\psi follow, so ψ\psi is an extensional isomorphism between A\mathcal{A} and A\mathcal{A}^{\prime}, finishing the proof. ∎

Let a\bm{a}, δ\delta, and TT be as in the statement of the lemma, such that Dk(a,δ)UD_{k}^{\circ}(\bm{a},\delta)\subset\mathcal{U} and FT0F|_{T}\equiv 0. Then the function Fa:=F(+a)F_{\bm{a}}\vcentcolon=F(\,\cdot\,+\bm{a}) is holomorphic on Ua\mathcal{U}-\bm{a}, and FaTa0F_{\bm{a}}|_{T-\bm{a}}\equiv 0. Thus, as FU0F|_{\mathcal{U}}\equiv 0 if and only if FaUa0F_{\bm{a}}|_{\mathcal{U}-\bm{a}}\equiv 0, it suffices to prove the result for a=0\bm{a}=\bm{0}. Let T0:=TT_{0}\vcentcolon=T, Tk:=Dk(0,δ)T_{k}\vcentcolon=D^{\circ}_{k}(\bm{0},\delta), and, for r=1,,k1r=1,\dots,k-1, define the sets

Note that GG is holomorphic, and G(δ,δ)0G|_{(-\delta,\delta)}\,\equiv 0 by the induction hypothesis. Since the zero set of a nonzero holomorphic function in one variable does not have a limit point in the domain, we deduce that GD1(0,δ)0G|_{D_{1}^{\circ}(0,\delta)}\equiv 0. But zjz_{j} and sjs_{j} were arbitrary, so we have FTr+10F|_{T_{r+1}}\equiv 0. We have thus shown that FF is identically zero on an open subset Tk=Dk(0,δ)T_{k}=D^{\circ}_{k}(\bm{0},\delta) of its connected domain U\mathcal{U}, and so, by the multivariate identity theorem [19, 1.2.12], it must be identically zero on U\mathcal{U}. ∎

and so, by Lemma 4, we obtain G(0,z1,,zk)=0G(0,z_{1},\dots,z_{k})=0, for all (0,z1,,zk)V(0,z_{1},\dots,z_{k})\in\mathcal{V}. By inspection of the power series expansion of GG in V\mathcal{V}, we find that GG must have the form G(z0,z1,,zk)=z0Gz0(z0,z1,,zk)G(z_{0},z_{1},\dots,z_{k})=z_{0}\,\frac{\partial G}{\partial z_{0}}(z_{0},z_{1},\dots,z_{k}). As the function Gz0\frac{\partial G}{\partial z_{0}} is holomorphic in V\mathcal{V}, we have that z0(p+1)F(z0,,zk)=Gz0(z0,,zk)z_{0}^{-(p+1)}F(z_{0},\dots,z_{k})=\frac{\partial G}{\partial z_{0}}(z_{0},\dots,z_{k}) is holomorphic in V\mathcal{V}, contradicting the maximality of pp. Our hypothesis that FVF|_{\mathcal{V}} is not identically zero must hence be false, i.e., we have FV0F|_{\mathcal{V}}\equiv 0. Finally, by the multivariate identity theorem [19, 1.2.12], we deduce that FU0F|_{\mathcal{U}}\equiv 0. ∎

The inclusion of MM in the right-hand side is clear, so we only need to show the reverse inclusion. Note that, since MM is closed, Td/MT^{d}/M is a Lie group. We will rewrite the right-hand side of (48) by establishing a bijective correspondence between the characters χ:TdS1\chi:T^{d}\to S^{1} such that Mker(χ)M\subset\ker(\chi), and the characters f:Td/MS1f:T^{d}/M\to S^{1}. To this end, let π:TdTd/M\pi:T^{d}\to T^{d}/M be the projection map, and suppose that χ:TdS1\chi:T^{d}\to S^{1} is a character such that Mker(χ)M\subset\ker(\chi). Then χ\chi factors according to χ=fπ\chi=f\circ\pi, for some continuous homomorphism f:Td/MS1f:T^{d}/M\to S^{1}, in other words, ff is a character on Td/MT^{d}/M. Conversely, for any such ff we have that fπf\circ\pi is a character χ\chi on TdT^{d} with Mker(χ)M\subset\ker(\chi). Therefore it suffices to show that

by definition of MM, which is equivalent to