Deep learning has become a highly successful machine learning method employed in a wide range of applications such as optical character recognition , image classification , and speech recognition . In a typical deep learning scenario one aims to fit a parametric model, realized by a deep neural network, to match a set of training data points. In order to make the ensuing discussion more concrete, we begin with the definition of a neural network and the map it realizes under a nonlinearity.
L is a positive integer, referred to as the depth of N,
(D0,D1,…,DL) is an (L+1)-tuple of positive integers, called the layout,
where ρ acts on real vectors in a componentwise fashion.
Given positive integers Din and Dout, define NDin,Dout to be the set of all neural networks whose layouts (D0,…,DL) satisfy D0=Din and DL=Dout, but are otherwise arbitrary. Let N be a subset of NDin,Dout, ρ a nonlinearity, and ∼ an equivalence relation on NDin,Dout.
We say that ∼ is compatible with (N,ρ) if, for all N1,N2∈N,
We say that (N,ρ) is identifiable up to ∼ if, for all N1,N2∈N,
Thus, by informally saying that a neural network N1 in a certain class is identifiable, we mean that any neural network N2 in the same class giving rise to the same output map, i.e., ⟨N1⟩ρ=⟨N2⟩ρ, is necessarily equivalent to N2. The role of the equivalence relation ∼ in the previous definition is thus to “measure the degree of non-uniqueness”, and in particular, to accommodate symmetries within the network that may arise either from symmetries induced by the network weights and biases (such as the presence of clone pairs, to be introduced in Definition 5), symmetries of the nonlinearity (e.g., tanh is odd), or both simultaneously. These abstract concepts will be incarnated momentarily when discussing the seminal work by Fefferman , and in Section II through Definitions 4 and 5, as well as in the examples leading up to the formulation of the paper’s main results.
In , Fefferman showed that neural networks satisfying the following genericity conditions are, indeed, uniquely determined by the map they realize under the nonlinearity ρ=tanh, up to certain obvious isomorphisms of networks:
More precisely, for fixed positive integers Din and Dout, Fefferman showed that (NA1Din,Dout,tanh) is identifiable up to ∼±, where NA1Din,Dout is defined as the set of all neural networks in NDin,Dout satisfying Assumptions 1, and ∼± is defined by stipulating that N∼±N if and only if
L=L and (D0,D1,…,DL)=(D0,D1,…,DL), and
Indeed, Fefferman remarks explicitly that it would be interesting to replace Assumptions 1 with minimal hypotheses, and to study nonlinearities other than tanh. The present paper aims to address these two issues. Characterizing the fundamental nature of conditions necessary for identifiability with respect to a fixed nonlinearity, even a simple one such as tanh, is likely a rather formidable task. In fact, the minimal identifiability conditions may generally depend on “fine” properties of the nonlinearity under consideration, and it is hence unclear how much insight can be obtained by having conditions that are specific to a given nonlinearity. We will thus be interested in an identification result with very mild conditions on the weights and biases of the neural networks to be identified, while still accommodating a broad class of nonlinearities.
II Contributions
We begin with two motivating examples. These lead up to the statements of our main contributions, whose corresponding proofs are developed in the remainder of the paper. We consider nonlinearities ρ which are not necessarily odd (as tanh), and thus need an equivalence relation which dispenses with sign changes.
We say that the neural networks N and N are isomorphic, and write N≃N, if
L=L and (D0,D1,…,DL)=(D0,D1,…,DL), and
If N does not have a clone pair, we say that N satisfies the no-clones condition.
As the nonlinearity ρ in the example above is completely arbitrary, the no-clones condition is necessary to have any hope of obtaining identifiability up to ≃. Hence, with our program in mind, given positive integers Din and Dout, we define
and seek nonlinearities ρ such that (NncDin,Dout,ρ) is identifiable up to ≃. As any class strictly containing NncDin,Dout, paired with any nonlinearity, fails identifiability up to ≃, the no-clones condition furnishes a canonical minimal assumption for identifiability up to ≃. Similarly to NA1Din,Dout, the class NncDin,Dout, paired with any measurable nonlinearity ρ such that x→∞limρ(x) and x→−∞limρ(x) exist and are not equal, satisfies the universal approximation property in the sense of Hornik and Cybenko . The following example demonstrates that insisting on the no-clones condition as the only assumption on the weights, biases, and layout will necessarily come at the cost of restricting the class of nonlinearities that allow for identifiability. Let ρ(x)=min{1,max{0,x}} be the clipped rectified linear unit (ReLU) function. Note that
Now, given an arbitrary neural network N=(W1,θ1,W2,θ2,…,WL,θL) with DL=1 satisfying the no-clones condition, the network
Concretely, we have the following main result of this paper.
The function 1 is included in the linearly independent set both for the sake of greater generality of the statement, and to facilitate the proof of Theorem 2.
for m∈{1,2,3,4}. As N satisfies the no-clones condition, the networks Nm, m∈{1,2,3,4}, also satisfy the no-clones condition, and are pairwise non-isomorphic.
which stands in contradiction to Theorem 2. This completes the proof of Theorem 1.
σ is i-periodic, i.e., σ(z+i)=σ(z), for all z∈D, and
Amalgamation: In Section III we construct a neural network M∈Nnc1,n, called the amalgam of {Nj}j=1n, containing each Nj as a subnetwork. In particular, we have (⟨M⟩σ)j=⟨Nj⟩σ, for all j∈{1,…,n}. The linear dependence of {⟨Nj⟩σ}j=1n∪{1} thus translates to
Input anchoring. We then construct a third network N∈M, obtained by fixing k−1 of the k inputs of M′′ to specific real numbers, and “cutting out” all the parts of the network whose contributions to the output map have become constant in the process. The resulting network N will be a network in M of size smaller than M′, which contradicts the minimality of M′, and thereby completes the proof.
Input anchoring. Finally, we apply an input anchoring procedure to M′′ similar to the one described above. Even though now M′′ is not a network in the sense of Definition 1, the input anchoring procedure will result in a network N∈M which is a network in the sense of Definition 1, and is of smaller size than M′, again completing the proof by contradiction.
We conclude this section by laying out the organization of the remainder of the paper. In Section III we develop a graph-theoretic framework needed to define amalgams of neural networks and several other technical concepts. In Section IV we state results from complex analysis and Kronecker’s theorem needed in arguments involving analytic continuation and input splitting, respectively. The proofs of these results are relegated to the Appendix. In Section V we discuss the fine structural properties of the function σ constructed in the proof of Theorem 2. Finally, Section VI contains the proofs of our two main results.
III Directed acyclic graphs, general neural networks, and neural network amalgams
As already mentioned, in the proof of Theorem 2 we will work with a form of neural networks that does not fit in with Definitions 1 and 2. In order to accommodate this notion of neural networks, and to lighten the manipulations needed to formalize the aforementioned techniques of amalgamation and input anchoring, we introduce a graph-theoretic framework.
We start by introducing the concept of a directed acyclic graph (DAG), commonly encountered in the graph theory literature .
A directed graph is an ordered pair G=(V,E) where V is a finite set of nodes, and E⊂V×V is a set of directed edges.
A directed cycle of a directed graph G is a set {v1,…,vk}⊂V such that, for every j∈{1,…,k}, (vj,vj+1)∈E, where we set vk+1:=v1.
A directed graph G is said to be a directed acyclic graph (DAG) if it has no directed cycles.
We interpret an edge (v,v) as an arrow connecting the nodes v and v and pointing at v.
Since the graph G in Definition 7 is assumed to be acyclic, the level is well-defined for all nodes of G. We are now ready to introduce our generalized definition of a neural network.
A general feed-forward neural network (GFNN) is an ordered sextuple N=(V,E,Vin,Vout,Ω,Θ), where
G=(V,E) is a DAG, called the architecture of N,
Vout⊂V∖Vin is the set of outputs of N,
Let N=(V,E,Vin,Vout,Ω,Θ) be a GFNN. A subnetwork of N is a GFNN N′=(V′,E′,Vin′,Vout′,Ω′,Θ′) such that there exists a set S⊂V so that
E′={(v,v)∈E:v,v∈V′},
Ω′={ωvv:(v,v)∈E′}, and
Θ′={θv:v∈V′}.
If additionally Vout′=S, then N′ is uniquely specified by S. In this case we say that N′ is the ancestor subnetwork of S in N, and write N(S) for this network.
We will treat nodes v∈V only as “handles”, and never as variables or functions. This is relevant when dealing with several networks with shared nodes, such as depicted in Figure 2. On the other hand, the output map ⟨v⟩ρ realized by v is a function.
It follows that the natural domain D⟨u⟩σ of a node u is open, as it is the preimage of an open set with respect to a continuous map. Moreover, the output map ⟨u⟩σ realized by u is holomorphic on D⟨u⟩σ, as it is given explicitly by a concatenation of affine maps and the nonlinearity σ, which are themselves holomorphic functions.
The following definition is a straightforward generalization of Definition 5.
The following definition generalizes Definition 4 to GFNNs, and introduces two new concepts, termed extensional isomorphism and faithful isomorphism, which will play an important technical role throughout the remainder of the paper.
Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2) be GFNNs with the same input nodes Vin.
We say that N1 and N2 are extensionally isomorphic, and write N1∼eN2, if there exists a bijection π:V1→V2, called an extensional isomorphism, such that the following holds:
π restricted to Vin is the identity map,
for all (v,v)∈E1, we have ωπ(v)π(v)2=ωvv1, and
for all v∈V1∖Vin, we have θπ(v)2=θv1.
We say that N1 and N2 are faithfully isomorphic, and write N1∼fN2, if they are extensionally isomorphic via π:V1→V2 with the following additional property:
Vout1=Vout2, and π restricted to Vout1 is the identity map.
In this case we call π a faithful isomorphism.
The concept of faithful isomorphisms in Definition 14 generalizes that of isomorphisms according to Definition 4. It is easily seen that extensional isomorphism is an equivalence relation on the set of all GFNNs with the same input nodes, whereas faithful isomorphism is an equivalence relation on the set of all GFNNs with the same input and output nodes. Furthermore, if N1∼eN2 via π:V1→V2, then we have ⟨π(v)⟩ρ,N2=⟨v⟩ρ,N1, for all v∈V1 and any nonlinearity ρ, and if additionally N1∼fN2, then ⟨N1⟩ρ=⟨N2⟩ρ.
We say that a GFNN N=(V,E,Vin,Vout,Ω,Θ) is non-degenerate if
V=VN(Vout), where VN(Vout) is the set of nodes of the ancestor subnetwork of Vout in N. Networks that are not non-degenerate are referred to as degenerate.
Informally, a network is non-degenerate if its every node “leads up” to at least one output. This notion is best understood with the help of examples as in Figure 3.
We are now ready to introduce the concept of amalgams of LFNNs.
Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2) be non-degenerate clones-free LFNNs with the same input set Vin.
Let A=(VA,EA,Vin,VoutA,ΩA,ΘA) be a non-degenerate LFNN with the following properties:
There exist injective maps π1:V1→π1(V1)⊂VA and π2:V2→π2(V2)⊂VA such that the networks N1 and N2 are extensionally isomorphic to the ancestor subnetworks A(π1(Vout1)) and A(π2(Vout2)) via π1 and π2, respectively.
VA=π1(V1)∪π2(V2) and VoutA=π1(Vout1)∪π2(Vout2).
We then say that A is a proto-amalgam of N1 and N2.
If A is a clones-free proto-amalgam of N1 and N2, we say that A is an amalgam of N1 and N2.
Let N1=(V1,E1,Vin,Vout1,Ω1,Θ1) and N2=(V2,E2,Vin,Vout2,Ω2,Θ2) be non-degenerate clones-free LFNNs with a shared input set Vin. Then there exists an amalgam A of N1 and N2. Moreover, the amalgam is unique up to extensional isomorphisms.
As asserted in Proposition 1 (whose proof is deferred to the Appendix), an amalgam of two given non-degenerate clones-free LFNNs N1 and N2 always exists and is unique up to extensional isomorphisms. With slight abuse of notation, we will write N1∨N2 for an arbitrary element of the equivalence class (induced by ∼e) of all the amalgams of N1 and N2. A concrete example of an amalgam construction is provided in Figure 4. Having defined the amalgam of two non-degenerate clones-free LFNNs, we define the amalgam of any finite collection N1,…,Nn of non-degenerate clones-free LFNNs according to
By Definition 16, ⋁k=1nNk is a non-degenerate clones-free LFNN. Moreover, there exist extensional isomorphisms πj:Nj→πj(Nj)⊂⋁k=1nNk, for j∈{1,…,n}, and we have ⟨πj(v)⟩ρ,⋁k=1nNk=⟨v⟩ρ,Nj, for j∈{1,…,n}, v∈VNj, and any nonlinearity ρ.
We are now in a position to prove two lemmas that form the basis for the proof of Theorem 2. The first lemma formalizes the idea of combining multiple pairwise non-isomorphic single-output networks with linearly dependent ouput maps into one multiple-output network with linear dependency among the maps of its ouput nodes.
For all w∈VoutMa,
is constant, and we denote its value by ⟨w⟩ρ,M(a).
VMa={v∈VM:{v10,…,vD0−10}∩VM(v)=∅}, where M(v) denotes the ancestor network of v,
EMa={(v,v),v,v∈VMa},
VinMa={v10,…,vD0−10}, VoutMa=VoutM∩VMa, and
ΩMa={ωvv:(v,v)∈EMa}.
For a node v∈VM∖VMa we define recursively
and set ΘMa={θv:v∈VMa}.
The network Ma satisfies (IA-1) and (IA-2) by construction, and if M is layered, then so is Ma. Moreover, Ma is non-degenerate. To see this, let v∈VMa be arbitrary. Then, by non-degeneracy of M, there exists a w∈VoutM such that v∈VM(w). As w is connected directly with a node in VMa, it follows that w∈VMa, and so w∈VoutMa.
For a pair of nodes (c1,c2)∈VM×VM define
Let S={v∈VM({c1,c2}):VinM∩VM(v)={vD00}}, and set
EN={(v,v),v,v∈VN},
VoutN={c1,c2}∩VN,
ΩN={ωvv:(v,v)∈EN},
As the construction of N does not depend on a, we can fix an arbitrary a∈E(c1,c2), and the condition that c1 and c2 are clones in Ma then implies
where the real numbers au are defined according to (2). This, together with (4), yields
which would say that c1 and c2 are clones in M and hence stands in contradiction to the no-clones property of M. This establishes the no-clones property of N. The non-degeneracy of N follows by its construction. Now, by adding r to both sides of (5) and applying ρ, we find
IV Auxiliary results from complex analysis and Kronecker’s theorem
We state the remaining auxiliary results needed in the proof of our main statements. Since these results are relatively simple consequences of standard results in complex analysis and of Kronecker’s theorem, their proofs are relegated to the appendix.
Recall the definition of the natural domain D⟨u⟩σ of the map realized by a GFNN node u with respect to a holomorphic nonlinearity as given in Definition 12.
In the proof of Theorem 2 it will be crucial that D⟨u⟩σ be connected for all nodes u of a certain GFNN with a single input. The following lemma establishes this fact.
Suppose that Dk∘(a,δ)⊂U, and F(z)=0, for all z∈T. Then F=0 identically on U.
∣tn,s∣→∞ as n→∞,
V Imaginary period and the self-avoiding property
In other words, a set S is self-avoiding if the union of a finite number of distinct copies of S obtained by translating and scaling by an odd integer contains a real number which is an element of exactly one of the copies.
Moreover, for r=1,2, we have k1r<k2r<k3r if ωr>0 and k1r>k2r>k3r if ωr<0. Define the index sets
The following proposition formalizes the notion that nonlinearities σ of the form considered at the beginning of the chapter are dense in the set of sigmoidal nonlinearities, even after imposing the additional constraint that S be self-avoiding.
Denote hα=21(1+tanh(α⋅)) and consider the function ρα defined by
VI The main theorems
Let N1 and N2 be non-degenerate clones-free LFNNs with the same input and ouput sets Vin and Vout. Let
Let Nj, j∈{1,2,…,n}, be non-degenerate clones-free LFNNs with the same input set Vin and the same single output node {vout}. Furthermore, suppose that no two networks Nj1, Nj2, j1=j2, are extensionally isomorphic. Consider the nonlinearity
Before embarking on the proofs of Theorems 3 and 4, we show how Theorems 1 and 2 follow from these two results together with Proposition 3.
and Sα is a discrete self-avoiding set (as the self-avoiding property is preserved under scaling by a nonzero real number), so by Theorem 3 we obtain Nα∼fNα, which implies N≃N. ∎
and Sα is a discrete self-avoiding set, so by Theorem 4 we obtain that {⟨Njα⟩σα}j=1n∪{1} is linearly independent. Now, suppose by way of contradiction that there is linear dependency λ0+∑j=1nλj⟨Nj⟩σ=0 among {⟨Nj⟩σ}j=1n∪{1}. But then
which contradicts the linear independence of {⟨Njα⟩σα}j=1n∪{1}. We deduce that {⟨Nj⟩σ}j=1n∪{1} must be linearly independent, as desired. ∎
Next, note that σ is a real meromorphic function whose set of poles is
We now perform a calculation that will enable us to interpret the single input variable of M′ as a rational linear combination of k input variables of another LFNN M′′, to be specified below. The argument will then proceed by anchoring at all but one of the inputs of M′′. It is this last step that uses k≥2 as a key assumption, as anchoring requires at least two input nodes to be meaningful. We thus have
VinM′′={u1,…,uk} is a set of k newly-created input nodes (disjoint from VM′),
VoutM′′:=VoutM′,
Define ωvp1uj:=qpjωvj1vin, for p∈{1,…,D1}, j∈{1,…,k}, and let
ΘM′′:=ΘM′.
The procedure for constructing M′′ for a given M′ is illustrated in Figure 8.
Owing to (19) – (21) and the construction of M′′, we have the following “input splitting” relationship
where Fw corresponds to the map realized by the LFNN with nodes
Now, by definition of natural domain, for each w∈VoutM′′, we have
Moreover, as M′ and M′′ share the nodes in (23), as well as the associated edges, weights, and biases, we have
for all w∈VoutM′′, and thus
for all p,j∈{1,…,k}. Therefore, for each j∈{1,…,k}, the node vj1 will be removed when anchoring the input uj. A concrete example of this input anchoring procedure in the case k≥2 is shown schematically in Figure 9.
Thus, having anchored the nodes u1,u2,…,uk−1 to appropriate real numbers a1,…,ak−1, we will be left with a non-degenerate clones-free LFNN N=(VN,EN,{uk},VoutN,ΩN,ΘN) such that the function houtN:=∑w∈VoutNλw⟨w⟩σ,N satisfies
Claim 1: We have L(M′)≥2 and {v∈V2:(vj∗1,v)∈EM′}=∅. Proof of Claim 1. We first show that L(M′)≥2. To this end, suppose by way of contradiction that L(M′)=1. Then VoutM′=V1 by non-degeneracy, so the function hout=∑w∈VoutM′λw⟨w⟩σ,M′ can be written as
where g is analytic in an open neighborhood of t∗. But ⟨vj∗1⟩σ,M′ has a pole at t∗, and so hout has a pole at t∗, which stands in contradiction to hout≡c, and thus establishes L(M′)≥2.
Next, by way of contradiction assume that {v∈V2:(vj∗1,v)∈EM′}=∅. Then, by non-degeneracy of M′, we have vj∗1∈VoutM′, and ⟨w⟩σ,M′, for w∈VoutM′∖{vj∗1}, are real holomorphic functions of \big{(}\langle{v_{j}^{1}}\rangle^{\sigma,\,{\mathcal{M}^{\prime}}}\big{)}_{j\in\{1,\dots,D_{1}\}\setminus\{j^{*}\}}. Now, as ⟨vj1⟩σ,M′, j∈{1,…,D1}∖{j∗}, are analytic and real-valued at t∗, the function hout can again be written in the form (28) with g analytic in an open neighborhood of t∗. This again contradicts hout≡c, and thus {v∈V2:(vj∗1,v)∈EM′}=∅, establishing the claim. We can therefore enumerate the nodes V2={v12,…,vd2,vd+12,…,vD22} so that
When D1=1, the functions fp are all identically zero. For given p∈{1,…,d}, z∈LB is a singularity of ⟨vp2⟩σ,M′ if and only if z is an element of D(t∗,ϵ) such that
where P is the set of poles of σ, expressed in terms of S by (14). But
for all z∈D(t∗,ϵ), so it suffices to ensure that
where in (35) we used the definition of zn,s, in (36) we used the i-periodicity of σ, in (37) we used (32), and in (38) we used ωvp2vj∗1=∑j=1kˉqˉpjωvj2vj∗1 and the i-periodicity of σ again. As B was chosen so that the functions (29) do not have singularities in LB, all the quantities in the calculation (35)–(38) are well-defined.
Motivated by (35)–(38), we construct a GFNN M′′=(VM′′,EM′′,VinM′′,VoutM′′,ΩM′′,ΘM′′) as follows
First, kˉ new nodes are created and enumerated as {u1,…,ukˉ}. Now, if D1>1, then let VinM′′={vin,u1,…,ukˉ}, and if D1=1, set VinM′′={u1,…,ukˉ}.
VoutM′′:=VoutM′∖{vj∗1},
define ωvp2uj:=qˉpjωvj2v11, for p=1,…,d, j=1,…,kˉ, and let
The construction of M′′ for a concrete M′ is illustrated in Figure 10. Note that M′′ is not layered in the case D1>1, due to the presence of the node vin. Owing to (35)–(38) and the construction of M′′, we have the following “input splitting” relationship:
We next show that M′′ is non-degenerate and clones-free. To establish non-degeneracy, it suffices to show VinM′′⊂⋃w∈VoutM′′VM′′(w). First note that, in both cases D1=1 and D1>1, for a given j∈{1,…,kˉ}, there exists a w∈VoutM′∖{vj∗1} such that vj2∈VM′(w), by non-degeneracy of M′. It follows that vj2∈VM′′(w) and thus uj∈VM′′(w). As j was arbitrary, we have {u1,…,ukˉ}⊂⋃w∈VoutM′′VM′′(w), which establishes non-degeneracy of M′′ in the case D1=1. For D1>1 we need to additionally show that vin∈VM′′(w). To this end, note that there exist an m∗∈{1,…,D1}∖{j∗} and a w∈VoutM′∖{vj∗1} such that vm∗1∈VM′(w), and so vin∈VM′′(w), as desired. The clones-free property of M′′ follows by the same argument as in the case k≥2.
Once again, we revisit the function hout(t)=∑w∈VoutM′λw⟨w⟩σ,M′(t)=c, for all t∈Dhout, and proceed in a similar fashion as in the case k≥2. This time, however, the output sets VoutM′ and VoutM′′ may differ by the node vj∗1. This is a nuisance that will be dealt with below in Claim 2, but in the meantime, it is convenient to introduce the “truncated” linear dependency function
and proceed exactly as in the case k≥2. By examining the structure of M′, we see that, for each w∈VoutM′∖{vj∗1}, we can write
Now, by definition of natural domain, for each w∈VoutM′′, the natural domain D⟨w⟩σ,M′′ is the set of all z∈⋂p=1dD⟨vp2⟩σ,M′′∩⋂j=j∗D⟨vj1⟩σ,M′′ such that
Moreover, as M′ and M′′ share the nodes in (41), as well as the associated edges, weights, and biases, we have
for all w∈VoutM′′, and thus
as n→∞, which contradicts the fact that ⟨vj∗1⟩σ,M′ has a pole at t∗. This establishes vj∗1∈/VoutM′. As a consequence we further have htr=hout, and so (44) reads
for all s∈C∩Dkˉ∘(0,δ). Now, define the set
for all s∈C∩Dkˉ∘(0,δ). Now, define the set
Let Nj=(Vj,Ej,Vin,Vout,Ωj,Θj), j∈{1,2}, be networks as in the theorem statement. Let N=N1∨N2 be their amalgam and πj:VNj→πj(VNj)⊂VN the extensional isomorphisms between Nj and the corresponding subnetworks of N, for j∈{1,2}. We start by claiming that π1(w)=π2(w), for all w∈Vout. Indeed, suppose to the contrary that we have π1(w′)=π2(w′), for some w′∈Vout, and denote wj=πj(w′), j∈{1,2}. Since w1=w2, it follows that N(w1) and N(w2) are not extensionally isomorphic, for otherwise w1 and w2 would be clones, contradicting the no-clones condition for N. Now,
by assumption. But this contradicts the conclusion of Theorem 4, and thus establishes π1(w)=π2(w), for all w∈Vout. By non-degeneracy of N1, for every v∈V1, there exists a w∈Vout such that v∈VN1(w). Then π1(v)∈VN(π1(w))=VN(π2(w))=π2(VN2(w))⊂π2(V2). Similarly, for every v∈V2, we have π2(v)∈π1(V1). Thus, the function ψ:V1→V2 given by ψ=π2−1∘π1 is well-defined. This function is invertible with inverse π1−1∘π2, so it is a bijection. Therefore ψ is an extensional isomorphism between N1 and N2, by virtue of being a composition of two extensional isomorphisms. Moreover, we have ψ(w)=π2−1(π1(w))=w, for all w∈Vout, so ψ restricted to Vout is the identity map, and thus ψ is a faithful isomorphism. ∎
Acknowledgment
The authors would like to thank Thomas Allard for useful suggestions regarding the proof of Proposition 3 and an anonymous reviewer for proposing a clearer exposition of Lemma 6.
References
Appendix: proofs of auxiliary results
Fix N1 and N2 as in the statement of the proposition. We begin by establishing the existence of a corresponding amalgam A. Let A denote the set of all proto-amalgams of N1 and N2. To see that A is non-empty, consider the LFNN N=(VN,EN,Vin,VoutN,ΩN,ΘN) specified as follows:
Let S be a set of cardinality #(V1∖Vin)+#(V2∖Vin) disjoint from Vin, and set VN:=Vin∪S. Furthermore, let πjN:Vj→πjN(Vj)⊂VN be injective functions such that πjN(v)=v, for v∈Vin, j∈{1,2}, and π1N(V1∖Vin)∩π2N(V2∖Vin)=∅, but otherwise arbitrary.
VoutN:=π1N(Vout1)∪π2N(Vout2).
ΩN:={ωvu:(u,v)∈EN}.
For j=1,2 and v∈Vj∖Vin, let θπjN(v)=θv, and set ΘN:={θu:u∈VN∖Vin}.
Informally, the network N is obtained by putting N1 and N2 “side by side”, sharing only the input nodes Vin. As N1 and N2 are non-degenerate, so is N. Moreover, Properties (i) and (ii) of Definition 16 hold for N with πjN:Vj→πj(Vj)⊂VN, for j=1,2.
Thus N is a proto-amalgam of N1 and N2, and so A=∅. Now, let A=(VA,EA,VinA,VoutA,ΩA,ΘA)∈A be a network with the least possible number of nodes among all the networks in A, and let πj:Vj→πj(Vj)⊂VA, for j∈{1,2}, be extensional isomorphisms between Nj and the appropriate subnetworks of A. We now show that A is clones-free. To this end, suppose by way of contradiction that c1,c2∈VA are clones. As N1 is clones-free, c1,c2 cannot both be in π1(V1), for otherwise π1−1(c1) and π1−1(c2) would be clones in N1. By the same token, c1,c2 cannot both be in π2(V2). Thus, we may write w.l.o.g. c1=π1(v1) and c2=π2(v2), for some v1∈V1 and v2∈V2. Now, let A be the network obtained from A by making the following alterations:
For every edge (c2,v)∈EA, where v∈VA, introduce a new edge (c1,v) together with the associated weight ωvc2, and delete the edge (c2,v).
Delete the edges (v,c2)∈EA, as well as the node c2.
If c2 was a node in π2(Vout2), then add c1 to the set VoutA.
The network A is a proto-amalgam of N1 and N2 via the extensional isomorphisms π1=π1 and
But A has strictly fewer nodes than A, which contradicts the minimality of A, and thereby establishes that A is clones-free, and hence A is an amalgam of N1 and N2, completing the proof of existence. To establish uniqueness—up to extensional isomorphisms—of the amalgam, suppose that A and A′ are both amalgams of N1 and N2 via extensional isomorphisms πj:Vj→πj(Vj)⊂VA, πj′:Vj→πj′(Vj)⊂VA′, for j∈{1,2}. We first show that
Now define ψ:VA→VA′ according to
It follows by (46) that this definition is consistent, in the sense that the two cases in (47) yield the same value for ψ(v) when v∈π1(V1)∩π2(V2). Now, Properties (i) and (ii) of Definition 14 for ψ follow, so ψ is an extensional isomorphism between A and A′, finishing the proof. ∎
Let a, δ, and T be as in the statement of the lemma, such that Dk∘(a,δ)⊂U and F∣T≡0. Then the function Fa:=F(⋅+a) is holomorphic on U−a, and Fa∣T−a≡0. Thus, as F∣U≡0 if and only if Fa∣U−a≡0, it suffices to prove the result for a=0. Let T0:=T, Tk:=Dk∘(0,δ), and, for r=1,…,k−1, define the sets
Note that G is holomorphic, and G∣(−δ,δ)≡0 by the induction hypothesis. Since the zero set of a nonzero holomorphic function in one variable does not have a limit point in the domain, we deduce that G∣D1∘(0,δ)≡0. But zj and sj were arbitrary, so we have F∣Tr+1≡0. We have thus shown that F is identically zero on an open subset Tk=Dk∘(0,δ) of its connected domain U, and so, by the multivariate identity theorem [19, 1.2.12], it must be identically zero on U. ∎
and so, by Lemma 4, we obtain G(0,z1,…,zk)=0, for all (0,z1,…,zk)∈V. By inspection of the power series expansion of G in V, we find that G must have the form G(z0,z1,…,zk)=z0∂z0∂G(z0,z1,…,zk). As the function ∂z0∂G is holomorphic in V, we have that z0−(p+1)F(z0,…,zk)=∂z0∂G(z0,…,zk) is holomorphic in V, contradicting the maximality of p. Our hypothesis that F∣V is not identically zero must hence be false, i.e., we have F∣V≡0. Finally, by the multivariate identity theorem [19, 1.2.12], we deduce that F∣U≡0. ∎
The inclusion of M in the right-hand side is clear, so we only need to show the reverse inclusion. Note that, since M is closed, Td/M is a Lie group. We will rewrite the right-hand side of (48) by establishing a bijective correspondence between the characters χ:Td→S1 such that M⊂ker(χ), and the characters f:Td/M→S1. To this end, let π:Td→Td/M be the projection map, and suppose that χ:Td→S1 is a character such that M⊂ker(χ). Then χ factors according to χ=f∘π, for some continuous homomorphism f:Td/M→S1, in other words, f is a character on Td/M. Conversely, for any such f we have that f∘π is a character χ on Td with M⊂ker(χ). Therefore it suffices to show that