Spatially Coupled Ensembles Universally Achieve Capacity under Belief Propagation

Shrinivas Kudekar, Tom Richardson, Ruediger Urbanke

I Introduction

Ever since the publication of Shannon’s seminal paper and the introduction of the first coding schemes by Hamming and Golay , coding theory has been concerned with finding low-delay and low-complexity capacity-achieving schemes. The interested reader can find an excellent historical review in . Let us just briefly mention some of the highlights before focusing on those parts that are the most relevant for our purpose.

In the first 50 years, coding theory focused on the construction of algebraic coding schemes and algorithms that were capable of exploiting the algebraic structure. Two early highlights of this line of research were the introduction of Bose-Chaudhuri-Hocquenghem (BCH) codes as well as Reed-Solomon (RS) codes . Berlekamp devised an efficient decoding algorithm and this algorithm was then interpreted by Massey as an algorithm for finding the shortest feedback-shift register that generates a given sequence . More recently, Sudan introduced a list decoding algorithm for RS codes that decodes beyond the guaranteed error-correcting radius . Guruswami and Sudan improved upon this algorithm and Koetter and Vardy showed how to handle soft information .

Another important branch started with the introduction of convolutional codes by Elias and the introduction of the sequential decoding algorithm by Wozencraft . Viterbi introduced the Viterbi algorithm . It was shown to be optimal by Forney and Omura and to be eminently practical by Heller .

An important development in transmission over the continuous input, band-limited, additive white Gaussian noise channel was the invention of the lattice codes. It was shown in that lattice codes achieve the Shannon capacity. A breakthrough in bandwidth-limited communications came about when Ungerboeck invented a technique to combine coding and modulation. Ungerboeck’s technique ushered in a new era of fast modems. The technique, called trellis-coded modulation (TCM), offered significant coding gains without compromising bandwidth efficiency by mapping binary code symbols, generated by a convolutional encoder, to a larger (non-binary) signal constellation. In Forney showed that lattice codes as well as TCM schemes may be generated by the same basic elements and the generalized technique was termed coset-coding.

Coming back to binary linear codes, in 1993, Berrou, Glavieux and Thitimajshima proposed turbo codes. These codes attain near-Shannon limit performance under low-complexity iterative decoding. Their remarkable performance lead to a flurry of research on the “turbo” principle. Around the same time, Spielman in his thesis , and MacKay and Neal in , independently rediscovered low-density parity-check (LDPC) codes and iterative decoding, both introduced in Gallager’s remarkable thesis . Wiberg showed that both turbo codes and LDPC codes fall under the umbrella of codes based on sparse graphs and that their iterative decoding algorithms are special cases of the sum-product algorithm. This line of research was formalized by Kschischang, Frey, and Loeliger who introduced the notion of factor graphs .

The next breakthrough in the design of codes (based on sparse graphs) came with the idea of using irregular LDPC codes by Luby, Mitzenmacher, Shokrollahi and Spielman , . With this added ingredient it became possible to construct irregular LDPC codes that achieved performance within $0.0045$ dB of the Shannon limit when transmitting over the binary-input additive white Gaussian noise channel, see Chung, Forney, Richardson and Urbanke . The development of these codes went hand in hand with the development of a systematic framework for their analysis by Luby, Mitzenmacher, Shokrollahi and Spielman and Richardson and Urbanke .

A central research topic for codes on graphs is the interaction of the graphical structure of a code and its performance. Turbo codes themselves are a prime example how the “right” structure is important to achieve good performance . Further important parameters and structures are, the degree distribution (dd) and in particular the fraction of degree-two variable nodes, multi-edge ensembles , degree-two nodes in a chain , and protographs .

Currently sparse graph codes and their associated iterative decoding algorithms are the best “practical” codes in terms of their trade-off between performance and complexity and they are part of essentially all new communication standards.

Polar codes represent the most recent development in coding theory . They are provably capacity achieving on binary-input memoryless output-symmetric (BMS) channels (and many others) and they have low decoding complexity. They also have no error floor due to a minimum distance which increases like the square root of the blocklength. The simplicity, elegance, and wide applicability of polar codes have made them a popular choice in the recent literature. There are perhaps only two areas in which polar codes could be further improved. First, for polar codes the convergence of their performance to the asymptotic limit is slow. Currently no rigorous statements regarding this convergence for the general case are known. But “calculations” suggest that, for a fixed desired error probability, the required blocklength scales like $1/\delta^{\mu}$ , where $\delta$ is the additive gap to capacity and where $\mu$ depends on the channel and has a value around $4$ , . Note that random block codes under MAP decoding have a similar scaling behavior but with $\mu=2$ . This implies a considerably faster convergence to the asymptotic behavior. The value $2$ is a lower bound for $\mu$ for any system since the variations of the channel itself imply that $\mu\geq 2$ . The second aspect is universality: the code design of polar codes depends on the specific channel being used and one and the same design cannot simultaneously achieve capacity over a non-trivial class of channels (under successive cancellation decoding).

Let us now connect the content of this paper to the previous discussion. Our main aim is to explain the role of a further structural element in the realm of sparse graph codes (besides the previously discussed such examples), namely that of “spatial coupling.” We will show that this coupling of graphs leads to a remarkable change in their performance. Ensembles designed in this way combine some of the nice elements of polar codes (namely the fact that they are provably capacity achieving under low complexity decoding) with the practical advantages of sparse graph codes (the codes are competitive already for moderate lengths). Perhaps most importantly, it is possible to construct universal such codes for the whole class of BMS channels. Here, universality refers to the fact that one and the same ensemble is good for a whole class of channels, assuming that at the receiver we have knowledge of the channel.

I-B Prior Work on Spatially Coupled Codes

The potential of spatially coupled codes has long been recognized. Our contribution lies therefore not in the introduction of a new coding scheme, but in clarifying the mechanism that make these codes perform so well.

The term spatially coupled codes was coined in . Convolutional LDPC codes (more precisely, terminated convolutional LDPC codes), which were introduced by Felström and Zigangirov in , and their many variants belong to this class. Why do we introduce a new term? The three perhaps most important reasons are: (i) the term “convolutional” conjures up a fairly specific node interconnection structure whereas experiments have shown that the particular nature of the connection is not important and that the threshold saturation effect occurs as soon as the connection is sufficiently strong; (ii) a well known result for convolutional codes says that the boundary conditions are “forgotten” exponentially fast; but for spatially coupled codes it is exactly the boundary condition which causes the effect and there is no decay of this effect in the spatial dimension of the code; (iii) the same effect has (empirically) been shown to hold in many other graphical models, most of them outside the realm of coding; the term “spatial coupling” is perhaps then somewhat more generally applicable.

There is a considerable literature on convolutional-like LDPC ensembles. Variations on the constructions as well as some analysis can be found in Engdahl and Zigangirov , Engdahl, Lentmaier, and Zigangirov , Lentmaier, Truhachev, and Zigangirov , as well as Tanner, D. Sridhara, A. Sridharan, Fuja, and Costello .

In , Sridharan, Lentmaier, Costello and Zigangirov consider density evolution (DE) analysis for convolutional LDPC ensembles and determine thresholds for the BEC. The equivalent results for general channels were reported by Lentmaier, Sridharan, Zigangirov and Costello in . This DE analysis is in many ways the starting point for our investigation. By comparing the thresholds to the thresholds of the underlying ensembles under MAP decoding (see e.g. ), it quickly becomes apparent that an interesting effect must be at work. Indeed, in a recent paper , Lentmaier and Fettweis followed this route and independently formulated the equality of the belief propagation (BP) threshold of convolutional LDPC ensembles and the MAP threshold of the underlying ensemble as a conjecture.

A representation of convolutional LDPC ensembles in terms of a protograph was introduced by Mitchell, Pusane, Zigangirov and Costello . The corresponding representation for terminated convolutional LDPC ensembles was introduced by Lentmaier, Fettweis, Zigangirov and Costello . A variety of constructions of LDPC convolutional codes from the graph-cover perspective is shown by Pusane, Smarandache, Vontobel, and Costello .

A pseudo-codeword analysis of convolutional LDPC codes was performed by Smarandache, Pusane, Vontobel, and Costello in . Such an analysis is important if we want to understand the error-floor behavior of spatially coupled ensembles.

In , Papaleo, Iyengar, Siegel, Wolf, and Corazza study the performance of windowed decoding of convolutional LDPC codes on the BEC. Such a decoder has a decoding complexity which is independent of the chain length, an important practical advantage. Luckily, it turns out that the performance under windowed decoding, when measured in terms of the threshold, approaches the “regular”” (without windowed decoding) threshold exponentially fast in the window size, see . The threshold saturation phenomenon therefore does not require an infinite window size.

The scaling behavior of spatially coupled ensembles, i.e., the relationship between the chain length, the number of variables per section, and the error probability is discussed by Olmos and Urbanke in .

I-C Prior Results for the Binary Erasure Channel

It was recently shown in that for transmission over the BEC spatially coupled ensembles have a BP threshold which is essentially equal to the MAP threshold of the underlying uncoupled ensemble. Further, this threshold is also essentially equal to the MAP threshold of the coupled ensemble. This phenomena was called threshold saturation in since the BP threshold takes on its largest possible value (the MAP threshold). This significant improvement in the performance is due to the spatial coupling of the underlying code. Those “sections” of the code that have already succeeded in decoding can help their neighboring less fortunate sections in the decoding process. In this manner, the information propagates from the “boundaries”, where the bits are known perfectly towards the “middle”. In a recent paper , Lentmaier and Fettweis independently formulated the same statement as a conjecture and provided numerical evidence for its validity. They attribute the observation of the equality of the two thresholds to G. Liva.

It was shown in that if we couple component codes whose Hamming distance grows linearly in the blocklength then also the resulting coupled ensembles have this property (assuming that the number of “sections” or copies of the underlying code is kept fixed). The equivalent result is true for stopping sets. This implies that for the transmission over the BEC the block BP threshold is equal to the bit BP threshold and that such ensembles do not exhibit error floors under BP decoding.

I-D Prior Results for General Binary-Input Memoryless Output-Symmetric Channels

As pointed out in a preceding section, BP thresholds for transmission over general BMS channels were computed by means of a numerical procedure by Lentmaier, Sridharan, Zigangirov and Costello in . Further, in (conjectured) MAP thresholds for some LDPC ensembles were computed according to the Maxwell construction. Comparing these two values, one can check empirically that also for transmission over general BMS channels the BP threshold of the coupled ensembles is essentially equal to the (conjectured) MAP threshold of the underlying ensemble. Indeed, recently both as well as provided further numerical evidence that the threshold saturation phenomenon also applies to general BMS channels.

For typical sparse graph ensembles the MAP threshold is not equal to the Shannon threshold but the Shannon threshold can only be reached by taking a sequence of such ensembles (e.g., a sequence of increasing degrees). There are some notable exceptions, like MN ensembles or HA ensembles. Kasai and Sakaniwa take this as a starting point to investigate in whether by spatially coupling such ensembles it is possible to create ensembles which are universally capacity achieving under BP decoding.

I-E Spatial Coupling for General Communication Scenarios, Signal Processing, Computer Science, and Statistical Physics

The principle which underlies the good performance of spatially coupled ensembles is broad. It has been shown to apply to a variety of problems in communications, computer science, signal processing, and physics. To mention some concrete examples, the threshold saturation effect (dynamical/algorithmic threshold of the system being equal to the static or condensation threshold) of coupled graphical models has been observed for rate-less codes by Aref and Urbanke , for channels with memory and multiple access channels with erasure by Kudekar and Kasai , for CDMA channels by Takeuchi, Tanaka, and Kawabata , for relay channels with erasure by Uchikawa, Kasai, and Sakaniwa , for the noisy Slepian-Wolf problem by Yedla, Pfister, and Narayanan , and for the BEC wiretap channel by Rathi, Urbanke, Andersson, and Skoglund . Uchikawa, Kurkoski, Kasai, and Sakaniwa recently showed an improvement of the BP threshold has also for transmission over the unconstrained AWGN channel using low-density lattice codes . Further, Yedla, Nguyen, Pfister and Narayanan, demonstrated the universality of spatially-coupled codes in the 2-user binary input Gaussian multiple-access channel and finite state ISI channels like the dicode-erasure channel and the dicode channel with AWGN . In they show in addition that for a fixed rate pair, spatially-coupled ensembles universally saturate the achievable region (i.e., the set of channel gain parameters that are achievable for the fixed rate pair) under BP decoding. Similarly, in they provide numerical evidence that spatially coupled ensembles achieve the symmetric information rate for the dicode erasure channel and the dicode channel with AWGN.

Statistical physics is another very natural area in which the threshold saturation phenomenon is of interest. For the so-called random $K$ -SAT problem, random graph coloring, and the Curie-Weiss model, spatially coupled ensembles were investigated by Hassani, Macris, and Urbanke, . In all these cases, the threshold saturation phenomenon was observed. This suggests that it might be possible to study difficult theoretical problems in this area, like the existence of the static threshold, by studying the dynamical threshold of a chain of coupled models, perhaps an easier problem. Further spatially-coupled models were considered by Takeuchi and Tanaka .

I-F Main Results and Consequences

In this paper we show that for transmission over general BMS channels coupled ensembles exhibit the threshold saturation phenomenon. By choosing e.g. regular component ensembles of fixed rate and increasing degree, this implies that coupled ensembles can achieve capacity over this class of channels. More precisely, for each $\delta>0$ there exists a coupled ensemble which achieves at least a fraction $1-\delta$ of capacity universally, under belief propagation decoding, over the whole class of BMS channels. The qualifier ”universal” is important here.

Coupled ensembles inherit to a large degree the error floor behavior of the underlying ensemble. Further, such an ensemble can be chosen so that it has a non-zero error correcting radius, and hence does not exhibit error floors. To achieve this, it suffices to take the variable-node degree to be at least five. This guarantees that a randomly chosen graph from such an ensemble is an expander with expansion exceeding three-quarters with high probability. This expansion guarantees an error correcting radius under the so-called flipping decoder as well as under the BP decoder, assuming that we suitably clip both the received as well as the internal messages .

Although one can empirically observe the threshold saturation phenomenon for a wide array of component codes, we state and prove the main result only for regular LDPC ensembles. This keeps the exposition manageable.

I-G Outline

In Section II we briefly review regular LDPC ensembles and their asymptotic (in the blocklength) analysis. Much of this material is standard and we only include it here to set the notation and to make the paper largely self-contained. The two most important exceptions are our in-depth discussion of the Wasserstein distance and the the so-called area threshold, in particular the (Negativity) Lemma 27.

In Section III we review some basic properties of coupled ensembles. Using simple extremes of information combining techniques, we will see in Section III-G that coupling indeed increases the BP threshold significantly, even though these simple arguments are not sufficient to characterize the BP threshold under coupling exactly.

We state our main result, namely that the BP threshold of coupled ensembles is essentially equal to the area threshold of the underlying component ensemble, in Section IV. We also discuss how one can easily strengthen this result to apply to individual codes rather than ensembles and how this gives rise to codes which are universally close to capacity under BP decoding for the whole class of BMS channels.

We end in Section IV-E with a discussion of what challenges still lie ahead. In particular, spatial coupling has been shown empirically to lead to the threshold saturation phenomenon in a wide class of graphical models. Rather than proving each such scenario in isolation, we want a common framework to analyze all such systems.

Many of the proofs are relegated to the appendices. This makes it possible to read the material on two levels – a casual level, skipping all the proofs and following only the flow of the argument, and a more detailed level, consulting the material in the appendices.

II Uncoupled Systems

II-B Binary-Input Memoryless Output-Symmetric Channels

Throughout we will assume that transmission is taking place over a BMS channel. Let $X$ denote the input and let $Y$ be the output. Further, let $p(Y=y\,|\,X=x)$ denote the transition probability describing the channel. An alternative characterization of the channel is by means of its so-called $L$ -distribution, denote it by $\mathsf{c}$ . More precisely, $\mathsf{c}$ is the distribution of

Given $\mathsf{c}$ , we write $\mathfrak{{c}}$ , $|\mathfrak{{c}}|$ , and $|\mathfrak{{C}}|$ to denote the corresponding $D$ distribution, the $|D|$ distribution and the cdf in the $|D|$ -domain, respectively, see [62, Section 4.1.4].

Typically we do not consider a single channel in isolation but a whole family of channels. We write $\{\text{BMS}(\sigma)\}$ to denote the family parameterized by the scalar $\sigma$ . Often it will be more convenient to denote this family by $\{\mathsf{c}_{\sigma}\}$ , i.e., to use the family of $L$ -densities which characterize the channel family. If it is important to make the range of the parameter $\sigma$ explicit, we will write $\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ .

Sometimes it is convenient to use the natural parameter of the family. For example, for the three fundamental channels, the BEC, the binary symmetric channel (BSC) and the binary additive white-Gaussian noise channel (BAWGNC), the corresponding channel families are given by $\{\text{BEC}(\epsilon)\}_{0}^{1}$ , $\{\text{BSC}(p)\}_{0}^{\frac{1}{2}}$ , and $\{\text{BAWGNC}(\sigma)\}_{0}^{\infty}$ . Other times, it is more convenient to use a common parameterization. E.g., we will write $\{\text{BMS}({\tt{h}})\}$ to denote a channel family where $\text{BMS}({\tt{h}})$ denotes the element in the family of entropy ${\tt{h}}$ .

Assume that we are given a channel family $\{\text{BMS}(\sigma)\}_{\underline{\sigma}}^{\overline{\sigma}}$ . We say that the family is complete if $\text{H}(\text{BMS}(\underline{\sigma}))=0$ , $\text{H}(\text{BMS}(\overline{\sigma}))=1$ , and for each ${\tt{h}}\in$ there exists a parameter $\sigma$ so that $\text{H}(\text{BMS}(\sigma))={\tt{h}}$ . Here $\text{H}(\cdot)$ is the entropy functional defined in Section II-D.

Let $p_{Z\,|\,X}(z\,|\,x)$ denote the transition probability associated to a BMS channel $\mathsf{c}^{\prime}$ and let $p_{Y\,|\,X}(y\,|\,x)$ denote the transition probability of another BMS channel $\mathsf{c}$ . We then say that $\mathsf{c}^{\prime}$ is degraded with respect to $\mathsf{c}$ if there exists a channel $p_{Z\,|\,Y}(z\,|\,y)$ so that

We will use the notation $\mathsf{c}\prec\mathsf{c}^{\prime}$ to denote that $\mathsf{c}^{\prime}$ is degraded wrt $\mathsf{c}$ (as a mnemonic think of $\mathsf{c}$ as the erasure probability of a BEC and replace $\prec$ with $<$ ).

A useful characterization of degradation, see [62, Theorem 4.74], is that $\mathsf{c}\prec\mathsf{c}^{\prime}$ is equivalent to

for all $f(x)$ that are non-increasing and concave on $ $. Here,$ |\mathfrak{{c}}|(x) $is the so called$ |D| $-density associated to the$ L $-density$ \mathsf{c} $, see [62, p. 179]. In particular, this characterization implies that$ F(\mathsf{a})\leq F(\mathsf{b}) $for$ \mathsf{a}\prec\mathsf{b} $if$ F(\cdot) $is either the Battacharyya or the entropy functional. This is true since both are linear functionals of the distributions and their respective kernels in the$ |D| $-domain are decreasing and concave. An alternative characterization in terms of the cumulative distribution functions$ |\mathfrak{{C}}|(x) $and$ |\mathfrak{{C^{\prime}}}|(x) $is that for all$ z\in$,

A BMS channel family $\{\text{BMS}(\sigma)\}_{\underline{\sigma}}^{\overline{\sigma}}$ is said to be ordered by degradation if $\sigma_{1}\leq\sigma_{2}$ implies $\mathsf{c}_{\sigma_{1}}\prec\mathsf{c}_{\sigma_{2}}$ . (The reverse order, $\sigma_{1}\geq\sigma_{2},$ is also allowed but we generally stick to the stated convention.)

We say that an $L$ -density $\mathsf{c}$ is symmetric if $\mathsf{a}(-y)=\mathsf{a}(y)e^{-y}$ . We recall that all densities which stem from BMS channels are symmetric, see [62, Sections 4.1.4, 4.1.8 and 4.1.9]. All densities which we consider are symmetric. We will therefore not mention symmetry explicitly in the sequel.

A BMS channel family $\{\mathsf{c}_{\sigma}\}$ is said to be smooth if for all continuously differentiable functions $f(y)$ so that $e^{y/2}f(y)$ is bounded, the integral $\int f(y)\mathsf{c}_{\sigma}(y)\,{\text{d}}y$ exists and is a continuously differentiable function with respect to $\sigma$ , see [62, Definition 4.32].

The three fundamental channel families $\{\text{BEC}(\epsilon)\}_{0}^{1}$ , $\{\text{BSC}(p)\}_{0}^{\frac{1}{2}}$ , and $\{\text{BAWGNC}(\sigma)\}_{0}^{\infty}$ are all complete, ordered, smooth, and symmetric.

II-C MAP Decoder and MAP Threshold

The bit maximum a posteriori (bit-MAP) decoder for bit $i$ finds the value of $x_{i}$ which maximizes $p(x_{i}\,|\,y_{1}^{n})$ . It minimizes the bit error probability and is optimal in this sense. The block maximum a posteriori (block-MAP) decoder finds the codeword $x_{1}^{n}$ which maximizes $p(x_{1}^{n}\,|\,y_{1}^{n})$ . It minimizes the block error probability and is optimal in this sense.

Consider an ordered and complete channel family $\{\mathsf{c}_{\tt{h}}\}$ . The MAP threshold of the $(d_{l},d_{r})$ -regular ensemble for this channel family is denoted by ${\tt{h}}^{\text{\tiny MAP}}(d_{l},\!d_{r}\!)$ and defined by

In words, if we are transmitting above the MAP threshold, then the ensemble average bit-error probability is lower bounded by $h_{2}^{-1}(\delta)$ , a strictly positive constant. This ensemble is therefore not suitable for reliable transmission above this threshold.

II-D Belief Propagation, Density Evolution, and Some Important Functionals

In principle one can investigate the behavior of coupled ensembles under any message-passing algorithm. We limit our investigation to the analysis of the BP decoder, the most powerful local message-passing algorithm. We are interested in the asymptotic performance of the BP decoder, i.e., the performance when the blocklength $n$ tends to infinity. This asymptotic performance is characterized by the so-called density evolution (DE) equation .

As mentioned, all distributions associated to BMS channels are symmetric and symmetry is preserved under DE, see [62, Chapter 4] for details. There are a number of functionals of densities are of interest to us. The most important functionals are the Battacharyya, the entropy, and the error probability functional. For a density $\mathsf{a}$ these are denoted by $\operatorname{\mathfrak{B}}(\mathsf{a})$ , $\text{H}(\mathsf{a})$ , and $\operatorname{\mathfrak{E}}(\mathsf{a})$ , respectively. Assuming $\mathsf{a}$ is an $L$ -density, they are given by

We end this section with the following useful fact. The proof can be found in Appendix A.

For any $L$ -density $\mathsf{a}$ , $\operatorname{\mathfrak{B}}^{2}(\mathsf{a})\leq\text{H}(\mathsf{a})\leq\operatorname{\mathfrak{B}}(\mathsf{a})$ .

II-E Extremes of Information Combining and the Duality Rule

When we are operating on BMS channels, the quantities appearing in the DE equations are distributions. These are hard to track analytically in general, unless we are transmitting over the BEC. Often we only need bounds. In these cases extremes of information combining ideas are handy, see , [62, p. 242].

Let $F(\cdot)$ denote either $\text{H}(\cdot)$ or $\operatorname{\mathfrak{B}}(\cdot)$ and let $\alpha\in$ . Let $\mathsf{a}_{\text{\tiny BEC}}$ and $\mathsf{a}_{\text{BSC}}$ denote $L$ -densities from the families $\{\text{BEC}(\epsilon)\}$ and $\{\text{BSC}(p)\}$ , respectively, so that $F(\mathsf{a}_{\text{\tiny BEC}})=F(\mathsf{a}_{\text{BSC}})=\alpha$ . Then for any $\mathsf{b}$ ,

$\min_{\mathsf{a}:F(\mathsf{a})=\alpha}F(\mathsf{a}\circledast\mathsf{b})=F(\mathsf{a}_{\text{\tiny BEC}}\circledast\mathsf{b})$

$\max_{\mathsf{a}:F(\mathsf{a})=\alpha}F(\mathsf{a}\circledast\mathsf{b})=F(\mathsf{a}_{\text{\tiny BSC}}\circledast\mathsf{b})$

$\min_{\mathsf{a}:F(\mathsf{a})=\alpha}F(\mathsf{a}\boxast\mathsf{b})=F(\mathsf{a}_{\text{\tiny BSC}}\boxast\mathsf{b})$

$\max_{\mathsf{a}:F(\mathsf{a})=\alpha}F(\mathsf{a}\boxast\mathsf{b})=F(\mathsf{a}_{\text{\tiny BEC}}\boxast\mathsf{b})$

Discussion: Although the extremes of information combining bounds are only stated for pairs of distributions, they naturally extend to more than two distributions. E.g., we claim that $\min_{\mathsf{a}:F(\mathsf{a})=\alpha}F(\mathsf{a}^{\circledast d})=F(\mathsf{a}_{\text{\tiny BEC}})^{d}=\alpha^{d}$ . To see this, let $\{\mathsf{a}_{i}\}_{i=1}^{d}$ be any set of distributions so that $F(\mathsf{a}_{i})=\alpha$ . Then we can use Lemma 5 repeatedly to conclude that

The same remark and the same proof technique applies to the other cases.

For any $\mathsf{a}$ and $\mathsf{b}$ $\text{H}(\mathsf{a}\circledast\mathsf{b})+\text{H}(\mathsf{a}\boxast\mathsf{b})=\text{H}(\mathsf{a})+\text{H}(\mathsf{b})$ .

Note: We give a simple proof of this identity at the end of the proof of Lemma 53.

II-F Fixed Points, Convergence, and BP Threshold

We say that the density $\mathsf{x}$ is a fixed point (FP) of DE for the $(d_{l},d_{r})$ -regular ensemble and the channel $\mathsf{c}$ if

More succinctly, when the underlying ensemble is understood from the context, we say that $(\mathsf{c},\mathsf{x})$ is a FP.

One way to generate a FP is to initialize $\mathsf{x}_{0}$ with $\Delta_{0}$ and to run DE, as stated in Definition 3. We call such a FP a FP of forward DE. The resulting FPs are the “natural” FPs since they have a natural operational meaning – if we pick sufficiently long ensembles, these are the FPs which we can observe in simulations when we run the BP decoder.

We say that a sequence of distributions $\{\mathsf{a}_{i}\}$ converges weakly to a limit distribution $\mathsf{a}$ if for the corresponding cumulative distributions in the $|D|$ -domain, call them $\{\mathfrak{{A}}_{i}\}$ , for all bounded and continuous functions $f(x)$ on $$ we have

An equivalent definition is that $|\mathfrak{{A}}|_{i}(x)$ converges to $|\mathfrak{{A}}|(x)$ at points of continuity of $x.$ ∎

A simple proof of the following lemma can be found at the end of Section II-I.

In other words, the BP threshold is characterized by the largest channel parameter so that the forward DE FP is trivial.

We have just seen that the FPs of forward DE are important since they characterize the BP threshold. But there exist FPs that cannot be achieved this way. Let us review a general method of constructing FPs. Assume that, given a channel family $\{\mathsf{c}_{\sigma}\}$ , we need a FP $\mathsf{x}$ which has a given error probability $\operatorname{\mathfrak{E}}(\mathsf{x})$ , entropy $\text{H}(\mathsf{x})$ , or Battacharyya parameter $\operatorname{\mathfrak{B}}(\mathsf{x})$ . Such FPs can often be constructed, or at least their existence can be guaranteed, by a procedure introduced in . Let us recall this procedure for the case of fixed entropy.

Consider a smooth, complete, and ordered family $\{\mathsf{c}_{\tt{h}}\}$ and the $(d_{l},d_{r})$ -regular ensemble. Let us denote by $T_{{\tt{h}}}$ the ordinary density evolution operator at fixed channel $\mathsf{c}_{\tt{h}}$ . Formally,

For any ${\tt{e}}\in$ , we define the density evolution operator at fixed entropy ${\tt{e}}$ , call it $R_{{\tt{e}}}$ , as

where ${\tt{h}}(\mathsf{a},{\tt{e}})$ is the solution of $\text{H}(T_{{\tt{h}}}(\mathsf{a}))={\tt{e}}$ . Whenever no such value of ${\tt{h}}$ exists, $R_{{\tt{e}}}(\mathsf{a})$ is left undefined. Since, for a given $\mathsf{a}$ , the family $T_{{\tt{h}}}(\mathsf{a})$ is ordered by degradation, $\text{H}(T_{{\tt{h}}}(\mathsf{a}))$ is a non-decreasing function of ${\tt{h}}$ . As a consequence the equation $\text{H}(T_{{\tt{h}}}(\mathsf{a}))={\tt{e}}$ cannot have more than a single solution. Furthermore, by the smoothness of the channel family $\mathsf{c}_{\tt{h}}$ , $\text{H}(T_{{\tt{h}}}(\mathsf{a}))$ is continuous as a function of ${\tt{h}}$ . Notice that $\text{H}(T_{0}(\mathsf{a}))=0$ : if the channel is noiseless the output density at a variable nodes is noiseless as well. Therefore, a necessary and sufficient condition for a solution ${\tt{h}}(\mathsf{a},{\tt{e}})$ to exist (when the family $\{\mathsf{c}_{\tt{h}}\}$ is complete) is that $\text{H}(T_{1}(\mathsf{a}))=\text{H}((\mathsf{a}^{\boxast d_{r}-1})^{\circledast d_{l}-1})\geq{\tt{e}}$ (see Theorem 6 in ).

II-G BP Threshold for Large Degrees

What happens to the BP threshold when we fix the design rate $r=1-d_{l}/d_{r}$ and increase the degrees? The proof of the following lemma, which uses basic extremes of information combining arguments, can be found in Appendix B.

Consider transmission over an ordered and complete family $\{\mathsf{c}_{\tt{h}}\}$ of BMS channels using an $(d_{l},d_{r})$ -regular dd and BP decoding. Let $r=1-\frac{d_{l}}{d_{r}}$ be the design rate and let ${\tt{h}}^{\text{\tiny BP}}(d_{l},d_{r})$ denote the BP threshold. Then,

In particular, by increasing $d_{r}$ while keeping the rate $r$ fixed, the BP threshold converges to .

II-H The Wasserstein Metric: Definition and Basic Properties

In the sequel we will often need to measure how close various distributions are. Sometimes it is convenient to compare their entropy or their Battacharyya constant. But sometimes a more general measure is required. The Wasserstein metric is our measure of choice.

Let $|\mathfrak{{a}}|$ and $|\mathfrak{{b}}|$ denote two $|D|$ -distributions. The Wasserstein metric, denoted by $d(|\mathfrak{{a}}|,|\mathfrak{{b}}|)$ , is defined as

where $\text{Lip}(1)$ denotes the class of Lipschitz continuous functions on $ $with Lipschitz constant$ 1$. ∎

Discussion: In the sequel we will say that a function $f(x)$ is $\text{Lip}(c)$ as a shorthand to mean that it is Lipschitz continuous with constant $c$ . If we want to emphasize the domain, then we write e.g., $\text{Lip}(c)$ . Why have we defined the metric in the $|D|$ -domain? As the next lemma shows, convergence in this metric implies weak convergence. Since all the distributions of interest are symmetric, it suffices to look at the $|D|$ -domain instead of the $D$ -domain. To ease our notation, however, we will formally write expressions like $d(\mathsf{a},\mathsf{b})$ , i.e., we will allow the arguments to be e.g. $L$ -distributions. It is then implied that the metric is determined using the equivalent $|D|$ -domain representations as defined above.

In the following, $\mathsf{a}$ , $\mathsf{b}$ , $\mathsf{c}$ , and $\mathsf{d}$ denote $L$ -distributions.

In the $|D|$ domain we have the following expressions for $\operatorname{\mathfrak{B}}(\mathsf{a})$ and $\text{H}(\mathsf{a})$ (compare this to the expressions in the $L$ domain given in Section II-D),

where $h_{2}(x)=-x\log_{2}x-(1-x)\log_{2}(1-x)$ is the binary entropy function. See for more details on metrics for probability measures.

Boundedness: $d(\mathsf{a},\mathsf{b})\leq 1$ .

Metrizable and Weak Convergence: The Wasserstein metric induces the weak topology on the space of probability measures on $$. In other words, the space of probability measures under the weak topology is metrizable and convergence in the Wasserstein metric is equivalent to weak convergence (see [104, Theorem 6.9]).

Polish Space: The space of probability distributions on $ $metrized by the Wasserstein distance is a complete separable metric space, i.e., a Polish space, and any measure can be approximated by a sequence of probability measures with finite support, i.e., distributions of the form$ \sum_{i=1}^{n}c_{i}\delta(x-x_{i}) $, where$ \sum_{i=1}^{n}c_{i}=1 $,$ c_{i}\geq 0 $, and$ x_{i}\in$. Further, the space is compact. (See [104, Theorem 6.18].)

In general, if $\sum_{i}\alpha_{i}=1$ , then

Regularity wrt $\circledast$ : The Wasserstein metric satisfies the regularity property $d(\mathsf{a}\circledast\mathsf{c},\mathsf{b}\circledast\mathsf{c})\leq 2d(\mathsf{a},\mathsf{b})$ , so that

and for $i\geq 2$ and any distribution $\mathsf{c},$ $d(\mathsf{a}^{\circledast i}\circledast\mathsf{c},\mathsf{b}^{\circledast i}\circledast\mathsf{c})\leq 2id(\mathsf{a},\mathsf{b})$ .

Regularity wrt $\boxast$ : The Wasserstein metric satisfies the regularity property $d(\mathsf{a}\boxast\mathsf{c},\mathsf{b}\boxast\mathsf{c})\leq d(\mathsf{a},\mathsf{b})\sqrt{1-\operatorname{\mathfrak{B}}^{2}(\mathsf{c})}\leq d(\mathsf{a},\mathsf{b})$ , so that

Regularity wrt DE: Let $T_{\mathsf{c}}(\cdot)$ denote the DE operator for the dd $(d_{l},d_{r})$ and the channel $\mathsf{c}$ . Then $d(T_{\mathsf{c}}(\mathsf{a}),T_{\mathsf{c}}(\mathsf{b}))\leq\alpha d(\mathsf{a},\mathsf{b})$ , with

Wasserstein Bounds Battacharyya and Entropy:

Battacharyya Sometimes Bounds Wasserstein:

Discussion: Perhaps the most useful property of the Wasserstein metric is that it interacts nicely with the operations of variable- and check-node convolution. This is the essence of properties (vi), (vii), and (viii). For example, it is easy to see why property (viii) might be useful: Given that two distributions $\mathsf{a}$ and $\mathsf{b}$ are close, it asserts that after one iteration of DE these two distributions are again close. Indeed, as we will see shortly, depending on the Battacharyya parameter of the starting distributions the distance might in fact become smaller, i.e., we might have a contraction.

II-I Wasserstein Metric and Degradation

When densities ordered by degradation, some the Wasserstein metric inherits some additional properties.

In the following $\mathsf{a}$ and $\mathsf{b}$ denote $L$ -distributions.

Wasserstein versus Degradation: Let $\mathsf{a}\prec\mathsf{b}$ . Let $|\mathfrak{{A}}|$ and $|\mathfrak{{B}}|$ denote the corresponding $|D|$ -domain cdfs. Define $D(\mathsf{a},\mathsf{b})=\int_{0}^{1}x(|\mathfrak{{B}}|(x)-|\mathfrak{{A}}|(x)){\text{d}}x$ . Note that $D(\mathsf{a},\mathsf{b})$ can be seen as a measure of how much $\mathsf{b}$ is degraded wrt $\mathsf{a}$ since it is the average of the non-negative integrals $\int_{z}^{1}(|\mathfrak{{B}}|(x)-|\mathfrak{{A}}|(x)){\text{d}}x$ (cf. (2)). Then

Furthermore, $D(\mathsf{a},\mathsf{b})\leq 1$ and for any symmetric densities such that $\mathsf{a}\prec\mathsf{b}\prec\mathsf{c}$ , $D(\mathsf{a},\mathsf{c})=D(\mathsf{a},\mathsf{b})+D(\mathsf{b},\mathsf{c})$ .

Entropy and Battacharyya Bound Wasserstein Distance: Let $\mathsf{a}\prec\mathsf{b}$ . Then

and $\operatorname{\mathfrak{B}}(\mathsf{b})-\operatorname{\mathfrak{B}}(\mathsf{a})\leq\sqrt{2(\text{H}(\mathsf{b})-\text{H}(\mathsf{a}))}\,.$

Continuity for Ordered Families: Consider a smooth family of $L$ -distributions $\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ ordered by degradation so that $\operatorname{\mathfrak{B}}(\cdot)$ is continuous wrt $\sigma\in[\underline{\sigma},\overline{\sigma}]$ . Then the Wasserstein metric is also continuous in $\sigma$ .

Discussion: Property (i) is particularly useful. Imagine a sequence of distributions $\{\mathsf{a}_{i}\}_{i=0}^{n}$ ordered by degradation, i.e., $\mathsf{a}_{0}\prec\mathsf{a}_{1}\prec\dots\prec\mathsf{a}_{n}$ . Then $\mathsf{a}_{0}\prec\mathsf{a}_{n}$ and we know from that $D(\mathsf{a}_{0},\mathsf{a}_{n})=\int_{0}^{1}z(|\mathfrak{{A}}|_{n}-|\mathfrak{{A}}|_{0}){\text{d}}z$ is non-negative since it is the “average” of the non-negative integrals $\int_{y}^{1}(|\mathfrak{{A}}|_{n}-|\mathfrak{{A}}|_{0}){\text{d}}z$ . Now note that $D(\cdot,\cdot)$ is additive and that $D(\mathsf{a}_{0},\mathsf{a}_{n})\leq 1$ . From these two facts we can conclude that there must exist an index $i$ , $0\leq i\leq n-1$ , so that $D(\mathsf{a}_{i},\mathsf{a}_{i+1})\leq\frac{1}{n}$ . More generally, we can conclude for any $1\leq k\leq n$ that there must exist an index $i$ , $0\leq i\leq n-k$ , so that $D(\mathsf{a}_{i},\mathsf{a}_{i+k})\leq\min\{\frac{k}{n-k+1},1\}\leq\frac{2k}{n}$ . This follows by upper bounding the average of all these $n-k+1$ such distances. By property (i) this implies “closeness” also in the Wasserstein sense. In words, in a sequence of distributions ordered by degradation we are always able to find a subsequence of distributions which are “close” in the Wasserstein sense.

As an exercise in using the basic properties of the Wasserstein distance, let us give a proof of Lemma 8.

II-J GEXIT Curve

As we have discussed in the preceding section, FPs of DE play a crucial role in the asymptotic analysis. E.g., the BP threshold is characterized by the existence/non-existence of a non-trivial FP of forward DE for a particular channel.

An even more powerful picture arises if instead of looking at a single FP at a time we visualize a whole collection of FPs. In order to visualize many FPs at the same time it is convenient to project them. E.g., given the FP pair $(\mathsf{c},\mathsf{x})$ we might decide to plot the point $(\text{H}(\mathsf{c}),\text{H}(\mathsf{x}))$ in the two-dimensional unit box $\times$ .

Note that for the BEC, erasure probability is equal to Battacharyya parameter, and also equal to entropy. Even though all these parameters are equal in this case, our language will reflect that we are plotting entropy.

Rather than plotting $x$ itself it is convenient to plot the EXIT value $(1-(1-x)^{d_{r}-1})^{d_{l}}$ . This is the locally best estimate of a bit based on the internal messages only, excluding the direct observation. For this choice the resulting curve is usually called the BP EXIT curve, see and [62, Sections 3.14 and 4.10]. It is the BP EXIT curve since the estimate is a BP estimate. And it is the BP EXIT (where the E stands for “extrinsic”) curve since the estimate excludes the received value associated to this bit.

The FP equation is $x=\epsilon(1-(1-x^{d_{r}-1}))^{d_{l}-1}$ , which we can solve for $\epsilon$ to get

Using (7) we can write down the parametric characterization of the BP EXIT curve

This curve is shown in the left-hand side in Figure 1 for the $(3,6)$ -regular ensemble and has a typical $C$ shape. In fact, one can show that, in this case, for $\epsilon<\epsilon^{\text{\tiny BP}}(d_{l},d_{r})$ (the BP threshold) there is only one FP at $x=0$ corresponding to perfect decoding; for $\epsilon=\epsilon^{\text{\tiny BP}}(d_{l},d_{r})$ there are 2 FPs, one is at $x=0$ and the other is the FP corresponding to forward DE; and for $\epsilon>\epsilon^{\text{\tiny BP}}(d_{l},d_{r})$ there are exactly 3 FPs of DE, one of the FPs is at $x=0$ and the remaining two FPs are strictly positive, one of which is stable, denoted by $x_{\text{s}}(\epsilon)$ , whereas the other is unstable, denoted by $x_{\text{u}}(\epsilon)$ . The stable FP is the FP which is reached by forward DE. For details see Lemma 59.

A quantity which will appear throughout this paper is the value of the unstable FP when transmitting over BEC $(\epsilon=1)$ . We denote this FP by $x_{\text{u}}(1)$ . More precisely, $x_{\text{u}}(1)$ is the smaller non-zero solution of $x=(1-(1-x)^{d_{r}-1})^{d_{l}-1}$ . Note that $x_{\text{u}}(1)$ depends on the degrees, but we drop it from the notation for ease of exposition.

Discussion: The above example raises the following two questions. (1) We have a large degree of freedom in selecting the projection operator. Which one is “best”? (2) From the above example we see that the set of FPs forms a smooth curve. Indeed, for the BEC it is not hard to see that the only FPs are the ones on the curve together with all the FPs of the form $(\mathsf{c}_{\epsilon},\Delta_{+\infty})$ , where $\mathsf{c}_{\epsilon}$ is any element of the family of BEC channels and $\Delta_{+\infty}$ corresponds to erasure value of 0. Is this picture still valid for general channel families?

In the remainder of this section we address the first question, i.e., we will discuss a particularly effective choice of the projection operator. In the next section we will address the question of the existence and nature of this curve for the general case, presenting some partial results.

A good choice for the projection operator for general channels is the GEXIT functional . For the BEC this coincides with the EXIT functional that we saw in Example 15. For the general case take a FP $(\mathsf{c}_{\sigma},\mathsf{x}_{\sigma})$ and define $\mathsf{y}=\mathsf{x}^{\boxast d_{r}-1}_{\sigma}$ . Then

where we think of $\mathsf{y}$ as fixed with respect to $\sigma$ . In words, $G(\mathsf{c}_{\sigma},\cdot)$ measures the ratio of the change in entropy of $\mathsf{c}_{\sigma}\circledast\mathsf{y}^{\circledast d_{l}}$ (the entropy of the decision of any variable node under BP decoding) versus the change of entropy of the channel $\mathsf{c}_{\sigma}$ as a function of $\sigma$ .

Discussion: Note that if the parameterization in $\sigma$ is Lipschitz, i.e., if for some positive constant $\alpha$ , $|\text{H}(\mathsf{c}_{\sigma_{2}})-\text{H}(\mathsf{c}_{\sigma_{1}})|\leq\alpha|\sigma_{2}-\sigma_{1}|$ , then the derivative $\frac{{\text{d}}}{{\text{d}}\sigma}\text{H}(\mathsf{c}_{\sigma})$ exists almost everywhere. Further, in this case also $\text{H}(\mathsf{c}_{\sigma}\circledast\mathsf{y}^{\circledast d_{l}})$ is Lipschitz and hence differentiable almost everywhere. This follows since by (the Duality Rule in) Lemma 6, for $\sigma_{2}\geq\sigma_{1}$ ,

where the last step on the right-hand side assumes that the parameterization is such that $\text{H}(\mathsf{c}_{\sigma})$ increases in $\sigma$ . The claim follows since both terms on the left are non-negative (due to degradation), so that in particular the first term is upper bounded by $\alpha|\sigma_{2}-\sigma_{1}|$ , i.e., it is Lipschitz. This formulation also shows that the numerator is no larger than the denominator (so that the ratio exists) and that the GEXIT value is upper bounded by $1$ (and is non-negative).

We get the GEXIT curve by plotting $(\text{H}(\mathsf{c}_{\sigma}),G(\mathsf{c}_{\sigma},\mathsf{y}^{\circledast d_{l}}))$ for a family of FPs $\{\mathsf{c}_{\sigma},\mathsf{x}_{\sigma}\}$ . This is shown in Figure 2 for the $(3,6)$ -regular ensemble assuming that transmission takes place over the BAWGNC. In the last section we have already explained how we can construct in the general case FPs by a numerical procedure. To plot Figure 2 we have used this procedure to get a complete family of FPs for all entropies from to $1$ . In each of the two pictures of Figure 2 there is a small black dot. This dot marks a particular FP and the two small inlets show the corresponding distribution of the channel $\mathsf{c}_{\sigma}$ as well as the message distribution emitted at the variable nodes, call it $\mathsf{x}_{\sigma}$ . For a detailed discussion we refer the reader to .

Why do we use this particular representation? As we will discuss in detail in Section II-L, assuming this curve indeed exists and is “smooth”, the area which is enclosed by it is equal to $r=1-d_{l}/d_{r}$ , the design rate of the ensemble.

This is easy to see for the BEC. To simplify notation, denote the GEXIT value in this case by $G(\epsilon,y^{d_{l}})$ , where $\epsilon$ is the erasure probability, $x$ is the FP for this channel parameter, and $y=1-(1-x)^{d_{r}-1}$ . We then have $G(\epsilon,y^{d_{l}})=(1-(1-x)^{d_{r}-1})^{d_{l}}$ . Let us integrate the area which is enclosed by this curve. We call the corresponding integral the GEXIT integral. For our particular case it is given by

Perhaps surprisingly, the result stays valid for general channels as we will discuss in Section II-L. This property is one of the main ingredients in our proof.

Note that given $\mathsf{c}_{{\tt{h}}}$ and $\mathsf{z}_{{\tt{h}}}$ , the GEXIT functional $G(\mathsf{c}_{{\tt{h}}},\mathsf{z}_{{\tt{h}}})$ can be expressed in the form $\int\mathsf{z}_{{\tt{h}}}(w)f({\tt{h}},w){\text{d}}w$ , where $f({\tt{h}},w)$ is called as the GEXIT kernel. In the $|D|$ -domain this kernel is given by

For a proof of the following see Lemma 4.77, .

For a smooth, ordered, channel family $\{\mathsf{c}_{{\tt{h}}}\}_{{\tt{h}}}$ , $f({\tt{h}},w)$ , as a function of $w$ , exists, is continuous, non-negative, non-increasing and concave on its entire domain. Further $f({\tt{h}},0)=1$ and $f({\tt{h}},1)=0$ .

We remark that the above lemma also holds when $\{\mathsf{c}_{{\tt{h}}}\}$ is piece-wise linear.

II-K Existence of GEXIT Curve

As we briefly discussed above, for the BEC it is trivial to see that the BP GEXIT curve indeed exists. But for general BMS channels this is not immediate. The aim of this section is to show the existence of the BP GEXIT curve for at least a subset of parameters.

Let us first recall the following lemma which was stated and proved in a slightly weaker form in . For the convenience of the reader we reproduce the proof in Appendix E.

Assume that communication takes place over an ordered and complete family $\{\mathsf{c}_{{\tt{h}}}\}_{{\tt{h}}}$ , where ${\tt{h}}=\text{H}(\mathsf{c}_{\tt{h}})$ , using the dd pair $(d_{l},d_{r})$ .

Then, for any ${\tt{h}}\in$ , there exists at most one density $\mathsf{x}_{{\tt{h}}}$ so that $(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})$ forms a FP which fulfills

Furthermore, if such a density $\mathsf{x}_{{\tt{h}}}$ exists, then it coincides with the forward DE FP. Finally, $\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})$ is Lipschitz continuous with respect to $\operatorname{\mathfrak{B}}(\mathsf{c}_{{\tt{h}}})$ . More precisely, if two FPs $(\mathsf{c}_{{\tt{h}}_{1}},\mathsf{x}_{{\tt{h}}_{1}})$ and $(\mathsf{c}_{{\tt{h}}_{2}},\mathsf{x}_{{\tt{h}}_{2}})$ satisfy the condition $\operatorname{\mathfrak{B}}(\mathsf{c}_{{\tt{h}}_{i}})(d_{l}-1)(d_{r}-1)(1-\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}_{i}})^{2})^{d_{r}-2}\leq 1-\delta$ for some $\delta>0$ , then

The following lemma states that, at least for sufficiently large entropies, the BP GEXIT curve indeed exists and is well behaved.

Assume that communication takes place over an ordered and complete family $\{\mathsf{c}_{{\tt{h}}}\}_{{\tt{h}}}$ , where ${\tt{h}}=\text{H}(\mathsf{c}_{\tt{h}})$ , using the dd pair $(d_{l},d_{r})$ . Consider the set of FP pairs $\{(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})\}$ obtained by applying forward DE to each channel $\mathsf{c}_{{\tt{h}}}$ . Let

Although it is easy and stable to compute the above lower bound on the entropy numerically, it will be convenient to have a universal and analytic such lower bound. This is accomplished in the following lemma, whose proof can be found in Appendix E.

Assume that communication takes place over an ordered and complete family $\{\mathsf{c}_{{\tt{h}}}\}_{{\tt{h}}}$ , where ${\tt{h}}=\text{H}(\mathsf{c}_{\tt{h}})$ , using the dd pair $(d_{l},d_{r})$ with $d_{r}\geq 4$ and $d_{l}\geq 3$ . Let $a(x)$ be defined as in Lemma 18. Consider the set of FP pairs $\{(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})\}$ which is derived by applying forward DE to each channel $\mathsf{c}_{{\tt{h}}}$ . Then the GEXIT curve associated to $\{(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})\}_{{\tt{h}}>\overline{{\tt{h}}}}$ , where

Table I lists these universal upper bounds $\overline{{\tt{h}}}$ for all the dds.

The following corollary follows immediately from Lemma 17, property (ii) of Lemma 14, and property (ix) of Lemma 13.

The proof of the following lemma can be found in Appendix F.

Bound for Partially Degraded Case: Let $\mathsf{a^{\prime}}$ be degraded with respect to the channel density $\mathsf{a}$ and let $\mathsf{b^{\prime}}$ be such that $d(\mathsf{b}^{\prime},\mathsf{b})\leq\delta$ . Then

Bound for Fully Degraded Case: Let $\mathsf{a^{\prime}}$ be degraded with respect to the channel density $\mathsf{a}$ and let $\mathsf{b^{\prime}}$ be degraded with respect to the channel density $\mathsf{b}.$ Then

We will find it more convenient to parameterize the densities using $b=b({\tt{h}})=\operatorname{\mathfrak{B}}(\mathsf{c}_{\tt{h}}).$ Let us define

We claim that $D(b^{\prime},b)$ is continuous in both its arguments. Note that $G_{\tt{h}}=D(b({\tt{h}}),b({\tt{h}}))\frac{{\text{d}}b({\tt{h}})}{{\text{d}}{\tt{h}}}$ and, correspondingly, we define $G_{b}=D(b,b).$ To show continuity of $D$ in the first component note that $(D(b^{\prime\prime},b)-D(b^{\prime},b))\rightarrow 0$ by the smooth channel family assumption. To show continuity of $D$ in the second component consider $\text{H}((\mathsf{c}_{b^{\prime\prime\prime}}-\mathsf{c}_{b^{\prime\prime}})\circledast(\mathsf{z}_{b^{\prime}}-\mathsf{z}_{b})).$ By (the Entropy Product Inequality) Lemma 21, property (iii), we have

showing that $D$ is actually Lipschitz in its second argument. It follows, in particular, that $G_{b}$ is continuous in $b$ . Since the Battacharyya parameter is a bounded functional and the channel family is smooth, we have $\frac{{\text{d}}b({\tt{h}})}{{\text{d}}{\tt{h}}}$ is continuous in ${\tt{h}}$ . Consequently, $G_{\tt{h}}$ is continuous in ${\tt{h}}$ . ∎

II-L Area Theorem

In Section II-J we introduced the GEXIT curve associated to a regular ensemble, see e.g. Figure 2. In Section II-K we then derived conditions which guarantee that this curve indeed exists and is continuous in a given region. We will now discuss the GEXIT integral, the area under the GEXIT curve. In order to derive some properties of this integral, we will first introduce GEXIT integrals in a slightly more general form before we apply them to ensembles.

Given two families $\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ and $\{\mathsf{z}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ , the GEXIT integral $\{\mathsf{c}_{\sigma},\mathsf{z}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ is defined as

Discussion: In the above definition, and some definitions below, we need regularity conditions to ensure that the integrals exist. Rather than stating some general conditions here, we will discuss and verify them in the specific cases. E.g., one case we will discuss is if the channel family $\mathsf{c}_{\sigma}$ is smooth and $\mathsf{z}_{\sigma}$ is a polynomial in $\sigma$ with “coefficients” which are fixed densities.

Consider a binary linear code of length $n$ whose graphical representation is a tree. Assume that we are given an ordered family of channels $\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ . Assume that when all variable nodes “see” the channel $\mathsf{c}_{\sigma}$ the distribution of the resulting extrinsic BP message density at the $i$ -th variable node is $\mathsf{z}_{\sigma,i}$ . Then the GEXIT integral associated of the $i$ -th variable node is $G(\{\mathsf{c}_{\sigma},\mathsf{z}_{\sigma,i}\}_{\underline{\sigma}}^{\overline{\sigma}})$ . ∎

Discussion: Note that the distribution $\mathsf{z}_{\sigma,i}$ is the best guess we can make about bit $i$ given the code constraints and all observations except the direct observation on bit $i$ . This is why we have called the distribution the extrinsic message density. Note further that we have assumed that the graphical structure of the code is a tree. Therefore, BP equals MAP, the optimal such estimator.

The GEXIT integral applied to an ensemble is just the integral under the GEXIT curve of this ensemble.

Consider the $(d_{l},d_{r})$ -regular ensemble and assume that $\{\mathsf{c}_{\sigma},\mathsf{x}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ is a family of FPs of DE. Define $\mathsf{y}_{\sigma}=\mathsf{x}_{\sigma}^{\boxast d_{r}-1}$ . Then

In the sequel it will be handy to explicitly evaluate the integral. The proof of the following lemma is contained in Appendix G.

Discussion: Note that this GEXIT integral has a simple graphical interpretation; it is the area under the GEXIT curve as e.g. shown in the right-hand picture of Figure 1. The condition ${\tt{h}}^{*}\geq{\tt{h}}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ ensures that this curve is well defined and integrable.

We have seen in the last section that the value of a GEXIT integral of an ensemble is determined by the expression $A$ . We will soon see that it is crucial to describe the region where $A$ is negative. The following lemma, whose proof can be found in Appendix H, gives a characterization of this property.

Let $(\mathsf{c},\mathsf{x})$ be an approximate FP of the $(d_{l},d_{r})$ -regular ensemble of design rate $r=1-d_{l}/d_{r}$ . Assume that $d_{r}\geq 1+5(\frac{1}{1-r})^{\frac{4}{3}}$ and for some fixed $0\leq\delta\leq(\frac{\ln(2)d_{l}}{16\sqrt{2}d_{r}})^{2}$ , $d(\mathsf{x},\mathsf{c}\circledast(\mathsf{x}^{\boxast d_{r}-1})^{\circledast d_{l}-1})\leq\delta$ . Let

For $0\leq\kappa\leq\frac{1}{4e}\frac{d_{l}}{d_{r}}$ , if $\text{H}(\mathsf{x})\in[(\frac{3}{4})^{\frac{d_{l}-1}{2}}+\frac{1}{(d_{r}-1)^{3}},\frac{d_{l}}{d_{r}}-d_{l}e^{-4(d_{r}-1)(\frac{2d_{l}}{11ed_{r}})^{\frac{4}{3}}}-\kappa]$ , then $A\leq-\kappa$ .

Discussion: In words, for sufficiently high degrees, $A(\mathsf{x})$ is strictly negative for all $\mathsf{x}$ with entropies in the range $(0,d_{l}/d_{r})$ . Note that $d_{l}/d_{r}$ corresponds to the Shannon threshold for a code of rate $1-d_{l}/d_{r}$ . In the preceding lemma we introduced the notion of an approximate FP of DE: we say that $(\mathsf{c},\mathsf{x})$ is a $\delta$ -approximate FP if for some $\delta>0$ we have $d(\mathsf{x},\mathsf{c}\circledast(\mathsf{x}^{\boxast d_{r}-1})^{\circledast d_{l}-1})\leq\delta$ .

II-M Area Threshold

The most important goal of this paper is to show that suitable coupled ensembles achieve the capacity. The preceding (Negativity) Lemma 27 is an important tool for this purpose. But we will in fact prove a refined statement, namely we will determine the threshold for fixed dds. This threshold is the so-called area threshold and it was first introduced in in the context of the Maxwell construction.

Consider the $(d_{l},d_{r})$ -regular ensemble and transmission over a complete and ordered channel family $\{\mathsf{c}_{\tt{h}}\}_{{\tt{h}}=0}^{1}$ . For each ${\tt{h}}\in$ , let $\mathsf{x}_{\tt{h}}$ be the forward DE FP associated to channel $\mathsf{c}_{\tt{h}}$ . The area threshold, denote it by ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ , is defined as

where $A(\mathsf{x}_{{\tt{h}}},d_{l},d_{r})$ is equal to $A$ , which is given in Lemma 26, evaluated at the FP $\mathsf{x}_{{\tt{h}}}$ , when transmitting with the $(d_{l},d_{r})$ -regular ensemble. ∎

Table II gives some values for ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ for various dds and channels.

Recall that the GEXIT integral has a simple graphical interpretation – it is the area under the GEXIT curve, assuming of course that both the curve and the integral exist. The area threshold is therefore that channel parameter ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ such that the GEXIT integral from ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ to $1$ is equal to $1-\frac{d_{l}}{d_{r}}$ , the design rate.

Consider e.g. the case of the $(10,20)$ -regular dd depicted in Figure 3.

From Lemma 19 we know that the GEXIT curve is Lipschitz continuous at least in the range ${\tt{h}}\in[0.341,1]$ . An explicit check shows that $A(\mathsf{x}_{{\tt{h}}=0.341})<0$ , so that ${\tt{h}}^{A}\geq 0.341$ . We know that for ${\tt{h}}\in[0.341.1]$ the expression $1-\frac{d_{l}}{d_{l}}-A(\mathsf{x}_{\tt{h}})$ corresponds to the area under this GEXIT curve between ${\tt{h}}$ and $1$ . This expression is therefore a decreasing function in ${\tt{h}}$ , or equivalently, $A(\mathsf{x}_{\tt{h}})$ is an increasing function in ${\tt{h}}$ . Using bisection, we can therefore efficiently find the area threshold and we get ${\tt{h}}^{A}\approx 0.49985$ . Note that for this case the area threshold has the interpretation as that unique channel parameter ${\tt{h}}^{A}$ so that the enclosed area under the GEXIT curve between ${\tt{h}}^{A}$ and $1$ is equal to $1-\frac{d_{l}}{d_{r}}$ . This is obviously the reason for calling ${\tt{h}}^{A}$ the area threshold.

The same interpretation applies to any dd ( $d_{l},d_{r})$ and any BMS channel where the area threshold ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ is such that the GEXIT curve from ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ up till $1$ exists and is integrable. Empirically this is true for all regular dds and all BMS channels. Consider e.g. the case of the $(3,6)$ ensemble and transmission over the BAWGNC, see Figure 4. From Table I we are assured that this curve exists and is smooth at least in the range ${\tt{h}}\in[0.5931,1]$ . This region is unfortunately too small. But it is easy to compute the curve numerically over the whole range. Since the resulting curve is smooth everywhere, it is easy to compute the area threshold numerically in this way. We get ${\tt{h}}^{A}\approx 0.4792$ .

Fortunately, if we fix the rate then for all dd of sufficiently high degree this interpretation applies.

Consider a sequence of $(d_{l},d_{r})$ -regular ensembles of fixed design rate $r=1-d_{l}/d_{r}$ and with $d_{l},d_{r}$ tending to infinity.

Furthermore, $A(\mathsf{x}_{{\tt{h}}^{A}},d_{l},d_{r})=0$ and, for fixed rate and increasing degrees, the sequence of the area thresholds ${\tt{h}}^{\text{A}}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ converges to the Shannon threshold ${\tt{h}}^{\text{Shannon}}(d_{l},d_{r})=\frac{d_{l}}{d_{r}}=1-r$ universally over the whole class of BMS channel families.

Let us prove the last claim of the lemma. We want to show that at the area threshold $A(\mathsf{x}_{{\tt{h}}^{A}},d_{l},d_{r})=0$ . Recall that the area threshold was defined as the supremum over all ${\tt{h}}$ so that $A(\mathsf{x}_{{\tt{h}}},d_{l},d_{r})$ is less than or equal to zero. Therefore, all we need to show is that $A(\mathsf{x}_{{\tt{h}}},d_{l},d_{r})$ is continuous as a function of ${\tt{h}}$ around ${\tt{h}}^{A}$ .

III Coupled Systems

Our goal is to show that coupled ensembles can achieve capacity on general BMS channels. Let us recall the definition of an ensemble which is particularly suited for the purpose of analysis. We call it the $(d_{l},d_{r},L,w)$ ensemble. This is the ensemble we use throughout the paper. For a quick historical review on some of the many variants see Section I-B.

For the whole paper we will always be interested in the limit when $M$ tends to infinity while $L$ as well as $d_{l}$ and $d_{r}$ stay fixed. In this limit we can analyze the system via density evolution, simplifying our task.

Not surprisingly, spatially coupled ensembles inherit many of their properties from the underlying ensemble. Perhaps most importantly, the local connectivity is the same. Further, the design rate of the coupled ensemble is close to that of the original one. A proof of the following lemma can be found in .

The design rate of the ensemble $(d_{l},d_{r},L,w)$ , with $w\leq L$ , is given by

There is an entirely equivalent way of describing a spatially coupled ensemble in terms of a circular construction. This construction has the advantage that it is completely symmetric. This simplifies some of the ensuing proofs.

Given an $(d_{l},d_{r},L,w)$ ensemble we can associate to it a circular ensemble. This circular ensemble has $w-1$ extra sections, all of whose variable nodes are set to zero. To be concrete, we assume that the sections are numbered from $[-L,L+w-1]$ , where the sections in $[-L,L]$ are the sections of the original ensemble and the sections in $[L+1,L+w-1]$ are the extra sections. In this new circular ensemble all index calculations (for the connections) are done modulo $2L+w$ and indices are mapped to the range $[-L,L+w-1]$ . For all positions in the range $i\in[L+1,L+w-1]$ the channel is $\mathsf{c}_{i}=\Delta_{+\infty}$ , and consequently, $\mathsf{x}_{i}=\Delta_{+\infty}$ . For all “regular” positions $i\in[-L,L]$ the associated channel is the standard channel $\mathsf{c}$ . This circular ensemble has design rate equal to $1-d_{l}/d_{r}$ . ∎

As we will see, it is the global structure which helps all the individual codes to perform so well – individually they can only achieve their BP threshold, but together they reach their MAP performance.

III-B Density Evolution for Coupled Ensemble

Let us describe the DE equations for the $(d_{l},d_{r},L,w)$ ensemble. In the sequel, densities are $L$ -densities. Let $\mathsf{c}$ denote the channel and let $\mathsf{x}_{i}$ denote the density which is emitted by variable nodes at position $i$ . Throughout the paper, $\Delta_{+\infty}$ denotes an $L$ -density with all its mass at $+\infty$ and represents the perfect decoding density. Also, $\Delta_{0}$ denotes an $L$ -density with all its mass at and represents a density with no information.

Note that $g(\mathsf{x},\dots,\mathsf{x})=(\mathsf{x}^{\boxast d_{r}-1})^{\circledast d_{l}-1}$ , where the right-hand side represents DE (without the effect of the channel) for the underlying $(d_{l},d_{r})$ -regular ensemble. Also define

As before we see that $\hat{g}(\mathsf{x},\!\dots\!,\mathsf{x})$ denotes the EXIT value of DE for the underlying $(d_{l},d_{r})$ -regular ensemble. It is not hard to see that both $g(\mathsf{x}_{i-w+1},\dots,\mathsf{x}_{i+w-1})$ as well as $\hat{g}(\mathsf{x}_{i-w+1},\dots,\mathsf{x}_{i+w-1})$ are monotone wrt degradation in all their arguments $\mathsf{x}_{j}$ , $j=i-w+1,\dots,i+w-1$ . More precisely, if we degrade any of the densities $\mathsf{x}_{j}$ , $j=i-w+1,\dots,i+1-1$ , then $g(\cdot)$ (respectively $\hat{g}(\cdot)$ ) is degraded. We say that $g(\cdot)$ (respectively $\hat{g}(\cdot)$ ) is monotone in its arguments. ∎

Fix the parameters $(d_{l},d_{r})$ and $w$ and assume that $d(\mathsf{a}_{i},\mathsf{b}_{i})\leq\kappa$ , $i=-w+1,\dots,w-1$ . Then

Using once again property (v) of Lemma 13

III-C Fixed Points and Admissible Schedules

Consider DE for the $(d_{l},d_{r},L,w)$ ensemble. Let $\mathsf{\underline{x}}=(\mathsf{x}_{-L},\dots,\mathsf{x}_{L})$ . We call $\mathsf{\underline{x}}$ the constellation (of $L$ -densities). We say that $\mathsf{\underline{x}}$ forms a FP of DE with channel $\mathsf{c}$ if $\mathsf{\underline{x}}$ fulfills (12) for $i\in[-L,L]$ . As a short hand we say that $(\mathsf{c},\mathsf{\underline{x}})$ is a FP. We say that $(\mathsf{c},\mathsf{\underline{x}})$ is a non-trivial FP if $\mathsf{x}_{i}\neq\Delta_{+\infty}$ for at least one $i\in[-L,L]$ . Again, for $i\notin[-L,L]$ , $\mathsf{x}_{i}=\Delta_{+\infty}$ . ∎

III-D Entropy, Error and Battacharyya Functionals for Coupled Ensemble

Let $\mathsf{\underline{x}}$ be a constellation. Let $F(\cdot)$ denote either the $\text{H}(\cdot)$ (entropy), $\operatorname{\mathfrak{E}}(\cdot)$ (error probability), or $\operatorname{\mathfrak{B}}(\cdot)$ (Battacharyya) functional defined in Section II-D.

We define the (normalized) entropy , error and Battacharyya functionals of the constellation $\mathsf{\underline{x}}$ to be

III-E BP GEXIT Curve for Coupled Ensemble

We now come to a key object, the BP GEXIT curve for the coupled ensemble. We have discussed how to compute BP GEXIT curves for uncoupled ensembles in Section III-E. For coupled ensembles the procedure is similar.

In Section III-C we have seen that for coupled systems FPs of forward DE are well defined and that they can be computed by applying a parallel schedule. This procedure allows us to compute some FPs.

But we can also use DE at fixed entropy, as discussed in Section II, to compute further FPs (in particular unstable ones). More, precisely, fix the desired average entropy of the constellation, call it ${\tt{h}}$ . Start with the initialization $\mathsf{\underline{x}}^{(0)}=\underline{\Delta}_{0}$ , the vector of all $\Delta_{0}$ . In each iteration proceed as follows. Perform one round of DE without incorporating the channel, i.e., set

Now find a channel $\mathsf{c}_{\sigma}\in\{\mathsf{c}_{\sigma}\}$ , assuming it exists, so that after the convolution with this channel the average entropy of the constellation is equal to ${\tt{h}}$ . Continue this procedure until the constellation has converged (under some suitable metric).

Assume that we have computed (via the above procedure) a complete family $\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}$ of FPs of DE, i.e., a family so that for each ${\tt{h}}\in$ , there exists a parameter $\sigma$ so that ${\tt{h}}=\frac{1}{2L+1}\sum_{i=-L}^{L}\text{H}(\mathsf{x}_{\sigma,i})$ . Then we can derive from it a BP GEXIT curve by projecting it onto

where $\hat{g}(\cdot)$ was introduced in Section III-B, and $\frac{1}{2L+1}\sum_{i=-L}^{L}G(\mathsf{c}_{\sigma},\hat{g}(\mathsf{x}_{\sigma,i\!-\!w\!+\!1},\!\dots\!,\mathsf{x}_{\sigma,i\!+\!w\!-\!1}))$ is the (normalized) GEXIT function of the constellation $\mathsf{\underline{x}}_{\sigma}$ .

Figure 5 shows the result of this numerical computation when transmission takes place over the BAWGNC (left-hand side) and the BSC (right-hand side). Note that the resulting curves look similar to the curves when transmission takes place over the BEC, see . For small values of $L$ the curves are far to the right due to the significant rate loss that is incurred at the boundary. For $L$ around $10$ and above, the BP threshold of each ensemble is close to the area threshold of the underlying $(3,6)$ -regular ensemble, namely $0.4792$ for the BAWGNC and $0.4680$ for the BSC (see the values in Table II). The picture suggests that the threshold saturation effect which was shown analytically to hold for the BEC in also occurs for general BMS channels.

The aim of this paper is to prove rigorously that the situation is indeed as indicated in Figure 5, i.e., that the BP threshold of coupled ensembles is essentially equal to the area threshold of the underlying uncoupled ensemble.

III-F Review for the BEC

Let us briefly recall the main result of which deals with transmission over the BEC. Let $\epsilon^{\text{\tiny BP}}_{\text{BEC}}(d_{l},d_{r},L,w)$ and $\epsilon^{\text{\tiny MAP}}_{\text{BEC}}(d_{l},d_{r},L,w)$ denote the BP threshold and the MAP threshold of the $(d_{l},d_{r},L,w)$ ensemble. Also, let $\epsilon^{\text{\tiny MAP}}_{\text{BEC}}(d_{l},d_{r})$ denote the MAP threshold of the underlying $(d_{l},d_{r})$ -regular LDPC ensemble. Then the main result of states that

Also, (see ) as $d_{l},d_{r}\to\infty$ , with the ratio $d_{l}/d_{r}$ fixed, $\epsilon^{\text{\tiny MAP}}_{\text{BEC}}(d_{l},d_{r})\to d_{l}/d_{r}$ . Thus, with increasing degrees, $(d_{l},d_{r},L,w)$ ensembles under BP decoding achieve the Shannon capacity for the BEC.

III-G First Result

Before we state and prove our main result (namely that coupled codes can achieve capacity also for general BMS channels), let us first quickly discuss a simple argument which shows that spatial coupling of codes does have a non-trivial effect.

First consider the uncoupled case. We have seen in Lemma 11 that when we fix the design rate $1-d_{l}/d_{r}$ and increase the degrees the BP threshold converges to . What happens if we couple such ensembles? We know that for the BEC such ensembles achieve capacity. The next lemma asserts that this implies a non-trivial BP threshold also for general BMS channels.

Consider transmission over an ordered and complete family $\{\mathsf{c}_{\tt{h}}\}$ of BMS channels using a $(d_{l},d_{r},L,w)$ ensemble and BP decoding.

Let ${\tt{h}}^{\text{\tiny BP}}={\tt{h}}^{\text{\tiny BP}}(d_{l},d_{r},L,w,\{\mathsf{c}_{\tt{h}}\})$ denote the corresponding BP threshold and let $\epsilon^{\text{\tiny BP}}=\epsilon^{\text{\tiny BP}}(d_{l},d_{r},L,w)$ denote the corresponding BP threshold for transmission over the BEC. Then

Consider DE of the coupled ensemble (cf. (12)). Applying the Battacharyya functional, we get

where we use the multiplicative property of the Battacharyya functional at the variable node side.

Using the linearity of the Battacharyya functional and extremes of information combining bounds for the check node convolution ([62, Chapter 4]) we get

The preceding set of equations is formally equivalent to the DE equations for the same spatially coupled ensemble and the BEC. Therefore, if $\operatorname{\mathfrak{B}}(\mathsf{c}_{\tt{h}})<\epsilon^{\text{\tiny BP}}(d_{l},d_{r},L,w)$ then the DE recursions, initialized with $\mathsf{c}_{\tt{h}}$ must converge to $\Delta_{+\infty}$ , which implies (13).

Further, from we know that for sufficiently large degrees $(d_{l},d_{r})$ , with their ratio fixed, and with $w$ sufficiently large, $\epsilon^{\text{\tiny BP}}(d_{l},d_{r},L,w)$ approaches $d_{l}/d_{r}$ arbitrarily closely (see the discussion in the preceding section), which proves the final claim. ∎

Let us specialize to the case of transmission over the BSC using $(3,6)$ -regular ensemble. Then we have $\operatorname{\mathfrak{B}}(\mathsf{c})=2\sqrt{p(1-p)}$ . Using the above argument and solving for $\epsilon$ in $2\sqrt{\epsilon(1-\epsilon)}>\frac{1}{2},$ we conclude that by a proper choice of $w$ and $(d_{l},d_{r})$ we can transmit reliably at least up to an error probability of $0.067$ .

Combining the above result with Lemma 4 we conclude that the BP threshold of the coupled ensemble is at least $(d_{l}/d_{r})^{2}-\delta$ . In summary, for general BMS channels and regular ensembles of fixed rate and increasing degrees, their uncoupled BP threshold tends to but their coupled BP threshold is lower bounded by a non-zero value. We conclude that coupling changes the performance in a fundamental way. In the rest of the paper we will strengthen the above result by showing that this non-zero value is in fact the area threshold of the underlying ensemble and as degrees become large, this will tend to the Shannon threshold, $d_{l}/d_{r}$ .

IV Main Results

In the sequel we will impose some restrictions on the parameters. Rather than repeating these restrictions in each statement, we collect them once and for all and give them a name.

Fix the design rate $r$ of the uncoupled system. We say that the parameters $(d_{l},d_{r})$ and $w$ are admissible if the following conditions are fulfilled with $r=1-\frac{d_{l}}{d_{r}}$ :

$d_{r}\geq\sqrt{3}b\ln(b)$ , $b=\frac{6}{\ln(4/3)(1-r)}$ ,

$2(d_{l}-1)(d_{r}-1)(1-c^{2})^{\frac{d_{r}-2}{2}}<1$ , $c=(1-r)(1-d_{r}e^{-4(d_{r}-1)(\frac{2(1-r)}{11e})^{\frac{4}{3}}})-\frac{1}{d_{r}}.$

$w>2(d_{l}-1)(d_{r}-1)(\frac{16\sqrt{2}d_{r}}{\ln(2)d_{l}})^{2}$ ,

$w>2(d_{l}-1)(d_{r}-1)d_{r}^{2}(4(\sqrt{2}+\frac{2}{\ln 2}d_{l}(d_{r}-1)))^{2}$ ,

We say that the ensemble $(d_{l},d_{r},L,w)$ is admissible if the parameters $(d_{l},d_{r})$ and $w$ are admissible. If we are only concerned about the conditions on $(d_{l},d_{r})$ , then we will say that $(d_{l},d_{r})$ is admissible. ∎

Discussion: Conditions (i), (ii) and (iii) are fulfilled if we take the degrees sufficiently large. Conditions (iv), (v), and (vi) can all be fulfilled by picking a sufficiently large connection width $w$ .

Why do we impose these conditions? At several places we use simple extremes of information combining bounds and these bounds are loose and require, for the proof to work, the above conditions. We believe that with sufficient effort these bounds can be tightened and so the restrictions on the degrees can be removed or at least significantly loosened. We leave this as an interesting open problem.

Numerical experiments indicated that for any $3\leq d_{l}\leq d_{r}$ and $w\geq 2$ the threshold saturation phenomenon happens, with a “wiggle-size” which vanishes exponentially in $w$ .

Note that the above bounds imply the following bounds which we will need at various places:

$d_{r}\geq\frac{1}{1-r}(1+\frac{2}{\ln(4/3)}\ln(2(d_{r}-1)^{3}))$ ,

$d_{r}\geq 1+5(\frac{1}{1-r})^{\frac{4}{3}}$ .

Instead of condition (iii) above we can impose the stronger but somewhat easier to check condition $\bar{{\tt{h}}}\leq(1-r)(1-d_{r}e^{-4(d_{r}-1)(\frac{2(1-r)}{11e})^{\frac{4}{3}}})-\frac{1}{d_{r}}$ , where $\bar{{\tt{h}}}$ is the upper bound stated in Lemma 19, or even further strengthen the condition to $\frac{e^{\frac{1}{4}}\sqrt{2}}{(d_{r}-2)^{\frac{1}{4}}}\leq(1-r)(1-d_{r}e^{-4(d_{r}-1)(\frac{2(1-r)}{11e})^{\frac{4}{3}}})-\frac{1}{d_{r}}$ . The last condition can be easily checked to be satisfied for sufficiently large degrees.

IV-B Main Result

Consider transmission over a complete, smooth, and ordered family of BMS channels, denote it by $\{\mathsf{c}_{\tt{h}}\}$ , using the admissible ensemble $(d_{l},d_{r},L,w)$ . Let ${\tt{h}}^{\text{\tiny BP}}(d_{l},d_{r},L,w,\{\mathsf{c}_{\tt{h}}\})$ and ${\tt{h}}^{\text{\tiny MAP}}(d_{l},d_{r},L,w,\{\mathsf{c}_{\tt{h}}\})$ denote the corresponding BP and MAP threshold. Further, let $R(d_{l},d_{r},L,w)$ denote the design rate of this ensemble and set $r=1-d_{l}/d_{r}$ . Finally, let ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ denote the area threshold of the underlying $(d_{l},d_{r})$ -regular ensemble and the given channel family. Then

where $f(d_{l},d_{r},w)=8(d_{r}-1)^{3}\big{(}\sqrt{2}+\frac{2}{\ln 2}d_{l}(d_{r}-1)\big{)}\sqrt{\frac{2(d_{l}-1)(d_{r}-1)}{w}}$ . Note that $f(d_{l},d_{r},w)$ depends only on the dd $(d_{l},d_{r})$ and $w$ but is universal wrt the channel family $\{\mathsf{c}_{\tt{h}}\}$ . Furthermore,

The bound ${\tt{h}}^{\text{\tiny BP}}\leq{\tt{h}}^{\text{\tiny MAP}}$ is trivial and only listed for completeness. Consider the upper bound on ${\tt{h}}^{\text{\tiny MAP}}$ stated in (17). Start with the circular ensemble stated in Definition 31. The original ensemble is recovered by setting the $w-1$ consecutive positions in $[L,L+w-1]$ to . Define $K=2L+w$ . We first provide a lower bound on the conditional entropy for the circular ensemble when transmitting over a BMS channel with entropy ${\tt{h}}$ . We then show that setting $w-1$ sections to does not significantly decrease this entropy. Overall this gives an upper bound on the MAP threshold of the coupled ensemble in terms of the area threshold of the underlying ensemble.

It is not hard to see that the BP GEXIT curve is the same for both the $(d_{l},d_{r})$ -regular ensemble and the circular ensemble (when all sections have the standard channel). Indeed, forward DE (see Definition 35) converges to the same FP for both ensembles. Consider the circular ensemble and let ${\tt{h}}\in({\tt{h}}^{A},1]$ . The conditional entropy when transmitting over the BMS channel with entropy ${\tt{h}}$ is at least equal to $1-d_{l}/d_{r}$ minus the area under the BP EXIT curve of $[{\tt{h}},1]$ (see Theorem 3.120 in ). Indeed, from the proof of Theorem 4.172 in , we have

Here, the entropy is normalized by $n=KM$ , where $K$ is the length of the circular ensemble and $M$ denotes the number of variable nodes per section. Assume that we set $w-1$ consecutive sections of the circular ensemble to in order to recover the original ensemble. As a consequence, we “remove” an entropy (degrees of freedom) of at most $(w-1)/K$ from the circular system. The remaining entropy is therefore positive (and hence we are above the MAP threshold of the coupled ensemble) as long as $1-d_{l}/d_{r}-(w-1)/K-G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}}^{1})>0$ . From Lemmas 26 and 29 we have $G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}^{A}}^{1})=1-d_{l}/d_{r}$ , so that the condition becomes $G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}^{A}}^{1})-G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}}^{1})<(w-1)/K$ . For all channels with ${\tt{h}}\geq{\tt{h}}^{A}$ we have $G(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})\geq\frac{1}{2(d_{r}-1)^{3}}$ . For a derivation of this statement we refer the reader to the proof of part (vi) of Theorem 47. This implies that $G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}^{A}}^{{\tt{h}}})\geq({\tt{h}}-{\tt{h}}^{A})/(2(d_{r}-1)^{3})$ . Furthermore, $G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}}^{1})\leq G(\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}^{A}}^{1})$ . This follows from the definition of area threshold, which implies that for ${\tt{h}}>{\tt{h}}^{A}$ , $A(\mathsf{x}_{{\tt{h}}},d_{l},d_{r})>0$ (cf. Lemma 26) and then combining with Lemma 26. Putting things together we get

We get the stated condition on ${\tt{h}}^{\text{\tiny MAP}}$ by lower bounding $K$ by $2L$ .

The lower bound on ${\tt{h}}^{\text{\tiny BP}}(d_{l},d_{r},L,w,\{\mathsf{c}_{{\tt{h}}}\})$ expressed in (16) is the main result of this paper. It shows that, up to a term which tends to zero when $w$ tends to infinity, the BP threshold of the coupled ensemble is at least as large as the area threshold of the underlying ensemble.

Empirical evidence suggests that the convergence speed wrt $w$ is exponential. Our bound only guarantees a convergence speed of order $\sqrt{1/w}$ .

Let us summarize. In order to prove Theorem 41 we “only” have to prove the lower bound on ${\tt{h}}^{\text{\tiny BP}}$ . Not surprisingly, this is also the most difficult to accomplish. The remainder of this paper is dedicated to this task.

IV-C Extensions

In Theorem 41 we start with a smooth, complete and ordered channel family. But it is straightforward to convert this theorem and to apply it directly to single channels or to a collection of channels. The next statement makes this precise.

The $(d_{l},d_{r},L,w)$ ensemble is universally capacity achieving for the class of BMS channels. More precisely, assume we are given $\epsilon>0$ and a target rate $R$ . Let ${\mathcal{C}}(R)$ denote the set of BMS channels of capacity at least $R$ . To each $\mathsf{c}\in{\mathcal{C}}(R)$ associate the family $\{\mathsf{c}_{\tt{h}}\}_{{\tt{h}}=0}^{1}$ , by defining

Then there exists a set of parameters $(d_{l},d_{r},L,w)$ so that

Since for each $\mathsf{c}\in{\mathcal{C}}(R)$ the associated family $\{\mathsf{c}_{\tt{h}}\}_{{\tt{h}}=0}^{1}$ is ordered by degradation, this implies that we can transmit with this ensemble reliably over each of the channels in ${\mathcal{C}}(R)$ at a rate of at least $R-4\epsilon$ , i.e., arbitrarily close to the Shannon limit.

Fix the ratio of the degrees so that $R-3\epsilon\leq 1-d_{l}/d_{r}\leq R-2\epsilon$ . Note that for each $\mathsf{c}\in{\mathcal{C}}(R)$ the constructed family $\{\mathsf{c}_{\tt{h}}\}$ is piece-wise smooth, ordered and complete. By applying Theorem 41 to each such channel family we conclude that for admissible parameters (i.e., as long as we choose the degrees and the connection width sufficiently large) the threshold of the ensemble $(d_{l},d_{r},w,L)$ for the given channel family is at least ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})-f(d_{l},d_{r},w)$ , where ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ is the area threshold and $f(d_{l},d_{r},w)$ is a universal quantity, i.e., a quantity which does not depend on the channel family and which converges to when $w$ tends to infinity. Further, we know from Lemma 29 that the area threshold ${\tt{h}}^{A}(d_{l},d_{r},\{\mathsf{c}_{\tt{h}}\})$ approaches the Shannon threshold uniformly over all BMS channels for increasing degrees. By our choice of $(d_{l},d_{r})$ the Shannon threshold is $1-(1-d_{l}/d_{r})\geq 1-R+2\epsilon$ . Therefore, by first choosing sufficiently large degrees $(d_{l},d_{r})$ , and then a sufficiently large connection width $w$ , we can ensure that the BP threshold is at least $1-R+\epsilon$ . Finally, by choosing the constellation length $L$ sufficiently large, we can ensure that the rate loss we incur with respect to the design rate the underlying ensemble is sufficiently small so that the design rate of the coupled ensemble is at least $R-4\epsilon$ . ∎

Assume we are given $\epsilon>0$ and a target rate $R$ . Let ${\mathcal{C}}(R)$ denote the set of BMS channels of capacity at least $R$ . Then there exists a set of parameters $(d_{l},d_{r},L,w)$ of rate at least $R-5\epsilon$ with the following property. Let $C(n)$ be an element of $(d_{l},d_{r},L,w)$ with blocklength $n$ , where we assume that $n$ only goes over the subsequence of admissible values. Then

In words, almost all codes in $(d_{l},d_{r},L,w)$ of sufficient length are good for all channels in ${\mathcal{C}}(R)$ .

Note that according to (iv) in Lemma 13 the space of $|D|$ distributions endowed with the Wasserstein metric is compact, and hence so is ${\mathcal{C}}(R)$ . Hence there exists a finite set of channels, denote it by $\{\mathsf{c}_{i}\}_{i=1}^{I(\delta)}$ , so that each channel in ${\mathcal{C}}(R)$ is within a (Wasserstein) distance at most $\delta$ from the set $\{\mathsf{c}_{i}\}$ . We will fix the value of $\delta$ shortly.

Let us modify the set $\{\mathsf{c}_{i}\}$ so that ${\mathcal{C}}(R)$ is not only close to $\{\mathsf{c}_{i}\}$ but is also “dominated” by it. For each $\mathsf{c}\in\{\mathsf{c}_{i}\}$ , define

IV-D Proof of Main Result – Theorem 41

We start by proving some basic properties which any spatial FP has to fulfill. Since we are considering a symmetric ensemble (in terms of the spatial arrangement) it will be useful to consider “one-sided” FPs.

We say that $\mathsf{\underline{x}}$ is a one-sided FP (of DE) with channel $\mathsf{c}$ if (12) is fulfilled for $i\in[-N,0]$ with $\mathsf{x}_{i}=\Delta_{+\infty}$ for $i<-N$ . We say that the FP has a free boundary condition if $\mathsf{x}_{i}=\mathsf{x}_{0}$ for $i>0$ . We say that it has a forced boundary condition if $\mathsf{x}_{i}=\Delta_{0}$ for $i>0$ . Lastly, we say that it has an increasing boundary condition if $\mathsf{x}_{i-1}\prec\mathsf{x}_{i}$ for $i>0$ , where $\mathsf{x}_{i}$ , for $i\geq 1$ , are fixed but arbitrary symmetric densities. ∎

We say that $\mathsf{\underline{x}}$ is non-decreasing if $\mathsf{x}_{i}\prec\mathsf{x}_{i+1}$ for $i=-N,\dots,-1$ . Let $(\mathsf{c},\mathsf{\underline{x}})$ be a non-trivial and non-decreasing one-sided FP (with any boundary condition). As a short hand, we then say that $(\mathsf{c},\mathsf{\underline{x}})$ is a proper one-sided FP. Figure 6 shows an example. ∎

Similar to Definition 35, one can define one-sided forward DE by initializing all sections with $\Delta_{0}$ and by applying DE according to an admissible schedule. ∎

There are two key ingredients of the proof. The first ingredient is to show that any one-sided spatial FP which is increasing, “small” on the left, and “not too small” and “flat” on the right must have a channel parameter very close to the area threshold ${\tt{h}}^{\small A}$ . This is made precise in (the Saturation) Theorem 47.

The second key ingredient is to show the existence of a such a one-sided FP $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ . Figure IV-D shows a typical (two-sided) such example. This is accomplished in (the Existence) Theorem 48.

Fix $r\in(0,1)$ and let $(d_{l},d_{r},w)$ be admissible, with $r=1-\frac{d_{l}}{d_{r}}$ , in the sense of conditions (ii), (iii), (v), (vi), (vii) and (viii) of Definition 40. Let $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ be a proper one-sided FP on $[-N,0]$ , with forced boundary condition, so that for some $\delta>0$ , $2(w-1)\leq L$ , and $L+w\leq K\leq N$ the following conditions hold.

Constellation is close to $\Delta_{+\infty}$ “on the left”:

Constellation is not too small “on the right”:

Here $c(d_{l},d_{r},\delta,w,K,L)$ is a function which can be made arbitrarily small by choosing $\delta$ sufficiently small, $w$ sufficiently large, and $L$ and $K$ sufficiently large compared to $w$ . (This implies of course that the constellation length $N$ is also chosen sufficiently large.) More precisely,

Fix $r\in(0,1)$ and let $(d_{l},d_{r},w)$ be admissible in the sense of conditions (i), (ii), (iii), (iv), (v), (vi) in Definition 40 with $r=1-\frac{d_{l}}{d_{r}}$ . Let $\{\mathsf{c}_{\sigma}\}_{\sigma=0}^{1}$ be a smooth, ordered and complete channel family.

In the sequel, $N(d_{l},d_{r},w)$ is a positive constant which depends on the ensemble but not the channel or the channel family and $c(d_{l},d_{r})$ is a positive constant which depends on $d_{l}$ and $d_{r}$ , but not on the channel $\mathsf{c}$ , the channel family, $N$ or $w$ .

For any $N>N(d_{l},d_{r},w)$ and $0<\delta<\frac{x_{\text{u}}(1)}{4}$ , there exists a proper one-sided FP $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ on $[-N,0]$ with parameters $(d_{l},d_{r},w)$ and with forced boundary condition so that the following conditions are fulfilled:

Constellation is close to $\Delta_{+\infty}$ “on the left”: Let

Then $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{i})\leq\delta$ for $i\in[-N,-N+N_{1}-1]$ .

Constellation is not too small “on the right”: Let

Then $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{i})\geq x_{\text{u}}(1)$ for $i\in[-N_{2},0]$ .

Maxwell Conjecture: As a byproduct of our proof, we know that the MAP threshold of coupled ensembles is essentially equal to the area threshold of the uncoupled ensemble. In addition we know that the MAP threshold of the uncoupled ensemble is also upper bounded by the area threshold. The Maxwell conjecture states that in fact the MAP threshold of the uncoupled ensemble is equal to the area threshold. So if one can establish that the MAP threshold of the uncoupled ensemble is at least as large as the MAP threshold of the coupled ensemble, then the Maxwell conjectured would be proved. A natural approach to resolve this issue is to use interpolation techniques and it is likely that the Maxwell conjecture can be proved in a way similar as this was done in for other graphical models.

Convergence Speed: As discussed previously, we only give weak bounds on the speed of convergence of the ensemble to the Shannon capacity (as a function of the degrees, the constellation length $L$ , as well as the coupling width $w$ ). Numerical evidence suggests much stronger results. Settling the question of the actual convergence speed is both challenging and interesting.

Lifting of Restrictions: Our results apply only to sufficiently large degrees whereas numerical calculations indicate that the threshold saturation effect equally shows up for small degrees. This is a consequence of the fact that at many places we have used simple extremes of information combining bounds. With sufficient effort it is likely that one can extend the proof to many dds which are currently not covered by our statement.

General Ensembles: In a similar vein, we restricted our investigation to regular ensembles to keep things simple, but the same technique applies in principle also to irregular or even structured ensembles. Again, depending on the structure of the underlying ensemble, much effort might be required to derived the necessary bounds.

Wiggle Size: Perhaps the weakest link in our derivation is the treatment of the connection width $w$ . In our current statements this connection width has to be chosen large. Empirically, small such lengths, such as the extreme case $w=2$ give already excellent results and by increasing $w$ the convergence to the area threshold seems to happen exponentially fast. How to derive practically relevant bounds for such small values of $w$ is an important open problem.

Scaling: More generally, from a practical point of view, what is needed is a firm understanding of how the performance of such codes scale in each of the parameters in $d_{l}$ , $d_{r}$ , $L$ , $M$ , as well as $w$ . Only then will it be possible to design codes in a principled fashion.

Practical Issues: Further important topics are, the design of good termination schemes which mitigate the rate-loss, a systematic investigation of how structure in the interconnection pattern as well as the codes influences the performance, and how to optimally choose the scheduling (e.g., windowed decoding) to control the complexity of the decoder .

General Models: As was discussed briefly in the introduction, the threshold saturation phenomenon has been empirically found to hold in a large variety of systems. This suggests that one should be able to formulate a rather general theory rather than finding a separate proof for each of these cases. For all one-dimensional systems this has recently been accomplished in . For higher-dimensional or infinite-dimensional systems this is a challenging open problem.

We would like to thank H. Hassani, S. Korada, N. Macris, C. Méasson, and A. Montanari for interesting discussions on this topic and H. Hassani for his feedback on an early draft. S. Kudekar would like to thank Misha Chertkov, Cyril Méasson, Jason Johnson, René Pfitzner and Venkat Chandrasekaran for their encouragement and Bob Ecke for hosting him in the Center for Nonlinear Studies, Los Alamos National Laboratory (LANL), where most of his work was done. He also gratefully acknowledges his support from the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 as well as from NMC via the NSF collaborative grant CCF-0829945 on “Harnessing Statistical Physics for Computing and Communications.” The work of R. Urbanke was supported by the European project STAMINA, 265496.

Let $h_{2}(x)=-x\log_{2}(x)-(1-x)\log_{2}(1-x)$ . Then for $x\in[0,1/2]$ ,

Consider now (25). Set $g(z)=2\sqrt{(1-x)x}-h_{2}(x)\,|\,_{x=(1-z)/2}=\sqrt{1-z^{2}}-h_{2}(\frac{1-z}{2})$ . We want to show that $g(z)\geq 0$ for $z\in$ . We have

The following claims are straightforward to verify using the explicit formulae for $g(z)$ , $g^{\prime}(z)$ , and $g^{\prime\prime}(z)$ : (i) $g(0)=g(1)=0$ , (ii) $g^{\prime}(0)=0$ , (iii) $g^{\prime\prime}(0)>0$ , (iv) $g^{\prime\prime}(z)=0$ has exactly one solution in $$.

Suppose there exists a $w$ , $0<w<1$ , so that $g(w)<0$ . Then from (i), (ii) and (iii) we must have $g(z)=0$ for at least three distinct elements of $ $. Rolle’s theorem then implies that$ g^{\prime}(z)=0 $has at least two distinct solutions in$ (0,1) $and hence at least three distinct solutions in$ $(since by (i)$ g^{\prime}(0)=0 $). Using Rolle’s theorem again, this implies that$ g^{\prime\prime}(z)=0 $has at least two solutions in$ $, contradiction (iv).

We prove (26) along similar lines. Consider $g(x)=\frac{11}{4}x^{\frac{3}{4}}-a\frac{x}{\ln 2}+x\log_{2}(x)$ , where $a=4(1+\ln(\frac{11\ln(2)}{16}))\approx 1.035>1$ . Note that $g(x)\leq\frac{11}{4}x^{\frac{3}{4}}-h_{2}(x)$ for $x\in[0,\frac{1}{2}]$ (to verify this, upper bound the term $-(1-x)\log_{2}(1-x)$ of the entropy function by $x/\ln(2)$ ). So if we can prove that $g(x)\geq 0$ for $x\in[0,\frac{1}{2}]$ then we are done.

Direct inspections of the quantities shows that $g(0)=0$ , $g^{\prime}(0+)=+\infty$ , $g(x^{*})=g^{\prime}(x^{*})=0$ , where $x^{*}=\frac{14641\log(2)^{4}}{65536}\approx 0.05157$ , and $g(\frac{1}{2})>0$ .

It follows that if there exists an $x\in[0,\frac{1}{2}]$ so that $g(x)<0$ then $g(x)$ must have at least $4$ roots in this range, therefore by Rolle $g^{\prime}(x)$ must have at least $3$ roots, and again by Rolle $g^{\prime\prime}(x)$ must have at least $2$ roots. But an explicit check shows that $g^{\prime\prime}(x)=-\frac{33}{64x^{\frac{5}{4}}}+\frac{1}{x\ln(2)}=0$ . So $g^{\prime\prime}(x)=0$ can only have a single solution. ∎

Proof of Lemma 4: Let $|\mathfrak{{a}}|$ denote the density in the $|D|$ -domain. Then

This proves that $\operatorname{\mathfrak{B}}(|\mathfrak{{a}}|)^{2}$ lower bounds $\text{H}(|\mathfrak{{a}}|)$ . For the upper bound we have

Appendix B Upper Bound on BP Threshold – Lemma 11

We use ideas from extremes of information combining. We get an upper bound on the BP threshold by assuming that the densities at check nodes are from the BSC family and that densities at variable nodes are from the BEC family.

Let $x$ represent the entropy of the variable-to-check message and let $c$ denote the entropy of the channel. If for any $x\in[0,c]$

then DE will not converge to the perfect decoding FP. The left-hand side represents the minimum entropy at the output of a check node which we can get if the input entropy is $x$ (and this minimum is achieved if the input density is from the BSC family). The right-hand side represents the maximum input entropy which we can have at the input of a variable node if we want an output entropy equal to $x$ (and this minimum is achieved if the input density is from the BEC family). Note that we can extend the inequality (27) to all $x\in$ without changing the condition since for $x\in(c,1]$ , the right hand side is strictly bigger than $1$ , whereas the left-hand side is always bounded above by $1$ .

The preceding condition is equivalent to saying that in order for DE to succeed, we must have

for all $x\in$ . We can also write this as

We want to show that $c$ cannot be too large, i.e., we are looking for an upper bound on $c$ . Note that any value of $x$ gives a bound. Let us choose $x=\frac{1}{2\sqrt{d_{r}-1}}$ . This gives the bound

To obtain the above inequality we first write $(1-2x)^{d_{r}-1}$ as $\text{exp}((d_{r}-1)\log(1-2x))$ . For $x\in[0,\frac{1}{2}]$ we use the Taylor expansion

Thus $\text{exp}((d_{r}-1)\log(1-2x))\leq\text{exp}(-\sqrt{d_{r}-1})$ and $h_{2}((1-(1-2x)^{d_{r}-1})/2)\geq h_{2}(\frac{1-e^{-\sqrt{d_{r}-1}}}{2})$ . We want to simplify the expression even further. Using [110, Lemma II.1] and bringing out the first term in the summation,

Substituting $x=(1-e^{-\sqrt{d_{r}-1}})/2$ we have

Appendix C Basic Properties of the Wasserstein Metric – Lemma 13

Alternative Definitions: The equivalence of the basic definition (cf. Definition 12) and the first alternative description is shown in (6.2) and (6.3) in . The equivalence of the first and second alternative descriptions is shown in .

Boundedness: Follows directly from either of the two alternative descriptions.

Metrizable and Weak Convergence: See [104, Theorem 6.9].

Let $\mathfrak{{d}}=\mathfrak{{a}}\circledast\mathfrak{{c}}$ and $\mathfrak{{e}}=\mathfrak{{b}}\circledast\mathfrak{{c}}$ be the $D$ -domain representation. Thus $d(\mathfrak{{d}},\mathfrak{{e}})$ is characterized by

In step (i) we use the construction of $f(z)$ along with the relation between $D$ and $|D|$ domains given by (29). We defined $g(x,y)=\tanh(\tanh^{-1}(x)+\tanh^{-1}(y))=\frac{x+y}{1+xy}$ and step (ii) follows by explicitly writing the variable node convolution in the $D$ -domain. In step (iii) we defined

To obtain this equivalent formulation of the integral in step (iii) we make use of the symmetry conditions of $D$ -densities and the implied relationship between $D$ and $|D|$ densities for $y\in$ ,

We claim that $h(x,y)$ is $\text{Lip}(2)$ (as a function $x$ ). This will settle the proof of the lemma. Notice that $h(x,y)$ is a linear combination of four functions. Let us consider a generic term. Writing $g(\cdot,\cdot)$ explicitly, we have

Now we sum over all possible $i,j$ and divide by 4 to get

Let us consider the other term. We split the sum into two parts, one sum over $ij>0$ and the other over $ij<0$ . We have

Adding the two we get the total contribution

To get a good bound on $d(\mathsf{a}^{\circledast i}\circledast\mathsf{c},\mathsf{b}^{\circledast i}\circledast\mathsf{c})$ in terms of $d(\mathsf{a},\mathsf{b})$ for $i\geq 2$ consider

and note that the Wasserstein metric can be expressed directly in the L-domain as

Applying this representation we observe that

Regularity wrt $\boxast$ : Let $f(x)$ be $\text{Lip}(1)$ . Let $\mathfrak{{d}}=\mathfrak{{a}}\boxast\mathfrak{{c}}$ and $\mathfrak{{e}}=\mathfrak{{b}}\boxast\mathfrak{{c}}$ be the $D$ -domain representation.

where step (a) follows since in the $|D|$ -domain, check-node convolution corresponds to a multiplication of the values.

But note that if $f(x)$ is $\text{Lip}(1)$ then $f(xy)$ is $\text{Lip}(|y|)$ . Hence,

Above, the relation between the Battacharyya and error parameters can be obtained via extremes of information combining (see ). Let us focus on the last part. To get a good bound on $d(\mathfrak{{a}}^{\boxast i},\mathfrak{{b}}^{\boxast i})$ in terms of $d(\mathfrak{{a}},\mathfrak{{b}})$ for $i\geq 2$ , consider

and note that the Wasserstein metric can be expressed directly in the D-domain as

Applying this representation, we observe that

Regularity wrt DE: Follows from properties (vi) and (vii).

Wasserstein Bounds Battacharyya and Entropy: Let $g$ be a positive function on $ $and let$ f $be a$ C^{2} $concave decreasing function on$ . $Then, for any$ c\geq|g|_{\infty},$

Before proving the inequality let us use it to establish the stated bounds. Set $g(z)=||\mathfrak{{B}}|(z)-|\mathfrak{{A}}|(z)|.$ Then $|g|_{\infty}\leq 1$ and $\int_{0}^{1}g(z)\text{d}z=d(\mathfrak{{a}},\mathfrak{{b}}).$ Now, for the Battacharyya bound let $f(z)=\sqrt{1-z^{2}}$ and note

For the entropy case we set $f(z)=h_{2}(\frac{1-z}{2}).$ The same argument as above yields

We prove the stated inequality. Let us define

where $c\geq|g|_{\infty}.$ For each $z\in$ we have $\int_{0}^{1}(g(z)-\hat{g}(z))\text{d}z\geq 0$ with equality at $z=1.$ Hence,

Battacharyya Sometimes Bounds Wasserstein: Since the cumulative $|D|$ -distribution of $\Delta_{0}$ is equal to $1$ on $$, the maximum possible value, we have

Similarly, since the cumulative $|D|$ -distribution of $\Delta_{1}$ is on $[0,1),$ we have

Appendix D Wasserstein Metric and Degradation – Lemma 14

Wasserstein versus Degradation: Let $f$ be a function of bounded total variation on $.$ (This implies that $f$ has left and right limits.) Note that we include $|f(0-)|$ and $|f(1+)|$ in the definition of total variation, which we denote by $\int_{0}^{1}|f^{\prime}(x)|\text{d}x.$ Define $F(x)=\int_{0}^{x}f(z)\text{d}z.$ We claim that if $F\geq 0$ then

This claim implies statement (i) by setting $f(z)=(|\mathfrak{{B}}|(1-z)-|\mathfrak{{A}}|(1-z))$ and noting that, in this case, $\int_{0}^{1}|f^{\prime}(z)|\text{d}z\leq 2.$

We now prove the claim. Let $S$ be the set of points $x$ in $,$ including the endpoints, where $f(x-)f(x+)\leq 0.$ Note that $S$ is closed and we may assume $f=0$ on $S.$ The complement of $S$ is a collection of disjoint open intervals such that $f$ is either strictly positive or strictly negative in each interval. Consider the subset of intervals on which $f$ is strictly negative. Without loss of generality we may take this collection to be finite. Indeed, suppose there are countably infinitely many such intervals $J_{1},J_{2},...$ Define an approximation $f_{k}$ by setting $f_{k}(x)=-f(x)$ for $x\in\cup_{i=k+1}^{\infty}J_{i}$ and $f_{k}(x)=f(x)$ otherwise. Then $F_{k}(x)=\int_{0}^{x}f_{k}(z)\text{d}z\geq F(x)\geq 0$ and $F_{k}\rightarrow F$ uniformly. Furthermore, $\int_{0}^{1}|f_{k}(x)|=\int_{0}^{1}|f(x)|$ and $\int_{0}^{1}|f^{\prime}_{k}(x)|$ converges to $\int_{0}^{1}|f^{\prime}(x)|$ from below.

By taking unions of intervals as necessary we can find an increasing sequence $0=x_{1},x_{2},...,x_{2k},x_{2k+1}=1$ such that on $I_{i}=[x_{i},x_{i+1}]$ we have $f\geq 0$ for $i$ odd and $f\leq 0$ for $i$ even. The sequence of points $x_{i}$ is strictly increasing except possibly for the last pair which may coincide at $1$ . Define

where $w_{i}=0$ if $h_{i}=0.$ Note that $w_{i}\leq|I_{i}|.$ We have

The desired result then follows from Jensen’s inequality

It is straightforward to show that for odd $i$ we have

where $\bar{x}=1-x.$ Indeed, for odd $i$ we have $\int_{x_{i}}^{z}(f(x)-h_{i}1_{\{x\geq x_{i+1}-w_{i}\}})\text{d}x\geq 0$ for all $z\in[x_{i},x_{i+1}]$ with equality at $z=x_{i+1}.$ Hence

The argument for even $i$ is similar. We obtain

Defining $\bar{x}_{2k+2}=0$ for notational convenience, we can write

Entropy and Battacharyya Bound Wasserstein: Let us first focus on the inequality between the Wasserstein distance and the Battacharyya parameter. From point (i) we know that

For the final inequality first note that $g(z)=(1-z^{2})^{-1}\Bigl{(}\int_{z}^{1}(|\mathfrak{{B}}|(x)-|\mathfrak{{A}}|(x))\text{d}x\Bigr{)}\leq 1\,.$ Let $v=\int_{0}^{1}g(z)\text{d}z=(\ln 2){(\text{H}(\mathsf{b})-\text{H}(\mathsf{a}))}\,.$ It follows that

Continuity for Ordered Families: Assume that $\mathsf{a}\prec\mathsf{b}$ . From point (ii) we know that

and the continuity follows from the continuity of the Battacharyya parameter for smooth channel families.

Consider two $L$ -densities $\mathsf{a}_{1}\prec\mathsf{a}_{2}$ . Then, for any degree distribution $\rho(\cdot)$ ,

Let $\mathsf{a}$ be a density and let $U$ be distributed according to the corresponding $|D|$ -distribution. By Jensen’s inequality we have

where $\alpha_{k}$ is positive for each $k$ . The functionals $m_{\mathsf{a},k}$ have the important (Fourier) property $m_{\rho(\mathsf{a}),k}=\rho(m_{\mathsf{a},k})$ .We introduced here only the even moments, since only these are needed. The odd moments are multiplicative as well. Since $u^{k}$ is convex and increasing for $k\geq 1,$ we have $m_{\mathsf{a}_{1},k}\geq m_{\mathsf{a}_{2},k}$ . Hence,

Consider two $L$ -densities $\mathsf{a}_{1}\prec\mathsf{a}_{2}$ . Let $0\leq{\tt{h}}_{1}\leq{\tt{h}}_{2}\leq 1$ and let $\mathsf{c}_{{\tt{h}}_{1}}$ and $\mathsf{c}_{{\tt{h}}_{2}}$ denote the two corresponding channels from an ordered family $\{\mathsf{c}_{\tt{h}}\}$ . Set $B_{{\tt{h}}_{i}}=\operatorname{\mathfrak{B}}(\mathsf{c}_{{\tt{h}}_{i}}\}$ for $i=1,2$ . Then, for any dd pair $(\lambda,\rho)$

where $\alpha=B_{{\tt{h}}_{1}}\lambda^{\prime}(1)\rho^{\prime}(1-\operatorname{\mathfrak{B}}^{2}({\mathsf{a}_{1}}))$ .

First, since $\operatorname{\mathfrak{B}}(\mathsf{a}\circledast\mathsf{b})=\operatorname{\mathfrak{B}}(\mathsf{a})\operatorname{\mathfrak{B}}(\mathsf{b})$ , $\operatorname{\mathfrak{B}}(T_{{\tt{h}}}(\mathsf{a}))=B_{{\tt{h}}}\lambda(\operatorname{\mathfrak{B}}(\rho(\mathsf{a})))$ . Second, since $0\leq\lambda(x)\leq 1$ and $\lambda^{\prime}(x)\leq\lambda^{\prime}(1)$ , $|\lambda(x_{1})-\lambda(x_{2})|\leq\lambda^{\prime}(1)|x_{1}-x_{2}|$ for all $x_{1},x_{2}\in$ . This implies that $|\operatorname{\mathfrak{B}}(T_{{\tt{h}}_{1}}(\mathsf{a}_{1}))\!-\!\operatorname{\mathfrak{B}}(T_{{\tt{h}}_{1}}(\mathsf{a}_{2}))|$ is upper bounded by $\lambda^{\prime}(1)B_{{\tt{h}}_{1}}|\operatorname{\mathfrak{B}}(\rho(\mathsf{a}_{1}))-\operatorname{\mathfrak{B}}(\rho(\mathsf{a}_{2}))|$ . Using the triangle inequality, we get

The first term above can be bounded using Lemma 50. ∎

Assume on the other hand that $\mathsf{x}_{\tt{h}}$ satisfies (9) and that there exists a distinct FP for the same channel, necessarily upgraded with respect to $\mathsf{x}_{{\tt{h}}}$ , also satisfying (9). Call this density $\mathsf{x}_{{\tt{h}}}^{\prime}$ . In this case,

a contradiction since $\delta>0$ . The above argument shows that there can be at most one FP with this property and that this FP must be the forward DE one.

Note that $g(1)=1$ and that $g(\beta)$ is continuous.

The induction is anchored by noting that $1=\operatorname{\mathfrak{B}}(\Delta_{0})\geq\beta$ since we assumed that $\beta\in$ . In summary, for each $\beta\in(0,1]$ , equation (34) gives us the lower bound $\operatorname{\mathfrak{B}}(\mathsf{x})\geq\beta$ for the FP $\mathsf{x}$ of forward DE with the channel $\operatorname{\mathfrak{B}}(\mathsf{c})=g(\beta)$ . Another way of interpreting (34) is that it gives us an upper bound on $\operatorname{\mathfrak{B}}(\mathsf{c})$ if we fix $\operatorname{\mathfrak{B}}(\mathsf{x})=\beta$ .

According to Lemma 17, the GEXIT curve is Lipschitz continuous (in the Battacharyya parameter) at the FP $(\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}})$ if

Note that (34) as well as (35) (if we interpret the inequality as an equality) give rise to curves in the $(\operatorname{\mathfrak{B}}(\mathsf{c}),\operatorname{\mathfrak{B}}(\mathsf{x}))$ space. Inserting (34) into (35) gives us the points where these two curves cross. If we set $\sqrt{x}=\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})$ , massage the resulting expression, and set it to , we get (11). As shown in the subsequent Lemma 52, (11) has a unique positive solution in $(0,1]$ (i.e., the two curves only cross once), $b(x)<a(x)$ after this solution, and $g(\beta)$ is an increasing function above this solution. The situation is shown in Figure 8.

Set $L=d_{l}-1$ and $R=d_{r}-1$ , multiply the equation by $1/L^{2}$ and set $y=(1-x)^{R}$ . This gives the equivalent equation $A(y)=B(y)$ , where $A(y)=(1-y)^{L}/L^{2}$ , and $B(y)=R^{2}(y^{2-\frac{2}{R}}-y^{2-\frac{1}{R}})$ .

The function $A(y)$ is (i) decreasing and convex for $L\geq 2$ , (ii) $A(0)=1/L^{2}>0$ , (iii) $A(1)=0$ . The function $B(y)$ is (i) increasing for $y\in[0,y_{1}=(\frac{2R-2}{2R-1})^{R}]$ , (ii) decreasing for $y\in[y_{1},1]$ , (iii) concave for $y\in[y_{2}=(\frac{(2R-2)(R-2)}{(2R-1)(R-1)})^{R},1]$ , and (iv) $B(0)=B(1)=0$ . Note that $0\leq y_{2}<y_{1}$ since we assumed that $R\geq 2$ .

In the region, $y\in[y_{1},1)$ there can be no further solution since $A(y_{1})<B(y_{1})$ , $A(1)=B(1)=0$ , and $A(y)$ is convex whereas $B(y)$ is concave.

and $p(x)\geq 0$ for $x\in$ . The factor $q(x)$ can be written as $y^{d_{r}-1}(d_{l}d_{r}-d_{l}-d_{r})-y^{d_{r}-2}((d_{l}d_{r}-d_{l}-d_{r}+1)+1)$ , where $y=1-x$ . This polynomial has two sign changes and hence by Descarte’s rule of signs at most two positive roots. It follows that $q(x)$ has at most $2$ roots for $x\leq 1$ . Since $q(0)=0$ and $q(1)=1$ , there must be exactly one root of $q(x)$ in $(0,1]$ and once the function is positive, it stays so within $ $. It therefore suffices to prove that$ q(\hat{x})\geq 0 $. By definition of$ \hat{x} $we have$ (1-\hat{x})^{d_{r}-2}=\frac{1}{(d_{l}-1)(d_{r}-1)\sqrt{\hat{x}}} $. We therefore have$ q(\hat{x})=r(z)\,|\,_{z=\hat{x}}$, where

A quick check shows that $r(z)\geq 0$ for $z\in[\frac{1}{(d_{l}d_{r}-d_{l}-d_{r})^{2}},1]$ . The proof will be complete if we can show that $\hat{x}\in[\frac{1}{(d_{l}d_{r}-d_{l}-d_{r})^{2}},1]$ . We do this in two steps. We claim that $\hat{x}\geq\breve{x}=\frac{c\ln\sqrt{(d_{l}-1)(d_{r}-1)}}{d_{r}-2}$ , where $c=\frac{1}{1+\frac{\ln\sqrt{(d_{l}-1)(d_{r}-1)}}{d_{r}-2}}$ , and that $\breve{x}\in[\frac{1}{(d_{l}d_{r}-d_{l}-d_{r})^{2}},1]$ . The second claim is immediate. To see the first,

This shows that that the maximum of $b(x)$ in $ $is above$ 1 $and so$ \hat{x} $is well defined. Since further,$ b(x) $is a unimodal function and$ \hat{x} $was defined to be the largest value of$ x\in $so that$ b(\hat{x})=1 $it follows that$ \hat{x}\geq\breve{x}$, as claimed. ∎

Proof of Lemma 19: Let $a(x),b(x)$ and $c(x)$ be as defined in Lemma 18. We will provide an upper bound on the unique solution of $a(x)=b(x)$ . Notice that $a(x)$ represents the DE equations for a BEC with parameter $\epsilon=1$ . Therefore, we know that for $x\geq x_{\text{u}}(1)$ , $a(x)\geq x$ . We claim that $b(x)$ and $l(x)=x$ intersect only at one point in $(0,1]$ . Indeed $b(x)=x$ , $x\in(0,1]$ , is equivalent to

Since $b(1)=0$ , whereas $l(1)=1$ , we conclude that for $x\in[\overline{x},1]$ , $b(x)\leq x$ .

We further claim that $\overline{x}\geq x_{\text{u}}(1)$ . Let us assume this for a moment. Then we have $a(x)\geq x\geq b(x)$ for $x\in[\overline{x},1]$ . We conclude that the unique solution of $a(x)=b(x)$ in $(0,1]$ is upper bounded by $\overline{x}$ .

We finish the lemma by proving $\overline{x}\geq x_{\text{u}}(1)$ . Indeed, since $\overline{x}\neq 0$ , all we need to show is that $(1-(1-\overline{x})^{d_{r}-1})^{d_{l}-1}\geq\overline{x}$ Recall that for the BEC(1), the DE equation is given by $x=(1-(1-x)^{d_{r}-1})^{d_{l}-1}$ . Furthermore, there are 3 FPs namely, 0, $x_{\text{u}}(1)$ (unstable) and $1$ (stable). Finally, we have that $(1-(1-x)^{d_{r}-1})^{d_{l}-1}\geq x$ if and only if $x=0$ or $x\in[x_{\text{u}}(1),1]$ . See Chapter 3 in for more details.. For $3=d_{r}=d_{l}$ one can verify the validity of the claim directly. In general, we have

where the last inequality follows since $(\frac{1}{(d_{l}-1)(d_{r}-1)})^{\frac{1}{d_{r}-2}}\stackrel{{\scriptstyle d_{r}\geq 4}}{{\geq}}\frac{1}{d_{r}-1}$ .

The Battacharyya parameter of the channel is thus upper bounded by $\sqrt{\overline{x}/a(\overline{x})}$ . Using the upper bound on the entropy in Lemma 4, we get the claimed bound.

It remains to show that this bound converges to when we fix the rate and let the dds tend to infinity. To simplify our notation, let $L=d_{l}-1$ and $R=d_{r}-1$ . We have

where (a) is obtained by using the following sequence of inequalities,

Appendix F Entropy Product Inequality – Lemma 21

with the kernel as given in the statement. Differentiating, we have

Integrating by parts twice for each dimension, we see that

This proves the alternative representation of this integral.

Note that the bound (i) is implied by $\frac{1}{(1-x^{2}y^{2})^{3}}\leq(1-x^{2})^{-\frac{3}{2}}(1-y^{2})^{-\frac{3}{2}}\,.$ Let $u=(1-x^{2})^{-1}$ and $v=(1-y^{2})^{-1}.$ Then the desired inequality is equivalent to $\frac{1}{(1/u+1/v-1/uv)^{3}}\leq u^{\frac{3}{2}}v^{\frac{3}{2}}\,$ for $u,v\geq 1.$ Raising both sides to the power of $\frac{2}{3}$ this becomes $\frac{1}{(1/u+1/v-1/uv)^{2}}\leq uv\,.$ Multiplying both sides by $\frac{1}{(uv)^{2}}$ this can be written as $uv\leq(v+u-1)^{2}$ which is equivalent to $0\leq(v-1)^{2}+(u-1)^{2}+uv-1\,,$ proving the claim.

where to obtain the second inequality we combine the upper bound on $k_{xxyy}(x,y)$ derived above with the alternative representation of $\operatorname{\mathfrak{B}}(\mathsf{a})$ as given in (ii).

Appendix G Evaluation of GEXIT Integral – Lemma 26

For the proof of Lemma 26 it will be handy to have the following two lemmas available.

Consider a single-parity check code of length $d_{r}$ . Let $X$ denote a codeword, chosen uniformly at random from this code. Let $Y$ denote the result of passing the codeword through a BMS channel with density $\mathsf{x}$ . Then

Let $X_{1},...,X_{d_{r}}$ be uniform random bits and let $Z$ denote their parity. Suppose $X_{i}$ is transmitted through the BMS channel with density $\mathsf{x}$ . Let the received vector be $Y$ .

The entropy of the single parity check code is $\text{H}(X|Z=0,Y).$ By symmetry we have $\text{H}(X|Z=0,Y)=\text{H}(X|Z=1,Y)=\text{H}(X|Z,Y).$ Now $\text{H}(X,Z|Y)=\text{H}(X|Y)+\text{H}(Z|X,Y)=\text{H}(X|Y)=\sum_{i=1}^{d_{r}}\text{H}(\mathsf{x})$ , but we also have $\text{H}(X,Z|Y)=\text{H}(Z|Y)+\text{H}(X|Z,Y)=\text{H}(\mathsf{x}^{\boxast d_{r}})+\text{H}(X|Z,Y)$ . Thus, the entropy of the single parity check code is

Now consider the channel that transmits a bit once through the channel with density $\mathsf{a}$ and again through a channel with density $\mathsf{b}.$ The entropy of the combined channel is $\text{H}(\mathsf{a}\circledast\mathsf{b}).$ This is equivalent to the single parity check code of two bits. Hence

which proves (the Duality Rule of) Lemma 6. ∎

Consider the $(d_{l},d_{r})$ -regular computation tree of height $2$ (see e.g., Figure 9). This tree represents a code of length $1+d_{l}(d_{r}-1)$ containing $2^{1+d_{l}(d_{r}-2)}$ codewords. Let $X$ be chosen uniformly at random from the set of codewords and let $Y$ be the result of sending the components of $X$ through independent BMS channels. The root node goes through the BMS channel $\mathsf{c}$ and all leaf nodes are passed through the BMS channel $\mathsf{x}$ . Then,

Using the chain rule, rewrite $\text{H}(X\,|\,Y)$ as

where $X_{1}$ corresponds to the root variable node and $X_{\sim 1}$ is the set of all the leaf nodes. The first term is computed by density evolution by considering all the independent messages flowing from the leaf nodes into the root node. Indeed, we convolve the channel density $\mathsf{c}$ with the densities coming from the $d_{l}$ check nodes, each of which has density $\mathsf{y}=\mathsf{x}^{\boxast d_{r}-1}$ . Thus we get

Indeed, when we condition on the root node to take either or $1$ , we split the code into $d_{l}$ codes, each of which is a single parity-check code of length $d_{r}-1$ . Using the previous Lemma 53, we obtain the above expressions. Combining the above statements proves the claim. For completeness, although the exact marginal does not factor into the computation, note that there are $2^{1+d_{l}(d_{r}-2)}$ codewords in the code. Out of those, $2^{d_{l}(d_{r}-2)}$ have a in the root node. So the marginal of $X_{1}=0/1$ is one-half. ∎

To evaluate the integral we consider the code corresponding to the $(d_{l},d_{r})$ -regular computation tree of height $2$ as in Lemma 54. Let $X$ be chosen uniformly at random from the set of codewords and assume that the component corresponding to the root node is sent through the channel $\mathsf{c}_{{\tt{h}}}$ , whereas all components corresponding to the leaf nodes are sent through the channel $\mathsf{x}_{{\tt{h}}}$ . Let $Y$ be the received word. Since $\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}}$ is, by assumption, a FP family, the density flowing from any check node into the root node is $\mathsf{y}_{{\tt{h}}}=\mathsf{x}_{{\tt{h}}}^{\boxast d_{r}-1}$ and so the total density seen by the variable node (excluding the observation of the variable node itself) is $\mathsf{y}_{{\tt{h}}}^{\circledast d_{l}}$ . Therefore, the GEXIT integral associated to the root of this tree code is the desired integral. We will evaluate this integral by first determining the sum of all the GEXIT integrals associated to this tree and then by subtracting from it the GEXIT integrals associated to the leaf nodes.

In the sequel we will perform manipulations, such as writing a total derivative as the sum of its partial derivatives or writing a function as the integral of its derivative. In a first pass we will assume that all these operations are well defined. In a second step we will then see how to justify these steps by approximating the desired integrals by a series of simple integrals.

Label the variable nodes of the tree with the set $\{1,\dots,1+d_{l}(d_{r}-1)\}$ so that the root has label $1$ . Note that by assumption $\text{H}(\mathsf{c}_{{\tt{h}}})={\tt{h}}$ , so that the entropy of the first component of $Y$ , call it ${\tt{h}}_{1}$ , is ${\tt{h}}$ . The entropy of the remaining components, call them ${\tt{h}}_{i}$ , $i\in\{2,\dots,1+d_{l}(d_{r}-1)\}$ , are all equal and take on the value $\text{H}(\mathsf{x}_{\tt{h}})$ . So we imagine that all components are parameterized by ${\tt{h}}$ .

The last inequality is obtained by using Lemma 54 for the two endpoints and recalling that we set $\mathsf{x}=\mathsf{x}_{{\tt{h}}^{*}}$ .

Let us consider the leaf node contributions. By symmetry these contributions are all identical. If we focus on a single check node, then again due to symmetry, the GEXIT integrals of all leaf nodes is the same. But the sum of all the GEXIT integrals is equal to the change in entropy of a single-parity check code of length $d_{r}$ . Thus, using Lemma 53, we see that the integral of any single GEXIT integral is equal to

It remains to justify the previous derivation. We proceed as follows. Instead of working with $\{\mathsf{c}_{\tt{h}},\mathsf{x}_{\tt{h}}\}$ , we will work with a simpler family which is piece-wise linear and “close” to the original family. Because it is piece-wise linear, the operations are simple to justify. Because it is “close” to the original family, the result is “close” to what we want to show. By taking a sequence of such families which approximate the original family closer and closer, we obtain the desired result.

where $i=\lfloor\frac{{\tt{h}}}{\Delta{\tt{h}}}\rfloor$ and $\alpha\in$ is a suitable interpolation factor. In the last step we have made use of (v) in Lemma 13, the convexity property of the Wasserstein distance, and the fact that consecutive samples have an entropy difference of (at most) $\Delta{\tt{h}}$ . Further, since they are ordered, i.e., $\mathsf{c}_{i\Delta{\tt{h}}}\prec\mathsf{c}_{\tt{h}}\prec\mathsf{c}_{(i+1)\Delta{\tt{h}}}$ , an entropy difference of at most $\Delta{\tt{h}}$ implies a Wasserstein distance of at most $2\sqrt{\ln(2)\Delta{\tt{h}}}$ (cf. (ii) of Lemma 14).

To each $\mathsf{c}_{{\tt{h}}=i\Delta{\tt{h}}}$ corresponds a FP $\mathsf{x}_{\tt{h}}$ , call it $\mathsf{x}_{i\Delta{\tt{h}}}$ . Take the collection $\{\mathsf{x}_{i\Delta{\tt{h}}}\}$ . Since this collection is ordered we can construct from it an ordered and piece-wise smooth family via a linear interpolation of consecutive samples in the same manner as we have done this for the channel family. We have

The last inequality above follows from considering the same steps as before, since the densities are ordered and each of them are FPs at channels with entropy difference at most $\Delta{\tt{h}}$ . Recall that $\{\mathsf{c}_{\tt{h}},\mathsf{x}_{\tt{h}}\}$ is a FP family, hence we can write

So the root integral is in fact well defined. The same argument can be repeated for the leaf integrals to show that they are also well defined.

If we consider one segment and add all the contributions (which as we saw can be written down explicitly) we can verify that the sum of all the GEXIT integrals is indeed equal to the difference of the entropy of the tree. This calculation is in principle straightforward but somewhat tedious, so we skip the details.

Note that for any ${\tt{h}}\in[{\tt{h}}^{*},1]$ we have

The only thing which remains to be done is to prove that the GEXIT integral of the root node when we use the linearized family converges to the true GEXIT integral when we let $\Delta{\tt{h}}$ tend to . We will do this in several steps by considering the chain of integrals

$G(d_{l},d_{r},\{\mathsf{c}_{{\tt{h}}},\mathsf{x}_{{\tt{h}}}\}_{{\tt{h}}^{*}}^{1})$ ,

$G(d_{l},d_{r},\{\mathsf{c}_{{\tt{h}}},\hat{\mathsf{x}}_{{\tt{h}}}\}_{{\tt{h}}^{*}}^{1})$ ,

and by showing that the value of consecutive such integrals is arbitrarily close. Here, $\{\hat{\mathsf{x}}_{{\tt{h}}}\}$ is a family which is piece-wise constant on each segment, taking on the value of its left boundary.

Appendix H Negativity – Lemma 27

We prove Lemma 27 by showing the following slightly stronger statement.

Let $\mathsf{x}$ be an $L$ -density and consider a degree-distribution $(d_{l},d_{r})$ such that $d_{r}\geq 1+5(\frac{1}{1-r})^{\frac{4}{3}}$ . Define $I_{1}=[(\frac{3}{4})^{\frac{d_{l}-1}{2}}+\frac{1}{2(d_{r}-1)^{3}},\frac{1}{2e}\frac{d_{l}}{d_{r}}]$ , and $I_{2}=[\frac{1}{2e}\frac{d_{l}}{d_{r}},\frac{d_{l}}{d_{r}}-d_{l}e^{-4(d_{r}-1)(\frac{2d_{l}}{11ed_{r}})^{\frac{4}{3}}}-\kappa]$ , where $\kappa>0$ .

Assume that $\mathsf{x}$ is a $\delta$ -approximate FP, i.e., $d(\mathsf{x},\mathsf{c}\circledast(\mathsf{x}^{\boxast d_{r}-1})^{\circledast d_{l}-1})\leq\delta$ , for some channel $\mathsf{c}$ and $\delta\leq(\frac{\ln(2)d_{l}}{16\sqrt{2}d_{r}})^{2}$ . Then if $\text{H}(\mathsf{x})\in I_{1}$ , $A\leq-\frac{1}{16e}\frac{d_{l}}{d_{r}}$ .

For $\text{H}(\mathsf{x})\in I_{2}$ , $A\leq-\kappa$ .

Set $\mathsf{y}=\mathsf{x}^{\boxast d_{r}-1}$ . Let us first characterize the area $A$ in a more convenient form. We have

For the $L$ -distributions $\mathsf{x}$ and $\mathsf{y}$ let $|\mathfrak{{x}}|$ and $|\mathfrak{{y}}|$ be the associated $|D|$ distributions. Following the lead of L. Boczkowski we write

In step (a) we have used the expansion of Lemma 49, where $\alpha_{n}=\frac{1}{2\ln(2)n(2n-1)}$ , $n\geq 0$ . Note that $\alpha_{n}\geq 0$ and that $\sum_{n\geq 1}\alpha_{n}=1$ . Most importantly, as mentioned in the proof of Lemma 50, the moments $m_{\mathsf{x},n}$ are multiplicative under $\boxast$ . This implies that for $d\geq 1$ , $\text{H}(\mathsf{x}^{\boxast d})=1-\sum_{n\geq 1}\alpha_{n}m_{\mathsf{x},n}^{d}$ . E.g., for two distributions $\mathsf{x}$ and $\mathsf{y}$ we have

where in the first equality we use that in the $|D|$ -domain the check node operation is simply a multiplication.

Assume at first that $\text{H}(\mathsf{x})\in[(\frac{3}{4})^{\frac{d_{l}-1}{2}},\frac{1}{2e}\frac{d_{l}}{d_{r}}+\frac{1}{2(d_{r}-1)^{3}}]$ and that $\mathsf{x}=\mathsf{c}\circledast\mathsf{y}^{\circledast d_{l}-1}$ for some channel $\mathsf{c}$ . Define $\psi(x)=(1-x)x^{d_{r}-1}$ . Then

In (a) we used the bound $\psi(x)\leq\frac{(1-\frac{1}{d_{r}})^{d_{r}}}{d_{r}-1}$ so that $\sum_{n}\alpha_{n}\psi(m_{\mathsf{x},n})\leq\frac{(1-\frac{1}{d_{r}})^{d_{r}}}{d_{r}-1}$ . Consider step (b). Set $\text{H}(\mathsf{y})=h_{2}(p)\stackrel{{\scriptstyle\text{Lemma}~{}\ref{lem:boundsonent}}}{{\geq}}4p\bar{p}$ . Then

In step (c) we substituted the upper and lower bounds on $\text{H}(\mathsf{x})$ for the first and second expression respectively. Also, in the last inequality, we have $\frac{1}{2(d_{r}-1)^{3}}\leq\frac{d_{l}}{d_{r}}(\frac{3}{4}-\frac{13}{8e})$ since we assumed that $d_{r}\geq 1+5(d_{r}/d_{l})^{\frac{4}{3}}\geq 1+(2\frac{d_{l}}{d_{r}}(\frac{3}{4}-\frac{13}{8e}))^{-\frac{4}{3}}$ .

For $\text{H}(\mathsf{x})\in[\frac{1}{2e}\frac{d_{l}}{d_{r}},\frac{d_{l}}{d_{r}}-d_{l}e^{-4(d_{r}-1)(\frac{d_{l}}{16ed_{r}})^{2}}-\kappa]$ ,

In (a) we upper bound $\psi(x)=(1-x)x^{d_{r}-1}$ by $x^{d_{r}-1}$ , $x\in$ , and note that $m_{\mathsf{x},n}\in$ . In (b) we use $m_{\mathsf{x},n}\leq m_{\mathsf{x},1}$ (this is true since $x^{2n}$ is decreasing for each fixed $x\in$ as a function of $n$ ) and that $x^{d_{r}-1}$ is increasing. Step (c) is a consequence of the bound $m_{\mathsf{x},1}\leq(1-2h_{2}^{-1}(\text{H}(\mathsf{x})))^{2}$ . Let us prove this inequality. Equivalently, we want to show $\text{H}(\mathsf{x})\leq h_{2}\Big{(}\frac{1-\sqrt{m_{\mathsf{x},1}}}{2}\Big{)}$ . By Jensen

The claim is proven by noticing that the lhs above is equal to $\text{H}(\mathsf{x})$ .

Step (d) uses the following lower bound on $\text{H}(\mathsf{y})=\text{H}(\mathsf{x}^{\boxast d_{r}-1})$ . Set $\text{H}(\mathsf{x})=h_{2}(p)$ . From extremes of information combining we know that we get the lowest entropy if we assume that $\mathsf{x}$ is a BSC density. Therefore,

Consider finally step (e). We know that $h_{2}(p)\in I_{2}$ . Combined with (26) and $(1\!-\!2p)^{2(d_{r}-1)}\leq e^{-4(d_{r}-1)p}$ we conclude that $p\geq(\frac{2}{11e}\frac{d_{l}}{d_{r}})^{\frac{4}{3}}$ . ∎

Appendix I Spacing of FPs –Lemma 57 and Transition Length of FPs – Lemma 61

If we are given a proper one-side FP (with any boundary condition) then consecutive elements of the FP cannot be too different from each other. This is made precise in the following lemma.

Let $(\mathsf{c},\mathsf{\underline{x}})$ be a proper one-sided FP on $[-N,0]$ , $N\geq 0$ with any boundary condition.

Let $\mathsf{\bar{x}}_{i}$ denote the weighted average $\mathsf{\bar{x}}_{i}=\frac{1}{w^{2}}\sum_{j,k=0}^{w-1}\mathsf{x}_{i+j-k}$ . Then, for any $i\in[-\infty,\infty]$ ,

Discussion: Each of these two claims states that consecutive distributions are “close” either wrt the Wasserstein distance or the Battacharyya parameter. Further, the difference is either for the distributions themselves or their averages.

To simplify notation, for $i\in[-N+1,0]$ fixed, let $\mathsf{f}_{j}=\bigl{(}\frac{1}{w}\sum_{k=0}^{w-1}\mathsf{x}_{i+j-k-1}\bigr{)}^{\boxast d_{r}-1}$ . Writing the DE equations explicitly,

Note that the expressions for $\mathsf{x}_{i}$ and $\mathsf{x}_{i-1}$ are similar. The only difference is that $\mathsf{x}_{i}$ contains $\mathsf{f}_{w}$ whereas $\mathsf{x}_{i-1}$ contains $\mathsf{f}_{0}$ . Rewrite both expressions in the form

where $\mathsf{a}_{i}=\mathsf{b}_{i}=\mathsf{f}_{i-1}$ , $i=2,\dots,w$ , $\mathsf{a}_{1}=\mathsf{f}_{w}$ , and $\mathsf{b}_{1}=\mathsf{f}_{0}$ . Now expand $\mathsf{x}_{i}$ as well as $\mathsf{x}_{i-1}$ in the form

where $\mathsf{c}_{d_{2},\dots,d_{w}}=\mathsf{a}_{2}^{\circledast d_{2}}\circledast\dots\circledast\mathsf{a}_{w}^{\circledast d_{w}}\circledast\mathsf{c}$ . Note that the terms in the expansions of $\mathsf{x}_{i}$ and $\mathsf{x}_{i-1}$ with $d_{1}=0$ are identical. Therefore, if we consider $\operatorname{\mathfrak{B}}(\mathsf{x}_{i})-\operatorname{\mathfrak{B}}(\mathsf{x}_{i-1})$ , these terms cancel. We can upper bound the difference by the Battacharyya constant of all those terms of the expansion of $\mathsf{x}_{i}$ which correspond to $d_{1}\geq 1$ , i.e.,

If we are interested in the Wasserstein distance instead, we can proceed in an almost identical fashion. The only difference is that in the last sequence of inequalities we use the convexity property (v) and the boundedness property (ii) of (the Wasserstein metric) Lemma 13.

Using the convexity property (v) of (the Wasserstein metric) Lemma 13 and canceling common terms, we get

The proof for the Battacharyya parameter proceeds in an identical fashion and uses the linearity of the Battacharyya parameter.

Let $(\mathsf{c},\mathsf{\underline{x}})$ be a proper one-sided FP on $[-N,0]$ , $N\geq 0$ with any boundary condition. Let $\operatorname{\mathfrak{B}}_{i}=\operatorname{\mathfrak{B}}(\mathsf{x}_{i})$ denote the Battacharyya parameter of the density of the $i$ -th section. Then for all $i\in[-N,0]$ ,

Since the Battacharyya parameter is multiplicative in $\circledast$ and linear,

Further, recall from Lemma 5, property (iv), and the ensuing discussion, that $\operatorname{\mathfrak{B}}(\mathsf{a}^{\boxast d_{r}-1})\leq 1-(1-\operatorname{\mathfrak{B}}(\mathsf{a}))^{d_{r}-1}$ , so that

Let $f(x)=(1-x)^{d_{r}-1}$ , $x\in$ . Since $f^{\prime\prime}(x)=(d_{r}-1)(d_{r}-2)(1-x)^{d_{r}-3}\geq 0$ , $f(x)$ is convex. Let $y_{j}=\frac{1}{w}\sum_{k=0}^{w-1}\operatorname{\mathfrak{B}}_{i+j-k}$ . Then by Jensen,

Consider the $(d_{l},d_{r})$ -regular ensemble with $d_{l}\geq 3$ and let $\epsilon\in(\epsilon^{\text{\tiny BP}},1]$ , where $\epsilon^{\text{\tiny BP}}(d_{l},d_{r})$ is the BP threshold the regular ensemble when transmitting over the BEC. Define $h(x)=\epsilon(1-(1-x)^{d_{r}-1})^{d_{l}-1}-x$ .

For $\epsilon>\epsilon^{\text{\tiny BP}}$ , $h(x)=0$ has exactly three solutions, one of them being 0 and the other two denoted by $x_{\text{u}}(\epsilon)$ and $x_{\text{s}}(\epsilon)$ with $0<x_{\text{u}}(\epsilon)<x_{\text{s}}(\epsilon)$ . Further, $h(x)\leq 0$ for all $x\in[0,x_{\text{u}}(\epsilon)]$ and $h(x)\geq 0$ for all $x\in[x_{\text{u}}(\epsilon),x_{\text{s}}(\epsilon)]$ .

$h^{\prime}(x_{\text{u}}(\epsilon))>0$ and $h^{\prime}(x_{\text{s}}(\epsilon))<0$ ; $|h^{\prime}(x)|\leq d_{l}d_{r}$ for $x\in$ .

There exists a unique value $0\leq x_{*}(\epsilon)\leq x_{\text{u}}(\epsilon)$ so that $h^{\prime}(x_{*}(\epsilon))=0$ , and there exists a unique value $x_{\text{u}}(\epsilon)\leq x^{*}(\epsilon)\leq x_{\text{s}}(\epsilon)$ so that $h^{\prime}(x^{*}(\epsilon))=0$ . Further, $h(x)$ is decreasing in $[0,x_{*}(\epsilon)]$ .

Let $\kappa_{*}(\epsilon)=\min\{-h^{\prime}(0),\frac{-h(x_{*}(\epsilon))}{x_{*}(\epsilon)}\}$ . The quantity $\kappa_{*}(\epsilon)$ is non-negative and depends only on the channel parameter $\epsilon$ and the degrees $(d_{l},d_{r})$ .

For $0\leq\epsilon\leq 1$ , $x_{*}(\epsilon)>\frac{1}{d_{l}^{2}d_{r}^{2}}$ .

For $0\leq\epsilon\leq 1$ , $\kappa_{*}(\epsilon)\geq\frac{1}{8d_{r}^{2}}$ .

Let $\kappa_{*}$ and $x_{*}$ denote the universal lower bounds, given in the previous part, on $\kappa_{*}(\epsilon)$ and $x_{*}(\epsilon)$ , respectively. If we draw a line from with slope $-\kappa_{*}$ , then $h(x)$ lies below this line for $x\in[0,x_{*}]$ .

For $\epsilon\in(\epsilon^{\text{\tiny BP}},1]$ we have

The function $h(x)$ is the DE equation for the $(d_{l},d_{r})$ -regular ensemble when transmitting over the BEC. The two non-zero solutions, $x_{\text{u}}(\epsilon)$ and $x_{\text{s}}(\epsilon)$ represent the unstable and the stable FPs of DE . In the following, we will be using extremes of information combining techniques to relate the Battacharyya parameters via $h(x)$ .

In Figure 6 we see that within a few sections the constellation changes from reliable sections (towards the boundary) to sections which all have more or less the same reliability. In other words, this transition happens quickly. This is made precise in the following lemma.

Let $\epsilon^{\text{\tiny BP}}$ be the BP threshold for transmission over the BEC using the $(d_{l},d_{r})$ -regular (uncoupled) ensemble. For $\epsilon\in(\epsilon^{\text{\tiny BP}},1]$ , let $x_{\text{u}}(\epsilon)$ be the smaller of the two strictly positive roots of the equation $h(x)=0$ , where $h(x)=\epsilon(1-(1-x)^{d_{r}-1})^{d_{l}-1}-x$ . For $0\leq\epsilon\leq\epsilon^{\text{\tiny BP}}$ , define $x_{\text{u}}(\epsilon)=\lim_{\delta\downarrow\epsilon^{\text{\tiny BP}}}x_{\text{u}}(\delta)$ .

Consider transmission over a BMS channel $\mathsf{c}$ . Let $w$ be admissible in the sense of property (iv) of Definition 40. Let $(\mathsf{c},\mathsf{\underline{x}})$ be a proper one-sided FP on $[-N,0]$ with any boundary condition. Let $\operatorname{\mathfrak{B}}_{i}=\operatorname{\mathfrak{B}}(\mathsf{x}_{i})$ denote the Battacharyya parameter of the density associated to the $i$ -th section and define $\epsilon=\operatorname{\mathfrak{B}}(\mathsf{c})$ .

Then, there exists a positive constant $c(d_{l},d_{r})$ which depends on $d_{l}$ and $d_{r}$ , but not on $N$ or the channel $\mathsf{c}$ , so that for any $\delta>0$

Throughout the proof we set $\epsilon=\operatorname{\mathfrak{B}}(\mathsf{c})$ and we write $\operatorname{\mathfrak{B}}_{i}$ for $\operatorname{\mathfrak{B}}(\mathsf{x}_{i})$ .

Note first that we have to prove the statement only for $\epsilon\in(\epsilon^{\text{\tiny BP}},1]$ . This is true since we have defined $x_{\text{u}}(\epsilon)$ to coincide with $x_{\text{u}}(\epsilon^{\text{\tiny BP}})$ for $\epsilon\in[0,\epsilon^{\text{\tiny BP}}]$ and since further the function $h$ , which we use to bound the process, is strictly decreasing as a function of $\epsilon$ . Hence, in the sequel our language will reflect the fact that we have $\epsilon\in(\epsilon^{\text{\tiny BP}},1]$ .

(i) The number of sections such that $\operatorname{\mathfrak{B}}_{i}\in[\delta,x_{*}(\epsilon)]$ is at most $w(\frac{1}{\kappa_{*}\delta}+1)$ . If $\delta>x_{*}(\epsilon)$ then the number of sections in this part is 0. Hence wlog assume $\delta<x_{*}(\epsilon)$ . Let $i$ be the smallest index so that $\operatorname{\mathfrak{B}}_{i}\geq\delta$ . If $\operatorname{\mathfrak{B}}_{i+(w-1)}\geq x_{*}(\epsilon)$ then the claim is trivially fulfilled. Assume therefore that $\operatorname{\mathfrak{B}}_{i+(w-1)}\leq x_{*}(\epsilon)$ . From the monotonicity of $g(\cdot)$ and the fact that $\mathsf{\underline{x}}$ is increasing,

This is equivalent to $\operatorname{\mathfrak{B}}_{i+(w-1)}\geq\operatorname{\mathfrak{B}}_{i}+\kappa_{*}(\epsilon)\delta$ . More generally, using the same line of reasoning, $\operatorname{\mathfrak{B}}_{i+l(w-1)}\geq\operatorname{\mathfrak{B}}_{i}+l\kappa_{*}(\epsilon)\delta$ , as long as $\operatorname{\mathfrak{B}}_{i+l(w-1)}\leq x_{*}(\epsilon)$ .

We summarize, the total distance we have to cover is $x_{*}-\delta$ and every $(w-1)$ sections we cover a distance of at least $\kappa_{*}(\epsilon)\delta$ as long as we have not surpassed $x_{*}(\epsilon)$ . Therefore, after $(w-1)\lfloor\frac{x_{*}(\epsilon)-\delta}{\kappa_{*}(\epsilon)\delta}\rfloor$ sections we have either passed $x_{*}$ or we must be strictly closer to $x_{*}$ than $\kappa_{*}(\epsilon)\delta$ . Hence, to cover the remaining distance we need at most $(w-2)$ extra sections. The total number of sections needed is therefore upper bounded by $w-2+(w-1)\lfloor\frac{x_{*}(\epsilon)-\delta}{\kappa_{*}(\epsilon)\delta}\rfloor$ , which, in turn, is upper bounded by $w(\frac{x_{*}(\epsilon)}{\kappa_{*}(\epsilon)\delta}+1)$ . The final claim follows by bounding $x_{*}(\epsilon)$ with $1$ and $\kappa_{*}(\epsilon)$ by $\kappa_{*}$ .

(ii) The number of sections such that $\operatorname{\mathfrak{B}}_{i}\in[x_{*}(\epsilon),x_{\text{u}}(\epsilon)]$ is at most $2w(\frac{4}{3\kappa_{*}(x_{*})^{2}}+1)$ Let us define $\overline{\mathfrak{B}}_{i}=\frac{1}{w^{2}}\sum_{j,k=0}^{w-1}\operatorname{\mathfrak{B}}_{i+j-k}.$ From Lemma 58, $\operatorname{\mathfrak{B}}_{i}\leq\epsilon g(\overline{\mathfrak{B}}_{i},\overline{\mathfrak{B}}_{i},\dots,\overline{\mathfrak{B}}_{i})=\overline{\mathfrak{B}}_{i}+h(\overline{\mathfrak{B}}_{i})$ . Summing this inequality over all sections from $-\infty$ to $k\leq 0$ we get,

Writing $\sum_{i=-\infty}^{k}\overline{\mathfrak{B}}_{i}$ in terms of the $\operatorname{\mathfrak{B}}_{i}$ s and rearranging terms,

Without loss of generality we can assume that there exists a section $k$ so that $x_{*}(\epsilon)\leq\operatorname{\mathfrak{B}}_{k-(w-1)}$ (we know from point (i) that we must reach this point unless the constellation is too short, in which case the statement is trivially fulfilled). Consider sections $\operatorname{\mathfrak{B}}_{k-(w-1)},\dots,\operatorname{\mathfrak{B}}_{k+(w-1)}$ , so that in addition $\operatorname{\mathfrak{B}}_{k+(w-1)}\leq x_{\text{u}}(\epsilon)$ . If no such $k$ exists then there are at most $2w-1$ points in the interval $[x_{*}(\epsilon),x_{\text{u}}(\epsilon)]$ , and the statement is correct a fortiori.

Our plan is to use (41) to lower bound $\operatorname{\mathfrak{B}}_{k+(w-1)}-\operatorname{\mathfrak{B}}_{k-(w-1)}$ . This means, we need a lower bound for $-\frac{6}{w}\sum_{i=-\infty}^{k}h(\overline{\mathfrak{B}}_{i})$ . Since by assumption $\operatorname{\mathfrak{B}}_{k+(w-1)}\leq x_{\text{u}}(\epsilon)$ , it follows that $\overline{\mathfrak{B}}_{k}\leq x_{\text{u}}(\epsilon)$ , so that every contribution in the sum $-\frac{6}{w}\sum_{i=-\infty}^{k}h(\overline{\mathfrak{B}}_{i})$ is positive (cf. Lemma 59 (i)). Further, by (the Spacing) Lemma 57, $w(\overline{\mathfrak{B}}_{i}-\overline{\mathfrak{B}}_{i-1})\leq 1$ . Hence,

Let us explain how we obtain the last inequality. First we claim that there must exist a section $i$ with $\overline{\mathfrak{B}}_{i}$ between $x_{*}(\epsilon)/2$ and $x_{*}(\epsilon)$ . Indeed, suppose on the contrary that this was not true. Let $k^{*}\leq k$ be the smallest section number such that $\overline{\mathfrak{B}}_{k^{*}}\geq x_{*}(\epsilon)$ . Clearly, such a $k^{*}$ exists. Indeed, since $x_{*}(\epsilon)\leq\operatorname{\mathfrak{B}}_{k-(w-1)}$ , it follows that $\overline{\mathfrak{B}}_{k}\geq x_{*}(\epsilon)$ . Since $\overline{\mathfrak{B}}_{-\infty}=0$ , we must have $\overline{\mathfrak{B}}_{k^{*}-1}\leq x_{*}(\epsilon)/2$ . This implies that $\overline{\mathfrak{B}}_{k^{*}}-\overline{\mathfrak{B}}_{k^{*}-1}>x_{*}(\epsilon)/2$ . Using (the Spacing) Lemma 57 we conclude that $\frac{d_{l}-1}{w}\geq x_{*}(\epsilon)/2$ . Hence $w\leq 2d_{l}/x_{*}(\epsilon)$ . Using the universal lower bound on $x_{*}(\epsilon)$ , we get $w\leq 2d_{l}^{3}d_{r}^{2}$ , a contradiction to the hypothesis of the lemma. Finally, according to Lemma 59 part (iv), $-h(x)\geq\kappa_{*}(\epsilon)x$ for $x\in[0,x_{*}(\epsilon)]$ , which implies the inequality. Combined with (41) this implies that

We summarize, the total distance we have to cover is $x_{\text{u}}(\epsilon)-x_{*}(\epsilon)$ and every $2(w-1)$ steps we cover a distance of at least $\frac{3\kappa_{*}(\epsilon)(x_{*}(\epsilon))^{2}}{4}$ as long as we have not surpassed $x_{\text{u}}(\epsilon)$ . Allowing for $2(w-1)-1$ extra steps to cover the last part, bounding again $w-1$ by $w$ , bounding $x_{\text{u}}(\epsilon)-x_{*}(\epsilon)$ by $1$ and replacing $\kappa_{*}(\epsilon)$ and $x_{*}(\epsilon)$ by their universal lower bounds, proves the claim. ∎

Appendix J Saturation – Theorem 47

Before we proceed to prove the Saturation theorem, we introduce a key technical element required in the proof, a family of spatial (approximate) FPs. This is the content of Definition 62 and Theorem 63. Then, Theorem 64 shows that the GEXIT integral of this family depends only on its end-points. Combined with the Negativity lemma 27 this imposes a strong constraint on the channel value of the spatial FPs, culminating in the proof of the Saturation theorem.

Let $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ , $\mathsf{c}^{*}\in\{\mathsf{c}_{\tt{h}}\}$ , denote an increasing one-sided constellation on $[-N,0]$ for the parameters $(d_{l},d_{r},w)$ . Let ${\tt{h}}^{*}=\text{H}(\mathsf{c}^{*})>0$ and let $0\leq L\leq N$ .

The family (of constellations) for the $(d_{l},d_{r},L,w)$ -ensemble, based on $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ , is denoted by $\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\sigma=0}^{{\tt{h}}^{*}}$ .

Each element $\mathsf{\underline{x}}_{\sigma}$ is symmetric with respect to the spatial index and the components are indexed by $[-L,L]$ . Hence it suffices to define the constellations in the range $[-L,0]$ and then we set $\mathsf{x}_{\sigma,i}=\mathsf{x}_{\sigma,-i}$ for $i\in[0,L]$ . As usual, we set $\mathsf{x}_{\sigma,i}=\Delta_{+\infty}$ for $i\notin[-L,L]$ . For $i\in[-L,0]$ and $\sigma\in[0,{\tt{h}}^{*})$ define

where for $\sigma\in(\frac{{\tt{h}}^{*}}{2},{\tt{h}}^{*})$ ,

Finally, $\mathsf{c}_{\sigma}=\mathsf{c}_{{\tt{h}}={\tt{h}}^{*}}=\mathsf{c}^{*}$ . ∎

Notice that in the above definition when $\sigma$ approaches ${\tt{h}}^{*}$ , then $\mathsf{x}_{\sigma,i}=\mathsf{x}^{*}_{i}$ .

In the definition above, we keep the channel constant across the sections and over $\sigma$ . In other words, the channel remains constant for all the constellations in the family.

We denote the two partitions in the interpolation as phases, e.g., $({\tt{h}}^{*}/2,{\tt{h}}^{*})$ corresponds to phase I and $[0,\frac{{\tt{h}}^{*}}{2}]$ corresponds to phase II.

The above interpolation might look complicated. But there is a straightforward interpretation. Think of one-sided constellations. We are interested in a constellation of size $L$ .

In phase I, the basic idea is to “move” the constellation $\mathsf{\underline{x}}^{*}$ to the right and at each point in time to “chop off” the overhanging parts both on the left and on the right. We do this until the left most section of $\mathsf{\underline{x}}^{*}$ is at position $-L$ . If $\mathsf{\underline{x}}^{*}$ were a continuous function, i.e., suppose we had a continuum of sections, then this would be all we need to do. But $\mathsf{\underline{x}}^{*}$ is discrete, so in order to get a continuous interpolation we interpolate between two consecutive elements of $\mathsf{\underline{x}}^{*}$ . This mimics the “wave effect” we mentioned in the beginning.

In phase II, the residual constellation is uniformly brought down to $\Delta_{+\infty}$ in each section.

In the next lemma we show that if we have an interpolated family constructed via the above definition, then the resulting family is a family of approximate FPs.

Let $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ , $\mathsf{c}^{*}\in\{\mathsf{c}_{\tt{h}}\}$ , denote an increasing one-sided constellation on $[-N,0]$ with free or fixed boundary condition for the parameters $(d_{l},d_{r},w)$ and let $w\leq L<N$ . Assume that $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ fulfills the following conditions, for some $0<\delta\leq\frac{1}{w}$ .

Constellation is close to $\Delta_{+\infty}$ “on the left”:

Also, $d(\mathsf{x}^{*}_{-L-w+1},\mathsf{x})\leq\delta.$

Constellation is approximate FP: For $i\in[-N,0]$ ,

Let $\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\sigma=0}^{{\tt{h}}^{*}}$ denote the family as described in Definition 62. Then this family is an approximate FP family. More precisely, for $\underline{\sigma}=0$ and $\overline{\sigma}={\tt{h}}^{*}$

$\mathsf{x}_{\sigma,i}=\Delta_{+\infty}$ for $i\notin[-L,L]$ and for all $\sigma$ and

for any $\sigma\in[\underline{\sigma},\overline{\sigma})$ and any $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$

Discussion: For the boundary $[-L,-L+w-2]\cup[L-w+2,L]$ and in the middle $[-w+2,w-2]$ the interpolation does not in general result in an approximate FP. Fortunately this does not cause problems. We will see in Theorem 64 that each section gives only a small contribution to the GEXIT integral. If we choose $L$ sufficiently large then we can safely ignore a fixed number of sections.

That $\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ and $\{\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ are ordered by degradation, increasing, and piece-wise linear follows by construction.

In the same way, that $\mathsf{x}_{\sigma,i}=\Delta_{+\infty}$ for $i\notin[-L,L]$ and for all $\sigma$ also follows by construction.

It remains to check that the family so defined constitutes an approximate FP family. Since the family, by definition, is symmetric around the section , we check only for the sections belonging in $[-L+w-1,-w+1]$ .

Phase I: Think of $i$ and $\sigma$ as fixed, $i\in[-L+w-1,-w+1]$ . Define $c=c(\sigma)$ and $j=i-\lceil(2-\frac{2}{{\tt{h}}^{*}}\sigma)(N-L)\rceil$ . Set $\mathsf{z}^{*}_{j}=c\mathsf{x}^{*}_{j}+\bar{c}\mathsf{x}^{*}_{j+1}$ . With these conventions, we want to bound

Using the convexity property (v) of (the Wasserstein metric) Lemma 13, it is sufficient to bound

separately. The two bounds are identical and their derivation is also essentially identical. Let us therefore concentrate on the first expression. Using first the triangle inequality and then the regularity properties (vi) and (vii) as well as the convexity property (v), we upper bound the first expression by

where to obtain the first inequality we use the approximate nature of $\mathsf{\underline{x}}^{*}$ and in the last step we have used property (ii) of Lemma 13.

Phase II: In this regime we interpolate the “tail” of the original constellation uniformly to $\Delta_{+\infty}$ . From the assumption of the lemma we have $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{-N+L})\leq\delta$ . Since $\mathsf{\underline{x}}^{*}$ is increasing we must have $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{i-N+L})\leq\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{-N+L})$ for $i\in[-L,0]$ . Lemma 13, property (iii), then implies that $d(\mathsf{x}^{*}_{i-N+L},\Delta_{+\infty})\leq\delta$ for all $i\in[-L,0]$ .

Again, think of $i$ and $\sigma$ as fixed, $i\in[-L+w-1,-w+1]$ . Set $c=2\sigma/{\tt{h}}^{*}$ and $j=i-N+L$ . Then

where to obtain the penultimate inequality we use Lemma 33 to bound the distance of $\mathsf{c}_{{\tt{h}}^{*}}\!\circledast\!g(c\mathsf{x}^{*}_{j\!-\!w\!+\!1}+\bar{c}\Delta_{+\infty},\dots,c\mathsf{x}^{*}_{j\!+\!w\!-\!1}+\bar{c}\Delta_{+\infty}\!)$ to $\Delta_{+\infty}(=\mathsf{c}_{{\tt{h}}^{*}}\!\circledast\!g(\Delta_{+\infty},\dots,\Delta_{+\infty})$ , since $\Delta_{+\infty}$ is always an FP of DE) and the second expression is the distance of $c\mathsf{x}^{*}_{j}+\bar{c}\Delta_{+\infty}$ to $\Delta_{+\infty}$ , which is bounded using the previous arguments.

Next, we show that if we have an approximate family of FPs, then the area under the GEXIT integral associated to the family depends only on the “end points” of the interpolated family.

Let $\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ denote an approximate FP family for the $(d_{l},d_{r},L,w)$ ensemble. More precisely,

$\{\mathsf{c}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ and $\{\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}}$ are ordered by degradation, increasing, and piece-wise linearIn fact, we will apply this theorem to the family given in Definition 62. More generally, however, given a set of distinct ordered densities $\mathsf{a}_{1}\prec\mathsf{a}_{2}\prec\dots\prec\mathsf{a}_{n}$ , we get a piece-wise linear family by linearly interpolating always between consecutive densities.,

$\mathsf{x}_{\sigma,i}=\Delta_{+\infty}$ for $i\notin[-L,L]$ and for all $\sigma$ ,

$\mathsf{x}_{\underline{\sigma},i}=\mathsf{x}_{\underline{\sigma}}$ for $i\in[-L,L]$ ,

$\mathsf{x}_{\overline{\sigma},i}=\mathsf{x}_{\overline{\sigma}}$ for $i\in[-L,L]$ , and

for all $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ and $\sigma\in[\underline{\sigma},\overline{\sigma}]$

where $G(\{\mathsf{c}_{\sigma},\hat{g}(\mathsf{x}_{\sigma,i\!-\!w\!+\!1},\!\dots\!,\mathsf{x}_{\sigma,i\!+\!w\!-\!1})\}_{\underline{\sigma}}^{\overline{\sigma}})$ is the GEXIT integral introduced in Definition 23. Let

Then $A(\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})$ is well defined and

Discussion: In words, the theorem says that for any family of spatial FPs which start and end at a constant (over all sections) FP, the GEXIT integral is given by the end-points and is close to the difference of the $A$ expression introduced in Lemma 26. In fact, from the Lemma 26 we see that, graphically, this is equal to the area under the BP GEXIT curve of the underlying ensemble between the two end-points.

Let us consider the circular ensemble which is associated to $(d_{l},d_{r},L,w)$ (see Definition 31). As defined in the statement of the lemma, for $i\in[-L,L]$ , the channel “seen” at position $i$ is $\mathsf{c}_{\sigma,i}=\mathsf{c}_{\sigma}$ . For the remaining sections $i\in[L+1,L+w-1]$ we impose the “natural” condition $\mathsf{c}_{\sigma,i}=\Delta_{+\infty}$ . As a consequence, for these positions $\mathsf{x}_{\sigma,i}=\Delta_{+\infty}$ .

Since $\{\mathsf{c}_{\sigma}\}$ as well as $\{\mathsf{\underline{x}}_{\sigma}\}$ are piece-wise linear, all GEXIT integrals are well defined (see the proof of Lemma 26). Consequently, $A(\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})$ is well-defined.

Instead of determining $A(\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})$ , directly, let us determine the equivalent quantity associated to the circular ensemble, i.e., we include the $w-1$ extra positions $[L+1,L+w-1]$ . Since for all “extra” positions the associated channel is constant, and so the additional integrals are zero, the numerical value of these two unnormalized GEXIT integrals is in fact identical.

We will now derive upper and lower bounds for the GEXIT integrals for the given approximate FP family. Recall: for $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ we have a $\delta$ -approximate (in the Wasserstein metric) FP family. For $i\in[-L,-L+w-2]\cup[-w+2,w-2]\cup[L-w+2,L]$ all we know is that the channel is a monotone function of $\sigma$ . Finally, for $i\in[L+1,L+w-1]$ the channel is frozen to “perfect.”

Boundary: For $i\in[-L,-L+w-2]\cup[-w+2,w-2]\cup[L-w+2,L]$ the GEXIT integral is non-negative. Thus, in this regime, we get a lower bound by setting each GEXIT integral to 0 (cf. Lemma 16).

Interior: Consider the GEXIT integrals for $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ .

Technique: Rather than evaluating these integrals directly we use the technique introduced in , i.e., we consider the computation tree of height $2$ rooted in node $i$ as shown in Figure 9 for the specific case $(d_{l}=2,d_{r}=4)$ .

More precisely, there are $d_{l}$ check nodes connected to this root variable node and $(d_{r}-1)$ further variable nodes connected to each such check node. So in total there are $d_{l}$ check nodes in this tree and $1+d_{l}(d_{r}-1)$ variable nodes. We call the starting variable node, the root and all other variable nodes, leaves. By symmetry it suffices to consider one branch of this computation tree in detail. Let $j$ , $j\in[i,i+w-1]$ , denote the position of a particular check node. We assume that the choice of $j$ is done uniformly over this interval. Let $k_{l}$ , $l\in[1,d_{r}-1]$ , $k_{l}\in[j-w+1,j]$ , denote the position of the $l$ -th variable node attached to this check node, and let the index of the root node be . For the leaf nodes we assume again a uniform choice of $k_{l}$ over the allowed interval. Note that, wlog, we have set the position $l=0$ for the root variable node. For each computation tree assign to its root node the channel $\mathsf{c}_{\sigma,i}$ , whereas each leaf variable node at position $k_{l}$ “sees” the channel $\mathsf{x}_{\sigma,k_{l}}$ . Note that for our model of the tree, the distribution (averaged over this choice) which flows into the root node is exactly $\hat{g}(\mathsf{x}_{\sigma,i\!-\!w\!+\!1},\!\dots\!,\mathsf{x}_{\sigma,i\!+\!w\!-\!1})$ , as required for the computation of $A(\{\mathsf{c}_{\sigma},\mathsf{\underline{x}}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})$ .

Let us describe the basic trick which will help us to accomplish the computation. We will first determine the sum of all GEXIT integrals associated to such a tree. From this we will then subtract the GEXIT integrals associated to its leaf nodes. This will give us the GEXIT integral associated to the root node, which is what we are interested in.

More precisely, we use (G). The lhs of this equation gives us the contribution of the overall tree and the rhs contains the GEXIT integral of the root node plus the GEXIT integrals of the leaf nodes. For the current case, we stress that all the operations (integrals of derivatives and partial derivatives) in (G) are well-defined since the family we consider is piece-wise linear

Contributions from overall tree: Recall that for $i\in[-L,L]$ , $\mathsf{x}_{i,\underline{\sigma}}=\mathsf{x}_{\underline{\sigma}}$ and $\mathsf{x}_{i,\overline{\sigma}}=\mathsf{x}_{\overline{\sigma}}$ .

Consider first the case $\sigma=\overline{\sigma}$ and $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ . From Lemma 54 we know that the conditional entropy $\text{H}(X\,|\,Y)$ of the tree code is given by

Exactly the same argument tells us that the entropy of such a tree for $\sigma=\underline{\sigma}$ is, up to a possible error of size $2\sqrt{2\delta}$ , equal to $T(\mathsf{x}_{\underline{\sigma}})$ . We conclude: the difference of the total entropy of such a tree is lower bounded by $T(\mathsf{x}_{\overline{\sigma}})-T(\mathsf{x}_{\underline{\sigma}})-4\sqrt{2\delta}$ , call this $B-4\sqrt{2\delta}$ .

Contributions from leaves: We need to find the contributions of GEXIT integrals associated to all the leaf nodes of each such tree rooted at a position $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ . The exact such sum is difficult to determine. But we only need an upper bound to derive a lower bound on the overall GEXIT integral. Note that GEXIT integrals are non-negative. Hence, let us compute the sum of GEXIT integrals of leaf nodes of all computation trees, whether they are rooted in a position $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ or not.

By symmetry, this contribution is easy to determine. More precisely, consider the following equivalent procedure. Pick a check node at position $j$ , $j\in[-L,L+w-1]$ . Every check node has $d_{r}$ connected variable nodes, where each variable node is picked with uniform probability and independently from the range $[j-w+1,j]$ and the choice of the $d_{r}$ variables is iid (note that the connections are taken on the circular ensemble).

Contributions from checks in the range $[-L,-L+2w-3]\cup[-w+2,2w-3]\cup[L-w+2,L+w-1]$ : Check nodes in this range might see some frozen channels or channels which do not form approximate FPs. Hence we upper bound all GEXIT integrals associated to check nodes in this range by $1$ (cf. Lemma 16). The number of such integrals is $(7w-8)d_{l}(d_{r}-1)$ .

Contributions from checks in the range $[-L+2w-2,-w+1]\cup[2w-2,L-w+1]$ : Check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. There are $(2L-6w+8)d_{l}(d_{r}-1)$ such integrals. Let us determine the contribution for each such integral. Since we consider an average over all possible computation trees, the (average) density entering a check node is equal for all the leaf nodes (there are $d_{r}-1$ such densities). Let us call this density $\mathsf{x}_{\sigma}$ . If we focus on a check node at position $j$ , this density is equal to

To see the last inequality, using (ii), Lemma 21 we have

Accounting: Putting everything together, we have

Let us derive an upper bound in the same manner.

Boundary: For $i\in[-L,-L+w-2]\cup[-w+2,w-2]\cup[L-w+2,L]$ the GEXIT integrals are at most $1$ . This gives a contribution of $4w-5$ . As usual, for $i\in[L+1,L+w-1]$ the GEXIT integral is and does not contribute to the area.

Interior: Consider the GEXIT integrals for $i\in[-L+w-1,-w+1]\cup[w-1,L-w+1]$ .

Technique: We use the same procedure as beforehand. But this time we need a lower bound of the GEXIT integrals of the leaf nodes.

Contributions from overall tree: As before, the overall contribution of each tree is equal to $T(\mathsf{x}_{\overline{\sigma}})-T(\mathsf{x}_{\underline{\sigma}})$ plus an error term of absolute value equal to $4\sqrt{2\delta}$ .

Contributions from leaves: The idea is same as before and as before, we will consider the computation from the point of view of check nodes. As before, we split the contribution in two regimes, $[-L,-L+2w-3]\cup[-w+2,2w-3]\cup[L-w+2,L+w-1]$ and $[-L+2w-2,-w+1]\cup[2w-2,L-w+1]$ .

Contributions from checks in the range $[-L,-L+2w-3]\cup[-w+2,2w-3]\cup[L-w+2,L+w-1]$ : Check nodes in this range might see some frozen channels or channels which are not approximate FPs. Since we are looking for an upper bound, we set the contribution of such check nodes to be 0.

Contributions from checks in the range $[-L+2w-2,-w+1]\cup[2w-2,L-w+1]$ : As we discussed before, check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. Further, all these GEXIT integrals corresponds to computation trees whose root $i$ is in the range $[-L+w-1,-w+1]\cup[w-1,L-w+1]$ . We can, therefore, subtract all their contributions, which are obtained by arguments similar to those used in the lower bound. There are $(2L-6w+8)d_{l}(d_{r}-1)$ such integrals and the contribution for each such integral is at least $C-\frac{8}{\ln 2}\sqrt{\delta}$ . Here, the last term takes into account the approximate FP nature of the channels and $C$ was defined in the arguments for obtaining the lower bound.

Proof of Theorem 47: Rather than deriving the bound $c(d_{l},d_{r},\delta,w,K,L)$ for all values of the parameters, we are only interested in the behavior of this bound for values of $\delta$ tending to and values of $K$ and $L$ tending to $\infty$ . Hence, in the sequel, nothing is lost by assuming at several spots that $\delta$ is “sufficiently” small and $K$ and $L$ are “sufficiently” large (consequently $N$ is also sufficiently large). This will simplify our arguments significantly.

Let $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ denote the proper one-sided FP on $[-N,0]$ with forced boundary condition which fulfills the stated conditions for some $\delta>0$ and $2(w-1)\leq L$ and $L+w\leq K\leq N$ . We prove the claim in several steps, where in each step we assert further properties that such a FP has to fulfill.

Constellation is almost flat and not too small “on the right”: Recall that by assumption $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{-K})\geq x_{\text{u}}(1)$ so that $\operatorname{\mathfrak{B}}(\mathsf{x}^{*}_{i})\geq x_{\text{u}}(1)$ for $i\in[-K,0]$ . Using the same reasoning as in the discussion at the end of Lemma 14, we can conclude that there exists an $i^{*}\in[-K,-L-w]$ such that $D(\mathsf{x}^{*}_{j},\mathsf{x}^{*}_{k})\leq D(\mathsf{x}^{*}_{i^{*}},\mathsf{x}^{*}_{j})+D(\mathsf{x}^{*}_{j},\mathsf{x}^{*}_{k})+D(\mathsf{x}^{*}_{k},\mathsf{x}^{*}_{i^{*}+L+w})=D(\mathsf{x}^{*}_{i^{*}},\mathsf{x}^{*}_{i^{*}+L+w})\leq\frac{2(L+w)}{K}$ for all $j\leq k$ and $j,k\in[i^{*},i^{*}+L+w]$ . From part (i) of Lemma 14 we conclude that $d(\mathsf{x}^{*}_{j},\mathsf{x}^{*}_{k})\leq\sqrt{8(L+w)/K}$ for all $i^{*}\leq j\leq k\leq i^{*}+L+w$ . Clearly, the right-hand side can be made arbitrarily small by picking $K$ sufficiently larger than $L+w$ .

Constellation can be made exactly flat and not too small “on the right”: Create from $(\mathsf{c}^{*},\mathsf{\underline{x}}^{*})$ the increasing constellation $(\mathsf{c}^{*},\mathsf{\underline{z}}^{*})$ on $[-N,0]$ with free boundary condition in the following way,

The graphical interpretation is simple. We replace the “almost” flat part on the right plus the extra part on the right which might not be flat with an exactly flat part. To simplify our subsequent notation we set $\mathsf{x}=\mathsf{x}^{*}_{i^{*}+w}$ and from above arguments note that $\operatorname{\mathfrak{B}}(\mathsf{x})\geq x_{\text{u}}(1)$ . Hence $\operatorname{\mathfrak{B}}(\mathsf{z}^{*}_{i})\geq x_{\text{u}}(1)$ for all $i\geq i^{*}+w$ .

Constellation is approximate FP: Note that by going from $\mathsf{\underline{x}}$ to $\mathsf{\underline{z}}$ no component in $[-N,i^{*}+L+w]$ is changed by more than a distance $\kappa=\sqrt{8(L+w)/K}$ . Therefore, if we run DE on the modified components it is clear that in this range the output must still be close to the original output. More precisely, we have for every $i\in[-N,i^{*}+L+1]$

where to get the penultimate inequality we first replace $\mathsf{x}_{i}^{*}$ by $\mathsf{c}^{*}\circledast g(\mathsf{x}_{i-w+1}^{*},\dots,\mathsf{x}_{i+w-1}^{*})$ , since $\mathsf{\underline{x}}^{*}$ is a true FP, and then to obtain the last inequality we apply Lemma 33. Since $\kappa$ can be made arbitrarily small by choosing $K$ sufficiently large, this verifies the approximate FP nature for $i\in[-N,i^{*}+L+1]$ . Let us now focus on $i\in[i^{*}+L+2,0]$ . Note that since $L\geq 2(w-1)$ , we can use the above argument in particular for $i=i^{*}+2w-1$ . For this choice of $i$ all involved densities, $\mathsf{z}_{i-w+1}^{*},\dots,\mathsf{z}^{*}_{i+w-1}$ , are equal to $\mathsf{x}$ . Therefore, the previous argument shows that

But for $i\geq i^{*}+w$ all components of $\mathsf{\underline{z}}^{*}$ are equal to $\mathsf{x}$ and so the approximate FP nature of $\mathsf{\underline{z}}^{*}$ is also verified for $i\geq i^{*}+2w-1$ . Since $i^{*}+2w-1\leq i^{*}+L+1$ , we conclude that $\mathsf{\underline{z}}^{*}$ is an approximate FP.

From FP to FP family: From the approximate FP $(\mathsf{c}^{*},\mathsf{\underline{z}}^{*})$ on $[-N,0]$ we create the approximate FP family $\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}=0}^{\overline{\sigma}={\tt{h}}^{*}}$ on $[-L,0]$ as described in Definition 62.

Computing GEXIT integral – Definition 23: Using the basic definition of the GEXIT functional in Definition 23 we conclude that the GEXIT integral associated to $\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}=0}^{\overline{\sigma}={\tt{h}}^{*}}$ , $A(\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})$ is since the channel remains constant throughout the interpolation.

Computing GEXIT integral – Theorem 64: We now compute the GEXIT integral associated to $\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}=0}^{\overline{\sigma}={\tt{h}}^{*}}$ by first applying Lemma 63 and then Theorem 64.

More precisely, from the previous arguments we satisfy all the hypotheses of Lemma 63. This allows us to conclude that the FP family constructed above is $\frac{2(d_{l}-1)(d_{r}-1)}{w}+\delta$ approximate FP (cf. (42)) if $K$ is chosen sufficiently large. Furthermore, since the starting ( $\mathsf{z}_{\sigma={\tt{h}}^{*},i}^{*}=\mathsf{x}$ for all sections $i\in[-L,0]$ ) and ending constellations ( $\mathsf{z}_{\sigma=0,i}^{*}=\Delta_{+\infty}$ for all sections $i\in[-L,0]$ ) are flat, we satisfy all the hypotheses of Theorem 64 from which we conclude that the GEXIT integral is upper bounded by $A(\mathsf{x})+b(d_{l},d_{r},\frac{2(d_{l}-1)(d_{r}-1)}{w}+\delta,w,L)$ .Note that $A(\Delta_{+\infty})=0$ .

Flat region has entropy not much smaller than $\frac{d_{l}}{d_{r}}$ : From part (viii) Lemma 59 we get

where in the last step we have used condition (vii) in Definition 40. We conclude that

We now proceed by contradiction. Let us assume that $\text{H}(\mathsf{x})\leq\frac{d_{l}}{d_{r}}-d_{l}e^{-4(d_{r}-1)(\frac{2d_{l}}{11ed_{r}})^{\frac{4}{3}}}-\frac{1}{d_{r}}$ . As we just discussed,

In the last step we assumed without loss of generality that $\delta$ is chosen sufficiently small. The inequality then follows from the condition (v) in Definition 40. This, together with (44), guarantees that we satisfy the hypothesis of (the Negativity) Lemma 27. Hence we conclude that $A(\mathsf{x})\leq-\frac{1}{d_{r}}$ . From condition (vi) in Definition 40 $4(\sqrt{2}+\frac{2}{\ln 2}d_{l}(d_{r}-1))\sqrt{\frac{2(d_{l}-1)(d_{r}-1)}{w}}<\frac{1}{d_{r}}$ . Hence for a sufficiently small $\delta$ and a sufficiently large $L$ , this leads to the conclusion that the GEXIT integral $A(\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})\leq A(\mathsf{x})+b(d_{l},d_{r},\frac{2(d_{l}-1)(d_{r}-1)}{w}+\delta,w,L)<0$ , a contradiction to the previous computation. As a consequence, we must have

To prove the intermediate claim we argue inductively that

Let us now bound the two terms containing Battacharyya parameters. Note that in each iteration the distance of the respective Battacharyya constants decreases by a factor of at least $\beta=\operatorname{\mathfrak{B}}(\mathsf{c}^{*})(d_{l}\!-\!1)(d_{r}\!-\!1)(1\!-\!\min\{\operatorname{\mathfrak{B}}(\mathsf{x}),\operatorname{\mathfrak{B}}(\mathsf{x}_{\text{\tiny BP}})\}^{2})^{d_{r}-2}$ . Indeed, from Lemma 51,

Putting everything together we conclude that by choosing $L,K$ sufficiently large $d(\mathsf{x},\mathsf{x}^{\text{\tiny BP}})$ can be made as small as desired.

${\tt{h}}^{*}$ is close to ${\tt{h}}^{A}$ : From Theorem 64 we have

From above arguments we have $A(\!\{\mathsf{c}^{*}_{\sigma},\mathsf{\underline{z}}^{*}_{\sigma}\}_{\underline{\sigma}}^{\overline{\sigma}})=0$ hence

Using the formula for $A(\cdot)$ given in Lemma 26 and properties (vii) and (ix) given in Lemma 13 we have

Recall that $\mathsf{x}^{\text{\tiny BP}}=\mathsf{x}^{\text{\tiny BP}}(\mathsf{c}^{*})$ . Combining, we get

Then for any ${\tt{h}}\geq\min\{{\tt{h}}^{A},{\tt{h}}^{*}\}$ we have $\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})\geq x_{\text{u}}(1)$ (cf. Lemma 18). Thus we conclude that $\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})\geq x_{\text{u}}(1)\geq\frac{1}{(d_{r}-1)^{3/2}}$ for any ${\tt{h}}\geq\min\{{\tt{h}}^{A},{\tt{h}}^{*}\}$ . Denoting $\mathsf{y}_{{\tt{h}}}=\mathsf{x}^{\boxast d_{r}-1}_{{\tt{h}}}$ we have,

To obtain (a) we use $\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})=\operatorname{\mathfrak{B}}(\mathsf{c}_{{\tt{h}}})(\operatorname{\mathfrak{B}}(\mathsf{y}_{{\tt{h}}}))^{d_{l}-1}$ , since $\mathsf{c}_{{\tt{h}}}$ and $\mathsf{x}_{{\tt{h}}}$ form a FP pair. This implies that $(\operatorname{\mathfrak{B}}(\mathsf{y}_{{\tt{h}}}))^{2d_{l}}=(\frac{\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}})}{\operatorname{\mathfrak{B}}(\mathsf{c}_{{\tt{h}}})})^{\frac{2d_{l}}{d_{l}-1}}\geq(\operatorname{\mathfrak{B}}(\mathsf{x}_{{\tt{h}}}))^{\frac{2d_{l}}{d_{l}-1}}\geq(x_{\text{u}}(1))^{\frac{2d_{l}}{d_{l}-1}}\stackrel{{\scriptstyle\text{Lemma}~{}\ref{lem:propertyofh(x)}}}{{\geq}}(d_{r}-1)^{\frac{-2d_{l}}{d_{l}-2}}\geq(d_{r}-1)^{-3}.$ The last inequality follows since condition (vii) in Definition 40 implies that $d_{l}\geq 6$ . This implies

where the last equality follows since $A(\mathsf{x}^{\text{\tiny BP}}_{{\tt{h}}^{A}})=0$ (cf. Lemma 29).

Appendix K Existence of FP – Theorem 48

Before proceeding to the main part of the proof, let us show that if we assume that there exists a proper FP on $[-N,0]$ , with forced boundary condition on the right and $\Delta_{+\infty}$ on the left ( $i<-N$ ) and with Battacharyya parameter of the constellation (cf. Definition 37) equal to $x_{\text{u}}(1)/2$ , then the desired properties (i) and (ii) mentioned in the statement of the theorem follow.

Constellation is close to $\Delta_{+\infty}$ “on the left”: Let $N_{1}$ be the largest integer so that for all $i<-N+N_{1}$ , $\operatorname{\mathfrak{B}}(\mathsf{x}_{i})\leq\delta$ . We have a proper FP and $w>2d_{l}^{3}d_{r}^{2}$ (because $w$ is by assumption admissible in the sense of condition (iv) in Definition 40). Hence by applying (the Transition Length) Lemma 61 we conclude that the number of sections with Battacharyya parameter bounded between $\delta$ and $x_{\text{u}}(1)$ is at most $wc(d_{l},d_{r})/\delta$ , where $c(d_{l},d_{r})$ is the constant defined in Lemma 61. Since the Battacharyya parameter of the constellation is $x_{\text{u}}(1)/2$ , we have

This implies that $N_{1}\geq(N+1)\Big{(}\frac{1}{2}-\frac{wc(d_{l},d_{r})}{(N+1)\delta}\Big{)}$ . Using property (x) of (the Wasserstein metric) Lemma 13, we conclude that for all $i<-N+N_{1}$ , $d(\mathsf{x}_{i},\Delta_{+\infty})\leq\delta$ .

Constellation is not too small “on the right”: Let $N_{1}$ be as defined previously. Again, since the Battacharyya parameter of the constellation is equal to $x_{\text{u}}(1)/2$ we have

where on the rhs above we have replaced the sections with value greater than $\delta$ by the maximum value of $1$ .

This implies that $N_{1}\leq(N+1)\frac{1-\frac{x_{\text{u}}(1)}{2}}{1-\delta}$ . Thus if we define $N_{2}$ as the number of sections with Battacharyya parameter at least equal to $x_{\text{u}}(1)$ , we must have

where we used $\delta\leq\frac{x_{\text{u}}(1)}{4}$ to obtain the above expression.

It remains to show the existence of the proper FP itself, with Battacharyya parameter of the constellation equal to $x_{\text{u}}(1)/2$ . We use the Schauder FP theorem in a strong form recently proved by Cauty : This theorem states that every continuous map $f$ from a convex compact subset $S$ of a topological vector space to itself has a FP.

Let ${\mathcal{S}}=L_{1}$ (where $L_{1}$ denotes the $L_{1}$ norm). Note that ${\mathcal{S}}$ is a real normed vector space and hence a topological vector space. Let ${\mathcal{P}}$ denote the space of probability measures on $ $endowed with the Wasserstein metric. Note that$ {\mathcal{P}}\subset{\mathcal{S}} $, where we represent elements of$ {\mathcal{P}} $by their cumulative distribution functions. Note that the topology on$ {\mathcal{P}} $induced by$ {\mathcal{S}} $coincides with our choice (cf. second alternative definition in part (i) of Lemma 13). Also, on$ {\mathcal{P}} $the topology induced by the Wasserstein metric is equivalent to the weak topology. Since$ $is a complete separable metric space, so is$ {\mathcal{P}} $, see [104, Theorem 6.18]. Since$ $is compact, so is$ {\mathcal{P}}$, see [104, Remark 6.19].

A Cartesian product of a family of topological vector spaces, when endowed with the product topology, is a topological vector space. Hence, ${\mathcal{S}}^{N+1}$ , endowed with the product topology, is a topological vector space.

Discussion: As we discussed above, we think of the elements of ${\mathcal{P}}$ as cumulative distribution functions. In particular, these are the cdfs in the so called $|D|$ domain. In the sequel, rather than only referring to cdfs it will often be more convenient to write down the $|D|$ distributions $|\mathfrak{{x}}|$ or $D$ distributions $\mathfrak{{x}}$ , directly.

$S$ is non-empty: Setting all elements of $|\mathfrak{{\underline{x}}}|$ equal to $x_{\text{u}}(1)/2\Delta_{0}+(1-x_{\text{u}}(1)/2)\Delta_{1}$ gives an element in this space.

$S$ is convex: Let $\mathsf{\underline{x}},\mathsf{\underline{y}}\in S$ with $|D|$ -distributions given by $|\mathfrak{{\underline{x}}}|$ and $|\mathfrak{{\underline{y}}}|$ respectively. Let $|\mathfrak{{\underline{v}}}|=\beta|\mathfrak{{\underline{x}}}|+(1-\beta)|\mathfrak{{\underline{y}}}|$ for some $\beta\in(0,1)$ . Since $\operatorname{\mathfrak{B}}(\cdot)$ is a linear operator, we see that

Also, using (2), we see that $|\mathfrak{{\underline{v}}}|_{i-1}\prec|\mathfrak{{\underline{v}}}|_{i}$ for all $i\in[-N+1,0]$ . Hence $\beta\mathsf{\underline{x}}+(1-\beta)\mathsf{\underline{y}}\in S$ .

From Lemma $4.25$ in we know that each component of $|\mathfrak{{\underline{x}}}|^{(\infty)}$ is a symmetric $|D|$ distribution. It therefore remains to shows that (i) $\operatorname{\mathfrak{B}}(|\mathfrak{{\underline{x}}}|^{(\infty)})=x_{\text{u}}(1)/2$ , and (ii) $|\mathfrak{{x}}|^{(\infty)}_{i-1}\prec|\mathfrak{{x}}|^{(\infty)}_{i}$ for all $i\in[-N+1,0]$ . Both claims follow from the fact that we can encode the above properties in terms of continuous functions and that continuous functions preserve the properties under limits.

$S$ is compact: Note that $S$ is a closed subset of ${\mathcal{P}}^{N+1}$ , which is compact since it is the product of compact spaces. Hence $S$ is compact as well.

Definition of map $V(\cdot)$ : In order to show (via Schauder’s FP theorem) that $S$ contains a FP of DE we need to exhibit a continuous map which maps $S$ into itself. Our first step is to define a map, call it $V(|\mathfrak{{\underline{x}}}|)$ , which “approximates” the DE equation and is well-suited for applying the FP theorem. The final step in our proof is then to show that the FP of the map $V(|\mathfrak{{\underline{x}}}|)$ is in fact a FP of DE itself.

The map $V(|\mathfrak{{\underline{x}}}|)$ is constructed as follows. For $|\mathfrak{{\underline{x}}}|\in S$ , let $U(|\mathfrak{{\underline{x}}}|)$ be the map,

where $|\mathfrak{{x}}|_{i}=\Delta_{+\infty}$ for $i<-N$ , and where $|\mathfrak{{x}}|_{i}=\Delta_{0}$ for $i>0$ . Define $V:S\to S$ as

In words, if $U(|\mathfrak{{\underline{x}}}|)$ is “too large”, upgrade it by an appropriate channel $|\mathfrak{{c}}|$ . If, on the other hand, $U(|\mathfrak{{\underline{x}}}|)$ is “too small” then we take a convex combination with $\Delta_{0}$ . In the preceding expressions, terms like $\underline{\bar{\alpha}}U(|\mathfrak{{\underline{x}}}|)$ denote component-wise products, i.e., the result is a vector of densities, where the $i$ -th component is the result of multiplying the $i$ -th component of $U(|\mathfrak{{\underline{x}}}|)$ with the scalar $\bar{\alpha}(|\mathfrak{{\underline{x}}}|)_{i}$ . Further, $\underline{\bar{\alpha}}$ is a shorthand for $(1-\underline{\alpha}(|\mathfrak{{\underline{x}}}|))$ .

It remains to specify the components of $\underline{\alpha}(|\mathfrak{{\underline{x}}}|)$ . Note that $\underline{\alpha}(|\mathfrak{{\underline{x}}}|)\in^{N+1}$ . Further, we require that its components are increasing and that they are all either or $1$ , except possibly one. I.e., $\underline{\alpha}(|\mathfrak{{\underline{x}}}|)$ has the form $(0,0,\dots,0,\alpha_{i},1,\dots,1)$ , where $i\in[-N,0]$ , and $\alpha_{i}\in$ . This defines the vector uniquely. Pictorially we can think of this map in the following way. We start at component $(U(|\mathfrak{{\underline{x}}}|))_{0}$ . We take an increasing convex combination with $\Delta_{0}$ until the overall Battacharyya constant is equal to $x_{\text{u}}(1)/2$ . If this is not sufficient, then we set $(V(|\mathfrak{{\underline{x}}}|))_{0}=\Delta_{0}$ and repeat this procedure with component $(U(|\mathfrak{{\underline{x}}}|))_{-1}$ , and so on. To apply Schauder’s theorem, we need to show that the map $V(\cdot)$ is well-defined and continuous.

Map $V(\cdot)$ is well defined: First consider the case $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))\geq x_{\text{u}}(1)/2$ . In this case $\frac{x_{\text{u}}(1)}{2\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))}\leq 1$ . Since the Battacharyya parameter is a strictly increasing and continuous function of the channelThat the Battacharyya parameter is continuous follows since the channel family is smooth. Further, since the Battacharyya kernel is strictly concave and the channel family is ordered by degradation, the Battacharyya parameter is strictly increasing., there exists a unique $|\mathfrak{{c}}|\in\{|\mathfrak{{c}}|_{\sigma}\}$ such that $\operatorname{\mathfrak{B}}(|\mathfrak{{c}}|)=\frac{x_{\text{u}}(1)}{2\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))}$ . Note also that $U(|\mathfrak{{\underline{x}}}|)$ is monotone (spatially) since $g(\cdot)$ is monotonic (as a function of its arguments) and $|\mathfrak{{\underline{x}}}|$ is monotone. Consequently, $U(|\mathfrak{{\underline{x}}}|)\circledast|\mathfrak{{c}}|$ is monotone. Further, from the multiplicative property of the Battacharyya parameter at the variable node, we get that $\operatorname{\mathfrak{B}}(V(|\mathfrak{{\underline{x}}}|))=\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))\operatorname{\mathfrak{B}}(|\mathfrak{{c}}|)=x_{\text{u}}(1)/2$ . It follows that in this case $V(|\mathfrak{{\underline{x}}}|)\in S$ .

Consider next the case $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))<x_{\text{u}}(1)/2$ . If we choose $\underline{\alpha}=\underline{1}$ then we get a Battacharyya parameter of $1$ . Further, the increase in the Battacharyya parameter is continuous. Hence there exists an $\underline{\alpha}$ so that the resulting constellation has Battacharyya constant equal to $x_{\text{u}}(1)/2$ . Also, by construction the resulting constellation is monotone. This shows that also in this case $V(|\mathfrak{{\underline{x}}}|)\in S$ . In both the cases above, the map maintains the symmetric nature of the $D$ -distributions.

We summarize, $V$ maps $S$ into itself. In the rest of the proof, we will use the notation $d(|\mathfrak{{\underline{x}}}|,|\mathfrak{{\underline{y}}}|)=\sum_{i=-N}^{0}d(|\mathfrak{{x}}|_{i},|\mathfrak{{y}}|_{i})$ to denote the Wasserstein distance between two constellations $|\mathfrak{{\underline{x}}}|$ and $|\mathfrak{{\underline{y}}}|$ .

Continuity of map $V(\cdot)$ : We will show that for every $|\mathfrak{{\underline{x}}}|\in S$ and for any $\varepsilon>0$ , there exists a $\nu>0$ such, that if $|\mathfrak{{\underline{y}}}|\in S$ and $d(|\mathfrak{{\underline{x}}}|,|\mathfrak{{\underline{y}}}|)\leq\nu$ , then $d(V(|\mathfrak{{\underline{x}}}|),V(|\mathfrak{{\underline{y}}}|))\leq\varepsilon$ . Note that if $d(|\mathfrak{{\underline{x}}}|,|\mathfrak{{\underline{y}}}|)\leq\nu$ then

$d(U(|\mathfrak{{\underline{x}}}|)_{i},U(|\mathfrak{{\underline{y}}}|)_{i})\leq 2(d_{l}-1)(d_{r}-1)\nu$ , $i\in[-N,0]$ ;

$|\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|)_{i})-\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|)_{i})|\leq\sqrt{4(d_{l}-1)(d_{r}-1)\nu}$ , $i\in[-N,0]$ ;

$d(|\mathfrak{{c}}|_{|\mathfrak{{\underline{x}}}|},|\mathfrak{{c}}|_{|\mathfrak{{\underline{y}}}|})\leq 2\sqrt{\frac{2(N+1)\sqrt{4(d_{l}-1)(d_{r}-1)\nu}}{x_{\text{u}}(1)}}$ if $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))\geq x_{\text{u}}(1)/2$ and $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|))\geq x_{\text{u}}(1)/2$ .We abuse notation slightly to denote the channel associated to $|\mathfrak{{\underline{x}}}|,|\mathfrak{{\underline{y}}}|$ by $|\mathfrak{{c}}|_{|\mathfrak{{\underline{x}}}|},|\mathfrak{{c}}|_{|\mathfrak{{\underline{y}}}|}$ , respectively, rather than denoting them by the standard parameterization $\sigma$ .

Assertion (i) is equivalent to Lemma (33) since if $d(|\mathfrak{{\underline{x}}}|,|\mathfrak{{\underline{y}}}|)\leq\nu$ then a fortiori $d(|\mathfrak{{x}}|_{i},|\mathfrak{{y}}|_{i})\leq\nu$ , $i\in[-N,0]$ . Assertion (ii) follows from assertion (i) by applying property (ix) of Lemma 13. To see assertion (iii) we write

The last inequality follows from assertion (ii) and $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|)),\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|))\geq x_{\text{u}}(1)/2.$ Recall that the channel family is ordered by degradation. We can therefore apply property (ii) of Lemma 14 to prove our claim.

Choosing $\nu$ as a function of $|\mathfrak{{\underline{x}}}|$ and using assertion (ii) above, we can therefore assume that either $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))\geq x_{\text{u}}(1)/2$ and $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|))\geq x_{\text{u}}(1)/2$ or $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|))\leq x_{\text{u}}(1)/2$ and $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|))\leq x_{\text{u}}(1)/2$ . In the first case,

Let us now focus on the second case. Let $i^{*}$ denote the largest integer in $[-N,0]$ such that $\alpha(|\mathfrak{{\underline{x}}}|)_{i^{*}}$ is non-zero. Clearly if $\operatorname{\mathfrak{B}}(U(|\mathfrak{{x}}|))<x_{\text{u}}(1)$ , then $i^{*}\leq 0$ , else we set $i^{*}=1$ . Similarly, let $j^{*}$ be the corresponding index in $\underline{\alpha}(|\mathfrak{{\underline{y}}}|)$ . Let us denote $\alpha(|\mathfrak{{\underline{x}}}|)_{i^{*}}=\alpha$ and $\alpha(|\mathfrak{{\underline{y}}}|)_{j^{*}}=\beta$ . Note that $0\leq\alpha,\beta\leq 1$ . Wlog we can assume that $j^{*}\leq i^{*}$ . With this we can upper bound $d(V(|\mathfrak{{\underline{x}}}|),V(|\mathfrak{{\underline{y}}}|))$ by,

Above we have used that for $i\geq i^{*}+1$ we have $V(|\mathfrak{{\underline{y}}}|)_{i}=V(|\mathfrak{{\underline{x}}}|)_{i}=\Delta_{0}.$ In the case $i^{*}=j^{*}$ , the terms in the interval $[j^{*},i^{*}]$ collapse to $d(\bar{\alpha}U(|\mathfrak{{\underline{x}}}|)_{i^{*}}+\alpha\Delta_{0},\bar{\beta}U(|\mathfrak{{\underline{y}}}|)_{i^{*}}+\beta\Delta_{0})$ .

Let us first consider the case when $j^{*}<i^{*}$ . Note that $\operatorname{\mathfrak{B}}(V(|\mathfrak{{\underline{x}}}|))=\operatorname{\mathfrak{B}}(V(|\mathfrak{{\underline{y}}}|))=x_{\text{u}}(1)/2$ . This implies that if we replace the Wasserstein distance by the Battacharyya parameter in (K) the expression evaluates to 0. Then writing the $j^{*}$ term as $\bar{\beta}(\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|)_{j^{*}}\!)-\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|)_{j^{*}}))+\beta(\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|)_{j^{*}})-\operatorname{\mathfrak{B}}(\Delta_{0}))$ we get

where above we use $\operatorname{\mathfrak{B}}(\Delta_{0})=1$ .

We now continue with (K). We use $d(U(|\mathfrak{{\underline{x}}}|)_{j^{*}},\bar{\beta}U(|\mathfrak{{\underline{y}}}|)_{j^{*}}+\beta\Delta_{0})\leq d(U(|\mathfrak{{\underline{x}}}|)_{j^{*}},U(|\mathfrak{{\underline{y}}}|)_{j^{*}})+\beta d(U(|\mathfrak{{\underline{x}}}|)_{j^{*}},\Delta_{0})$ , (x) of Lemma 13 and (K) to get the upper bound

Finally using assertions (i) and (ii) above we get that

Wlog we can assume $\beta\geq\alpha$ . This implies $\bar{\alpha}U(|\mathfrak{{\underline{y}}}|)_{i^{*}}\!+\!\alpha\Delta_{0}\prec\bar{\beta}U(|\mathfrak{{\underline{y}}}|)_{i^{*}}\!+\!\beta\Delta_{0}$ . Hence from (ii) of Lemma 14 we can bound the second Wasserstein distance above by the difference of the Battacharyya parameters. Further,

The first Battacharyya difference on the rhs can be bounded by $|\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{y}}}|)_{i^{*}})-\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|)_{i^{*}})|$ . For the second difference we use same arguments as (K) to obtain

Combining everything with the assertions (i) and (ii), in this case we get

Existence of FP of $V(\cdot)$ via Schauder: We can invoke Schauder’s FP theorem to conclude that $V(\cdot)$ has a FP in $S$ , call it $|\mathfrak{{\underline{x}}}|^{*}$ .

Existence of FP of DE ( $U(\cdot)$ ): Let us show that, as a consequence, DE itself has a FP $(|\mathfrak{{c}}|^{*},|\mathfrak{{x}}|^{*})$ with the desired properties.

If $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|^{*}))\geq x_{\text{u}}(1)/2$ , then $|\mathfrak{{\underline{x}}}|^{*}=V(|\mathfrak{{\underline{x}}}|^{*})=U(|\mathfrak{{\underline{x}}}|^{*})\circledast|\mathfrak{{c}}|^{*}$ with $|\mathfrak{{c}}|^{*}\in\{|\mathfrak{{c}}|_{\sigma}\}$ . Hence indeed, $(|\mathfrak{{c}}|^{*},|\mathfrak{{x}}|^{*})$ is a FP of DE.

Consider hence the case $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|^{*}))<x_{\text{u}}(1)/2$ . We will show that it leads to a contradiction. Recall that in this case

and that $|\mathfrak{{x}}|^{*}_{i}=\Delta_{0}$ for $i\geq 1$ .

Given a density $|\mathfrak{{x}}|$ we say that it has a “BEC component” of $u$ if $|\mathfrak{{x}}|$ contains a delta at of “weight” $u$ (i.e., contains a mass of $u$ at $\Delta_{0}$ ). In the sequel we will think of $u$ as the erasure probability of a binary erasure channel.

Let $\underline{u}$ be the vector of BEC components corresponding to $|\mathfrak{{\underline{x}}}|^{*}$ . Since $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|^{*}))<x_{\text{u}}(1)/2$ we know that $\underline{u}$ has some non-trivial components in $[-N,0]$ , and by definition of the right boundary, $u_{i}=1$ for $i>0$ . We claim that for $i\in[-N,0]$ ,

Let us prove this claim immediately. Extract the BEC component from both the left-hand as well as the right-hand side of (51). This gives

where we wrote $\alpha_{i}$ as a shorthand for $\alpha(|\mathfrak{{\underline{x}}}|^{*})_{i}$ and BEC( $\cdot$ ) denotes weight at $\Delta_{0}$ . To see the second step, i.e., to see that $\text{BEC}(U(|\mathfrak{{\underline{x}}}|^{*})_{i})\geq g(u_{i-w+1},\dots,u_{i+w-1})$ , let $|\mathfrak{{\underline{v}}}|^{*}$ denote the density at the output of the check nodes when the input is $|\mathfrak{{\underline{x}}}|^{*}$ . Let $\underline{v}$ denote the (BEC) density at the output of the check nodes when the input is $\underline{u}$ . Some thought shows that $\underline{v}$ is also the BEC component of $|\mathfrak{{\underline{v}}}|^{*}$ . In words, at check nodes the BEC component evolves according to density evolution – we get an erasure at the output of a check node if and only if at least one of the incoming messages is an erasure. At variable nodes we only get a bound. If all inputs to a variable node are erasures then the output is also an erasure, but this is only a sufficient condition. Thus (53) is proved. If $\alpha_{i}=1$ , then $u_{i}=1$ and (52) is true. If $\alpha_{i}<1$ , then $u_{i}\geq\frac{u_{i}-\alpha_{i}}{1-\alpha_{i}}\geq g(u_{i-w+1},\dots,u_{i+w-1})$ , where the second step follows from (53).

Extend the constellation $\underline{u}$ by $N_{3}=\lceil(N+1)\frac{w}{\frac{d_{r}}{d_{l}}-1}\rceil+1$ sections on the right, with values equal to $1$ , and let $\underline{u}^{(0)}$ denote this constellation. We claim that $\underline{u}^{(0)}$ has at least

sections on the left with Battacharyya value between and $\delta$ where $c(d_{l},d_{r})$ is the constant of Lemma 61 and only depends on the dd.

To prove this claim, we consider our original $|\mathfrak{{\underline{x}}}|^{*}$ (before we extracted the BEC components) which was the FP obtained by Schauder’s theorem. We claim that $|\mathfrak{{\underline{x}}}|^{*}$ has at least $N_{4}$ segments on the left with Battacharyya constant at most $\delta$ , where

Let us explain each of the terms on the right. There are $N+1$ segments to start with, which explains (a). At most $(N+1)/2$ sections on the right can have a Battacharyya value of $x_{\text{u}}(1)$ or larger (since $\operatorname{\mathfrak{B}}(|\mathfrak{{\underline{x}}}|^{*})=x_{\text{u}}(1)/2$ ). This accounts for the (b) term. Finally, all sections $i$ , with $i<-(N+1)/2+1$ , must be sections where $|\mathfrak{{\underline{x}}}|^{*}$ fulfills the actual FP equations, i.e., these cannot be sections where the map $V(\cdot)$ “pushes” the constellation up to $\Delta_{0}$ . More precisely, we must have $\alpha(|\mathfrak{{\underline{x}}}|^{*})_{i}=0$ for $i<-(N+1)/2+1$ . Indeed, from construction, starting from the rightmost section, each section is increased all the way up to $\Delta_{0}$ before we move on to the next section on the left. Since the constellation $|\mathfrak{{\underline{x}}}|^{*}$ has Battacharyya parameter equal to $x_{\text{u}}(1)/2\leq 1/2$ we conclude that for $i<-(N+1)/2+1$ we must have $|\mathfrak{{x}}|^{*}_{i}=(U(|\mathfrak{{\underline{x}}}|^{*}))_{i}$ , which is a true FP of DE for the channel $\Delta_{0}$ . Therefore, for these section we can apply (the Transition Length) Lemma 61 and conclude that there are at most $c(d_{l},d_{r})w/\delta$ such section which have Battacharyya value between $\delta$ and $x_{\text{u}}(1)$ . This is the term (c).

The claim now follows since the BEC component $u_{i}$ is upper bounded by the corresponding Battacharyya parameter, $\operatorname{\mathfrak{B}}(|\mathfrak{{x}}|^{*}_{i})$ .

Now consider a further constellation $\underline{v}^{(0)}$ on $[-N,N_{3}]$ . We set $v_{i}^{(0)}=0$ for all $i\in[-N,0]$ . For $i\in[1,N_{3}]$ we set $\underline{v}^{(0)}$ to the FP of forward DE according to Lemma 22 in , where the length of the constellation is taken to be $N_{3}-1$ , $\epsilon=1$ , and $\chi=\frac{1}{2}(1-d_{l}/d_{r})$ . More precisely, Lemma 22 in says that if we run forward DE, with free boundary condition, when transmitting over the BEC with $\epsilon=1$ and $(d_{l},d_{r},N_{3}-1,w)$ coupled ensemble, then for large enough length, the one-sided FP of forward DE must be proper (non-trivial and increasing) and we can lower bound the Battacharyya parameter of the resulting FP. By our choice of $N_{3}$ this FP (on $[1,N_{3}]$ ) has Battacharyya parameter at least $\frac{1}{2}(1-d_{l}/d_{r})$ . Now since $w\geq 2d_{l}^{3}d_{r}^{2}$ we have $N_{3}=\lceil(N+1)\frac{w}{\frac{d_{r}}{d_{l}}-1}\rceil+1\geq N+1$ . This implies that $\frac{N_{3}}{N+1+N_{3}}\geq\frac{1}{2}$ . Thus $\operatorname{\mathfrak{B}}(\underline{v}^{(0)})\geq\frac{1}{4}(1-d_{l}/d_{r}).$ Clearly, $\underline{v}^{(0)}\leq\underline{u}^{(0)}$ (component-wise).

We summarize, $\underline{v}^{(\infty)}$ is a proper one-sided FP of DE for $\epsilon=1$ with fixed boundary condition and $\operatorname{\mathfrak{B}}(v^{(\infty)}_{-N+L})\leq\delta$ and $\operatorname{\mathfrak{B}}(v^{(\infty)}_{N_{3}-K})\geq x_{\text{u}}(1)$ . But we know from Theorem 47 that such a FP, $\underline{v}^{(\infty)}$ , must have a channel value close to $\epsilon^{A}(d_{l},d_{r})$ , the area threshold of $(d_{l},d_{r})$ -regular ensemble when transmitting over BEC. More precisely, applying Theorem 47 we conclude that the entropy of the channel of $\underline{v}^{(\infty)}$ must be less than $\epsilon^{A}(d_{l},d_{r})+c(d_{l},d_{r},\delta,w,K,L)$ . Since $\epsilon^{A}(d_{l},d_{r})\leq\frac{d_{l}}{d_{r}}<1$ For transmission over the BEC using a $(d_{l},d_{r})$ -regular ensemble, from Theorem 3.120 in we know that $\epsilon^{A}(d_{l},d_{r})=\epsilon^{\text{\tiny MAP}}(d_{l},d_{r})$ . Further the MAP threshold is upper bounded by the Shannon threshold, $\frac{d_{l}}{d_{r}}$ ., we conclude that by choosing $\delta$ small enough and $K,L,N$ large enough, $c(d_{l},d_{r},\delta,w,K,L)$ can be made arbitrarily small and hence the channel of $\underline{v}^{(\infty)}$ is strictly less than 1, leading to a contradiction since we started with $\epsilon=1$ . This contradiction tells us that we cannot have $\operatorname{\mathfrak{B}}(U(|\mathfrak{{\underline{x}}}|^{*}))<x_{\text{u}}(1)/2$ when we apply the Schauder theorem. Hence the FP must be a true FP of DE. ∎