Spatially Coupled Ensembles Universally Achieve Capacity under Belief Propagation
Shrinivas Kudekar, Tom Richardson, Ruediger Urbanke
I Introduction
Ever since the publication of Shannon’s seminal paper and the introduction of the first coding schemes by Hamming and Golay , coding theory has been concerned with finding low-delay and low-complexity capacity-achieving schemes. The interested reader can find an excellent historical review in . Let us just briefly mention some of the highlights before focusing on those parts that are the most relevant for our purpose.
In the first 50 years, coding theory focused on the construction of algebraic coding schemes and algorithms that were capable of exploiting the algebraic structure. Two early highlights of this line of research were the introduction of Bose-Chaudhuri-Hocquenghem (BCH) codes as well as Reed-Solomon (RS) codes . Berlekamp devised an efficient decoding algorithm and this algorithm was then interpreted by Massey as an algorithm for finding the shortest feedback-shift register that generates a given sequence . More recently, Sudan introduced a list decoding algorithm for RS codes that decodes beyond the guaranteed error-correcting radius . Guruswami and Sudan improved upon this algorithm and Koetter and Vardy showed how to handle soft information .
Another important branch started with the introduction of convolutional codes by Elias and the introduction of the sequential decoding algorithm by Wozencraft . Viterbi introduced the Viterbi algorithm . It was shown to be optimal by Forney and Omura and to be eminently practical by Heller .
An important development in transmission over the continuous input, band-limited, additive white Gaussian noise channel was the invention of the lattice codes. It was shown in that lattice codes achieve the Shannon capacity. A breakthrough in bandwidth-limited communications came about when Ungerboeck invented a technique to combine coding and modulation. Ungerboeck’s technique ushered in a new era of fast modems. The technique, called trellis-coded modulation (TCM), offered significant coding gains without compromising bandwidth efficiency by mapping binary code symbols, generated by a convolutional encoder, to a larger (non-binary) signal constellation. In Forney showed that lattice codes as well as TCM schemes may be generated by the same basic elements and the generalized technique was termed coset-coding.
Coming back to binary linear codes, in 1993, Berrou, Glavieux and Thitimajshima proposed turbo codes. These codes attain near-Shannon limit performance under low-complexity iterative decoding. Their remarkable performance lead to a flurry of research on the “turbo” principle. Around the same time, Spielman in his thesis , and MacKay and Neal in , independently rediscovered low-density parity-check (LDPC) codes and iterative decoding, both introduced in Gallager’s remarkable thesis . Wiberg showed that both turbo codes and LDPC codes fall under the umbrella of codes based on sparse graphs and that their iterative decoding algorithms are special cases of the sum-product algorithm. This line of research was formalized by Kschischang, Frey, and Loeliger who introduced the notion of factor graphs .
The next breakthrough in the design of codes (based on sparse graphs) came with the idea of using irregular LDPC codes by Luby, Mitzenmacher, Shokrollahi and Spielman , . With this added ingredient it became possible to construct irregular LDPC codes that achieved performance within dB of the Shannon limit when transmitting over the binary-input additive white Gaussian noise channel, see Chung, Forney, Richardson and Urbanke . The development of these codes went hand in hand with the development of a systematic framework for their analysis by Luby, Mitzenmacher, Shokrollahi and Spielman and Richardson and Urbanke .
A central research topic for codes on graphs is the interaction of the graphical structure of a code and its performance. Turbo codes themselves are a prime example how the “right” structure is important to achieve good performance . Further important parameters and structures are, the degree distribution (dd) and in particular the fraction of degree-two variable nodes, multi-edge ensembles , degree-two nodes in a chain , and protographs .
Currently sparse graph codes and their associated iterative decoding algorithms are the best “practical” codes in terms of their trade-off between performance and complexity and they are part of essentially all new communication standards.
Polar codes represent the most recent development in coding theory . They are provably capacity achieving on binary-input memoryless output-symmetric (BMS) channels (and many others) and they have low decoding complexity. They also have no error floor due to a minimum distance which increases like the square root of the blocklength. The simplicity, elegance, and wide applicability of polar codes have made them a popular choice in the recent literature. There are perhaps only two areas in which polar codes could be further improved. First, for polar codes the convergence of their performance to the asymptotic limit is slow. Currently no rigorous statements regarding this convergence for the general case are known. But “calculations” suggest that, for a fixed desired error probability, the required blocklength scales like , where is the additive gap to capacity and where depends on the channel and has a value around , . Note that random block codes under MAP decoding have a similar scaling behavior but with . This implies a considerably faster convergence to the asymptotic behavior. The value is a lower bound for for any system since the variations of the channel itself imply that . The second aspect is universality: the code design of polar codes depends on the specific channel being used and one and the same design cannot simultaneously achieve capacity over a non-trivial class of channels (under successive cancellation decoding).
Let us now connect the content of this paper to the previous discussion. Our main aim is to explain the role of a further structural element in the realm of sparse graph codes (besides the previously discussed such examples), namely that of “spatial coupling.” We will show that this coupling of graphs leads to a remarkable change in their performance. Ensembles designed in this way combine some of the nice elements of polar codes (namely the fact that they are provably capacity achieving under low complexity decoding) with the practical advantages of sparse graph codes (the codes are competitive already for moderate lengths). Perhaps most importantly, it is possible to construct universal such codes for the whole class of BMS channels. Here, universality refers to the fact that one and the same ensemble is good for a whole class of channels, assuming that at the receiver we have knowledge of the channel.
I-B Prior Work on Spatially Coupled Codes
The potential of spatially coupled codes has long been recognized. Our contribution lies therefore not in the introduction of a new coding scheme, but in clarifying the mechanism that make these codes perform so well.
The term spatially coupled codes was coined in . Convolutional LDPC codes (more precisely, terminated convolutional LDPC codes), which were introduced by Felström and Zigangirov in , and their many variants belong to this class. Why do we introduce a new term? The three perhaps most important reasons are: (i) the term “convolutional” conjures up a fairly specific node interconnection structure whereas experiments have shown that the particular nature of the connection is not important and that the threshold saturation effect occurs as soon as the connection is sufficiently strong; (ii) a well known result for convolutional codes says that the boundary conditions are “forgotten” exponentially fast; but for spatially coupled codes it is exactly the boundary condition which causes the effect and there is no decay of this effect in the spatial dimension of the code; (iii) the same effect has (empirically) been shown to hold in many other graphical models, most of them outside the realm of coding; the term “spatial coupling” is perhaps then somewhat more generally applicable.
There is a considerable literature on convolutional-like LDPC ensembles. Variations on the constructions as well as some analysis can be found in Engdahl and Zigangirov , Engdahl, Lentmaier, and Zigangirov , Lentmaier, Truhachev, and Zigangirov , as well as Tanner, D. Sridhara, A. Sridharan, Fuja, and Costello .
In , Sridharan, Lentmaier, Costello and Zigangirov consider density evolution (DE) analysis for convolutional LDPC ensembles and determine thresholds for the BEC. The equivalent results for general channels were reported by Lentmaier, Sridharan, Zigangirov and Costello in . This DE analysis is in many ways the starting point for our investigation. By comparing the thresholds to the thresholds of the underlying ensembles under MAP decoding (see e.g. ), it quickly becomes apparent that an interesting effect must be at work. Indeed, in a recent paper , Lentmaier and Fettweis followed this route and independently formulated the equality of the belief propagation (BP) threshold of convolutional LDPC ensembles and the MAP threshold of the underlying ensemble as a conjecture.
A representation of convolutional LDPC ensembles in terms of a protograph was introduced by Mitchell, Pusane, Zigangirov and Costello . The corresponding representation for terminated convolutional LDPC ensembles was introduced by Lentmaier, Fettweis, Zigangirov and Costello . A variety of constructions of LDPC convolutional codes from the graph-cover perspective is shown by Pusane, Smarandache, Vontobel, and Costello .
A pseudo-codeword analysis of convolutional LDPC codes was performed by Smarandache, Pusane, Vontobel, and Costello in . Such an analysis is important if we want to understand the error-floor behavior of spatially coupled ensembles.
In , Papaleo, Iyengar, Siegel, Wolf, and Corazza study the performance of windowed decoding of convolutional LDPC codes on the BEC. Such a decoder has a decoding complexity which is independent of the chain length, an important practical advantage. Luckily, it turns out that the performance under windowed decoding, when measured in terms of the threshold, approaches the “regular”” (without windowed decoding) threshold exponentially fast in the window size, see . The threshold saturation phenomenon therefore does not require an infinite window size.
The scaling behavior of spatially coupled ensembles, i.e., the relationship between the chain length, the number of variables per section, and the error probability is discussed by Olmos and Urbanke in .
I-C Prior Results for the Binary Erasure Channel
It was recently shown in that for transmission over the BEC spatially coupled ensembles have a BP threshold which is essentially equal to the MAP threshold of the underlying uncoupled ensemble. Further, this threshold is also essentially equal to the MAP threshold of the coupled ensemble. This phenomena was called threshold saturation in since the BP threshold takes on its largest possible value (the MAP threshold). This significant improvement in the performance is due to the spatial coupling of the underlying code. Those “sections” of the code that have already succeeded in decoding can help their neighboring less fortunate sections in the decoding process. In this manner, the information propagates from the “boundaries”, where the bits are known perfectly towards the “middle”. In a recent paper , Lentmaier and Fettweis independently formulated the same statement as a conjecture and provided numerical evidence for its validity. They attribute the observation of the equality of the two thresholds to G. Liva.
It was shown in that if we couple component codes whose Hamming distance grows linearly in the blocklength then also the resulting coupled ensembles have this property (assuming that the number of “sections” or copies of the underlying code is kept fixed). The equivalent result is true for stopping sets. This implies that for the transmission over the BEC the block BP threshold is equal to the bit BP threshold and that such ensembles do not exhibit error floors under BP decoding.
I-D Prior Results for General Binary-Input Memoryless Output-Symmetric Channels
As pointed out in a preceding section, BP thresholds for transmission over general BMS channels were computed by means of a numerical procedure by Lentmaier, Sridharan, Zigangirov and Costello in . Further, in (conjectured) MAP thresholds for some LDPC ensembles were computed according to the Maxwell construction. Comparing these two values, one can check empirically that also for transmission over general BMS channels the BP threshold of the coupled ensembles is essentially equal to the (conjectured) MAP threshold of the underlying ensemble. Indeed, recently both as well as provided further numerical evidence that the threshold saturation phenomenon also applies to general BMS channels.
For typical sparse graph ensembles the MAP threshold is not equal to the Shannon threshold but the Shannon threshold can only be reached by taking a sequence of such ensembles (e.g., a sequence of increasing degrees). There are some notable exceptions, like MN ensembles or HA ensembles. Kasai and Sakaniwa take this as a starting point to investigate in whether by spatially coupling such ensembles it is possible to create ensembles which are universally capacity achieving under BP decoding.
I-E Spatial Coupling for General Communication Scenarios, Signal Processing, Computer Science, and Statistical Physics
The principle which underlies the good performance of spatially coupled ensembles is broad. It has been shown to apply to a variety of problems in communications, computer science, signal processing, and physics. To mention some concrete examples, the threshold saturation effect (dynamical/algorithmic threshold of the system being equal to the static or condensation threshold) of coupled graphical models has been observed for rate-less codes by Aref and Urbanke , for channels with memory and multiple access channels with erasure by Kudekar and Kasai , for CDMA channels by Takeuchi, Tanaka, and Kawabata , for relay channels with erasure by Uchikawa, Kasai, and Sakaniwa , for the noisy Slepian-Wolf problem by Yedla, Pfister, and Narayanan , and for the BEC wiretap channel by Rathi, Urbanke, Andersson, and Skoglund . Uchikawa, Kurkoski, Kasai, and Sakaniwa recently showed an improvement of the BP threshold has also for transmission over the unconstrained AWGN channel using low-density lattice codes . Further, Yedla, Nguyen, Pfister and Narayanan, demonstrated the universality of spatially-coupled codes in the 2-user binary input Gaussian multiple-access channel and finite state ISI channels like the dicode-erasure channel and the dicode channel with AWGN . In they show in addition that for a fixed rate pair, spatially-coupled ensembles universally saturate the achievable region (i.e., the set of channel gain parameters that are achievable for the fixed rate pair) under BP decoding. Similarly, in they provide numerical evidence that spatially coupled ensembles achieve the symmetric information rate for the dicode erasure channel and the dicode channel with AWGN.
Statistical physics is another very natural area in which the threshold saturation phenomenon is of interest. For the so-called random -SAT problem, random graph coloring, and the Curie-Weiss model, spatially coupled ensembles were investigated by Hassani, Macris, and Urbanke, . In all these cases, the threshold saturation phenomenon was observed. This suggests that it might be possible to study difficult theoretical problems in this area, like the existence of the static threshold, by studying the dynamical threshold of a chain of coupled models, perhaps an easier problem. Further spatially-coupled models were considered by Takeuchi and Tanaka .
I-F Main Results and Consequences
In this paper we show that for transmission over general BMS channels coupled ensembles exhibit the threshold saturation phenomenon. By choosing e.g. regular component ensembles of fixed rate and increasing degree, this implies that coupled ensembles can achieve capacity over this class of channels. More precisely, for each there exists a coupled ensemble which achieves at least a fraction of capacity universally, under belief propagation decoding, over the whole class of BMS channels. The qualifier ”universal” is important here.
Coupled ensembles inherit to a large degree the error floor behavior of the underlying ensemble. Further, such an ensemble can be chosen so that it has a non-zero error correcting radius, and hence does not exhibit error floors. To achieve this, it suffices to take the variable-node degree to be at least five. This guarantees that a randomly chosen graph from such an ensemble is an expander with expansion exceeding three-quarters with high probability. This expansion guarantees an error correcting radius under the so-called flipping decoder as well as under the BP decoder, assuming that we suitably clip both the received as well as the internal messages .
Although one can empirically observe the threshold saturation phenomenon for a wide array of component codes, we state and prove the main result only for regular LDPC ensembles. This keeps the exposition manageable.
I-G Outline
In Section II we briefly review regular LDPC ensembles and their asymptotic (in the blocklength) analysis. Much of this material is standard and we only include it here to set the notation and to make the paper largely self-contained. The two most important exceptions are our in-depth discussion of the Wasserstein distance and the the so-called area threshold, in particular the (Negativity) Lemma 27.
In Section III we review some basic properties of coupled ensembles. Using simple extremes of information combining techniques, we will see in Section III-G that coupling indeed increases the BP threshold significantly, even though these simple arguments are not sufficient to characterize the BP threshold under coupling exactly.
We state our main result, namely that the BP threshold of coupled ensembles is essentially equal to the area threshold of the underlying component ensemble, in Section IV. We also discuss how one can easily strengthen this result to apply to individual codes rather than ensembles and how this gives rise to codes which are universally close to capacity under BP decoding for the whole class of BMS channels.
We end in Section IV-E with a discussion of what challenges still lie ahead. In particular, spatial coupling has been shown empirically to lead to the threshold saturation phenomenon in a wide class of graphical models. Rather than proving each such scenario in isolation, we want a common framework to analyze all such systems.
Many of the proofs are relegated to the appendices. This makes it possible to read the material on two levels – a casual level, skipping all the proofs and following only the flow of the argument, and a more detailed level, consulting the material in the appendices.
II Uncoupled Systems
II-B Binary-Input Memoryless Output-Symmetric Channels
Throughout we will assume that transmission is taking place over a BMS channel. Let denote the input and let be the output. Further, let denote the transition probability describing the channel. An alternative characterization of the channel is by means of its so-called -distribution, denote it by . More precisely, is the distribution of
Given , we write , , and to denote the corresponding distribution, the distribution and the cdf in the -domain, respectively, see [62, Section 4.1.4].
Typically we do not consider a single channel in isolation but a whole family of channels. We write to denote the family parameterized by the scalar . Often it will be more convenient to denote this family by , i.e., to use the family of -densities which characterize the channel family. If it is important to make the range of the parameter explicit, we will write .
Sometimes it is convenient to use the natural parameter of the family. For example, for the three fundamental channels, the BEC, the binary symmetric channel (BSC) and the binary additive white-Gaussian noise channel (BAWGNC), the corresponding channel families are given by , , and . Other times, it is more convenient to use a common parameterization. E.g., we will write to denote a channel family where denotes the element in the family of entropy .
Assume that we are given a channel family . We say that the family is complete if , , and for each there exists a parameter so that . Here is the entropy functional defined in Section II-D.
Let denote the transition probability associated to a BMS channel and let denote the transition probability of another BMS channel . We then say that is degraded with respect to if there exists a channel so that
We will use the notation to denote that is degraded wrt (as a mnemonic think of as the erasure probability of a BEC and replace with ).
A useful characterization of degradation, see [62, Theorem 4.74], is that is equivalent to
for all that are non-increasing and concave on $|\mathfrak{{c}}|(x)|D|L\mathsf{c}F(\mathsf{a})\leq F(\mathsf{b})\mathsf{a}\prec\mathsf{b}F(\cdot)|D||\mathfrak{{C}}|(x)|\mathfrak{{C^{\prime}}}|(x)z\in$,
A BMS channel family is said to be ordered by degradation if implies . (The reverse order, is also allowed but we generally stick to the stated convention.)
We say that an -density is symmetric if . We recall that all densities which stem from BMS channels are symmetric, see [62, Sections 4.1.4, 4.1.8 and 4.1.9]. All densities which we consider are symmetric. We will therefore not mention symmetry explicitly in the sequel.
A BMS channel family is said to be smooth if for all continuously differentiable functions so that is bounded, the integral exists and is a continuously differentiable function with respect to , see [62, Definition 4.32].
The three fundamental channel families , , and are all complete, ordered, smooth, and symmetric.
II-C MAP Decoder and MAP Threshold
The bit maximum a posteriori (bit-MAP) decoder for bit finds the value of which maximizes . It minimizes the bit error probability and is optimal in this sense. The block maximum a posteriori (block-MAP) decoder finds the codeword which maximizes . It minimizes the block error probability and is optimal in this sense.
Consider an ordered and complete channel family . The MAP threshold of the -regular ensemble for this channel family is denoted by and defined by
In words, if we are transmitting above the MAP threshold, then the ensemble average bit-error probability is lower bounded by , a strictly positive constant. This ensemble is therefore not suitable for reliable transmission above this threshold.
II-D Belief Propagation, Density Evolution, and Some Important Functionals
In principle one can investigate the behavior of coupled ensembles under any message-passing algorithm. We limit our investigation to the analysis of the BP decoder, the most powerful local message-passing algorithm. We are interested in the asymptotic performance of the BP decoder, i.e., the performance when the blocklength tends to infinity. This asymptotic performance is characterized by the so-called density evolution (DE) equation .
As mentioned, all distributions associated to BMS channels are symmetric and symmetry is preserved under DE, see [62, Chapter 4] for details. There are a number of functionals of densities are of interest to us. The most important functionals are the Battacharyya, the entropy, and the error probability functional. For a density these are denoted by , , and , respectively. Assuming is an -density, they are given by
We end this section with the following useful fact. The proof can be found in Appendix A.
For any -density , .
II-E Extremes of Information Combining and the Duality Rule
When we are operating on BMS channels, the quantities appearing in the DE equations are distributions. These are hard to track analytically in general, unless we are transmitting over the BEC. Often we only need bounds. In these cases extremes of information combining ideas are handy, see , [62, p. 242].
Let denote either or and let . Let and denote -densities from the families and , respectively, so that . Then for any ,
Discussion: Although the extremes of information combining bounds are only stated for pairs of distributions, they naturally extend to more than two distributions. E.g., we claim that . To see this, let be any set of distributions so that . Then we can use Lemma 5 repeatedly to conclude that
The same remark and the same proof technique applies to the other cases.
For any and .
Note: We give a simple proof of this identity at the end of the proof of Lemma 53.
II-F Fixed Points, Convergence, and BP Threshold
We say that the density is a fixed point (FP) of DE for the -regular ensemble and the channel if
More succinctly, when the underlying ensemble is understood from the context, we say that is a FP.
One way to generate a FP is to initialize with and to run DE, as stated in Definition 3. We call such a FP a FP of forward DE. The resulting FPs are the “natural” FPs since they have a natural operational meaning – if we pick sufficiently long ensembles, these are the FPs which we can observe in simulations when we run the BP decoder.
We say that a sequence of distributions converges weakly to a limit distribution if for the corresponding cumulative distributions in the -domain, call them , for all bounded and continuous functions on $$ we have
An equivalent definition is that converges to at points of continuity of ∎
A simple proof of the following lemma can be found at the end of Section II-I.
In other words, the BP threshold is characterized by the largest channel parameter so that the forward DE FP is trivial.
We have just seen that the FPs of forward DE are important since they characterize the BP threshold. But there exist FPs that cannot be achieved this way. Let us review a general method of constructing FPs. Assume that, given a channel family , we need a FP which has a given error probability , entropy , or Battacharyya parameter . Such FPs can often be constructed, or at least their existence can be guaranteed, by a procedure introduced in . Let us recall this procedure for the case of fixed entropy.
Consider a smooth, complete, and ordered family and the -regular ensemble. Let us denote by the ordinary density evolution operator at fixed channel . Formally,
For any , we define the density evolution operator at fixed entropy , call it , as
where is the solution of . Whenever no such value of exists, is left undefined. Since, for a given , the family is ordered by degradation, is a non-decreasing function of . As a consequence the equation cannot have more than a single solution. Furthermore, by the smoothness of the channel family , is continuous as a function of . Notice that : if the channel is noiseless the output density at a variable nodes is noiseless as well. Therefore, a necessary and sufficient condition for a solution to exist (when the family is complete) is that (see Theorem 6 in ).
II-G BP Threshold for Large Degrees
What happens to the BP threshold when we fix the design rate and increase the degrees? The proof of the following lemma, which uses basic extremes of information combining arguments, can be found in Appendix B.
Consider transmission over an ordered and complete family of BMS channels using an -regular dd and BP decoding. Let be the design rate and let denote the BP threshold. Then,
In particular, by increasing while keeping the rate fixed, the BP threshold converges to .
II-H The Wasserstein Metric: Definition and Basic Properties
In the sequel we will often need to measure how close various distributions are. Sometimes it is convenient to compare their entropy or their Battacharyya constant. But sometimes a more general measure is required. The Wasserstein metric is our measure of choice.
Let and denote two -distributions. The Wasserstein metric, denoted by , is defined as
where denotes the class of Lipschitz continuous functions on $1$. ∎
Discussion: In the sequel we will say that a function is as a shorthand to mean that it is Lipschitz continuous with constant . If we want to emphasize the domain, then we write e.g., . Why have we defined the metric in the -domain? As the next lemma shows, convergence in this metric implies weak convergence. Since all the distributions of interest are symmetric, it suffices to look at the -domain instead of the -domain. To ease our notation, however, we will formally write expressions like , i.e., we will allow the arguments to be e.g. -distributions. It is then implied that the metric is determined using the equivalent -domain representations as defined above.
In the following, , , , and denote -distributions.
In the domain we have the following expressions for and (compare this to the expressions in the domain given in Section II-D),
where is the binary entropy function. See for more details on metrics for probability measures.
Boundedness: .
Metrizable and Weak Convergence: The Wasserstein metric induces the weak topology on the space of probability measures on $$. In other words, the space of probability measures under the weak topology is metrizable and convergence in the Wasserstein metric is equivalent to weak convergence (see [104, Theorem 6.9]).
Polish Space: The space of probability distributions on $\sum_{i=1}^{n}c_{i}\delta(x-x_{i})\sum_{i=1}^{n}c_{i}=1c_{i}\geq 0x_{i}\in$. Further, the space is compact. (See [104, Theorem 6.18].)
In general, if , then
Regularity wrt : The Wasserstein metric satisfies the regularity property , so that
and for and any distribution .
Regularity wrt : The Wasserstein metric satisfies the regularity property , so that
Regularity wrt DE: Let denote the DE operator for the dd and the channel . Then , with
Wasserstein Bounds Battacharyya and Entropy:
Battacharyya Sometimes Bounds Wasserstein:
Discussion: Perhaps the most useful property of the Wasserstein metric is that it interacts nicely with the operations of variable- and check-node convolution. This is the essence of properties (vi), (vii), and (viii). For example, it is easy to see why property (viii) might be useful: Given that two distributions and are close, it asserts that after one iteration of DE these two distributions are again close. Indeed, as we will see shortly, depending on the Battacharyya parameter of the starting distributions the distance might in fact become smaller, i.e., we might have a contraction.
II-I Wasserstein Metric and Degradation
When densities ordered by degradation, some the Wasserstein metric inherits some additional properties.
In the following and denote -distributions.
Wasserstein versus Degradation: Let . Let and denote the corresponding -domain cdfs. Define . Note that can be seen as a measure of how much is degraded wrt since it is the average of the non-negative integrals (cf. (2)). Then
Furthermore, and for any symmetric densities such that , .
Entropy and Battacharyya Bound Wasserstein Distance: Let . Then
and
Continuity for Ordered Families: Consider a smooth family of -distributions ordered by degradation so that is continuous wrt . Then the Wasserstein metric is also continuous in .
Discussion: Property (i) is particularly useful. Imagine a sequence of distributions ordered by degradation, i.e., . Then and we know from that is non-negative since it is the “average” of the non-negative integrals . Now note that is additive and that . From these two facts we can conclude that there must exist an index , , so that . More generally, we can conclude for any that there must exist an index , , so that . This follows by upper bounding the average of all these such distances. By property (i) this implies “closeness” also in the Wasserstein sense. In words, in a sequence of distributions ordered by degradation we are always able to find a subsequence of distributions which are “close” in the Wasserstein sense.
As an exercise in using the basic properties of the Wasserstein distance, let us give a proof of Lemma 8.
II-J GEXIT Curve
As we have discussed in the preceding section, FPs of DE play a crucial role in the asymptotic analysis. E.g., the BP threshold is characterized by the existence/non-existence of a non-trivial FP of forward DE for a particular channel.
An even more powerful picture arises if instead of looking at a single FP at a time we visualize a whole collection of FPs. In order to visualize many FPs at the same time it is convenient to project them. E.g., given the FP pair we might decide to plot the point in the two-dimensional unit box .
Note that for the BEC, erasure probability is equal to Battacharyya parameter, and also equal to entropy. Even though all these parameters are equal in this case, our language will reflect that we are plotting entropy.
Rather than plotting itself it is convenient to plot the EXIT value . This is the locally best estimate of a bit based on the internal messages only, excluding the direct observation. For this choice the resulting curve is usually called the BP EXIT curve, see and [62, Sections 3.14 and 4.10]. It is the BP EXIT curve since the estimate is a BP estimate. And it is the BP EXIT (where the E stands for “extrinsic”) curve since the estimate excludes the received value associated to this bit.
The FP equation is , which we can solve for to get
Using (7) we can write down the parametric characterization of the BP EXIT curve
This curve is shown in the left-hand side in Figure 1 for the -regular ensemble and has a typical shape. In fact, one can show that, in this case, for (the BP threshold) there is only one FP at corresponding to perfect decoding; for there are 2 FPs, one is at and the other is the FP corresponding to forward DE; and for there are exactly 3 FPs of DE, one of the FPs is at and the remaining two FPs are strictly positive, one of which is stable, denoted by , whereas the other is unstable, denoted by . The stable FP is the FP which is reached by forward DE. For details see Lemma 59.
A quantity which will appear throughout this paper is the value of the unstable FP when transmitting over BEC. We denote this FP by . More precisely, is the smaller non-zero solution of . Note that depends on the degrees, but we drop it from the notation for ease of exposition.
Discussion: The above example raises the following two questions. (1) We have a large degree of freedom in selecting the projection operator. Which one is “best”? (2) From the above example we see that the set of FPs forms a smooth curve. Indeed, for the BEC it is not hard to see that the only FPs are the ones on the curve together with all the FPs of the form , where is any element of the family of BEC channels and corresponds to erasure value of 0. Is this picture still valid for general channel families?
In the remainder of this section we address the first question, i.e., we will discuss a particularly effective choice of the projection operator. In the next section we will address the question of the existence and nature of this curve for the general case, presenting some partial results.
A good choice for the projection operator for general channels is the GEXIT functional . For the BEC this coincides with the EXIT functional that we saw in Example 15. For the general case take a FP and define . Then
where we think of as fixed with respect to . In words, measures the ratio of the change in entropy of (the entropy of the decision of any variable node under BP decoding) versus the change of entropy of the channel as a function of .
Discussion: Note that if the parameterization in is Lipschitz, i.e., if for some positive constant , , then the derivative exists almost everywhere. Further, in this case also is Lipschitz and hence differentiable almost everywhere. This follows since by (the Duality Rule in) Lemma 6, for ,
where the last step on the right-hand side assumes that the parameterization is such that increases in . The claim follows since both terms on the left are non-negative (due to degradation), so that in particular the first term is upper bounded by , i.e., it is Lipschitz. This formulation also shows that the numerator is no larger than the denominator (so that the ratio exists) and that the GEXIT value is upper bounded by (and is non-negative).
We get the GEXIT curve by plotting for a family of FPs . This is shown in Figure 2 for the -regular ensemble assuming that transmission takes place over the BAWGNC. In the last section we have already explained how we can construct in the general case FPs by a numerical procedure. To plot Figure 2 we have used this procedure to get a complete family of FPs for all entropies from to . In each of the two pictures of Figure 2 there is a small black dot. This dot marks a particular FP and the two small inlets show the corresponding distribution of the channel as well as the message distribution emitted at the variable nodes, call it . For a detailed discussion we refer the reader to .
Why do we use this particular representation? As we will discuss in detail in Section II-L, assuming this curve indeed exists and is “smooth”, the area which is enclosed by it is equal to , the design rate of the ensemble.
This is easy to see for the BEC. To simplify notation, denote the GEXIT value in this case by , where is the erasure probability, is the FP for this channel parameter, and . We then have . Let us integrate the area which is enclosed by this curve. We call the corresponding integral the GEXIT integral. For our particular case it is given by
Perhaps surprisingly, the result stays valid for general channels as we will discuss in Section II-L. This property is one of the main ingredients in our proof.
Note that given and , the GEXIT functional can be expressed in the form , where is called as the GEXIT kernel. In the -domain this kernel is given by
For a proof of the following see Lemma 4.77, .
For a smooth, ordered, channel family , , as a function of , exists, is continuous, non-negative, non-increasing and concave on its entire domain. Further and .
We remark that the above lemma also holds when is piece-wise linear.
II-K Existence of GEXIT Curve
As we briefly discussed above, for the BEC it is trivial to see that the BP GEXIT curve indeed exists. But for general BMS channels this is not immediate. The aim of this section is to show the existence of the BP GEXIT curve for at least a subset of parameters.
Let us first recall the following lemma which was stated and proved in a slightly weaker form in . For the convenience of the reader we reproduce the proof in Appendix E.
Assume that communication takes place over an ordered and complete family , where , using the dd pair .
Then, for any , there exists at most one density so that forms a FP which fulfills
Furthermore, if such a density exists, then it coincides with the forward DE FP. Finally, is Lipschitz continuous with respect to . More precisely, if two FPs and satisfy the condition for some , then
The following lemma states that, at least for sufficiently large entropies, the BP GEXIT curve indeed exists and is well behaved.
Assume that communication takes place over an ordered and complete family , where , using the dd pair . Consider the set of FP pairs obtained by applying forward DE to each channel . Let
Although it is easy and stable to compute the above lower bound on the entropy numerically, it will be convenient to have a universal and analytic such lower bound. This is accomplished in the following lemma, whose proof can be found in Appendix E.
Assume that communication takes place over an ordered and complete family , where , using the dd pair with and . Let be defined as in Lemma 18. Consider the set of FP pairs which is derived by applying forward DE to each channel . Then the GEXIT curve associated to , where
Table I lists these universal upper bounds for all the dds.
The following corollary follows immediately from Lemma 17, property (ii) of Lemma 14, and property (ix) of Lemma 13.
The proof of the following lemma can be found in Appendix F.
Bound for Partially Degraded Case: Let be degraded with respect to the channel density and let be such that . Then
Bound for Fully Degraded Case: Let be degraded with respect to the channel density and let be degraded with respect to the channel density Then
We will find it more convenient to parameterize the densities using Let us define
We claim that is continuous in both its arguments. Note that and, correspondingly, we define To show continuity of in the first component note that by the smooth channel family assumption. To show continuity of in the second component consider By (the Entropy Product Inequality) Lemma 21, property (iii), we have
showing that is actually Lipschitz in its second argument. It follows, in particular, that is continuous in . Since the Battacharyya parameter is a bounded functional and the channel family is smooth, we have is continuous in . Consequently, is continuous in . ∎
II-L Area Theorem
In Section II-J we introduced the GEXIT curve associated to a regular ensemble, see e.g. Figure 2. In Section II-K we then derived conditions which guarantee that this curve indeed exists and is continuous in a given region. We will now discuss the GEXIT integral, the area under the GEXIT curve. In order to derive some properties of this integral, we will first introduce GEXIT integrals in a slightly more general form before we apply them to ensembles.
Given two families and , the GEXIT integral is defined as
Discussion: In the above definition, and some definitions below, we need regularity conditions to ensure that the integrals exist. Rather than stating some general conditions here, we will discuss and verify them in the specific cases. E.g., one case we will discuss is if the channel family is smooth and is a polynomial in with “coefficients” which are fixed densities.
Consider a binary linear code of length whose graphical representation is a tree. Assume that we are given an ordered family of channels . Assume that when all variable nodes “see” the channel the distribution of the resulting extrinsic BP message density at the -th variable node is . Then the GEXIT integral associated of the -th variable node is . ∎
Discussion: Note that the distribution is the best guess we can make about bit given the code constraints and all observations except the direct observation on bit . This is why we have called the distribution the extrinsic message density. Note further that we have assumed that the graphical structure of the code is a tree. Therefore, BP equals MAP, the optimal such estimator.
The GEXIT integral applied to an ensemble is just the integral under the GEXIT curve of this ensemble.
Consider the -regular ensemble and assume that is a family of FPs of DE. Define . Then
In the sequel it will be handy to explicitly evaluate the integral. The proof of the following lemma is contained in Appendix G.
Discussion: Note that this GEXIT integral has a simple graphical interpretation; it is the area under the GEXIT curve as e.g. shown in the right-hand picture of Figure 1. The condition ensures that this curve is well defined and integrable.
We have seen in the last section that the value of a GEXIT integral of an ensemble is determined by the expression . We will soon see that it is crucial to describe the region where is negative. The following lemma, whose proof can be found in Appendix H, gives a characterization of this property.
Let be an approximate FP of the -regular ensemble of design rate . Assume that and for some fixed , . Let
For , if , then .
Discussion: In words, for sufficiently high degrees, is strictly negative for all with entropies in the range . Note that corresponds to the Shannon threshold for a code of rate . In the preceding lemma we introduced the notion of an approximate FP of DE: we say that is a -approximate FP if for some we have .
II-M Area Threshold
The most important goal of this paper is to show that suitable coupled ensembles achieve the capacity. The preceding (Negativity) Lemma 27 is an important tool for this purpose. But we will in fact prove a refined statement, namely we will determine the threshold for fixed dds. This threshold is the so-called area threshold and it was first introduced in in the context of the Maxwell construction.
Consider the -regular ensemble and transmission over a complete and ordered channel family . For each , let be the forward DE FP associated to channel . The area threshold, denote it by , is defined as
where is equal to , which is given in Lemma 26, evaluated at the FP , when transmitting with the -regular ensemble. ∎
Table II gives some values for for various dds and channels.
Recall that the GEXIT integral has a simple graphical interpretation – it is the area under the GEXIT curve, assuming of course that both the curve and the integral exist. The area threshold is therefore that channel parameter such that the GEXIT integral from to is equal to , the design rate.
Consider e.g. the case of the -regular dd depicted in Figure 3.
From Lemma 19 we know that the GEXIT curve is Lipschitz continuous at least in the range . An explicit check shows that , so that . We know that for the expression corresponds to the area under this GEXIT curve between and . This expression is therefore a decreasing function in , or equivalently, is an increasing function in . Using bisection, we can therefore efficiently find the area threshold and we get . Note that for this case the area threshold has the interpretation as that unique channel parameter so that the enclosed area under the GEXIT curve between and is equal to . This is obviously the reason for calling the area threshold.
The same interpretation applies to any dd ( and any BMS channel where the area threshold is such that the GEXIT curve from up till exists and is integrable. Empirically this is true for all regular dds and all BMS channels. Consider e.g. the case of the ensemble and transmission over the BAWGNC, see Figure 4. From Table I we are assured that this curve exists and is smooth at least in the range . This region is unfortunately too small. But it is easy to compute the curve numerically over the whole range. Since the resulting curve is smooth everywhere, it is easy to compute the area threshold numerically in this way. We get .
Fortunately, if we fix the rate then for all dd of sufficiently high degree this interpretation applies.
Consider a sequence of -regular ensembles of fixed design rate and with tending to infinity.
Furthermore, and, for fixed rate and increasing degrees, the sequence of the area thresholds converges to the Shannon threshold universally over the whole class of BMS channel families.
Let us prove the last claim of the lemma. We want to show that at the area threshold . Recall that the area threshold was defined as the supremum over all so that is less than or equal to zero. Therefore, all we need to show is that is continuous as a function of around .
III Coupled Systems
Our goal is to show that coupled ensembles can achieve capacity on general BMS channels. Let us recall the definition of an ensemble which is particularly suited for the purpose of analysis. We call it the ensemble. This is the ensemble we use throughout the paper. For a quick historical review on some of the many variants see Section I-B.
For the whole paper we will always be interested in the limit when tends to infinity while as well as and stay fixed. In this limit we can analyze the system via density evolution, simplifying our task.
Not surprisingly, spatially coupled ensembles inherit many of their properties from the underlying ensemble. Perhaps most importantly, the local connectivity is the same. Further, the design rate of the coupled ensemble is close to that of the original one. A proof of the following lemma can be found in .
The design rate of the ensemble , with , is given by
There is an entirely equivalent way of describing a spatially coupled ensemble in terms of a circular construction. This construction has the advantage that it is completely symmetric. This simplifies some of the ensuing proofs.
Given an ensemble we can associate to it a circular ensemble. This circular ensemble has extra sections, all of whose variable nodes are set to zero. To be concrete, we assume that the sections are numbered from , where the sections in are the sections of the original ensemble and the sections in are the extra sections. In this new circular ensemble all index calculations (for the connections) are done modulo and indices are mapped to the range . For all positions in the range the channel is , and consequently, . For all “regular” positions the associated channel is the standard channel . This circular ensemble has design rate equal to . ∎
As we will see, it is the global structure which helps all the individual codes to perform so well – individually they can only achieve their BP threshold, but together they reach their MAP performance.
III-B Density Evolution for Coupled Ensemble
Let us describe the DE equations for the ensemble. In the sequel, densities are -densities. Let denote the channel and let denote the density which is emitted by variable nodes at position . Throughout the paper, denotes an -density with all its mass at and represents the perfect decoding density. Also, denotes an -density with all its mass at and represents a density with no information.
Note that , where the right-hand side represents DE (without the effect of the channel) for the underlying -regular ensemble. Also define
As before we see that denotes the EXIT value of DE for the underlying -regular ensemble. It is not hard to see that both as well as are monotone wrt degradation in all their arguments , . More precisely, if we degrade any of the densities , , then (respectively ) is degraded. We say that (respectively ) is monotone in its arguments. ∎
Fix the parameters and and assume that , . Then
Using once again property (v) of Lemma 13
III-C Fixed Points and Admissible Schedules
Consider DE for the ensemble. Let . We call the constellation (of -densities). We say that forms a FP of DE with channel if fulfills (12) for . As a short hand we say that is a FP. We say that is a non-trivial FP if for at least one . Again, for , . ∎
III-D Entropy, Error and Battacharyya Functionals for Coupled Ensemble
Let be a constellation. Let denote either the (entropy), (error probability), or (Battacharyya) functional defined in Section II-D.
We define the (normalized) entropy , error and Battacharyya functionals of the constellation to be
III-E BP GEXIT Curve for Coupled Ensemble
We now come to a key object, the BP GEXIT curve for the coupled ensemble. We have discussed how to compute BP GEXIT curves for uncoupled ensembles in Section III-E. For coupled ensembles the procedure is similar.
In Section III-C we have seen that for coupled systems FPs of forward DE are well defined and that they can be computed by applying a parallel schedule. This procedure allows us to compute some FPs.
But we can also use DE at fixed entropy, as discussed in Section II, to compute further FPs (in particular unstable ones). More, precisely, fix the desired average entropy of the constellation, call it . Start with the initialization , the vector of all . In each iteration proceed as follows. Perform one round of DE without incorporating the channel, i.e., set
Now find a channel , assuming it exists, so that after the convolution with this channel the average entropy of the constellation is equal to . Continue this procedure until the constellation has converged (under some suitable metric).
Assume that we have computed (via the above procedure) a complete family of FPs of DE, i.e., a family so that for each , there exists a parameter so that . Then we can derive from it a BP GEXIT curve by projecting it onto
where was introduced in Section III-B, and is the (normalized) GEXIT function of the constellation .
Figure 5 shows the result of this numerical computation when transmission takes place over the BAWGNC (left-hand side) and the BSC (right-hand side). Note that the resulting curves look similar to the curves when transmission takes place over the BEC, see . For small values of the curves are far to the right due to the significant rate loss that is incurred at the boundary. For around and above, the BP threshold of each ensemble is close to the area threshold of the underlying -regular ensemble, namely for the BAWGNC and for the BSC (see the values in Table II). The picture suggests that the threshold saturation effect which was shown analytically to hold for the BEC in also occurs for general BMS channels.
The aim of this paper is to prove rigorously that the situation is indeed as indicated in Figure 5, i.e., that the BP threshold of coupled ensembles is essentially equal to the area threshold of the underlying uncoupled ensemble.
III-F Review for the BEC
Let us briefly recall the main result of which deals with transmission over the BEC. Let and denote the BP threshold and the MAP threshold of the ensemble. Also, let denote the MAP threshold of the underlying -regular LDPC ensemble. Then the main result of states that
Also, (see ) as , with the ratio fixed, . Thus, with increasing degrees, ensembles under BP decoding achieve the Shannon capacity for the BEC.
III-G First Result
Before we state and prove our main result (namely that coupled codes can achieve capacity also for general BMS channels), let us first quickly discuss a simple argument which shows that spatial coupling of codes does have a non-trivial effect.
First consider the uncoupled case. We have seen in Lemma 11 that when we fix the design rate and increase the degrees the BP threshold converges to . What happens if we couple such ensembles? We know that for the BEC such ensembles achieve capacity. The next lemma asserts that this implies a non-trivial BP threshold also for general BMS channels.
Consider transmission over an ordered and complete family of BMS channels using a ensemble and BP decoding.
Let denote the corresponding BP threshold and let denote the corresponding BP threshold for transmission over the BEC. Then
Consider DE of the coupled ensemble (cf. (12)). Applying the Battacharyya functional, we get
where we use the multiplicative property of the Battacharyya functional at the variable node side.
Using the linearity of the Battacharyya functional and extremes of information combining bounds for the check node convolution ([62, Chapter 4]) we get
The preceding set of equations is formally equivalent to the DE equations for the same spatially coupled ensemble and the BEC. Therefore, if then the DE recursions, initialized with must converge to , which implies (13).
Further, from we know that for sufficiently large degrees , with their ratio fixed, and with sufficiently large, approaches arbitrarily closely (see the discussion in the preceding section), which proves the final claim. ∎
Let us specialize to the case of transmission over the BSC using -regular ensemble. Then we have . Using the above argument and solving for in we conclude that by a proper choice of and we can transmit reliably at least up to an error probability of .
Combining the above result with Lemma 4 we conclude that the BP threshold of the coupled ensemble is at least . In summary, for general BMS channels and regular ensembles of fixed rate and increasing degrees, their uncoupled BP threshold tends to but their coupled BP threshold is lower bounded by a non-zero value. We conclude that coupling changes the performance in a fundamental way. In the rest of the paper we will strengthen the above result by showing that this non-zero value is in fact the area threshold of the underlying ensemble and as degrees become large, this will tend to the Shannon threshold, .
IV Main Results
In the sequel we will impose some restrictions on the parameters. Rather than repeating these restrictions in each statement, we collect them once and for all and give them a name.
Fix the design rate of the uncoupled system. We say that the parameters and are admissible if the following conditions are fulfilled with :
, ,
,
,
,
We say that the ensemble is admissible if the parameters and are admissible. If we are only concerned about the conditions on , then we will say that is admissible. ∎
Discussion: Conditions (i), (ii) and (iii) are fulfilled if we take the degrees sufficiently large. Conditions (iv), (v), and (vi) can all be fulfilled by picking a sufficiently large connection width .
Why do we impose these conditions? At several places we use simple extremes of information combining bounds and these bounds are loose and require, for the proof to work, the above conditions. We believe that with sufficient effort these bounds can be tightened and so the restrictions on the degrees can be removed or at least significantly loosened. We leave this as an interesting open problem.
Numerical experiments indicated that for any and the threshold saturation phenomenon happens, with a “wiggle-size” which vanishes exponentially in .
Note that the above bounds imply the following bounds which we will need at various places:
,
.
Instead of condition (iii) above we can impose the stronger but somewhat easier to check condition , where is the upper bound stated in Lemma 19, or even further strengthen the condition to . The last condition can be easily checked to be satisfied for sufficiently large degrees.
IV-B Main Result
Consider transmission over a complete, smooth, and ordered family of BMS channels, denote it by , using the admissible ensemble . Let and denote the corresponding BP and MAP threshold. Further, let denote the design rate of this ensemble and set . Finally, let denote the area threshold of the underlying -regular ensemble and the given channel family. Then
where f(d_{l},d_{r},w)=8(d_{r}-1)^{3}\big{(}\sqrt{2}+\frac{2}{\ln 2}d_{l}(d_{r}-1)\big{)}\sqrt{\frac{2(d_{l}-1)(d_{r}-1)}{w}}. Note that depends only on the dd and but is universal wrt the channel family . Furthermore,
The bound is trivial and only listed for completeness. Consider the upper bound on stated in (17). Start with the circular ensemble stated in Definition 31. The original ensemble is recovered by setting the consecutive positions in to . Define . We first provide a lower bound on the conditional entropy for the circular ensemble when transmitting over a BMS channel with entropy . We then show that setting sections to does not significantly decrease this entropy. Overall this gives an upper bound on the MAP threshold of the coupled ensemble in terms of the area threshold of the underlying ensemble.
It is not hard to see that the BP GEXIT curve is the same for both the -regular ensemble and the circular ensemble (when all sections have the standard channel). Indeed, forward DE (see Definition 35) converges to the same FP for both ensembles. Consider the circular ensemble and let . The conditional entropy when transmitting over the BMS channel with entropy is at least equal to minus the area under the BP EXIT curve of (see Theorem 3.120 in ). Indeed, from the proof of Theorem 4.172 in , we have
Here, the entropy is normalized by , where is the length of the circular ensemble and denotes the number of variable nodes per section. Assume that we set consecutive sections of the circular ensemble to in order to recover the original ensemble. As a consequence, we “remove” an entropy (degrees of freedom) of at most from the circular system. The remaining entropy is therefore positive (and hence we are above the MAP threshold of the coupled ensemble) as long as . From Lemmas 26 and 29 we have , so that the condition becomes . For all channels with we have . For a derivation of this statement we refer the reader to the proof of part (vi) of Theorem 47. This implies that . Furthermore, . This follows from the definition of area threshold, which implies that for , (cf. Lemma 26) and then combining with Lemma 26. Putting things together we get
We get the stated condition on by lower bounding by .
The lower bound on expressed in (16) is the main result of this paper. It shows that, up to a term which tends to zero when tends to infinity, the BP threshold of the coupled ensemble is at least as large as the area threshold of the underlying ensemble.
Empirical evidence suggests that the convergence speed wrt is exponential. Our bound only guarantees a convergence speed of order .
Let us summarize. In order to prove Theorem 41 we “only” have to prove the lower bound on . Not surprisingly, this is also the most difficult to accomplish. The remainder of this paper is dedicated to this task.
IV-C Extensions
In Theorem 41 we start with a smooth, complete and ordered channel family. But it is straightforward to convert this theorem and to apply it directly to single channels or to a collection of channels. The next statement makes this precise.
The ensemble is universally capacity achieving for the class of BMS channels. More precisely, assume we are given and a target rate . Let denote the set of BMS channels of capacity at least . To each associate the family , by defining
Then there exists a set of parameters so that
Since for each the associated family is ordered by degradation, this implies that we can transmit with this ensemble reliably over each of the channels in at a rate of at least , i.e., arbitrarily close to the Shannon limit.
Fix the ratio of the degrees so that . Note that for each the constructed family is piece-wise smooth, ordered and complete. By applying Theorem 41 to each such channel family we conclude that for admissible parameters (i.e., as long as we choose the degrees and the connection width sufficiently large) the threshold of the ensemble for the given channel family is at least , where is the area threshold and is a universal quantity, i.e., a quantity which does not depend on the channel family and which converges to when tends to infinity. Further, we know from Lemma 29 that the area threshold approaches the Shannon threshold uniformly over all BMS channels for increasing degrees. By our choice of the Shannon threshold is . Therefore, by first choosing sufficiently large degrees , and then a sufficiently large connection width , we can ensure that the BP threshold is at least . Finally, by choosing the constellation length sufficiently large, we can ensure that the rate loss we incur with respect to the design rate the underlying ensemble is sufficiently small so that the design rate of the coupled ensemble is at least . ∎
Assume we are given and a target rate . Let denote the set of BMS channels of capacity at least . Then there exists a set of parameters of rate at least with the following property. Let be an element of with blocklength , where we assume that only goes over the subsequence of admissible values. Then
In words, almost all codes in of sufficient length are good for all channels in .
Note that according to (iv) in Lemma 13 the space of distributions endowed with the Wasserstein metric is compact, and hence so is . Hence there exists a finite set of channels, denote it by , so that each channel in is within a (Wasserstein) distance at most from the set . We will fix the value of shortly.
Let us modify the set so that is not only close to but is also “dominated” by it. For each , define
IV-D Proof of Main Result – Theorem 41
We start by proving some basic properties which any spatial FP has to fulfill. Since we are considering a symmetric ensemble (in terms of the spatial arrangement) it will be useful to consider “one-sided” FPs.
We say that is a one-sided FP (of DE) with channel if (12) is fulfilled for with for . We say that the FP has a free boundary condition if for . We say that it has a forced boundary condition if for . Lastly, we say that it has an increasing boundary condition if for , where , for , are fixed but arbitrary symmetric densities. ∎
We say that is non-decreasing if for . Let be a non-trivial and non-decreasing one-sided FP (with any boundary condition). As a short hand, we then say that is a proper one-sided FP. Figure 6 shows an example. ∎
Similar to Definition 35, one can define one-sided forward DE by initializing all sections with and by applying DE according to an admissible schedule. ∎
There are two key ingredients of the proof. The first ingredient is to show that any one-sided spatial FP which is increasing, “small” on the left, and “not too small” and “flat” on the right must have a channel parameter very close to the area threshold . This is made precise in (the Saturation) Theorem 47.
The second key ingredient is to show the existence of a such a one-sided FP . Figure IV-D shows a typical (two-sided) such example. This is accomplished in (the Existence) Theorem 48.
Fix and let be admissible, with , in the sense of conditions (ii), (iii), (v), (vi), (vii) and (viii) of Definition 40. Let be a proper one-sided FP on , with forced boundary condition, so that for some , , and the following conditions hold.
Constellation is close to “on the left”:
Constellation is not too small “on the right”:
Here is a function which can be made arbitrarily small by choosing sufficiently small, sufficiently large, and and sufficiently large compared to . (This implies of course that the constellation length is also chosen sufficiently large.) More precisely,
Fix and let be admissible in the sense of conditions (i), (ii), (iii), (iv), (v), (vi) in Definition 40 with . Let be a smooth, ordered and complete channel family.
In the sequel, is a positive constant which depends on the ensemble but not the channel or the channel family and is a positive constant which depends on and , but not on the channel , the channel family, or .
For any and , there exists a proper one-sided FP on with parameters and with forced boundary condition so that the following conditions are fulfilled:
Constellation is close to “on the left”: Let
Then for .
Constellation is not too small “on the right”: Let
Then for .
Maxwell Conjecture: As a byproduct of our proof, we know that the MAP threshold of coupled ensembles is essentially equal to the area threshold of the uncoupled ensemble. In addition we know that the MAP threshold of the uncoupled ensemble is also upper bounded by the area threshold. The Maxwell conjecture states that in fact the MAP threshold of the uncoupled ensemble is equal to the area threshold. So if one can establish that the MAP threshold of the uncoupled ensemble is at least as large as the MAP threshold of the coupled ensemble, then the Maxwell conjectured would be proved. A natural approach to resolve this issue is to use interpolation techniques and it is likely that the Maxwell conjecture can be proved in a way similar as this was done in for other graphical models.
Convergence Speed: As discussed previously, we only give weak bounds on the speed of convergence of the ensemble to the Shannon capacity (as a function of the degrees, the constellation length , as well as the coupling width ). Numerical evidence suggests much stronger results. Settling the question of the actual convergence speed is both challenging and interesting.
Lifting of Restrictions: Our results apply only to sufficiently large degrees whereas numerical calculations indicate that the threshold saturation effect equally shows up for small degrees. This is a consequence of the fact that at many places we have used simple extremes of information combining bounds. With sufficient effort it is likely that one can extend the proof to many dds which are currently not covered by our statement.
General Ensembles: In a similar vein, we restricted our investigation to regular ensembles to keep things simple, but the same technique applies in principle also to irregular or even structured ensembles. Again, depending on the structure of the underlying ensemble, much effort might be required to derived the necessary bounds.
Wiggle Size: Perhaps the weakest link in our derivation is the treatment of the connection width . In our current statements this connection width has to be chosen large. Empirically, small such lengths, such as the extreme case give already excellent results and by increasing the convergence to the area threshold seems to happen exponentially fast. How to derive practically relevant bounds for such small values of is an important open problem.
Scaling: More generally, from a practical point of view, what is needed is a firm understanding of how the performance of such codes scale in each of the parameters in , , , , as well as . Only then will it be possible to design codes in a principled fashion.
Practical Issues: Further important topics are, the design of good termination schemes which mitigate the rate-loss, a systematic investigation of how structure in the interconnection pattern as well as the codes influences the performance, and how to optimally choose the scheduling (e.g., windowed decoding) to control the complexity of the decoder .
General Models: As was discussed briefly in the introduction, the threshold saturation phenomenon has been empirically found to hold in a large variety of systems. This suggests that one should be able to formulate a rather general theory rather than finding a separate proof for each of these cases. For all one-dimensional systems this has recently been accomplished in . For higher-dimensional or infinite-dimensional systems this is a challenging open problem.
We would like to thank H. Hassani, S. Korada, N. Macris, C. Méasson, and A. Montanari for interesting discussions on this topic and H. Hassani for his feedback on an early draft. S. Kudekar would like to thank Misha Chertkov, Cyril Méasson, Jason Johnson, René Pfitzner and Venkat Chandrasekaran for their encouragement and Bob Ecke for hosting him in the Center for Nonlinear Studies, Los Alamos National Laboratory (LANL), where most of his work was done. He also gratefully acknowledges his support from the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396 as well as from NMC via the NSF collaborative grant CCF-0829945 on “Harnessing Statistical Physics for Computing and Communications.” The work of R. Urbanke was supported by the European project STAMINA, 265496.
Let . Then for ,
Consider now (25). Set . We want to show that for . We have
The following claims are straightforward to verify using the explicit formulae for , , and : (i) , (ii) , (iii) , (iv) has exactly one solution in $$.
Suppose there exists a , , so that . Then from (i), (ii) and (iii) we must have for at least three distinct elements of $g^{\prime}(z)=0(0,1)g^{\prime}(0)=0g^{\prime\prime}(z)=0$, contradiction (iv).
We prove (26) along similar lines. Consider , where . Note that for (to verify this, upper bound the term of the entropy function by ). So if we can prove that for then we are done.
Direct inspections of the quantities shows that , , , where , and .
It follows that if there exists an so that then must have at least roots in this range, therefore by Rolle must have at least roots, and again by Rolle must have at least roots. But an explicit check shows that . So can only have a single solution. ∎
Proof of Lemma 4: Let denote the density in the -domain. Then
This proves that lower bounds . For the upper bound we have
Appendix B Upper Bound on BP Threshold – Lemma 11
We use ideas from extremes of information combining. We get an upper bound on the BP threshold by assuming that the densities at check nodes are from the BSC family and that densities at variable nodes are from the BEC family.
Let represent the entropy of the variable-to-check message and let denote the entropy of the channel. If for any
then DE will not converge to the perfect decoding FP. The left-hand side represents the minimum entropy at the output of a check node which we can get if the input entropy is (and this minimum is achieved if the input density is from the BSC family). The right-hand side represents the maximum input entropy which we can have at the input of a variable node if we want an output entropy equal to (and this minimum is achieved if the input density is from the BEC family). Note that we can extend the inequality (27) to all without changing the condition since for , the right hand side is strictly bigger than , whereas the left-hand side is always bounded above by .
The preceding condition is equivalent to saying that in order for DE to succeed, we must have
for all . We can also write this as
We want to show that cannot be too large, i.e., we are looking for an upper bound on . Note that any value of gives a bound. Let us choose . This gives the bound
To obtain the above inequality we first write as . For we use the Taylor expansion
Thus and . We want to simplify the expression even further. Using [110, Lemma II.1] and bringing out the first term in the summation,
Substituting we have
Appendix C Basic Properties of the Wasserstein Metric – Lemma 13
Alternative Definitions: The equivalence of the basic definition (cf. Definition 12) and the first alternative description is shown in (6.2) and (6.3) in . The equivalence of the first and second alternative descriptions is shown in .
Boundedness: Follows directly from either of the two alternative descriptions.
Metrizable and Weak Convergence: See [104, Theorem 6.9].
Let and be the -domain representation. Thus is characterized by
In step (i) we use the construction of along with the relation between and domains given by (29). We defined and step (ii) follows by explicitly writing the variable node convolution in the -domain. In step (iii) we defined
To obtain this equivalent formulation of the integral in step (iii) we make use of the symmetry conditions of -densities and the implied relationship between and densities for ,
We claim that is (as a function ). This will settle the proof of the lemma. Notice that is a linear combination of four functions. Let us consider a generic term. Writing explicitly, we have
Now we sum over all possible and divide by 4 to get
Let us consider the other term. We split the sum into two parts, one sum over and the other over . We have
Adding the two we get the total contribution
To get a good bound on in terms of for consider
and note that the Wasserstein metric can be expressed directly in the L-domain as
Applying this representation we observe that
Regularity wrt : Let be . Let and be the -domain representation.
where step (a) follows since in the -domain, check-node convolution corresponds to a multiplication of the values.
But note that if is then is . Hence,
Above, the relation between the Battacharyya and error parameters can be obtained via extremes of information combining (see ). Let us focus on the last part. To get a good bound on in terms of for , consider
and note that the Wasserstein metric can be expressed directly in the D-domain as
Applying this representation, we observe that
Regularity wrt DE: Follows from properties (vi) and (vii).
Wasserstein Bounds Battacharyya and Entropy: Let be a positive function on $fC^{2}.c\geq|g|_{\infty},$
Before proving the inequality let us use it to establish the stated bounds. Set Then and Now, for the Battacharyya bound let and note
For the entropy case we set The same argument as above yields
We prove the stated inequality. Let us define
where For each we have with equality at Hence,
Battacharyya Sometimes Bounds Wasserstein: Since the cumulative -distribution of is equal to on $$, the maximum possible value, we have
Similarly, since the cumulative -distribution of is on we have
Appendix D Wasserstein Metric and Degradation – Lemma 14
Wasserstein versus Degradation: Let be a function of bounded total variation on (This implies that has left and right limits.) Note that we include and in the definition of total variation, which we denote by Define We claim that if then
This claim implies statement (i) by setting and noting that, in this case,
We now prove the claim. Let be the set of points in including the endpoints, where Note that is closed and we may assume on The complement of is a collection of disjoint open intervals such that is either strictly positive or strictly negative in each interval. Consider the subset of intervals on which is strictly negative. Without loss of generality we may take this collection to be finite. Indeed, suppose there are countably infinitely many such intervals Define an approximation by setting for and otherwise. Then and uniformly. Furthermore, and converges to from below.
By taking unions of intervals as necessary we can find an increasing sequence such that on we have for odd and for even. The sequence of points is strictly increasing except possibly for the last pair which may coincide at . Define
where if Note that We have
The desired result then follows from Jensen’s inequality
It is straightforward to show that for odd we have
where Indeed, for odd we have for all with equality at Hence
The argument for even is similar. We obtain
Defining for notational convenience, we can write
Entropy and Battacharyya Bound Wasserstein: Let us first focus on the inequality between the Wasserstein distance and the Battacharyya parameter. From point (i) we know that
For the final inequality first note that g(z)=(1-z^{2})^{-1}\Bigl{(}\int_{z}^{1}(|\mathfrak{{B}}|(x)-|\mathfrak{{A}}|(x))\text{d}x\Bigr{)}\leq 1\,. Let It follows that
Continuity for Ordered Families: Assume that . From point (ii) we know that
and the continuity follows from the continuity of the Battacharyya parameter for smooth channel families.
Consider two -densities . Then, for any degree distribution ,
Let be a density and let be distributed according to the corresponding -distribution. By Jensen’s inequality we have
where is positive for each . The functionals have the important (Fourier) property .We introduced here only the even moments, since only these are needed. The odd moments are multiplicative as well. Since is convex and increasing for we have . Hence,
Consider two -densities . Let and let and denote the two corresponding channels from an ordered family . Set for . Then, for any dd pair
where .
First, since , . Second, since and , for all . This implies that is upper bounded by . Using the triangle inequality, we get
The first term above can be bounded using Lemma 50. ∎
Assume on the other hand that satisfies (9) and that there exists a distinct FP for the same channel, necessarily upgraded with respect to , also satisfying (9). Call this density . In this case,
a contradiction since . The above argument shows that there can be at most one FP with this property and that this FP must be the forward DE one.
Note that and that is continuous.
The induction is anchored by noting that since we assumed that . In summary, for each , equation (34) gives us the lower bound for the FP of forward DE with the channel . Another way of interpreting (34) is that it gives us an upper bound on if we fix .
According to Lemma 17, the GEXIT curve is Lipschitz continuous (in the Battacharyya parameter) at the FP if
Note that (34) as well as (35) (if we interpret the inequality as an equality) give rise to curves in the space. Inserting (34) into (35) gives us the points where these two curves cross. If we set , massage the resulting expression, and set it to , we get (11). As shown in the subsequent Lemma 52, (11) has a unique positive solution in (i.e., the two curves only cross once), after this solution, and is an increasing function above this solution. The situation is shown in Figure 8.
Set and , multiply the equation by and set . This gives the equivalent equation , where , and .
The function is (i) decreasing and convex for , (ii) , (iii) . The function is (i) increasing for , (ii) decreasing for , (iii) concave for , and (iv) . Note that since we assumed that .
In the region, there can be no further solution since , , and is convex whereas is concave.
and for . The factor can be written as , where . This polynomial has two sign changes and hence by Descarte’s rule of signs at most two positive roots. It follows that has at most roots for . Since and , there must be exactly one root of in and once the function is positive, it stays so within $q(\hat{x})\geq 0\hat{x}(1-\hat{x})^{d_{r}-2}=\frac{1}{(d_{l}-1)(d_{r}-1)\sqrt{\hat{x}}}q(\hat{x})=r(z)\,|\,_{z=\hat{x}}$, where
A quick check shows that for . The proof will be complete if we can show that . We do this in two steps. We claim that , where , and that . The second claim is immediate. To see the first,
This shows that that the maximum of in $1\hat{x}b(x)\hat{x}x\inb(\hat{x})=1\hat{x}\geq\breve{x}$, as claimed. ∎
Proof of Lemma 19: Let and be as defined in Lemma 18. We will provide an upper bound on the unique solution of . Notice that represents the DE equations for a BEC with parameter . Therefore, we know that for , . We claim that and intersect only at one point in . Indeed , , is equivalent to
Since , whereas , we conclude that for , .
We further claim that . Let us assume this for a moment. Then we have for . We conclude that the unique solution of in is upper bounded by .
We finish the lemma by proving . Indeed, since , all we need to show is that Recall that for the BEC(1), the DE equation is given by . Furthermore, there are 3 FPs namely, 0, (unstable) and (stable). Finally, we have that if and only if or . See Chapter 3 in for more details.. For one can verify the validity of the claim directly. In general, we have
where the last inequality follows since .
The Battacharyya parameter of the channel is thus upper bounded by . Using the upper bound on the entropy in Lemma 4, we get the claimed bound.
It remains to show that this bound converges to when we fix the rate and let the dds tend to infinity. To simplify our notation, let and . We have
where (a) is obtained by using the following sequence of inequalities,
Appendix F Entropy Product Inequality – Lemma 21
with the kernel as given in the statement. Differentiating, we have
Integrating by parts twice for each dimension, we see that
This proves the alternative representation of this integral.
Note that the bound (i) is implied by Let and Then the desired inequality is equivalent to for Raising both sides to the power of this becomes Multiplying both sides by this can be written as which is equivalent to proving the claim.
where to obtain the second inequality we combine the upper bound on derived above with the alternative representation of as given in (ii).
Appendix G Evaluation of GEXIT Integral – Lemma 26
For the proof of Lemma 26 it will be handy to have the following two lemmas available.
Consider a single-parity check code of length . Let denote a codeword, chosen uniformly at random from this code. Let denote the result of passing the codeword through a BMS channel with density . Then
Let be uniform random bits and let denote their parity. Suppose is transmitted through the BMS channel with density . Let the received vector be .
The entropy of the single parity check code is By symmetry we have Now , but we also have . Thus, the entropy of the single parity check code is
Now consider the channel that transmits a bit once through the channel with density and again through a channel with density The entropy of the combined channel is This is equivalent to the single parity check code of two bits. Hence
which proves (the Duality Rule of) Lemma 6. ∎
Consider the -regular computation tree of height (see e.g., Figure 9). This tree represents a code of length containing codewords. Let be chosen uniformly at random from the set of codewords and let be the result of sending the components of through independent BMS channels. The root node goes through the BMS channel and all leaf nodes are passed through the BMS channel . Then,
Using the chain rule, rewrite as
where corresponds to the root variable node and is the set of all the leaf nodes. The first term is computed by density evolution by considering all the independent messages flowing from the leaf nodes into the root node. Indeed, we convolve the channel density with the densities coming from the check nodes, each of which has density . Thus we get
Indeed, when we condition on the root node to take either or , we split the code into codes, each of which is a single parity-check code of length . Using the previous Lemma 53, we obtain the above expressions. Combining the above statements proves the claim. For completeness, although the exact marginal does not factor into the computation, note that there are codewords in the code. Out of those, have a in the root node. So the marginal of is one-half. ∎
To evaluate the integral we consider the code corresponding to the -regular computation tree of height as in Lemma 54. Let be chosen uniformly at random from the set of codewords and assume that the component corresponding to the root node is sent through the channel , whereas all components corresponding to the leaf nodes are sent through the channel . Let be the received word. Since is, by assumption, a FP family, the density flowing from any check node into the root node is and so the total density seen by the variable node (excluding the observation of the variable node itself) is . Therefore, the GEXIT integral associated to the root of this tree code is the desired integral. We will evaluate this integral by first determining the sum of all the GEXIT integrals associated to this tree and then by subtracting from it the GEXIT integrals associated to the leaf nodes.
In the sequel we will perform manipulations, such as writing a total derivative as the sum of its partial derivatives or writing a function as the integral of its derivative. In a first pass we will assume that all these operations are well defined. In a second step we will then see how to justify these steps by approximating the desired integrals by a series of simple integrals.
Label the variable nodes of the tree with the set so that the root has label . Note that by assumption , so that the entropy of the first component of , call it , is . The entropy of the remaining components, call them , , are all equal and take on the value . So we imagine that all components are parameterized by .
The last inequality is obtained by using Lemma 54 for the two endpoints and recalling that we set .
Let us consider the leaf node contributions. By symmetry these contributions are all identical. If we focus on a single check node, then again due to symmetry, the GEXIT integrals of all leaf nodes is the same. But the sum of all the GEXIT integrals is equal to the change in entropy of a single-parity check code of length . Thus, using Lemma 53, we see that the integral of any single GEXIT integral is equal to
It remains to justify the previous derivation. We proceed as follows. Instead of working with , we will work with a simpler family which is piece-wise linear and “close” to the original family. Because it is piece-wise linear, the operations are simple to justify. Because it is “close” to the original family, the result is “close” to what we want to show. By taking a sequence of such families which approximate the original family closer and closer, we obtain the desired result.
where and is a suitable interpolation factor. In the last step we have made use of (v) in Lemma 13, the convexity property of the Wasserstein distance, and the fact that consecutive samples have an entropy difference of (at most) . Further, since they are ordered, i.e., , an entropy difference of at most implies a Wasserstein distance of at most (cf. (ii) of Lemma 14).
To each corresponds a FP , call it . Take the collection . Since this collection is ordered we can construct from it an ordered and piece-wise smooth family via a linear interpolation of consecutive samples in the same manner as we have done this for the channel family. We have
The last inequality above follows from considering the same steps as before, since the densities are ordered and each of them are FPs at channels with entropy difference at most . Recall that is a FP family, hence we can write
So the root integral is in fact well defined. The same argument can be repeated for the leaf integrals to show that they are also well defined.
If we consider one segment and add all the contributions (which as we saw can be written down explicitly) we can verify that the sum of all the GEXIT integrals is indeed equal to the difference of the entropy of the tree. This calculation is in principle straightforward but somewhat tedious, so we skip the details.
Note that for any we have
The only thing which remains to be done is to prove that the GEXIT integral of the root node when we use the linearized family converges to the true GEXIT integral when we let tend to . We will do this in several steps by considering the chain of integrals
,
,
and by showing that the value of consecutive such integrals is arbitrarily close. Here, is a family which is piece-wise constant on each segment, taking on the value of its left boundary.
Appendix H Negativity – Lemma 27
We prove Lemma 27 by showing the following slightly stronger statement.
Let be an -density and consider a degree-distribution such that . Define , and , where .
Assume that is a -approximate FP, i.e., , for some channel and . Then if , .
For , .
Set . Let us first characterize the area in a more convenient form. We have
For the -distributions and let and be the associated distributions. Following the lead of L. Boczkowski we write
In step (a) we have used the expansion of Lemma 49, where , . Note that and that . Most importantly, as mentioned in the proof of Lemma 50, the moments are multiplicative under . This implies that for , . E.g., for two distributions and we have
where in the first equality we use that in the -domain the check node operation is simply a multiplication.
Assume at first that and that for some channel . Define . Then
In (a) we used the bound so that . Consider step (b). Set . Then
In step (c) we substituted the upper and lower bounds on for the first and second expression respectively. Also, in the last inequality, we have since we assumed that .
For ,
In (a) we upper bound by , , and note that . In (b) we use (this is true since is decreasing for each fixed as a function of ) and that is increasing. Step (c) is a consequence of the bound . Let us prove this inequality. Equivalently, we want to show \text{H}(\mathsf{x})\leq h_{2}\Big{(}\frac{1-\sqrt{m_{\mathsf{x},1}}}{2}\Big{)}. By Jensen
The claim is proven by noticing that the lhs above is equal to .
Step (d) uses the following lower bound on . Set . From extremes of information combining we know that we get the lowest entropy if we assume that is a BSC density. Therefore,
Consider finally step (e). We know that . Combined with (26) and we conclude that . ∎
Appendix I Spacing of FPs –Lemma 57 and Transition Length of FPs – Lemma 61
If we are given a proper one-side FP (with any boundary condition) then consecutive elements of the FP cannot be too different from each other. This is made precise in the following lemma.
Let be a proper one-sided FP on , with any boundary condition.
Let denote the weighted average . Then, for any ,
Discussion: Each of these two claims states that consecutive distributions are “close” either wrt the Wasserstein distance or the Battacharyya parameter. Further, the difference is either for the distributions themselves or their averages.
To simplify notation, for fixed, let \mathsf{f}_{j}=\bigl{(}\frac{1}{w}\sum_{k=0}^{w-1}\mathsf{x}_{i+j-k-1}\bigr{)}^{\boxast d_{r}-1}. Writing the DE equations explicitly,
Note that the expressions for and are similar. The only difference is that contains whereas contains . Rewrite both expressions in the form
where , , , and . Now expand as well as in the form
where . Note that the terms in the expansions of and with are identical. Therefore, if we consider , these terms cancel. We can upper bound the difference by the Battacharyya constant of all those terms of the expansion of which correspond to , i.e.,
If we are interested in the Wasserstein distance instead, we can proceed in an almost identical fashion. The only difference is that in the last sequence of inequalities we use the convexity property (v) and the boundedness property (ii) of (the Wasserstein metric) Lemma 13.
Using the convexity property (v) of (the Wasserstein metric) Lemma 13 and canceling common terms, we get
The proof for the Battacharyya parameter proceeds in an identical fashion and uses the linearity of the Battacharyya parameter.
Let be a proper one-sided FP on , with any boundary condition. Let denote the Battacharyya parameter of the density of the -th section. Then for all ,
Since the Battacharyya parameter is multiplicative in and linear,
Further, recall from Lemma 5, property (iv), and the ensuing discussion, that , so that
Let , . Since , is convex. Let . Then by Jensen,
Consider the -regular ensemble with and let , where is the BP threshold the regular ensemble when transmitting over the BEC. Define .
For , has exactly three solutions, one of them being 0 and the other two denoted by and with . Further, for all and for all .
and ; for .
There exists a unique value so that , and there exists a unique value so that . Further, is decreasing in .
Let . The quantity is non-negative and depends only on the channel parameter and the degrees .
For , .
For , .
Let and denote the universal lower bounds, given in the previous part, on and , respectively. If we draw a line from with slope , then lies below this line for .
For we have
The function is the DE equation for the -regular ensemble when transmitting over the BEC. The two non-zero solutions, and represent the unstable and the stable FPs of DE . In the following, we will be using extremes of information combining techniques to relate the Battacharyya parameters via .
In Figure 6 we see that within a few sections the constellation changes from reliable sections (towards the boundary) to sections which all have more or less the same reliability. In other words, this transition happens quickly. This is made precise in the following lemma.
Let be the BP threshold for transmission over the BEC using the -regular (uncoupled) ensemble. For , let be the smaller of the two strictly positive roots of the equation , where . For , define .
Consider transmission over a BMS channel . Let be admissible in the sense of property (iv) of Definition 40. Let be a proper one-sided FP on with any boundary condition. Let denote the Battacharyya parameter of the density associated to the -th section and define .
Then, there exists a positive constant which depends on and , but not on or the channel , so that for any
Throughout the proof we set and we write for .
Note first that we have to prove the statement only for . This is true since we have defined to coincide with for and since further the function , which we use to bound the process, is strictly decreasing as a function of . Hence, in the sequel our language will reflect the fact that we have .
(i) The number of sections such that is at most . If then the number of sections in this part is 0. Hence wlog assume . Let be the smallest index so that . If then the claim is trivially fulfilled. Assume therefore that . From the monotonicity of and the fact that is increasing,
This is equivalent to . More generally, using the same line of reasoning, , as long as .
We summarize, the total distance we have to cover is and every sections we cover a distance of at least as long as we have not surpassed . Therefore, after sections we have either passed or we must be strictly closer to than . Hence, to cover the remaining distance we need at most extra sections. The total number of sections needed is therefore upper bounded by , which, in turn, is upper bounded by . The final claim follows by bounding with and by .
(ii) The number of sections such that is at most Let us define From Lemma 58, . Summing this inequality over all sections from to we get,
Writing in terms of the s and rearranging terms,
Without loss of generality we can assume that there exists a section so that (we know from point (i) that we must reach this point unless the constellation is too short, in which case the statement is trivially fulfilled). Consider sections , so that in addition . If no such exists then there are at most points in the interval , and the statement is correct a fortiori.
Our plan is to use (41) to lower bound . This means, we need a lower bound for . Since by assumption , it follows that , so that every contribution in the sum is positive (cf. Lemma 59 (i)). Further, by (the Spacing) Lemma 57, . Hence,
Let us explain how we obtain the last inequality. First we claim that there must exist a section with between and . Indeed, suppose on the contrary that this was not true. Let be the smallest section number such that . Clearly, such a exists. Indeed, since , it follows that . Since , we must have . This implies that . Using (the Spacing) Lemma 57 we conclude that . Hence . Using the universal lower bound on , we get , a contradiction to the hypothesis of the lemma. Finally, according to Lemma 59 part (iv), for , which implies the inequality. Combined with (41) this implies that
We summarize, the total distance we have to cover is and every steps we cover a distance of at least as long as we have not surpassed . Allowing for extra steps to cover the last part, bounding again by , bounding by and replacing and by their universal lower bounds, proves the claim. ∎
Appendix J Saturation – Theorem 47
Before we proceed to prove the Saturation theorem, we introduce a key technical element required in the proof, a family of spatial (approximate) FPs. This is the content of Definition 62 and Theorem 63. Then, Theorem 64 shows that the GEXIT integral of this family depends only on its end-points. Combined with the Negativity lemma 27 this imposes a strong constraint on the channel value of the spatial FPs, culminating in the proof of the Saturation theorem.
Let , , denote an increasing one-sided constellation on for the parameters . Let and let .
The family (of constellations) for the -ensemble, based on , is denoted by .
Each element is symmetric with respect to the spatial index and the components are indexed by . Hence it suffices to define the constellations in the range and then we set for . As usual, we set for . For and define
where for ,
Finally, . ∎
Notice that in the above definition when approaches , then .
In the definition above, we keep the channel constant across the sections and over . In other words, the channel remains constant for all the constellations in the family.
We denote the two partitions in the interpolation as phases, e.g., corresponds to phase I and corresponds to phase II.
The above interpolation might look complicated. But there is a straightforward interpretation. Think of one-sided constellations. We are interested in a constellation of size .
In phase I, the basic idea is to “move” the constellation to the right and at each point in time to “chop off” the overhanging parts both on the left and on the right. We do this until the left most section of is at position . If were a continuous function, i.e., suppose we had a continuum of sections, then this would be all we need to do. But is discrete, so in order to get a continuous interpolation we interpolate between two consecutive elements of . This mimics the “wave effect” we mentioned in the beginning.
In phase II, the residual constellation is uniformly brought down to in each section.
In the next lemma we show that if we have an interpolated family constructed via the above definition, then the resulting family is a family of approximate FPs.
Let , , denote an increasing one-sided constellation on with free or fixed boundary condition for the parameters and let . Assume that fulfills the following conditions, for some .
Constellation is close to “on the left”:
Also,
Constellation is approximate FP: For ,
Let denote the family as described in Definition 62. Then this family is an approximate FP family. More precisely, for and
and are ordered by degradation, increasing, and piece-wise linear,
for and for all and
for any and any
Discussion: For the boundary and in the middle the interpolation does not in general result in an approximate FP. Fortunately this does not cause problems. We will see in Theorem 64 that each section gives only a small contribution to the GEXIT integral. If we choose sufficiently large then we can safely ignore a fixed number of sections.
That and are ordered by degradation, increasing, and piece-wise linear follows by construction.
In the same way, that for and for all also follows by construction.
It remains to check that the family so defined constitutes an approximate FP family. Since the family, by definition, is symmetric around the section , we check only for the sections belonging in .
Phase I: Think of and as fixed, . Define and . Set . With these conventions, we want to bound
Using the convexity property (v) of (the Wasserstein metric) Lemma 13, it is sufficient to bound
separately. The two bounds are identical and their derivation is also essentially identical. Let us therefore concentrate on the first expression. Using first the triangle inequality and then the regularity properties (vi) and (vii) as well as the convexity property (v), we upper bound the first expression by
where to obtain the first inequality we use the approximate nature of and in the last step we have used property (ii) of Lemma 13.
Phase II: In this regime we interpolate the “tail” of the original constellation uniformly to . From the assumption of the lemma we have . Since is increasing we must have for . Lemma 13, property (iii), then implies that for all .
Again, think of and as fixed, . Set and . Then
where to obtain the penultimate inequality we use Lemma 33 to bound the distance of to , since is always an FP of DE) and the second expression is the distance of to , which is bounded using the previous arguments.
Next, we show that if we have an approximate family of FPs, then the area under the GEXIT integral associated to the family depends only on the “end points” of the interpolated family.
Let denote an approximate FP family for the ensemble. More precisely,
and are ordered by degradation, increasing, and piece-wise linearIn fact, we will apply this theorem to the family given in Definition 62. More generally, however, given a set of distinct ordered densities , we get a piece-wise linear family by linearly interpolating always between consecutive densities.,
for and for all ,
for ,
for , and
for all and
where is the GEXIT integral introduced in Definition 23. Let
Then is well defined and
Discussion: In words, the theorem says that for any family of spatial FPs which start and end at a constant (over all sections) FP, the GEXIT integral is given by the end-points and is close to the difference of the expression introduced in Lemma 26. In fact, from the Lemma 26 we see that, graphically, this is equal to the area under the BP GEXIT curve of the underlying ensemble between the two end-points.
Let us consider the circular ensemble which is associated to (see Definition 31). As defined in the statement of the lemma, for , the channel “seen” at position is . For the remaining sections we impose the “natural” condition . As a consequence, for these positions .
Since as well as are piece-wise linear, all GEXIT integrals are well defined (see the proof of Lemma 26). Consequently, is well-defined.
Instead of determining , directly, let us determine the equivalent quantity associated to the circular ensemble, i.e., we include the extra positions . Since for all “extra” positions the associated channel is constant, and so the additional integrals are zero, the numerical value of these two unnormalized GEXIT integrals is in fact identical.
We will now derive upper and lower bounds for the GEXIT integrals for the given approximate FP family. Recall: for we have a -approximate (in the Wasserstein metric) FP family. For all we know is that the channel is a monotone function of . Finally, for the channel is frozen to “perfect.”
Boundary: For the GEXIT integral is non-negative. Thus, in this regime, we get a lower bound by setting each GEXIT integral to 0 (cf. Lemma 16).
Interior: Consider the GEXIT integrals for .
Technique: Rather than evaluating these integrals directly we use the technique introduced in , i.e., we consider the computation tree of height rooted in node as shown in Figure 9 for the specific case .
More precisely, there are check nodes connected to this root variable node and further variable nodes connected to each such check node. So in total there are check nodes in this tree and variable nodes. We call the starting variable node, the root and all other variable nodes, leaves. By symmetry it suffices to consider one branch of this computation tree in detail. Let , , denote the position of a particular check node. We assume that the choice of is done uniformly over this interval. Let , , , denote the position of the -th variable node attached to this check node, and let the index of the root node be . For the leaf nodes we assume again a uniform choice of over the allowed interval. Note that, wlog, we have set the position for the root variable node. For each computation tree assign to its root node the channel , whereas each leaf variable node at position “sees” the channel . Note that for our model of the tree, the distribution (averaged over this choice) which flows into the root node is exactly , as required for the computation of .
Let us describe the basic trick which will help us to accomplish the computation. We will first determine the sum of all GEXIT integrals associated to such a tree. From this we will then subtract the GEXIT integrals associated to its leaf nodes. This will give us the GEXIT integral associated to the root node, which is what we are interested in.
More precisely, we use (G). The lhs of this equation gives us the contribution of the overall tree and the rhs contains the GEXIT integral of the root node plus the GEXIT integrals of the leaf nodes. For the current case, we stress that all the operations (integrals of derivatives and partial derivatives) in (G) are well-defined since the family we consider is piece-wise linear
Contributions from overall tree: Recall that for , and .
Consider first the case and . From Lemma 54 we know that the conditional entropy of the tree code is given by
Exactly the same argument tells us that the entropy of such a tree for is, up to a possible error of size , equal to . We conclude: the difference of the total entropy of such a tree is lower bounded by , call this .
Contributions from leaves: We need to find the contributions of GEXIT integrals associated to all the leaf nodes of each such tree rooted at a position . The exact such sum is difficult to determine. But we only need an upper bound to derive a lower bound on the overall GEXIT integral. Note that GEXIT integrals are non-negative. Hence, let us compute the sum of GEXIT integrals of leaf nodes of all computation trees, whether they are rooted in a position or not.
By symmetry, this contribution is easy to determine. More precisely, consider the following equivalent procedure. Pick a check node at position , . Every check node has connected variable nodes, where each variable node is picked with uniform probability and independently from the range and the choice of the variables is iid (note that the connections are taken on the circular ensemble).
Contributions from checks in the range : Check nodes in this range might see some frozen channels or channels which do not form approximate FPs. Hence we upper bound all GEXIT integrals associated to check nodes in this range by (cf. Lemma 16). The number of such integrals is .
Contributions from checks in the range : Check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. There are such integrals. Let us determine the contribution for each such integral. Since we consider an average over all possible computation trees, the (average) density entering a check node is equal for all the leaf nodes (there are such densities). Let us call this density . If we focus on a check node at position , this density is equal to
To see the last inequality, using (ii), Lemma 21 we have
Accounting: Putting everything together, we have
Let us derive an upper bound in the same manner.
Boundary: For the GEXIT integrals are at most . This gives a contribution of . As usual, for the GEXIT integral is and does not contribute to the area.
Interior: Consider the GEXIT integrals for .
Technique: We use the same procedure as beforehand. But this time we need a lower bound of the GEXIT integrals of the leaf nodes.
Contributions from overall tree: As before, the overall contribution of each tree is equal to plus an error term of absolute value equal to .
Contributions from leaves: The idea is same as before and as before, we will consider the computation from the point of view of check nodes. As before, we split the contribution in two regimes, and .
Contributions from checks in the range : Check nodes in this range might see some frozen channels or channels which are not approximate FPs. Since we are looking for an upper bound, we set the contribution of such check nodes to be 0.
Contributions from checks in the range : As we discussed before, check nodes in this range only see channels which are approximate FPs and none of the channels are frozen. Further, all these GEXIT integrals corresponds to computation trees whose root is in the range . We can, therefore, subtract all their contributions, which are obtained by arguments similar to those used in the lower bound. There are such integrals and the contribution for each such integral is at least . Here, the last term takes into account the approximate FP nature of the channels and was defined in the arguments for obtaining the lower bound.
Proof of Theorem 47: Rather than deriving the bound for all values of the parameters, we are only interested in the behavior of this bound for values of tending to and values of and tending to . Hence, in the sequel, nothing is lost by assuming at several spots that is “sufficiently” small and and are “sufficiently” large (consequently is also sufficiently large). This will simplify our arguments significantly.
Let denote the proper one-sided FP on with forced boundary condition which fulfills the stated conditions for some and and . We prove the claim in several steps, where in each step we assert further properties that such a FP has to fulfill.
Constellation is almost flat and not too small “on the right”: Recall that by assumption so that for . Using the same reasoning as in the discussion at the end of Lemma 14, we can conclude that there exists an such that for all and . From part (i) of Lemma 14 we conclude that for all . Clearly, the right-hand side can be made arbitrarily small by picking sufficiently larger than .
Constellation can be made exactly flat and not too small “on the right”: Create from the increasing constellation on with free boundary condition in the following way,
The graphical interpretation is simple. We replace the “almost” flat part on the right plus the extra part on the right which might not be flat with an exactly flat part. To simplify our subsequent notation we set and from above arguments note that . Hence for all .
Constellation is approximate FP: Note that by going from to no component in is changed by more than a distance . Therefore, if we run DE on the modified components it is clear that in this range the output must still be close to the original output. More precisely, we have for every
where to get the penultimate inequality we first replace by , since is a true FP, and then to obtain the last inequality we apply Lemma 33. Since can be made arbitrarily small by choosing sufficiently large, this verifies the approximate FP nature for . Let us now focus on . Note that since , we can use the above argument in particular for . For this choice of all involved densities, , are equal to . Therefore, the previous argument shows that
But for all components of are equal to and so the approximate FP nature of is also verified for . Since , we conclude that is an approximate FP.
From FP to FP family: From the approximate FP on we create the approximate FP family on as described in Definition 62.
Computing GEXIT integral – Definition 23: Using the basic definition of the GEXIT functional in Definition 23 we conclude that the GEXIT integral associated to , is since the channel remains constant throughout the interpolation.
Computing GEXIT integral – Theorem 64: We now compute the GEXIT integral associated to by first applying Lemma 63 and then Theorem 64.
More precisely, from the previous arguments we satisfy all the hypotheses of Lemma 63. This allows us to conclude that the FP family constructed above is approximate FP (cf. (42)) if is chosen sufficiently large. Furthermore, since the starting ( for all sections ) and ending constellations ( for all sections ) are flat, we satisfy all the hypotheses of Theorem 64 from which we conclude that the GEXIT integral is upper bounded by .Note that .
Flat region has entropy not much smaller than : From part (viii) Lemma 59 we get
where in the last step we have used condition (vii) in Definition 40. We conclude that
We now proceed by contradiction. Let us assume that . As we just discussed,
In the last step we assumed without loss of generality that is chosen sufficiently small. The inequality then follows from the condition (v) in Definition 40. This, together with (44), guarantees that we satisfy the hypothesis of (the Negativity) Lemma 27. Hence we conclude that . From condition (vi) in Definition 40 . Hence for a sufficiently small and a sufficiently large , this leads to the conclusion that the GEXIT integral , a contradiction to the previous computation. As a consequence, we must have
To prove the intermediate claim we argue inductively that
Let us now bound the two terms containing Battacharyya parameters. Note that in each iteration the distance of the respective Battacharyya constants decreases by a factor of at least . Indeed, from Lemma 51,
Putting everything together we conclude that by choosing sufficiently large can be made as small as desired.
is close to : From Theorem 64 we have
From above arguments we have hence
Using the formula for given in Lemma 26 and properties (vii) and (ix) given in Lemma 13 we have
Recall that . Combining, we get
Then for any we have (cf. Lemma 18). Thus we conclude that for any . Denoting we have,
To obtain (a) we use , since and form a FP pair. This implies that The last inequality follows since condition (vii) in Definition 40 implies that . This implies
where the last equality follows since (cf. Lemma 29).
Appendix K Existence of FP – Theorem 48
Before proceeding to the main part of the proof, let us show that if we assume that there exists a proper FP on , with forced boundary condition on the right and on the left () and with Battacharyya parameter of the constellation (cf. Definition 37) equal to , then the desired properties (i) and (ii) mentioned in the statement of the theorem follow.
Constellation is close to “on the left”: Let be the largest integer so that for all , . We have a proper FP and (because is by assumption admissible in the sense of condition (iv) in Definition 40). Hence by applying (the Transition Length) Lemma 61 we conclude that the number of sections with Battacharyya parameter bounded between and is at most , where is the constant defined in Lemma 61. Since the Battacharyya parameter of the constellation is , we have
This implies that N_{1}\geq(N+1)\Big{(}\frac{1}{2}-\frac{wc(d_{l},d_{r})}{(N+1)\delta}\Big{)}. Using property (x) of (the Wasserstein metric) Lemma 13, we conclude that for all , .
Constellation is not too small “on the right”: Let be as defined previously. Again, since the Battacharyya parameter of the constellation is equal to we have
where on the rhs above we have replaced the sections with value greater than by the maximum value of .
This implies that . Thus if we define as the number of sections with Battacharyya parameter at least equal to , we must have
where we used to obtain the above expression.
It remains to show the existence of the proper FP itself, with Battacharyya parameter of the constellation equal to . We use the Schauder FP theorem in a strong form recently proved by Cauty : This theorem states that every continuous map from a convex compact subset of a topological vector space to itself has a FP.
Let (where denotes the norm). Note that is a real normed vector space and hence a topological vector space. Let denote the space of probability measures on ${\mathcal{P}}\subset{\mathcal{S}}{\mathcal{P}}{\mathcal{P}}{\mathcal{S}}{\mathcal{P}}{\mathcal{P}}{\mathcal{P}}$, see [104, Remark 6.19].
A Cartesian product of a family of topological vector spaces, when endowed with the product topology, is a topological vector space. Hence, , endowed with the product topology, is a topological vector space.
Discussion: As we discussed above, we think of the elements of as cumulative distribution functions. In particular, these are the cdfs in the so called domain. In the sequel, rather than only referring to cdfs it will often be more convenient to write down the distributions or distributions , directly.
is non-empty: Setting all elements of equal to gives an element in this space.
is convex: Let with -distributions given by and respectively. Let for some . Since is a linear operator, we see that
Also, using (2), we see that for all . Hence .
From Lemma in we know that each component of is a symmetric distribution. It therefore remains to shows that (i) , and (ii) for all . Both claims follow from the fact that we can encode the above properties in terms of continuous functions and that continuous functions preserve the properties under limits.
is compact: Note that is a closed subset of , which is compact since it is the product of compact spaces. Hence is compact as well.
Definition of map : In order to show (via Schauder’s FP theorem) that contains a FP of DE we need to exhibit a continuous map which maps into itself. Our first step is to define a map, call it , which “approximates” the DE equation and is well-suited for applying the FP theorem. The final step in our proof is then to show that the FP of the map is in fact a FP of DE itself.
The map is constructed as follows. For , let be the map,
where for , and where for . Define as
In words, if is “too large”, upgrade it by an appropriate channel . If, on the other hand, is “too small” then we take a convex combination with . In the preceding expressions, terms like denote component-wise products, i.e., the result is a vector of densities, where the -th component is the result of multiplying the -th component of with the scalar . Further, is a shorthand for .
It remains to specify the components of . Note that . Further, we require that its components are increasing and that they are all either or , except possibly one. I.e., has the form , where , and . This defines the vector uniquely. Pictorially we can think of this map in the following way. We start at component . We take an increasing convex combination with until the overall Battacharyya constant is equal to . If this is not sufficient, then we set and repeat this procedure with component , and so on. To apply Schauder’s theorem, we need to show that the map is well-defined and continuous.
Map is well defined: First consider the case . In this case . Since the Battacharyya parameter is a strictly increasing and continuous function of the channelThat the Battacharyya parameter is continuous follows since the channel family is smooth. Further, since the Battacharyya kernel is strictly concave and the channel family is ordered by degradation, the Battacharyya parameter is strictly increasing., there exists a unique such that . Note also that is monotone (spatially) since is monotonic (as a function of its arguments) and is monotone. Consequently, is monotone. Further, from the multiplicative property of the Battacharyya parameter at the variable node, we get that . It follows that in this case .
Consider next the case . If we choose then we get a Battacharyya parameter of . Further, the increase in the Battacharyya parameter is continuous. Hence there exists an so that the resulting constellation has Battacharyya constant equal to . Also, by construction the resulting constellation is monotone. This shows that also in this case . In both the cases above, the map maintains the symmetric nature of the -distributions.
We summarize, maps into itself. In the rest of the proof, we will use the notation to denote the Wasserstein distance between two constellations and .
Continuity of map : We will show that for every and for any , there exists a such, that if and , then . Note that if then
, ;
, ;
if and .We abuse notation slightly to denote the channel associated to by , respectively, rather than denoting them by the standard parameterization .
Assertion (i) is equivalent to Lemma (33) since if then a fortiori , . Assertion (ii) follows from assertion (i) by applying property (ix) of Lemma 13. To see assertion (iii) we write
The last inequality follows from assertion (ii) and Recall that the channel family is ordered by degradation. We can therefore apply property (ii) of Lemma 14 to prove our claim.
Choosing as a function of and using assertion (ii) above, we can therefore assume that either and or and . In the first case,
Let us now focus on the second case. Let denote the largest integer in such that is non-zero. Clearly if , then , else we set . Similarly, let be the corresponding index in . Let us denote and . Note that . Wlog we can assume that . With this we can upper bound by,
Above we have used that for we have In the case , the terms in the interval collapse to .
Let us first consider the case when . Note that . This implies that if we replace the Wasserstein distance by the Battacharyya parameter in (K) the expression evaluates to 0. Then writing the term as we get
where above we use .
We now continue with (K). We use , (x) of Lemma 13 and (K) to get the upper bound
Finally using assertions (i) and (ii) above we get that
Wlog we can assume . This implies . Hence from (ii) of Lemma 14 we can bound the second Wasserstein distance above by the difference of the Battacharyya parameters. Further,
The first Battacharyya difference on the rhs can be bounded by . For the second difference we use same arguments as (K) to obtain
Combining everything with the assertions (i) and (ii), in this case we get
Existence of FP of via Schauder: We can invoke Schauder’s FP theorem to conclude that has a FP in , call it .
Existence of FP of DE (): Let us show that, as a consequence, DE itself has a FP with the desired properties.
If , then with . Hence indeed, is a FP of DE.
Consider hence the case . We will show that it leads to a contradiction. Recall that in this case
and that for .
Given a density we say that it has a “BEC component” of if contains a delta at of “weight” (i.e., contains a mass of at ). In the sequel we will think of as the erasure probability of a binary erasure channel.
Let be the vector of BEC components corresponding to . Since we know that has some non-trivial components in , and by definition of the right boundary, for . We claim that for ,
Let us prove this claim immediately. Extract the BEC component from both the left-hand as well as the right-hand side of (51). This gives
where we wrote as a shorthand for and BEC() denotes weight at . To see the second step, i.e., to see that , let denote the density at the output of the check nodes when the input is . Let denote the (BEC) density at the output of the check nodes when the input is . Some thought shows that is also the BEC component of . In words, at check nodes the BEC component evolves according to density evolution – we get an erasure at the output of a check node if and only if at least one of the incoming messages is an erasure. At variable nodes we only get a bound. If all inputs to a variable node are erasures then the output is also an erasure, but this is only a sufficient condition. Thus (53) is proved. If , then and (52) is true. If , then , where the second step follows from (53).
Extend the constellation by sections on the right, with values equal to , and let denote this constellation. We claim that has at least
sections on the left with Battacharyya value between and where is the constant of Lemma 61 and only depends on the dd.
To prove this claim, we consider our original (before we extracted the BEC components) which was the FP obtained by Schauder’s theorem. We claim that has at least segments on the left with Battacharyya constant at most , where
Let us explain each of the terms on the right. There are segments to start with, which explains (a). At most sections on the right can have a Battacharyya value of or larger (since ). This accounts for the (b) term. Finally, all sections , with , must be sections where fulfills the actual FP equations, i.e., these cannot be sections where the map “pushes” the constellation up to . More precisely, we must have for . Indeed, from construction, starting from the rightmost section, each section is increased all the way up to before we move on to the next section on the left. Since the constellation has Battacharyya parameter equal to we conclude that for we must have , which is a true FP of DE for the channel . Therefore, for these section we can apply (the Transition Length) Lemma 61 and conclude that there are at most such section which have Battacharyya value between and . This is the term (c).
The claim now follows since the BEC component is upper bounded by the corresponding Battacharyya parameter, .
Now consider a further constellation on . We set for all . For we set to the FP of forward DE according to Lemma 22 in , where the length of the constellation is taken to be , , and . More precisely, Lemma 22 in says that if we run forward DE, with free boundary condition, when transmitting over the BEC with and coupled ensemble, then for large enough length, the one-sided FP of forward DE must be proper (non-trivial and increasing) and we can lower bound the Battacharyya parameter of the resulting FP. By our choice of this FP (on ) has Battacharyya parameter at least . Now since we have . This implies that . Thus Clearly, (component-wise).
We summarize, is a proper one-sided FP of DE for with fixed boundary condition and and . But we know from Theorem 47 that such a FP, , must have a channel value close to , the area threshold of -regular ensemble when transmitting over BEC. More precisely, applying Theorem 47 we conclude that the entropy of the channel of must be less than . Since For transmission over the BEC using a -regular ensemble, from Theorem 3.120 in we know that . Further the MAP threshold is upper bounded by the Shannon threshold, ., we conclude that by choosing small enough and large enough, can be made arbitrarily small and hence the channel of is strictly less than 1, leading to a contradiction since we started with . This contradiction tells us that we cannot have when we apply the Schauder theorem. Hence the FP must be a true FP of DE. ∎