Accelerated Gradient Descent via Long Steps

Benjamin Grimmer, Kevin Shu, Alex L. Wang

Introduction

When utilizing constant stepsizes, until recently, the best known guarantee was the textbook result that fixing $h_{i}=1$ ensures $f(x_{T})-f(x_{\star})\leq LD^{2}/2T$ . This was improved by the tight convergence theory of Teboulle and Vaisbourd , showing a rate of

when the stepsizes $h_{i}=1$ . Utilizing nonconstant stepsizes monotonically converging up to $2$ , they further showed a rate approaching $LD^{2}/8T$ . These coefficient improvements were first conjectured by .
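As a concrete illustration of the textbook guarantee, the following minimal sketch runs gradient descent with the constant normalized stepsize $h_{i}=1$ on a small quadratic and compares the final objective gap with $LD^{2}/2T$. The update rule $x_{k+1}=x_{k}-(h_{k}/L)\nabla f(x_{k})$, the test objective, and all function names are assumptions chosen for illustration; they are not taken from the authors' code.

```python
def gradient_descent(grad, x0, stepsizes, L):
    """Run x_{k+1} = x_k - (h_k / L) * grad(x_k) and return the final iterate."""
    x = list(x0)
    for h in stepsizes:
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x

# Illustrative objective (an assumption): f(x) = 0.5 * (a_1 x_1^2 + a_2 x_2^2).
CURVATURES = [1.0, 0.01]
L_SMOOTH = max(CURVATURES)  # smoothness constant of this quadratic

def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURES, x))

def grad_f(x):
    return [a * xi for a, xi in zip(CURVATURES, x)]

T = 50
x0 = [1.0, 1.0]
D_squared = sum(xi * xi for xi in x0)  # the minimizer is the origin
x_T = gradient_descent(grad_f, x0, [1.0] * T, L_SMOOTH)
gap = f(x_T)                           # f(x_star) = 0 for this objective
textbook_bound = L_SMOOTH * D_squared / (2 * T)
```

On this instance the observed gap sits well below the bound, as expected, since the guarantee is a worst case over all $L$-smooth convex functions.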

By utilizing nonconstant periodically long stepsizes, Grimmer showed improved convergence rates are possible outside the classic range of stepsizes $(0,2)$ . We refer to steps with $h_{i}>2$ as long steps since they go beyond the classic regime $h_{i}\in(0,2)$ where descent on the objective value is guaranteed. Their strongest result, resulting from a computer-aided semidefinite programming proof technique, showed repeating a cycle of $127$ stepsizes $h_{0},\dots,h_{126}$ ranging from $1.4$ to $370.0$ gives a rate of

Note, bounding $\min_{i\leq T}f(x_{i})-f(x_{\star})$ (or a similar quantity) is natural for such long step methods as monotone decrease of the objective is no longer ensured. By considering longer and more complex patterns, increasing gains in the coefficient appear to follow. However, the reliance on numerically solving semidefinite programs with size depending on the pattern length limited this prior work’s ability to explore and prove continued improvements in convergence rates. Grimmer conjectured that at least an $O(1/(T\log(T)))$ rate would follow if one could design and analyze (algebraically) cyclic patterns of generic length.

We show greater gains are possible. By using nonconstant, nonperiodic stepsizes $h_{i}$ , we prove

Proving this relies on semidefinite programming-based analysis techniques and considers the overall effect of many iterations at once (rather than the one-step inductions typical to most first-order method analysis).

In related work, Das Gupta et al. produced numerically globally optimal stepsize selections via a branch-and-bound procedure for gradient descent with a fixed number of steps $T\in$ . By fitting to asymptotics of their numerical guarantees [2, Figure 2], they conjectured that an $O(1/T^{1.178})$ rate may be possible and may be best possible. Our work leaves open the gap between our $O(1/T^{1.0564})$ rate and their conjecture, as well as the gap between their conjecture and the known lower bound for general gradient methods of $O(1/T^{2})$ .

Generally, studying accelerated convergence rates stemming from long steps can yield several advantages/insights beyond what classic momentum methods can provide. Understanding the acceleration stemming from long steps may yield insights into the fundamental mechanism enabling acceleration; we have shown that changing the update directions based on an auxiliary momentum sequence is not needed to beat $O(1/T)$ . Hence, an acceleration can be attained by a method storing only one vector in memory at each step rather than two. Further, using long steps may partially mitigate the effects of inexact or stochastic gradients, known to hamper momentum methods , as no momentum term exists to propagate past errors into future steps. Lastly, we note that continued work in this direction may yield theoretical support for such cyclic long stepsize patterns used in neural network training .

Stronger guarantees for gradient descent with variable stepsizes are known in specialized settings, like $\mu$ -strongly convex minimization. Classically, gradient descent with constant stepsizes $h_{i}=1/L$ produces an $\epsilon$ -minimizer in $O(\kappa\log(1/\epsilon))$ iterations, where $\kappa=L/\mu$ . Concurrent to this work, Altschuler and Parrilo recently showed an accelerated rate through the inclusion of long steps of $O(\kappa^{0.786434}\log(1/\epsilon))$ (extending their prior preliminary results in ). Our convergence theory, using a different pattern of long steps, also improves on the classic $O(\kappa\log(1/\epsilon))$ , although at a weaker rate. Our Theorem 3.2 ensures that under our long stepsize selection, gradient descent has a $O(\kappa^{0.94662}\log(1/\epsilon))$ rate. The silver ratio, $1+\sqrt{2}$ , occurs prominently in both our analysis and theirs, indicating potential deeper connections.

Altschuler and Parrilo's faster accelerated rate for smooth strongly convex problems can be extended to give an $O(1/T^{1.271553})$ guarantee for a modified gradient descent method for general smooth convex problems, the main focus of this work. Doing so requires running gradient descent on a modified objective function (whose choice depends on specifying a target accuracy and an initial distance to optimal). In contrast, our results show acceleration beyond $O(1/T)$ is possible for gradient descent via long steps alone, i.e., without needing this modification and additional problem knowledge.

In the further specialized case of minimizing strongly convex quadratics, the optimal stepsizes were given by , which attain the optimal $O(\kappa^{1/2}\log(1/\epsilon))$ rate. For nonconvex optimization, exact worst-case guarantees for gradient descent with short steps $h_{k}\in(0,1]$ were given by Abbaszadehpeivasti et al. . The potential use of longer steps (greater than $2/L$ ) in nonconvex settings is an interesting future direction but beyond the scope of this work.

In the remainder of this introduction, we define our stepsize selection which accelerates due to the inclusion of long steps. To prove our accelerated rates, Section 2 first reviews the semidefinite programming analysis technique of based on the performance estimation problem techniques of . Specifically, the proof of our accelerated convergence rates utilizes the “straightforward” property of stepsize patterns. Section 3 proves the claimed convergence rate assuming that certain finite-length blocks within our nonperiodic stepsize pattern are straightforward. Section 4 shows that straightforwardness of a stepsize pattern can be certified by producing a feasible solution to an associated spectral set. Finally, we close the loop and show that appropriate blocks within our stepsize pattern are straightforward by constructing such certificates in Section 5. Appendices D and E verify the necessary conditions on our certificates. Several symbolically intense calculations or simplifications are deferred to the associated Mathematica notebook available at the GitHub repository https://github.com/ootks/GDLongSteps. This same GitHub repository also contains Julia code that computes our stepsize sequences and our associated certificates.

We define $\alpha_{i}$ and $\mu_{i}$ inductively. For $i\geq 0$ , define

Note that $q_{i}(1)=-(\beta_{i+1}-1)(\mu_{i}-1)<0$ , so that $q_{i}$ has a unique root larger than $1$ and $\alpha_{i}$ is well-defined.

The building block $\mathfrak{h}^{(k)}$ will be a pattern of length $t^{(k)}=2^{k+1}-1$ . Although this pattern has exponentially many stepsizes, it will contain only $2k$ distinct values. This $k$ th building block stepsize pattern takes the form

This construction was arrived at through substantial computer-search over patterns with the necessary properties (see our Theorem 4.1). Although the above pattern may seem somewhat cryptic, given the values in $\mathfrak{h}^{(k-1)}$ , to produce the next pattern $\mathfrak{h}^{(k)}$ of the form (1.5) just requires specifying three new numbers $\beta_{k-2}$ , $\alpha_{k-1}$ and $\mu_{k}$ . The values of the $\beta$ sequence follow a nice exponential pattern (1.3). Once this is set, the values for $\alpha_{k-1}$ and $\mu_{k}$ are then determined entirely by a system of two equations in two variables that simply imposes two necessary conditions for straightforwardness of $\mathfrak{h}^{(k)}$ (see Section 3.2).

Following this construction, the first four building block patterns are, for example, given by

The values of $\mathfrak{h}^{(4)}$ , $\mathfrak{h}^{(5)}$ , and $\mathfrak{h}^{(6)}$ are plotted in Figure 1, showcasing their symmetries and fractal nature. Below we provide bounds on how the quantities $\alpha_{k}$ , $\mu_{k}$ , and $H_{k}:=\sum_{i=0}^{t^{(k)}-1}\mathfrak{h}^{(k)}_{i}$ grow asymptotically (proof deferred to Appendix A.1).

1.2 Building the Proposed Nonconstant, Nonperiodic Stepsize Sequence

We construct our accelerated sequence of long stepsizes from the building block patterns $\mathfrak{h}^{(k)}$ , each rescaled by $1-\eta$ for some fixed scalar $\eta\in(0,1)$ . We first apply $(1-\eta)\mathfrak{h}^{(0)}$ a fixed number of times, then apply the pattern $(1-\eta)\mathfrak{h}^{(1)}$ a fixed number of times, and so on. Each rescaled pattern $(1-\eta)\mathfrak{h}^{(k)}$ will be applied

times where the associated parameter $\Delta^{(k)}$ is defined as in Lemma 3.1.

We do not believe the choice of $\Delta^{(k)}$ therein is as large as possible. Improvements to that parameter would directly improve our guarantees, as fewer applications of each pattern would be needed. In fact, we propose a conjecture following our Lemma 3.1 on how $\Delta^{(k)}$ should scale with $k$ (see Conjecture 3.1). This conjecture is supported numerically and a proof of it would directly lead to an $O(1/T^{1.119})$ convergence rate guarantee.

As a result, the proposed nonconstant, nonperiodic stepsize sequence is

We denote the first iteration where stepsizes are drawn from the pattern $\mathfrak{h}^{(k)}$ by

Note the value of $\Delta^{(k)}$ shrinks geometrically. As a result, the iteration counts $I_{k}$ at which we switch to the next building block stepsize pattern grow geometrically. For example, setting $\eta=1/2$ , the proposed sequence of stepsizes would be of the form
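The bookkeeping behind the switching iterations can be sketched as follows. The sketch assumes only what is stated above: pattern $k$ has length $t^{(k)}=2^{k+1}-1$ and is applied $R_{k}$ times, so $I_{0}=0$ and $I_{k+1}=I_{k}+R_{k}t^{(k)}$. The repetition counts used in the example are hypothetical placeholders, not the values of $R_{k}$ prescribed by the analysis.

```python
def pattern_length(k):
    """Length t^(k) = 2^(k+1) - 1 of the k-th building block pattern."""
    return 2 ** (k + 1) - 1

def switch_iterations(repeats):
    """Return [I_0, I_1, ..., I_K], where I_{k+1} = I_k + R_k * t^(k)."""
    iterations = [0]
    for k, r_k in enumerate(repeats):
        iterations.append(iterations[-1] + r_k * pattern_length(k))
    return iterations

# Hypothetical repetition counts R_0, R_1, R_2 (placeholders for illustration).
example = switch_iterations([3, 2, 1])
```

With these placeholder counts the first three patterns occupy 3, 6, and 7 iterations respectively, so the switches occur at iterations 3, 9, and 16.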

Performance Estimation and Straightforwardness

Our proof machinery is built upon the performance estimation problem (PEP) ideas of . We first introduce this PEP line of work and associated semidefinite programs applied to our particular setting. Then, we introduce the improved semidefinite programming technique of , identifying a class of stepsize patterns, dubbed straightforward, for which the effects of long steps can be analyzed.

The performance estimation problem (PEP) results of establish that this problem can be relaxed (often tightly instead as a reformulation) to a finite-dimensional semidefinite minimization problem. Their PEP process of reformulations is carried out below, following the notation used in Grimmer , to introduce the needed notations here and for completeness.

Step 1: A QCQP reformulation. First, as proposed by Drori and Teboulle , one can discretize the infinite-dimensional problem defining $p_{L,D}(\delta)$ over all possible objective values $f_{k}$ and gradients $g_{k}$ at the points $x_{k}$ with $k\in I_{t}^{\star}:=\{\star,0,1,\dots,t\}$ as done below. Using the interpolation theorem of Taylor et al. , this gives an exact reformulation rather than a relaxation, giving

where, without loss of generality, we have fixed $x_{\star}=0,f_{\star}=0,g_{\star}=0$ .

Step 2: An SDP relaxation. Second, one can relax the nonconvex problem (2.2) to the following SDP as done in . Define

with the following notation for selecting columns and elements of $H$ and $F$ :

This notation ensures $x_{i}=H\mathbf{x}_{i}$ , $g_{i}=H\mathbf{g}_{i}$ , and $f_{i}=F\mathbf{f}_{i}.$ Furthermore, for $i,j\in I_{t}^{\star}$ , define

Under an additional rank condition (that the problem dimension $n$ exceeds $t+2$ ), the QCQP (2.2) and SDP (2.3) are actually equivalent. However, this is not needed for our analysis, so we make no such assumption.

Step 3: The upper bounding dual SDP. Third, note the maximization SDP (2.3) is bounded above by its dual minimization SDP by weak duality, giving

Although it is not needed for our analysis, equality holds here as well (i.e., strong duality holds) due to [6, Theorem 6].

The dependence of this function on each parameter is made clear by considering $Z$ broken into the following blocks

This first entry only depends on $v$ and $v$ only occurs in this first entry. The remainder of the first column and row are linear functions of only $\lambda$ , denoted by $m(\lambda)$ . The remaining $(t+1)\times(t+1)$ block of $Z$ is linear in $\lambda$ and affine in $\mathfrak{h}$ , denoted by $M(\lambda,\mathfrak{h})$ .

2.2 Straightforward Stepsize Patterns

The performance estimation technique defined above does not provide a mechanism to give convergence rates for gradient descent as it only considers a fixed number of iterations $t$ . To enable these PEP semidefinite programs to provide convergence rate theorems, Grimmer proposed considering stepsize patterns where this worst-case function is bounded above by

Theorem 3.2 of showed that a stepsize pattern is straightforward with parameter $\Delta$ if the following spectral set is nonempty

Hence proving a pattern is straightforward amounts to identifying a feasible solution to a semidefinite program. Grimmer used this to automate the search for long straightforward patterns, generating their constant factor convergence rate improvements for periodic stepsize sequences.

This straightforward structure is critical to our proof development as well. We will show that any rescaling $\eta\in(0,1)$ of our building block patterns $(1-\eta)\mathfrak{h}^{(k)}$ is straightforward. However, in contrast to this prior work, our proof of this will be entirely analytic and hence apply for all $k$ . The move from computer-generated certificates to exact algebraic formulas was essential to move the resulting performance-bound gains from being constant factor improvements to our accelerated big-O convergence.

Accelerated Convergence Rate Analysis

Our convergence rate analysis relies on three lemmas: (i) showing each rescaled building block pattern is straightforward, (ii) guaranteeing progress is made after $R_{k}$ applications of each pattern, and (iii) bounding the total number of steps in each pattern. Our first lemma requires substantial and nontrivial constructions and verification to prove. This is deferred to Section 4 and beyond.

Each (scaled) building block pattern $(1-\eta)\mathfrak{h}^{(k)}$ is straightforward. In particular, $\mathfrak{h}^{(0)}/2$ has parameter $\Delta=1/2$ and for $k>0$ , $\mathfrak{h}^{(k)}/2$ has parameter

Note these parameters $\Delta^{(k)}$ are chosen very conservatively. Any lower bound that decays at most exponentially suffices to give a rate strictly faster than $O(1/T)$ . The slack in our above bound can be seen by numerically computing the largest $\Delta$ such that $\mathcal{S}_{\mathfrak{h}^{(k)}/2,\Delta}$ is nonempty (implying straightforwardness for $\mathfrak{h}^{(k)}/2$ ) for $k=1,\dots,5$ . The resulting numerical values are given below

The successive ratios in the numerical values of $\Delta^{(k)}$ are decreasing and seem to approach $(1+\sqrt{2})^{2}\approx 5.83$ . For example, $\Delta^{(4)}_{\mathtt{numerical}}/\Delta^{(5)}_{\mathtt{numerical}}\approx 5.93$ . This suggests the following conjecture:

There exists $c>0$ such that for all $k\geq 1$ , the building block $\mathfrak{h}^{(k)}/2$ is straightforward with $\Delta^{(k)}\geq c(1+\sqrt{2})^{-2k}$ .

As discussed previously, proving this conjecture would directly lead to an $O(1/T^{1.119})$ convergence rate guarantee. We also mention that if one could show the even stronger bound of $\Delta^{(k)}=\Omega((1+\sqrt{2})^{-k})$ , then our steplength schedule $h$ would have the nice property that each block $\mathfrak{h}^{(k)}$ is repeated only a constant number of times and the resulting convergence rate guarantee, $O(1/T^{1.2715})$ , would match the rate achieved by Altschuler and Parrilo with their modified gradient descent algorithm.

Our second lemma analyzes the objective gap of gradient descent with the stepsize sequence (1.7) after completing $R_{k}$ applications of $(1-\eta)\mathfrak{h}^{(k)}$ . This lemma follows directly from Lemma 3.1. Recall $I_{k+1}$ denotes the iteration of gradient descent just after the applications of $\mathfrak{h}^{(k)}$ have completed.

Each $k\geq 0$ has $f(x_{I_{k}})-f(x_{\star})\leq LD^{2}\Delta^{(k)}$ .

Note this trivially holds for $k=0$ since $I_{0}=0$ and $f(x_{0})-f(x_{\star})\leq\frac{1}{2}LD^{2}$ . We prove this inductively by showing that if $\delta_{I_{k}}\leq LD^{2}\Delta^{(k)}$ , then $\delta_{I_{k+1}}\leq LD^{2}\Delta^{(k+1)}$ . Suppose $\delta_{I_{k}}\leq LD^{2}\Delta^{(k)}$ . Then the straightforwardness of $(1-\eta)\mathfrak{h}^{(k)}$ ensures that the objective gap decreases with each application of the pattern $(1-\eta)\mathfrak{h}^{(k)}$ . Namely, for any $s\geq 0$ , we have

Solving this recurrence relation (of the standard form $\delta_{s+1}\leq\delta_{s}-C\delta_{s}^{2}$ ) ensures that for all $s\geq 0$

Hence iteration $I_{k+1}$ , after $s=R_{k}$ applications of $(1-\eta)\mathfrak{h}^{(k)}$ , has $\delta_{I_{k+1}}\leq LD^{2}\Delta^{(k+1)}$ . ∎
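For readers less familiar with recurrences of the form $\delta_{s+1}\leq\delta_{s}-C\delta_{s}^{2}$ , the standard argument is that whenever $0<C\delta_{s}<1$ , inverting gives $1/\delta_{s+1}\geq 1/\delta_{s}+C$ , and hence $\delta_{s}\leq\delta_{0}/(1+sC\delta_{0})$ . The sketch below checks this numerically in the worst case where the recurrence holds with equality. The values of $C$ and $\delta_{0}$ are arbitrary illustrative choices, not the constants appearing in the lemma.

```python
def worst_case_gaps(delta0, C, num_steps):
    """Iterate delta_{s+1} = delta_s - C * delta_s^2 (the recurrence with equality)."""
    gaps = [delta0]
    for _ in range(num_steps):
        d = gaps[-1]
        gaps.append(d - C * d * d)
    return gaps

def closed_form_bound(delta0, C, s):
    """Bound delta_0 / (1 + s * C * delta_0) implied by 1/delta_{s+1} >= 1/delta_s + C."""
    return delta0 / (1.0 + s * C * delta0)

# Illustrative values (assumptions); the argument needs C * delta0 < 1.
delta0, C = 0.5, 1.0
gaps = worst_case_gaps(delta0, C, 200)
holds = all(
    g <= closed_form_bound(delta0, C, s) + 1e-15 for s, g in enumerate(gaps)
)
```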

To convert this bound to a convergence rate guarantee, we need a bound on $I_{k}$ , given below.

Let $c=\frac{1}{768768\sqrt{2}}$ and $d=(1+\sqrt{2})^{4}$ . If $\eta=1/2$ , then $I_{0}=0$ and for $k>0$ ,

Trivially $I_{0}=0$ . Recall from Lemma 3.1 that the straightforwardness parameter is given by $\Delta^{(0)}=1/2$ and $\Delta^{(k)}=c/d^{k}$ if $k\geq 1$ . Then for $l\geq 0$ ,

From this, our accelerated convergence guarantee for gradient descent is immediate.

For any target accuracy $0<\epsilon<\frac{1}{2}LD^{2}$ and rescaling $\eta=1/2$ , gradient descent with stepsize sequence (1.7) has $f(x_{T})-f(x_{\star})\leq\epsilon$ by some iteration

As in Lemma 3.3, let $c=\frac{1}{768768\sqrt{2}}$ and $d=(1+\sqrt{2})^{4}$ for which $\Delta^{(k)}=c/d^{k}$ if $k\geq 1$ . Let $k(\epsilon)$ be the last pattern $k$ with $\Delta^{(k)}>\epsilon/LD^{2}$ . Since $\Delta^{(k)}$ monotonically decreases to zero, $k(\epsilon)$ is finite. Lemma 3.3 bounds the number of steps before building block pattern $k(\epsilon)$ is used by

A bound on the number of steps using pattern $\mathfrak{h}^{(k(\epsilon))}$ before an $\epsilon$ -minimizer follows from the convergence guarantee (3.1). Namely, the number of applications of the pattern $(1-\eta)\mathfrak{h}^{(k(\epsilon))}$ needed is at most

Hence, if $k(\epsilon)>0$ and so $k(\epsilon)=\lfloor\log_{d}(cLD^{2}/\epsilon)\rfloor$ , an $\epsilon$ -minimizer is found by iteration

Otherwise if $k(\epsilon)=0$ , then an $\epsilon$ -minimizer is found within $LD^{2}/\epsilon$ iterations. Since $k(\epsilon)=0$ requires $\epsilon>LD^{2}c/d$ , one can verify $LD^{2}/\epsilon<10.3280\left(\frac{LD^{2}}{\epsilon}\right)^{0.94662}$ . ∎
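The index $k(\epsilon)$ used in this proof can be computed directly from its definition. The sketch below takes $c$ and $d$ exactly as stated in Lemma 3.3, with $\Delta^{(0)}=1/2$ and $\Delta^{(k)}=c/d^{k}$ for $k\geq 1$ , and searches for the last $k$ with $\Delta^{(k)}>\epsilon/LD^{2}$ . The function names and the argument `relative_accuracy` (standing for $\epsilon/LD^{2}$ ) are assumptions made for illustration.

```python
import math

C_CONST = 1.0 / (768768 * math.sqrt(2))  # c as stated in the text
D_CONST = (1 + math.sqrt(2)) ** 4         # d as stated in the text

def delta(k):
    """Delta^(0) = 1/2 and Delta^(k) = c / d^k for k >= 1."""
    return 0.5 if k == 0 else C_CONST / D_CONST ** k

def last_pattern(relative_accuracy):
    """Largest k with Delta^(k) > eps / (L D^2), for 0 < eps / (L D^2) < 1/2."""
    k = 0
    while delta(k + 1) > relative_accuracy:
        k += 1
    return k
```

For instance, a relative accuracy of $10^{-9}$ lies between $\Delta^{(2)}$ and $\Delta^{(1)}$ , so `last_pattern(1e-9)` returns `1`, in agreement with $\lfloor\log_{d}(cLD^{2}/\epsilon)\rfloor$ .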

This convergence rate improves if stronger lower bounds on $\Delta^{(k)}=c/d^{k}$ are provided. Convergence theory matching the conjectured optimal rate of would follow immediately if one could show a bound with $d=1+\sqrt{2}$ . This would give a rate of $O(1/\epsilon^{\log_{1+\sqrt{2}}(2)})$ and has the nice property that $R_{k}$ would become a constant. Such an idealized, potentially optimal, accelerated stepsize pattern would just apply each $\mathfrak{h}^{(k)}$ a constant number of times.
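To see how the decay base $d$ drives the rate, the following sketch tabulates the exponent $p(d)=\log_{d}(2d/(1+\sqrt{2}))$ in an iteration bound of the form $O((LD^{2}/\epsilon)^{p(d)})$ . This closed form is an assumption inferred from the pattern lengths $2^{k+1}-1$ , the growth of the total stepsize like $(1+\sqrt{2})^{k}$ , and $\Delta^{(k)}=c/d^{k}$ ; it is not a formula quoted from the analysis. It does reproduce the exponents stated in the text for $d=(1+\sqrt{2})^{4}$ , $(1+\sqrt{2})^{2}$ , and $1+\sqrt{2}$ .

```python
import math

SILVER = 1 + math.sqrt(2)  # the silver ratio

def iteration_exponent(d):
    """Assumed exponent p with T = O((L D^2 / eps)^p) when Delta^(k) = c / d^k."""
    return math.log(2 * d / SILVER, d)

proven = iteration_exponent(SILVER ** 4)       # about 0.9466
conjectured = iteration_exponent(SILVER ** 2)  # about 0.8932
idealized = iteration_exponent(SILVER)         # equals log_{1+sqrt(2)}(2), about 0.7864
```

Inverting these exponents gives rates of roughly $O(1/T^{1.0564})$ , $O(1/T^{1.119})$ , and $O(1/T^{1.2715})$ respectively.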

Lemma 3.1 further enables direct analysis of gradient descent for strongly convex minimization. Let $D_{k}=\sup\{\|x-x_{\star}\|\mid f(x)\leq f(x_{k})\}$ and $t^{(k)}=2^{k+1}-1$ . Note that $\mu$ -strong convexity of $f$ (defined as $f-\frac{\mu}{2}\|\cdot\|^{2}$ being convex) ensures $\frac{\mu}{2}D_{k}^{2}\leq\delta_{k}$ . In fact, this is the only property of strong convexity used in our analysis here. So our convergence guarantee presented in Theorem 3.2 holds more generally for any problem satisfying only a quadratic growth bound. Observe that if $\delta_{st^{(k)}}\leq LD_{k}^{2}\Delta^{(k)}$ , the objective gap contracts after applying the pattern $(1-\eta)\mathfrak{h}^{(k)}$ with

Conversely, if $\delta_{st^{(k)}}>LD_{k}^{2}\Delta^{(k)}$ , straightforwardness ensures that

Hence, every application of this pattern yields a contraction of at least

Given the condition number $\kappa=L/\mu$ and fixing $\eta=1/2$ , one can select the stepsize pattern giving the best contraction. Consider $k(\kappa)=\sup\{k\mid\Delta^{(k)}\geq\frac{1}{2\kappa}\}$ . Supposing $k(\kappa)>0$ , we have $\Delta^{(k(\kappa))}=c/d^{k(\kappa)}$ , giving bounds of $\frac{1}{2\kappa}\leq\Delta^{(k(\kappa))}\leq\frac{d}{2\kappa}$ . Noting $\sum_{i=0}^{t^{(k(\kappa))}-1}\mathfrak{h}^{(k(\kappa))}_{i}\geq 2(1+\sqrt{2})^{k(\kappa)}$ , one has $\sum_{i=0}^{t^{(k(\kappa))}-1}\mathfrak{h}^{(k(\kappa))}_{i}\geq 2(2c\kappa/d)^{\log_{d}(1+\sqrt{2})}$ . Then the guarantee (3.2) gives a contraction factor after applying the pattern of stepsizes of $1-(2c\kappa/d)^{\log_{d}(1+\sqrt{2})}\kappa^{-1}.$ Since this contraction is attained after $t^{(k(\kappa))}=2^{k(\kappa)+1}-1\leq 2(2c\kappa/d)^{\log_{d}(2)}$ iterations, when amortized, the per-iteration contraction factor is

This gives the following convergence theorem. Note this accelerated rate is slower than that concurrently developed by . If one could improve the value of $d$ above to $1+\sqrt{2}$ , our rate would improve to match theirs.

3.2 Tight Bounds on Straightforward Patterns

Some insight into the structure of straightforward stepsize sequences follows from considering particular “bad” problem instances. Since straightforward patterns always yield a descent, showing a failure to achieve a descent on any instance suffices to prove a pattern is not straightforward. Three elementary bounds of this form are given below.

Consider the one-dimensional objective function $f(x)=\frac{1}{2}x^{2}$ with $x_{0}=1$ . Then each gradient descent step is $x_{k+1}=(1-h_{k})x_{k}$ . Hence $x_{t}=\prod_{i=0}^{t-1}(1-\mathfrak{h}_{i})$ . To achieve any descent by the end of this pattern, this must be between one and minus one. ∎
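The quadratic instance in this proof is easy to simulate. The sketch below applies a pattern to $f(x)=\frac{1}{2}x^{2}$ from $x_{0}=1$ and compares the final iterate with the product $\prod_{i}(1-\mathfrak{h}_{i})$ . Both example patterns are hypothetical values chosen for illustration and are not building block patterns from the construction.

```python
import math

def run_on_quadratic(pattern, x0=1.0):
    """Gradient descent on f(x) = x^2 / 2, where each step is x <- (1 - h) * x."""
    x = x0
    for h in pattern:
        x = x - h * x  # the gradient of x^2 / 2 is x, and L = 1
    return x

# Hypothetical patterns chosen for illustration only.
descending = [1.5, 3.0, 1.5]  # product of (1 - h_i) is -0.5, so |x_t| < |x_0|
failing = [3.0, 3.0]          # product of (1 - h_i) is 4.0, so |x_t| > |x_0|
```

The first pattern contains a long step $h=3>2$ yet still achieves descent over the full pattern, while the second fails because its product leaves $[-1,1]$ .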

Consider the one-dimensional Huber objective function $f(x)=\frac{1}{2}x^{2}$ if $|x|\leq 1$ and $f(x)=|x|-\frac{1}{2}$ otherwise. Letting $x_{0}=-\sum_{j<i}\mathfrak{h}_{j}-1$ , one has $x_{i}=-1$ . Supposing $\mathfrak{h}_{i}\geq\sum_{j\neq i}\mathfrak{h}_{j}+2$ , one has $x_{i+1}\geq\sum_{j\neq i}\mathfrak{h}_{j}+1$ . Consequently, gradient descent failed to descend as $x_{t}\geq\sum_{j<i}\mathfrak{h}_{j}+1$ . ∎
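The Huber instance can be traced in the same way. This sketch starts from $x_{0}=-\sum_{j<i}\mathfrak{h}_{j}-1$ and applies one pass of a pattern whose $i$ th step satisfies $\mathfrak{h}_{i}\geq\sum_{j\neq i}\mathfrak{h}_{j}+2$ . The pattern $(1,5,1)$ with $i=1$ is a hypothetical example, and the helper names are assumptions.

```python
def huber(x):
    """Huber objective: x^2 / 2 on [-1, 1] and |x| - 1/2 outside."""
    return 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5

def huber_grad(x):
    """Derivative of the Huber objective, i.e. x clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))

def run_on_huber(pattern, i):
    """Start at x_0 = -(sum_{j<i} h_j) - 1 and apply the pattern once."""
    x = -sum(pattern[:i]) - 1.0
    trajectory = [x]
    for h in pattern:
        x = x - h * huber_grad(x)
        trajectory.append(x)
    return trajectory

# Hypothetical pattern: h_1 = 5 >= (1 + 1) + 2, so the middle step is too long.
pattern = [1.0, 5.0, 1.0]
trajectory = run_on_huber(pattern, i=1)
```

Here the iterates are $-2,-1,4,3$ , so the final point is farther from the minimizer than the starting point and the objective value has increased.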

Without loss of generality, $i=0$ and $i_{+}=1$ . Consider the one-dimensional objective function $\frac{1}{2}x^{2}+\frac{1}{2}$ for $x\leq 1$ and $x$ otherwise. Suppose $(1-\mathfrak{h}_{i})(1-\mathfrak{h}_{i_{+}})\geq\sum_{j\not\in\{i,i_{+}\}}\mathfrak{h}_{j}+1\geq 0$ . Letting $x_{0}=1$ , we then have $x_{1}=(1-\mathfrak{h}_{0})\leq 1$ and $x_{2}=(1-\mathfrak{h}_{1})(1-\mathfrak{h}_{0})$ . By our assumption, we then have $x_{2},\dots,x_{t}\geq 1$ . So no descent is achieved by applying the pattern and hence it is not straightforward. ∎
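The third instance can be checked numerically as well. The sketch below uses the objective from this proof, $\frac{1}{2}x^{2}+\frac{1}{2}$ for $x\leq 1$ and $x$ otherwise, together with a hypothetical pattern $(3,3,1)$ for which $(1-\mathfrak{h}_{0})(1-\mathfrak{h}_{1})=4\geq\mathfrak{h}_{2}+1$ . The function names are assumptions.

```python
def grad(x):
    """Derivative of f(x) = x^2 / 2 + 1/2 for x <= 1 and f(x) = x otherwise."""
    return x if x <= 1 else 1.0

def objective(x):
    """The piecewise objective used in the proof."""
    return 0.5 * x * x + 0.5 if x <= 1 else x

def apply_pattern(pattern, x0=1.0):
    """Apply one pass of the pattern starting from x_0 = 1."""
    x = x0
    for h in pattern:
        x = x - h * grad(x)
    return x

# Hypothetical pattern: (1 - 3) * (1 - 3) = 4 >= h_2 + 1 = 2.
x_final = apply_pattern([3.0, 3.0, 1.0])
```

The iterates are $1,-2,4,3$ , so the pattern ends at a point with objective value $3$ , above the starting value $f(1)=1$ .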

The first two of these bounds on how large straightforward patterns can be are actually tight for all of our proposed building block patterns $\mathfrak{h}^{(k)}$ . The selection of the middle stepsize $\mu_{k}$ is exactly the sum of all the other stepsizes plus two, matching Proposition 3.5. The selection of the one-quarter and three-quarters stepsize $\alpha_{k-2}$ is exactly the choice making the product in Proposition 3.4 equal one (Appendix A.2 verifies this).

Arguments based on these bounds can show that our first two building block patterns have the longest total length possible among all straightforward patterns. We believe this is true for all of our proposed patterns but only provide proof for the global maximality for $k=0,1$ .

For $t=1$ , no straightforward pattern has $\sum_{i=0}^{t-1}\mathfrak{h}_{i}\geq 2$ . (Hence the rescaled patterns $(1-\eta)\mathfrak{h}^{(0)}$ approach having maximum length.)

This is an immediate consequence of Proposition 3.5 with $i=0$ . ∎

For $t=3$ , no symmetric straightforward pattern has $\sum_{i=0}^{t-1}\mathfrak{h}_{i}\geq 8$ . (Hence the rescaled patterns $(1-\eta)\mathfrak{h}^{(1)}$ approach having maximum length.)

Maximizing $\sum_{i=0}^{2}h_{i}$ over the region constrained by the inequalities of Propositions 3.4, 3.5, and 3.6 ensures no pattern $h=(a,b,a)$ has $2a+b\geq 8$ (see Mathematica proof 3.1). ∎

A Spectral Certificate for Straightforwardness

All that remains to complete our analysis is the proof of Lemma 3.1. The remainder of this paper completes the substantial technical work needed to verify this. As an overview, the main result in this section (Theorem 4.1) gives a sufficient condition for straightforwardness that is more practical to verify than the nonemptiness of $\mathcal{S}_{\mathfrak{h},\Delta}$ . Then Section 5 constructs certificates $\lambda^{(k)}$ showing that the sufficient conditions of Theorem 4.1 are met for each $\mathfrak{h}^{(k)}$ . Finally, Appendices D and E do the heavy algebraic work explicitly verifying these sufficient conditions are met.

This section’s main result, Theorem 4.1, considers patterns $\mathfrak{h}$ that may themselves not be straightforward, but for which $(1-\eta)\mathfrak{h}$ is straightforward for any $\eta\in(0,1)$ . Below, we show the set of straightforward stepsize patterns is star-convex with respect to the all zero stepsize pattern, justifying the search for such a rescaling theorem.

Suppose $\mathcal{S}_{\mathfrak{h},\Delta}$ is nonempty and $\Delta\geq 0$ . Then, $\mathcal{S}_{\theta\mathfrak{h},\omega\Delta}$ is nonempty for all $\theta\in(0,1]$ and $\omega\in$ .

It suffices to prove the proposition with $\omega=1$ as the constraints defining $\mathcal{S}_{\theta\mathfrak{h},\omega\Delta}$ only relax as $\omega$ decreases. The case $\theta=1$ follows by definition. Let $\theta\in(0,1)$ and let $(\lambda,\gamma)\in\mathcal{S}_{\mathfrak{h},\Delta}$ . We claim that $(\lambda,\theta\gamma)\in\mathcal{S}_{\theta\mathfrak{h},\Delta/\theta}$ . The first four constraints defining $\mathcal{S}_{\theta\mathfrak{h},\Delta/\theta}$ hold. We will need the following fact to verify the remaining constraints: for any fixed $\xi\geq 0$ , $\theta\mapsto M(\xi,\theta\mathfrak{h})$ is an affine function in $\theta$ with a PSD constant term, $M(\xi,0)$ . Then,

The other constraint holds similarly. We deduce that $(\lambda,\theta\gamma)\in\mathcal{S}_{\theta\mathfrak{h},\Delta/\theta}\subseteq\mathcal{S}_{\theta\mathfrak{h},\Delta}$ . ∎

If we additionally fix $\eta=1/2$ and assume that $t\geq 3$ , $H\geq 8$ , and $\mathfrak{L}\in(0,1]$ is a lower-bound on the second-smallest eigenvalue of $W_{2}(\lambda)$ , then we may take any $\Delta>0$ satisfying

We divide our proof into three parts. First in Section 4.1.1, we construct $\gamma$ from $\lambda$ satisfying the needed linear equality constraint. Then Section 4.1.2 shows a positive $\Delta>0$ exists with $(\lambda,(1-\eta)\gamma)\in\mathcal{S}_{(1-\eta)\mathfrak{h},\Delta}$ . Finally, Section 4.1.3 improves this, providing a quantitative lower bound on $\Delta$ .

To prove Theorem 4.1, we require some additional notation. Let

With this notation, we may decompose $M(\lambda,\mathfrak{h})=W_{1}(\lambda,\mathfrak{h})+W_{2}(\lambda)$ . Note that $W_{1}$ is bilinear in its arguments and $W_{2}$ is linear in $\lambda$ .

Below, we verify that the second constraint defining $\mathcal{S}_{\mathfrak{h},\Delta}$ , i.e., the linear constraint on $\gamma$ , is satisfied for our construction. The lemma below explains what the first two linear constraints in the definition of $\mathcal{S}_{\mathfrak{h},\Delta}$ require of $\lambda$ and $\gamma$ . Its proof is immediate from expanding the definition of $a_{i,j}$ .

The equation $\sum_{i,j=0}^{t}\lambda_{i,j}a_{i,j}=a_{\star,t}-a_{\star,0}$ holds if and only if

The sum of the zeroth row of $\lambda$ is one larger than the sum of the zeroth column of $\lambda$ .

Suppose the support of $\gamma$ is contained in $\left\{(\star,i):\,i\in[0,t]\right\}\cup\left\{(i,i+1):\,i\in[0,t-1]\right\}$ , then the equation $\sum_{i\neq j}\gamma_{i,j}a_{i,j}=2\sum_{i=0}^{t-1}\mathfrak{h}_{i}a_{\star,0}$ holds if and only if

Comparing Lemma 4.2 with our construction of $\gamma$ , we see that the first $t$ constraints in (4.1) are satisfied. To show that the last constraint is satisfied as well, it is enough to show that the sum of all left-hand side expressions in (4.1) is zero, i.e., $2H=\sum_{i=0}^{t}\gamma_{\star,i}=\sqrt{2H}\mathbf{1}^{\intercal}\phi$ . This is done in the following lemma.

It holds that $\mathbf{1}^{\intercal}\phi=\sqrt{2H}$ .

4.1.2 Existence of Associated $\Delta>0$

The first three defining constraints of $\mathcal{S}_{(1-\eta)\mathfrak{h},\Delta}$ are satisfied regardless of $\Delta$ and $\eta$ . Similarly, $\lambda\geq 0$ does not depend on $\Delta$ or $\eta$ . Next, $\lambda+\Delta(1-\eta)\gamma$ is nonnegative for all $\Delta>0$ small enough by the assumption that $\lambda_{i,i+1}>0$ for all $i\in[0,t-1]$ and the observation that those are the only negative entries of $\gamma$ . It remains to check that the two PSD constraints defining $\mathcal{S}_{(1-\eta)\mathfrak{h},\Delta}$ hold for all $\Delta>0$ small enough.

$\mathfrak{A}$ is PSD by construction (in fact, rank-one). $\mathfrak{B}$ is PSD as $W_{2}$ maps nonnegative matrices to PSD matrices. We deduce that the first PSD constraint in the definition of $\mathcal{S}_{(1-\eta)\mathfrak{h},\Delta}$ holds regardless of $\Delta$ and $\eta$ .

We next evaluate $\mathbf{1}^{\intercal}W_{1}(\gamma,\mathfrak{h})\mathbf{1}$ and $\mathbf{1}^{\intercal}W_{2}(\gamma)\mathbf{1}$ . For the first expression,

For the second expression, we have $\mathbf{1}^{\intercal}W_{2}(\gamma)\mathbf{1}=\frac{1}{2}\sum_{i=0}^{t}\gamma_{\star,i}=H$ .

We deduce that $\mathbf{1}^{\intercal}Q_{3}\mathbf{1}=(1-\eta)H>0$ , or equivalently, the matrix $Q_{3}$ has a positive component in the kernel of $(1-\eta)Q_{1}+\eta Q_{2}$ . Thus, the Schur complement lemma shows the existence of a positive $\Delta$ satisfying the theorem statement.

4.1.3 A Quantitative Lower Bound on $\Delta$

For the second part of the theorem statement, we will assume $t\geq 3$ , $H\geq 8$ , $\eta=1/2$ . Additionally, we will assume that $\mathfrak{L}\in(0,1]$ lower bounds the second-smallest eigenvalue of $W_{2}(\lambda)$ .

We will now repeat portions of the proof of the first claim more quantitatively to derive explicit bounds on $\Delta$ . By the above arguments, it suffices to pick $\Delta$ so that $\lambda+\frac{\Delta}{2}\gamma\geq 0$ and $\frac{1}{2}Q_{1}+\frac{1}{2}Q_{2}+\Delta Q_{3}\succeq 0$ .

First, as the superdiagonal entries of $\gamma$ are defined as $\gamma_{i,i+1}=\sum_{j=0}^{i}\gamma_{\star,j}-2H=\sqrt{2H}\sum_{j=0}^{i}\phi_{j}-2H$ , we have that each of these entries is bounded in magnitude by $2H$ (see Lemma 4.3). In particular, the requirement that $\lambda+\frac{\Delta}{2}\gamma\geq 0$ is satisfied as long as

This is the first term in our bound on $\Delta$ .

We now turn to the constraint $\frac{1}{2}Q_{1}+\frac{1}{2}Q_{2}+\Delta Q_{3}\succeq 0$ .

where $\xi$ is the projection of $(\sqrt{H},-\phi/\sqrt{2})$ onto the orthogonal complement of $\tfrac{(t+1,-\mathbf{1}_{t+1})}{\sqrt{(t+1)^{2}+(t+1)}}$ . To see that this is possible, note that $\tfrac{(t+1,-\mathbf{1}_{t+1})}{\sqrt{(t+1)^{2}+(t+1)}},$ $\frac{\xi}{\|\xi\|_{2}}$ and $\tfrac{\mathbf{1}_{t+2}}{\sqrt{t+2}}$ are orthogonal and unit norm. Note that $(\sqrt{H},-\phi/\sqrt{2})$ is in the span of the first two basis vectors. Also note that the first and last vectors in this basis span the kernel of $Q_{2}$ .

where the last inequality follows from our assumption $t\geq 3$ .

Abusing notation, we will also write $Q_{i}$ to denote the matrix $Q_{i}$ written in this new basis. Then, $\frac{1}{2}Q_{1}+\frac{1}{2}Q_{2}$ can be bounded below by

We bound the minimum eigenvalue of the top-left two-by-two submatrix here as the determinant divided by the trace of the submatrix:
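The determinant-over-trace bound invoked here holds for any symmetric positive semidefinite two-by-two matrix with positive trace: the eigenvalues satisfy $\lambda_{\min}\lambda_{\max}=\det$ and $\lambda_{\max}\leq\operatorname{tr}$ , so $\lambda_{\min}\geq\det/\operatorname{tr}$ . The sketch below verifies this on a few arbitrary positive definite examples; the matrices and function names are illustrative assumptions and are unrelated to the specific submatrix in the proof.

```python
import math

def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.sqrt(0.25 * (a - c) ** 2 + b * b)
    return mean - radius

def det_over_trace(a, b, c):
    """Determinant divided by trace of [[a, b], [b, c]]."""
    return (a * c - b * b) / (a + c)

# Arbitrary positive definite examples (illustrative values only).
examples = [(2.0, 1.0, 3.0), (1.0, 0.0, 1.0), (5.0, -2.0, 1.0)]
bound_holds = all(
    min_eigenvalue_2x2(a, b, c) >= det_over_trace(a, b, c) - 1e-12
    for a, b, c in examples
)
```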

Plugging back into our lower bound on $\frac{1}{2}Q_{1}+\frac{1}{2}Q_{2}$ gives

We may also write $Q_{3}$ in this basis in block form by first letting $q_{3}$ be the orthogonal projection of $Q_{3}\tfrac{\mathbf{1}_{t+2}}{\sqrt{t+2}}$ onto the subspace orthogonal to $\mathbf{1}_{t+1}$ and writing

Note that $\mathbf{1}_{t+2}^{\intercal}Q_{3}\mathbf{1}_{t+2}=\frac{1}{2}H$ from our previous section.

We now apply the Schur complement lemma to the bottom right entry of this second matrix, which yields that this matrix is PSD if and only if

The second inequality follows because $q_{3}$ is the orthogonal projection of $Q_{3}\tfrac{\mathbf{1}_{t+2}}{\sqrt{t+2}}$ onto a subspace.

We now bound $\left\|Q_{3}\right\|_{\textup{op}}$ and $\left\|Q_{3}\mathbf{1}_{t+2}\right\|_{2}$ separately.

For this, we apply the triangle inequality to break the summation over all entries in $\gamma$ into summations over just the $\star$ th row and the first superdiagonal:

For the second term in the bound on $\left\|Q_{3}\right\|_{\textup{op}}$ , note that for all $i\in[0,t-1]$ , we have

is a tridiagonal matrix. Using the bounds $\left|\gamma_{i,i+1}\right|\leq 2H$ and $\mathfrak{h}_{i}\leq H$ , we have that all entries in this tridiagonal matrix have magnitude bounded by $2(2H)(1+H/2)\leq 3H^{2}$ . We may bound the operator norm of this matrix as the sum of the operator norms of each diagonal. Thus,

For the second term in our bound on $\left\|Q_{3}\mathbf{1}_{t+2}\right\|_{1}$ , a direct calculation shows

Thus, we may bound this quantity above by $2H^{2}$ .

This implies that our final expression is PSD as long as

Invoking our bound on $H$ , we have that $(\lambda,\gamma/2)\in\mathcal{S}_{\mathfrak{h}/2,\Delta}$ for any

Proof of Lemma 3.1 and Construction of Certificates

First, for $k=0$ , we address the straightforwardness of $h^{(0)}/2=$ with parameter $\Delta^{(0)}=1/2$ individually. One can verify this by noting that the values of $(\lambda,\gamma)$ below belong to $\mathcal{S}_{,1/2}$ (see Mathematica proof 5.1):

For each $k\geq 1$ , in light of Theorem 4.1, Lemma 3.1 follows if we can demonstrate certificates $\lambda^{(k)}$ satisfying the sufficient conditions on straightforwardness therein for each $\mathfrak{h}^{(k)}$ . We do this in four parts. First, we provide a construction for our claimed certificates $\lambda^{(k)}$ , which satisfy the first three conditions in Theorem 4.1. Next, we will derive lower bounds on $\lambda_{i,i+1}$ and the second-smallest eigenvalue of $W_{2}(\lambda)$ :

Here, $w_{k}$ is a vector that will be constructed shortly. The proofs of these two facts are relatively tedious but ultimately straightforward. We defer the proofs of these two statements to Appendices D and E.

The proof of Lemma 3.1 will then be a direct application of Theorem 4.1 and the bounds stated in (5.1) and (5.2).

The remainder of this section provides our construction for the certificates $\lambda^{(k)}$ and verifies the lower bound on the second smallest eigenvalue of $W_{2}(\lambda)$ .

Although our construction of $\lambda^{(k)}$ only uses elementary arithmetic operations, it is still quite complicated. We first present as examples our certificates $\lambda^{(k)}$ for $k=1,2$ . Together with Theorem 4.1, this proves the straightforwardness of each $(1-\eta)\mathfrak{h}^{(k)}$ for $k=1,2$ .

2 Preliminary Definitions

We will begin with some auxiliary definitions that will be used in our definition of $\lambda$ . For $i\geq 0$ , we let $p(i)$ denote the number of ones in the binary expansion of $i+1$ and we let $z(i)=\lfloor\log_{2}(i+1)\rfloor$ .

At times, it will be convenient to index entries of $\lambda^{(k)}$ backwards from the bottom-right instead of top-left. We define $\operatorname{rev}_{k}(t)\coloneqq 2^{k+1}-2-t$ . The value of $k$ will always be clear from context and we will simply write $\operatorname{rev}(t)$ . Lemma B.5 lists some useful relationships between $\nu(r+1)$ , $z(r)$ , $p(r)$ and $\nu(\operatorname{rev}(r+1))$ , $z(2^{k+2}-2-\operatorname{rev}(r))$ and $p(\operatorname{rev}(r))$ .
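These index functions are simple bit manipulations. The following is a minimal sketch, assuming $\nu(n)$ denotes the 2-adic valuation of $n$ (the exponent of the largest power of 2 dividing $n$), as it is used in the appendices. The Python names `p`, `z`, `nu` and `rev` mirror the notation but are otherwise our own.

```python
def p(i):
    """Number of ones in the binary expansion of i + 1."""
    return bin(i + 1).count("1")

def z(i):
    """floor(log2(i + 1)), computed exactly with integer arithmetic."""
    return (i + 1).bit_length() - 1

def nu(n):
    """2-adic valuation of a positive integer n (assumed meaning of nu)."""
    return (n & -n).bit_length() - 1

def rev(t, k):
    """rev_k(t) = 2**(k+1) - 2 - t."""
    return 2 ** (k + 1) - 2 - t

# Small table for k = 2, where t ranges over 0, ..., 2**(k+1) - 2 = 6.
k = 2
for i in range(2 ** (k + 1) - 1):
    print(i, p(i), z(i), nu(i + 1), rev(i, k))
```

Using `bit_length` rather than a floating-point logarithm keeps `z` exact for large indices.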

For $k\geq 1$ , recursively define $\sigma^{(k)}$ to be a vector of length $2^{k-1}-1$ as follows: $\sigma^{(1)}=\emptyset$ is the empty vector, and for $k\geq 2$ ,

For $k\geq 0$ , let $\rho_{k}$ be a vector of length $\lfloor 2^{k-1}\rfloor+1+2^{k}$ as follows: $\rho_{0}=$ and for $k\geq 1$ ,

For $k\geq 0$ , let $w_{k}$ be the vector of length $2^{k}$ as follows: $w_{0}=$ and for $k\geq 1$ ,

We now present our construction for $\lambda^{(k)}$ . Throughout this section, fix $k\geq 1$ . The matrix $\lambda^{(k)}$ is a $(2^{k+1}+1)\times(2^{k+1}+1)$ matrix that we construct below. We will index the rows and columns of $\lambda^{(k)}$ by $\{\star,0,1,\dots,2^{k+1}-1\}$ .

There are five cases for $\lambda^{(k)}_{i}$ depending on $i$ . See Figure 2 for a depiction of the different cases.

Case 1: $i+1<2^{k}$ and $i+1$ is not a power of 2.

Case 2: $i+1>2^{k}$ and $2^{k+1}-1-i$ is not a power of 2.

While we find this vector concatenation notation to be more compact and easier to read than specifying each entry of $\lambda$ separately, we will also give $\lambda$ entry by entry.

Case 1: $i+1<2^{k}$ and $i+1$ is not a power of 2.

Case 2: $i+1>2^{k}$ and $2^{k+1}-1-i$ is not a power of 2.

In the remainder of this section, we fix $k\geq 1$ and let $\lambda=\lambda^{(k)}$ . In this subsection only, we will let $\mathfrak{L}(\cdot)$ denote the second-smallest eigenvalue of its argument.

We recognize this as the Laplacian of the weighted graph on the vertices $[0,2^{k+1}-1]$ where the vertices $i$ and $j$ are connected with an edge with weight $\lambda_{i,j}+\lambda_{j,i}$ . We will lower-bound $\mathfrak{L}(W_{2}(\lambda))$ by identifying a simpler weighted graph that is dominated by our original graph and bounding its second-smallest eigenvalue instead.
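The domination argument rests on the identity $x^{\intercal}\mathcal{L}x=\sum_{\{i,j\}}w_{ij}(x_{i}-x_{j})^{2}$ for a weighted graph Laplacian: lowering or deleting edge weights can only decrease this quadratic form, and hence (by the Courant-Fischer characterization) the second-smallest eigenvalue. A minimal sketch on a toy graph follows; the weights and helper names are hypothetical and unrelated to the actual entries of $\lambda$.

```python
import random

def laplacian(n, weights):
    """Dense Laplacian of a weighted graph; weights maps (i, j), i < j, to w >= 0."""
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def quad_form(L, x):
    """Evaluate x^T L x for a dense matrix L."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Toy graphs: every edge weight in `small` is at most the weight in `big`.
big = {(0, 1): 2.0, (1, 2): 1.5, (0, 2): 0.7, (2, 3): 1.0}
small = {(0, 1): 1.0, (1, 2): 1.5, (2, 3): 0.25}

random.seed(0)
L_big, L_small = laplacian(4, big), laplacian(4, small)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(4)]
    edge_sum = sum(w * (x[i] - x[j]) ** 2 for (i, j), w in big.items())
    assert abs(quad_form(L_big, x) - edge_sum) < 1e-12          # Laplacian identity
    assert quad_form(L_small, x) <= quad_form(L_big, x) + 1e-12  # domination
```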

The following lemma computes some lower bounds on the entries of $\lambda$ . This will allow us to identify our simpler graph. Its proof simply requires checking the relevant entries of $\lambda$ and is deferred to Section A.4.

We are now ready to prove a lower bound on $\mathfrak{L}(W_{2}(\lambda))$ .

Let $k\geq 1$ . Then, the second smallest eigenvalue of $W_{2}(\lambda)$ is at least $\frac{1}{286}(1+\sqrt{2})^{-k}$ .

Let $\mathcal{L}$ denote the Laplacian of the caterpillar graph. Lemma 5.1 implies that $\mathfrak{L}(W_{2}(\lambda))\geq\mathfrak{L}(\mathcal{L})$ .

Now, suppose for the sake of contradiction that $\mathfrak{L}(W_{2}(\lambda))\leq\frac{1}{286}(1+\sqrt{2})^{-k}$ . For simplicity, let $\Xi=\frac{1}{286}(1+\sqrt{2})^{-k}$ within this proof. We will deduce from this assumption that the “vertex-weighted” star graph that arises from contracting all path vertices in the caterpillar graph to a single vertex has small algebraic connectivity, from which we will derive a contradiction.

From this, we deduce that for each $i\in[1,k]$ , $\frac{1}{\sqrt{2}}(x(2^{i}-1)-x(2^{i-1}-1))^{2}\leq\Xi$ . Chaining these inequalities, for any pair of path vertices, the difference in $x$ values is bounded above by $k2^{1/4}\Xi^{1/2}$ .
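The chaining step can be spelled out as follows, reading the path vertices as $2^{i}-1$ for $i\in[0,k]$ and writing $v_{i}=x(2^{i}-1)$ (the shorthand $v_{i}$ is ours, introduced only for this display):

```latex
\begin{align*}
\frac{1}{\sqrt{2}}\,(v_i - v_{i-1})^2 \le \Xi
  &\;\Longrightarrow\; |v_i - v_{i-1}| \le 2^{1/4}\,\Xi^{1/2}, \\
|v_b - v_a| \le \sum_{i=a+1}^{b} |v_i - v_{i-1}|
  &\le (b-a)\,2^{1/4}\,\Xi^{1/2} \le k\,2^{1/4}\,\Xi^{1/2}
  \qquad (0 \le a < b \le k).
\end{align*}
```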

Next, let $\mathcal{L}_{\textup{star}}$ denote the weighted Laplacian on the leaf vertices and the vertex $o$ , that contains an edge of weight $\frac{1}{2\sqrt{2}}(1+\sqrt{2})^{-k}$ from $o$ to each leaf vertex. Given a leaf vertex $i$ , let $\textup{parent}(i)$ denote the path vertex that $i$ was attached to in the caterpillar graph. Then,

Here, on the second line, we have used the inequality $(a-b)^{2}\leq 2(a-c)^{2}+2(c-b)^{2}$ .
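For completeness, this inequality follows from expanding the square and applying $2uv\leq u^{2}+v^{2}$ with $u=a-c$ and $v=c-b$:

```latex
\begin{align*}
(a-b)^2 &= \bigl((a-c) + (c-b)\bigr)^2 \\
        &= (a-c)^2 + 2(a-c)(c-b) + (c-b)^2 \\
        &\le 2(a-c)^2 + 2(c-b)^2 .
\end{align*}
```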

One may check (see Mathematica proof 5.2) that the expression in parentheses is $<1$ for all $k\geq 1$ .

On the other hand, (5.5) is the variational characterization of the second smallest eigenvalue of

The identity block in the bottom-right is $N\times N$ where $N\geq 2$ . Thus, the second smallest eigenvalue of ${\rm Diag}(\mu)^{-1/2}\mathcal{L}_{\textup{star}}{\rm Diag}(\mu)^{-1/2}$ is at least $\frac{1}{2\sqrt{2}}(1+\sqrt{2})^{-k}$ by Cauchy’s Interlacing Theorem, a contradiction.∎

Acknowledgements. Benjamin Grimmer’s work was supported in part by the Air Force Office of Scientific Research under award number FA9550-23-1-0531.

References

Appendix A Deferred Proofs and Calculations

First we verify $\alpha_{k}\leq\beta_{k+1}$ , then we prove our bounds relating $H_{k}$ and $\mu_{k}$ , and finally we show $\beta_{k}\leq\alpha_{k}$ . The defining equation of $\alpha_{k}$ is that $\alpha_{k}$ is the unique root larger than $1$ of $q_{k}$ . It is clear that $\beta_{k+1}\geq 1$ , thus to show $\alpha_{k}\leq\beta_{k+1}$ it suffices to show that $q_{k}(\beta_{k+1})>0$ . We compute

To bound $H_{k}$ , we first claim that the sum of all $\beta_{i}$ ’s in $\mathfrak{h}^{(k)}$ is given by $\sqrt{2}((1+\sqrt{2})^{k}-1)-2k$ . This follows from noting each $\beta_{i}$ appears in $\mathfrak{h}^{(k)}$ a total of $2(2^{k-i-1}-1)$ times and so the total value of $\beta_{i}$ terms in $\mathfrak{h}^{(k)}$ is

Recall also that $\mu_{k}$ is the sum of all other entries in $\mathfrak{h}^{(k)}$ plus two. Thus,

To get an upper bound on this quantity, note that $\alpha_{i}\leq\beta_{i+1}=1+(1+\sqrt{2})^{i}$ . Thus,

To get a lower bound, observe that $\alpha_{i}\geq 1$ for all $i$ . Thus $H_{k}\geq 2\sqrt{2}((1+\sqrt{2})^{k}-1)$ . The first set of bounds for $\mu_{k}$ follows from the identity $\mu_{k}-1=H_{k}/2$ . The final claimed inequality follows directly as $\mu_{k}-\mu_{k-1}\geq\sqrt{2}(1+\sqrt{2})^{k}-2\sqrt{2}(1+\sqrt{2})^{k-1}=(2-\sqrt{2})(1+\sqrt{2})^{k-1}$ .
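The final equality in this chain is a one-line factorization, recorded here for convenience:

```latex
\begin{align*}
\sqrt{2}(1+\sqrt{2})^{k} - 2\sqrt{2}(1+\sqrt{2})^{k-1}
  &= \sqrt{2}\,(1+\sqrt{2})^{k-1}\bigl((1+\sqrt{2}) - 2\bigr) \\
  &= \sqrt{2}\,(\sqrt{2}-1)\,(1+\sqrt{2})^{k-1} \\
  &= (2-\sqrt{2})\,(1+\sqrt{2})^{k-1}.
\end{align*}
```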

To conclude, we show $\beta_{k}\leq\alpha_{k}$ by showing $q_{k}(\beta_{k})\leq 0$ . We compute

where the inequality exactly follows from our lower bound on $\mu_{k}-1$ .

We previously claimed in Section 3.2 that $\prod_{i=0}^{t-1}(\mathfrak{h}^{(k)}_{i}-1)=1$ . We show this by induction below. The following lemma is useful in this calculation.

$\mu_{i+1}=\mu_{i}+2(\alpha_{i}+\beta_{i+1}-2)$ .

It is clear that $\mu_{i+1}=\mu_{i}+2\alpha_{i}+2\sum_{j=1}^{2^{i}-1}\pi^{(i)}_{j},$ which we have seen is equal to

As a base case, when $k=1$ , note that $\alpha_{0}$ is the positive root of the polynomial

so that $\alpha_{0}=\frac{3}{2}$ , and $h^{(1)}=(\frac{3}{2},5,\frac{3}{2})$ , which satisfies this equation. Now, assume that the equation holds for $\mathfrak{h}^{(k-1)}$ , i.e. $\prod_{i=0}^{2^{k-1}-1}(\mathfrak{h}^{(k-1)}_{i}-1)=1$ . Expand this expression to be

Computing the product of all of the $\beta_{i}-1$ , this simplifies to

Combining our previous expressions, we obtain by Lemma A.1 that

Now, we note that by the defining equation of $\alpha_{k-1}$ that

In this section, we fix a $k\geq 1$ and let $\lambda$ denote the construction $\lambda^{(k)}$ given in Section 5.3. Our goal is to bound $\min_{i\in[0,t-1]}\lambda_{i,i+1}$ below for use in proving Lemma 3.1 via Theorem 4.1.

We prove this by bounding $\lambda^{(k)}_{i,i+1}$ for any $i\in[0,2^{k+1}-2]$ separately across our five cases. We will also make use of the easy fact that $\mu_{z(i)}-1\geq(1+\sqrt{2})^{z(i)+1}$ .

Case 1: $i+1<2^{k}$ and $i+1$ is not a power of 2.

Case 4: $i=2^{k}-1$ . If $k=0$ , then this entry is $\beta_{1}/(\mu_{k}-1)$ . Otherwise, this entry is $\beta_{0}/(\mu_{k}-1)$ . As $\beta_{1}\geq\beta_{0}$ , we may bound the general case as

A.4 Bounding the Edge Weights in the Caterpillar graph

In Section 5.4, we required lower bounds on specific entries of $\lambda^{(k)}$ . These lower bounds were stated in Lemma 5.1 and are proved below.

Fix $k\geq 1$ and let $\lambda=\lambda^{(k)}$ .

Appendix B Useful Supporting Identities and Properties

Here are two recurrence relations for $\lambda^{(k)}$ that are useful in various calculations. They say that certain entries (or rows) of $\lambda^{(k)}$ are simply scalar multiples of other entries (or rows).

The claim holds if $i=j$ . In the remainder assume $i\neq j$ . Note that $\lambda^{(k)}_{i,j}\neq 0$ if and only if $-2^{\nu(i+1)-1}\leq j-i\leq 2^{\nu(i+1)}$ . As $(2^{z^{\prime}}+j)-(2^{z^{\prime}}+i)=j-i$ and $\nu(2^{z^{\prime}}+i+1)=\nu(i+1)$ we deduce that $\lambda^{(k)}_{i,j}\neq 0$ if and only if $\lambda^{(k)}_{2^{z^{\prime}}+i,2^{z^{\prime}}+j}\neq 0$ .

If $\lambda^{(k)}_{i,j}=0$ , then the claim holds. In the remainder, assume $\lambda^{(k)}_{i,j}\neq 0$ . In this case, our definitions imply that

and since $\nu(2^{z^{\prime}}+i+1)=\nu(i+1)$ , and the number of ones in the binary expansion of $2^{z^{\prime}}+i$ is exactly one more than the number of ones in the binary expansion of $i$ , we see that

Comparing the two expressions yields our result. ∎

In the first case, we have that both $\lambda^{(k)}_{\operatorname{rev}(r)}$ and $\lambda^{(k)}_{\operatorname{rev}(r^{\prime})}$ are defined according to Case 2 and that $\nu(\operatorname{rev}(r^{\prime})+1)=\nu(r^{\prime}+1)=\nu(r+1)=\nu(\operatorname{rev}(r)+1)$ . Thus, these two rows (after the natural re-indexing) are scalar multiples of $\rho_{\nu(r+1)}$ (and hence of each other). Using the identities in Lemma B.5, we have

Comparing the two coefficients proves the claim.

Comparing the two coefficients proves the claim. ∎

B.2 Algebraic Properties of $\mu$

There are various algebraic properties of $\mu$ that we will use in this paper.

and taking the square root of both sides implies the last claim. ∎

Suppose $k\geq 1$ , then $2(\beta_{k}-1)+\sqrt{(\mu_{k-1}-1)(\mu_{k}-1)}=\mu_{k}-1$ , or equivalently, $2(1+\sqrt{2})^{k-1}+\sqrt{(\mu_{k-1}-1)(\mu_{k}-1)}=\mu_{k}-1$ .

By Lemma A.1, we have $\mu_{k-1}=\mu_{k}-2(\alpha_{k-1}+\beta_{k}-2)$ . Applying this identity and combining, we get that this is equivalent to

which is the defining equation for $\alpha_{k-1}$ . ∎

B.3 Properties of the $\operatorname{rev}$ operation

Suppose $0\leq r\leq 2^{k}-1$ . Then, $\operatorname{rev}(\operatorname{rev}(r))=r$ and

In particular, $p(\operatorname{rev}(r))+z(r)-k=z(r)-p(r)-\nu(r+1)+2$ .

We recall that $\operatorname{rev}(r)=2^{k+1}-2-r$ . For the first identity, note

Then, recall that the 2-adic valuation of the sum or difference of two numbers with different 2-adic valuations is the smaller of the two. As $\nu(r+1)\leq k$ , we have that the above quantity is equal to $\nu(r+1)$ .

Finally, $p(\operatorname{rev}(r))$ is the number of ones in the binary expansion of $2^{k+1}-1-r$ . This is equivalent to $k+1$ minus the number of ones in the binary expansion of $r$ . Now, consider the binary expansion of $r+1$ . The smallest position for which the binary expansion of $r+1$ is equal to one, i.e., $\nu(r+1)$ , is the same as the smallest position for which the binary expansion of $r$ is equal to zero. The difference in the number of ones in their binary expansion is then $\nu(r+1)-1$ . We have deduced that $p(\operatorname{rev}(r))=k+1-p(r-1)=k+1-(p(r)+\nu(r+1)-1)=k-p(r)-\nu(r+1)+2$ . ∎
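These identities are easy to confirm by brute force for small $k$. The sketch below checks $\operatorname{rev}(\operatorname{rev}(r))=r$, the valuation identity $\nu(\operatorname{rev}(r)+1)=\nu(r+1)$ derived in the proof, and the formula for $p(\operatorname{rev}(r))$, over all $0\leq r\leq 2^{k}-1$. The function names are ours, and the range of $k$ tested is an arbitrary choice.

```python
def p(i):
    """Number of ones in the binary expansion of i + 1."""
    return bin(i + 1).count("1")

def nu(n):
    """2-adic valuation of a positive integer n."""
    return (n & -n).bit_length() - 1

def rev(r, k):
    """rev_k(r) = 2**(k+1) - 2 - r."""
    return 2 ** (k + 1) - 2 - r

def check_rev_identities(k):
    """Check the three identities for every 0 <= r <= 2**k - 1."""
    for r in range(2 ** k):
        assert rev(rev(r, k), k) == r
        assert nu(rev(r, k) + 1) == nu(r + 1)
        assert p(rev(r, k)) == k - p(r) - nu(r + 1) + 2
    return True

assert all(check_rev_identities(k) for k in range(1, 11))
```

A finite check of this kind does not replace the proof, but it guards against off-by-one slips in the indexing conventions.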

The support of $\lambda^{(k)}$ has a rich combinatorial structure, which we need to make use of extensively in our computations. We record some facts about this support and their proofs here. For now, let us fix $k$ , and let $\lambda$ refer to $\lambda^{(k)}$ .

From our definition of $\lambda$ , $\lambda_{i,j}\neq 0$ if and only if $i>j\geq i-2^{\nu(i+1)-1}$ or $i+2^{\nu(i+1)}\geq j>i$ .

It is useful to us to understand for a fixed $j$ , which are the $i$ where $\lambda_{i,j}\neq 0$ . For a given $j\leq 2^{k+1}-1$ , we let

Suppose that $j\in[1,2^{k}-1]$ has the binary expansion $j=\sum_{a=0}^{z}b_{a}2^{a}$ , where $b_{a}\in\{0,1\}$ and $b_{z}=1$ , then

We begin by showing that if $i=\sum_{a=r}^{z}b_{a}2^{a}-1$ for some $r\in[0,z]$ where $b_{r}=1$ , then $i\in S_{j}^{-}$ . Note that $\nu(i+1)=r$ . This implies that

Now, we show the reverse direction, i.e., if $i<j$ and $i+2^{\nu(i+1)}\geq j$ , then $i=\sum_{a=\nu(i+1)}^{z}b_{a}2^{a}-1$ . Note that $z(i+1)\leq z$ since $i<j$ . This implies that the binary expansion of $i+1$ can be expressed as

For the sake of contradiction, suppose that $b_{a}^{\prime}\neq b_{a}$ for some $a\geq\nu(i+1)$ . In this case, let $a^{*}$ be the largest $a$ so that $b_{a}^{\prime}\neq b_{a}$ . If $b_{a^{*}}^{\prime}=0$ while $b_{a^{*}}=1$ , then

If $b_{a^{*}}^{\prime}=1$ , while $b_{a^{*}}=0$ , then

In either case, we reach a contradiction. ∎

Suppose $2^{z}-1\leq j<2^{z+1}-1$ with $z<k$ . Then, $i\in S_{j}^{+}$ if and only if $i\in[j+1,2^{k+1}-2]$ and $i-j\leq 2^{\nu(i+1)-1}$ . In particular, if $i\in S_{j}^{+}$ , then $j<i\leq 2^{z+1}-1$ . If in addition $j=2^{z}-1$ , then $S_{j}^{+}$ is the singleton set $\{2^{z+1}-1\}$ .

This is clear from the support of $\lambda$ . ∎

Appendix D Proof of Theorem D.1

In this section, we will show that $\lambda^{(k)}$ satisfies the first main condition of Theorem 4.1.

The sum of the zeroth row of $\lambda^{(k)}$ is one larger than the sum of the zeroth column of $\lambda^{(k)}$ .

The sum of the $(2^{k+1}-1)$ th row of $\lambda^{(k)}$ is one less than the sum of the $(2^{k+1}-1)$ th column of $\lambda^{(k)}$ .

The equivalence of the two statements follows from Lemma 4.2. Thus, it suffices to prove the three statements in the second claim. We show the first item in Lemma D.16. We show the second item in Lemma D.17, Lemma D.18, and Lemma D.19. We show the third item in Lemma D.20. ∎

In the remainder of this section, we fix $k\geq 1$ and let $h=\mathfrak{h}^{(k)}$ and $\lambda=\lambda^{(k)}$ . Section D.1 computes the sums of each row of $\lambda$ . Section D.2 computes the sums of each column of $\lambda$ . Finally, Section D.3 proves the lemmas claimed above. Various algebraic identities involving the entries of $h$ will be used in this section and are proven in Appendix B.

Each row of $\lambda$ is composed of various components; we will enumerate their sums here.

For $k\geq 0$ , $\sum_{i=1}^{2^{k}-1}\pi^{(k)}_{i}=\beta_{k+1}-2$ .

First, note that $\pi^{(0)}=\emptyset$ so that $\sum_{i=1}^{0}\pi^{(0)}_{i}=0$ . We also have $\beta_{1}-2=0$ .

Note that the first $2^{k-1}-1$ entries of $\pi^{(k)}$ are identical to those of $\pi^{(k-1)}$ , and the same holds for the last $2^{k-1}-1$ entries. By induction, we may conclude that

See Mathematica proof D.1 for a proof of the second identity. ∎
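The closed form for the sum of $\pi^{(k)}$ can also be checked numerically. The sketch below uses $\beta_{i}=1+(1+\sqrt{2})^{i-1}$, which matches the values of $\beta$ used in Appendix A, and ASSUMES the recursion $\pi^{(k)}=[\pi^{(k-1)},\beta_{k-1},\pi^{(k-1)}]$. The middle entry is not stated in this section; it is inferred from the length $2^{k}-1$ and from the lemma itself, so this is a consistency check under that assumption rather than an independent proof.

```python
import math

SILVER = 1.0 + math.sqrt(2.0)

def beta(i):
    """beta_i = 1 + (1 + sqrt(2))**(i - 1)."""
    return 1.0 + SILVER ** (i - 1)

def pi_vec(k):
    """pi^{(k)} under the ASSUMED recursion [pi^{(k-1)}, beta_{k-1}, pi^{(k-1)}]."""
    if k == 0:
        return []  # pi^{(0)} is the empty vector
    prev = pi_vec(k - 1)
    return prev + [beta(k - 1)] + prev

# The upper limit on k is an arbitrary choice for this check.
for k in range(0, 9):
    vec = pi_vec(k)
    assert len(vec) == 2 ** k - 1
    assert math.isclose(sum(vec), beta(k + 1) - 2.0, rel_tol=1e-9, abs_tol=1e-9)
```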

For $k\geq 1$ , the sum of the entries in $\sigma^{(k)}$ is

We show this by induction: note that the sum of the entries in $\sigma^{(1)}$ is 0, and so is this expression.

For $k>1$ , the sum of the entries in $\sigma^{(k)}$ is

where $\Sigma_{k-1}$ is the sum of the entries in $\sigma^{(k-1)}$ .

By Lemma D.1, and the induction hypothesis, this is

Note that $(\beta_{k-1}-2)+\beta_{k}=(2+\sqrt{2})(1+\sqrt{2})^{k-2}$ , so that this becomes

For $k\geq 0$ , the sum of the entries in $\rho_{k}$ is

A simple calculation shows that this holds for $k=0$ (see Mathematica proof D.2). Now suppose $k\geq 1$ . Using Lemma D.1 and Lemma D.2, we see that the sum of the entries in $\rho_{k}$ is

This is equal to the claimed expression (see Mathematica proof D.3). ∎

For $k\geq 0$ , the sum of the entries of $w_{k}$ is $\sqrt{\mu_{k}-1}$ .

We proceed by induction: the sum of the entries of $w_{0}$ is $1=\sqrt{\mu_{0}-1}$ .

By expanding the definition and applying Lemma D.1 and the inductive hypothesis, the sum of the entries of $w_{k}$ is $\frac{\beta_{k}-2}{\sqrt{\mu_{k}-1}}+\frac{\beta_{k}}{\sqrt{\mu_{k}-1}}+\sqrt{\mu_{k-1}-1}$ . This is equivalent to the claimed expression by Lemma B.4. ∎

D.1.2 Computing Row Sums

We will give the sum of the entries in each row, dividing into the cases above.

Case 1: $i+1<2^{k}$ and $i+1$ is not a power of 2. The sum of the entries of $\lambda_{i}$ is

Case 2: $i+1>2^{k}$ and $2^{k+1}-1-i$ is not a power of 2. The sum of the entries of $\lambda_{i}$ is

Case 4: $i=2^{k}-1$ . The sum of the entries of $\lambda_{i}$ is

Cases 1 and 2 follow directly by definition and Lemma D.3. The expressions for Cases 3 and 4 follow by adding up the partial sums computed in the previous subsection. See Mathematica proof D.4 and Mathematica proof D.5.

For Case 5, we combine the expressions for the partial sums (see Mathematica proof D.6) to get the row sum in the form

Combining these expressions proves the claim (see Mathematica proof D.7). ∎

D.2 Column Sums

denote the indices above $j$ and indices below $j$ in the support of the $j$ th column. The following lemmas give computational descriptions of these sets that will be useful in computing the column sums. We will give their proofs in Appendix C.

Suppose that $j\in[1,2^{k}-1]$ has the binary expansion $j=\sum_{a=0}^{z}b_{a}2^{a}$ , where $b_{a}\in\{0,1\}$ and $b_{z}=1$ , then

D.2.2 Computing Column Sums

Fix $1\leq j\leq 2^{k}-1$ . First suppose $j+1$ is not a power of $2$ and let $z$ be such that $2^{z}-1<j<2^{z+1}-1$ . Let $p$ denote the number of ones in the binary expansion of $j+1$ . Then,

On the other hand, if $j=2^{z}-1$ for some $z=1,\dots,k$ , then

We begin with the case where $j+1$ is not a power of $2$ . Let $j+1=\sum_{a=0}^{z}b_{a}2^{a}$ be the binary expansion of $j+1$ . Since $2^{\nu(j+1)}$ is the largest power of 2 dividing $j+1$ , it follows that

Case (i): Let $i=2^{z}-1$ . As $j+1$ is not a power of $2$ , we have that $\nu(j+1)<z$ and $\nu(j-i)=\nu(j+1)$ . Thus, by definition of $\lambda$ , we have

where $p_{i}$ is the number of ones in the binary expansion of $i+1$ and we note that $\nu(j-i)=\nu((j+1)-(i+1))=\nu(j+1)$ . Now, note that if we sum over all $i$ of the form of case 2, there is exactly one such term for each possible value of $p_{i}$ from $2$ through $p-1$ . That is, if we add all such $\lambda_{i,j}$ , we obtain (see Mathematica proof D.8)

Once again, if we add up all such terms, we collect one for each possible value of $p_{i}$ from $p$ to $p+\nu(j+1)-1$ , yielding (see Mathematica proof D.9)

If $j+1$ is not a power of two, then adding the sums in the three cases yields (see Mathematica proof D.10)

Adding all of these terms yields $\frac{1}{2}(\mu_{z}-1)$ (see Mathematica proof D.11). ∎

Fix $j$ so that $2^{z}-1<j<2^{z+1}-1$ where $z<k$ . If there are $p$ ones in the binary expansion of $j+1$ , then

We will show this by induction on the number of ones in the binary expansion of $j+1$ .

If there are exactly 2 ones in the binary expansion of $j+1$ , then

as can be seen in Mathematica proof D.12.

Now, we assume $p>2$ . Let $j=2^{z}+j^{\prime}$ , where $j^{\prime}<2^{z}-1$ has $p-1$ ones in its binary expansion. We have by Lemma D.9 that $S_{j}^{+}=\{i^{\prime}+2^{z}:i^{\prime}\in S_{j^{\prime}}^{+}\}\cup\{2^{z+1}-1\}$ . Once again we consider two cases: either $2^{z+1}-1\in\{i^{\prime}+2^{z}:i^{\prime}\in S_{j^{\prime}}^{+}\}$ , or it is not.

Assume that $2^{z+1}-1\in\{i^{\prime}+2^{z}:i^{\prime}\in S_{j^{\prime}}^{+}\}$ . This implies that $2^{z}-1\in S_{j^{\prime}}^{+}$ , or equivalently that $z(j^{\prime})=z-1$ . In this case,

Now, suppose $i^{\prime}\in S^{+}_{j^{\prime}}$ is not a power of 2. Since $z(i)=z(i^{\prime})+1$ , and $p(i)=p(i^{\prime})+1$ , Lemma B.1 implies that

The only element of $S^{+}_{j^{\prime}}$ that is one less than a power of 2 is $i^{\prime}=2^{z}-1$ . If $2^{z}-1-2^{a}>j^{\prime}>2^{z}-1-2^{a+1}$ , then $2^{z+1}-1-2^{a}>j>2^{z+1}-1-2^{a+1}$ , and

Now assume that $2^{z+1}-1\not\in\{i^{\prime}+2^{z}:i^{\prime}\in S_{j^{\prime}}^{+}\}$ . This is equivalent to $2^{z^{\prime}+1}-1>j^{\prime}>2^{z^{\prime}}-1$ for some $z^{\prime}<z-1$ . Now, we have that

By Lemma B.1, we note that for all $i^{\prime}\in S_{j^{\prime}}^{+}$ other than $i^{\prime}=2^{z^{\prime}+1}-1$ ,

It remains to consider $\lambda_{2^{z}+2^{z^{\prime}+1}-1,2^{z}+j^{\prime}}$ and $\lambda_{2^{z+1}-1,2^{z}+j^{\prime}}$ .

Note that $2^{z+1}-2^{z-1}>j>2^{z+1}-2^{z}$ and that $j^{\prime}+1<2^{z^{\prime}-1}$ is not a power of 2, so

Also, if $2^{z^{\prime}+1}-1-2^{a}>j^{\prime}>2^{z^{\prime}+1}-1-2^{a+1}$ , then

Note that $\lambda_{2^{z}+2^{z^{\prime}}-1,2^{z}+j^{\prime}}+\lambda_{2^{z+1}-1,2^{z}+j^{\prime}}=(1+\sqrt{2})^{2(1+z^{\prime}-z)}\frac{\mu_{z+1}-1}{\mu_{z+1}-1}\lambda_{2^{z^{\prime}}-1,j^{\prime}}$ , so that

Inspecting the support of $\lambda^{(k)}$ , we have that $S_{j}^{+}=\varnothing$ , so we need only consider the sum of the entries corresponding to elements of $S_{j}^{-}$ .

This formula also holds in the case where $l=k-1$ .

Summing up all entries in the column gives

Suppose $0\leq r<2^{k}-1$ and $r+1$ is not a power of $2$ . Then,

Let $\sum_{a=0}^{k}b_{a}2^{a}$ be the binary expansion of $\operatorname{rev}(r)$ . Equivalently if $\sum_{a=0}^{k}c_{a}2^{a}$ is the binary expansion of $r+1$ , then $b_{a}=(1-c_{a})$ for all $a\in[0,k]$ . Note that $c_{k}=0$ so that $b_{k}=1$ . The set

Suppose $\tau\in[0,k]$ and $b_{\tau}=1$ . Let $i=\sum_{a=\tau}^{k}b_{a}2^{a}-1$ . We have that $0\leq\nu(r+1)<z(r)<k$ . We enumerate the possible values of $\lambda^{(k)}_{i,\operatorname{rev}(r)}$ according to $\tau\in[z(r)+1,k]$ , $\tau\in[\nu(r+1)+1,z(r)-1]$ and $\tau\in[0,\nu(r+1)-1]$ .

Now, suppose $\tau\in[0,\nu(r+1)-1]$ , then $\sum_{a=0}^{\tau-1}b_{a}2^{a}+1=2^{\tau}$ . In this case $\lambda^{(k)}_{i}$ is defined according to Case 2, $p(i)=k+1-p(r)-\tau$ , $\nu(i+1)=\tau$ , and $z(2^{k+1}-2-i)=z(r)$ . Then,

Now suppose $\tau\in[\nu(r+1)+1,z(r)-1]$ satisfies $b_{\tau}=1$ . In this case, $\lambda^{(k)}_{i}$ is defined according to Case 2, $\nu(i+1)=\tau$ , and $z(2^{k+1}-2-i)=z(r)$ . Then,

When we sum up over entries of this form, the value of $p(i)$ is in bijection with $[k-z(r)+1,k-\nu(r+1)-p(r)+1]$ . Thus, the final part of this summation is

Finally, summing up all entries gives the desired claim (see Mathematica proof D.13). ∎

Let $0\leq r<2^{k}-1$ and suppose $r+1$ is not a power of $2$ . Then,

We will induct on $p(r)$ . As $r+1$ is not a power of two, we have $p(r)\geq 2$ .

First, suppose $p(r)=2$ . We compute directly that $S_{\operatorname{rev}(r)}^{+}=\left\{\operatorname{rev}(2^{z(r)}-1)\right\}$ . Then

One can verify that the second expression coincides with the first expression when $\nu(r+1)+1=z(r)$ (see Mathematica proof D.14). Thus,

We can also verify that this expression coincides with the claimed expression for $\sum_{i\in S^{+}_{\operatorname{rev}(r)}}\lambda^{(k)}_{i,\operatorname{rev}(r)}$ (see Mathematica proof D.15).

Now, suppose $p(r)\geq 3$ and let $r^{\prime}=r-2^{z}$ . We have that $p(r^{\prime})=p(r)-1\geq 2$ so that $r^{\prime}+1$ is not a power of two. Let $z^{\prime}=z(r^{\prime})$ . We will now apply Lemma D.7. There are two cases: where $z^{\prime}=z-1$ and where $z^{\prime}<z-1$ .

In the first case, $z^{\prime}=z-1$ and Lemma D.7 states that $S^{+}_{\operatorname{rev}(r)}=\left\{i-2^{z}:\,i\in S^{+}_{\operatorname{rev}(r^{\prime})}\right\}$ so that

For all $i\in S^{+}_{\operatorname{rev}(r^{\prime})}$ , it must hold that $\operatorname{rev}(2^{z^{\prime}+1}-1)<\operatorname{rev}(r^{\prime})<i\leq\operatorname{rev}(2^{z^{\prime}}-1)$ . We may now apply Lemma B.2 to get

Now, consider the case $z^{\prime}<z-1$ . In this case, Lemma D.7 states that $S^{+}_{\operatorname{rev}(r)}=\left\{\operatorname{rev}(2^{z}-1)\right\}\cup\left\{i-2^{z}:\,i\in S^{+}_{\operatorname{rev}(r^{\prime})}\right\}$ . Again, for all $i\in S^{+}_{\operatorname{rev}(r^{\prime})}$ , it must hold that $\operatorname{rev}(2^{z^{\prime}+1}-1)<\operatorname{rev}(r^{\prime})<i\leq\operatorname{rev}(2^{z^{\prime}}-1)$ . We may now apply Lemma B.2 to get

We evaluate the two terms separately. First, note that $2^{z^{\prime}}<r^{\prime}+1<2^{z^{\prime}+1}\leq 2^{z-1}$ . Thus,

For the second term, we apply the inductive hypothesis and Lemma B.2 to get

One can check that the sum of these two expressions is given by the claimed expression (see Mathematica proof D.16). ∎

D.3 Comparisons of Row and Column Sums

The sum of the zeroth row of $\lambda$ is one larger than the sum of the zeroth column of $\lambda$ .

The only nonzero entry of $\lambda_{0}$ is 2. On the other hand, the only row which has a nonzero entry in column 0 is $\lambda_{1}$ , and $\lambda_{1,0}=1$ , which implies the lemma. ∎

For $i=1,\dots,2^{k}-2$ , the sum of the $i$ th row of $\lambda$ is equal to the sum of the $i$ th column of $\lambda$ .

We first show this assuming $i=2^{z}-1$ . It follows from our earlier work that the sum of the entries of the $i^{th}$ row is

On the other hand, the nonzero entries of the $i^{th}$ column are those indexed by $S_{i}^{-}$ and $2^{z+1}-1$ . The sum of these entries is

Thus, the row sum and the column sum are equal.

We next show this assuming $i+1$ is not a power of 2. If the number of ones in the binary expansion of $i+1$ is $p>1$ , then the sum of the entries in the $i^{th}$ row of $\lambda$ is

On the other hand, we have seen that the sum of the entries in the $i^{th}$ column is

We verify that these two expressions are equivalent in Mathematica proof D.17. ∎

Let $k\geq 1$ and $i=2^{k}-1$ . The sum of the $i$ th row of $\lambda$ is equal to the sum of the $i$ th column of $\lambda$ .

By Lemmas D.5 and D.10, we have that the $(2^{k}-1)$ th row sum and column sum are both $\frac{\mu_{k}-1}{2}$ .∎

Suppose $2^{k}\leq j<2^{k+1}-1$ . Then, the sum of the entries in the $j$ th column of $\lambda^{(k)}$ is equal to the sum of the entries in the $j$ th row of $\lambda^{(k)}$ .

By Lemma D.5, this is also the associated row sum.

Now, suppose $j=\operatorname{rev}(r)$ where $0\leq r<2^{k}-1$ and $r+1$ is not a power of $2$ . Then, by Lemmas D.14 and D.15, we have that the $j$ th column sum is

On the other hand, noting that $p(\operatorname{rev}(r))=k-p(r)-\nu(r+1)+2$ , we have that the row sum is given by

These expressions are equal as is verified in Mathematica proof D.18. ∎

The sum of the $(2^{k+1}-1)$ th row of $\lambda$ is one less than the sum of the $(2^{k+1}-1)$ th column of $\lambda$ .

The final entry in the $(2^{k+1}-1)$ th column is where $i=2^{k}-1$ . Then $\lambda_{i,j}=\frac{1}{\sqrt{\mu_{k}-1}}$ and so the column sum is one. ∎

Appendix E Proof of Theorem E.1

In this section, we will show that $\lambda^{(k)}$ satisfies the second main condition of Theorem 4.1.

where $\phi^{(k)}$ is defined in Equation 5.3.

For the remainder of the section, we fix $k\geq 1$ and let $h=\mathfrak{h}^{(k)}$ , $\lambda=\lambda^{(k)}$ , $M=M(\lambda,h)$ , and $\phi=\phi^{(k)}$ .

We perform the computation of $M$ entry by entry. The nature of the definition of $\lambda$ requires us to break this computation down into a number of distinct cases. We give a summary of the possible cases in Table 1.

We show that $M_{i,i}=\frac{1}{2}\phi_{i}^{2}$ for all $i\in[0,2^{k+1}-1]$ in Lemmas E.2, E.4, E.3 and E.5.

We then need to consider the off-diagonal entries of $M$ . Note that $M$ is symmetric, so we only need to consider $M_{i,j}$ where $i>j$ .

We will break the remaining entries of $M$ into cases, mirroring the cases in the definition of $\lambda$ . To reiterate, the cases we consider for an index $i$ are

$i<2^{k}-1$ and $i+1$ is not a power of 2.

$i>2^{k}-1$ and $2^{k+1}-i-1$ is not a power of 2.

We summarize the possible cases and where they are proved in Table 1.∎

We will make extensive use of Theorem D.1 as well as the computed expressions for the partial row and column sums from Section D.1.1 and the computational descriptions of $S^{+}_{j}$ and $S^{-}_{j}$ in Section D.2.1.

In this subsection, we will consider the various diagonal entries of $M$ . We divide these into four cases: $M_{i,i}$ where $0\leq i<2^{k}-1$ , where $i=2^{k}-1$ , where $2^{k}-1<i<2^{k+1}-1$ , and where $i=2^{k+1}-1$ .

First, we present a lemma concerning the entries of $M$ .

We expand the entries of $M$ defined in Equation 2.4 and note that $\lambda$ is zero on its $\star$ th row and column:

If additionally, $0<i<2^{k+1}-1$ , then by Theorem D.1, we have that

If $i<2^{k}-1$ , then $M_{i,i}=\frac{1}{2}\phi_{i}^{2}=0$ .

When $i=0$ , we have that the $i$ th row sum is $2$ , the $i$ th column sum is $1$ , and $h_{i}=3/2$ . Thus,

Now, suppose $i>0$ and $2^{z}<i+1<2^{z+1}$ for some $z\in[1,k-1]$ . By Lemma D.5,

Subtracting these two expressions shows that $M_{i,i}=0$ (see Mathematica proof E.1).

On the other hand, if $i=2^{z}-1$ for some $z\in[1,k-1]$ , then by Lemma D.5

Substituting these expressions for $\mu_{z}$ and $\mu_{z+1}$ into our expression for $M_{i,i}$ shows that it is equal to zero (see Mathematica proof E.2).∎

Let $i=\operatorname{rev}(r)$ for some $r\in[0,2^{k}-1)$ . We divide into cases: either $r+1$ is a power of two or it is not.

If $r=2^{a}-1$ for some $a\in[0,k-1]$ , then by Lemma D.5,

We also have that $S^{+}_{i}=\varnothing$ , so that $M_{i,i}=\frac{\beta^{2}_{a+1}}{2(\mu_{a+1}-1)}$ . On the other hand, we also have that

Now suppose that $i=\operatorname{rev}(r)$ for some $0\leq r<2^{k}-1$ where $r+1$ is not a power of 2. Then

By Lemma D.5 and the identity $p(\operatorname{rev}(r))=k-p(r)-\nu(r+1)+2$ (see Lemma B.5), we have

Here, we use the fact that the $(2^{k+1}-1)$ th column sum is equal to one, the $(2^{k+1}-1)$ th row is zero, and that $\phi_{2^{k+1}-1}=1$ . ∎
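The identity $p(\operatorname{rev}(r))=k-p(r)-\nu(r+1)+2$ used above can be sanity-checked by brute force. The sketch below is only illustrative and rests on assumptions not stated in this section: that $p(i)$ counts the ones in the binary expansion of $i+1$, that $\nu$ is the 2-adic valuation, and that $\operatorname{rev}(r)=2^{k+1}-2-r$. The function names are hypothetical.

```python
def p(i: int) -> int:
    """Assumed meaning of p: number of ones in the binary expansion of i + 1."""
    return bin(i + 1).count("1")


def nu(n: int) -> int:
    """Assumed meaning of nu: 2-adic valuation of n >= 1."""
    return (n & -n).bit_length() - 1


def rev(r: int, k: int) -> int:
    """Assumed form of rev: reflection of the index set [0, 2^(k+1) - 2]."""
    return 2 ** (k + 1) - 2 - r


def identity_holds(k: int) -> bool:
    """Check p(rev(r)) = k - p(r) - nu(r + 1) + 2 for all 0 <= r < 2^k - 1."""
    return all(
        p(rev(r, k)) == k - p(r) - nu(r + 1) + 2
        for r in range(2 ** k - 1)
    )


# Exhaustive check for small k; this is a numerical sanity check, not a proof.
results = {k: identity_holds(k) for k in range(1, 11)}
print(results)
```

Under these assumed definitions the check amounts to counting the bits of $2^{k+1}-(r+1)$, which is the $(k+1)$-bit complement of $r+1$ plus one.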

E.2 Off-Diagonal Entries of $M$

The following lemma gives a description for $M_{i,j}$ where $i\neq j$ and will be used repeatedly throughout this subsection.

We will use the definition of $A$ and $C$ to pull out the nonzero entries of each sum. The first sum becomes

We consider a useful lemma, which can be shown by just considering the support of $\lambda$ :

Fix $i<j<2^{k+1}-1$ . If $z(i)>z(j)+1$ then $M_{i,j}=0$ .

Also, if $z(i)=z(j)+1$ and $i+1$ is not a power of 2, then $M_{i,j}=0$ .

In all of the following calculations, we will consider the expression

We begin by taking care of the easier cases.

Suppose $0\leq r<2^{k}-1$ and $r+1$ is not a power of $2$ . Then,

The second identity holds as $\phi_{r}=0$ . We turn to the first identity. By Lemma E.6, we have

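Note that if $r<2^{k-1}-1$, then every term in this expression is zero. The value $r=2^{k-1}-1$ is excluded since $r+1$ is not a power of $2$, so we will assume $r>2^{k-1}-1$. Now, we will write $\tau=2^{k}-(r+1)$. Since $2^{k-1}+1\leq r+1\leq 2^{k}-1$, we have $1\leq\tau\leq 2^{k-1}-1$.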

Combining these identities gives $M_{r,2^{k}-1}=0$ . ∎

Suppose $0\leq r<2^{k}-1$ and $r+1$ is not a power of $2$ . Then,

Let $\tau$ be such that $2^{\tau}<r+1<2^{\tau+1}$. Note that we must have $\tau\in[0,k-1]$. By Lemma E.6, we have

On the other hand, $\phi_{2^{k}-1}=\sqrt{\mu_{k}-1}$ and $\phi_{\operatorname{rev}(r)}=(w_{k})_{2^{k}-r-1}$ . ∎

The only nonzero entry in the first two summations is

Substituting both expressions back in gives

Here, on the last line we have used Lemma B.3. This completes the proof as $\phi_{\operatorname{rev}(2^{r}-1)}=\frac{\beta_{r+1}}{\sqrt{\mu_{r+1}-1}}$ and $\phi_{\operatorname{rev}(2^{\tau}-1)}=\frac{\beta_{\tau+1}}{\sqrt{\mu_{\tau+1}-1}}$ . ∎

The second equality follows from the fact that $\phi_{2^{k+1}-1}=1$ .

We turn to the first equality. Let $0\leq s\leq 2^{k+1}-2$ . First, note if $0\leq s\leq 2^{k}-2$ , then $\phi_{s}=0$ and $M_{s,2^{k+1}-1}=0$ by Lemma E.7. Thus, we may assume $2^{k}-1\leq s\leq 2^{k+1}-2$ in the remainder of this proof. We will break the rest of the proof into cases depending on how $\lambda_{s}$ is defined, i.e., Cases 2, 4, and 5.

Case 2: Suppose $s=\operatorname{rev}(r)$ for some $0\leq r<2^{k}-1$ for which $r+1$ is not a power of $2$ . By Lemma E.6,

By Lemma D.6, the nonzero summands in the sum correspond to indices $i=\operatorname{rev}(2^{\tau}-1)$ where $z(r)+1\leq\tau\leq k$ . We also have that $h_{s}=\beta_{\nu(r+1)}$ . Combining these identities gives

Note that $\phi_{\operatorname{rev}(r)}=\frac{\beta_{\nu(r+1)}}{\sqrt{\mu_{z(r)+1}-1}}$ .

Case 4: Suppose $s=2^{k}-1$ . By Lemma E.6,

On the other hand, $\phi_{2^{k}-1}=\sqrt{\mu_{k}-1}$ .

E.3 Off-Diagonal Entries in Case $(1,1)$

Our goal for this subsection will be to prove Lemma E.19, stating that the off-diagonal entries $M_{i,j}$ are zero for all $0\leq j<i<2^{k}-1$, where neither $i+1$ nor $j+1$ is a power of two (equivalently, where $p(i),p(j)\geq 2$). We will prove Lemma E.19 by induction on the value of $\min(p(i),p(j))$. We will make use of the following lemma.

Fix $i>j$ such that $i+1$ and $j+1$ are both not powers of 2. If $z(i)=z(j)$ , so that $i=2^{z}+i^{\prime}$ and $j=2^{z}+j^{\prime}$ , and $z(j^{\prime})<z(i^{\prime})-1$ , then $M_{i,j}=0$ .

Now, note that because $i+1$ is not a power of 2,

Also note that because $j<2^{z}+2^{z-1}-1$, and $j+1$ is not a power of 2,

The result follows by noting that $h_{i}=\beta_{\nu(i+1)}$ and $h_{j}=\beta_{\nu(j+1)}$ , so that

The following lemma will be the base case for the subsequent inductive proof and itself requires nontrivial calculations.

Suppose $0\leq j<i<2^{k}-1$ satisfy $\min\{p(i),p(j)\}=2$ . Then, $M_{i,j}=0$ .

If $z(i)\neq z(j)$ then the result follows from Lemma E.7. So, assume that $i=2^{z}+i^{\prime}$ and $j=2^{z}+j^{\prime}$ where $i^{\prime},j^{\prime}<2^{z}$ . Lemma E.17 shows that if $z(i^{\prime})>z(j^{\prime})+1$ , then $M_{i,j}=0$ . From now on, we will assume that $z(i^{\prime})\leq z(j^{\prime})+1$ .

We consider three cases: $p(i)=p(j)=2$; $p(i)=2$ and $p(j)>2$; and $p(i)>2$ and $p(j)=2$.

In this case, $i=2^{z}+2^{r}-1$ and $j=2^{z}+2^{t}-1$ for some $t<r<z$. By the assumptions that $j<i$ and that $z(i^{\prime})\leq z(j^{\prime})+1$, we have that $t=r-1$. Now, considering the support of $\lambda$, we have that

Combining these values shows that $M_{i,j}=0$ (see Mathematica proof E.8).

Let $i=2^{z}+2^{r}-1$ , and let $j=2^{z}+j^{\prime}$ , where $2^{r-1}-1<j^{\prime}<2^{r}-1$ . We begin by noting

We break the remainder of the proof into two cases: where $j^{\prime}=2^{r}-2^{a}-1$ for some $a\geq 0$ or where $2^{r}-2^{a}-1<j^{\prime}<2^{r}-2^{a-1}-1$ for some $a\geq 1$ .

Combining these expressions shows (see Mathematica proof E.9)

Now, we consider the other subcase, in which, for some $a\geq 1$ ,

so that this sum is given by (E.3). Next, we have

Combining these expressions shows (see Mathematica proof E.10)

Let $j=2^{z}+2^{r}-1$. There are two subcases: $z(i^{\prime})=r+1$ and $z(i^{\prime})=r$.

If $z(i^{\prime})=r+1$ , then $i=2^{z}+2^{r+1}+i^{\prime\prime}$ , where $i^{\prime\prime}>0$ by the assumption that $p(i)>2$ . In this case,

Combining this expression with the identities

Now, consider the case in which $z(i^{\prime})=r$ . In this case, we have

It can be seen by Lemma D.6 that $S_{i}^{-}\cap\{0,\dots,j\}=\{j,2^{z}-1\}$ . That is,

One may check (see Mathematica proof E.11) that the two expressions coincide when $r=z-1$ . Thus, we may take the first expression in all cases.

Combining the previous expressions, we have that

Suppose $0\leq j<i<2^{k}-1$ , where neither $i+1$ nor $j+1$ is a power of $2$ . Then, $M_{i,j}=0$ .

By Lemma E.7, if $z(i)\neq z(j)$ , then $M_{i,j}=0$ . We may thus assume $z(i)=z(j)<k$ and let $z$ denote their common value.

We will show the result by induction on $\min\{p(i),p(j)\}$ . The base case, where $\min\{p(i),p(j)\}=2$ is shown in Lemma E.18. In the remainder, we assume that $p(i),p(j)\geq 3$ .

Now, set $i^{\prime}=i-2^{z}$ and $j^{\prime}=j-2^{z}$ . We see that $p(i^{\prime})=p(i)-1$ and $p(j^{\prime})=p(j)-1$ by considering the binary expansion of $i+1$ and $j+1$ . Thus, neither $i^{\prime}+1$ nor $j^{\prime}+1$ is a power of two.
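The effect of removing the leading bit can be checked numerically. The following is a minimal sketch, assuming (as the surrounding text suggests, but does not state in this section) that $z(i)=\lfloor\log_{2}(i+1)\rfloor$, that $p(i)$ counts the ones in the binary expansion of $i+1$, and that $\nu$ is the 2-adic valuation. The function names are hypothetical. It also checks that $\nu(i+1)$ is unchanged, which the proof relies on when comparing stepsizes.

```python
def p(i: int) -> int:
    """Assumed meaning of p: number of ones in the binary expansion of i + 1."""
    return bin(i + 1).count("1")


def z(i: int) -> int:
    """Assumed meaning of z: floor(log2(i + 1)), the leading-bit position of i + 1."""
    return (i + 1).bit_length() - 1


def nu(n: int) -> int:
    """Assumed meaning of nu: 2-adic valuation of n >= 1."""
    return (n & -n).bit_length() - 1


def strip_leading_bit_ok(k: int) -> bool:
    """For every i < 2^k - 1 with p(i) >= 3, check the claims about i' = i - 2^z(i)."""
    for i in range(2 ** k - 1):
        if p(i) < 3:
            continue
        i_prime = i - 2 ** z(i)
        if p(i_prime) != p(i) - 1:        # one fewer set bit
            return False
        if p(i_prime) < 2:                # i' + 1 would be a power of two
            return False
        if nu(i_prime + 1) != nu(i + 1):  # lowest set bit is untouched
            return False
    return True


checks = [strip_leading_bit_ok(k) for k in range(1, 11)]
print(all(checks))
```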

We show the claim directly if $z(i^{\prime})\neq z(j^{\prime})$ . In this case, it holds that $z(j^{\prime})<z(i^{\prime})$ . In the sum

Here, the expression for $\lambda_{2^{z+1}-1,j}$ follows from the fact that $z(j^{\prime})<z$ .

Now, assume that $z(i^{\prime})=z(j^{\prime})$ and denote their common value by $z^{\prime}$ . By the inductive hypothesis, we may assume that

We now wish to compare $M_{i,j}$ to $M_{i^{\prime},j^{\prime}}$ . For this, we divide the summation defining $M_{i,j}$ into parts and then rearrange:

Note that $h_{2^{z}+i^{\prime}}=h_{i^{\prime}}$ , since $i^{\prime}+1$ is not a power of 2, and $\nu(2^{z}+i^{\prime}+1)=\nu(i^{\prime}+1)$ . Similarly, $h_{2^{z}+j^{\prime}}=h_{j^{\prime}}$ . We finally recall Lemma B.1, which shows that this expression is the same as

Here, there are two cases: either $z^{\prime}=z-1$ or $z^{\prime}<z-1$.

Here, we use the fact that $h_{i}=h_{i^{\prime}}=\beta_{\nu(i+1)}$ and similarly $h_{j}=h_{j^{\prime}}=\beta_{\nu(j+1)}$ . The first term in parentheses is zero by the definition of $\sigma$ .

The second term in parentheses is also zero upon plugging in the various values of $\lambda$ (see Mathematica proof E.13):

The second term is identically zero due to the identities (see Mathematica proof E.14)

It remains to show that the first term is also zero. Let $r=2^{z^{\prime}+1}-1-j^{\prime}$ . We have that $1\leq r<2^{z^{\prime}}$ and that $\tau=z(r-1)\in[0,z^{\prime}-1]$ . There are two final cases: where $r=2^{\tau}$ and where $2^{\tau}<r<2^{\tau+1}$ .
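The range claims for $r$ and $\tau$, and the resulting two-way case split, can be checked exhaustively for small parameters. This is a hedged sketch that assumes $z(i)=\lfloor\log_{2}(i+1)\rfloor$ and that $\nu$ is the 2-adic valuation. The function names and the parameter `z_top` (standing for $z$) are hypothetical. The last check covers the valuation claim $\nu(j+1)=\tau$ made for the first case.

```python
def z(i: int) -> int:
    """Assumed meaning of z: floor(log2(i + 1))."""
    return (i + 1).bit_length() - 1


def nu(n: int) -> int:
    """Assumed meaning of nu: 2-adic valuation of n >= 1."""
    return (n & -n).bit_length() - 1


def tail_claims_ok(z_top: int, zp: int) -> bool:
    """Check the claims for all j' with z(j') = zp and j' + 1 not a power of two.

    Here j = 2^z_top + j' with z_top > zp, matching the decomposition in the proof.
    """
    for jp in range(2 ** zp, 2 ** (zp + 1) - 1):  # 2^zp < j' + 1 < 2^(zp + 1)
        j = 2 ** z_top + jp
        r = 2 ** (zp + 1) - 1 - jp
        tau = z(r - 1)
        if not (1 <= r < 2 ** zp):
            return False
        if not (0 <= tau <= zp - 1):
            return False
        if not (r == 2 ** tau or 2 ** tau < r < 2 ** (tau + 1)):
            return False
        if r == 2 ** tau and nu(j + 1) != tau:
            return False
    return True


all_ok = all(
    tail_claims_ok(z_top, zp)
    for z_top in range(1, 10)
    for zp in range(z_top)
)
print(all_ok)
```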

In the first case, we additionally have that $\nu(j+1)=\tau$ and

In both cases, plugging the relevant $\lambda$ values into the first term in parentheses shows that it is equal to zero. See Mathematica proof E.15 and Mathematica proof E.16. ∎

E.4 Off-Diagonal Entries in Case $(2,2)$

Our goal for this subsection will be to prove Lemma E.8, stating that the off-diagonal entries $M_{i,j}$ with $i,j<2^{k}-1$ are all 0. We start with a simplifying lemma:

Let $h=\mathfrak{h}^{(k)}$ and $\lambda=\lambda^{(k)}$, and fix $j<i\leq 2^{k}-1$ so that neither $i+1$ nor $j+1$ is a power of 2. If $z(i)\neq z(j)$, then $M(h,\lambda)_{\operatorname{rev}(i),\operatorname{rev}(j)}=\frac{1}{2}\phi_{\operatorname{rev}(i)}\phi_{\operatorname{rev}(j)}$.

Here, we use the fact that the series is telescoping to simplify the computation. ∎

The following lemma will be the base case for a subsequent inductive proof.

Here, in the second line, we use Lemma D.13 to simplify the first summation. The second summation uses Lemma B.5 to write $p(\operatorname{rev}(2^{z}+2^{t}-1))=k-t$ (see Mathematica proof E.17).
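As a worked instance of this step, assume that $p(i)$ counts the ones in the binary expansion of $i+1$, that $\nu$ is the 2-adic valuation, and that $t<z$ (none of which is restated in this subsection). Then $p(2^{z}+2^{t}-1)=2$ and $\nu(2^{z}+2^{t})=t$, so the identity $p(\operatorname{rev}(r))=k-p(r)-\nu(r+1)+2$ with $r=2^{z}+2^{t}-1$ gives

$$p(\operatorname{rev}(2^{z}+2^{t}-1))=k-p(2^{z}+2^{t}-1)-\nu(2^{z}+2^{t})+2=k-2-t+2=k-t.$$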

The only possible nonzero term in the second summation occurs in row $\operatorname{rev}(2^{z}-1)$ . Thus, the second summation is equal to

The only possible nonzero term in the second summation is

We deduce that regardless of whether $z=z^{\prime}+1$ , that

It remains to show that the two square-bracketed terms in (E.4) are zero.

This is identically zero as can be seen in Mathematica proof E.21.

This is identically zero as can be seen in Mathematica proof E.22.

Substituting this expression shows that the term in parentheses is zero (see Mathematica proof E.23).

By the inductive hypothesis, it holds that

We deal with the first square-bracketed term above. Applying Lemma B.2 and the identity we get from the inductive hypothesis, we may simplify this term to

There are two cases for the second term: either $z^{\prime}=z-1$ or $z^{\prime}<z-1$ .

Suppose $z^{\prime}=z-1$ . Then, the second term is

Now, suppose $z^{\prime}<z-1$ . Then, the second term is