Posted **2023-06-20**

It's been a while since I've updated the blog (mostly because I've been struggling to get it to work with GitHub Pages...). It will be migrated over at some point, but for now this will have to do.

This post will focus on a particular, nearly silly, proof of a lower bound on the expected distance from the origin of an unbiased random walk, defined as

$X = \sum_{i=1}^n X_i,$ where the $X_i \sim \{\pm 1\}$ are drawn uniformly and independently. The quantity we want to lower bound is

$\mathbf{E}[|X|],$ as $n$ becomes large. We know from a basic, if somewhat annoying, counting argument that

$\mathbf{E}[|X|] \sim \sqrt{\frac{2}{\pi}}\sqrt{n},$ when $n \gg 1$. In general, we're interested in bounds of the form

$\mathbf{E}[|X|] = \Omega(\sqrt{n}).$ Bounds like these show up in a number of important lower bounds for online convex optimization (see, *e.g.*, Hazan's lovely overview, section 3.2), though we won't be talking too much about the applications in this post.

Additionally, since $\mathbf{E}[X^2] = n$ (which follows by expanding the square and using the fact that the $X_i$ are independent with mean zero), we have

$\mathbf{E}[|X|] \le \sqrt{\mathbf{E}[X^2]} = \sqrt{n},$ so we know that any $\Omega(\sqrt{n})$ bound is tight up to a constant. The inequality here follows from an application of Jensen's inequality to the square root function (which is concave).
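As a quick sanity check on both the $\sqrt{2/\pi}$ asymptotics and this upper bound, here's a short numerical sketch (the helper name is mine; it uses the fact that $X = 2k - n$, where $k$, the number of $+1$ steps, is Binomial$(n, 1/2)$):

```python
import math

def exact_mean_abs(n):
    """Exact E|X| for a walk of n +-1 steps: with k steps equal to +1,
    X = 2k - n and k is Binomial(n, 1/2)."""
    return sum(abs(2 * k - n) * math.comb(n, k) for k in range(n + 1)) / 2 ** n

for n in [10, 100, 1000]:
    # exact mean, the asymptotic value sqrt(2n/pi), and the Jensen bound sqrt(n)
    print(n, exact_mean_abs(n), math.sqrt(2 * n / math.pi), math.sqrt(n))
```

The exact mean tracks $\sqrt{2n/\pi}$ closely while staying below $\sqrt{n}$, as expected.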

Why bother with another proof, when the counting argument gives the constant exactly? Mostly because I'm bad at counting and always end up with a hilarious number of errors. Plus, this proof generalizes easily to a number of other, similar results!

One simple method for lower-bounding the expectation of a variable like $|X|$ is to note that $|X|$ is nonnegative, so we have the following 'silly' bound

$\mathbf{E}[|X|] \ge \mathbf{E}[a\mathbf{1}_{|X| \ge a}] = a \mathbf{Pr}(|X| \ge a),$ for any $a \ge 0$, where $\mathbf{1}_{|X| \ge a}$ is the indicator function for the event $|X| \ge a$; that is, it is 1 if $|X| \ge a$ and zero otherwise. (The bound follows from the fact that $|X| \ge a \mathbf{1}_{|X|\ge a}$ pointwise.) So, if we can find a reasonably tight lower bound on the probability that $|X| \ge a$, maximizing over $a$ might give us a reasonable lower bound on the expectation.

In a very general sense, we want to show that $|X|$ is 'anticoncentrated'; *i.e.*, it is reasonably 'spread out', which would indicate that its expectation cannot be too small, since it is nonnegative.

The first idea (or, at least, my first idea) would be to note that, since $\mathbf{E}[X^2]$ is on the order of $n$, then maybe we can use this fact to construct a bound for $\mathbf{E}[|X|]$ which 'should be' on the order of $\sqrt{n}$ assuming some niceness conditions, for example, that $|X| \le n$ is a bounded variable.

Unfortunately, just these two simple facts are not enough to prove the claim! We can construct a nonnegative random variable $Y\ge 0$ such that its second moment is $\mathbf{E}[Y^2] = n$, it is bounded by $Y \le n$, yet $\mathbf{E}[Y] = 1$. In other words, we wish to construct a variable that is very concentrated around $0$, with 'sharp' peaks at larger values.

Of course, the simplest example would be to take $Y = n$ with probability $1/n$ and $Y=0$ with probability $1-1/n$. Clearly, this variable is bounded, and has $n$ as its second moment. On the other hand,

$\mathbf{E}[Y] = (1/n)n + (1-1/n)0 = 1,$ which means that the best bound we can hope for, using just these conditions (nonnegativity, boundedness, and second moment bound) on a variable, is a constant. (Indeed, applying a basic argument, we find that this is the smallest expectation possible.)
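To see the numbers concretely, here's a tiny sketch of this two-point distribution (using exact rational arithmetic just to avoid floating point noise):

```python
from fractions import Fraction

n = 100
p = Fraction(1, n)            # Pr(Y = n); otherwise Y = 0
EY = n * p + 0 * (1 - p)      # first moment: exactly 1
EY2 = n**2 * p + 0 * (1 - p)  # second moment: exactly n
print(EY, EY2)
```

The mean is pinned at $1$ no matter how large $n$ gets, even though the second moment is $n$.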

This suggests that we need a little more control over the tails of $|X|$, which gets us to...

Another easy quantity to compute in this case is $\mathbf{E}[X^4]$. (And, really, any even power of $X$ is easy. On the other hand, since $X$ has a distribution that is symmetric around 0, all odd moments are 0.) Splitting the sum out into each of the possible quartic terms, we find that any term containing an odd power of $X_i$ will be zero in expectation, since the $X_i$ are independent and every odd power of $X_i$ has zero mean. So, we find

$\mathbf{E}[X^4] = \sum_{i} \mathbf{E}[X_i^4] + 3\sum_{i\ne j} \mathbf{E}[X_i^2X_j^2] = n + 3n(n-1) = 3n^2 - 2n,$ where the factor of $3$ arises because each unordered pair $\{i, j\}$ appears in $\binom{4}{2} = 6$ of the quartic terms, while the sum over $i \ne j$ counts it only twice. This quantity will come in handy soon.
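These moment computations are easy to double-check numerically. A small sketch (the helper name is mine), again using $X = 2k - n$ with $k \sim$ Binomial$(n, 1/2)$:

```python
import math

def walk_moment(n, p):
    """Exact E[X^p] for X = 2k - n, with k ~ Binomial(n, 1/2)."""
    return sum((2 * k - n) ** p * math.comb(n, k) for k in range(n + 1)) / 2 ** n

n = 20
print(walk_moment(n, 3))                    # odd moments vanish
print(walk_moment(n, 2), n)                 # second moment equals n
print(walk_moment(n, 4), 3 * n**2 - 2 * n)  # fourth moment equals 3n^2 - 2n
```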

We can, on the other hand, split up the expectation of $X^2$ in a variety of ways. One is particularly handy for getting a tail *lower bound* like the one we wanted in our proof idea (above):

$\mathbf{E}[X^2] = \mathbf{E}[X^2\mathbf{1}_{|X| < a}] + \mathbf{E}[X^2\mathbf{1}_{|X| \ge a}] \le a^2 + \mathbf{E}[X^2\mathbf{1}_{|X| \ge a}],$ since $X^2 < a^2$ on the event $|X| < a$. The latter term can be upper bounded using Cauchy–Schwarz,^{[1]}

$\mathbf{E}[X^2\mathbf{1}_{|X| \ge a}] \le \sqrt{\mathbf{E}[X^4]}\sqrt{\mathbf{E}[\mathbf{1}_{|X| \ge a}^2]} = \sqrt{\mathbf{E}[X^4]}\sqrt{\mathbf{E}[\mathbf{1}_{|X| \ge a}]}.$ (Since $\mathbf{1}_{|X| \ge a}^2 = \mathbf{1}_{|X| \ge a}$.) And, since $\mathbf{E}[\mathbf{1}_{|X| \ge a}] = \mathbf{Pr}(|X| \ge a)$, we finally have:

$\mathbf{E}[X^2] \le a^2 + \sqrt{\mathbf{E}[X^4]}\sqrt{\mathbf{Pr}(|X| \ge a)}.$ Rearranging gives us the desired lower bound,

$\mathbf{Pr}(|X| \ge a) \ge \frac{(\mathbf{E}[X^2] - a^2)^2}{\mathbf{E}[X^4]},$ valid for any $0 \le a \le \sqrt{\mathbf{E}[X^2]}$. (This is a Paley–Zygmund-style bound, except over $X^2$ rather than nonnegative $X$.)
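It's worth seeing how tight this is. A quick sketch comparing the bound against exact tail probabilities for $n = 100$ (the moments are computed directly from the distribution, so nothing here is hard-coded):

```python
import math

def dist(n):
    """(value, probability) pairs for X = 2k - n, k ~ Binomial(n, 1/2)."""
    return [(2 * k - n, math.comb(n, k) / 2 ** n) for k in range(n + 1)]

n = 100
d = dist(n)
EX2 = sum(x**2 * p for x, p in d)
EX4 = sum(x**4 * p for x, p in d)
for a in [2, 4, 6, 8]:
    exact_tail = sum(p for x, p in d if abs(x) >= a)
    lower = (EX2 - a**2) ** 2 / EX4
    # the exact tail probability always sits above the lower bound
    print(a, round(exact_tail, 3), round(lower, 3))
```

The bound is loose, but it holds at every threshold, which is all we need.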

Now, since we know that

$\mathbf{E}[|X|] \ge a \mathbf{Pr}(|X| \ge a),$ then we have

$\mathbf{E}[|X|] \ge a \frac{(\mathbf{E}[X^2] - a^2)^2}{\mathbf{E}[X^4]}.$ Parametrizing $a$ by $a = \alpha\sqrt{\mathbf{E}[X^2]}$ for some $0 \le \alpha \le 1$, we then have

$\mathbf{E}[|X|] \ge \alpha(1-\alpha^2)^2\frac{\mathbf{E}[X^2]^{5/2}}{\mathbf{E}[X^4]}.$ The right-hand side is maximized at $\alpha = 1/\sqrt{5}$, which gives the following lower bound
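If you don't feel like doing the calculus, a crude grid search over $[0, 1]$ confirms both the maximizer and the constant:

```python
import math

def f(t):
    """The coefficient alpha * (1 - alpha^2)^2 as a function of alpha."""
    return t * (1 - t**2) ** 2

grid = [i / 10**5 for i in range(10**5 + 1)]
best = max(grid, key=f)
print(best, 1 / math.sqrt(5))              # maximizer: ~0.4472
print(f(best), 16 / (25 * math.sqrt(5)))   # maximum value: ~0.2862
```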

$\mathbf{E}[|X|] \ge \frac{16}{25\sqrt{5}}\frac{\mathbf{E}[X^2]^{5/2}}{\mathbf{E}[X^4]}.$ And, finally, using the fact that $\mathbf{E}[X^2] = n$ and $\mathbf{E}[X^4] = 3n^2 - 2n \le 3n^2$, we get the final result:

$\mathbf{E}[|X|] \ge \frac{16}{25\sqrt{5}}\cdot\frac{n^{5/2}}{3n^2 - 2n} \ge \frac{16}{75\sqrt{5}}\sqrt{n} = \Omega(\sqrt{n}),$ as required, with no need for combinatorics! Of course, the factor of $16/(75\sqrt{5}) \approx 0.095$ is rather weak compared to the true constant of $\sqrt{2/\pi} \approx 0.80$, but this is fine for our purposes.
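Putting everything together numerically, here's a sketch comparing the final bound, with the optimal choice $a = \sqrt{\mathbf{E}[X^2]/5}$, against the exact expectation for $n = 400$ (the helper name is mine; moments are computed exactly from the distribution):

```python
import math

def abs_moment(n, p):
    """Exact E[|X|^p] for X = 2k - n, k ~ Binomial(n, 1/2)."""
    return sum(abs(2 * k - n) ** p * math.comb(n, k) for k in range(n + 1)) / 2 ** n

n = 400
EX2, EX4 = abs_moment(n, 2), abs_moment(n, 4)
a = math.sqrt(EX2 / 5)               # optimal a = sqrt(E[X^2]/5)
bound = a * (EX2 - a**2) ** 2 / EX4  # a (E[X^2] - a^2)^2 / E[X^4]
# report both as multiples of sqrt(n)
print(bound / math.sqrt(n), abs_moment(n, 1) / math.sqrt(n))
```

The bound lands around $0.095\sqrt{n}$, well below the exact value near $0.8\sqrt{n}$, but it is a genuine $\Omega(\sqrt{n})$ with almost no work.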

Of course, similar constructions also work rather nicely for things like uniform $[-1, 1]$ variables, or normally distributed, mean-zero variables. Any variable whose second and fourth moments are easy to compute admits a lower bound of this form on the expectation of its absolute value. (Expectations of the absolute value of sums of independently drawn copies of such variables can be handled similarly.) These have no obvious combinatorial analogue, so those techniques do not generalize easily, whereas this bound applies immediately.

[1] Possibly the most elegant proof of Cauchy–Schwarz I know is based on minimizing a quadratic, and goes a little like this. Note that $\mathbf{E}[(X - tY)^2] \ge 0$ for any $t \in \mathbf{R}$. (That this expectation exists can be shown for any $t$ assuming both $X$ and $Y$ have finite second moments; if not, the inequality is trivial.) Expanding gives $\mathbf{E}[X^2] - 2t\mathbf{E}[XY] + t^2\mathbf{E}[Y^2] \ge 0$. Minimizing the left-hand side over $t$ then gives $t^\star = \mathbf{E}[XY]/\mathbf{E}[Y^2]$, which yields
$\mathbf{E}[X^2] - \frac{\mathbf{E}[XY]^2}{\mathbf{E}[Y^2]} \ge 0.$
Multiplying both sides by $\mathbf{E}[Y^2]$ gives the final result.
