Posted 2023-06-20
It's been a while since I've updated the blog (likely due to the fact that I've been struggling to get it to work with Github pages...). Anyways, it'll be migrated over at some point, but for now this will have to do.
This post will focus on a particular, nearly silly, proof of a lower bound for the distance of an unbiased random walk, defined as

$$
S_n = \sum_{i=1}^n X_i,
$$

where $X_i = \pm 1$ uniformly at random, independently. The quantity we want to find a lower bound to is

$$
\mathbb{E}|S_n|
$$

as $n$ is large.
as is large. We know from a basic, if somewhat annoying, counting argument that
when . In general, we're interested in bounds of the form
Bounds like these are applicable in a number of important lower bounds for online convex optimization (see, e.g., Hazan's lovely overview, section 3.2) though we won't be talking too much about the applications on this one.
Additionally, since $\mathbb{E}[S_n^2] = n$ (which follows by expanding the square and using the fact that the $X_i$ are independent with mean zero) then

$$
\mathbb{E}|S_n| \le \sqrt{\mathbb{E}[S_n^2]} = \sqrt{n},
$$

so we know that this bound is tight up to a constant. The first inequality here follows from an application of Jensen's inequality to the square root function (which is concave).
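If you'd like to see these claims numerically before trusting any algebra, here's a minimal Monte Carlo sketch (the use of numpy, the helper name `mean_abs_walk`, and the sample sizes are all just illustrative choices, not part of the argument):

```python
import numpy as np

# Estimate E|S_n| for a few n and compare against sqrt(2n/pi) and sqrt(n).
rng = np.random.default_rng(0)

def mean_abs_walk(n, trials=200_000):
    # Each row is one walk: n i.i.d. +/-1 steps, summed to get S_n.
    steps = rng.choice([-1, 1], size=(trials, n))
    return np.abs(steps.sum(axis=1)).mean()

for n in (10, 100, 1000):
    est = mean_abs_walk(n)
    print(f"n={n:5d}  E|S_n| ~ {est:7.3f}  sqrt(2n/pi) = {np.sqrt(2 * n / np.pi):7.3f}  sqrt(n) = {np.sqrt(n):7.3f}")
```

The estimates sit close to $\sqrt{2n/\pi}$ and below $\sqrt{n}$, as expected.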
So why bother with another proof when the counting argument already gives the exact constant? Mostly because I'm bad at counting and always end up with a hilarious number of errors. Plus, this proof is easily generalizable to a number of other similar results!
One simple method for lower-bounding the expectation of a variable like $|S_n|$ is to note that $|S_n|$ is nonnegative, so we have the following 'silly' bound

$$
\mathbb{E}|S_n| \ge \mathbb{E}\left[|S_n|\,\mathbf{1}\{|S_n| \ge t\}\right] \ge t\,\Pr(|S_n| \ge t)
$$

for any $t \ge 0$, where $\mathbf{1}\{|S_n| \ge t\}$ is the indicator function for the event $|S_n| \ge t$; that is, it is 1 if $|S_n| \ge t$ and zero otherwise. (The bound follows from the fact that $|S_n| \ge t\,\mathbf{1}\{|S_n| \ge t\}$ pointwise.) Maximizing over $t$, assuming we have a somewhat tight lower bound on the probability that $|S_n| \ge t$, this approach might give us a reasonable lower bound.
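Here's the same 'silly' bound in simulation form, just to see how much is given up for a few choices of $t$ (again, numpy and the specific $n$, trial count, and thresholds are arbitrary illustrative choices):

```python
import numpy as np

# Compare E|S_n| with the lower bound t * P(|S_n| >= t) for a few thresholds t.
rng = np.random.default_rng(1)
n, trials = 1000, 200_000
s = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)

lhs = np.abs(s).mean()
for alpha in (0.25, 0.5, 1.0):
    t = alpha * np.sqrt(n)
    rhs = t * (np.abs(s) >= t).mean()
    print(f"t = {t:6.1f}   t * P(|S_n| >= t) ~ {rhs:6.2f}   <=   E|S_n| ~ {lhs:6.2f}")
```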
In a very general sense, we want to show that $|S_n|$ is 'anticoncentrated'; i.e., it is reasonably 'spread out', which would indicate that its expectation cannot be too small, since it is nonnegative.

The first idea (or, at least, my first idea) would be to note that, since $\mathbb{E}[S_n^2]$ is on the order of $n$, then maybe we can use this fact to construct a bound for $\mathbb{E}|S_n|$, which 'should be' on the order of $\sqrt{n}$, assuming some niceness conditions, for example, that $|S_n|$ is a bounded variable.

Unfortunately, just these two simple facts are not enough to prove the claim! We can construct a nonnegative random variable $Y$ such that its second moment is $n$, it is bounded by $n$, yet $\mathbb{E}[Y] = 1$. In other words, we wish to construct a variable that is very concentrated around $0$, with 'sharp' peaks at larger values.

Of course, the simplest example would be to take $Y = n$ with probability $1/n$ and $Y = 0$ with probability $1 - 1/n$. Clearly, this variable is bounded by $n$, and has $\mathbb{E}[Y^2] = n^2 \cdot (1/n) = n$ as its second moment. On the other hand,

$$
\mathbb{E}[Y] = n \cdot \frac{1}{n} = 1,
$$
which means that the best bound we can hope for, using just these conditions (nonnegativity, boundedness, and second moment bound) on a variable, is a constant. (Indeed, applying a basic argument, we find that this is the smallest expectation possible.)
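Just to make the counterexample concrete, here's the arithmetic for a particular $n$ (nothing here beyond the two lines of algebra above):

```python
# The counterexample: Y = n with probability 1/n, and Y = 0 otherwise.
n = 1000
p = 1.0 / n
second_moment = n ** 2 * p   # = n, the same second moment as S_n
mean = n * p                 # = 1, a constant; no sqrt(n) growth here
print(second_moment, mean)   # prints: 1000.0 1.0
```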
This suggests that we need a little more control over the tails of $|S_n|$, which gets us to the fourth moment.
Another easy quantity to compute in this case is $\mathbb{E}[S_n^4]$. (And, really, any even power of $S_n$ is easy. On the other hand, since $S_n$ has a distribution that is symmetric around 0, all odd moments are 0.) Splitting the sum out into each of the possible quartic terms, we find that any term containing an odd power of $X_i$ will be zero in expectation as the $X_i$ are independent. So, we find

$$
\mathbb{E}[S_n^4] = \sum_{i=1}^n \mathbb{E}[X_i^4] + 3\sum_{i \ne j} \mathbb{E}[X_i^2]\,\mathbb{E}[X_j^2] = n + 3n(n-1) = 3n^2 - 2n \le 3n^2.
$$
This quantity will come in handy soon.
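Since counting the pairings is exactly the place where I tend to slip up, here's an exact brute-force check of $\mathbb{E}[S_n^4] = 3n^2 - 2n$ for a few small $n$ (purely a sanity check, enumerating all $2^n$ sign patterns):

```python
from itertools import product

# Exact E[S_n^4] by enumerating every +/-1 sign pattern, versus 3n^2 - 2n.
for n in (2, 4, 8, 12):
    fourth = sum(sum(x) ** 4 for x in product((-1, 1), repeat=n)) / 2 ** n
    print(n, fourth, 3 * n ** 2 - 2 * n)
```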
We can, on the other hand, split up the expectation of $S_n^2$ in a variety of ways. One is particularly handy to get a tail lower bound like the one we wanted in our proof idea (above):

$$
n = \mathbb{E}[S_n^2] = \mathbb{E}\left[S_n^2\,\mathbf{1}\{|S_n| \le t\}\right] + \mathbb{E}\left[S_n^2\,\mathbf{1}\{|S_n| > t\}\right] \le t^2 + \mathbb{E}\left[S_n^2\,\mathbf{1}\{|S_n| > t\}\right].
$$

The latter term can be upper bounded using Cauchy–Schwarz,[1]

$$
\mathbb{E}\left[S_n^2\,\mathbf{1}\{|S_n| > t\}\right] \le \sqrt{\mathbb{E}[S_n^4]}\,\sqrt{\mathbb{E}\left[\mathbf{1}\{|S_n| > t\}^2\right]} = \sqrt{\mathbb{E}[S_n^4]}\,\sqrt{\Pr(|S_n| > t)}.
$$

(Since $\mathbf{1}\{|S_n| > t\}^2 = \mathbf{1}\{|S_n| > t\}$, whose expectation is $\Pr(|S_n| > t)$.) And, since $\mathbb{E}[S_n^4] \le 3n^2$, we finally have:

$$
n \le t^2 + \sqrt{3}\,n\,\sqrt{\Pr(|S_n| > t)}.
$$

Rearranging gives us the desired lower bound,

$$
\Pr(|S_n| > t) \ge \frac{(n - t^2)^2}{3n^2},
$$

which holds for any $0 \le t \le \sqrt{n}$.
(This is a Paley–Zygmund-style bound, except written in terms of a threshold on $|S_n|$ rather than a fraction of the mean of the nonnegative variable $S_n^2$.)
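The tail bound is easy to eyeball against simulation as well (same caveats as before: numpy, the particular $n$, and the thresholds are just illustrative):

```python
import numpy as np

# Empirical P(|S_n| > t) versus the lower bound (n - t^2)^2 / (3 n^2).
rng = np.random.default_rng(2)
n, trials = 1000, 200_000
s = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)

for alpha in (0.2, 1 / np.sqrt(5), 0.8):
    t = alpha * np.sqrt(n)
    empirical = (np.abs(s) > t).mean()
    bound = (n - t ** 2) ** 2 / (3 * n ** 2)
    print(f"alpha = {alpha:.3f}   P(|S_n| > t) ~ {empirical:.3f}   bound = {bound:.3f}")
```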
Now, since we know that

$$
\mathbb{E}|S_n| \ge t\,\Pr(|S_n| \ge t) \ge t\,\Pr(|S_n| > t),
$$

then we have

$$
\mathbb{E}|S_n| \ge t\,\frac{(n - t^2)^2}{3n^2}.
$$

Parametrizing $t$ by $t = \alpha\sqrt{n}$ for some $0 \le \alpha \le 1$, we then have

$$
\mathbb{E}|S_n| \ge \frac{\alpha(1 - \alpha^2)^2}{3}\sqrt{n}.
$$

The right-hand-side is maximized at $\alpha = 1/\sqrt{5}$, which gives the following lower bound

$$
\mathbb{E}|S_n| \ge \frac{16}{75\sqrt{5}}\sqrt{n}.
$$
And, finally, using the fact that $16/75 \ge 1/5$ and $\sqrt{5} \le 5/2$, we get the final result:

$$
\mathbb{E}|S_n| \ge \frac{2}{25}\sqrt{n},
$$

as required, with no need for combinatorics! Of course the factor of $2/25$ is rather weak compared to the factor of $\sqrt{2/\pi} \approx 0.8$, but this is ok for our purposes.
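If you don't feel like taking the derivative, a quick grid search recovers the same maximizer and shows how the final constant compares to the true asymptotic one (again, just a numerical sketch):

```python
import numpy as np

# Maximize alpha * (1 - alpha^2)^2 / 3 over [0, 1] on a grid, and compare
# the result with the closed-form maximizer 1/sqrt(5) and with sqrt(2/pi).
alphas = np.linspace(0, 1, 100_001)
values = alphas * (1 - alphas ** 2) ** 2 / 3
print(alphas[values.argmax()], 1 / np.sqrt(5))            # both ~ 0.4472
print(values.max(), 16 / (75 * np.sqrt(5)), np.sqrt(2 / np.pi))
```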
Of course, similar constructions also hold rather nicely for things like uniform variables, or normally distributed, mean-zero variables. Any variable for which the second and fourth moments can be easily computed allows us to compute a lower bound on this expectation. (Expectations of the absolute value of sums of independently drawn versions of these variables could be similarly computed.) These have no obvious combinatorial analogue, so those techniques cannot be easily generalized, whereas this bound applies immediately.
[1] Possibly the most elegant proof of Cauchy–Schwarz I know is based on minimizing a quadratic, and goes a little like this. Note that $\mathbb{E}[(U - tV)^2] \ge 0$ for any $t$. (That this expectation exists can be shown for any $t$ assuming both $U$ and $V$ have finite second moment. If not, the inequality is also trivial.) Expanding gives $\mathbb{E}[U^2] - 2t\,\mathbb{E}[UV] + t^2\,\mathbb{E}[V^2] \ge 0$. Minimizing the left hand side over $t$ (the minimum is at $t = \mathbb{E}[UV]/\mathbb{E}[V^2]$, assuming $\mathbb{E}[V^2] > 0$) then shows that $\mathbb{E}[U^2] - (\mathbb{E}[UV])^2/\mathbb{E}[V^2] \ge 0$, which gives

$$
\frac{(\mathbb{E}[UV])^2}{\mathbb{E}[V^2]} \le \mathbb{E}[U^2].
$$

Multiplying both sides by $\mathbb{E}[V^2]$ gives the final result, $(\mathbb{E}[UV])^2 \le \mathbb{E}[U^2]\,\mathbb{E}[V^2]$.
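And, for completeness, a throwaway numerical spot check of the inequality itself on random data (illustrative only; any distributions with finite second moments would do):

```python
import numpy as np

# Check E[UV]^2 <= E[U^2] E[V^2] on empirical moments of random samples.
rng = np.random.default_rng(3)
for _ in range(3):
    u, v = rng.standard_normal(10_000), rng.exponential(size=10_000)
    print((u * v).mean() ** 2 <= (u ** 2).mean() * (v ** 2).mean())  # always True
```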