Longest Path Search

Physics, optimization, and impossibility

Posted 2018-12-17

Note: this post is based on the results of this arXiv paper which I've been working on with Stephen Boyd and Jelena Vuckovic.

The main result of the above paper is kind of weird: essentially, it turns out that you can show that certain devices are physically impossible to build by phrasing the design problem as an optimization problem and then using some basic tools from optimization to derive lower bounds.

To illustrate: if you want to design an engine that is as efficient as possible, the second law of thermodynamics tells you the best you could possibly hope to do. Now, what if (and bear with me here) we want something a little weirder? Say, what if we want a heat sink that has a particular dissipation pattern? Or a photonic crystal that traps light of a given wavelength in some region? Or a horn with specific resonances?

We can write down the optimization problems corresponding to each of these circumstances: in general, these problems are very hard to solve in ways that aren't just "try all possible designs and pick the best one." (And there are a lot of possible designs.) On the other hand, simple heuristics (gradient descent, for example) appear to produce much better devices than almost anyone can design by hand. This approach, while it appears to work well in practice, brings up a few questions with no obvious answers.

  1. Maybe there is some very complicated design that these heuristics almost always miss, but that is much better than the designs we currently find.

  2. It is possible that the objective we are requesting is physically impossible to achieve and we will never find a good design.

  3. Many heuristics depend heavily on the initial design we provide. Physical intuition sometimes appears to provide good initializations, but often the final design is unintuitive, so perhaps there are better approaches.

The paper provides (some) answers to these questions. Its main goal is to answer point (2); this gives a partial answer to (1) (namely, the heuristics we use appear to give designs that are often close to the best possible design, at least for the problems we tested) and an answer to (3), since computing the certificate of impossibility suggests an initial design as a byproduct.

I'll explain the interesting parts of the paper in more detail below, since the paper (for the sake of brevity) only points the reader to references for the derivations of its results (and leaves some as exercises).

Lagrange duality

In optimization theory, there is a beautiful idea called Lagrange duality, which gives lower bounds on the optimal value of any optimization problem you can write down (at least theoretically speaking).

Let's say we have the following optimization problem,

$$ \begin{array}{ll} \text{minimize} & f(x)\\ \text{subject to} & h(x) \le 0, \end{array} $$

(this encompasses essentially every optimization problem ever) with objective function $f: \mathbb{R}^n \to \mathbb{R}$ and constraint function $h: \mathbb{R}^n \to \mathbb{R}^m$, where the inequality is taken elementwise. Call the optimal value of the objective of the optimization problem $p^\star$, which we will see again soon.

Continuing, we can then formulate the Lagrangian of the problem,

$$ \mathcal{L}(x, \lambda) = f(x) + \lambda^Th(x), $$

with $\lambda \ge 0$. Finally, we formulate the dual function

$$ g(\lambda) = \inf_x \left(f(x) + \lambda^Th(x)\right). $$

Now, and here's the magic, this dual function $g(\lambda)$ at any $\lambda \ge 0$ is always a lower bound for the optimal objective $p^\star$. Why? Well,

$$ g(\lambda) = \inf_x \mathcal{L}(x, \lambda), $$

and, by definition of $\inf$,

$$ \inf_x \mathcal{L}(x, \lambda) \le \mathcal{L}(x, \lambda), $$

for every $x$. Now, every feasible point $x^\mathrm{feas}$ of the optimization problem satisfies $h(x^\mathrm{feas}) \le 0$ (this is the definition of 'feasible'), so, since $\lambda \ge 0$,

$$ \mathcal{L}(x^\mathrm{feas}, \lambda) = f(x^\mathrm{feas}) + \underbrace{\lambda^Th(x^\mathrm{feas})}_{\le 0} \le f(x^\mathrm{feas}). $$

In other words, for any feasible point, $\mathcal{L}(x^\mathrm{feas}, \lambda)$ is at most the objective value at that point. But, since $g(\lambda)$ is at most $\mathcal{L}(x, \lambda)$ for any $x$, not just the feasible ones, we have

$$ g(\lambda) \le f(x^\mathrm{feas}), $$

for any feasible point. This means that $g(\lambda)$ is also at most the optimal value (since every optimal point of this optimization problem is feasible). That is,

$$ g(\lambda) \le p^\star. $$

Therefore, for any $\lambda \ge 0$, we know that $g(\lambda)$ is always a lower bound to the optimal objective value!

Of course, sometimes computing $g(\lambda)$ is at least as difficult as solving the original problem (due to the $\inf$ we have in the definition of $g$). It just so happens that many physical equations and objectives we care about are of a form elegant enough to give an explicit formula for $g$, which is the main point of this paper.
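As a quick sanity check (a toy example of my own, not one from the paper), take $f(x) = x^2$ and $h(x) = 1 - x$; that is, minimize $x^2$ subject to $x \ge 1$, which clearly has $p^\star = 1$. The Lagrangian $\mathcal{L}(x, \lambda) = x^2 + \lambda(1 - x)$ is minimized over $x$ at $x = \lambda/2$, so

$$ g(\lambda) = \lambda - \frac{\lambda^2}{4} \le 1 = p^\star $$

for every $\lambda \ge 0$, with equality at $\lambda = 2$.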

Best lower bound

Of course, we often want the best (largest) lower bound, not just a lower bound (which can often be quite bad). In other words, we want to maximize our lower bound. We can phrase this as the new optimization problem,

$$ \begin{array}{ll} \text{maximize} & g(\lambda)\\ \text{subject to} & \lambda \ge 0. \end{array} $$

What is interesting is that this optimization problem is always convex, which means it is almost always easy to compute the optimal value, provided we can explicitly write down what $g$ is. (I won't prove this here, but the proof is very straightforward. Take a peek at section 5.1.2 in Boyd's Convex Optimization.)

Explicit dual function for $g$

Problem formulation

Many of the problems we're interested in (including design in photonics via Maxwell's equations,[1] acoustics via Helmholtz's equation, quantum mechanics via Schrodinger's equation, and heat engineering via the heat equation) have physics equations of the form (once discretized)

$$ (A + \mathrm{diag}(\theta))z = b, $$

where $\theta \in \mathbb{R}^n$ are the design parameters (e.g. permittivity in the case of photonics, or speed of sound in the material in the case of acoustics) and $z \in \mathbb{R}^n$ is the field (e.g. the electric field in photonics, or the amplitude of the wave in acoustics). $A \in \mathbb{R}^{n\times n}$ is a matrix encoding the physics (the curl of the curl in Maxwell's equations, or a discretized Laplacian in Helmholtz's) and $b \in \mathbb{R}^n$ is an excitation of the field.

More specifically, take a peek at Helmholtz's equation:

$$ \nabla^2 a(x) + \left(\frac{\omega^2}{c(x)^2}\right)a(x) = u(x), $$

where $c: \mathbb{R}^3 \to \mathbb{R}_{> 0}$ is a function specifying the speed of sound at every point in the material, while $a: \mathbb{R}^3 \to \mathbb{R}$ is a function specifying the amplitude at each point, $u: \mathbb{R}^3 \to \mathbb{R}$ is a function specifying an excitation, and $\omega \in \mathbb{R}_{\ge 0}$ is the frequency of the wave. We can make some simple correspondences:

$$ \Bigg(\underbrace{\nabla^2}_{A} + \underbrace{\bigg(\frac{\omega^2}{c(x)^2}\bigg)}_{\mathrm{diag}(\theta)}\Bigg)\underbrace{a(x)}_{z} = \underbrace{u(x)}_{b}. $$
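To make the discretization concrete, here is a minimal sketch (a toy 1-D setup of my own, with made-up grid sizes and variable names, not code from the paper) that builds a finite-difference Laplacian and solves the resulting discretized physics equation:

```python
import numpy as np

# a tiny 1-D toy discretization of Helmholtz's equation (assumed setup)
n = 200                          # number of grid points
h = 1.0 / n                      # grid spacing
omega = 2 * np.pi * 5.0          # angular frequency (arbitrary choice)

# second-order finite-difference Laplacian with zero boundary conditions
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

c = np.ones(n)                   # speed of sound at each grid point
theta = omega**2 / c**2          # design parameters: theta = omega^2 / c^2

b = np.zeros(n)
b[n // 2] = 1.0 / h              # point excitation in the middle of the domain

# the discretized physics equation: (A + diag(theta)) z = b
z = np.linalg.solve(A + np.diag(theta), b)
```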

Now, we usually want the field ($z$) to look similar to a desired field (which we will call $\hat z$), while satisfying the physics equation described above. We can phrase this in several ways, but a particularly natural one is by attempting to minimize the objective $\left\|z - \hat z\right\|_2^2$.

Finally, we are only able to choose materials within a specific range: that is, $\theta^\mathrm{min} \le \theta \le \theta^\mathrm{max}$.

Putting all of this together, we can write the optimization problem as

$$ \begin{array}{ll} \text{minimize} & \frac12\left\|z - \hat z\right\|_2^2\\ \text{subject to} & (A + \mathrm{diag}(\theta))z = b\\ & \theta^\mathrm{min} \le \theta \le \theta^\mathrm{max}, \end{array} $$

which is exactly problem (1) in the paper, in the special case where $W = I$, the identity matrix.
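To give a flavor of the "simple heuristics" mentioned at the start, here is a minimal sketch of projected gradient descent on this problem (my own illustration, using an adjoint-style gradient; it is not necessarily the heuristic used in the paper). It assumes `A`, `b`, `z_hat`, `theta_min`, and `theta_max` are NumPy arrays that have already been set up; `A` and `b` could come from a discretization like the one above:

```python
import numpy as np

def objective_and_grad(theta, A, b, z_hat):
    """Objective 0.5*||z - z_hat||^2, where z solves (A + diag(theta)) z = b,
    and its gradient with respect to theta (computed via an adjoint solve)."""
    M = A + np.diag(theta)
    z = np.linalg.solve(M, b)
    w = np.linalg.solve(M.T, z - z_hat)   # adjoint variable, w = M^{-T}(z - z_hat)
    obj = 0.5 * np.sum((z - z_hat) ** 2)
    grad = -w * z                         # d(obj)/d(theta_i) = -w_i * z_i
    return obj, grad

def projected_gradient(A, b, z_hat, theta_min, theta_max, step=1e-3, iters=500):
    """Gradient descent on theta, projected back onto the box constraints.
    The step size is arbitrary and would need tuning for a real problem."""
    theta = 0.5 * (theta_min + theta_max) * np.ones(len(b))
    for _ in range(iters):
        _, grad = objective_and_grad(theta, A, b, z_hat)
        theta = np.clip(theta - step * grad, theta_min, theta_max)
    return theta
```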

Deriving the dual

Here is essentially the only 'magic' part of the paper. First, we can write the Lagrangian of the problem as,

$$ \mathcal{L}(z, \theta, \nu) = \frac12\left\|z - \hat z\right\|_2^2 + \nu^T((A + \mathrm{diag}(\theta))z - b). $$

Now, there is something weird here: notice that I sneakily dropped the term containing the lower and upper limits for $\theta$; this idea is, in fact, what saves the entire approach. What we will first do is the usual thing: we'll minimize the Lagrangian over all possible $z$, which we can easily do since the Lagrangian is a convex quadratic in $z$. In particular, taking the gradient with respect to $z$ and setting it to zero (which is necessary and sufficient by convexity and differentiability) gives us that the optimal $z$ is

$$ z = \hat z - (A + \mathrm{diag}(\theta))^T\nu, $$

which means that

$$ \inf_z \mathcal{L}(z, \theta, \nu) = - \frac12\left\|\hat z - (A + \mathrm{diag}(\theta))^T\nu\right\|_2^2 - \nu^Tb + \frac12\|\hat z\|_2^2. $$
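(Filling in the algebra the post skips: write $M = A + \mathrm{diag}(\theta)$ for brevity. Substituting $z = \hat z - M^T\nu$ back into the Lagrangian gives

$$ \mathcal{L} = \frac12\|M^T\nu\|_2^2 + \nu^T\left(M\hat z - MM^T\nu - b\right) = \hat z^TM^T\nu - \frac12\|M^T\nu\|_2^2 - \nu^Tb, $$

which matches the expression above once you expand $-\frac12\|\hat z - M^T\nu\|_2^2 + \frac12\|\hat z\|_2^2$.)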

The next step is then finding the infimum of $\mathcal{L}$ over $\theta$. That is, finding

$$ \inf_\theta \left(\inf_z \mathcal{L}(z, \theta, \nu)\right). $$

Now, of course, minimizing over all of $\theta$ gives a lower bound (but not a very good one), since, unless $\nu = 0$, we can send the whole thing to negative infinity. (Why?)

What we can do instead is minimize over $\theta$, constrained to its feasible range, $\theta^\mathrm{min} \le \theta \le \theta^\mathrm{max}$. I'll leave it as an exercise for the reader as to why this is still a lower bound, but you should ponder this very carefully, because it is the main point of the paper. As an initial hint, take a second look at the proof above for why the Lagrangian is a lower bound in the first place. Of course, a second hint can be found in the paper which gives a somewhat-natural construction (sometimes called the "partial Lagrangian"), but I highly recommend sitting down with a bit of wine (or something stronger) and thinking about it![2]

If you've convinced yourself of this (or haven't yet, but want to continue), we now have the following minimization problem:

$$ \begin{aligned} g(\nu) &= \inf_{\theta^\mathrm{min} \le \theta \le \theta^\mathrm{max}} \left(\inf_z \mathcal{L}(z, \theta, \nu)\right)\\ &= \inf_{\theta^\mathrm{min} \le \theta \le \theta^\mathrm{max}} \left(- \frac12\left\|\hat z - (A + \mathrm{diag}(\theta))^T\nu\right\|_2^2 - \nu^Tb + \frac12\|\hat z\|_2^2\right). \end{aligned} $$

The trick is to notice two things. One, that the objective is concave in $\theta$ and, two, that the objective is separable over each component of $\theta$.

First off, if a function $v: \mathbb{R} \to \mathbb{R}$ is concave over the interval $[L, U]$, then it achieves its minimum value at one of the endpoints of the interval. Why? Well, the definition of concavity says that, for every $0 \le \gamma \le 1$,

$$ v(\gamma L + (1- \gamma)U) \ge \gamma v(L) + (1-\gamma)v(U) \ge \min\{v(L), v(U)\}, $$

but any point in the interval $[L, U]$ is a convex combination of $L$ and $U$! So the value of $v$ at every point inside the interval is at least as large as the smaller of the two endpoint values, which completes the proof.

This solves our problem: since the objective is separable, we only need to consider each component of $\theta$, and, because it's concave, we know that an optimal $\theta_i$ is one of either $\theta^\mathrm{min}_i$ or $\theta^\mathrm{max}_i$. Replacing the complicated $\inf$ with this (much simpler) $\min$ gives the analytic solution for $g$ (writing $a_i$ for the $i$th column of $A$):

$$ \begin{aligned} g(\nu) = \sum_i \min\bigg\{-\frac12 (\hat z_i - a_i^T\nu - \theta_i^\mathrm{min} \nu_i)^2, &- \frac12 (\hat z_i - a_i^T\nu - \theta_i^\mathrm{max} \nu_i)^2\bigg\} \\&- \nu^Tb + \frac12\|\hat z\|_2^2, \end{aligned} $$

or, writing it in the same way as the paper, by pulling out the $-1/2$ (and using $\theta^\mathrm{min} = 0$),

$$ g(\nu) = -\frac12 \sum_i \max\bigg\{ (\hat z_i - a_i^T\nu)^2, (\hat z_i - a_i^T\nu - \theta_i^\mathrm{max} \nu_i)^2\bigg\} - \nu^Tb + \frac12\|\hat z\|_2^2. $$
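For reference, here is a direct NumPy translation of this expression (a sketch of my own; the variable names are mine, and I keep a general $\theta^\mathrm{min}$ rather than zero). Any value it returns is a valid lower bound on the design problem's optimal objective:

```python
import numpy as np

def dual_function(nu, A, b, z_hat, theta_min, theta_max):
    """Evaluate the analytic dual function g(nu) for the design problem
    minimize 0.5*||z - z_hat||^2  s.t.  (A + diag(theta)) z = b,
             theta_min <= theta <= theta_max."""
    r = z_hat - A.T @ nu                         # the terms  z_hat_i - a_i^T nu
    term_min = -0.5 * (r - theta_min * nu) ** 2  # theta_i at its lower bound
    term_max = -0.5 * (r - theta_max * nu) ** 2  # theta_i at its upper bound
    return np.sum(np.minimum(term_min, term_max)) - b @ nu + 0.5 * z_hat @ z_hat
```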

Now that we have an analytic form for $g$ (our set of lower bounds), we can maximize the function to get the best lower bound. As discussed before, this is a convex optimization problem, which can be formulated in CVXPY and solved using one of the many available solvers for convex quadratically-constrained quadratic programs (QCQPs) or second-order cone programs (SOCPs).
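For instance, a minimal CVXPY sketch of this maximization (again my own illustration, reusing the hypothetical variable names from the snippets above, with `theta_min` and `theta_max` as length-$n$ arrays; it is not code from the paper) might look like:

```python
import cvxpy as cp
import numpy as np

def best_lower_bound(A, b, z_hat, theta_min, theta_max):
    """Maximize the dual function g(nu) over nu; the optimal value is the best
    lower bound this approach certifies for the design problem's objective."""
    n = len(b)
    nu = cp.Variable(n)
    r = z_hat - A.T @ nu
    g = (-0.5 * cp.sum(cp.maximum(cp.square(r - cp.multiply(theta_min, nu)),
                                  cp.square(r - cp.multiply(theta_max, nu))))
         - b @ nu + 0.5 * float(z_hat @ z_hat))
    problem = cp.Problem(cp.Maximize(g))   # concave objective, so this is DCP
    problem.solve()
    return problem.value, nu.value
```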

Results

I'll give a quick summary of the results of the paper, but this is the section I would recommend checking out in the paper itself. (There are pretty pictures!)

For a relatively complex design problem, we found that a simple, commonly used heuristic finds a design with an objective value lying around 9% above the lower bound, and therefore at most 9% above that of the best possible design. (In general, though, we suspect that the true optimum lies closer to the designs the heuristics give than to the lower bound we compute.) In other words, it is physically impossible to improve on this design by more than a small margin.

Additionally (I might discuss how this is done in a later post), we obtain, as a byproduct, an initial design that is qualitatively quite similar to the final, locally-optimized design. See the image below.

<img src="/images/physics-impossibility-results/primal-dual-comparison.png" class="insert" style="width: 100%"> Comparison between the design suggested by the lower bound and the locally-optimized design.

I highly recommend checking out the pre-print that is up on arXiv for more info. Also, if you spot any mistakes (in either the post or the paper), please do @ me!

[1] This is... almost accurate, but not quite. It turns out a small modification to the problem is needed for Maxwell's equations in two and three dimensions. For specifics, see the appendix in the paper.
[2] Honestly, I only recommend reading this blog with wine (or whatever you have at hand). Not sure it's bearable, otherwise.
