Category: optimization
This is just a basic note on Cetegen and Stuber's paper (apologies for the paywall) published a few days ago, Optimal Design of Controlled Environment Agricultural Systems Under Market Uncertainty. This post "simplifies" problem (4) from a bilevel optimization problem to a single (convex) optimization problem which can be readily solved. As a side note, I only skimmed the references provided, and some were too complicated for me to understand, especially at first glance.^{1} It is possible this technique is well known in the specific literature referenced, but hidden under mountains of notation and definitions. I also note that this is not the problem they (approximately) solve, but the method presented here might be useful in that case, too.
I've used this trick before in a few other papers, with the main example being a paper coauthored with Kunal Shah and Mac Schwager, found here, specifically in equation (8) and below, starting on page 10.
The basic (very general!) idea is to replace a "min-max" optimization problem with a "min" optimization problem. For example, say we are given the following optimization problem
with variables \(x \in \mathbb{R}^m\) and \(y \in \mathbb{R}^n\) and functions \(f : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}\), \(g:\mathbb{R}^n \to \mathbb{R}\), and \(h: \mathbb{R}^m \to \mathbb{R}\). Now, consider the usual Lagrangian of the "inner problem" (the one with the max over \(y\)), which we know is
If we define
then, for any \(\lambda \ge 0\) and any feasible \(y\) (i.e., \(y\) that satisfies \(g(y) \le 0\)), we have that
(The first inequality follows from the definition of \(\sup\) while the last inequality follows since \(\lambda \ge 0\) and \(g(y) \le 0\) which means that \(\lambda g(y) \le 0\).) So, for any \(x\), we know that, if you give me any \(\lambda \ge 0\), then \(\bar f(x, \lambda)\) is an "overestimator" of \(f(x, y)\) for any feasible \(y\).
But, since this is true for any \(\lambda \ge 0\) and \(x\), then certainly
for any \(x\).^{2}
If we use our new overestimator \(\bar f\) instead of \(f\), our new problem is now a simple optimization problem
that is not in min-max form and requires no other special techniques to solve. The optimal value of this problem need not be the same as that of the original, but is always guaranteed to be at least as large.
Of course, this can help, but it certainly doesn't solve our problem. We just need one more piece to the puzzle!
If you've studied a bit of convex analysis the punchline is this: the inequality we have for \(f\) and \(\bar f\) above holds exactly at equality when \(f\) is concave in \(y\) and \(g\) is convex in \(y\). More specifically
for any \(x\).^{3}
When this is true, the new problem has the same optimal value as the original and any solution \(x\) for the original is a solution to the new problem! (Why?)
I won't cover more since the specifics don't matter too much, but the general idea is simple enough, and we now have all the parts to convert the min-max problem (4) of Cetegen and Stuber to a simple convex optimization problem.
Things are mostly algebra from here on out, so apologies in advance, I guess. I will leave much of the "hard" work to the reader :)
The complete problem (4) is, as written in the paper, using (mostly) their notation:
with variable \(x \in \mathbb{R}^n\), where \(r\) is some vector of returns (but the specifics don't matter) and \(\mathcal{M}\) is:
In other words, \(\mathcal{M}\) is the set of positive semidefinite matrices (\(M \ge 0\)) whose entries lie between those of \(M^-\) and \(M^+\). I've also dropped some constant terms in the objective since those don't change the problem.
In this case, the "inner" optimization problem is the one in the objective, which is just
with variable \(M \in \mathbb{R}^{n\times n}\). We can easily write a (slightly not canonical) Lagrangian:
where \(\Lambda^+, \Lambda^- \in \mathbb{R}^{n\times n}_+\) are elementwise nonnegative. (The Lagrangian is non-canonical because I have not included the constraint \(M \ge 0\), which we will enforce below.) It is not hard to show that
As before, the inequality between matrices is with respect to the semidefinite cone.
Plugging this back into the original problem formulation, we now have a convex optimization problem:
The (extra!) variables \(\Lambda^+, \Lambda^- \in \mathbb{R}^{n\times n}\) are included along with the original variable \(x \in \mathbb{R}^n\), and the same problem data as before. This problem, by use of the Schur complement applied to the semidefinite inequality, is easily recast into standard SDP form and can be solved by most standard convex optimization problem solvers, such as SCS or Mosek.
Quick edit (3/18/22): Thanks to bodonoghue85 for finding a typo!
See, e.g. page 429 of this paper, where they seem to re-prove basic things like "the supremum of a bunch of convex functions is convex," and "partial minimization of a convex function is convex," but I'm honestly not 100% sure. ↩
This is called weak duality, c.f. sections 5.1.3 and 5.2.2 of Convex Optimization. ↩
This is often referred to as strong duality, c.f. section 5.3.2 of Convex Optimization. There are some additional conditions for equality to hold, but these are almost always met in practice. ↩