The S-procedure and small covering ellipsoids

Posted 2019-06-05

Note: This post was inspired by Kunal Shah’s question that came up at some point during one of our meetings: is there an efficient way of finding an ellipsoid which covers the intersections and unions of a bunch of other ellipsoids?

While this question has been explored somewhat extensively, the exposition is often more general than necessary and aimed at a relatively mathematical audience. Either way, if you’re interested, both papers are fairly well-written—I highly recommend at least a quick skim!

The S-procedure

The S-procedure is a well-known lemma in control theory that seeks to answer the following question:

Let’s say we have a bunch of quadratic functions $f_{0}, f_{1}, f_{2}, \dots, f_{n} : 𝐑^{m} \to 𝐑$ . When is it true that

$$f_{i} (x) \leq 0 \text{ for } i = 1, \dots, n ⟹ f_{0} (x) \leq 0,$$

for all $x \in 𝐑^{m}$? (Recall that a quadratic is a function of the form $f (x) = x^{T} P x + 2 q^{T} x + r$ for symmetric $P \in 𝐑^{m \times m}$, $q \in 𝐑^{m}$, and $r \in 𝐑$.)

There are many reasons to attempt to answer this (surprisingly useful) question. The original motivations were to show stability of systems, though the domain of applications is certainly larger. We can use this to show anything from impossibility results (for example, many of the results of this paper can be recast in terms of the S-procedure) to, well, in our case, the construction of a small covering ellipsoid from a bunch of other ellipsoids, which is itself useful for things like filtering (for localizing drones from noisy measurements for example) along with many other applications.

If you’re familiar with Lagrange duality, this is mostly an equivalent statement—except that this statement is in the special case of quadratics, where you can say a little more than with general functions.

The $n = 1$ case

We can completely answer this question in the case that $n = 1$, as long as there is at least one point $\hat{x}$ with $f_{1} (\hat{x}) < 0$: there exists a number $τ \geq 0$ such that

$$f_{0} (x) \leq τ f_{1} (x)$$

for all $x$ if, and only if, $f_{1} (x) \leq 0 ⟹ f_{0} (x) \leq 0$.

Why? Well, let’s say we have a $τ \geq 0$ that satisfies the above inequality. Then, for any $x$ such that $f_{1} (x) \leq 0$, we have

$$f_{0} (x) \leq τ f_{1} (x) \leq τ \cdot 0 = 0.$$

The converse is slightly trickier, so I will defer to B&V’s Convex Optimization, which has a very readable presentation of the proof (see sections B.1 and B.2 in the appendix).
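As a quick sanity check, here is a small worked example (chosen purely for illustration, it does not come from the references): take the unit ball and the ball of radius 2, both centered at the origin. Any $τ$ between 1 and 4 certifies that the small ball sits inside the big one, while no $τ \geq 0$ certifies the reverse inclusion.

```latex
\begin{aligned}
f_1(x) &= x^T x - 1, \qquad f_0(x) = x^T x - 4, \\
f_0(x) - \tau f_1(x) &= (1 - \tau)\, x^T x + (\tau - 4) \le 0 \ \text{for all } x
  \iff 1 \le \tau \le 4, \\
f_1(x) - \tau f_0(x) &= (1 - \tau)\, x^T x + (4\tau - 1) \le 0 \ \text{for all } x
  \iff \tau \ge 1 \ \text{and} \ \tau \le \tfrac{1}{4} \quad \text{(impossible)}.
\end{aligned}
```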

The general case

The general case is really only a slight change from the $n = 1$ case (except that the converse of the statement is not true). In particular, if there exist $λ_{1}, \dots, λ_{n} \geq 0$ such that

$$f_{0} (x) \leq \sum_{i} λ_{i} f_{i} (x) \text{ for all } x \in 𝐑^{m},$$

then $f_{i} (x) \leq 0$ for $i = 1, \dots, n$ implies $f_{0} (x) \leq 0$. Showing this is nearly the same as the $n = 1$ case: if $f_{i} (x) \leq 0$ for every $i$, then

$$f_{0} (x) \leq \sum_{i} λ_{i} f_{i} (x) \leq \sum_{i} λ_{i} \cdot 0 = 0.$$

So now we have a family of sufficient (but not necessary!) conditions, one for each choice of $λ \geq 0$, under which $f_{i} (x) \leq 0$ for $i = 1, \dots, n$ implies that $f_{0} (x) \leq 0$.
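To make the sufficiency argument concrete, here is a minimal sketch that checks it numerically on a toy instance. The quadratics `f1`, `f2`, `f0` and the multipliers `lam` are assumptions picked for illustration: two slabs whose intersection is the square $[-1, 1]^{2}$, and a disk of radius $\sqrt{2}$ that should contain it. Random sampling is only a spot check, not a proof.

```python
import random

# Hypothetical toy instance (not from the post):
# f1 <= 0 and f2 <= 0 describe the square [-1, 1]^2,
# f0 <= 0 describes the disk of radius sqrt(2) around the origin.
def f1(x):
    return x[0] ** 2 - 1.0

def f2(x):
    return x[1] ** 2 - 1.0

def f0(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0

lam = (1.0, 1.0)  # candidate multipliers; here f0 = f1 + f2 exactly
tol = 1e-12       # slack for floating-point rounding

random.seed(0)
points = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(10_000)]

# Certificate: f0(x) <= lam_1 f1(x) + lam_2 f2(x) at every sampled x.
certificate_ok = all(
    f0(x) <= lam[0] * f1(x) + lam[1] * f2(x) + tol for x in points
)

# Implication: whenever both constraints hold, f0(x) <= 0 holds as well.
inside = [x for x in points if f1(x) <= 0 and f2(x) <= 0]
implication_ok = all(f0(x) <= tol for x in inside)

print(certificate_ok, implication_ok, len(inside))
```

The certificate is checked on all samples, while the implication only concerns the samples that land inside the square, which is exactly the asymmetry in the argument above.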

Covering ellipsoids for unions

Definitions and connections

Ellipsoids are a particularly nice family to work with since, as you may have guessed, they are the sets defined by

$$ℰ = \{x \mid f (x) \leq 0\},$$

where $f : 𝐑^{m} \to 𝐑$ is a convex quadratic. This definition gives us a way of translating statements about sets (inclusion, etc.) into statements about the functions which generate them. In particular, if we have two ellipsoids $ℰ, ℰ_{0} \subseteq 𝐑^{m}$ defined by the convex quadratics $f, f_{0}$, then

$$ℰ \subseteq ℰ_{0} ⟺ (f (x) \leq 0 ⟹ f_{0} (x) \leq 0 \text{ for all } x).$$

But wait a minute, we know exactly when this happens! By the previous section, we found that

$$f (x) \leq 0 ⟹ f_{0} (x) \leq 0$$

if and only if there is some $τ \geq 0$ with $f_{0} (x) \leq τ f (x)$ for all $x$. Also note that if we have a union of a bunch of ellipsoids (say $ℰ_{1}, \dots, ℰ_{n}$) that we want to cover with an ellipsoid $ℰ_{0}$, then this is the same as saying

$$ℰ_{i} \subseteq ℰ_{0}, \text{ for } i = 1, \dots, n,$$

or, that each ellipsoid is covered by the big one, $ℰ_{0}$ .

Back to our goal

Ok, to reiterate, we are looking for a small ellipsoid $ℰ_{0} = \{x \mid f_{0} (x) \leq 0\}$ such that $ℰ_{0}$ contains all of the other ellipsoids $ℰ_{i} = \{x \mid f_{i} (x) \leq 0\}$, where the $f_{i}$ and $f_{0}$ are convex quadratics. In other words, using the results of the previous subsection, we look for a quadratic $f_{0}$ such that

$$(f_{i} (x) \leq 0 ⟹ f_{0} (x) \leq 0) \text{ for each } i,$$

which we know happens if and only if there exists some $τ_{i} \geq 0$ with

$$f_{0} (x) \leq τ_{i} f_{i} (x) \text{ for each } i \text{ and all } x.$$

Now remains the final question: given two quadratics, $f_{i}$ and $f_{0}$, and some number $τ \geq 0$, how can we check if $f_{0} (x) \leq τ f_{i} (x)$ for all $x$? I won’t prove this (though I have written a quick proof of this statement in my notes, found here), but, if we let $f_{i} (x) = x^{T} P_{i} x + 2 q_{i}^{T} x + r_{i}$ and $f_{0} (x) = x^{T} P^{'} x + 2 (q^{'})^{T} x + r^{'}$, then $f_{0} (x) \leq τ f_{i} (x)$ for all $x$ if, and only if,

$$\begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} \end{bmatrix} \leq τ \begin{bmatrix} P_{i} & q_{i} \\ q_{i}^{T} & r_{i} \end{bmatrix},$$

where we say two symmetric matrices $A, B$ satisfy $A \leq B$ whenever $x^{T} A x \leq x^{T} B x$ for all $x$. A straightforward exercise is to verify that the set of matrices $A \geq 0$ is a convex cone (almost universally called the positive semidefinite or PSD cone).

This rewriting is extremely useful, since we’ve turned a problem over a potentially difficult-to-handle space (the space of quadratics greater than or equal to another) into a problem that is easy to handle (the PSD cone). The best news, though, is that we have efficient algorithms to solve optimization problems whose constraints are PSD constraints.¹
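For a fixed $τ$, the matrix condition can be checked directly. The following is a minimal, dependency-free sketch; the helper names `is_psd`, `block_matrix`, and `dominated` are hypothetical, and in practice one would probably reach for an eigenvalue routine from a numerical library instead of the hand-rolled elimination used here. The example asks whether the disk of radius 2 around the origin covers the unit disk centered at $(1, 0)$, which touches it from the inside.

```python
def is_psd(M, tol=1e-9):
    """Return True if the symmetric matrix M (list of lists) is PSD.

    Symmetric elimination: M is PSD iff every pivot is nonnegative and
    a zero pivot comes with an all-zero remaining row.
    """
    A = [row[:] for row in M]
    n = len(A)
    for k in range(n):
        pivot = A[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            # A zero diagonal entry of a PSD matrix forces a zero row.
            if any(abs(A[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        # Replace the trailing block by its Schur complement.
        for i in range(k + 1, n):
            factor = A[i][k] / pivot
            for j in range(k + 1, n):
                A[i][j] -= factor * A[k][j]
    return True


def block_matrix(P, q, r):
    """Stack the coefficients of x^T P x + 2 q^T x + r into [[P, q], [q^T, r]]."""
    return [list(P[i]) + [q[i]] for i in range(len(q))] + [list(q) + [r]]


def dominated(f0, fi, tau):
    """Check f0(x) <= tau * fi(x) for all x, i.e. tau * M_i - M_0 is PSD."""
    M0, Mi = block_matrix(*f0), block_matrix(*fi)
    diff = [[tau * b - a for a, b in zip(row0, rowi)] for row0, rowi in zip(M0, Mi)]
    return is_psd(diff)


identity = [[1.0, 0.0], [0.0, 1.0]]
big = (identity, [0.0, 0.0], -4.0)     # ||x||^2 - 4: disk of radius 2
shifted = (identity, [-1.0, 0.0], 0.0) # ||x - (1, 0)||^2 - 1: unit disk at (1, 0)

print(dominated(big, shifted, 2.0))  # tau = 2 certifies the inclusion
print(dominated(big, shifted, 1.0))  # tau = 1 does not
```

Because the two disks are tangent, $τ = 2$ is the only multiplier that works in this instance, which is why the optimization problem below treats $τ$ as a variable rather than something to guess.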

Corresponding optimization problem

Finally, after enough background, we can get to the final goal: writing an efficiently-solvable optimization problem to give us a small bounding ellipsoid.

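There are several ways of defining “small” in the case of ellipsoids, but one of the most common definitions is to pick the ellipsoid with the smallest volume. There is one catch: scaling $f_{0}$ by a positive constant does not change $ℰ_{0} = \{x \mid x^{T} P^{'} x + 2 (q^{'})^{T} x + r^{'} \leq 0\}$, so we first need to fix a scale. Completing the square gives

$$ℰ_{0} = \{x \mid (x - c)^{T} P^{'} (x - c) \leq (q^{'})^{T} (P^{'})^{-1} q^{'} - r^{'}\}, \quad c = -(P^{'})^{-1} q^{'},$$

so, if we normalize the right-hand side to be $1$, the volume of the ellipsoid is proportional to $(\det P^{'})^{-1/2}$: the larger the determinant of $P^{'}$, the smaller the ellipsoid. It is enough to ask for $(q^{'})^{T} (P^{'})^{-1} q^{'} - r^{'} \leq 1$ (the other constraints are homogeneous, so any solution with slack can be scaled up, which only increases the determinant), and, by a Schur complement, this is itself a PSD constraint. So, we can write, using the conditions given above, an optimization problem corresponding to finding the smallest (in volume) ellipsoid $ℰ_{0}$ which contains all of the ellipsoids $ℰ_{i}$ as

$$\begin{array}{ll} \underset{P^{'}, q^{'}, r^{'}, τ}{\text{maximize}} & \det P^{'} \\ \text{subject to} & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} \end{bmatrix} \leq τ_{i} \begin{bmatrix} P_{i} & q_{i} \\ q_{i}^{T} & r_{i} \end{bmatrix}, \quad τ_{i} \geq 0, \quad i = 1, \dots, n \\ & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} + 1 \end{bmatrix} \geq 0. \end{array}$$

The only problem here (which we can easily fix) is that the determinant is not a concave function, so this is not yet a convex problem. On the other hand, the log determinant is concave (for a proof, see the Convex book, section 3.1.5), so we can instead minimize its negative,

$$\begin{array}{ll} \underset{P^{'}, q^{'}, r^{'}, τ}{\text{minimize}} & -\log \det P^{'} \\ \text{subject to} & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} \end{bmatrix} \leq τ_{i} \begin{bmatrix} P_{i} & q_{i} \\ q_{i}^{T} & r_{i} \end{bmatrix}, \quad τ_{i} \geq 0, \quad i = 1, \dots, n \\ & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} + 1 \end{bmatrix} \geq 0. \end{array}$$

This is equivalent to the original problem since $\log (y)$ is an increasing function of $y$.

Of course, other convex measures of size (such as, for example, the trace of $(P^{'})^{-1}$, which is the sum of the squared semi-axis lengths) would do here as well.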

Covering ellipsoids for intersections and unions

Ok, now we know how to solve the problem where we have a bunch of ellipsoids and we want to find an ellipsoid which covers all of them. How about the problem where we want to find an ellipsoid which also covers the sets $N_{i}$ for $i = 1, \dots, k$, which are, themselves, intersections of ellipsoids?

In particular, if $N_{i}$ is defined as

$$N_{i} = \bigcap_{j \in I_{i}} ℰ_{j},$$

for some index set $I_{i} \subseteq \{1, \dots, n\}$ and some set of ellipsoids $\{ℰ_{j}\}$, each of which is defined as before ($ℰ_{j} = \{x \mid f_{j} (x) \leq 0\}$), we can perform a similar trick to the one above!

More generally, if we have an ellipsoid $ℰ_{0} = {x | f_{0} (x) \leq 0}$ , then

$$N_{i} \subseteq ℰ_{0} ⟺ (f_{j} (x) \leq 0 \text{ for } j \in I_{i} ⟹ f_{0} (x) \leq 0).$$

(It’s a worthwhile exercise to think about why, but it follows the same idea as before.) In other words, $ℰ_{0}$ is a superset of $N_{i}$ exactly when a bunch of quadratic inequalities imply another. (Where have we seen this before…?)

By the first section, we know that if there exist $λ_{j} \geq 0$ for $j \in I_{i}$ such that

$$f_{0} (x) \leq \sum_{j \in I_{i}} λ_{j} f_{j} (x) \text{ for all } x,$$

then we immediately have that $N_{i} \subseteq ℰ_{0}$ . Since the converse is not true, we are sadly not guaranteed to actually find the smallest bounding ellipsoid $ℰ_{0}$ , but this is usually quite a good approximation (if it’s not exact).
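A small worked example may help here (again chosen only for illustration): take two unit disks in the plane centered at $(\tfrac{1}{2}, 0)$ and $(-\tfrac{1}{2}, 0)$, whose intersection is a lens. Averaging the two quadratics cancels the linear terms and leaves a disk of radius $\sqrt{3}/2$ around the origin, which passes through the two corners of the lens at $(0, \pm\sqrt{3}/2)$. So, in this instance, the multipliers $λ = (\tfrac{1}{2}, \tfrac{1}{2})$ certify the smallest disk containing the lens.

```latex
\begin{aligned}
f_1(x) &= \|x\|^2 - x_1 - \tfrac{3}{4}
  && \text{(unit disk centered at } (\tfrac{1}{2}, 0)\text{)}, \\
f_2(x) &= \|x\|^2 + x_1 - \tfrac{3}{4}
  && \text{(unit disk centered at } (-\tfrac{1}{2}, 0)\text{)}, \\
f_0(x) &= \tfrac{1}{2} f_1(x) + \tfrac{1}{2} f_2(x) = \|x\|^2 - \tfrac{3}{4}
  && \text{(disk of radius } \tfrac{\sqrt{3}}{2} \text{ centered at the origin)}.
\end{aligned}
```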

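Following exactly the same steps as in the previous section and using the same definitions, we now get a new program for minimizing the volume for the union and intersection of ellipsoids (with $f_{0}$ scaled so that $(q^{'})^{T} (P^{'})^{-1} q^{'} - r^{'} \leq 1$, under which a larger $\det P^{'}$ means a smaller volume):

$$\begin{array}{ll} \underset{P^{'}, q^{'}, r^{'}, λ}{\text{minimize}} & -\log \det P^{'} \\ \text{subject to} & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} \end{bmatrix} \leq \sum_{j \in I_{i}} λ_{i j} \begin{bmatrix} P_{j} & q_{j} \\ q_{j}^{T} & r_{j} \end{bmatrix}, \quad i = 1, \dots, k \\ & \begin{bmatrix} P^{'} & q^{'} \\ (q^{'})^{T} & r^{'} + 1 \end{bmatrix} \geq 0 \\ & λ_{i j} \geq 0, \quad i = 1, \dots, k, \quad j \in I_{i}. \end{array}$$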

As before, this program does not guarantee actually finding the minimal volume ellipsoid, but it is likely to be quite close! (That is, if it’s not spot on, most of the time.)

¹ For more information, see Boyd’s Linear Matrix Inequalities book.