Saturday, March 05, 2011

Linear Programming for Maximum Independent Set

Maximum independent set, or "maximum stable" set is one of classical NP-complete problems described in Richard Karp's 1972 paper "Reducibility Among Combinatorial Problems". Other NP-complete problems often have a simple reduction to it, for instance, p.3 of Tony Jebara's "MAP Estimation, Message Passing, and Perfect Graphs" shows how MAP inference in an arbitrary MRF reduces to Maximum Weight Independent Set.

The problem asks to find the largest set of vertices in a graph, such that no two vertices are adjacent.



Finding largest independent set is equivalent to finding largest clique in graph's complement



A classical approach to solving this kind of problem is based on linear relaxation.

Take a variable for every vertex of the graph, let it be 1 if vertex is occupied and 0 if it is not occupied. We encode independence constraint by requiring that adjacent variables add up to 1. So we can consider the following graph and corresponding integer programming formulation:



$$\max x_1 + x_2 + x_3 + x_4 $$

such that
$$
\begin{array}{c}
x_1+x_2 \le 1 \\\\
x_2+x_3 \le 1 \\\\
x_3+x_4 \le 1 \\\\
x_4+x_1 \le 1 \\\\
x_i\ge 0 \ \ \forall i \\\\
x_i \in \text{Integers}\ \forall i
\end{array}
$$

Solving this integer linear integer program is equivalent to the original problem of maximum independent set, with 1 value indicating that node is in the set. To get a tractable LP programme we drop the last constraint.

If we solve LP without integer constraints and get integer valued result, the result is guaranteed to be correct. In this case we say that LP is tight, or LP is integral.

Curiously, solution of this kind of LP is guaranteed to take only values 0,1 and $\frac{1}{2}$. This is known as "half-integrality". Integer valued variables are guaranteed to be correct, and we need to use some other method to determine assignments to half-integer variables.

For instance below is a graph for which first three variables get 1/2 in the solution of the LP, while the rest of the vertices get integral values which are guaranteed to be correct



Here's what this LP approach gives us for a sampling of graphs



A natural question to ask is when this approach works. Its guaranteed to work for bipartite graphs, but it also seems to work for many non-bipartite graphs. None of the graphs above are bipartite.

Since independent set only allows one vertex per clique, we can improve our LP by saying that sum of variables over each clique has to below 1.

Consider the following graph that our original LP has a problem with:



Instead of having edge constraints, we formulate our LP with clique constraints as follows:

$$x_1+x_2+x_3\le 1$$
$$x_3+x_4+x_5\le 1$$

With these constraints, LP solver is able to find the integral solution



In this formulation, solution of LP is no longer half-integral. For the graph below we get a mixture of 1/3 and 2/3 in the solution.



Optimum of this LP is known as the "fractional chromatic number" and it is an upper bound both on the largest clique size and the largest independent set size. For the graph above, largest clique and independent set have sizes 9/3 and 12/3, whereas fractional chromatic number is 14/3.

This relaxation is tighter than previous, here are some graphs that previous LP relaxation had a problem with.



This LP is guaranteed to work for perfect graphs which is a larger class of graphs than bipartite graphs. Out of 853 connected graphs over 7 vertices, 44 are bipartite while 724 are perfect.

You can improve this LP further by adding more constraints. A form of LP constraints that generalizes previous two approaches is

$$\sum_{i\in C} x_i\le \alpha(C)$$

Where $C$ is some subgraph and $\alpha(C)$ is its independence number.

You extend previous relaxation by considering subgraphs other than cliques.
Lovasz describes the following constraint classes (p.13 of "Semidefinite programs and combinatorial optimization")

1. Odd hole constraints
2. Odd anti-hole constraints
3. Alpha-critical graph constraints.

Note that adding constraints for all maximal cliques can produce a LP with exponential number of constraints.

Notebook
Packages

8 comments:

Danny Tarlow said...

Cool post.

So in a practical setting, if we wanted to solve large maximum independent set problems with this approach, we could imagine starting with the basic set of edge constraints, running the LP, checking to see if we have an integral solution, and if not, intelligently adding new constraints.

Do you have any ideas about good ways to choose which additional constraints to add?

(As you may know, this is essentially the problem addressed in the following paper, though the paper is about general MAP inference rather than MIS:
D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, T. Jaakkola. Tightening LP Relaxations for MAP using Message Passing. Uncertainty in Artificial Intelligence (UAI) 24, July 2008.)

Roman V. Shapovalov said...

Look at, say, graph on (4, 5) position in the first big picture. It is not the only optimal LP solution, an off-the-shelf LP algorithm can assign 1/2 to the both nodes in the left column, and to the both nodes in the right column. So, should one use some technique that prefers integral solutions? Is it the common case when there are several optimums, and integral among them?

Yaroslav said...

Roman - Good question, I actually don't know why integer solution was chosen in that case. There are quite a few cases like that, with non-unique maximum, yet the solver choosing an integer solution, perhaps it's a side effect of the simplex method

Danny - that sounds like a promising approach -- here you could see which regions of the graph produce non-integer solutions, add constraints of the last kind for those vertex sets, and rerun the LP solver

Dmitry P. Vetrov said...

Concerning Roman's question I think it is possible to add random small epsilons (either positive or negative) to the right parts of constaints in LP relaxation. We will seem to loose the property of half-integrality but in such unlikely case when there are several solutions we will definitely converge to the integer one.

张潘 said...

I think the reason of 1/2 solution is that the variable takes value 1 in some of the maximum independent sets and 0 in others. In easy cases of Independent-Set, e.g. on Erdos-Renyi random graphs with average connectivity smaller than 2.71828..., leaf-removal algorithm gives an optimal solution, then I guess LP will give a tight solution. In hard cases, when leaf-removal core exists, no efficient algorithm exists to solve the problem, and LP gives fraction solution.

Yaroslav Bulatov said...

In my example with the bowtie graph, there's no maximum independent set that contains middle node, yet it gets value 1/2

Falk Hüffner said...

The possible sets of nonintegral values in optimal LP solutions are closed under intersection (Picard&Queyranne: On the integer-valued variables in the linear vertex packing problem, 1977, http://dx.doi.org/10.1007/BF01593772). Thus, there is a unique optimal LP solution with the minimum number of fractional values, and you can find it for example by trying to fix each variable to 1 and seeing if the objective changes.

On G(n, p)-graphs, for any p, with high probability all variables 1/2 is the only optimal LP solution (Pulleyblank: Minimum node covers and 2-bicritical graphs, 1979, http://dx.doi.org/10.1007/BF01588228).

kamban said...

Hi there, awesome site. I thought the topics you posted on were very interesting. I tried to add your RSS to my feed reader

and it a few. take a look at it, hopefully I can add you and follow.

Independent Villas