Suppose we want to count the number of independent sets in a graph below.

There are 9 independent sets.

Because the graphs are disjoint we could simplify the task by counting graphs in each connected component separately and multiplying the result

Variational approach is one way of extending this decomposition idea to connected graphs.

Consider the problem of counting independent sets in the following graph.

There are 7 independent sets. Let $x_i=1$ indicate that node $i$ is occupied, and $P(x_1,x_2,x_3,x_4)$ be a uniform distribution over independent sets, meaning it is either 1/7 or 0 depending whether $x_1,x_2,x_3,x_4$ forms an independent set

Entropy of uniform distribution over $n$ independent sets distribution is $H=-\log(1/n)$, hence $n=\exp H$, so to find the number of independent sets just find the entropy of distribution $P$ and exponentiate it.

We can represent P as follows

$$P(x_1,x_2,x_3,x_4)=\frac{P_a(x_1,x_2,x_3)P_b(x_1,x_3,x_4)}{P_c(x_1,x_3)}$$

Entropy of P similarly factorizes

$$H=H_a + H_b - H_c$$

Once you have $P_A$, $P_B$ and $P_C$ representing our local distributions of our factorization, we can forget the original graph and compute the number of independent sets from entropies of these factors

To find factorization of $P$ into $P_a,P_b$, and $P_c$ minimize the following objective

$$KL(\frac{P_a P_b}{P_c},P)$$

This is KL-divergence between our factorized representation and original representation. Message passing schemes like cluster belief propagation can be derived as one approach of solving this minimization problem. In this case, cluster belief propagation takes 2 iterations to find the minimum of this objective. Note that the particular form of distance is important. We could reverse the order and instead minimize $KL(P,P_a P_b/P_c)$ and while this makes sense from approximation stand-point, minimizing reversed form of KL-divergence is generally intractable.

This decomposition is sufficiently rich that we can model P exactly using proper choice of $P_a,P_b$ and $P_c$, so the minimum of our objective is 0, and our entropy decomposition is exact.

Now, the number of independent sets using decomposition above factorizes as

$$n=\exp H=\frac{\exp H_a \exp H_b}{\exp H_c} = \frac{\left(\frac{7}{2^{4/7}}\right)^2}{\frac{7}{2\ 2^{1/7}}} = \frac{4.75 \times 4.75}{3.17}=7$$

Our decomposition can be schematically represented as the following cluster graph

We have two regions and we can think of our decomposition as counting number of independent sets in each region, then dividing by number of independent sets in the vertex set shared by pair of connected regions. Note that region A gets "4.75 independent sets" so this is not as intuitive as decomposition into connected components

Here's another example of graph and its decomposition.

Using 123 to refer to $P(x_1,x_2,x_3)$ the formula representing this decomposition is as follows

$$\frac{124\times 236\times 2456 \times 4568 \times 478\times 489}{24\times 26 \times 456 \times 48 \times 68}$$

There are 63 independent sets in this graph, and using decomposition above the count decomposes as follows:

$$63=\frac{\left(\frac{21\ 3^{5/7}}{2\ 2^{19/21} 5^{25/63}}\right)^2 \left(\frac{3\ 21^{1/3}}{2^{16/21} 5^{5/63}}\right)^4}{\frac{21\ 3^{3/7}}{2\ 2^{1/7} 5^{50/63}} \left(\frac{3\ 21^{1/3}}{2\ 2^{3/7} 5^{5/63}}\right)^4}$$

Finding efficient decomposition that models distribution exactly requires that graph has small tree-width. This is not the case for many graphs, such as large square grids, but we can apply the same procedure to get cheap inexact decomposition.

Consider the following inexact decomposition of our 3x3 grid

Corresponding decomposition has 4 clusters and 4 separators, and the following factorization formula

$$\frac{1245\times 2356 \times 4578 \times 5689}{25\times 45 \times 56 \times 58}$$

We can use minimization objective as before to find the parameters of this factorization. Since original distribution can not be exactly represented in this factorization, result will be approximate, unlike previous two example.

Using message-passing to solve the variational problem takes about 20 iterations to converge, to an estimate of 46.97 independent sets

To try various decompositions yourself, unzip the archive and look at usage examples in indBethe3-test.nb

Notebooks

## 12 comments:

interesting post and nice examples!

Can you explain how you got $\exp H_a =\frac{7}{2^{4/7}}$ ?

To my understanding, the reason your first decomposition took only two iterations to reach the optimal solution, is because your cluster graph was actually a tree. Hence BP works like forward-backward and reaches the correct solution. Your other cluster graph is not a tree, and hence BP is just an approximation.

H_a is the entropy of the distribution over nodes 1,2,3. In other words it's the entropy with one node of the cycle marginalized out. Factor of 7 comes out because there are 7 outcomes before marginalization. Writing out the entropy explicitly you get

$$\frac{2}{7} \log \left(\frac{2}{7}\right)+\frac{1}{7} \log \left(\frac{1}{7}\right)+\frac{2}{7} \log \left(\frac{2}{7}\right)+\frac{1}{7} \log \left(\frac{1}{7}\right)+\frac{1}{7} \log \left(\frac{1}{7}\right)$$

To the other point -- yes -- for exact results, the cluster graph must be a junction tree. This also seems to also be a necessary condition for exactness if we allow more general belief propagation schemes -- http://stats.stackexchange.com/questions/4564/when-is-generalized-belief-propagation-exact

Thanks! I thought there was a trick in calculating this entropy without explicitly enumerating the outcomes.

Yaroslav,

From this example it seems that computing the entropy of a marginal is equally hard as computing the entropy of the joint distribution (or counting independent sets). If so, what's the point in such decomposition?

Roman -- to compute cluster entropy, you sum over all outcomes in a cluster, which scales exponentially with the size of the cluster. Brute force scales exponentially with size of the whole graph.

Yes, but for each cluster outcome one still needs to compute its probability w.r.t. the whole graph (either 1/7 or 2/7 in the example). Is there an efficient way to do this?

Indeed there is. You get marginals by solving KL-divergence optimization problem I described above. That objective can be evaluated in time that's only linear in size of the graph (exponential in size of cluster), and you can find local minimum efficiently. One approach gives Cluster BP, derivation on page 388-389 of Koller's book. When cluster graphs are trees like above, objective is convex and cluster BP update steps are identical to the junction tree algorithm, so you get exact result.

I see now. Thank you!

This is nice blog.

http://bwexports.in/products.php

Post a Comment