Tuesday, February 15, 2011

Cluster decomposition and variational counting

Suppose we want to count the number of independent sets in the graph below.



There are 9 independent sets.


Because the graph consists of disjoint components, we can simplify the task by counting independent sets in each connected component separately and multiplying the results.
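As a sanity check, here is a minimal brute-force sketch (my own illustration, not the notebook code linked below; the helper name is made up, and I am assuming the figure shows two disjoint edges, each with 3 independent sets, so 3 × 3 = 9 in total):

```python
import itertools

def count_independent_sets(nodes, edges):
    """Brute force: count 0/1 assignments with no edge fully occupied."""
    nodes = list(nodes)
    count = 0
    for bits in itertools.product((0, 1), repeat=len(nodes)):
        occ = dict(zip(nodes, bits))
        if all(not (occ[u] and occ[v]) for u, v in edges):
            count += 1
    return count

# Count directly, then per component: both give 9
print(count_independent_sets([1, 2, 3, 4], [(1, 2), (3, 4)]))   # 9
print(count_independent_sets([1, 2], [(1, 2)]) ** 2)            # 9
```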



The variational approach is one way of extending this decomposition idea to connected graphs.

Consider the problem of counting independent sets in the following graph.


There are 7 independent sets. Let $x_i=1$ indicate that node $i$ is occupied, and let $P(x_1,x_2,x_3,x_4)$ be the uniform distribution over independent sets, meaning it is either 1/7 or 0 depending on whether $x_1,x_2,x_3,x_4$ forms an independent set.

The entropy of a uniform distribution over $n$ independent sets is $H=-\log(1/n)=\log n$, hence $n=\exp H$. So to find the number of independent sets, just find the entropy of the distribution $P$ and exponentiate it.
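For the graph above with its 7 independent sets, for instance, $H=\log 7\approx 1.946$ and $\exp(1.946)\approx 7$.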

We can represent $P$ as follows:

$$P(x_1,x_2,x_3,x_4)=\frac{P_a(x_1,x_2,x_3)P_b(x_1,x_3,x_4)}{P_c(x_1,x_3)}$$

The entropy of $P$ factorizes similarly:

$$H=H_a + H_b - H_c$$
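To see why, note that $P_a$, $P_b$, and $P_c$ are exact marginals of $P$, so each log-factor averages to the corresponding marginal entropy:

$$H=-\mathbb{E}_P[\log P]=-\mathbb{E}_P[\log P_a]-\mathbb{E}_P[\log P_b]+\mathbb{E}_P[\log P_c]=H_a+H_b-H_c$$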

Once we have $P_a$, $P_b$, and $P_c$, the local distributions of our factorization, we can forget the original graph and compute the number of independent sets from the entropies of these factors.

To find the factorization of $P$ into $P_a$, $P_b$, and $P_c$, minimize the following objective:

$$KL(\frac{P_a P_b}{P_c},P)$$

This is the KL-divergence between our factorized representation and the original representation. Message-passing schemes like cluster belief propagation can be derived as one approach to solving this minimization problem. In this case, cluster belief propagation takes 2 iterations to find the minimum of this objective. Note that the particular form of the distance is important. We could reverse the order and instead minimize $KL(P,P_a P_b/P_c)$, and while this makes sense from an approximation standpoint, minimizing the reversed form of KL-divergence is generally intractable.
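To see the connection to counting explicitly, write $Q=P_a P_b/P_c$ and recall that $P$ equals $1/n$ on its support. As long as $Q$ is supported on independent sets and its factors are consistent marginals,

$$KL(Q,P)=\sum_x Q(x)\log\frac{Q(x)}{P(x)}=\log n-(H_a+H_b-H_c)$$

so minimizing the objective is the same as maximizing the decomposed entropy, and a minimum value of 0 gives $\log n=H_a+H_b-H_c$ exactly.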


This decomposition is sufficiently rich that we can model $P$ exactly with a proper choice of $P_a$, $P_b$, and $P_c$, so the minimum of our objective is 0, and our entropy decomposition is exact.

Now, the number of independent sets using the decomposition above factorizes as
$$n=\exp H=\frac{\exp H_a \exp H_b}{\exp H_c} = \frac{\left(\frac{7}{2^{4/7}}\right)^2}{\frac{7}{2\cdot 2^{1/7}}} = \frac{4.71 \times 4.71}{3.17}=7$$
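This is easy to verify numerically. The sketch below (my own code, assuming the graph is the 4-cycle 1-2-3-4, which the factorization with chord $\{1,3\}$ suggests) enumerates the 7 independent sets, forms the exact marginals, and exponentiates the combined entropy:

```python
import itertools, math
from collections import Counter

# Assumed: the graph is the 4-cycle 1-2-3-4
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
sets = [bits for bits in itertools.product((0, 1), repeat=4)
        if all(not (bits[u-1] and bits[v-1]) for u, v in edges)]
assert len(sets) == 7

def entropy(idx):
    """Entropy of P marginalized onto the given (1-indexed) variables."""
    counts = Counter(tuple(s[i-1] for i in idx) for s in sets)
    return -sum(c/7 * math.log(c/7) for c in counts.values())

H = entropy((1, 2, 3)) + entropy((1, 3, 4)) - entropy((1, 3))
print(math.exp(H))   # 7.0; note exp(H_a) = 7/2**(4/7) ~ 4.71
```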

Our decomposition can be schematically represented as the following cluster graph



We have two regions, and we can think of our decomposition as counting the number of independent sets in each region, then dividing by the number of independent sets in the vertex set shared by the pair of connected regions. Note that region A gets "4.71 independent sets", so this is not as intuitive as the decomposition into connected components.


Here's another example of a graph and its decomposition.





Using 123 to refer to $P(x_1,x_2,x_3)$, the formula representing this decomposition is as follows:

$$\frac{124\times 236\times 2456 \times 4568 \times 478\times 489}{24\times 26 \times 456 \times 48 \times 68}$$

There are 63 independent sets in this graph, and using the decomposition above the count decomposes as follows:

$$63=\frac{\left(\frac{21\ 3^{5/7}}{2\ 2^{19/21} 5^{25/63}}\right)^2 \left(\frac{3\ 21^{1/3}}{2^{16/21} 5^{5/63}}\right)^4}{\frac{21\ 3^{3/7}}{2\ 2^{1/7} 5^{50/63}} \left(\frac{3\ 21^{1/3}}{2\ 2^{3/7} 5^{5/63}}\right)^4}$$
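The total of 63 can be checked with the count_independent_sets sketch from the beginning of the post, assuming the graph is the 3x3 grid (I number nodes 1-9 row by row here; the post's figure may use a different numbering, which does not affect the count):

```python
# 3x3 grid graph, nodes 1..9 row by row (assumed numbering)
grid_edges = [(1,2),(2,3),(4,5),(5,6),(7,8),(8,9),
              (1,4),(2,5),(3,6),(4,7),(5,8),(6,9)]
print(count_independent_sets(range(1, 10), grid_edges))   # 63
```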


Finding an efficient decomposition that models the distribution exactly requires the graph to have small tree-width. This is not the case for many graphs, such as large square grids, but we can apply the same procedure to get a cheap inexact decomposition.

Consider the following inexact decomposition of our 3x3 grid



The corresponding decomposition has 4 clusters and 4 separators, and the following factorization formula:

$$\frac{1245\times 2356 \times 4578 \times 5689}{25\times 45 \times 56 \times 58}$$

We can use the same minimization objective as before to find the parameters of this factorization. Since the original distribution cannot be represented exactly in this factorization, the result will be approximate, unlike in the previous two examples.

Using message passing to solve the variational problem takes about 20 iterations to converge to an estimate of 46.97 independent sets, compared to the true count of 63.
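For readers who want to see the moving parts, below is a minimal cluster belief propagation sketch (again my own illustration, not the notebook code; helper names are made up, and I assume row-by-row node numbering). It builds the 4-cluster graph above, passes messages over the separators, and estimates the count as the exponentiated sum of cluster entropies minus separator entropies. With a synchronous schedule and uniform initialization the numbers may not match the notebook's 46.97 exactly:

```python
import itertools, math

# 3x3 grid with hard constraints: no edge may have both ends occupied
edges    = [(1,2),(2,3),(4,5),(5,6),(7,8),(8,9),
            (1,4),(2,5),(3,6),(4,7),(5,8),(6,9)]
clusters = [(1,2,4,5), (2,3,5,6), (4,5,7,8), (5,6,8,9)]
links    = {(0,1):(2,5), (0,2):(4,5), (1,3):(5,6), (2,3):(5,8)}

def valid(nodes, bits):
    """1 iff no grid edge inside this node set has both ends occupied."""
    v = dict(zip(nodes, bits))
    return all(not (v[a] and v[b]) for a, b in edges if a in v and b in v)

nbrs = {i: {} for i in range(4)}          # nbrs[i][j] = separator variables
for (i, j), s in links.items():
    nbrs[i][j] = s
    nbrs[j][i] = s

pairs = [(i, j) for i in nbrs for j in nbrs[i]]
sep_states = list(itertools.product((0, 1), repeat=2))
msgs = {p: {sa: 1.0 for sa in sep_states} for p in pairs}

def belief(i, exclude=None):
    """Unnormalized cluster belief; optionally drop one incoming message."""
    b = {}
    for bits in itertools.product((0, 1), repeat=4):
        if not valid(clusters[i], bits):
            continue
        v = dict(zip(clusters[i], bits))
        w = 1.0
        for j, s in nbrs[i].items():
            if j != exclude:
                w *= msgs[(j, i)][tuple(v[x] for x in s)]
        b[bits] = w
    return b

for _ in range(50):                       # synchronous message updates
    new = {}
    for (i, j) in pairs:
        s = nbrs[i][j]
        out = dict.fromkeys(sep_states, 0.0)
        for bits, w in belief(i, exclude=j).items():
            v = dict(zip(clusters[i], bits))
            out[tuple(v[x] for x in s)] += w
        z = sum(out.values())
        new[(i, j)] = {sa: w / z for sa, w in out.items()}
    msgs = new

def entropy(table):
    z = sum(table.values())
    return -sum(w/z * math.log(w/z) for w in table.values() if w > 0)

H = sum(entropy(belief(i)) for i in range(4))
for (i, j), s in links.items():           # separator belief = product of both messages
    H -= entropy({sa: msgs[(i, j)][sa] * msgs[(j, i)][sa] for sa in sep_states})
print(math.exp(H))   # roughly 47 (the post reports 46.97); true count is 63
```

Because the hard constraints are 0/1 potentials and the beliefs are supported only on independent sets, the average-energy term of the variational objective vanishes, so the count estimate is just the exponentiated entropy approximation.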




To try various decompositions yourself, unzip the archive and look at usage examples in indBethe3-test.nb

Notebooks

Comments:

Brahms said...

interesting post and nice examples!

Can you explain how you got $\exp H_a =\frac{7}{2^{4/7}}$ ?

To my understanding, the reason your first decomposition took only two iterations to reach the optimal solution, is because your cluster graph was actually a tree. Hence BP works like forward-backward and reaches the correct solution. Your other cluster graph is not a tree, and hence BP is just an approximation.

Yaroslav said...

H_a is the entropy of the distribution over nodes 1,2,3. In other words, it's the entropy with one node of the cycle marginalized out. The factor of 7 comes out because there are 7 outcomes before marginalization. Writing out the entropy explicitly you get

$$-\left(\frac{2}{7} \log \frac{2}{7}+\frac{1}{7} \log \frac{1}{7}+\frac{2}{7} \log \frac{2}{7}+\frac{1}{7} \log \frac{1}{7}+\frac{1}{7} \log \frac{1}{7}\right)$$
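This simplifies to $H_a=\log 7-\frac{4}{7}\log 2$, hence $\exp H_a = 7/2^{4/7}\approx 4.71$.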

Yaroslav said...

To the other point -- yes -- for exact results, the cluster graph must be a junction tree. This also seems to be a necessary condition for exactness if we allow more general belief propagation schemes -- http://stats.stackexchange.com/questions/4564/when-is-generalized-belief-propagation-exact

Brahms said...

Thanks! I thought there was a trick in calculating this entropy without explicitly enumerating the outcomes.

Roman V. Shapovalov said...

Yaroslav,
From this example it seems that computing the entropy of a marginal is as hard as computing the entropy of the joint distribution (or counting independent sets). If so, what's the point of such a decomposition?

Yaroslav said...

Roman -- to compute a cluster entropy, you sum over all outcomes in the cluster, which scales exponentially with the size of the cluster. Brute force scales exponentially with the size of the whole graph.

Roman V. Shapovalov said...

Yes, but for each cluster outcome one still needs to compute its probability w.r.t. the whole graph (either 1/7 or 2/7 in the example). Is there an efficient way to do this?

Yaroslav said...

Indeed there is. You get the marginals by solving the KL-divergence optimization problem I described above. That objective can be evaluated in time that's only linear in the size of the graph (though exponential in the size of a cluster), and you can find a local minimum efficiently. One approach gives cluster BP; see the derivation on pages 388-389 of Koller's book. When the cluster graph is a tree, like above, the objective is convex and the cluster BP update steps are identical to the junction tree algorithm, so you get the exact result.

Roman V. Shapovalov said...

I see now. Thank you!



David Huer said...

Hello, I saw a sphere image in your blog and linked to it in my blog. Is this okay or would you like it removed?

Regards,

Dave
@huer_innov
davehuer.com
the posting is here: http://davehuer.com/blog/is-globular-thinking-an-encryption-cypher/
