Machine Learning, etc: December 2010

Tuesday, December 21, 2010

Visualizing 7 dimensional simplex

Suppose we'd like to visualize a set of joint probabilities realizable by distributions of 3 binary variables.

It is a 7 dimensional regular simplex, and we could draw a lower dimensional section of it. There's an infinite number of sections, but there aren't that many *interesting* ones.

As an example, suppose we want to find a good 2 dimensional section of a 3 dimensional regular simplex. It seems reasonable to restrict attention to sections that go through simplex center and two other points, each of which is a vertex or a center of some edge or face. There are just 2 such sections determined up to rotation/reflection

For 7 dimensional simplex, we could similarly take 3 dimensional sections determined by simplex center and 3 points, each a centroid of some set of vertices. This gives 49 distinct shapes, shown below.

Among those, there's just one section that intersects all 8 faces of the simplex and preserves the underlying symmetry

Also known as the regular octahedron.

Implementation details

Wednesday, December 15, 2010

Computationally nice structures

One approach to solving hard problems is to break them into computationally efficient parts.

For instance, Globerson/Jaakkola do approximate counting by decomposing graph into planar graphs. In each part, the problem reduces to perfect matchings which can be solved efficiently on planar graph. Bouchet/Jordan decompose counting bipartite perfect matchings into much easier problems that count matchings perfect on "one side" of the graph. Sontag/Jaakola solve subproblems on star-shaped subgraphs of original graph, the symmetry of star graph enabling a very fast solution to each subproblem.

I recently came across graph class database which gives information on thousands of graph classes. It shows dozens (hundreds?) distinct tractable classes for exact maximum independent set, and most of them seem to have unbounded tree width. I haven't heard of most of those classes and would be curious to know how many distinctly interesting algorithms they correspond to

Here are a few computationally nice structures I've seen employed in Machine Learning and Vision literature

Upper-bounded tree width

Everything is easy via tree decomposition

Lower-bounded girth

Almost everything is easy via loopy belief propagation

Planar

Easy minimization of submodular functions. Easy counting of certain structures by reduction to perfect matchings.

Complete

Easy counting of unweighted local structures

Perfect

Easy MAP inference with NAND potentials

Star

Very efficient updates for MAP inference

Claw-free

A generalization of line graphs. Easy maximum independent set, and counting by reduction to matchings

Any other useful classes I missed?
Mathematica notebook

Sunday, December 12, 2010

NIPS 2010 highlights

A few "connection to other fields" papers I found interesting

Variational Inference over Combinatorial Spaces. Alexandre Bouchard-Cote, Michael Jordan:
Extend variational formulation to upper bound the number of combinatorial structures with global constraints, like perfect matchings and TSP tours. The trick is to represent the space of structures as an intersection of spaces that are tractable to count. For perfect matchings, it is tractable to count edge sets that satisfy perfect matching constraint on one side, so you can represent the total space as intersection of perfect matchings that are "left-legal" and "right-legal". In experiments, the algorithm gave same accuracy for perfect matchings as existing PTAS based on Jerrum's technique, but 10 times faster.
Learning Efficient Markov Networks. Vibhav Gogate, William Webb, Pedro Domingos:
Learn high tree-width models and ensure efficient inference by restricting potential functions to a compact representation as a tree of AND/OR predicates. The problem of learning is reduced to symmetric submodular function minimization which can be done in $O(n^3)$ time with Queyranne's algorithm. For experiments, $O(n^3)$ was too slow, but they give a heuristic which worked to give state-of-the-art results in a practical dataset.
MAP estimation in Binary MRFs via Bipartite Multi-cuts. Sashank Jakkam Reddi, Sunita Sarawagi, Sundar Vishwanathan:
Reduces MAP inference in binary Markov Fields to multi-commodity flow problem
Worst-case bounds on the quality of max-product fixed-points. Meritxell Vinyals, Jesús Cerquides, Alessandro Farinelli, Juan Antonio Rodríguez-Aguilar:
Uses an inequality recently obtained in multi-agent literature to bound difference between true max-marginal and max-product max-marginal in terms of graph structure

Some interesting applied papers

Segmentation as maximum weight independent set. William Brendel, Sinisa Todorovic:
You want to segment your image into regions, but the initial candidate regions are overlapping fragments. Real segmentations don't have overlaps, turn this into MWIS problem to find good solution satisfying these mutual exclusivity constraints
Efficient Minimization of Decomposable Submodular Functions. Peter Stobbe, Andreas Krause:
Applies some recent techniques from non-smooth optimization to Lovasz extension of submodular function
Optimal Web-Scale Tiering as a Flow Problem. Gilbert Leung, Novi Quadrianto, Alexander Smola, Kostas Tsioutsiouliklis:
Linear integer programming to optimize cache arrangement for 84 million web-pages on Yahoo servers

At least four groups (Bach, Mosci, Jia, Micchelli) had papers on "structured sparsity". Traditional approach to sparse learning is to maximize the number of 0's in the parameter vector. The new idea is to incorporate knowledge on the pattern of 0 values.

Andy Mueller has some more highlights

Sunday, December 05, 2010

Mathematica blog

Most graphics and formulas that appear in this blog were created with help of Mathematica. I'll keep a separate blog with Mathematica tips.

Friday, December 03, 2010

Moving away from traditional peer-review

Common complaint about current publishing model is that sometimes good papers get rejected. A striking example is that David Lowe's SIFT algorithm was rejected multiple times from vision venues. The author then assumed that vision community is not interested, and applied for patent intended promote it just for industrial applications. As a result, what's arguably the most popular key-point detection algorithm is not free to use. This may also say something about limitation of our patent system.

You can see a few more examples in the inaugural issue of Mathematica Rejecta.

Yann LeCun proposes a solution to the problem.
His idea is like non-anonymous stack overflow, where you could search for papers that got several positive reviews, or ones that were positively reviewed by a specific reviewer.

I think the problem is not as serious now as it was before the Internet. Lobachevsky's work was rejected from peer-reviewed journals and might have faced obscurity if not for personal intervention of Gauss.

Today, if your result is obviously important, putting it up on your webpage may end up giving you as much publicity as a publication in a top journal, Ryan Williams latest CS-theoretical result is an example. This approach can also work to screen for errors, as we have seen in the case of Vinay Deolalikar P!=NP paper.

University of Minnesota officially looked into ways of assessing "new forms of scholarship in consideration of academic promotion. Andrew Gelman notices that his most influential papers were published in low ranking journals.

The bottom line is that world is shifting away from the traditional peer-reviewed publication model. Good results don't always need an official stamp of approval to be recognized, and perhaps in 30 years, mentioning the journals where you published at an academic interview will be as bad as coming to a software engineer interview with a heap of Certificates of Proficiency

This reminds me of a faculty candidate that came to interview to our Computer Science department once. He gave a good presentation but didn't get the job. I asked the person in charge of the hiring committee why, the answer was -- "he had too many publications, they couldn't all be good"