Saturday, April 30, 2011

Neural Networks making a come-back?

Five years ago I ran some queries on Google Scholar to see the trend in the number of papers that mention a particular phrase. The number of hits for each year was divided by the number of hits for "machine learning". Back then it looked like NNs started gaining popularity with the invention of back-propagation in the 1980s, peaked in 1993, and went downhill from there.
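
For concreteness, the normalization amounts to something like the small sketch below. The counts are made-up placeholders, not actual Scholar numbers; the point is just the per-year ratio.

    # Divide yearly hit counts for a phrase by the yearly hits for "machine learning".
    # All numbers here are illustrative placeholders.
    nn_hits = {1990: 120, 1995: 340, 2000: 280}    # hits for the phrase of interest
    ml_hits = {1990: 400, 1995: 900, 2000: 1100}   # hits for "machine learning"

    relative = {year: nn_hits[year] / ml_hits[year] for year in nn_hits}
    for year, ratio in sorted(relative.items()):
        print(year, round(ratio, 2))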



Since then, there have been several major NN developments, involving deep learning and probabilistically founded versions, so I decided to update the trend. I couldn't find a copy of the Scholar scraper script anymore; luckily, Konstantin Tretjakov has maintained a working version and reran the query for me.



It looks like the downward trend in the 2000s was misleading because not all papers from that period have made it into the index yet, and the actual recent trend is exponential growth!

One example of this "third wave" of Neural Network research is unsupervised feature learning. Here's what you get if you train a sparse auto-encoder on some natural scene images:



What you get is pretty much a set of Gabor filters, but the cool thing is that you get them from your neural network rather than from an image processing expert.
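
For illustration, here is a minimal sketch of a tied-weight sparse auto-encoder in NumPy. The data, hyperparameters, and plain gradient-descent loop below are placeholder assumptions, not the setup behind the figure; random noise stands in for the input, and you would need to feed it whitened natural-image patches to actually see Gabor-like filters emerge.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 1000, 64, 25            # patches, 8x8 pixels flattened, hidden units
    lam, lr = 0.1, 0.5                # sparsity weight and learning rate
    X = rng.standard_normal((n, d))   # placeholder data; use natural-image patches instead

    W = 0.1 * rng.standard_normal((k, d))   # tied encoder/decoder weights
    b1, b2 = np.zeros(k), np.zeros(d)       # encoder and decoder biases

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    for step in range(200):
        H = sigmoid(X @ W.T + b1)            # hidden activations, (n, k)
        Xhat = H @ W + b2                    # reconstructions,    (n, d)
        # squared reconstruction error plus an L1 penalty on the activations
        loss = 0.5 * np.sum((Xhat - X) ** 2) / n + lam * np.sum(np.abs(H)) / n

        # backprop through the tied weights: decoder and encoder contributions
        dXhat = (Xhat - X) / n
        dH = dXhat @ W.T + lam * np.sign(H) / n
        dA = dH * H * (1.0 - H)              # derivative of the sigmoid
        gW = H.T @ dXhat + dA.T @ X
        W -= lr * gW
        b1 -= lr * dA.sum(0)
        b2 -= lr * dXhat.sum(0)

    # each row of W is a learned filter; reshape it to 8x8 to look at it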

Friday, April 29, 2011

Another ML blog

I just noticed that Justin Domke has a blog --

He's one of the strongest researchers in the field of graphical models. I first came across his dissertation when looking for a way to improve loopy Belief Propagation based training. His thesis gives one such idea -- instead of maximizing the fit of an intractable model and using BP as an intermediate step, maximize the fit of the BP marginals directly. This makes sense, since the approximate (BP-based) marginals are what you ultimately use.

If you run BP for k steps, then the likelihood of the BP-approximated model is tractable to optimize -- calculating its gradient is very similar to running k steps of loopy BP. I derived the formulas for the gradient of this "BP-approximated likelihood" in equations 17-21 of this note. It looks a bit complicated in my notation, but the derivation is basically an application of the chain rule to the BP marginal, which is a composition of k steps of the BP update formula.
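
To make the idea concrete, here is a toy sketch, not the derivation in the note: unroll k steps of loopy BP on a small binary MRF and treat the resulting marginals as a function of the parameters, so the fit of those marginals can be differentiated. The 3-cycle graph, the Ising-style parameterization, the observed labels, and the use of numerical differentiation (in place of the analytic gradient from equations 17-21) are all illustrative assumptions.

    import numpy as np

    edges = [(0, 1), (1, 2), (0, 2)]    # a 3-cycle over binary variables
    K = 5                               # number of BP steps to unroll

    def bp_marginals(theta):
        """Approximate marginals after K parallel message updates.
        theta[:3] are unary fields, theta[3:] are edge couplings."""
        unary = np.exp(np.outer(theta[:3], [0.0, 1.0]))                   # psi_i(x_i)
        pair = {e: np.exp(theta[3 + n] * np.eye(2)) for n, e in enumerate(edges)}
        msgs = {(i, j): np.ones(2) for (a, b) in edges for (i, j) in ((a, b), (b, a))}
        for _ in range(K):
            new = {}
            for (i, j) in msgs:
                psi = pair[(i, j)] if (i, j) in pair else pair[(j, i)].T  # indexed [x_i, x_j]
                incoming = [msgs[(k, r)] for (k, r) in msgs if r == i and k != j]
                prod_in = np.prod(incoming, axis=0) if incoming else np.ones(2)
                m = (unary[i] * prod_in) @ psi                            # sum over x_i
                new[(i, j)] = m / m.sum()
            msgs = new
        beliefs = []
        for i in range(3):
            b = unary[i] * np.prod([msgs[(k, r)] for (k, r) in msgs if r == i], axis=0)
            beliefs.append(b / b.sum())
        return np.array(beliefs)                                          # shape (3, 2)

    def neg_log_fit(theta, labels=(1, 0, 1)):
        # negative log of the BP marginals at some hypothetical observed labels
        b = bp_marginals(theta)
        return -sum(np.log(b[i, y]) for i, y in enumerate(labels))

    theta = np.array([0.2, -0.1, 0.3, 0.5, -0.4, 0.1])   # arbitrary illustrative parameters
    eps = 1e-5
    grad = np.array([(neg_log_fit(theta + eps * e) - neg_log_fit(theta - eps * e)) / (2 * eps)
                     for e in np.eye(6)])
    print(grad)   # numerical stand-in for the gradient of the fit of the BP marginals

Since the marginals are just a composition of K message-update steps, the same gradient can be obtained analytically by the chain rule, which is what the note derives.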