Tuesday, May 12, 2015

ICLR 2015

Some ICLR posters that caught my eye:

Very simple to implement idea that gives impressive results. They force two groups of units to be uncorrelated by penalizing their cross-covariance. When the first group is also forced to model classes, the second group automatically models the "style". The problem of separating out "style" has been studied for a while -- see Tenenbaum's "Separating style and content with bilinear models" and reverse citations -- but this implementation is the simplest I've seen and could be implemented as a simple modification of batch SGD.
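A minimal sketch of what that modification might look like (my own reconstruction, not the authors' code): center each group of activations over the batch and penalize the squared entries of their cross-covariance matrix.

```python
import numpy as np

def xcov_penalty(A, B):
    """Cross-covariance penalty between two groups of unit activations.

    A: (batch, d1) activations of the "class" group
    B: (batch, d2) activations of the "style" group
    Returns 0.5 * sum of squared entries of their cross-covariance.
    """
    n = A.shape[0]
    A = A - A.mean(axis=0)          # center each unit over the batch
    B = B - B.mean(axis=0)
    C = A.T @ B / n                 # (d1, d2) cross-covariance matrix
    return 0.5 * np.sum(C ** 2)

# The penalty vanishes when the two groups are uncorrelated, so adding it
# to the usual classification loss pushes the non-class information
# ("style") into the second group.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 8))
B = rng.normal(size=(256, 4))       # independent of A -> penalty near zero
print(xcov_penalty(A, B))
```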


Similar to Yann LeCun's "Siamese Network" but potentially makes the learning problem easier by looking at two pairs of examples at each step -- two examples of the same class plus two examples of different classes. The loss models these pair distances -- the first distance must be smaller.
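A sketch of that loss as I understood it from the poster (the margin formulation and names are my assumptions): a hinge on the gap between the same-class distance and the different-class distance.

```python
import numpy as np

def pair_rank_loss(xa, xb, xc, xd, margin=1.0):
    """Hinge loss over two pair distances.

    (xa, xb): embeddings of two examples of the SAME class.
    (xc, xd): embeddings of two examples of DIFFERENT classes.
    Zero once the same-class distance is smaller than the
    different-class distance by at least `margin`.
    """
    d_same = np.sum((xa - xb) ** 2)
    d_diff = np.sum((xc - xd) ** 2)
    return max(0.0, margin + d_same - d_diff)

# Same-class pair close together, different-class pair far apart:
print(pair_rank_loss(np.zeros(3), np.zeros(3) + 0.1,
                     np.zeros(3), np.ones(3) * 2))   # -> 0.0
```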


Apparently Adobe has collected photographs of text manually tagged with font identity, which the authors promise to release at http://www.atlaswang.com/deepfont.html . Biased here because I've looked at fonts before (e.g., see the notMNIST dataset).


Similar to James Martens' "Optimizing Neural Networks with Kronecker-factored Approximate Curvature". Both papers make use of the fact that if you want to whiten the gradient in some layer, it's equivalent (but cheaper) to whiten the activations and backpropagated values instead. Martens inverts the activation covariance matrix directly; Povey has an iterative scheme to approximate it with something easy to invert.
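A small numerical illustration of why that equivalence holds (my own demo, from neither paper): for a layer the per-example weight gradient is the outer product $a\,\delta^\top$ of activations and backpropagated values, and when $a$ and $\delta$ are independent and zero-mean, the covariance of the flattened gradient is exactly the Kronecker product of the two small covariances -- so whitening each factor whitens the whole gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n, da, dd = 200_000, 3, 2

# Correlated activations a and backprop values delta, independent of each other.
Ta = rng.normal(size=(da, da))        # mixing matrices give non-trivial covariances
Td = rng.normal(size=(dd, dd))
a = rng.normal(size=(n, da)) @ Ta.T   # Cov(a) = Ta @ Ta.T
d = rng.normal(size=(n, dd)) @ Td.T   # Cov(delta) = Td @ Td.T

# Per-example weight gradient, flattened: vec(a_i delta_i^T)
G = np.einsum('ni,nj->nij', a, d).reshape(n, da * dd)

kron = np.kron(Ta @ Ta.T, Td @ Td.T)  # kron(Cov(a), Cov(delta))
emp = G.T @ G / n                     # empirical covariance of the gradient

print(np.max(np.abs(emp - kron)))     # small: the two agree up to sampling noise
```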


With async SGD, workers end up having stale values that deviate from what the rest of the workers see. Here they derive some guidance on how far each worker should be allowed to deviate.


They get better results by doing SGD where examples are sampled with a non-uniform strategy, such as sampling in proportion to gradient magnitude. This is empirical confirmation of results from last year's Stochastic Gradient Methods workshop, where people showed that optimal SGD sampling strategies are related to "take high gradient magnitude steps first".
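A sketch of the sampling step (my own illustration; importance weights are the standard trick to keep the update unbiased, not necessarily what this paper does):

```python
import numpy as np

def importance_sample(grad_norms, k, rng):
    """Pick k example indices with probability proportional to gradient
    magnitude, plus importance weights that keep the SGD update unbiased."""
    p = grad_norms / grad_norms.sum()
    idx = rng.choice(len(p), size=k, p=p)
    weights = 1.0 / (len(p) * p[idx])   # reweight so E[update] is unchanged
    return idx, weights

rng = np.random.default_rng(0)
norms = np.array([0.1, 0.1, 0.1, 10.0])  # one example dominates the gradient
idx, w = importance_sample(norms, 5, rng)
print(idx)  # mostly index 3, the high-gradient-magnitude example
```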

These are essentially separable ConvNets: instead of learning regular filters and then trying to approximate them with a low-rank expansion, they learn rank-1 filters right away. Such a decrease in parameters while keeping the same performance and architecture is dramatic and surprising.
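A toy illustration of why rank-1 filters are cheap (my own demo, not the authors' code): a rank-1 2D filter is the outer product of two 1D filters, so applying it is just a row pass followed by a column pass.

```python
import numpy as np

def conv2d_valid(img, k):
    """Direct 'valid' 2D cross-correlation (reference implementation)."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
v = rng.normal(size=3)            # vertical 1D filter
h = rng.normal(size=5)            # horizontal 1D filter
k = np.outer(v, h)                # rank-1 2D filter: 3*5=15 -> 3+5=8 parameters

# Apply the rank-1 filter as two cheap 1D passes: rows first, then columns.
rows = np.array([np.correlate(row, h, mode='valid') for row in img])
sep = np.array([np.correlate(col, v, mode='valid') for col in rows.T]).T

print(np.allclose(sep, conv2d_valid(img, k)))   # -> True
```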


The most intriguing paper of the conference, IMHO. It's an alternative to backprop for training networks. The basic idea is this -- your neural network is a function

$y = f(x)$
In an ideal world, you could learn your labels by inverting $f$; then your estimate is

$\hat{x} = f^{-1}(y)$
That's too hard, so you approximate $f$ with a quadratic function $g$ with unit Hessian, invert that instead, and let

$\hat{x} = g^{-1}(y)$
Since the original approximation was bad, the new estimate will be off, so you repeat the step above. This gives you standard gradient descent.
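To see the connection concretely (my own toy illustration): minimizing the unit-Hessian quadratic model $g(x') = f(x) + \nabla f(x)^\top (x' - x) + \tfrac12\|x' - x\|^2$ around the current point gives $x' = x - \nabla f(x)$, i.e. one gradient step, and iterating that recovers gradient descent.

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x by repeatedly minimizing the local
# unit-Hessian quadratic model; each such "inversion" is one gradient step.
A = np.array([[1.5, 0.2], [0.2, 0.8]])   # eigenvalues in (0, 2), so step 1 converges
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
for _ in range(200):
    x = x - grad(x)       # minimizer of the unit-Hessian model at x

print(x, np.linalg.solve(A, b))   # both are the true minimizer A^{-1} b
```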

An alternative approach is to try to learn $g^{-1}$ from data directly. In particular, notice that in a standard autoencoder trained to do $f(g(x))=x$, $g$ can be viewed as an inverse of $f$. So in this paper, they treat the neural network as a composition of functions corresponding to layers and try to invert each one using the approach above.


Biased because I've played with Theano, but apparently someone got Theano running well enough to train ImageNet at a speed similar to Caffe's (about 30% slower on 1 GPU, from some beer conversations).

If you have 10k classes, you'll need to take the dot product of your last-layer activations with each of the 10k rows of the last parameter matrix. Instead, you could use WTA hashing to find a few rows likely to produce a large dot product with your activation vector, and only look at those.
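A rough sketch of the shortlisting idea (my own reconstruction; the code-agreement scoring and all parameters here are assumptions): each WTA hash is the argmax position among the first few entries of a random permutation of the vector, and rows whose codes agree most with the query's are scored exactly.

```python
import numpy as np

def wta_codes(X, perms, k=4):
    """WTA hash: for each permutation, the argmax position among the
    first k permuted entries. X: (n, d) -> codes: (n, len(perms))."""
    return np.stack([np.argmax(X[:, p[:k]], axis=1) for p in perms], axis=1)

rng = np.random.default_rng(0)
d, n_classes, n_hashes = 64, 10_000, 32
W = rng.normal(size=(n_classes, d))            # last-layer weight matrix
perms = [rng.permutation(d) for _ in range(n_hashes)]
W_codes = wta_codes(W, perms)                  # precomputed once

def shortlist_scores(a, top=100):
    """Take dot products only with rows whose WTA codes agree most
    with the query's, instead of with all 10k rows."""
    a_code = wta_codes(a[None, :], perms)[0]
    matches = (W_codes == a_code).sum(axis=1)  # code agreement per row
    cand = np.argsort(-matches)[:top]          # shortlist of candidate rows
    return cand, W[cand] @ a

a = W[123] + 0.1 * rng.normal(size=d)   # activation closely aligned with row 123
cand, scores = shortlist_scores(a)
print(123 in cand)                      # the true best row survives the shortlist
```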

Nvidia demoing their 4-GPU rig.


They use some off-the-shelf software to decompose the 4D convolutional tensor (width × height × input depth × output depth) into a faster-to-compute tensor product. However, if flattened convolutional networks work for other domains, this seems like overkill.


They took a network trained on ImageNet and sent regions of the image that were activated by particular filters to labelers, to try to figure out what various neurons respond to. A surprisingly high number of neurons learned semantically meaningful concepts.

This was further reinforced by a demo by Jason Yosinski showing visualizations of various filters in an ImageNet network. You could see the emergence of "face" and "wrinkle" detectors, even though no such class is in ImageNet.

