Friday, December 16, 2005

Trends in Machine Learning according to Google Scholar

In a previous post I brought up the 1983 Machine Learning workshop, which featured "33 papers" and was the follow-up to the 1980 Machine Learning workshop. By contrast, NIPS 2005 had 28 workshops, and it is just one of several annual international Machine Learning conferences. You can see how the field grew by looking at the distribution of publication dates for articles containing the phrase "machine learning" indexed by Google Scholar (normalized by total Scholar content for each year).
You can see there's a blip at 1983, when the workshop was held.
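
For anyone who wants to try the same normalization, here is a minimal sketch. It assumes you already have per-year hit counts in dictionaries; the numbers below are made up for illustration (they are not real Google Scholar counts), and `normalized_share` is a hypothetical helper, not the script used for these graphs.

```python
def normalized_share(term_hits, total_hits):
    """Divide per-year hits for a phrase by total hits for that year.

    Years with a missing or zero total are skipped to avoid division by zero.
    """
    return {
        year: term_hits[year] / total_hits[year]
        for year in sorted(term_hits)
        if total_hits.get(year, 0) > 0
    }


# Made-up counts, purely illustrative.
term_hits = {1982: 10, 1983: 40, 1984: 20}
total_hits = {1982: 1000, 1983: 2000, 1984: 2000}

shares = normalized_share(term_hits, total_hits)
for year, share in shares.items():
    print(year, f"{share:.3f}")
```

Dividing by the yearly total matters because the index itself grows over time, so raw hit counts would rise for almost any phrase.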

Yann LeCun quipped at the NIPS closing banquet that people who joined the field in the last 5 years have probably never heard the term "Neural Network". A similar search (normalized by results for "machine learning") reveals a recent downward trend.

You can see a major upward trend starting around 1985 (that's when Yann LeCun and several others independently rediscovered the backpropagation algorithm), peaking in 1992, and going downwards from then.

An even greater downward trend is seen when searching for "Expert System".

"Genetic algorithms" seem to have taken off in the 90's, and leveled off somewhat in recent years.

On the other hand, a search for "support vector machine" shows no sign of slowing down.

(1995 is when Vapnik and Cortes proposed the algorithm)

Also, "Naive Bayes" seems to be growing without bound.

If I were to trust this, I would say that Naive Bayes research is the hottest machine learning area right now.

"HMM"s seem to have been losing share since 1981

(or perhaps people are becoming less likely to write things like "hmm, this result was unexpected"?)

What was the catastrophic event of 1981 that forced such a rapid extinction of HMM's (or hmm's) in the scientific literature?

Finally, a worrying trend is seen in the search for "artificial stupidity" divided by the corresponding hits for "artificial intelligence". The 2000 through 2004 graph shows a definite upward direction.


Cosma said...

Do the results for HMMs include both "HMM" and "Hidden Markov Model"?

Yaroslav said...

No, just HMM. I didn't spend much time thinking about the queries, but here are the scripts I used to get the graphs (the two .py files)

Konstantin Tretjakov said...

You'll get slightly different results if you search for conditional frequencies, that is, for example, F("HMM", "machine learning")/F("machine learning").

In this case you'll get an increasing trend for HMM-s for example.
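
To illustrate the conditional-frequency idea, here is a rough sketch with made-up counts (not real Scholar numbers). `conditional_frequency`, `joint_hits`, and `base_hits` are hypothetical names for illustration; this is not the cleaned-up script mentioned below.

```python
def conditional_frequency(joint_hits, base_hits):
    """Per-year ratio F(term, context) / F(context).

    joint_hits: hits for documents containing both phrases, keyed by year.
    base_hits:  hits for documents containing the context phrase alone.
    Years with a missing or zero base count are skipped.
    """
    return {
        year: joint_hits[year] / base_hits[year]
        for year in sorted(joint_hits)
        if base_hits.get(year, 0) > 0
    }


# Made-up counts: F("HMM", "machine learning") and F("machine learning").
joint_hits = {2002: 30, 2003: 45, 2004: 66}
base_hits = {2002: 1000, 2003: 1200, 2004: 1500}

cond = conditional_frequency(joint_hits, base_hits)
for year, value in cond.items():
    print(year, f"{value:.4f}")
```

Restricting both counts to documents that mention "machine learning" also filters out unrelated uses of a short string like "HMM", which is one reason the trend can differ from the unconditional one.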

A slightly cleaned-up version of your script is available here:

SoloGen said...

Very interesting diagrams! (: I enjoyed them.

Yaroslav said...

Thanks! As you can see from Konstantin's post, you probably shouldn't read too much into them though. Except for the very first graph. It's clear machine learning is growing exponentially. But if machine learning algorithms can ever design machine learning algorithms, then the rate of growth will also be growing exponentially. Therefore the graph will hit a singularity.

Anonymous said...

I don't understand how Google allowed you to do this. According to their terms of use, they don't allow any automated queries, which I find very disappointing.

I have a "friend" who wrote a script to download every post from the Python newsgroup and after about 10 pages, it blocks you with a captcha.

Dave said...

Very entertaining. Especially the last graph.

pierre dangauthier said...

ohoho, maybe I should stop my research about genetic algorithms for expert systems based on HMMs ;-).... Nice job!