Monday, August 13, 2007

Analysis vs information theory

Here's a problem that can be solved using either analysis or information theory. Which approach do you think is easier?

Suppose X_1, X_2, ... is an infinite sequence of IID random variables. Let E be a shift-invariant event, i.e., if you take the set of all sequences of X_i's that make up the event and shift each sequence, you get the same set back. For instance, "the X_i's form a periodic sequence" is a shift-invariant event. Show that P(E) is either 0 or 1.
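
If you want a hint for the analysis route, here's the standard approximation argument sketched loosely (my own write-up, so not necessarily the slickest way to do it):

```latex
% Sketch of the analysis route (standard approximation argument).
% T is the shift operator: (T\omega)_i = \omega_{i+1}.
\begin{align*}
&\text{Fix } \varepsilon > 0.\ \text{Since } E \in \sigma(X_1, X_2, \ldots),
 \text{ choose } n \text{ and } A \in \sigma(X_1, \ldots, X_n)
 \text{ with } P(E \,\triangle\, A) < \varepsilon.\\
&\text{Shift invariance gives } E = T^{-n}E \in \sigma(X_{n+1}, X_{n+2}, \ldots),
 \text{ so } E \text{ and } A \text{ are independent.}\\
&\text{Then } P(E) \approx P(E \cap A) = P(E)\,P(A) \approx P(E)^2,
 \text{ with each } \approx \text{ off by at most } \varepsilon.\\
&\text{Letting } \varepsilon \to 0 \text{ gives } P(E) = P(E)^2,
 \text{ so } P(E) \in \{0, 1\}.
\end{align*}
```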

Tuesday, August 07, 2007

Proof techniques

Many people have seen a copy of this tongue-in-cheek email on proof techniques that's been circulating since the '80s. It's funny because it's true, and here are a couple of examples from the machine learning literature:

  • Proof by reference to inaccessible literature: The author cites a simple corollary of a theorem to be found in a privately circulated memoir of the Slovenian Philological Society, 1883.

    Consider all the papers citing Vapnik's Theory of Pattern Recognition, published by Nauka in 1974. That book is in Russian and is quite hard to find; for instance, it's not indexed in Summit, which covers US West Coast university libraries like Berkeley and the University of Washington.

  • Proof by forward reference: Reference is usually to a forthcoming paper of the author, which is often not as forthcoming as at first.

    I came across a couple of these, which I won't mention because it would be a bit embarrassing to the authors.

  • Proof by cumbersome notation: Best done with access to at least four alphabets and special symbols.

    Of course this is more a matter of habit, but I found this paper hard to read because of the notation. It uses bold, italic, calligraphic, and Fraktur alphabets. And if four alphabets aren't enough, consult this page to see nine different alphabets you can use (a quick LaTeX illustration follows this list).
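
For what it's worth, here's what that kind of alphabet mixing looks like in LaTeX. This is a minimal made-up example, not a snippet from the paper above:

```latex
\documentclass{article}
\usepackage{amssymb}  % loads amsfonts, which provides \mathfrak
\begin{document}
% The "same" symbol in four different alphabets:
$\mathbf{A}$ (bold), $\mathit{A}$ (italic),
$\mathcal{A}$ (calligraphic), $\mathfrak{A}$ (Fraktur)
\end{document}
```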

Wednesday, August 01, 2007

L1 regularization papers this year

Looking at the papers from this summer's machine learning conferences (AAAI, UAI, IJCAI, ICML, COLT), it seems like there have been a lot of papers on L1 regularization this year. There are at least three papers on L1 regularization for structure learning by Koller, Wainwright, and Murphy, and several papers on minimizing L1-regularized log-likelihood by Keerthi, Boyd, and Galen Andrew. A couple of groups are working on "Bayesian logistic regression" that on closer look turns out to be L1-regularized logistic regression (surely Thomas Minka wouldn't approve of such usage of the term). This year's COLT has one as well.
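
To make the objective concrete: these papers minimize the negative log-likelihood plus an L1 penalty on the weights. Below is a toy proximal-gradient (soft-thresholding) sketch of L1-regularized logistic regression; the function names, data, step size, and penalty are made up for illustration, and the actual papers use much more serious optimizers (interior-point, quasi-Newton, and so on).

```python
# Toy sketch of L1-regularized logistic regression:
#   minimize  (1/n) * sum_i log(1 + exp(-y_i * x_i.w))  +  lam * ||w||_1
# solved with plain proximal gradient descent (ISTA). Everything here
# (names, data, lam, step size) is illustrative, not from any cited paper.
import numpy as np

def grad_nll(w, X, y):
    """Gradient of the average logistic loss, labels y in {-1, +1}."""
    margins = y * (X @ w)
    probs = 1.0 / (1.0 + np.exp(margins))   # sigma(-margin) per example
    return -(X.T @ (y * probs)) / len(y)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logreg(X, y, lam=0.05, step=0.5, iters=2000):
    """Proximal gradient descent on avg_logistic_loss(w) + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = soft_threshold(w - step * grad_nll(w, X, y), step * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    w_true = np.zeros(10)
    w_true[:3] = [2.0, -1.5, 1.0]                       # sparse ground truth
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
    w_hat = l1_logreg(X, y)
    print("nonzero coefficients:", np.flatnonzero(np.abs(w_hat) > 1e-8))
```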
(AAAI, UAI, IJCAI, ICML,COLT) it seems like there have been a lot of papers on L1 regularization this year. There are at least 3 papers on L1 regularization for structure learning by Koller, Wainwright, Murphy, several papers on minimizing l1 regularized log likelihood by Keerthi, Boyd, Gallen Andrew. A couple of groups are working on "Bayesian Logistic Regression" turns out to be "l1 regularized logistic regression" on closer look (surely Thomas Minka wouldn't approve of such term usage). This year's COLT has one.