- Jeff Bilmes' Graphical Models course , gives detailed introduction on Lauritzen's results, proof of Hammersley-Clifford results (look in "old scribes" section for notes)
- Kevin Murphy's Graphical Models course , good description of I-maps
- Sam Roweis' Graphical Models course , good introduction on exponential families
- Stephen Wainwright's Graphical Models class, exponential families, variational derivation of inference
- Michael Jordan's Graphical Models course, his book seems to be the most popular for this type of class
- Lauritzen's Graphical Models and Inference class , also his Statistical Inference class , useful facts on exponential families, information on log-linear models, decomposability
- Donna Precup's class , some stuff on structure learning
Friday, March 31, 2006
Graphical Models class notes
I needed to refresh some Graphical Models basics, and here are lecture notes I found useful
Wednesday, March 22, 2006
Challenge: Online Learning for Cell Phone Messaging
If you've used cell phones to send text messages you probably know about their auto-complete feature. For those that haven't -- each digit corresponds to 3 or 4 letters, you enter the digits consistent with your word, and it tries to guess which word you meant. For instance you enter "43556", and it will automatically guess "hello". But if you enter "785" to mean "SVM", it'll probably guess "run", and you'll have to back-up and re-enter the word. The challenge is for the phone to guess the right word.
Since new abbreviations spring up all the time, there's no dictionary that can cover all the words used in text-messages, so an ideal cell-phone will have an algorithm that can adapt to the user. The interesting question is how well can you do?
What makes this domain different from other sources of text is that it's a conversation, consisting of a series of short posts, containing colloquial grammar, bad spelling and abbreviations. I ran some simple algorithms on a similar dataset - 1.5M words of posts from a single person from an online chatroom.
The simplest algorithm is to return the most recent word encountered, consistent with given numbering. You can also return the most frequent word, or have a compound learner that returns one or the other. Cost of 1 is incurred every time an incorrect word is guessed. Here's what the curves look like
The "best rate" curve represents the cost incurred if we only make errors on words that haven't been seen before. After 1.5M words, this "algorithm" makes mistakes about 20% of the time relative to the simple learners, which means there's a lot of room for improvement. And since this "algorithm" only sees 2-3 candidates for each word entered, there's no need to worry about efficient inference.
So here's a challenge -- can you approach the "best rate" curve any closer without using any other sources of text? (like dictionaries) If so, I'd like to hear from you! Here's the simplest script to generate an evaluation like above.
(BTW, I have several GB's of chatlogs, and I'll make more data available if there's demand)
Since new abbreviations spring up all the time, there's no dictionary that can cover all the words used in text-messages, so an ideal cell-phone will have an algorithm that can adapt to the user. The interesting question is how well can you do?
What makes this domain different from other sources of text is that it's a conversation, consisting of a series of short posts, containing colloquial grammar, bad spelling and abbreviations. I ran some simple algorithms on a similar dataset - 1.5M words of posts from a single person from an online chatroom.
The simplest algorithm is to return the most recent word encountered, consistent with given numbering. You can also return the most frequent word, or have a compound learner that returns one or the other. Cost of 1 is incurred every time an incorrect word is guessed. Here's what the curves look like
The "best rate" curve represents the cost incurred if we only make errors on words that haven't been seen before. After 1.5M words, this "algorithm" makes mistakes about 20% of the time relative to the simple learners, which means there's a lot of room for improvement. And since this "algorithm" only sees 2-3 candidates for each word entered, there's no need to worry about efficient inference.
So here's a challenge -- can you approach the "best rate" curve any closer without using any other sources of text? (like dictionaries) If so, I'd like to hear from you! Here's the simplest script to generate an evaluation like above.
(BTW, I have several GB's of chatlogs, and I'll make more data available if there's demand)
Friday, March 03, 2006
Machine Learning videos
Learning from presentation or slides can be a much easier than from papers. I was reminded of that when I ran across Martin Wainwright's excellent class on Graphical Models. It has lecture notes *and* videos online.
Another large repository of machine learning related videos is the repository organized by the Pascal Network. You may run into a couple of bugs, but Sebastjan Mislej (sebastjan dot mislej dot ijs dot si) is very quick at fixing them when you email him.
Does anyone else have a favourite set of Machine Learning slides/notes/videos online?
Another large repository of machine learning related videos is the repository organized by the Pascal Network. You may run into a couple of bugs, but Sebastjan Mislej (sebastjan dot mislej dot ijs dot si) is very quick at fixing them when you email him.
Does anyone else have a favourite set of Machine Learning slides/notes/videos online?
Subscribe to:
Posts (Atom)