Optimizing deeper networks with KFAC in PyTorch (2017-10-22)

Medium post.

(I'm getting too much comment spam on Blogger, so I'll probably use Medium or something else from now on, and just link here.)

Queues in TensorFlow (2016-05-18)

I gave an introduction to queues at the TensorFlow meetup in SF yesterday.
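TensorFlow's input queues implement the classic producer/consumer pattern. As a framework-free illustration (my sketch, not TensorFlow code), here is the same idea with Python's stdlib queue module:

```python
import queue
import threading

# A bounded FIFO queue, analogous to a TF FIFOQueue with capacity=4.
q = queue.Queue(maxsize=4)

def producer():
    # Enqueue work items; put() blocks when the queue is full,
    # which is the same backpressure TF input queues provide.
    for i in range(10):
        q.put(i)
    q.put(None)  # sentinel: no more data

results = []

def consumer():
    # Dequeue until the sentinel arrives; get() blocks when empty.
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # squares of 0..9, in order
```

The bounded capacity is the point: the producer can run ahead of the consumer only by a fixed amount, which decouples data loading from computation without unbounded memory growth.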
Here are the slides and the notebook: https://github.com/yaroslavvb/stuff/tree/master/queues_talk

ICLR 2015 (2015-05-12)

Some ICLR posters that caught my eye:
A very simple-to-implement idea that gives impressive results: they force two groups of units to be uncorrelated by penalizing their cross-covariance. When the first group is also forced to model the classes, the second group automatically models the "style". The problem of separating out "style" has been studied for a while; see Tenenbaum's …

Stochastic Gradient Methods 2014 (2014-03-05)

Last week I attended the Stochastic Gradient Methods workshop held at UCLA's IPAM. Surprisingly, there's still quite a bit of activity and unsolved questions around what is, essentially, minimizing a quadratic function.
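As a concrete example of a stochastic method for this kind of problem (my sketch, not code from the workshop), the randomized Kaczmarz iteration for a consistent system Ax = b projects the current iterate onto one randomly chosen row constraint at a time:

```python
import numpy as np

# Randomized Kaczmarz for a consistent linear system Ax = b,
# with rows sampled proportionally to ||a_i||^2.
rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true

probs = np.sum(A**2, axis=1)
probs /= probs.sum()

x = np.zeros(d)
for _ in range(5000):
    i = rng.choice(n, p=probs)
    a = A[i]
    # Project the iterate onto the hyperplane {x : a.x = b_i}.
    x += (b[i] - a @ x) / (a @ a) * a

print(np.linalg.norm(x - x_true))  # converges to ~0
```

Each update touches a single row of A, which is what makes the connection to stochastic gradient methods natural: it is a stochastic step on the least-squares objective.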
In 2009 Strohmer and Vershynin rediscovered an algorithm for solving linear systems of equations that has been in use since the 1970s, the Kaczmarz method, and showed that this algorithm is a form of stochastic gradient descent.

Deep Learning Internship at Google, Summer 2014 (2013-12-06)

We have a couple of internship openings for someone to train deep neural nets to extract interesting things from StreetView imagery. The ideal person would come and push the envelope of what's possible with a large amount of training data (billions of labeled image examples for some tasks) and a large amount of computational power (essentially unlimited when you parallelize).
If you are …

Summer Intern opening (2012-10-30)

We are looking for a summer intern to apply deep learning techniques to the problem of reading text in the wild. More details here.

The Average Font (2012-05-01)

I came across a post where the author created a font by averaging together all the fonts on his machine. I thought it would be cool to do the same for all fonts on the internet -- here's the average of about 375k distinct fonts.
It's interesting that the shapes come through clearly even though fonts on the web are quite noisy; here's a random sample of the glyphs that make up the "A" above.
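The averaging step itself is just a pixel-wise mean over aligned glyph bitmaps. A minimal numpy sketch, with synthetic arrays standing in for rasterized glyphs (the shapes and counts here are made up for illustration):

```python
import numpy as np

# Synthetic stand-ins for rasterized glyphs: a stack of 64x64
# grayscale bitmaps (real ones would come from rendering fonts,
# aligned to a common bounding box).
rng = np.random.default_rng(42)
glyphs = rng.random((375, 64, 64))  # intensities in [0, 1]

# The "average font" glyph is the pixel-wise mean of the stack.
average_glyph = glyphs.mean(axis=0)

print(average_glyph.shape)
```

With enough samples the noise averages out, which is why recognizable letter shapes emerge even from very noisy font collections.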
Interesting papers coming up at NIPS'11 (2011-11-21)

A number of accepted papers have had their camera-ready versions posted already. Here are the ones I found interesting; I'll give a further update on these after the conference.
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, P. Krähenbühl, V. Koltun
Fast and Accurate k-means For Large Datasets, M. Shindler, A. Wong, A. Meyerson
Hashing Algorithms for …

Shapecatcher (2011-11-13)
Here's a cool tool I stumbled across reading John Cook's blog -- Shape Catcher looks up the Unicode value of a hand-drawn character.
Apparently it uses Shape Context features.
This motivated me to put together another dataset. Unlike notMNIST, this one focuses on the tail end of Unicode: 370k bitmaps representing 29k Unicode values, grouped by Unicode value.

Unicode 370k

Google1000 dataset (2011-11-09)
This is a dataset of scans of 1000 public domain books that was released to the public at ICDAR 2007.
At the time there was no public serving infrastructure, so few people actually got the 120GB dataset.
It has since been hosted on Google Cloud Storage and made available for public download:
http://commondatastorage.googleapis.com/books/icdar2007/README.txt
http://…

b-matching as improvement of kNN (2011-11-06)

Below is an illustration of b-matching from the (Huang, Jebara AISTATS 2007) paper. You start with a weighted graph, and the goal is to connect each v to k u's so as to minimize total edge cost. If the v's represent labeled datapoints, the u's unlabeled ones, and the weights correspond to distances, this works as a robust version of a kNN classifier (k=2 in the picture), because it prevents any datapoint from exerting too much influence.

Google Internship in Vision/ML (2011-10-25)

My group has intern openings for winter and summer. Winter may be too late (but if you really want winter, ping me and I'll find out about feasibility). We use OCR for Google Books, frames from YouTube videos, spam images, unreadable PDFs encountered by the crawler, images from Google's StreetView cameras, Android, and a few other areas. Recognizing individual character candidates is a key step in OCR …

Don't test for exact equality of floating point numbers (2011-09-24)
A discussion came up on Guido van Rossum's Google Plus post. It comes down to the fact that 2.1 is not exactly representable as a floating point number. Internally it's 2.0999999999999996, and this causes unexpected behavior.
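A minimal demonstration in Python (my example, not taken from the referenced thread):

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the
# "obvious" equality fails: 0.1 + 0.2 is 0.30000000000000004.
print(0.1 + 0.2 == 0.3)   # False

# Compare with a tolerance instead of testing exact equality.
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
```

The fix is always the same: decide how much error your computation can accumulate and compare within that tolerance, rather than with `==`.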
These kinds of issues come up often. The confusion is caused by treating floating point numbers as exact numbers and expecting calculations with them to produce exact results.

notMNIST dataset (2011-09-08)

I've taken some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with the letters A-J taken from different fonts.
Here are some examples of letter "A"
Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case: logistic regression on top of a stacked auto-encoder with fine-tuning gets …

Making self-contained Unix programs with CDE (2011-08-16)
In the old days you could statically link your program and run it on another Unix station without worrying about dependencies. Unfortunately static linking no longer works, so you need to make sure that your target platform has the right libraries.
For instance, in order to get Matlab-compiled code running on a server, you have to copy over libraries and set environment variables as specified …

Google+ ML people (2011-07-13)

Google+ seems to have a fair number of machine learning people; I was able to track down 50 people I've met at conferences by starting from Andrew McCallum's circles. If you add me on Google Circles, I'll assume you came from this blog and add you to my "Machine Learning" circle.

Embracing non-determinism (2011-06-25)

Computers are supposed to be deterministic, and this is often the case for single-processor machines. However, as you scale up, guaranteeing determinism becomes increasingly expensive. Even on single-processor machines you face non-determinism on a semi-regular basis. Here are some examples: bugs plus poor OS memory control that allow programs to read uninitialized memory. A recent example for me was …

Machine Learning opportunities at Google (2011-06-22)

Google is hiring, and there are lots of opportunities to do machine-learning-related work here.
Kevin Murphy is applying Bayesian methods to video recommendation, Andrew Ng is working on a neural network that can run on millions of cores, and that's just the tip of the iceberg that I've discovered working here for the last 3 months. There is machine learning work in both "researcher" and "engineer" …

Neural Networks making a come-back? (2011-04-30)

Five years ago I ran some queries on Google Scholar to see the trend in the number of papers that mention a particular phrase. The number of hits for each year was divided by the number of hits for "machine learning". Back then it looked like NNs started gaining popularity with the invention of back-propagation in the 1980s, peaked in 1993, and went downhill from there. Since then, there have been several …

Another ML blog (2011-04-29)

I just noticed that Justin Domke has a blog. He's one of the strongest researchers in the field of graphical models; I first came across his dissertation when looking for a way to improve loopy belief propagation based training. His thesis gives one such idea: instead of maximizing the fit of an intractable model and using BP as an intermediate step, maximize the fit of the BP marginals directly.
Going to Google (2011-03-13)

I've accepted an offer from Google and will be joining their Tesseract team next week. I first got interested in OCR when I faced a project at my previous job involving OCR of outdoor scenes, and found it to be a very complex task, yet highly rewarding, because it's easy to make incremental progress and see your learners working. Current state-of-the-art OCR tools are not at a human level of reading …

Linear Programming for Maximum Independent Set (2011-03-05)

Maximum independent set, or "maximum stable set", is one of the classical NP-complete problems described in Richard Karp's 1972 paper "Reducibility Among Combinatorial Problems".
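The natural LP relaxation (maximize Σ x_v subject to x_u + x_v ≤ 1 on every edge and 0 ≤ x_v ≤ 1) can be tried directly with scipy; this is my illustration, not code from the post. On an odd cycle the relaxation famously returns the fractional all-1/2 solution:

```python
import numpy as np
from scipy.optimize import linprog

# LP relaxation of maximum independent set on a 5-cycle:
# maximize sum(x) s.t. x_u + x_v <= 1 for each edge, 0 <= x <= 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
n = 5

A = np.zeros((len(edges), n))
for row, (u, v) in enumerate(edges):
    A[row, u] = A[row, v] = 1

# linprog minimizes, so negate the objective to maximize.
res = linprog(c=-np.ones(n), A_ub=A, b_ub=np.ones(len(edges)),
              bounds=[(0, 1)] * n, method="highs")

print(-res.fun)  # 2.5: fractional LP optimum, vs integral optimum 2
```

The gap between the LP value (2.5) and the true maximum independent set (2) on odd cycles is exactly why odd-cycle inequalities and perfect-graph structure matter for this problem.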
Other NP-complete problems often have a simple reduction to it; for instance, p. 3 of Tony Jebara's "MAP Estimation, Message Passing, and Perfect Graphs" shows how MAP inference in an arbitrary MRF reduces to Maximum Weight …

Perils of floating point arithmetic (2011-03-03)

A recent discussion on Stack Overflow brought up the issue of the results of floating point arithmetic being non-reproducible. A reader asked what one could do to guarantee that the result of a floating point computation is always the same, and Daniel Lichtblau, a veteran developer in the kernel group at WRI, replied that "it is impossible with current hardware and software". One problem is that IEEE 754 …

How to patent an algorithm in the US (2011-02-21)

Today I got a Google Alert on the following pending patent -- Belief Propagation for Generalized Matching. I like to stay up on the belief propagation literature, so I took a closer look. The linked PDF gives a fairly detailed explanation of belief propagation for solving matching problems, including very detailed pseudocode that looks like an excerpt of a C program.
Appendix A seems to …

Generalized Distributive Law (2011-02-20)

With the regular distributive law you can do things like
$$\sum_{x_1,x_2,x_3} \exp(x_1 + x_2 + x_3)=\sum_{x_1} \exp x_1 \sum_{x_2} \exp x_2 \sum_{x_3} \exp x_3$$
This breaks the original large sum into 3 smaller sums which can be computed independently. A more realistic scenario requires factorization into overlapping parts. For instance, take the following:
$$\sum_{x_1,x_2,x_3,x_4,x_5} \exp(x_1 x_2 + x_2 x_3 + \cdots)$$
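The first identity is easy to check numerically; a quick sketch (my example, with each variable ranging over an arbitrary small grid):

```python
import itertools
import math

# Check sum_{x1,x2,x3} exp(x1+x2+x3) = prod_i sum_{xi} exp(xi)
# on a small discrete domain for each variable.
domain = [0.0, 0.5, 1.0]

lhs = sum(math.exp(x1 + x2 + x3)
          for x1, x2, x3 in itertools.product(domain, repeat=3))

rhs = math.prod(sum(math.exp(x) for x in domain) for _ in range(3))

print(abs(lhs - rhs))  # ~0: the sum factorizes exactly
```

The left side is 27 terms; the right side is 3 sums of 3 terms each. That exponential-to-linear collapse is the whole point of the distributive law, and the generalized version exploits the same structure when the factors overlap.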