Machine Learning, etc: Embracing non-determinism

Saturday, June 25, 2011

Embracing non-determinism

Computers are supposed to be deterministic. This is often the case for single processor machines. However, as you scale up, guaranteeing determinism becomes increasingly expensive.

Even on single processor machines you are facing non-determinism on semi-regular basis. Here are some examples

Bugs + poor OS memory control that allows programs to read uninitialized memory. A recent example for me was when I used FontForge to process a large set of files, and had it crash on a different set every time I ran it.
/dev/random gives "truly random" numbers, however, even if the program uses regular rand() seeded by time, it's essentially non-deterministic since so many factors affect the time of seed()
Floating point operations on modern architecture are non-deterministic because of run-time optimization -- depending on the pattern of memory allocation, processor will rearrange the order of your arithmetic operations, producing slightly different results on reruns
Soft-errors in circuits due to radiation, about one error per day, according to stats here
Some CPU units are bad and produce incorrect results. I don't know exact stats on this, but the guy who did the Petasort experiment told me that it was one of the issues they had to deal with.

.

Those modes of failure may be rare on one computer, but once you scale your computation up to many cores, it becomes a fact of life.

In a distributed application like sorting a petabyte of data, you should expect about 1 hardrive failure, 100 irrecoverable disk read errors, and a couple of undetected memory errors if you use error correcting RAM (more if you use regular RAM).

You can make your computation deterministic by adding checksums at each stage and recomputing the parts that fail. However, this is expensive, and becomes disporportionately more expensive as you scale up. When the Czajkowski et al sorted 100 trillion records deterministically, most of the time was spent on error recovery.

I've been running an aggregation that is modest in comparison, going over over about 100 TB of data and getting slightly different results each time. Theoretically I could add recovery stages that get rid of non-determinism, but at a cost that is not justified in my case.

Large scale analysis systems like Dremel are configured to produce approximate results by default, otherwise, the long tail of packet arrival times means that you'll spent most of the time waiting for a few "stragglers"

Given the increasing cost of determinism it makes sense to think of it in terms of cost-benefit analysis and trade off the benefit of reproducible results against the cost of 100% error recovery.

A related idea come up in Vikash Mansinghka presentation at NIPS 2010 workshop on monte carlo methods: Real world decision problems can be formulated as inference and solved with MCMC sampling, so it doesn't make sense to spend so much effort ensuring determinism only to destroy it with /dev/random -- you could achieve greater efficiency by using analogue circuits to start with. His company, Navia, is working in this direction.

There's some more discussion on determinism on Daniel Lemire's blog

28 comments:

Itman said...: I agree. It is not only hardware errors. It is also the order of execution. Perhaps, most non-determinism comes from multi-threaded applications.; 8:20 AM
Anonymous said...: Get access to all available offers from paypal for free and it all works. Redeem free paypal cash and get many more. Good and very informative. Love to share such blogs with all my friends same as the website which is menat to be shared free paypal money adder as it is a good public welfare work to do, this mentioned website has free paypal money and gift cards to offers. its real and working, Visit today.; 12:27 AM
Unknown said...: useful blog
python interview questions
cognos interview questions
perl interview questions
vlsi interview questions
web api interview questions
msbi interview questions; 11:32 PM
pythonV said...: amazing content really informational !!

check our site too :pytholabs; 10:32 AM
remo said...: THANKS FOR THE INFORMATION....
Digital Marketing Internship Program in BangaloreDigital Marketing Internship Program in Bangalore; 11:44 PM
Anonymous said...: Hello, o you know that the best way to boost your brain is
visiting or contacting us

https://weiiitrading.com/our-products/moonrock-carts/buy-empty-moonrock-clear-vape-cartridges-blue-carts-dr-zodiak-atomizers-with-flavor-box-packaging/

https://weiiitrading.com/our-products/heavy-hitters-carts/buy-wholesale-new-heavy-hitter-vape-cartridges-1-0ml-ceramic-coil-empty-tank-carts-510-thread-thick-oil-atomizer/

https://weiiitrading.com/our-products/juul-carts/buy-hot-empty-ceramic-pod-disassembled-cartridges-0-7ml-1-0ml-vape-pod-carts-for-vape-juul-vape-pen-start-kit-top-quality/

https://weiiitrading.com/our-products/mario-carts/buy-peaches-and-dream/

https://weiiitrading.com/our-products/heavy-hitters-carts/buy-bubba-kush-cartridge-2-2g/

https://weiiitrading.com/our-products/mario-carts/buy-thin-mint-cookies/

Pila Brass Knuckles Online for sale online,

where to buy Buy Pila Brass Knuckles Online

buy valley online cartridge

buy space candy online

buy cannabis syrup online

buy botox online

cannabis bread

uk chese

47 dank vapewhite fire og

buy moonrock

Email Us

Contact: +1 619-537-6734; 3:25 AM
https://www.ezeelinux.com/talk/member.php?action=profile&uid=1570 said...: really awesome post
thanks for sharing; 11:50 PM
playapkpro said...: In a distributed application like sorting a petabyte of data, you should expect about 1 hard drive failure, 100 irrecoverable disk read errors, and a couple of undetected memory errors if you use error-correcting RAM (more if you use regular RAM).
that type of error is really hit on chats apps or software on the device; 4:59 AM
Michael said...: In need of nursing assignment help uk turnout to Assignments Planet for all assignment services at a cheap price.; 3:19 AM
yo whatsapp apk download said...: yo whatsapp apk download thanks for sharing download more features whatsapp; 3:48 PM
Anonymous said...: Thanks for amazing information. Can you want to buy Thigh High Socks
online?; 9:59 AM
COCPUREAPK.COM said...: Thanks for the post.. Here are the some interesting posts regarding your niche. Check now- clash of lights. cf; 10:34 AM
Unknown said...: Thanks for the post.. Here are the some interesting posts regarding your niche check the new blanket hoodies.big hoodie blanket; 11:04 AM
Unknown said...: Great article!Check the new All by itself, blanket hoodies are fantastic outerwear. It is a spicy addition to your winter wardrobe. It makes you look cool, stylish, and cute, all simultaneously.wearable blanket hoodie; 11:17 AM
COCPUREAPK.COM said...: Are you looking for games darksoul of clash, having more features than a clash of clans, then you are at the perfect place we have a fantastic game with impressive features.; 8:51 AM
Anonymous said...: Amazing! This is very usefull for me. Also check hoodie blankets; 6:17 AM
Georgetown Fitness Centre said...: One of the choices in Georgetown, Texas for fitness centers is Gyms Georgetown Fitness Centre. For individuals living in the area who are looking for a place to work out, Gyms Georgetown TX offers a variety of amenities and services to help them reach their fitness goals. With state-of-the-art equipment, experienced trainers, and group fitness classes, Gyms Georgetown TX is a popular destination for those who want to stay active and healthy.

The gym's convenient location in Georgetown makes it easily accessible for residents and visitors alike, making it a great choice for anyone looking to improve their physical well-being.; 3:19 AM
엔에프엘뉴스 said...: I am very glad to be here.; 8:28 PM
토토사이트 said...: It is so amazing and interesting. I wanna tell you that it is very helpful for us.; 8:28 PM
폴 아렐라노다리엘 로슨 said...: Some truly superb blog posts on this website, thanks for contribution.; 1:31 AM
조이 샌포드삼손 루카스 said...: I must say it is very informative blog as well as interesting. Many thanks; 1:31 AM
탈리아 힉맨세스 브래디 said...: Such a clever blog work, Keep doing this kind of post, I support you; 1:31 AM
라이언 존스아멜리아 팜 said...: I’ve been following this web site. Thank you for providing a fine content!!!; 1:31 AM
바카라사이트 said...: the information is really very interesting and very easy to understand, I really like this!
소액결제 현금화
신용카드 현금화; 12:54 AM
파워볼사이트 said...: Simply want to say your article is as astonishing.
파워볼사이트
카지노사이트; 12:55 AM
Belinda said...: I was diagnosed with Parkinson’s disease four years ago. After traditional medications stopped working, I tried a herbal treatment from NaturePath Herbal Clinic Within months, my tremors eased, balance improved, and I regained my energy. It’s been life-changing I feel like myself again. If you or a loved one has Parkinson’s, I recommend checking out their natural approach at [www.naturepathherbalclinic.com]. info@naturepathherbalclinic.com; 2:44 AM
카지노사이트 카지노사이트 모음 said...: Cool stuff you have got and you keep update all of us.; 7:52 PM
Amit Kushwah said...: This is amazing! Thank you for sharing - your work is truly inspiring. Your talent shines through in everything you create! Physics can feel challenging with its formulas, diagrams, and problem-solving concepts—but creating a high-quality Physics assignment doesn’t have to be stressful. Physics assignment help can make the entire process easier by offering guidance, clarifying tough concepts, and supporting you with step-by-step solutions. With the right approach, proper planning, and effective support, you can complete your work faster and with greater confidence.; 4:41 AM