Thursday, May 18, 2006

Curse of Dimensionality and intuition

There are some counter-intuitive things happening as dimension increases.

For instance consider unit sphere (radius 1)

If you go from 2 dimensions to 3 dimensions, the volume increases (Pi to 4/3 pi). Similar increase happens when you go to 3,4,5 dimensions. But then the volume starts to decrease. Eventually it decreases all the way to 0.

Now consider a cube of width 1. As dimension increases, the volume stays the same. But the length of the main diagonal grows as sqrt(dimension). So the corners become situated further and further away from the center, and eventually almost all the mass is concentrated in the corners (meaning outside of the inscribed sphere)

Here's a plot of the fraction of the mass concentrated in the corners as a function of dimension

For a 7 dimensional cube, about 96% of the mass is concentrated in one of it's 128 "corners", so if we had 7 dimensional vision, we might see that cube like this

Finally consider what happens with a d-dimensional standard normal (unit covariance matrix, centered at 0). If we plot probability mass contained around a thin shell of radius r as a function r we get the following graphs (d=1,d=4,d=16)

The mean of this distribution is

which can be approximated as sqrt(d)

Furthermore you can derive that variance of this distribution converges to 1/2 (see notebook). Applying Chebyshev's inequality, we get that for any dimension, at least 0.75 of the mass is contained in the shell of thickness 1 centered at about sqrt(d). In numerical integration this seems to converge to a slightly higher number, for instance for 10^6 dimensions, it's about 0.84

Here's the Mathematica notebook with some derivations (web version)


damien francois said...

Very interesting read. May I recommend the nice overview of Mario Koppen as complementary reading. damien

Anonymous said...

This is the math post of the year!

Anonymous said...

Nice post.

I have some sort of knee-jerk reaction against comparing the volume of spheres in different dimensions. What does it mean to compare the volume of something in 3d with units [L^3] and 4d with units [L^4]? I wouldn't like to say that one is ‘bigger’ than the other. Of course one may compare dimensionless quantities, which I appreciate is what you are doing by comparing the ratio of the volume of the sphere and the enclosing cube. One could also compare to an inscribed cube and see how all of the mass of the sphere gets pushed out to its surface, just as the cube’s mass goes to the corners.

A pratical application that spends some time working out how spheres and cubes fit together in high dimensions is

PS Damien I liked your link. Thanks.

Yaroslav Bulatov said...

Thanks for the link Damien, thanks for the comment Daniel, Iain -- I agree, units are not really meaningful.

Consider a cube with side 1 meter long. As dimensions increase, cube volume stays constant, goes to infinity or to zero depending on whether you use meters, millimeters or kilometers to measure volume.

Relative volume of cube inscribed in a sphere is another interesting calculation -- since diagonal^2 has length dx^2, where x is the length of the side, inscribed cube's "volume" goes down as d^{-d/2} which is faster than the rate at which sphere's volume decreases (about (d/2)^(-d/2))

Anonymous said...

Really nice !
As regards the gaussian part of this post, some graphs are missing ("following graphs d=1,d=4,d=16") and that makes "The mean of this distribution is..." a little confusing because you actually refer to E[||X||] and not to E[X] (which is always 0). This is obvious, but, at first glance I was confused...
Nevertheless, are there consequences on technics using d-normal (like Parzen estimator) ?
And what about the mass repartition of other distributions, with lower/bigger tails ?

damien francois said...

Dear all, high-dimensional data analysis is the topic of my PhD thesis, which I am currently writing. I have gathered quite a few references about 'strange' properties of high dimensional spaces. If you are interested, just send me an email. damien

Anonymous said...

Nice introduction of simple math, high dimensional space seems to be challenging for we 3D human being. Anyway, really nice post and thanks.

bumbu pecel bali said...

this is good post...

please can you visit here..

tengs very much...

draj said...

Excellent machine learning blog,thanks for sharing...
Seo Internship in Bangalore
Smo Internship in Bangalore
Digital Marketing Internship Program in Bangalore

john said...

Great Article
IEEE final year projects on machine learning

JavaScript Training in Chennai

Final Year Project Centers in Chennai

JavaScript Training in Chennai

Mike Smith said...

Our Webroot Geek Squad Agents provide repair, installation and setup services on all kinds of tech at more than 1,100 Best Buy stores – including computer & tablet repair, setup and support, TV & home theater repair, car stereo & GPS installation, cell phone repair and home appliance repair. We fix most makes and models, no matter where you bought them and can show you how to get the most out of your technology. Want to fix any gadget issue? Our Trend Micro Geek Squad team is here to provide you with seamless services for your faulty devices. Geek Squad has set the benchmark when it comes to office or home appliances repairing. Our experts get acquainted with some exclusive technologies for hardware or electronic device issues. We can fix all the hindrances that can’t be solved by a typical customer support approach. The specialized touch of our technicians resolves the problem immediately.

DP said...

Pharm Envee is a global biotechnology and pharmaceutical marketing company that brings targeted products to a global audience. We specialize in utilizing innovative strategies to introduce groundbreaking biotechnology, pharmaceutical agents, and medical devices to market. Our team of marketing consultants and product specialists are dedicated to identifying and, if needed, creating opportunities for our clients’ products to launch and sustain in their targeted marketplaces.