Friday, November 02, 2007

Why software libraries aren't reused

Software Libraries and Their Reuse: Entropy, Kolmogorov Complexity, and Zipf’s Law


Gustavo Lacerda said...

I really like the sorts of ideas used in this paper, and I have a lot to digest here.

How can one measure the entropy of a domain? Do we need a dataset of programs from the same domain written by the same "perfect" programmer?

By "compressed", he means using libraries. Would it be useful to do some tighter compression using, e.g., gzip? That would be a better approximation to the Kolmogorov complexity. It seems to me that some automatic rewriting for more reuse would be useful (compressing what the programmers failed to compress).
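
A rough sketch of the gzip idea, assuming Python's standard `zlib` module (the same DEFLATE algorithm gzip uses) stands in for the compressor. The helper names and the two sample snippets are made up for illustration. Compressed size is only a crude upper bound on the information content of the source, not a measurement of it.

```python
import zlib


def compressed_size(source: str, level: int = 9) -> int:
    """Number of bytes after DEFLATE compression of the UTF-8 source text."""
    return len(zlib.compress(source.encode("utf-8"), level))


def compression_ratio(source: str) -> float:
    """Compressed size divided by raw size; lower means more redundancy."""
    raw = len(source.encode("utf-8"))
    return compressed_size(source) / raw if raw else 1.0


# Hypothetical inputs: one highly repetitive snippet, one with little repetition.
repetitive = "total = total + price * quantity\n" * 50
varied = "".join(f"value_{i} = compute_{i * 7919 % 1000}({i})\n" for i in range(50))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"varied:     {compression_ratio(varied):.3f}")
```

Code that a programmer "failed to compress" into a library call shows up here as the repetitive case: the general-purpose compressor squeezes it much harder than the varied one.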

free associating:
* adaptive recompilation
* re: modeling of domains, can an IDE be trained to predict what library you're going to need next, conditional on what you've used so far?
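
To make the IDE idea a bit more concrete, here is a toy sketch that ranks candidate libraries by how often they co-occurred with the ones already in use. Everything in it is an assumption for illustration: the `train` and `predict_next` helpers, the tiny made-up project list, and the choice of summed co-occurrence counts as a stand-in for a real conditional model.

```python
from collections import Counter, defaultdict


def train(projects):
    """Count, for each library, how often every other library appears alongside it."""
    co = defaultdict(Counter)
    for libs in projects:
        for a in libs:
            for b in libs:
                if a != b:
                    co[a][b] += 1
    return co


def predict_next(co, used, k=3):
    """Rank libraries not yet used by summed co-occurrence with the used ones."""
    scores = Counter()
    for lib in used:
        scores.update(co.get(lib, {}))
    for lib in used:
        scores.pop(lib, None)  # never suggest something already imported
    return [lib for lib, _ in scores.most_common(k)]


# Made-up training data: each inner list is the set of libraries one project imports.
projects = [
    ["numpy", "scipy", "matplotlib"],
    ["numpy", "scipy"],
    ["numpy", "pandas"],
    ["requests", "json"],
]
co = train(projects)
print(predict_next(co, ["numpy"], k=1))  # -> ['scipy']
```

A real tool would presumably need far more data and a proper probabilistic model, but even this counting scheme captures the "conditional on what you've used so far" part.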

OiOi said...

interesting stuff. It looks as though an individual program in a given domain is described by the compiled binary size.
For any given domain + problem type, if the sizes are too dispersed, the domain/subdomain has more entropy. A uniform distribution increases the entropy and prevents reuse.
great application of statistics.
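
The remark about uniform distributions can be checked with a few lines of Python. This is only a sketch: the `shannon_entropy` helper and the eight-item example are assumptions for illustration, with a 1/rank (Zipf-like) weighting standing in for a skewed usage pattern.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy in bits of the distribution obtained by normalizing weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


n = 8
uniform = [1.0] * n                                 # every item equally likely
zipf_like = [1.0 / rank for rank in range(1, n + 1)]  # weight proportional to 1/rank

print(f"uniform:   {shannon_entropy(uniform):.3f} bits")
print(f"zipf-like: {shannon_entropy(zipf_like):.3f} bits")
```

With eight equally likely items the entropy is log2(8) = 3 bits, the maximum possible for eight outcomes. The skewed distribution comes out lower, which is the case where a few frequently used components are worth putting in a library.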