The links of the week

A few articles which I have recently read:

  • The worst academic job. An article summarizing what’s wrong with academic career paths today in the US and in Europe.
  • Is Google making us stupid? A bit late, but it is an interesting article on how Google, and more generally easy access to a vast knowledge base, may influence how we think. I can’t help linking this to recent work from James Evans on the effects of open access on science (discussed in The Economist here).

Is science obsolete?

That’s basically what is argued in the Wired article “The End of Theory” (found through the article “La fin de la théorie?”, on the excellent French economics blog econoclaste). The article itself is not that interesting: it tries to be provocative, but fails to give good arguments for its case. The main argument is that, thanks to enormous amounts of data, number crunching, and computer farms such as those available at Google, it will be more effective to find patterns with just data, without models. But the analysis is fundamentally flawed at several levels, both scientific and philosophical. It is true that some sciences, or more exactly some activities traditionally labeled as science, are endangered by number crunching; but computers have already made some repetitive activities obsolete, and that is hardly big news, except for the people affected by the changes.

First, the epistemological arguments against this thesis: the reason why science is first about making theories, and then about running experiments to confront the theory with reality, is not just practicality. There are fundamental reasons why this is the case, as mentioned in the article by econoclaste (warning, in French): data gathering itself is subject to various biases, and theory can somewhat alleviate these biases. There is also an ambiguity in what Chris Anderson means by scientific models: he argues that Google succeeded in getting reliable search results by avoiding the use of a model. But you could argue on the contrary that Google did better than everyone else because it had a better model for finding interesting pages related to keywords; indeed, the PageRank algorithm, which is the foundation of the Google search engine, is a better algorithm than what other search engines used to do. And the PageRank algorithm builds upon earlier works and theories, in particular citation analysis, which trace back to at least 1950. Another example given by Anderson is translation: he argues that translation without any linguistic knowledge can work better than translation based on it. But arguing that translation can be done better without knowing the language is different from arguing that it can be done without any model at all. Among people who work on machine translation, it is actually quite well known that you don’t need to know a language to be successful in translating into it.

Reductionism and the curse of dimensionality

But more significantly in my opinion, the number crunching approach is fundamentally reductionist: it assumes you can explain the whole phenomenon from its smallest parts. A typical example of the failure of reductionism in science is fluid mechanics: in principle, you could explain the behavior of a fluid from the behavior of each particle in it, but in practice you can’t, because once you have a reasonable number of particles, working at the particle level becomes intractable. Tom Roud made a similar argument (in French) about the flaws of the reductionist approach in biology.

There are some theoretical reasons why you will never manage to make a model of everything just from data. Most number crunching methods are statistical in nature, and rely on estimating some probability distribution. From this point of view, Anderson’s argument can be understood as “with enough data, you can estimate any probability distribution”. But this is not true, for several reasons; one is that complex problems often require computation in high-dimensional spaces, and high-dimensional spaces have some funky properties which are not intuitive and do not map well to our fundamentally three-dimensional world. One particularly significant property is the localization of volume in smooth solids. For example, in high dimension, most of the volume of a sphere is in a very thin shell, i.e. really near the surface. In dimension 1000, for a sphere of radius one, the volume contained in the sub-sphere of radius 0.5 is 0.5^1000 of the total volume, i.e. only about 1/10^301. This means that if you could spread all the atoms of the universe uniformly in the sphere, you would not even get one in the sub-sphere. In statistics, this phenomenon is known as the curse of dimensionality: the amount of data necessary for estimation in high dimensions grows exponentially with the number of dimensions.
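
To make the sphere figures above concrete, here is a minimal Python sketch. It only relies on the fact that the volume of a ball in dimension d scales as r^d, so the fraction of volume inside a concentric sub-sphere is (r/R)^d. The function names and the 1% shell thickness are my own choices for illustration, not something taken from the articles discussed.

```python
import math

def log10_inner_fraction(dim, inner_radius=0.5, outer_radius=1.0):
    """log10 of the fraction of the ball's volume inside the concentric sub-sphere.

    The volume of a ball in dimension `dim` scales as radius**dim, so the
    fraction is (inner_radius / outer_radius) ** dim. We work in log10 so the
    tiny values stay readable.
    """
    return dim * math.log10(inner_radius / outer_radius)

def shell_fraction(dim, thickness=0.01):
    """Fraction of a unit ball's volume within `thickness` of its surface."""
    return 1.0 - (1.0 - thickness) ** dim

for dim in (3, 10, 100, 1000):
    print(
        f"dim={dim:5d}  "
        f"log10(inner fraction)={log10_inner_fraction(dim):9.2f}  "
        f"outer 1% shell fraction={shell_fraction(dim):.6f}"
    )
```

In dimension 3 the sub-sphere of radius 0.5 still holds one eighth of the volume; in dimension 1000 the same sub-sphere holds a fraction of roughly 10^-301, while almost all the volume sits in the outermost 1% of the radius.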

Also, more data does not always mean better information: a common quote in the data-mining community is “there is no better data than more data”, but this is a fallacy. You want data which brings more information, and in some cases the only data you can easily get are not very informative. For example, when transcribing speech with computers (Automatic Speech Recognition, i.e. “speaking to your computer instead of typing”), you need to estimate the probability distribution of words, and you are especially interested in the probability of words which do not appear often. After analyzing a few thousand examples, you will get a pretty good estimate of the “behavior” of common words like “the”, but maybe not of words like “hermeneutic”. And for practical applications, those rare words are the ones which matter: if you miss “the” in a sentence, you can still understand it, but if you miss “hermeneutic”, this is much less likely.
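
A rough back-of-the-envelope sketch of the rare-word problem, in Python. The word probabilities below are made-up values chosen only for illustration (they are not measured frequencies), and the words are assumed to be drawn independently, which real text does not satisfy.

```python
import math

def prob_never_seen(p, n):
    """Probability that a word of probability p never shows up in n independent draws."""
    return (1.0 - p) ** n

def relative_std_error(p, n):
    """Relative standard error of the frequency estimate count / n.

    The count is binomial(n, p), so the estimate has standard deviation
    sqrt(p * (1 - p) / n); dividing by p gives the relative error.
    """
    return math.sqrt((1.0 - p) / (n * p))

# Hypothetical probabilities, for illustration only.
words = {"the": 5e-2, "hermeneutic": 1e-7}
n_tokens = 5000  # "a few thousand examples"

for word, p in words.items():
    print(
        f"{word:12s} expected count={n_tokens * p:10.4f}  "
        f"P(never seen)={prob_never_seen(p, n_tokens):.4f}  "
        f"relative error={relative_std_error(p, n_tokens):.2f}"
    )
```

Under these assumptions, the common word is estimated to within a few percent, while the rare word is almost certainly never observed at all. Adding more data of the same kind mostly adds more occurrences of the common words.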

Is number crunching new?

So is this number crunching really the beginning of something new? Actually, similar theses have been argued before Anderson; the fields of data-mining and artificial intelligence (AI) have, since their inception, a history of making claims which never really materialize (AI, for example, has known several “AI winters”, periods of low funding which generally followed periods of high funding and big claims about what AI could do). Anyone familiar with the data-mining and artificial intelligence communities should be skeptical about big announcements like a paradigm shift, or, as here, claims of making science obsolete. I would not be surprised if AI, data-mining and associated fields are the ones which use the expression “paradigm shift” the most often.

It baffles me that people still argue similar points with similar claims as 50 years ago.

Linked articles

For more on this, see also Cosma’s blog. Also, in French:

Mathematical foundation for probability and statistics

I’ve been reading the new book from Christopher Bishop, Pattern Recognition and Machine Learning, and once again, I realize that I really lack a strong mathematical foundation for statistics. Not that Bishop’s book requires it: it does not go really deep into the mathematical side, and anyone with an undergraduate level in calculus should be able to follow it quite easily. But I would really like to see proofs of some results (convergence of EM, why and under which conditions Variational Bayes approaches give a good approximation, etc.). One of the points I keep being confused by is everything related to conditional expectation. I have a hard time really ‘getting it’ and using it comfortably.

What I am looking for is some books which are:

  1. at a graduate level
  2. mathematically sound (with proof of convergence, for example)
  3. ideally, usable for self-study (exercises + solutions).

Some references which look worth a look:

  1. A Course in Probability Theory by Kai Lai Chung -> I just bought it. Seems concise.
  2. The Elements of Statistical Learning by T. Hastie, R. Tibshirani and J. H. Friedman.
  3. Probability: A Graduate Course by Allan Gut. This one looks good, with exercises (no solutions, though). It may seem like a detail, but the typography is really good: this is basic LaTeX style, but that’s what I prefer by far for math-heavy textbooks.
  4. Foundations of Modern Probability by O. Kallenberg. I once borrowed it: it looks like all the points where I keep getting lost are treated, but the level is quite above mine for the moment.
  5. Mathematical Statistics by Jun Shao. This one has a companion book with solutions.

I will see how it goes with the first one, which I’ve just bought, and whether it is enough to follow The Bayesian Choice, which I am still waiting for, and which was recommended by A. Doucet as a good introduction to rigorous Bayesian statistics.