Saturday, October 15, 2016

The Problem With P-values: We Got Probability Wrong

Serious stuff.

I've mentioned the fact that if you can't replicate what you're doing, what you're doing isn't science. It might be metaphysics, it might be pseudoscience, it might be religion, it might be any number of things, but it isn't science.

Reproducibility and falsifiability, along with predictive power, are pretty much the definition of science.
(here's a quick explanation of the difference between reproducibility and replicability in science) 

What Mendeleev did in describing the properties of as-yet undiscovered elements was science. So-called post-normal science is not; it's policymaking gussied up with sciency-sounding words.

See Feynman's 1974 Caltech commencement address, "Cargo Cult Science," for a really smart guy's take on the issue.*

From Aeon:

The problem with p-values
Academic psychology and medical testing are both dogged by unreliability. The reason is clear: we got probability wrong
The aim of science is to establish facts, as accurately as possible. It is therefore crucially important to determine whether an observed phenomenon is real, or whether it’s the result of pure chance. If you declare that you’ve discovered something when in fact it’s just random, that’s called a false discovery or a false positive. And false positives are alarmingly common in some areas of medical science.
In 2005, the epidemiologist John Ioannidis at Stanford caused a storm when he wrote the paper ‘Why Most Published Research Findings Are False’, focusing on results in certain areas of biomedicine. He’s been vindicated by subsequent investigations. For example, a recent article found that repeating 100 different results in experimental psychology confirmed the original conclusions in only 38 per cent of cases. It’s probably at least as bad for brain-imaging studies and cognitive neuroscience. How can this happen?

The problem of how to distinguish a genuine observation from random chance is a very old one. It’s been debated for centuries by philosophers and, more fruitfully, by statisticians. It turns on the distinction between induction and deduction. Science is an exercise in inductive reasoning: we are making observations and trying to infer general rules from them. Induction can never be certain. In contrast, deductive reasoning is easier: you deduce what you would expect to observe if some general rule were true and then compare it with what you actually see. The problem is that, for a scientist, deductive arguments don’t directly answer the question that you want to ask.

What matters to a scientific observer is how often you’ll be wrong if you claim that an effect is real, rather than being merely random. That’s a question of induction, so it’s hard. In the early 20th century, it became the custom to avoid induction, by changing the question into one that used only deductive reasoning. In the 1920s, the statistician Ronald Fisher did this by advocating tests of statistical significance. These are wholly deductive and so sidestep the philosophical problems of induction.

Tests of statistical significance proceed by calculating the probability of making our observations (or the more extreme ones) if there were no real effect. This isn’t an assertion that there is no real effect, but rather a calculation of what would be expected if there were no real effect. The postulate that there is no real effect is called the null hypothesis, and the probability is called the p-value. Clearly the smaller the p-value, the less plausible the null hypothesis, so the more likely it is that there is, in fact, a real effect. All you have to do is to decide how small the p-value must be before you declare that you’ve made a discovery. But that turns out to be very difficult.
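To make that definition concrete, here's a minimal sketch in Python (my own illustration, not from the Aeon article): a permutation test that computes a p-value directly as the fraction of label-shuffles producing a difference in group means at least as extreme as the one observed, assuming no real effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups drawn from the SAME distribution: the null hypothesis is true.
a = rng.normal(loc=0.0, scale=1.0, size=30)
b = rng.normal(loc=0.0, scale=1.0, size=30)

observed = abs(a.mean() - b.mean())

# Permutation test: shuffle the group labels many times and count how often
# a difference at least as extreme as the observed one arises by chance alone.
pooled = np.concatenate([a, b])
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    if abs(pooled[:30].mean() - pooled[30:].mean()) >= observed:
        extreme += 1

p_value = extreme / n_perm  # P(data at least this extreme | no real effect)
print(f"p-value: {p_value:.3f}")
```

Note what that number is: the probability of data this extreme given the null hypothesis, not the probability that the null hypothesis is true.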

The problem is that the p-value gives the right answer to the wrong question. What we really want to know is not the probability of the observations given a hypothesis about the existence of a real effect, but rather the probability that there is a real effect – that the hypothesis is true – given the observations. And that is a problem of induction.
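Bayes' theorem makes the gap between those two probabilities explicit. A back-of-the-envelope sketch (the 10 per cent prior and 80 per cent power are assumptions for illustration, not figures from the article):

```python
# Bayes' theorem: P(real | significant) =
#     P(significant | real) * P(real) / P(significant)
prior_real = 0.10  # assumed: 10% of tested hypotheses describe a real effect
power = 0.80       # assumed: chance a real effect yields p < 0.05
alpha = 0.05       # chance a null effect yields p < 0.05 (false positive rate)

p_significant = power * prior_real + alpha * (1 - prior_real)
p_real_given_sig = power * prior_real / p_significant

print(f"P(real effect | p < 0.05) = {p_real_given_sig:.2f}")  # about 0.64
```

Under those assumed numbers, roughly a third of "significant" results would be false discoveries, even though the test's nominal error rate is 5 per cent.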

Confusion between these two quite different probabilities lies at the heart of why p-values are so often misinterpreted. It’s called the error of the transposed conditional. Even quite respectable sources will tell you that the p-value is the probability that your observations occurred by chance. And that is plain wrong....MORE
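The transposed-conditional error can also be demonstrated by brute force. Another hedged sketch (again my own, with an assumed effect size, sample size, and prior): simulate many experiments, let a minority have a real effect, test each at p < 0.05, and count what fraction of the "discoveries" are false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 20_000
prior_real = 0.10   # assumed: only 10% of tested effects are real
effect_size = 0.5   # assumed standardized effect when an effect is real
n = 30              # per-group sample size

false_pos = true_pos = 0
for _ in range(n_experiments):
    real = rng.random() < prior_real
    shift = effect_size if real else 0.0
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    _, p = stats.ttest_ind(a, b)       # standard two-sample t-test
    if p < 0.05:                       # declare a "discovery"
        if real:
            true_pos += 1
        else:
            false_pos += 1

print(f"Fraction of 'discoveries' that are false: "
      f"{false_pos / (false_pos + true_pos):.2f}")
```

The p-value controls how often a true null is wrongly rejected per test; by itself it says nothing about how likely any given "discovery" is to be real, which is exactly what the transposed-conditional reading wrongly assumes it does.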
*Feynman's speech to the Caltech brainiacs begins:
"During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas--which was to try one to see if it worked, and if it didn't work, to eliminate it. This method became organized, of course, into science. And it developed very well, so that we are now in the scientific age. It is such a scientific age, in fact that we have difficulty in understanding how witch doctors could ever have existed, when nothing that they proposed ever really worked--or very little of it did.

But even today I meet lots of people who sooner or later get me into a conversation about UFOs, or astrology, or some form of mysticism, expanded consciousness, new types of awareness, ESP, and so forth. And I've concluded that it's not a scientific world.

Most people believe so many wonderful things that I decided to investigate why they did. And what has been referred to as my curiosity for investigation has landed me in a difficulty where I found so much junk that I'm overwhelmed. First I started out by investigating various ideas of mysticism, and mystic experiences. I went into isolation tanks and got many hours of hallucinations, so I know something about that. Then I went to Esalen, which is a hotbed of this kind of thought (it's a wonderful place; you should go visit there). Then I became overwhelmed. I didn't realize how much there was.

At Esalen there are some large baths fed by hot springs situated on a ledge about thirty feet above the ocean. One of my most pleasurable experiences has been to sit in one of those baths and watch the waves crashing onto the rocky shore below, to gaze into the clear blue sky above, and to study a beautiful nude as she quietly appears and settles into the bath with me.

One time I sat down in a bath where there was a beautiful girl sitting with a guy who didn't seem to know her. Right away I began thinking, "Gee! How am I gonna get started talking to this beautiful nude babe?"

I'm trying to figure out what to say, when the guy says to her, "I'm, uh, studying massage. Could I practice on you?"
"Sure," she says. They get out of the bath and she lies down on a massage table nearby. I think to myself, "What a nifty line! I can never think of anything like that!"

He starts to rub her big toe. "I think I feel it," he says. "I feel a kind of dent--is that the pituitary?" I blurt out, "You're a helluva long way from the pituitary, man!"

They looked at me, horrified--I had blown my cover--and said, "It's reflexology!" I quickly closed my eyes and appeared to be meditating...."