Guest post by Pete Attkins
Commentator “jm” asked a profound and pertinent question: “What DOES it take for people to get real in this world, practice some common sense, and pay attention to what’s going on with themselves?” This question was asked in the context of asserting that personal experience always trumps the results of large-scale scientific experiments; and asserting that alt-med experts are better able to provide individulized healthcare than 21st Century orthodox medicine.
What does common sense and paying attention lead us to conclude about the following? We test a six-sided die for bias by rolling it 100 times. The number 1 occurs only once and the number 6 occurs many times, never on its own, but in several groups of consecutive sixes.
I think it is reasonable to say that common sense would, and should, lead everyone to conclude that the die is biased and not fit for its purpose as a source of random numbers.
In other words, we have a gut feeling that the die is untrustworthy. Gut instincts and common sense are geared towards maximizing our chances of survival in our complex and unpredictable world — these are innate and learnt behaviours that have enabled humans to survive despite the harshness of our ever changing habitat.
Only very recently in the long history of our species have we developed specialized tools that enable us to better understand our harsh and complex world: science and critical thinking. These tools are difficult to master because they still haven’t been incorporated into our primary and secondary formal education systems.
The vast majority of people do not have these skills therefore, when a scientific finding flies in the face of our gut instincts and/or common sense, it creates an overwhelming desire to reject the finding and classify the scientist(s) as being irrational and lacking basic common sense. It does not create an intense desire to accept the finding then painstakingly learn all of the science that went into producing the finding.
With that in mind, let’s rethink our common sense conclusion that the six-sided die is biased and untrustworthy. What we really mean is that the results have given all of us good reason to be highly suspicious of this die. We aren’t 100% certain that this die is biased, but our gut feeling and common sense are more than adequate to form a reasonable mistrust of it and to avoid using it for anything important to us. Reasons to keep this die rather than discard it might be to provide a source of mild entertainment or to use its bias for the purposes of cheating.
Some readers might be surprised to discover at this point that the results I presented from this apparently heavily-biased die are not only perfectly valid results obtained from a truly random unbiased die, they are to be fully expected. Even if the die had produced 100 sixes in that test, it would not confirm that the die is biased in any way whatsoever. Rolling a truly unbiased die once will produce one of six possible outcomes. Rolling the same die 100 times will produce one unique sequence out of the 6^100 (6.5 x 10^77) possible sequences: all of which are equally valid!
Gut feeling plus common sense rightfully informs us that the probability of a random die producing one hundred consecutive sixes is so incredibly remote that nobody will ever see it occur in reality. This conclusion is also mathematically sound: if there were 6.5 x 10^77 people on Earth, each performing the same test on truly random dice, there is no guarantee that anyone would observe a sequence of one hundred consecutive sixes.
When we observe a sequence such as 2 5 1 4 6 3 1 4 3 6 5 2… common sense informs us that the die is very likely random. If we calculate the arithmetic mean to be very close to 3.5 then common sense will lead us to conclude that the die is both random and unbiased enough to use it as a reliable source of random numbers.
Unfortunately, this is a perfect example of our gut feelings and common sense failing us abysmally. They totally failed to warn us that the 2 5 1 4 6 3 1 4 3 6 5 2… sequence we observed had exactly the same (im)probability of occurring as a sequence of one hundred 6s or any other sequence that one can think of that doesn’t look random to a human observer.
The 100-roll die test is nowhere near powerful enough to properly test a six-sided die, but this test is more than adequately powered to reveal some of our cognitive biases and some of the deficits in our personal mastery of science and critical thinking.
To properly test the die we need to provide solid evidence that it is both truly random and that its measured bias tends towards zero as the number of rolls tends towards infinity. We could use the services of one testing lab to conduct billions of test rolls, but this would not exclude errors caused by such things as miscalibrated equipment and experimenter bias. It is better to subdivide the testing across multiple labs then carefully analyse and appropriately aggregate the results: this dramatically reduces errors caused by equipment and humans.
In medicine, this testing process is performed via systematic reviews of multiple, independent, double-blind, placebo-controlled trials — every trial that is insufficiently powered to add meaningfully to the result is rightfully excluded from the aggregation.
Alt-med relies on a diametrically opposed testing process. It performs a plethora of only underpowered tests; presents those that just happen to show a positive result (just as a random die could’ve produced); and sweeps under the carpet the overwhelming number of tests that produced a negative result. It publishes only the ‘successes’, not its failures. By sweeping its failures under the carpet it feels justified in making the very bold claim: Our plethora of collected evidence shows clearly that it mostly ‘works’ and, when it doesn’t, it causes no harm.
One of the most acidic tests for a hypothesis and its supporting data (which is a mandatory test in a few branches of critical engineering) is to substitute the collected data for random data that has been carefully crafted to emulate the probability mass functions of the collected datasets. This test has to be run multiple times for reasons that I’ve attempted to explain in my random die example. If the proposer of the hypothesis is unable to explain the multiple failures resulting from this acid test then it is highly likely that the proposer either does not fully understand their hypothesis or that their hypothesis is indistinguishable from the null hypothesis.