In my last post, we discussed the “A+B versus B” trial design as a tool to produce false positive results. This method is currently very popular in alternative medicine, yet it is by no means the only approach that can mislead us. Today, let’s look at other popular options with a view to protecting ourselves against trialists who, naively or willfully, might fool us.
The crucial flaw of the “A+B versus B” design is that it fails to account for non-specific effects. If the patients in the experimental group experience better outcomes than the control group, this difference could well be due to effects that are unrelated to the experimental treatment. There are, of course, several further ways to ignore non-specific effects in clinical research. The simplest option is to include no control group at all. Homeopaths, for instance, are very proud of studies which show that ~70% of their patients experience benefit after taking their remedies. This type of result tends to impress journalists, politicians and other people who fail to realise that such a result might be due to a host of factors, e.g. the placebo-effect, the natural history of the disease, regression towards the mean or treatments which patients self-administered while taking the homeopathic remedies. It is therefore misleading to make causal inferences from such data.
Another easy method to generate false positive results is to omit blinding. The purpose of blinding the patient, the therapist and the evaluator of the outcomes in clinical trials is to make sure that expectation does not cause, or contribute to, the outcome. They say that expectation can move mountains; this might be an exaggeration, but it can certainly influence the result of a clinical trial. Patients who hope for a cure regularly do get better even if the therapy they receive is useless, and therapists as well as evaluators of the outcomes tend to view the results through rose-tinted spectacles if they have preconceived ideas about the experimental treatment. Similarly, the parents of a child or the owners of an animal can transfer their expectations, and this is one of several reasons why it is incorrect to claim that children and animals are immune to placebo-effects.
Failure to randomise is another source of bias which can make an ineffective therapy look like an effective one when tested in a clinical trial. If we allow patients or trialists to choose which patients receive the experimental treatment and which get the control treatment, it is likely that the two groups differ in a number of variables. Some of these variables might, in turn, impact on the outcome. If, for instance, doctors allocate their patients to the experimental and control groups, they might select those who will respond to the former and those who won’t to the latter. This may not happen with malicious intent but through intuition or instinct: responsible health care professionals want those patients who, in their experience, have the best chances to benefit from a given treatment to receive that treatment. Only randomisation, when done properly, can make sure we are comparing comparable groups of patients, and non-randomisation is likely to produce misleading findings.
While these options for producing false positives are all too obvious, the next possibility is slightly more intriguing. It refers to studies which do not test whether an experimental treatment is superior to another one (often called superiority trials), but to investigations attempting to assess whether it is equivalent to a therapy that is generally accepted to be effective. The idea is that, if both treatments produce the same or similarly positive results, both must be effective. For instance, such a study might compare the effects of acupuncture to a common pain-killer. Such trials are called non-inferiority or equivalence trials, and they offer a wide range of possibilities for misleading us. If, for example, such a trial does not have enough patients, it might show no difference where, in fact, there is one. Let’s consider a deliberately silly example: someone comes up with the idea to compare antibiotics to acupuncture as treatments of bacterial pneumonia in elderly patients. The researchers recruit 10 patients for each group, and the results reveal that, in one group, 2 patients died, while, in the other, the number was 3. The statistical tests show that the difference of just one patient is not statistically significant, and the authors therefore conclude that acupuncture is just as good for bacterial infections as antibiotics.
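The arithmetic behind this silly example is easy to check. The following sketch (pure Python; the function name is my own invention) computes the two-sided Fisher’s exact p-value for the 2-versus-3-deaths table and shows just how uninformative such a tiny trial is:

```python
# Checking the "silly example": 2 of 10 died in one group, 3 of 10 in the other.
# Two-sided Fisher's exact test built from the hypergeometric distribution,
# using only the standard library.
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def p_table(k):
        # probability of a table with k in the top-left cell, margins fixed
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = p_table(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # sum the probabilities of all tables at least as extreme as the observed one
    return sum(p_table(k) for k in range(lo, hi + 1) if p_table(k) <= p_obs + 1e-12)

p = fisher_exact_two_sided(2, 8, 3, 7)
print(round(p, 3))  # p = 1.0
```

The p-value comes out as 1.0: with 10 patients per group, the data are entirely compatible with no difference at all, but that is evidence of nothing. “Absence of evidence of a difference” is not “evidence of equivalence”, which is precisely what such a trial invites readers to conclude.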
Even trickier is the option to under-dose the treatment given to the control group in an equivalence trial. In our hypothetical example, the investigators might subsequently recruit hundreds of patients in an attempt to overcome the criticism of their first study; they then decide to administer a sub-therapeutic dose of the antibiotic in the control group. The results would then apparently confirm the researchers’ initial finding, namely that acupuncture is as good as the antibiotic for pneumonia. Acupuncturists might then claim that their treatment has been proven in a very large randomised clinical trial to be effective for treating this condition, and people who do not happen to know the correct dose of the antibiotic could easily be fooled into believing them.
Obviously, the results would be more impressive, if the control group in an equivalence trial received a therapy which is not just ineffective but actually harmful. In such a scenario, the most useless or even slightly detrimental treatment would appear to be effective simply because it is equivalent to or less harmful than the comparator.
A variation of this theme is the plethora of controlled clinical trials which compare one unproven therapy to another unproven treatment. Predictably, the results indicate that there is no difference in the clinical outcome experienced by the patients in the two groups. Enthusiastic researchers then tend to conclude that this proves both treatments to be equally effective.
Another option for creating misleadingly positive findings is to cherry-pick the results. Most trials have many outcome measures; for instance, a study of acupuncture for pain-control might quantify pain in half a dozen different ways; it might also measure the length of the treatment until pain has subsided, the amount of medication the patients took in addition to receiving acupuncture, the days off work because of pain, the partner’s impression of the patient’s health status, the quality of life of the patient, the frequency of sleep being disrupted by pain etc. If the researchers then evaluate all the results, they are likely to find that one or two of them have changed in the direction they wanted. This can well be a chance finding: with the typical statistical tests, one in 20 outcome measures would produce a significant result purely by chance. In order to mislead us, the researchers only need to “forget” about all the negative results and focus their publication on the ones which by chance have come out as they had hoped.
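The “one in 20” remark compounds quickly. A few lines of Python (a sketch only; the function name is my own) show how the chance of at least one spuriously “significant” result grows with the number of independent outcome measures tested at the conventional 5% level:

```python
# Probability of at least one false-positive result when n independent
# outcome measures are each tested at significance level alpha.
def p_at_least_one_false_positive(n_outcomes, alpha=0.05):
    # chance that NOT all n null tests come out non-significant
    return 1 - (1 - alpha) ** n_outcomes

for n in (1, 6, 12, 20):
    print(n, round(p_at_least_one_false_positive(n), 2))
# With 20 outcome measures, the chance of at least one "positive" finding
# arising by chance alone is about 64% -- before any cherry-picking starts.
```

In other words, a trial with 20 outcome measures is more likely than not to hand the authors something “significant” to publish, even if the treatment does nothing whatsoever.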
One foolproof method for misleading the public is to draw conclusions which are not supported by the data. Imagine you have generated squarely negative data with a trial of homeopathy. As an enthusiast of homeopathy, you are far from happy with your own findings; in addition, you might have a sponsor who puts pressure on you. What can you do? The solution is simple: you only need to highlight at least one positive message in the published article. In the case of homeopathy, you could, for instance, make a major issue of the fact that the treatment was remarkably safe and cheap: not a single patient died, and most were very pleased with the treatment, which was not even very expensive.
And finally, there is always the possibility of overt cheating. Researchers are only human and are thus not immune to temptation. They may have conflicts of interest or may know that positive results are much easier to publish than negative ones. Certainly they want to publish their work – “publish or perish”! So, faced with disappointing results of a study, they might decide to prettify them or even invent new ones which are more pleasing to them, their peers, or their sponsors.
Am I claiming that this sort of thing only happens in alternative medicine? No! Obviously, the way to minimise the risk of such misconduct is to train researchers properly and make sure they are able to think critically. Am I suggesting that investigators of alternative medicine are often not well-trained and almost always uncritical? Yes.
How do you fancy playing a little game? Close your eyes, relax, take a minute or two and imagine the newspaper headlines which new medical discoveries might make within the next 100 years or so. I know, this is a slightly silly and far from serious game but, I promise, it’s quite good fun.
Personally, I see the following headlines emerging in front of my eyes:
VACCINATION AGAINST AIDS READY FOR ROUTINE USE
IDENTIFICATION OF THE CAUSE OF DEMENTIA LEADS TO FIRST EFFECTIVE CURE
GENE-THERAPY BEGINS TO SAVE LIVES IN EVERY DAY PRACTICE
CANCER, A NON-FATAL DISEASE
HEALTHY AGEING BECOMES REALITY
Yes, I know this is nothing but naïve conjecture mixed with wishful thinking, and there is hardly anything truly surprising in my list.
But, hold on, is it not remarkable that I visualise considerable advances in conventional healthcare but no similarly spectacular headlines relating to alternative medicine? After all, alternative medicine is my area of expertise. Why do I not see the following announcements?
YET ANOTHER HOMEOPATH WINS THE NOBEL PRIZE
CHIROPRACTIC SUBLUXATION CONFIRMED AS THE SOLE CAUSE OF MANY DISEASES
CHRONICALLY ILL PATIENTS CAN RELY ON BACH FLOWER REMEDIES
CHINESE HERBS CURE PROSTATE CANCER
ACUPUNCTURE MAKES PAIN-KILLERS OBSOLETE
ROYAL DETOX-TINCTURE PROLONGS LIFE
CRANIOSACRAL THERAPY PROVEN EFFECTIVE FOR CEREBRAL PALSY
IRIDOLOGY, A VALID DIAGNOSTIC TEST
How can I be so confident that such headlines about alternative medicine will not, one day, become reality?
Simple: because I only need to study the past and realise which breakthroughs have occurred within the previous 100 years. Mainstream scientists and doctors have discovered insulin therapy, which turned diabetes from a death sentence into a chronic disease; they have developed antibiotics which saved millions of lives; they have produced vaccines against deadly infections; they have invented diagnostic techniques that made early treatment of many life-threatening conditions possible etc, etc, etc.
None of the many landmarks in the history of medicine has ever been in the realm of alternative medicine.
“What about herbal medicine?” some might ask. Aspirin, vincristine, taxol and other drugs originated from the plant kingdom, and I am sure there will be similar such success-stories in the future.
But were these truly developments driven by traditional herbalists? No! They were discoveries entirely based on systematic research and rigorous science.
Progress in healthcare will not come from clinging to a dogma, nor from adhering to yesterday’s implausibilities, nor from claiming that clinical experience is more important than scientific research.
I am not saying, of course, that all of alternative medicine is useless. I am saying, however, that it is time to get realistic about what alternative treatments can do and what they cannot achieve. They will not save many lives, for instance; an alternative cure for anything is a contradiction in terms. The strength of some alternative therapies lies in palliative and supportive care, not in changing the natural history of diseases.
Yet proponents of alternative medicine tend to ignore this all too obvious fact and go way beyond the line that divides responsible from irresponsible behaviour. The result is a plethora of bogus claims – and this is clearly not right. It raises false hopes which, in a nutshell, are always unethical and often cruel.
Science has seen a steady stream of scandals which are much more than just regrettable, as they undermine much of what science stands for. In medicine, fraud and other forms of misconduct by scientists can even endanger the health of patients.
Against this background, it would be handy to have a simple measure which would give us some indication of the trustworthiness of scientists, particularly clinical scientists. May I be so bold as to propose such a method, the TRUSTWORTHINESS INDEX (TI)?
A large part of clinical science is about testing the efficacy of treatments, and it is the scientist who does this type of research on whom I want to focus. It goes without saying that, occasionally, such tests will have to generate negative results such as “the experimental treatment was not effective” [actually “negative” is not the right term, as it is clearly positive to know that a given therapy does not work]. If this never happens with the research of a given individual, we could be dealing with false positive results. In such a case, our alarm bells should start ringing, and we might begin to ask ourselves: how trustworthy is this person?
Yet, in real life, the alarm bells rarely do ring. This absence of suspicion might be due to the fact that, at any one point in time, a single reader tends to see only one particular paper by the individual in question – and one result tells him next to nothing about whether this scientist produces more than his fair share of positive findings.
What is needed is a measure that captures the totality of a researcher’s output. Such parameters already exist; think of the accumulated “Impact Factor” or the “H-Index”, for instance. But, at best, these citation metrics provide information about the frequency or impact of this person’s published papers and totally ignore his trustworthiness. To get a handle on this particular aspect of a scientist’s work, we might have to consider not the impact but the direction of his published conclusions.
If we calculated the percentage of a researcher’s papers arriving at positive conclusions and divided this by the percentage of his papers drawing negative conclusions, we might have a useful measure. A realistic example might be the case of a clinical researcher who has published a total of 100 original articles. If 50% had positive and 50% negative conclusions about the efficacy of the therapy tested, his TI would be 1.
Depending on what area of clinical medicine this person is working in, 1 might be a figure that is just about acceptable in terms of the trustworthiness of the author. If the TI goes beyond 1, we might get concerned; if it reaches 4 or more, we should get worried.
An example would be a researcher who has published 100 papers of which 80 are positive and 20 arrive at negative conclusions. His TI would consequently amount to 4. Most of us equipped with a healthy scepticism would consider this figure highly suspect.
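The two worked examples above can be expressed in a few lines of Python (a sketch only; the function name is my own invention). Note that the ratio of the two percentages reduces to the simple ratio of positive to negative papers:

```python
# The Trustworthiness Index as defined above: the percentage of a researcher's
# papers with positive conclusions divided by the percentage with negative ones.
def trustworthiness_index(n_positive, n_negative):
    total = n_positive + n_negative
    pct_positive = 100 * n_positive / total
    pct_negative = 100 * n_negative / total
    return pct_positive / pct_negative  # equivalently, n_positive / n_negative

print(trustworthiness_index(50, 50))  # 1.0 -- the balanced researcher
print(trustworthiness_index(80, 20))  # 4.0 -- time to get worried
```

One obvious practical caveat, implicit in the definition: the index is undefined for a researcher who has never published a single negative conclusion, which is arguably the most suspect case of all.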
Of course, this is all a bit simplistic and, like all other citation metrics, my TI does not provide any level of proof; it is merely a vague indicator that something might be amiss. And, as stressed already, the cut-off point for any scientist’s TI very much depends on the area of clinical research we are dealing with. The lower the plausibility and the higher the uncertainty associated with the efficacy of the experimental treatments, the lower the point where the TI might suggest something to be fishy.
A good example of an area plagued with implausibility and uncertainty is, of course, alternative medicine. Here one would not expect a high percentage of rigorous tests to come out positive, and a TI of 0.5 might perhaps already be on the limit.
So how does the TI perform when we apply it to my colleagues, the full-time researchers in alternative medicine? I have not actually calculated the exact figures, but as an educated guess, I estimate that it would be very hard, even impossible, to find many with a TI under 4.
But surely this cannot be true! It would be way above the acceptable level which we just estimated to be around 0.5. This must mean that my [admittedly slightly tongue-in-cheek] idea of calculating the TI was daft. The concept of my TI clearly does not work.
The alternative explanation for the high TIs in alternative medicine might be that most full-time researchers in this field are not trustworthy. But this hypothesis must be rejected offhand – or mustn’t it?