Musculoskeletal and rheumatic conditions, often just called “arthritis” by lay people, bring more patients to alternative practitioners than any other type of disease. It is therefore particularly important to know whether alternative medicines (AMs) demonstrably generate more good than harm for such patients. Most alternative practitioners, of course, firmly believe in what they are doing. But what does the reliable evidence show?
To find out, ‘Arthritis Research UK’ has sponsored a massive project lasting several years to review the literature and critically evaluate the trial data. They convened a panel of experts (I was one of them) to evaluate all the clinical trials that are available in 4 specific clinical areas. The results for those forms of AM that are to be taken by mouth or applied topically have been published some time ago, now the report, especially written for lay people, on those treatments that are practitioner-based has been published. It covers the following 25 modalities:
Chiropractic (spinal manipulation)
Kinesiology (applied kinesiology)
Magnet therapy (static magnets)
Osteopathy (spinal manipulation)
Qigong (internal qigong)
Our findings are somewhat disappointing: only very few treatments were shown to be effective.
In the case of rheumatoid arthritis, 24 trials were included with a total of 1,500 patients. The totality of this evidence failed to provide convincing evidence that any form of AM is effective for this particular condition.
For osteoarthritis, 53 trials with a total of ~6,000 patients were available. They showed reasonably sound evidence only for two treatments: Tai chi and acupuncture.
Fifty trials were included with a total of ~3,000 patients suffering from fibromyalgia. The results provided weak evidence for Tai chi and relaxation-therapies, as well as more conclusive evidence for acupuncture and massage therapy.
Low back pain had attracted more research than any of the other diseases: 75 trials with ~11,600 patients. The evidence for Alexander Technique, osteopathy and relaxation therapies was promising by not ultimately convincing, and reasonably good evidence in support of yoga and acupuncture was also found.
The majority of the experts felt that the therapies in question did not frequently cause harm, but there were two important exceptions: osteopathy and chiropractic. For both, the report noted the existence of frequent yet mild, as well as serious but rare adverse effects.
As virtually all osteopaths and chiropractors earn their living by treating patients with musculoskeletal problems, the report comes as an embarrassment for these two professions. In particular, our conclusions about chiropractic were quite clear:
There are serious doubts as to whether chiropractic works for the conditions considered here: the trial evidence suggests that it’s not effective in the treatment of fibromyalgia and there’s only little evidence that it’s effective in osteoarthritis or chronic low back pain. There’s currently no evidence for rheumatoid arthritis.
Our point that chiropractic is not demonstrably effective for chronic back pain deserves some further comment, I think. It seems to be in contradiction to the guideline by NICE, as chiropractors will surely be quick to point out. How can this be?
One explanation is that, since the NICE-guidelines were drawn up, new evidence has emerged which was not positive. The recent Cochrane review, for instance, concludes that spinal manipulation “is no more effective for acute low-back pain than inert interventions, sham SMT or as adjunct therapy”
Another explanation could be that the experts on the panel writing the NICE-guideline were less than impartial towards chiropractic and thus arrived at false-positive or over-optimistic conclusions.
Chiropractors might say that my presence on the ‘Arthritis Research’-panel suggests that we were biased against chiropractic. If anything, the opposite is true: firstly, I am not even aware of having a bias against chiropractic, and no chiropractor has ever demonstrated otherwise; all I ever aim at( in my scientific publications) is to produce fair, unbiased but critical assessments of the existing evidence. Secondly, I was only one of a total of 9 panel members. As the following list shows, the panel included three experts in AM, and most sceptics would probably categorise two of them (Lewith and MacPherson) as being clearly pro-AM:
Professor Michael Doherty – professor of rheumatology, University of Nottingham
Professor Edzard Ernst – emeritus professor of complementary medicine, Peninsula Medical School
Margaret Fisken – patient representative, Aberdeenshire
Dr Gareth Jones (project lead) – senior lecturer in epidemiology, University of Aberdeen
Professor George Lewith – professor of health research, University of Southampton
Dr Hugh MacPherson – senior research fellow in health sciences, University of York
Professor Gary Macfarlane (chair of committee) – professor of epidemiology, University of Aberdeen
Professor Julius Sim – professor of health care research, Keele University
Jane Tadman – representative from Arthritis Research UK, Chesterfield
What can we conclude from all that? I think it is safe to say that the evidence for practitioner-based AMs as a treatment of the 4 named conditions is disappointing. In particular, chiropractic is not a demonstrably effective therapy for any of them. This, of course begs the question, for what condition is chiropractic proven to work! I am not aware of any, are you?
The question whether spinal manipulation is an effective treatment for infant colic has attracted much attention in recent years. The main reason for this is, of course, that a few years ago Simon Singh had disclosed in a comment that the British Chiropractic Association (BCA) was promoting chiropractic treatment for this and several other childhood condition on their website. Simon famously wrote “they (the BCA) happily promote bogus treatments” and was subsequently sued for libel by the BCA. Eventually, the BCA lost the libel action as well as lots of money, and the entire chiropractic profession ended up with enough egg on their faces to cook omelets for all their patients.
At the time, the BCA had taken advice from several medical and legal experts; one of their medical advisers, I was told, was Prof George Lewith. Intriguingly, he and several others have just published a Cochrane review of manipulative therapies for infant colic. Here are the unabbreviated conclusions from their article:
“The studies included in this meta-analysis were generally small and methodologically prone to bias, which makes it impossible to arrive at a definitive conclusion about the effectiveness of manipulative therapies for infantile colic. The majority of the included trials appeared to indicate that the parents of infants receiving manipulative therapies reported fewer hours crying per day than parents whose infants did not, based on contemporaneous crying diaries, and this difference was statistically significant. The trials also indicate that a greater proportion of those parents reported improvements that were clinically significant. However, most studies had a high risk of performance bias due to the fact that the assessors (parents) were not blind to who had received the intervention. When combining only those trials with a low risk of such performance bias, the results did not reach statistical significance. Further research is required where those assessing the treatment outcomes do not know whether or not the infant has received a manipulative therapy. There are inadequate data to reach any definitive conclusions about the safety of these interventions”
Cochrane reviews also carry a “plain language” summary which might be easier to understand for lay people. And here are the conclusions from this section of the review:
The studies involved too few participants and were of insufficient quality to draw confident conclusions about the usefulness and safety of manipulative therapies. Although five of the six trials suggested crying is reduced by treatment with manipulative therapies, there was no evidence of manipulative therapies improving infant colic when we only included studies where the parents did not know if their child had received the treatment or not. No adverse effects were found, but they were only evaluated in one of the six studies.
If we read it carefully, this article seems to confirm that there is no reliable evidence to suggest that manipulative therapies are effective for infant colic. In the analyses, the positive effect disappears, if the parents are properly blinded; thus it is due to expectation or placebo. The studies that seem to show a positive effect are false positive, and spinal manipulation is, in fact, not effective.
The analyses disclose another intriguing aspect: most trials failed to mention adverse effects. This confirms the findings of our own investigation and amounts to a remarkable breach of publication ethics (nobody seems to be astonished by this fact; is it normal that chiropractic researchers ignore generally accepted rules of ethics?). It also reflects badly on the ability of the investigators of the primary studies to be objective. They seem to aim at demonstrating only the positive effects of their intervention; science is, however, not about confirming the researchers’ prejudices, it is about testing hypotheses.
The most remarkable thing about the new Cochrane review is, I think, the in-congruence of the actual results and the authors’ conclusion. To a critical observer, the former are clearly negative but the latter sound almost positive. I think this begs the question about the possibility of reviewer bias.
We have recently discussed on this blog whether reviews by one single author are necessarily biased. The new Cochrane review has 6 authors, and it seems to me that its conclusions are considerably more biased than my single-author review of chiropractic spinal manipulation for infant colic; in 2009, I concluded simply that “the claim [of effectiveness] is not based on convincing data from rigorous clinical trials”.
Which of the two conclusions describe the facts more helpfully and more accurately?
I think, I rest my case.
In my last post, we discussed the “A+B versus B” trial design as a tool to produce false positive results. This method is currently very popular in alternative medicine, yet it is by no means the only approach that can mislead us. Today, let’s look at other popular options with a view of protecting us against trialists who naively or willfully might fool us.
The crucial flaw of the “A+B versus B” design is that it fails to account for non-specific effects. If the patients in the experimental group experience better outcomes than the control group, this difference could well be due to effects that are unrelated to the experimental treatment. There are, of course, several further ways to ignore non-specific effects in clinical research. The simplest option is to include no control group at all. Homeopaths, for instance, are very proud of studies which show that ~70% of their patients experience benefit after taking their remedies. This type of result tends to impress journalists, politicians and other people who fail to realise that such a result might be due to a host of factors, e.g. the placebo-effect, the natural history of the disease, regression towards the mean or treatments which patients self-administered while taking the homeopathic remedies. It is therefore misleading to make causal inferences from such data.
Another easy method to generate false positive results is to omit blinding. The purpose of blinding the patient, the therapist and the evaluator of the outcomes in clinical trials is to make sure that expectation is not the cause of or contributor to the outcome. They say that expectation can move mountains; this might be an exaggeration, but it can certainly influence the result of a clinical trial. Patients who hope for a cure regularly do get better even if the therapy they receive is useless, and therapists as well as evaluators of the outcomes tend to view the results through rose-tinted spectacles, if they have preconceived ideas about the experimental treatment. Similarly, the parents of a child or the owners of an animal can transfer their expectations, and this is one of several reasons why it is incorrect to claim that children and animals are immune to placebo-effects.
Failure to randomise is another source of bias which can make an ineffective therapy look like an effective one when tested in a clinical trial. If we allow patients or trialists to select or choose which patients receive the experimental and which get the control-treatment, it is likely that the two groups differ in a number of variables. Some of these variables might, in turn, impact on the outcome. If, for instance, doctors allocate their patients to the experimental and control groups, they might select those who will respond to the former and those who don’t to the latter. This may not happen with malicious intent but through intuition or instinct: responsible health care professionals want those patients who, in their experience, have the best chances to benefit from a given treatment to receive that treatment. Only randomisation can, when done properly, make sure we are comparing comparable groups of patients, and non-randomisation is likely to produce misleading findings.
While these options for producing false positives are all too obvious, the next possibility is slightly more intriguing. It refers to studies which do not test whether an experimental treatment is superior to another one (often called superiority trials), but to investigations attempting to assess whether it is equivalent to a therapy that is generally accepted to be effective. The idea is that, if both treatments produce the same or similarly positive results, both must be effective. For instance, such a study might compare the effects of acupuncture to a common pain-killer. Such trials are aptly called non-superiority or equivalence trials, and they offer a wide range of possibilities for misleading us. If, for example, such a trial has not enough patients, it might show no difference where, in fact, there is one. Let’s consider a deliberately silly example: someone comes up with the idea to compare antibiotics to acupuncture as treatments of bacterial pneumonia in elderly patients. The researchers recruit 10 patients for each group, and the results reveal that, in one group, 2 patients died, while, in the other, the number was 3. The statistical tests show that the difference of just one patient is not statistically significant, and the authors therefore conclude that acupuncture is just as good for bacterial infections as antibiotics.
Even trickier is the option to under-dose the treatment given to the control group in an equivalence trial. In our hypothetical example, the investigators might subsequently recruit hundreds of patients in an attempt to overcome the criticism of their first study; they then decide to administer a sub-therapeutic dose of the antibiotic in the control group. The results would then apparently confirm the researchers’ initial finding, namely that acupuncture is as good as the antibiotic for pneumonia. Acupuncturists might then claim that their treatment has been proven in a very large randomised clinical trial to be effective for treating this condition, and people who do not happen to know the correct dose of the antibiotic could easily be fooled into believing them.
Obviously, the results would be more impressive, if the control group in an equivalence trial received a therapy which is not just ineffective but actually harmful. In such a scenario, the most useless or even slightly detrimental treatment would appear to be effective simply because it is equivalent to or less harmful than the comparator.
A variation of this theme is the plethora of controlled clinical trials which compare one unproven therapy to another unproven treatment. Perdicatbly, the results indicate that there is no difference in the clinical outcome experienced by the patients in the two groups. Enthusiastic researchers then tend to conclude that this proves both treatments to be equally effective.
Another option for creating misleadingly positive findings is to cherry-pick the results. Most trails have many outcome measures; for instance, a study of acupuncture for pain-control might quantify pain in half a dozen different ways, it might also measure the length of the treatment until pain has subsided, the amount of medication the patients took in addition to receiving acupuncture, the days off work because of pain, the partner’s impression of the patient’s health status, the quality of life of the patient, the frequency of sleep being disrupted by pain etc. If the researchers then evaluate all the results, they are likely to find that one or two of them have changed in the direction they wanted. This can well be a chance finding: with the typical statistical tests, one in 20 outcome measures would produce a significant result purely by chance. In order to mislead us, the researchers only need to “forget” about all the negative results and focus their publication on the ones which by chance have come out as they had hoped.
One fail-proof method for misleading the public is to draw conclusions which are not supported by the data. Imagine you have generated squarely negative data with a trial of homeopathy. As an enthusiast of homeopathy, you are far from happy with your own findings; in addition you might have a sponsor who puts pressure on you. What can you do? The solution is simple: you only need to highlight at least one positive message in the published article. In the case of homeopathy, you could, for instance, make a major issue about the fact that the treatment was remarkably safe and cheap: not a single patient died, most were very pleased with the treatment which was not even very expensive.
And finally, there is always the possibility of overt cheating. Researchers are only human and are thus not immune to temptation. They may have conflicts of interest or may know that positive results are much easier to publish than negative ones. Certainly they want to publish their work – “publish or perish”! So, faced with disappointing results of a study, they might decide to prettify them or even invent new ones which are more pleasing to them, their peers, or their sponsors.
Am I claiming that this sort of thing only happens in alternative medicine? No! Obviously, the way to minimise the risk of such misconduct is to train researchers properly and make sure they are able to think critically. Am I suggesting that investigators of alternative medicine are often not well-trained and almost always uncritical? Yes.
What is and what isn’t evidence, and why is the distinction important?
In the area of alternative medicine, we tend to engage in sheer endless discussions around the subject of evidence; the relatively few comments on this new blog already confirm this impression. Many practitioners claim that their very own clinical experience is at least as important and generalizable as scientific evidence. It is therefore relevant to analyse in a little more detail some of the issues related to evidence as they apply to the efficacy of alternative therapies.
To prevent the debate from instantly deteriorating into a dispute about the value of this or that specific treatment, I will abstain from mentioning any alternative therapy by name and urge all commentators to do the same. The discussion on this post should not be about the value of homeopathy or any other alternative treatment; it is about more fundamental issues which, in my view, often get confused in the usually heated arguments for or against a specific alternative treatment.
My aim here is to outline the issues more fully than would be possible in the comments section of this blog. Readers and commentators can subsequently be referred to this post whenever appropriate. My hope is that, in this way, we might avoid repeating the same arguments ad nauseam.
Clinical experience is notoriously unreliable
Clinicians often feel quite strongly that their daily experience holds important information about the efficacy of their interventions. In this assumption, alternative practitioners are usually entirely united with healthcare professionals working in conventional medicine.
When their patients get better, they assume this to be the result of their treatment, especially if the experience is repeated over and over again. As an ex-clinician, I do sympathise with this notion which might even prevent practitioners from losing faith in their own work. But is the assumption really correct?
The short answer is NO. Two events [the treatment and the improvement] that follow each other in time are not necessarily causally related; we all know that, of course. So, we ought to consider alternative explanations for a patient’s improvement after therapy.
Even the most superficial scan of the possibilities discloses several options: the natural history of the condition, regression towards the mean, the placebo-effect, concomitant treatments, social desirability to name but a few. These and other phenomena can contribute to or determine the clinical outcome such that inefficacious treatments appear to be efficacious.
What follows is simple, undeniable and plausible for scientists, yet intensely counter-intuitive for clinicians: the prescribed treatment is only one of many influences on the clinical outcome. Thus even the most impressive clinical experience of the perceived efficacy of a treatment can be totally misleading. In fact, experience might just reflect the fact that we repeat the same mistake over and over again. Put differently, the plural of anecdote is anecdotes, not evidence!
Clinicians tend to get quite miffed when anyone tries to explain to them how multifactorial the situation really is and how little their much-treasured experience tells us about therapeutic efficacy. Here are seven of the counter-arguments I hear most frequently:
1) The improvement was so direct and prompt that it was obviously caused by my treatment [this notion is not very convincing; placebo-effects can be just as prompt and direct].
2) I have seen it so many times that it cannot be a coincidence [some clinicians are very caring, charismatic, and empathetic; they will thus regularly generate powerful placebo-responses, even when using placebos].
3) A study with several thousand patients shows that 75% of them improved with my treatment [such response rates are not uncommon, even for ineffective treatments, if patient-expectation was high].
4) Surely chronic conditions don’t suddenly get better; my treatment therefore cannot be a placebo [this is incorrect, eventually many chronic conditions improve, if only temporarily].
5) I had a patient with a serious condition, e.g. cancer, who received my treatment and was cured [if one investigates such cases, one often finds that the patient also took a conventional treatment; or, in rare instances, even cancer-patients show spontaneous remissions].
6) I have tried the treatment myself and had a positive outcome [clinicians are not immune to the multifactorial nature of the perceived clinical response].
7) Even children and animals respond very well to my treatment, surely they are not prone to placebo-effects [animals can be conditioned to respond; and then there is, of course, the natural history of the disease].
Is all this to say that clinical experience is useless? Clearly not! I am merely pointing out that, when it comes to therapeutic efficacy, clinical experience is no replacement for evidence. It is invaluable for a lot of other things, but it can at best provide a hint and never a proof of efficacy.
What then is reliable evidence?
As the clinical outcomes after treatments always have many determinants, we need a different approach for verifying therapeutic efficacy. Essentially, we need to know what would have happened, if our patients had not received the treatment in question.
The multifactorial nature of any clinical response requires controlling for all the factors that might determine the outcome other than the treatment per se. Ideally, we would need to create a situation or an experiment where two groups of patients are exposed to the full range of factors, and the only difference is that one group does receive the treatment, while the other one does not. And this is precisely the model of a controlled clinical trial.
Such studies are designed to minimise all possible sources of bias and confounding. By definition, they have a control group which means that we can, at the end of the treatment period, compare the effects of the treatment in question with those of another intervention, a placebo or no treatment at all.
Many different variations of the controlled trial exist so that the exact design can be adapted to the requirements of the particular treatment and the specific research question at hand. The over-riding principle is, however, always the same: we want to make sure that we can reliably determine whether or not the treatment was the cause of the clinical outcome.
Causality is the key in all of this; and here lies the crucial difference between clinical experience and scientific evidence. What clinician witness in their routine practice can have a myriad of causes; what scientists observe in a well-designed efficacy trial is, in all likelihood, caused by the treatment. The latter is evidence, while the former is not.
Don’t get me wrong; clinical trials are not perfect. They can have many flaws and have rightly been criticised for a myriad of inherent limitations. But it is important to realise that, despite all their short-commings, they are far superior than any other method for determining the efficacy of medical interventions.
There are lots of reasons why a trial can generate an incorrect, i.e. a false positive or a false negative result. We therefore should avoid relying on the findings of a single study. Independent replications are usually required before we can be reasonably sure.
Unfortunately, the findings of these replications do not always confirm the results of the previous study. Whenever we are faced with conflicting results, it is tempting to cherry-pick those studies which seem to confirm our prior belief – tempting but very wrong. In order to arrive at the most reliable conclusion about the efficacy of any treatment, we need to consider the totality of the reliable evidence. This goal is best achieved by conducting a systematic review.
In a systematic review, we assess the quality and quantity of the available evidence, try to synthesise the findings and arrive at an overall verdict about the efficacy of the treatment in question. Technically speaking, this process minimises selection and random biases. Systematic reviews and meta-analyses [these are systematic reviews that pool the data of individual studies] therefore constitute, according to a consensus of most experts, the best available evidence for or against the efficacy of any treatment.
Why is evidence important?
In a way, this question has already been answered: only with reliable evidence can we tell with any degree of certainty that it was the treatment per se – and not any of the other factors mentioned above – that caused the clinical outcome we observe in routine practice. Only if we have such evidence can we be sure about cause and effect. And only then can we make sure that patients receive the best possible treatments currently available.
There are, of course, those who say that causality does not matter all that much. What is important, they claim, is to help the patient, and if it was a placebo-effect that did the trick, who cares? However, I know of many reasons why this attitude is deeply misguided. To mention just one: we probably all might agree that the placebo-effect can benefit many patients, yet it would be a fallacy to assume that we need a placebo treatment to generate a placebo-response.
If a clinician administers an efficacious therapy [one that generates benefit beyond placebo] with compassion, time, empathy and understanding, she will generate a placebo-response PLUS a response to the therapy administered. In this case, the patient benefits twice. It follows that, merely administering a placebo is less than optimal; in fact it usually means cheating the patient of the effect of an efficacious therapy.
The frequently voiced counter-argument is that there are many patients who are ill without an exact diagnosis and who therefore cannot receive a specific treatment. This may be true, but even those patients’ symptoms can usually be alleviated with efficacious symptomatic therapy, and I fail to see how the administration of an ineffective treatment might be preferable to using an effective symptomatic therapy.
We all agree that helping the patient is the most important task of a clinician. This task is best achieved by maximising the non-specific effects [e.g. placebo], while also making sure that the patient benefits from the specific effects of what medicine has to offer. If that is our goal in clinical practice, we need reliable evidence and experience. Therefore one cannot be a substitute for the other, and scientific evidence is an essential precondition for good medicine.