During the last decade, Professor Claudia Witt and co-workers from the Charité in Berlin have published more studies of homeopathy than any other research group. Many of their conclusions are over-optimistic and worryingly uncritical, in my view. Their latest article is on homeopathy as a treatment of eczema. As it happens, I have recently published a systematic review of this subject; it concluded that “the evidence from controlled clinical trials… fails to show that homeopathy is an efficacious treatment for eczema”. The question therefore arises whether the latest publication of the Berlin team changes my conclusion in any way.

Their new article describes a prospective multi-centre study which included 135 children with mild to moderate atopic eczema. The parents of the children enrolled in this trial were able to choose either homeopathic or conventional doctors, who then treated the children as they saw fit. The article gives only scant details about the actual treatments administered. The main outcome of the study was a validated symptom score at 36 months. Further endpoints included quality of life, conventional medicine consumption, safety and disease-related costs at six, 12 and 36 months.

The results showed no significant differences between the groups at 36 months. However, the children treated conventionally seemed to improve more quickly than those in the homeopathy group. The total costs were about twice as high in the homoeopathic group as in the conventional group. The authors conclude as follows: “Taking patient preferences into account, while being unable to rule out residual confounding, in this long-term observational study, the effects of homoeopathic treatment were not superior to conventional treatment for children with mild to moderate atopic eczema, but involved higher costs”.

At least one previous report of this study has been available for some time and had thus been included in my systematic review. It is therefore unlikely that this new analysis would change my conclusion, particularly as the trial by Witt et al has many flaws. Here are just some of the most obvious ones:

Patients were selected according to parents’ preferences.

This means expectations could have played an important role.

It also means that the groups were not comparable in various, potentially important prognostic variables.

Even though much of the article reads as though the homeopaths exclusively employed homeopathic remedies, the truth is that both groups received similar amounts of conventional care and treatments. In other words, the study followed an ‘A+B versus B’ design (here is the sentence that best gives the game away: “At 36 months the frequency of daily basic skin care was… comparable in both groups, as was the number of different medications (including corticosteroids and antihistamines)…”). I have previously stated that this type of study-design can never produce a negative result because A+B is always more than B.

Yet, at first glance, this new study seems to prove my thesis wrong: even though the parents chose their preferred options, and even though all patients were treated conventionally, the addition of homeopathy to conventional care failed to produce a better clinical outcome. On the contrary, the homeopathically treated kids had to wait longer for their symptoms to ease. The only significant difference was that the addition of homeopathy to conventional eczema treatments was much more expensive than conventional therapy alone (this finding is less than remarkable: even the most useless additional intervention costs money).

So, is my theory about ‘A+B versus B’ study-designs wrong? I don’t think so. If A contributes nothing at all, one would expect exactly the finding Witt et al produced: 0 + B = B. In turn, this is not a compliment for the homeopaths of this study: they seem to have been incapable of even generating a placebo-response. And this might indicate that homeopathy was not even useful as a means of generating a placebo-response. Whatever interpretation one adopts, this study tells us very little of value (as children often grow out of eczema, we cannot even be sure whether the results are not simply a reflection of the natural history of the disease); in my view, it merely demonstrates that weak study designs can only create weak findings which, in this particular case, are next to useless.
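For readers who like to see the argument in numbers, here is a minimal simulation of the ‘A+B versus B’ logic. All figures are my own illustrative assumptions, not data from the Witt trial: A stands for the homeopathic add-on, B for conventional care.

```python
# A minimal sketch of the 'A+B versus B' argument; all numbers are
# illustrative assumptions, not data from the trial discussed above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200                       # assumed patients per group
improvement_from_b = 10.0     # symptom-score improvement under conventional care (B)
placebo_effect_of_a = 3.0     # non-specific boost from adding 'something extra' (A)
noise = 5.0

b_alone = rng.normal(improvement_from_b, noise, n)

# If A generates even a modest placebo response, A+B beats B:
# the design yields a 'positive' result for an inert add-on.
a_plus_b = rng.normal(improvement_from_b + placebo_effect_of_a, noise, n)
print(stats.ttest_ind(a_plus_b, b_alone).pvalue)        # typically < 0.05

# If A generates nothing at all (0 + B = B), no difference is detectable,
# which is the pattern reported by Witt et al.
zero_plus_b = rng.normal(improvement_from_b, noise, n)
print(stats.ttest_ind(zero_plus_b, b_alone).pvalue)     # typically > 0.05
```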

The study was sponsored by the Robert Bosch Stiftung, an organisation which claims to be dedicated to excellence in research and which has, in the past, spent millions on researching homeopathy. It seems doubtful that trials of this calibre can live up to any claim of excellence. In any case, the new analysis is certainly no reason to change the conclusion of my systematic review.

To their credit, Witt et al are well aware of the many weaknesses of their study. Perhaps in an attempt to make them appear less glaring, they stress that “the aim of this study was to reflect the real world situation”. Usually I do not accept the argument that pragmatic trials cannot be rigorous – but I think Witt et al do have a point here: the real world tells us that homeopathic remedies are pure placebos!

As I am drafting this post, I am in a plane flying back from Finland. The in-flight meal reminded me of the fact that no food is so delicious that it cannot be spoilt by the addition of too many capers. In turn, this made me think about the paper I happened to be reading at the time, and I arrived at the following theory: no trial design is so rigorous that it cannot be turned into something utterly nonsensical by the addition of a few amateur researchers.

The paper I was reading when this idea occurred to me was a randomised, triple-blind, placebo-controlled cross-over trial of homeopathy. Sounds rigorous and top quality? Yes, but wait!

Essentially, the authors recruited 86 volunteers who all claimed to be suffering from “mental fatigue” and treated them with Kali-Phos 6X or placebo for one week (X-potencies signify dilution steps of 1:10, and 6X therefore means that the salt had been diluted 1:1,000,000). Subsequently, the volunteers were crossed over to receive the other treatment for one week.
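For anyone unfamiliar with homeopathic nomenclature, the arithmetic is simple enough to spell out; the little helper below is purely illustrative.

```python
# The potency arithmetic spelled out: an nX remedy is a 1:10 dilution
# repeated n times (the function name is my own, for illustration only).
def x_potency_dilution(n: int) -> float:
    return 0.1 ** n

print(x_potency_dilution(6))   # 1e-06, i.e. the 1:1,000,000 quoted above
```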

The results failed to show that the homeopathic medication had any effect (not even homeopaths can be surprised about this!). The authors concluded that Kali-Phos was not effective but cautioned that, because of the possibility of a type II error, they might have missed an effect which, in truth, does exist.

In my view, this article provides an almost classic example of how time, money and other resources can be wasted in a pretence of conducting reasonable research. As we all know, clinical trials usually are for testing hypotheses. But what is the hypothesis tested here?

According to the authors, the aim was to “assess the effectiveness of Kali-Phos 6X for attention problems associated with mental fatigue”. In other words, their hypothesis was that this remedy is effective for treating the symptom of mental fatigue. This notion, I would claim, is not a scientific hypothesis; it is a foolish conjecture!

Arguably any hypothesis about the effectiveness of a highly diluted homeopathic remedy is mere wishful thinking. But, if there were at least some promising data, some might conclude that a trial was justified. By way of justification for the RCT in question, the authors inform us that one previous trial had suggested an effect; however, that study did not employ just Kali-Phos but a combined homeopathic preparation which contained Kalium-Phos as one of several components. Thus the authors’ “hypothesis” does not even amount to a hunch, not even to a slight inkling! To me, it is less than a shot in the dark fired by blind optimists – nobody should be surprised that the bullet failed to hit anything.

It could even be that the investigators themselves dimly realised that something was amiss with the basis of their study; this might be the reason why they called it an “exploratory trial”. But an exploratory study is one without a hypothesis, and the trial in question does have a hypothesis of sorts – only that it is rubbish. And what exactly did the authors mean to explore anyway?

That self-reported mental fatigue in healthy volunteers is a condition that can be medicalised such that it merits treatment?

That the test they used for quantifying its severity is adequate?

That a homeopathic remedy with virtually no active ingredient generates outcomes which are different from placebo?

That Hahnemann’s teaching of homeopathy was nonsense and can thus be discarded (he would have sharply condemned the approach of treating all volunteers with the same remedy, as it contradicts many of his concepts)?

That funding bodies can be fooled into paying for even the most ridiculous trial?

That ethics-committees might pass applications which are pure nonsense and which are thus unethical?

A scientific hypothesis should be more than a vague hunch; at its simplest, it aims to explain an observation or phenomenon, and it ought to have certain features which many alt med researchers seem to have never heard of. If they test nonsense, the result can only be nonsense.

The issue of conducting research that does not make much sense is far from trivial, particularly as so much (I would say most) of alt med research is of such or even worse calibre (if you do not believe me, please go on Medline and see for yourself how many of the recent articles in the category “complementary alternative medicine” truly contribute to knowledge worth knowing). It would therefore be easy to cite more hypothesis-free trials of homeopathy.

One recent example from Germany will have to suffice: in this trial, the only justification for conducting a full-blown RCT was that the manufacturer of the remedy allegedly knew of a few unpublished case-reports which suggested the treatment worked – and, of course, the results of the RCT eventually showed that it didn’t. Anyone with a background in science might have predicted that outcome – which is why such trials are so deplorably wasteful.

Research funds are increasingly scarce, and they must not be spent on nonsensical projects! The money and time should be invested more fruitfully elsewhere. Participants of clinical trials give their cooperation willingly; but if they learn that their efforts have been wasted, they might think twice next time they are asked. Thus nonsensical research may have knock-on effects with far-reaching consequences.

Being a researcher is at least as serious a profession as most other occupations; perhaps we should stop allowing total amateurs to waste money while playing at being professionals. If someone driving a car does something seriously wrong, we take away his licence; why is there no similar mechanism for inadequate researchers, funders and ethics-committees that prevents them from doing further damage?

At the very minimum, we should critically evaluate the hypotheses that applicants for research funds propose to test. Had someone done this properly in relation to the two above-named studies, we would have saved about £150,000 per trial (my estimate). But as it stands, the authors will probably claim that they have produced fascinating findings which urgently need further investigation – and we (normally you and I) will have to spend three times the above-named amount (again, my estimate) to finance a “definitive” trial. Nonsense, I am afraid, tends to beget more nonsense.

 

In my last post, we discussed the “A+B versus B” trial design as a tool to produce false-positive results. This method is currently very popular in alternative medicine, yet it is by no means the only approach that can mislead us. Today, let’s look at other popular options with a view to protecting ourselves against trialists who, naively or wilfully, might fool us.

The crucial flaw of the “A+B versus B” design is that it fails to account for non-specific effects. If the patients in the experimental group experience better outcomes than those in the control group, this difference could well be due to effects that are unrelated to the experimental treatment. There are, of course, several further ways to ignore non-specific effects in clinical research. The simplest option is to include no control group at all. Homeopaths, for instance, are very proud of studies which show that ~70% of their patients experience benefit after taking their remedies. This type of result tends to impress journalists, politicians and other people who fail to realise that such a result might be due to a host of factors, e.g. the placebo-effect, the natural history of the disease, regression towards the mean or treatments which patients self-administered while taking the homeopathic remedies. It is therefore misleading to make causal inferences from such data.
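To illustrate just how easily an uncontrolled case series can produce impressive-looking ‘response rates’, here is a small simulation; the numbers are mine and purely illustrative – patients are assumed to enrol during a flare-up and to receive nothing that works.

```python
# Sketch (illustrative numbers only): regression towards the mean plus
# natural fluctuation can make a majority of patients 'improve' in an
# uncontrolled case series, even when the treatment does nothing.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
usual_severity = rng.normal(50, 10, n)               # each patient's long-run level

# Patients tend to seek treatment when a random flare makes them worse than usual
at_enrolment = usual_severity + np.abs(rng.normal(0, 8, n))
# Follow-up is simply another random fluctuation around the usual level
at_follow_up = usual_severity + rng.normal(0, 8, n)

improved = np.mean(at_follow_up < at_enrolment)
print(f"{improved:.0%} 'improved' without any effective treatment")   # roughly 70-75%
```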

Another easy method to generate false positive results is to omit blinding. The purpose of blinding the patient, the therapist and the evaluator of the outcomes in clinical trials is to make sure that expectation is not the cause of or contributor to the outcome. They say that expectation can move mountains; this might be an exaggeration, but it can certainly influence the result of a clinical trial. Patients who hope for a cure regularly do get better even if the therapy they receive is useless, and therapists as well as evaluators of the outcomes tend to view the results through rose-tinted spectacles, if they have preconceived ideas about the experimental treatment. Similarly, the parents of a child or the owners of an animal can transfer their expectations, and this is one of several reasons why it is incorrect to claim that children and animals are immune to placebo-effects.

Failure to randomise is another source of bias which can make an ineffective therapy look like an effective one when tested in a clinical trial. If we allow patients or trialists to select or choose which patients receive the experimental and which get the control treatment, it is likely that the two groups differ in a number of variables. Some of these variables might, in turn, impact on the outcome. If, for instance, doctors allocate their patients to the experimental and control groups, they might select those who will respond to the former and those who won’t to the latter. This may not happen with malicious intent but through intuition or instinct: responsible health care professionals want those patients who, in their experience, have the best chances of benefiting from a given treatment to receive that treatment. Only randomisation can, when done properly, make sure we are comparing comparable groups of patients, and non-randomisation is likely to produce misleading findings.
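The effect of such ‘allocation by intuition’ is easy to demonstrate in a toy example; again, all numbers below are my own assumptions.

```python
# Sketch of allocation bias (assumed numbers): patients with a better
# prognosis are steered towards the 'experimental' arm, so an inert
# treatment appears effective; proper randomisation removes the illusion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400
prognosis = rng.normal(0, 1, n)                      # unmeasured prognostic factor
outcome = 10 + 5 * prognosis + rng.normal(0, 3, n)   # the treatment itself does nothing

# Non-random allocation: good-prognosis patients more often get the new therapy
biased_arm = rng.random(n) < 0.5 + 0.3 * np.tanh(prognosis)
print(stats.ttest_ind(outcome[biased_arm], outcome[~biased_arm]).pvalue)  # typically 'significant'

# Randomised allocation: the spurious difference disappears
random_arm = rng.random(n) < 0.5
print(stats.ttest_ind(outcome[random_arm], outcome[~random_arm]).pvalue)  # typically > 0.05
```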

While these options for producing false positives are all too obvious, the next possibility is slightly more intriguing. It concerns studies which do not test whether an experimental treatment is superior to another one (so-called superiority trials) but whether it is equivalent to a therapy that is generally accepted to be effective. The idea is that, if both treatments produce the same or similarly positive results, both must be effective. For instance, such a study might compare the effects of acupuncture to a common pain-killer. Such trials are called equivalence or non-inferiority trials, and they offer a wide range of possibilities for misleading us. If, for example, such a trial does not have enough patients, it might show no difference where, in fact, there is one. Let’s consider a deliberately silly example: someone comes up with the idea to compare antibiotics to acupuncture as treatments of bacterial pneumonia in elderly patients. The researchers recruit 10 patients for each group, and the results reveal that, in one group, 2 patients died, while, in the other, the number was 3. The statistical tests show that the difference of just one patient is not statistically significant, and the authors therefore conclude that acupuncture is just as good for bacterial infections as antibiotics.
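To make the silly example concrete: with 2 versus 3 deaths in groups of 10, no conventional test will find a ‘significant’ difference, simply because the trial is far too small – as a quick calculation shows.

```python
# The deliberately silly example in numbers: 2/10 versus 3/10 deaths.
from scipy.stats import fisher_exact

table = [[2, 8],   # antibiotic arm: deaths, survivors
         [3, 7]]   # acupuncture arm: deaths, survivors
odds_ratio, p_value = fisher_exact(table)
print(p_value)     # = 1.0: 'no significant difference', yet a sample of
                   # 10 per group could not detect even a large real difference
```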

Even trickier is the option to under-dose the treatment given to the control group in an equivalence trial. In our hypothetical example, the investigators might subsequently recruit hundreds of patients in an attempt to overcome the criticism of their first study; they then decide to administer a sub-therapeutic dose of the antibiotic in the control group. The results would then apparently confirm the researchers’ initial finding, namely that acupuncture is as good as the antibiotic for pneumonia. Acupuncturists might then claim that their treatment has been proven in a very large randomised clinical trial to be effective for treating this condition, and people who do not happen to know the correct dose of the antibiotic could easily be fooled into believing them.

Obviously, the results would be more impressive, if the control group in an equivalence trial received a therapy which is not just ineffective but actually harmful. In such a scenario, the most useless or even slightly detrimental treatment would appear to be effective simply because it is equivalent to or less harmful than the comparator.

A variation of this theme is the plethora of controlled clinical trials which compare one unproven therapy to another unproven treatment. Predictably, the results indicate that there is no difference in the clinical outcome experienced by the patients in the two groups. Enthusiastic researchers then tend to conclude that this proves both treatments to be equally effective.

Another option for creating misleadingly positive findings is to cherry-pick the results. Most trials have many outcome measures; for instance, a study of acupuncture for pain-control might quantify pain in half a dozen different ways, and it might also measure the length of the treatment until pain has subsided, the amount of medication the patients took in addition to receiving acupuncture, the days off work because of pain, the partner’s impression of the patient’s health status, the quality of life of the patient, the frequency of sleep being disrupted by pain, etc. If the researchers then evaluate all the results, they are likely to find that one or two of them have changed in the direction they wanted. This can well be a chance finding: with the typical statistical tests, one in 20 outcome measures would produce a significant result purely by chance. In order to mislead us, the researchers only need to “forget” about all the negative results and focus their publication on the ones which by chance have come out as they had hoped.
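The ‘one in 20’ point can be quantified: assuming 20 independent outcome measures and the usual 5% significance threshold, the chance of at least one falsely ‘positive’ result is already close to two in three.

```python
# Chance of at least one spurious 'significant' outcome among 20 measures,
# assuming independence and the conventional 5% threshold.
alpha = 0.05
n_outcomes = 20
p_at_least_one = 1 - (1 - alpha) ** n_outcomes
print(f"{p_at_least_one:.0%}")   # about 64%
```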

One foolproof method for misleading the public is to draw conclusions which are not supported by the data. Imagine you have generated squarely negative data with a trial of homeopathy. As an enthusiast of homeopathy, you are far from happy with your own findings; in addition, you might have a sponsor who puts pressure on you. What can you do? The solution is simple: you only need to highlight at least one positive message in the published article. In the case of homeopathy, you could, for instance, make a major issue of the fact that the treatment was remarkably safe and cheap: not a single patient died, and most were very pleased with the treatment, which was not even very expensive.

And finally, there is always the possibility of overt cheating. Researchers are only human and are thus not immune to temptation. They may have conflicts of interest or may know that positive results are much easier to publish than negative ones. Certainly they want to publish their work – “publish or perish”! So, faced with disappointing results of a study, they might decide to prettify them or even invent new ones which are more pleasing to them, their peers, or their sponsors.

Am I claiming that this sort of thing only happens in alternative medicine? No! Obviously, the way to minimise the risk of such misconduct is to train researchers properly and make sure they are able to think critically. Am I suggesting that investigators of alternative medicine are often not well-trained and almost always uncritical? Yes.

Since it was first published, the “Swiss government report” on homeopathy has been celebrated as the most convincing proof so far that homeopathy works. On the back of this news, all sorts of strange stories have emerged. Their aim seems to be to convince consumers that homeopathy is based on compelling evidence.

Readers of this blog might therefore benefit from a brief and critical evaluation of this “evidence” in support of homeopathy. Recently, not one, two, three but four independent critiques of this document have become available.

Collectively, these articles [only one of which is mine] suggest that the “Swiss report” is hardly worth the paper it was written on; one of the critiques published in the Swiss Medical Weekly even stated that it amounted to “research misconduct”! Compared to such outspoken language, my own paper concluded much more conservatively: “this report [is] methodologically flawed, inaccurate and biased”.

So what is wrong with it? Why is this document not an accurate summary of the existing evidence? I said this would be a brief post, so I will mention only some of the most striking flaws.

The report is not, as often claimed, a product of the Swiss government; in fact, it was produced by 13 authors who have no connection to any government and who are known proponents of homeopathy. For some unfathomable reason, they decided to invent their very own criteria for what constitutes evidence. For instance, they included case-reports and case-series, re-defined what is meant by effectiveness, were highly selective in choosing the articles they happened to like [presumably because of the direction of the result] while omitting lots of data that did not seem to confirm their prior belief, and assessed only a very narrow range of indications.

The report quotes several of my own reviews of homeopathy but, intriguingly, it omitted others for no conceivable reason. I was baffled to realise that the authors reported my conclusions differently from the original published text in my articles. If this had occurred once or twice, it might have been a forgivable error – but this happened in 10 of 22 instances.

Negative conclusions in my original reviews were thus repeatedly turned into positive verdicts, and evidence against homeopathy suddenly appeared to support it. This is, of course, a serious problem: if someone is too busy to look up my original articles, she is very unlikely to notice this extraordinary attempt to cheat.

To me, this approach seems similar to that of an accountant who produces a balance sheet where debts appear as credits. It is a simple yet dishonest way to generate a positive result where there is none!

The final straw for me came when I realised that the authors of this dubious report had declared that they were free of conflicts of interest. This notion is demonstrably wrong; several of them earn their living through homeopathy!

Knowing all this, sceptics might take any future praise of this “Swiss government report” with more than just a pinch of salt. Once we are aware of the full, embarrassing details, it is not difficult to understand how the final verdict turned out to be in favour of homeopathy: if we convert much of the negative data on any subject into positive evidence, any rubbish will come out smelling of roses – even homeopathy.

 

Is acupuncture an effective treatment for pain? This is a question which has attracted decades of debate and controversy. Proponents usually argue that it is supported by good clinical evidence, millennia of tradition and a sound understanding of the mechanisms involved. Sceptics, however, tend to be unimpressed and point out that the clinical evidence cited by proponents is often cherry-picked, that a long history of usage is fairly meaningless, and that the alleged mechanisms are tentative at best.

This discrepancy of opinions is confusing, particularly for lay people who might be tempted to try acupuncture. But it might vanish in the light of a new, comprehensive and unique evaluation of the clinical evidence.

An international team of acupuncture trialists published a meta-analysis of individual patient data to determine the analgesic effect of acupuncture compared to sham or non-acupuncture controls for the following four chronic pain conditions: back and neck pain, osteoarthritis, headache, and shoulder pain. Data from 29 RCTs, with an impressive total of 17,922 patients, were included.

The results of this new evaluation suggest that acupuncture is superior to both sham and no-acupuncture controls for each of these conditions. Patients receiving acupuncture had less pain, with scores that were 0.23 (95% CI, 0.13-0.33), 0.16 (95% CI, 0.07-0.25), and 0.15 (95% CI, 0.07-0.24) SDs lower than those of sham controls for back and neck pain, osteoarthritis, and chronic headache, respectively; the effect sizes in comparison to no-acupuncture controls were 0.55 (95% CI, 0.51-0.58), 0.57 (95% CI, 0.50-0.64), and 0.42 (95% CI, 0.37-0.46) SDs.

Based on these findings, the authors reached the conclusion that “acupuncture is effective for the treatment of chronic pain and is therefore a reasonable referral option. Significant differences between true and sham acupuncture indicate that acupuncture is more than a placebo. However, these differences are relatively modest, suggesting that factors in addition to the specific effects of needling are important contributors to the therapeutic effects of acupuncture”.

Only hours after its publication, this new meta-analysis was celebrated by believers in acupuncture as the strongest evidence yet on the topic. Much of the lay press followed in the same, disappointingly uncritical vein. The authors of the meta-analysis, most of whom are known enthusiasts of acupuncture, seem entirely sure that they have provided the most compelling proof to date for the effectiveness of acupuncture. But are they correct, or are they perhaps the victims of their own devotion to this therapy?

Perhaps, a more sceptical view would be helpful – after all, even the enthusiastic authors of this article admit that, when compared to sham, the effect size of real acupuncture is too small to be clinically relevant. Therefore one might argue that this meta-analysis confirms what critics have suggested all along: acupuncture is not a useful treatment for clinical routine.
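To get a feel for how small these sham-controlled differences are, one can translate them onto a familiar 0–100 pain scale. The assumed standard deviation of 25 points and the ~15-point threshold for a difference patients would actually notice are my own ballpark figures, used purely for illustration.

```python
# Rough translation of the acupuncture-versus-sham effect sizes onto a
# 0-100 pain scale (the 25-point SD and the ~15-point 'noticeable
# difference' threshold are illustrative assumptions, not trial data).
assumed_sd_points = 25
effects_vs_sham = {"back/neck pain": 0.23, "osteoarthritis": 0.16, "headache": 0.15}

for condition, d in effects_vs_sham.items():
    print(f"{condition}: about {d * assumed_sd_points:.0f} points on a 0-100 scale")
# Each difference is only a few points -- well below ~15 points, a commonly
# cited ballpark for a change patients would actually perceive as meaningful.
```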

Unsurprisingly, the authors of the meta-analysis do their very best to play down this aspect. They reason that, for clinical routine, the comparison between acupuncture and non-acupuncture controls is more relevant than the one between acupuncture and sham. But this comparison, of course, includes placebo- and other non-specific effects masquerading as effects of acupuncture – and with this little trick (which, by the way, is very popular in alternative medicine), we can, of course, show that even sugar pills are effective.

I do not doubt that context effects are important in patient care; yet I do doubt that we need a placebo treatment for generating such benefit in our patients. If we administer treatments which are effective beyond placebo with kindness, time, compassion and empathy, our patients will benefit from both specific and non-specific effects. In other words, purely generating non-specific effects with acupuncture is far from optimal and certainly not in the interest of our patients. In my view, it cannot be regarded as good medicine, and the authors’ conclusion referring to a “reasonable referral option” is more than a little surprising.

Acupuncture-fans might argue that, at the very minimum, the new meta-analysis does demonstrate acupuncture to be statistically significantly better than a placebo. Yet I am not convinced that this notion holds water: the small residual effect-size in the comparison of acupuncture with sham might not be the result of a specific effect of acupuncture; it could be (and most likely is) due to residual bias in the analysed studies.

The meta-analysis is strongly driven by the large German trials which, for good reasons, were heavily and frequently criticised when first published. One of the most important potential drawbacks was that many participating patients were almost certainly de-blinded through the significant media coverage of the study while it was being conducted. Moreover, in none of these trials was the therapist blinded (the often-voiced notion that therapist-blinding is impossible is demonstrably false). It is therefore likely that patient-unblinding and the absence of therapist-blinding influenced the clinical outcomes of these trials, generating false-positive findings. As the German studies constitute by far the largest volume of patients in the meta-analysis, any of their flaws would strongly impact on its overall result.

So, has this new meta-analysis finally solved the decades-old question about the effectiveness of acupuncture? It might not have solved it, but we have certainly moved closer to a solution, particularly if we employ our faculties of critical thinking. In my view, this meta-analysis is the most compelling evidence yet to demonstrate the ineffectiveness of acupuncture for chronic pain.
