MD, PhD, MAE, FMedSci, FRSB, FRCP, FRCPEd.


The question whether infant colic can be effectively treated with manipulative therapies might seem rather trivial – after all, this is a benign condition which the infant quickly grows out of. However, the issue becomes a little more tricky, if we consider that it was one of the 6 paediatric illnesses which were at the centre of the famous libel case of the BCA against my friend and co-author Simon Singh. At the time, Simon had claimed that there was ‘not a jot of evidence’ for claiming that chiropractic was an effective treatment of infant colic, and my systematic review of the evidence strongly supported his statement. The BCA eventually lost their libel case and with it the reputation of chiropractic. Now a new article on this intriguing topic has become available; do we have to reverse our judgements?

The aim of this new systematic review was to evaluate the efficacy or effectiveness of manipulative therapies for infantile colic. Six RCTs of chiropractic, osteopathy or cranial osteopathy alone or in conjunction with other interventions were included with a total of 325 infants. Of the 6 included studies, 5 were “suggestive of a beneficial effect” and one found no evidence of benefit. Combining all the RCTs suggested that manipulative therapies had a significant effect. Crying time was reduced by an average of 72 minutes per day. This effect was sustained for studies with a low risk of selection bias and attrition bias. When analysing only those studies with a low risk of performance bias (i.e. parental blinding) the improvement in daily crying hours was no longer statistically significant.

The quality of the studies was variable. There was a generally low risk of selection bias but a high risk of performance bias. Only one of the studies recorded adverse events and none were encountered.

From these data, the authors drew the following conclusion: Parents of infants receiving manipulative therapies reported fewer hours crying per day than parents whose infants did not and this difference was statistically significant. Most studies had a high risk of performance bias due to the fact that the assessors (parents) were not blind to who had received the intervention. When combining only those trials with a low risk of such performance bias the results did not reach statistical significance.

Does that mean that chiropractic does work for infant colic? No, it does not!

The first thing to point out is that the new systematic review included not just RCTs of chiropractic but also osteopathy and cranio-sacral therapy.

The second important issue is that the effects disappear once performance bias is accounted for, which clearly shows that the result is a false positive.

The third relevant fact is that the majority of the RCTs were of poor quality. The methodologically best studies were negative.

And the fourth thing to note is that only one study mentioned adverse effects, which means that the other 5 trials were in breach of rather elementary research ethics.

What makes all of this even more fascinating is the fact that the senior author of the new publication, George Lewith, is the very expert who advised the BCA in their libel case against Simon Singh. He seems so fond of his work that he even decided to re-publish it using even more misleading language than before. Far be it from me, of course, to suggest that his review was an attempt to white-wash the issue of chiropractic ‘bogus’ claims. However, based on the available evidence, I would have formulated conclusions which are more than just a little different from his; something like this perhaps:

The current best evidence suggests that the small effects that emerge when we pool the data from mostly unreliable studies are due to bias and therefore not real. This systematic review therefore fails to show that manipulative therapies are effective. It furthermore points to a serious breach of research ethics by the majority of researchers in this field.

Imagine an area of therapeutics where 100% of all findings of hypothesis-testing research are positive, i.e. come to the conclusion that the treatment in question is effective. Theoretically, this could mean that the therapy is a miracle cure which is useful for every single condition in every single setting. But sadly, there are no miracle cures. Therefore something must be badly and worryingly amiss with the research in an area that generates 100% positive results.

Acupuncture is such an area; we and others have shown that Chinese trials of acupuncture hardly ever produce a negative finding. In other words, one does not need to read the paper, one already knows that it is positive – even more extreme: one does not need to conduct the study, one already knows the result before the research has started. But you might not believe my research nor that of others. We might be chauvinist bastards who want to discredit Chinese science. In this case, you might perhaps believe Chinese researchers.

In this systematic review, all randomized controlled trials (RCTs) of acupuncture published in Chinese journals were identified by a team of Chinese scientists. A total of 840 RCTs were found, including 727 RCTs comparing acupuncture with conventional treatment, 51 RCTs with no treatment controls, and 62 RCTs with sham-acupuncture controls. Among these 840 RCTs, 838 studies (99.8%) reported positive results from primary outcomes and two trials (0.2%) reported negative results. The percentages of RCTs concealment of the information on withdraws or sample size calculations were 43.7%, 5.9%, 4.9%, 9.9%, and 1.7% respectively.

The authors concluded that publication bias might be major issue in RCTs on acupuncture published in Chinese journals reported, which is related to high risk of bias. We suggest that all trials should be prospectively registered in international trial registry in future.
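To put the 838-out-of-840 figure in perspective, here is a rough sketch of how improbable such a record would be even under the most generous reading. It assumes, purely for illustration, that acupuncture truly works in every condition tested and that each trial independently has a fixed statistical power; the power values are hypothetical and are not taken from the review.

```python
from math import comb

def prob_at_most_k_negative(n_trials: int, k: int, power: float) -> float:
    """P(at most k of n_trials are negative) if every trial tests a truly
    effective treatment and independently has the given statistical power."""
    p_negative = 1.0 - power
    return sum(
        comb(n_trials, i) * p_negative**i * power**(n_trials - i)
        for i in range(k + 1)
    )

# 840 trials and 2 negative results are the figures from the review;
# the power values are hypothetical and chosen for illustration only.
for assumed_power in (0.80, 0.90, 0.95):
    p = prob_at_most_k_negative(840, 2, assumed_power)
    print(f"assumed power {assumed_power:.2f}: P(<= 2 negative trials) = {p:.1e}")
```

Under these assumptions the probability is vanishingly small for every power value tried, so honest sampling of even a genuinely effective therapy would be expected to produce far more than two negative trials.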

I applaud the authors’ courageous efforts to conduct this analysis, but I do not agree with their conclusion. The question why all Chinese acupuncture trials are positive has puzzled me for many years, and I have quizzed numerous Chinese colleagues why this might be so. The answer I received was uniformly that it would be very offensive for Chinese researchers to conceive a study that does not confirm the views held by their peers. In other words, acupuncture research in China is conducted to confirm the prior assumption that this treatment is effective. It seems obvious that this is an abuse of science which must cause confusion.

Whatever the reasons for the phenomenon, and we can only speculate about them, the fact has been independently confirmed several times and is now quite undeniable: acupuncture trials from China – and these constitute the majority of the evidence-base in this area – cannot be trusted. The only way to adequately deal with this problem that I can think of is to discard them outright.

Today, there are several dozen journals publishing articles on alternative medicine. ‘The Journal of Alternative and Complementary Medicine’ is one of the best known, and it has one of the highest impact factors of them all. The current issue holds a few ‘gems’ which might be worthy of a comment or two. Here I have selected three articles reporting clinical studies, and I reproduce their abstracts (almost) in full (in italics) and add my comments (for clarity in bold). All the articles are available electronically, and I have provided the links for those who want to investigate beyond the abstracts.

STUDY No 1

The first ‘pilot study’ was aimed to demonstrate the potential of auricular acupuncture (AAT) for insomnia in maintenance haemodialysis (MHD) patients and to prepare for a future randomized controlled trial.

Eligible patients were enrolled into this descriptive pilot study and received AAT designed to manage insomnia for 4 weeks. Questionnaires that used the Pittsburgh sleep quality index (PSQI) were completed at baseline, after a 4-week intervention, and 1 month after completion of treatment. Sleep quality and other clinical characteristics, including sleeping pills taken, were statistically compared between different time points.

A total of 22 patients were selected as eligible participants and completed the treatment and questionnaires. The mean global PSQI score was significantly decreased after AAT intervention (p<0.05). Participants reported improved sleep quality (p<0.01), shorter sleep latency (p<0.05), less sleep disturbance (p<0.01), and less daytime dysfunction (p=0.01). They also exhibited less dependency on sleep medications, indicated by the reduction in weekly estazolam consumption from 6.98±4.44 pills to 4.23±2.66 pills (p<0.01). However, these improvements were not preserved 1 month after treatment.

Conclusions: In this single-center pilot study, complementary AAT for MHD patients with severe insomnia was feasible and well tolerated and showed encouraging results for sleep quality.

My comments:

In alternative medicine research, it has become far too common (almost generally accepted) to call a flimsy trial a ‘pilot study’. The authors give their game away by stating that, by conducting this trial, they want to ‘demonstrate the potential of AAT’. This is not a legitimate aim of research; science is for TESTING hypotheses, not for PROVING them!

The results of this trial show that patients experienced improvements after receiving AAT which, however, did not last. As there was no placebo control group, the most likely explanation for these outcomes would be that AAT generated a short-lasting placebo effect.

A sample size of 22 is, of course, far too small to allow any conclusions about the safety of the intervention. Despite these obvious facts, the authors seem convinced that AAT is both safe and effective.

STUDY No 2

The aim of the second study was to compare the therapeutic effect of Yamamoto new scalp acupuncture (YNSA), a recently developed microacupuncture system, with traditional acupuncture (TCA) for the prophylaxis and treatment of migraine headache.

In a randomized clinical trial, 80 patients with migraine headache were assigned to receive YNSA or TCA. A pain visual analogue scale (VAS) and migraine therapy assessment questionnaire (MTAQ) were completed before treatment, after 6 and 18 sessions of treatment, and 1 month after completion of therapy.

All the recruited patients completed the study. Baseline characteristics were similar between the two groups. Frequency and severity of migraine attacks, nausea, the need for rescue treatment, and work absence rate decreased similarly in both groups. Recovery from headache and ability to continue daily activities 2 hours after medical treatment showed similar improvement in both groups (p>0.05).

Conclusions: Classic acupuncture and YNSA are similarly effective in the prophylaxis and treatment of migraine headache and may be considered as alternatives to pharmacotherapy.

My comments:

This is what is technically called an ‘equivalence trial’, i.e. a study that compares an experimental treatment (YNSA) to one that is (assumed to be) effective. To demonstrate equivalence, such trials need to have large sample sizes, and this study is woefully underpowered. As it stands, the results show nothing meaningful at all; if anything, they suggest that both interventions were similarly useless.
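To give a feel for what "underpowered" means here, the sketch below uses a common normal-approximation formula for the sample size of an equivalence trial comparing two means (two one-sided tests, true difference assumed to be zero). The standard deviation and the equivalence margin are hypothetical values chosen for illustration; the abstract reports neither.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_equivalence(sd: float, margin: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per group needed to show equivalence of two means
    within +/- margin (two one-sided tests, true difference assumed zero)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - (1 - power) / 2)
    return ceil(2 * (sd * (z_alpha + z_beta) / margin) ** 2)

# Hypothetical inputs: SD of 2.5 points on a 10-point pain VAS and an
# equivalence margin of 1 point. Neither value is taken from the study.
print(n_per_group_equivalence(sd=2.5, margin=1.0))
```

With these assumed inputs the requirement comes out at roughly a hundred patients per group, well above the 80 patients in total that the trial enrolled. A tighter margin or a larger standard deviation would push the number higher still.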

STUDY No 3

The third study determined whether injection with hypertonic dextrose and morrhuate sodium (prolotherapy) using a pragmatic, clinically determined injection schedule for knee osteoarthritis (KOA) results in improved knee pain, function, and stiffness compared to baseline status.

The participants were 38 adults who had at least 3 months of symptomatic KOA and who were in the control groups of a prior prolotherapy randomized controlled trial (RCT) (Prior-Control), were ineligible for the RCT (Prior-Ineligible), or were eligible but declined the RCT (Prior-Declined).

The injection sessions occurred at 1, 5, and 9 weeks with as-needed treatment at weeks 13 and 17. Extra-articular injections of 15% dextrose and 5% morrhuate sodium were done at peri-articular tendon and ligament insertions. A single intra-articular injection of 6 mL 25% dextrose was performed through an inferomedial approach.

The primary outcome measure was the validated Western Ontario McMaster University Osteoarthritis Index (WOMAC). The secondary outcome measure was the Knee Pain Scale and postprocedure opioid medication use and participant satisfaction.

The Prior-Declined group reported the most severe baseline WOMAC score (p=0.02). Compared to baseline status, participants in the Prior-Control group reported a score change of 12.4±3.5 points (19.5%, p=0.002). Prior-Decline and Prior-Ineligible groups improved by 19.4±7.0 (42.9%, p=0.05) and 17.8±3.9 (28.4%, p=0.008) points, respectively; 55.6% of Prior-Control, 75% of Prior-Decline, and 50% of Prior-Ineligible participants reported score improvement in excess of the 12-point minimal clinical important difference on the WOMAC measure. Postprocedure opioid medication resulted in rapid diminution of prolotherapy injection pain. Satisfaction was high and there were no adverse events.

Conclusions: Prolotherapy using dextrose and morrhuate sodium injections for participants with mild-to-severe KOA resulted in safe, significant, sustained improvement of WOMAC-based knee pain, function, and stiffness scores compared to baseline status.

My Comments:

This study had nothing that one might call a proper control group: all three groups mentioned were treated with the experimental treatment. No attempt was made to control for even the most obvious biases: the observed effects could have been due to placebo or any other non-specific effects. The authors’ conclusions imply a causal relationship between the treatment and the outcome, which is wrong. The notion that the experimental treatment is ‘safe’ is based on just 38 patients and therefore not reasonable.

IMPLICATION

All of this might seem rather trivial, and my comments could be viewed as a deliberate and vicious attempt to discredit one of the most respected journals of alternative medicine. Yet, considering that articles of this nature are more the rule than the exception in alternative medicine, I do think that this flagrant lack of scientific rigour is a relevant issue and has important implications.

As long as research in this area continues to be deeply flawed, as long as reviewers turn a blind eye to (or are not smart enough to detect) even the most obvious mistakes, as long as journal editors accept any rubbish in order to fill their pages, there is a great danger that we are continuously being misled about the supposed therapeutic value of alternative therapies.

Many who read this blog will, of course, have the capacity to think critically and might therefore not fall into the trap of accepting the conclusions of fatally flawed research. But many other people, including politicians, journalists and consumers, might not have the necessary appraisal skills and will thus not be able to tell that such studies can serve only one purpose: to popularise bogus treatments and thereby render health care less effective and more dangerous. Enthusiasts of alternative medicine are usually fully convinced that such studies amount to evidence and ram this pseudo-information down the throat of health care decision makers – the effects of such lobbying on public health can be disastrous.

And there is another downside to the publication of such dismal drivel: assuming (as I do) that not all of alternative medicine is completely useless, such embarrassingly poor research will inevitably have detrimental effects on the discipline of alternative medicine. After being exposed to a seemingly endless stream of pseudo-research, critics will eventually give up taking any of it seriously and might claim that none of it is worth the bother. In other words, those who conduct, accept or publish such nonsensical papers are not only endangering medical progress in general, they are also harming the very cause they try so desperately hard to advance.

I have often asked myself whether it is right/necessary to scientifically test things which are entirely implausible. Should we, for instance, test the effectiveness of treatments which have a very low prior probability of generating a positive effect, such as paranormal healing, homeopathy or Bach flower remedies? If you believe in the principles of evidence-based medicine you might focus on the clinical evidence and see biological plausibility as secondary. If you are a basic scientist, you are likely to do the reverse.

A recent article addressed this issue. The author points out that evaluating the absurd is absurd. Specifically, he noted that the empirical evaluation of a therapy would normally assume a plausible rationale regarding the mechanism of action. However, examination of the historical background and underlying principles for reflexology, iridology, acupuncture, auricular acupuncture, and some herbal medicines, reveals a rationale founded on the principle of analogical correspondences, which is a common basis for magical thinking and pseudoscientific beliefs such as astrology and chiromancy. Where this is the case, it is suggested that subjecting these therapies to empirical evaluation may be tantamount to evaluating the absurd.

This makes a lot of sense – but is it really entirely true? Are there no legitimate reasons at all for testing alternative treatments that lack biological plausibility? Ten or twenty years ago, I would have disagreed with the notion that plausibility is an essential prerequisite for scientific testing; today, I have changed my mind a little, but not as much as to agree completely with the assumption. In other words, I still see more than one good reason why evaluating the absurd might be reasonable or even advisable.

  1. Using plausibility as the only arbiter of scientific ‘evaluability’, assumes that we understand everything about plausibility there is to know. Yet it might just be possible that we mis-categorise something as implausible simply because we are not yet fully aware of all the facts.
  2. Declaring something as plausible and another thing as implausible are not hard and fast verdicts but judgements which, at least to some degree, are subjective. Sceptics find the axioms of homeopathy utterly implausible, for instance – but ask a homeopath, and you will hear all sorts of explanations which, at least to them, sound plausible.
  3. If an implausible alternative treatment is in wide-spread use, we arguably have a responsibility to test it scientifically in order to demonstrate the truth about it (to those proponents of that therapy who are willing to accept that rigorous science can find the truth). If we fail to do this, it will be the enthusiasts of that therapy who conduct less than rigorous science and produce false positive results. In turn, this will give the impression that the treatment is effective and mislead consumers, politicians, journalists etc. Seen from this perspective, it might even be unethical to not do the science.

So, I am in two minds about this (which might be a reflection of the fact that, during different periods of my life, I have been a clinician, a basic scientist and a clinical researcher). I realise that plausibility and prior probability are important – much more so than I appreciated years ago. But I think they should not be the only criteria. The clinical evidence should not be pushed aside completely.

I’d be interested to learn your views on this tricky issue.

The mechanisms through which spinal manipulative therapy (SMT) exerts its alleged clinical effects are not well established. A new study investigated the effects of subject expectation on clinical outcomes.

Sixty healthy subjects underwent quantitative sensory testing to their legs and low backs. They were randomly assigned to receive a positive, negative, or neutral expectation instructional set regarding the effects of a specific SMT technique on pain perception. Following the instructional set, all subjects received SMT and underwent repeat sensory tests.

No inter-group differences in pain response were present in the lower extremity following SMT. However, a main effect for hypoalgesia was present. A significant interaction was present between change in pain perception and group assignment in the low back with participants receiving a negative expectation instructional set demonstrating significant hyperalgesia.

The authors concluded that this study provides preliminary evidence for the influence of a non-specific effect (expectation) on the hypoalgesia associated with a single session of SMT in normal subjects. We replicated our previous findings of hypoalgesia in the lower extremity associated with SMT to the low back. Additionally, the resultant hypoalgesia in the lower extremity was independent of an expectation instructional set directed at the low back. Conversely, participants receiving a negative expectation instructional set demonstrated hyperalgesia in the low back following SMT which was not observed in those receiving a positive or neutral instructional set.

More than 10 years ago, we addressed a similar issue by conducting a systematic review of all sham-controlled trials of SMT. Specifically, we wanted to summarize the evidence from sham-controlled clinical trials of SMT. Eight studies fulfilled our inclusion/exclusion criteria. Three trials (two on back pain and one on enuresis) were judged to be burdened with serious methodological flaws. The results of the three most rigorous studies (two on asthma and one on primary dysmenorrhea) did not suggest that SMT leads to therapeutic responses which differ from an inactive sham-treatment. We concluded that sham-controlled trials of SMT are sparse but feasible. The most rigorous of these studies suggest that SMT is not associated with clinically relevant specific therapeutic effects.

Taken together, these two articles provide intriguing evidence to suggest that SMT is little more than a theatrical placebo. Given the facts that SMT is neither cheap nor devoid of risks, the onus is now on those who promote SMT, e.g. chiropractors, osteopaths and physiotherapists, to show that this is not true.

Guest Post by Jan Willem Nienhuys

The so-called Swiss government report of 2011 on homeopathy was actually an expanded translation of a 2006 book, which in itself was an expanded version of a document submitted to a Swiss committee (PEK) in charge of evaluation of alternative medicine. It has been severely criticised. A summary of criticisms with links can be found on the RationalWiki item, to which we may add Zeno’s Blog. I present here the results of my scrutiny of chapter 10 (1), although I base my report on the original German edition.

This chapter by itself shows a familiar result: the better the investigation, the less evidence in favor of homeopathy it shows. It shows also how homeopaths systematically distort unfavorable results by misrepresenting them. Chapter 10 deals with clinical investigations of homeopathy. The authors restrict their attention to an odd assortment of diseases such as acute rhinitis, allergic rhinitis, allergic asthma, sinusitis, adenoid vegetations, pharyngitis, tonsillitis, influenza-like infection and otitis media, together denoted as ‘upper respiratory tract infections/allergic reactions’ or URTI/A for short.

The number of papers reviewed is very small. The authors looked at much more than randomized clinical trials. Apparently their search did not extend further than 2003, but then they might have found over 150 papers, of which about one third were double-blind randomized trials that compared how well highly diluted homeopathy and placebo cured one of the indicated diseases. They managed to miss 25 papers mentioned in earlier meta-analyses and about four papers that are summarized in Pubmed.

Among the papers they missed is an extremely strong support for the claim ‘homeopathy works for URTI/A’. For example Riverón-Garrote et al. (2) did a placebo controlled double blind randomized clinical trial of homeopathy (apparently individualised) for asthma. Of about 33 verum patients 32 improved, whereas of about 30 placebo patients only 4 improved. The so-called p-value for such a result is less than 10⁻¹¹. One wonders why this result wasn’t published in Science or Nature, but only in an obscure Spanish language homeopathic journal. Maybe the paper was excluded because it didn’t state that it was about allergic asthma, but note that in about three quarters of all asthma some kind of allergy is implicated.
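For readers who want to check the order of magnitude, here is a minimal sketch of a one-sided Fisher exact test written with the standard library only. It takes the approximate counts quoted above at face value (32 of 33 verum patients improved, 4 of 30 placebo patients improved); the function name is my own and the exact group sizes are an assumption, since the text only gives them as "about" 33 and 30.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of a or more successes in row 1 with all margins fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1)
    )
    return favourable / total

# Rows: verum, placebo. Columns: improved, not improved.
p = fisher_one_sided(32, 1, 4, 26)
print(f"p = {p:.2e}")
```

With these counts the result falls below 10⁻¹¹, in line with the figure given in the text.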

Of course this pales in comparison to the paper by Friese and Zabalotnyi (3). Again a double blind randomised clinical trial with 72 sinusitis sufferers for both verum and placebo. But here 71 out of 72 verum patients were free of complaints after three weeks, or at least improved, whereas this was the case for only 8 of the placebo patients. Fisher’s Exact Test gives p = 2.47 × 10⁻²⁹ (one tailed). A remarkable result, because it is well known that over 80% of sinusitis cases cure spontaneously within two weeks. Maybe placebos are dangerous in the hands of homeopaths. Again one wonders why Friese and Zabalotnyi didn’t share the Nobel prize in, say, 2008, and why it is necessary at all to meticulously analyse papers in which homeopathy shows a marginal advantage.

Instead, Maxion-Bergemann et al. include in their survey a paper by Bahemann (4). We quote the summary of the paper from the internet: ‘In homeopathic practice, Kalium bromatum is known as a remedy in the case of paranoid delusions, e. g. if someone suffers from the delusion of being the object of divine revenge, of being damned, or of being pursued. It is also a very important remedy in the case of nocturnal fears in children as well as in the case of convulsions, when they are hereditary, when they occur in childbed, or during teething. The following case demonstrates the successful treatment of a severe mononucleosis after studying the Materia medica.’ Mononucleosis isn’t even mentioned in the list given that specifies URTI/A. Maybe it was included because one of the symptoms of mononucleosis is a sore throat. Apparently the mononucleosis patient was given Kalium Bromatum (Maxion-Bergemann et al. state that it is Kalium Chromatum 200C, presumably Chromatum and Bromatum don’t differ too much to bother) because of something remarkable the patient said during the anamnesis. The reason for giving Kalium bromatum 200C in cases of paranoia might be that an overdose of bromide can induce psychoses. The homeopathic Materia Medica contains quite a few ‘symptoms’ from accidental poisonings reported in old medical literature; potassium bromide was liberally used in the nineteenth century for the calming of seizure and nervous disorders, according to Wikipedia.

More impressive in the list of 13 RCTs of Maxion-Bergemann are two of the largest ‘homeopathic’ trials known, namely of the remedy Oscillococcinum. These trials cannot be taken seriously. The first one, by Ferley et al. (5), has one glaring fault. They started with 478 ‘influenza’-patients (237 verum), tried to make 149 family physicians note down when the patients recovered, and then elected to restrict their attention to the 63 patients (39 verum) that recovered within 48 hours and therefore probably didn’t have flu at all. Coincidentally this was the only possibility out of 14 that gave a ‘significant’ result: correctly computed, p is just below 0.05. (Ferley et al. based their computation on 462 patients with 228 verum and applied a chi-squared test without continuity correction). It is hardly credible that they set this 48-hour criterion in advance, because even if the remedy worked, the risk of having too few subjects to get a significant result would have been considerable. But if one picks out one result among many possibilities, one should correct for multiple outcomes. So the Ferley et al. investigation is at most an exploratory result in need of independent confirmation.
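The point about picking one result out of 14 can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that the 14 possible analyses are independent tests at the 5% level; in reality they overlap, so the numbers are only indicative. The function names are my own.

```python
def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one 'significant' result among n_tests independent
    tests when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall error rate at alpha."""
    return alpha / n_tests

print(f"{familywise_error(14):.2f}")      # about 0.51
print(f"{bonferroni_threshold(14):.4f}")  # about 0.0036
```

Under this simplification, a useless remedy has roughly an even chance of yielding at least one "significant" outcome among 14, and a p-value just below 0.05 is nowhere near the corrected threshold.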

This ‘confirmation’ was undertaken soon afterwards, namely in the beginning of 1991, but the results were only published in 1998 and cannot be found on Pubmed (6). In this paper the definitions are somewhat different, but Papp et al. report that of 334 patients (167 verum) a total of 57 (32 verum) were cured in 48 hours. Now 25 versus 32 is not remarkable at all. One doesn’t need any elaborate computation for this. Calculation gives p=0.4. So one might think that the Ferley hypothesis was soundly refuted. But Papp et al. used something they call ‘the Krauth test’, probably some kind of automated post hoc fishing trip to select the best criteria to distinguish the placebo and verum groups. They claim that this ‘test’ gives p=0.0028. They specifically refer to ‘the null hypothesis (the number of patients free of symptoms after 48 hours is equal in both treatment groups)’, so their computation is wrong. The most remarkable thing about Papp et al. is that nobody seems to have noticed the large discrepancy between what the numbers say and the claim of the paper.
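The arithmetic behind the p-values for the two Oscillococcinum trials can be sketched with a plain chi-squared test for a 2x2 table, with and without Yates' continuity correction. The counts are those quoted above; I am assuming that Ferley's own analysis used 39 of 228 verum and 24 of 234 placebo patients, which the text implies but does not spell out. The function name is my own.

```python
from math import erfc, sqrt

def chi2_p_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Two-sided p-value of the chi-squared test (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction
    chi2 = n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(chi2 / 2))  # survival function of chi-squared with 1 df

# Rows: verum, placebo. Columns: recovered within 48 h, not recovered.
ferley_all = chi2_p_2x2(39, 237 - 39, 24, 241 - 24)               # all 478 patients
ferley_own = chi2_p_2x2(39, 228 - 39, 24, 234 - 24, yates=False)  # assumed 462 patients
papp = chi2_p_2x2(32, 167 - 32, 25, 167 - 25)                     # 334 patients

print(f"Ferley, 478 patients, corrected:   p = {ferley_all:.3f}")
print(f"Ferley, 462 patients, uncorrected: p = {ferley_own:.3f}")
print(f"Papp, 334 patients, corrected:     p = {papp:.2f}")
```

On these counts the corrected Ferley result sits just under 0.05, while the Papp comparison of 32 versus 25 is far from significant, a long way from the p=0.0028 claimed for the ‘Krauth test’.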

Another paper with ‘positive’ results is the 1994 study of Reilly et al. (7), number 28 in Maxion-Bergemann et al. The group of Reilly investigated allergic diseases treated by what they called homeopathy. The typical Reilly experiment consists of administering a highly diluted causative agent such as pollen or house dust mite or cat hairs or bird feathers to persons suffering from pollen allergy (seasonal rhinitis) or allergic asthma. However for true homeopathy one uses a substance that has been the subject of a so-called proving, and the remedy is chosen if the totality of all patient ‘symptoms’ – including things like sleeping position and fear of thunderstorms – sufficiently matches the symptoms of the proving. Let me call Reilly’s method ultra-isopathy. Reilly was already discussing this study at a symposium in 1990, but that paper is not clear. It is about 28 asthma patients, and only 24 were analysed. This small number in itself is already reason enough not to consider it. The main analysis was by comparing a subjective measure of wellbeing, the Visual Analog Scale (VAS). Here we find a significant difference (p=0.003) in favor of ultra-isopathy. However, in the small print we see that change in the very important FEV1-value (Forced Expiratory Volume in 1 second) was non-significant (p=0.08) but this refers only to the 18 patients that took such a test before and after the experiment.

Reilly attracted more attention with his first experiment in this vein (8). He started out with 79 patients in both the verum and the placebo group. The treatment was ultradiluted grass pollen for hay fever. The analysis was only about 56 verum and 52 placebo (in a diagram 53 placebo are shown). Such a large dropout (32%) is not good. On the basis of the VAS-scores Reilly found p=0.02. VAS is only an ordinal scale and it is not at all clear that one person’s 60 mm means the same as another person’s 60 mm, and also not that two patients with respectively 40 mm and 80 mm together can be considered as equivalent to two other patients with 60 mm each. If we distinguish only better / equal / worse, then the numbers for the verum group were 34 / 9 / 13 and for the placebo group 27 / 5 / 21. One can analyse this in various ways: as a 3 by 2 contingency table (p=0.15), or as a 2 by 2 table, namely by joining the middle group either to the right (p=0.10) or to the left (p=0.34). In this manner the difference is less impressive.
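The better / equal / worse counts can be re-analysed in a few lines. The sketch below computes a chi-squared test for the 3 by 2 table and a two-sided Fisher exact test for the two collapsed 2 by 2 tables. The text does not say which test produced the 2 by 2 figures, so the choice of Fisher's test is an assumption and the output need not match the quoted values to the last digit.

```python
from math import comb, exp

def chi2_p_two_by_three(verum, placebo):
    """Chi-squared p-value for a 2x3 table; with 2 degrees of freedom the
    survival function reduces to exp(-chi2 / 2)."""
    assert len(verum) == len(placebo) == 3
    n_v, n_p = sum(verum), sum(placebo)
    n = n_v + n_p
    chi2 = 0.0
    for v, p in zip(verum, placebo):
        col = v + p
        for observed, row_total in ((v, n_v), (p, n_p)):
            expected = row_total * col / n
            chi2 += (observed - expected) ** 2 / expected
    return exp(-chi2 / 2)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

verum, placebo = [34, 9, 13], [27, 5, 21]  # better / equal / worse
print(f"3 by 2 table:               p = {chi2_p_two_by_three(verum, placebo):.2f}")
print(f"better vs (equal + worse):  p = {fisher_two_sided(34, 22, 27, 26):.2f}")
print(f"(better + equal) vs worse:  p = {fisher_two_sided(43, 13, 32, 21):.2f}")
```

Whichever way the middle category is grouped, none of the three analyses reaches the conventional 0.05 level.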

Maxion-Bergemann et al. collected 29 articles. I take the liberty of removing from these everything that is not a double blind RCT that compares how well highly diluted homeopathy and placebo cure a URTI/A disease. We also remove all research with 50 or fewer patients. The more or less openly fraudulent or at least grossly mistaken Oscillococcinum trials I also leave out. In order of appearance we have then:

  1. Wiesenauer 1985 (9) [8]
  2. Reilly 1986 (8) [6]
  3. Wiesenauer 1989 (10) [10]
  4. De Lange-de Klerk 1994 (11) [1]
  5. Aabel 2000 (12) [4]
  6. Jacobs 2001 (13) [22]
  7. Friese 2001 (14) [24]
  8. Lewith 2002 (15) [25]
  9. White 2003 (16) [29]

The square brackets refer to the numbering in Maxion-Bergemann et al. A short review of these nine articles follows.

Wiesenauer 1985: one standard remedy for hay fever. Randomised 213 patients, analysed only 164. “no statistical significance was achieved” says the abstract on Pubmed.

Reilly 1986: this we have discussed already. Ultra-isopathy for hay fever. Randomised 158 patients, analysed 108. Statistically significant, but barely so.

Wiesenauer 1989: four groups, each with their own standard remedy or placebo for sinusitis, 152 patients. “There was no remarkable difference in the therapeutic success among the investigated homeopathic drug combinations nor between the active drugs and placebo”, according to the abstract in Pubmed.

De Lange-de Klerk 1994: this research was reported more extensively in the lead author’s dissertation (17). Individualised homeopathy for recurrent URTI in children. 175 children were randomised and 170 analysed after following them for a year. 128 different remedies/potencies were prescribed and altogether 1042 different prescriptions were handed out. The result was a non-significant difference between homeopathy and placebo. One striking aspect of this investigation is that it was revealed which of the two groups was the placebo group and which the verum group only after all computations were done. So the author or her thesis advisors deliberately made it impossible to give in to the temptation of starting a fishing expedition in the data once the code was completely broken. See also Pubmed.

Aabel 2000: ultra-isopathy for birch pollen allergy. Strictly speaking this investigation shouldn’t be in this short list because it was partly prophylactic. From Pubmed: “Surprisingly, the verum treated patients fared worse than the placebo group”. No measure of statistical significance is mentioned. Remarkably, this article is preceded by a similar article (18) that Maxion-Bergemann et al. apparently weren’t able to locate.

Jacobs 2001: 75 children with otitis media were treated with individualised homeopathy or placebo. Pubmed: “differences were not statistically significant”. It seems that Jacobs has indulged in a fishing trip because she mentions a “significant decrease in symptoms at 24 and 64 h after treatment in favor of homeopathy”. But that is wrong. Significance can only have a meaning if it refers to a single outcome that was planned before any patients were seen. Just picking two results out of many and stating they are ‘significant’ betrays a fundamental ignorance of research methodology.

Friese 2001: this article is also published elsewhere (19); at least the numbers are exactly the same according to Pubmed. 97 children were randomised to either individual homeopathic treatment or placebo treatment of adenoid vegetations, and 82 were analysed. Apparently these 82 comprised 41 placebo and 41 verum, and of these 12 and 9 respectively required an operation in the end. This allegedly corresponds to p=0.64, “These results show no statistical significance.” Incidentally, this is the same Friese as reference 3.

Lewith 2002: again ultra-isopathy, now for asthma, 242 patients randomised, 202 completed all clinical assessments. The full article can be accessed via Pubmed and elsewhere. The main conclusion is “Homoeopathic immunotherapy is not effective in the treatment of patients with asthma.” The authors notice that the averages in both groups behave somewhat erratically, and they have no explanation for this.

White 2003: individualised homeopathy compared to placebo for 96 children with asthma, who were followed for 12 months. The conclusion is that there is no evidence that this kind of homeopathy is better than placebo.

In other words, out of nine investigations only one (Reilly 1986) obtains a barely significant result.
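The Friese figures (12 of 41 placebo children and 9 of 41 verum children needing an operation) are simple enough to check by hand. The sketch below assumes Fisher's exact test, which is only a guess, since the abstract as quoted does not name the test behind p=0.64; the function name is hypothetical and the result need not match the published figure to the last digit.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the col1 events fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Rows: placebo, verum. Columns: operation needed, no operation.
p_friese = fisher_exact_two_sided(12, 41 - 12, 9, 41 - 9)
print(f"Friese 2001, operations 12/41 vs 9/41: p = {p_friese:.2f}")
```

Under this assumption the p-value comes out far above 0.05, in line with the authors' own statement that the results show no statistical significance.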

But the interpretation of Maxion-Bergemann et al. is totally different: “If only the placebo-controlled, randomized trials with the highest EBM evidence are considered, 12 of 16 trials show a positive result for the homeopathically treated group (significantly positive 8/16 and trend 4/16).” Even in the more restricted subset of nine discussed above they are overly optimistic. They mark Wiesenauer (1985), De Lange-de Klerk (1994), Jacobs (2001) as showing a ‘trend for homeopathy’ and Lewith (2002) is even marked ‘significant’. The meticulous and high quality research of De Lange (1993, 1994) is judged ‘trend for homeopathy’.

In the case of De Lange it seems clear where this judgement comes from. De Lange had several outcomes (number of sick periods, total duration of sick periods, sum of all dayscores etc.), and all these showed roughly the same small non-significant difference in favor of homeopathy. This is not really strange, because these outcomes all measure about the same phenomenon. It is not remarkable that there is a small difference between the averages of the two groups that can only be noticed if the children are followed for a full year. There is not even the beginning of a reason to think that this has anything to do with the treatment. For example, the homeopathy group had ‘significantly’ fewer pets at home. This might serve as an explanation of why they as a group were slightly less sick. One might also speculate that this was retroactively caused by the homeopathic treatment. This is not really more improbable than highly diluted stuff (more than 95% D6 and higher) having an effect.

By convention ‘statistically significant’ is the lower limit where weak conclusions such as ‘worth investigating further’ can be justified, and we repeat: only if it refers to a single outcome measure or endpoint chosen before any data collection has started. De Lange chose recurrent URTI because homeopathy was reputed to be most effective for this type of complaint, especially after investigations such as those of Reilly (1986). If following 170 children for a full year cannot show a clear advantage, then that is simply a negative result. In the case of Lewith the ‘significant for homeopathy’ is probably based on partial results, such as that in week 3 ‘homeopathy’ fared better in the asthma VAS. One can just as well point to week 16, where the FEV1 of the placebo group seems much better than in the homeopathy group.
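Why the restriction to a single pre-planned outcome matters can be shown with a few lines of arithmetic. The sketch below assumes, purely for simplicity, that the outcomes are independent and that the treatment has no effect at all; real trial outcomes are usually correlated (as noted for De Lange), so the true figures would be somewhat lower. The function name is hypothetical.

```python
def familywise_error_rate(n_outcomes, alpha=0.05):
    """Chance of at least one 'significant' result among n independent
    outcomes when the treatment does nothing (independence is assumed)."""
    return 1 - (1 - alpha) ** n_outcomes

for k in (1, 2, 5, 10, 20):
    print(f"{k:2d} outcomes inspected: "
          f"{familywise_error_rate(k):.0%} chance of a spurious 'significant' finding")
```

With a handful of time points and several outcome measures per time point, a ‘significant’ result somewhere in the data is close to what one expects from chance alone.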

Maxion-Bergemann et al. seem to have been singularly inept in collecting papers on homeopathic trials, and for no apparent reason they decided to look also at a large number of case reports and investigations without control group or blinding, even though investigators had remarked as early as 1991 that henceforth only well-designed large double-blind RCTs were worth considering. If we restrict our attention to the properly blinded controlled investigations, we see the same thing as in other meta-analyses of homeopathy: there is lots of rubbish in favor of homeopathy, but the good trials say plainly and clearly: homeopathy is ineffective, precisely what can be predicted from the fact that there is nothing in it.

Homeopaths nowadays have a lot to say about RCTs and how they prove homeopathy. RCTs are subtle and complicated scientific tools. It is somewhat strange to see how homeopaths resolutely ignore two centuries of basic science but then argue their cause on the basis of complicated statistics.

Homeopathy is an assortment of wildly different practices and theories. We have seen ultra-isopathy, individualised homeopathy and the practice of giving one standardised remedy for one diagnosis without asking too many personal details from the patient. These standard remedies are often branded mixtures of highly diluted ‘classical’ homeopathy, quite contrary to the opinions of homeopathy’s inventor Hahnemann. There are many more variants of homeopathy and the homeopaths themselves cannot agree which are the correct ones.

Moreover, if a treatment or trial doesn’t work out, then a number of additional hypotheses about homeopathy can be invoked, which is what Maxion-Bergemann et al. do. Homeopathic remedies supposedly are counteracted by lots of regular medications and even by strong tasting or smelling food, such as coffee, parsley, garlic and peppermint. Hahnemann even disapproved of reading in bed and long afternoon naps and prolonged suckling of infants (Organon, section 260). Poor performance of homeopathy can be blamed on something called ‘initial aggravation’ or else on lack of experience of the poorly performing homeopath.

But whether these factors are relevant at all is unknown, just as there is no proof at all for the similia principle, nor for the hundreds of thousands or even millions of ‘symptoms’ associated with highly diluted materials in the homeopathic Materia Medica. If homeopaths really want scientists to share homeopathic beliefs, they should not think up lame excuses for ‘failed’ tests, but for starters they might try to present proofs for all or at least some of their ‘symptoms’. They don’t try very hard, and in so far as it has been tried, it has also failed (20).

I would like to thank Willem Betz for helpful remarks.

I am a retired mathematician with no other interest than a desire to promote science.

References

1. Stefanie Maxion-Bergemann, Gudrun Bornhöft, Denise Bloch, Christina Vogt-Frank, Marco Righetti, André Thurneysen. (2011) Clinical Studies on the Effectiveness of Homeopathy for URTI/A (Upper Respiratory Tract Infections and Allergic Reactions) in: Homeopathy in Healthcare – Effectiveness, Appropriateness, Safety, Costs. G. Bornhöft and P.F. Matthiessen (eds.), Berlin etc., Springer 2011, p. 18-157.

2. Riverón-Garrote, M., Fernandez-Argüelles, R., Morán-Rodríguez, F., Campistrou-Labaut, J.L. (1998) Ensayo clínico controlado aleatorizado del tratamiento homeopático del asma bronquial, Boletín Mexicano de Homeopatía 1998; 31(2):54-61.

3. Friese, K.-H., Zabalotnyi, D.I. (2007) Homöopathie bei akuter Rhinosinusitis, Eine doppelblinde, placebokontrollierte Studie belegt die Wirksamkeit und Verträglichkeit eines homöopathischen Kombinationsarzneimittels, HNO 55(4):271-277.

4. Bahemann A. (2002) Kalium bromatum bei infektiöser Mononukleose. Zeitschrift für Klassische Homöopathie 46:232–233.

5. Ferley J.P., Zmirou D., D’Adhemar D., Balducci F. (1989). A controlled evaluation of a homoeopathic preparation in the treatment of influenza like syndromes. British Journal of Clinical Pharmacology 27:329-335.

6. Papp R., Schuback G., Beck E., Burkard G., Bengel J., Lehrl S., Belon P. (1998). Oscillococcinum in patients with influenza-like syndromes: a placebo-controlled double-blind evaluation. British Homeopathic Journal 87:69-76.

7. Reilly, D.T., Taylor, M.A., Beattie, N.G.M., Campbell, J.H., McSharry C., Aitchison T.C., Carter R., Stevenson R. (1994) Is evidence for homoeopathy reproducible?, Lancet 1994 344:1601-1606.

8. Reilly, D.T., Taylor, M.A., McSharry, C., Aitchison, T. (1986) Is Homoeopathy a Placebo Response?, Controlled Trial of Homoeopathic Potency – With Pollen in Hayfever as Model, Lancet II.2:881-886.

9. Wiesenauer, M., Gaus, W. (1985) Double-blind Trial Comparing the Effectiveness of Galphimia Potentisation D6 (Homoeopathic Preparation), Galphimia Dilution 10-6 and Placebo on Pollinosis, Arzneimittelforschung 35(11):1745-1747.

10. Wiesenauer M, Gaus W, Bohnacker U, Häussler S (1989) Wirksamkeitsprüfung von homöopathischen Kombinationspräparaten bei Sinusitis: Ergebnisse einer randomisierten Doppelblindstudie unter Praxisbedingungen. Arzneimittelforschung 39:620-625.

11. de Lange-de Klerk E.S.M., Blommers J., Kuik D.J., Bezemer P.D., Feenstra L. (1994). Effects of homoeopathic medicines on daily burden of symptoms in children with recurrent upper respiratory tract infections. BMJ 309:1329-1332.

12. Aabel, S. (2000) No beneficial effect of isopathic prophylactic treatment for birch pollen allergy during a low-pollen season, A double-blind, placebo-controlled clinical trial of homeopathic Betula 30c. British Homeopathic Journal 89(4):169-173.

13. Jacobs, J., Springer, D.A., Crothers, D. (2001) Homeopathic treatment of acute otitis media in children, A preliminary randomized placebo-controlled trial. The Pediatric Infectious Disease Journal 20(2):177-183.

14. Friese K.H., Feuchter U., Lüdtke R., Moeller H. (2001) Results of a randomised prospective double-blind trial on the homeopathic treatment of adenoid vegetations. European Journal of General Practice 7:48-54.

15. Lewith, G.T., Watkins, A.D.; Hyland, M.E.; Shaw, S.; Broomfield, J.A.; Dolan, G.; Holgate, S.T. (2002) Use of ultramolecular potencies of allergen to treat asthmatic people allergic to house dust mite: double blind randomised controlled clinical trial, BMJ 324:520-523.

16. White, A., Slade, P.; Hunt, C.; Hart, A.; Ernst, E. (2003) Individualised homeopathy as an adjunct in the treatment of childhood asthma, A randomised placebo controlled trial. Thorax 58(4):317-321

17. Lange-de Klerk, E.S.M. de, Effects of homoeopathic medicines on children with recurrent upper respiratory tract infections. Vrije Universiteit Amsterdam, 1993 (Dissertation).

18. Aabel, S., Laerum, E.; Dölvik, S.; Djupesland, P. (2000) Is homeopathic ‘immunotherapy’ effective?, A double-blind, placebo-controlled trial with the isopathic remedy Betula 30c for patients with birch pollen allergy. British Homeopathic Journal 89(4):161-168.

19. Friese K.-H., Feuchter U., Möller H. (1997). Die homöopathische Behandlung von adenoiden Vegetationen. HNO; 45:618–624.

20. Brien S., Lewith G., Bryant, T. (2003) Ultramolecular homeopathy has no observable clinical effects. A randomized, double-blind, placebo-controlled proving trial of Belladonna 30C.

Mohamed Khalifa is a therapist who works in Austria and has been practicing manual therapy for more than 30 years. His treatment, the so-called “Khalifa therapy”, is based on rhythmically applying manual pressure to parts of the body. Khalifa claims to be able to speed up the self-healing processes of the human body. He has treated many top athletes from all over the world; however, his method has never been investigated in detail within interdisciplinary scientific studies.

Now the first RCT of Khalifa therapy has become available.

Rupture of the anterior cruciate ligament (ACL) is an injury which usually needs to be treated surgically. It does not heal spontaneously, although some claim this commonly accepted knowledge to be not true. This randomized, controlled, observer-blinded, multicentre study was performed to test the effectiveness of Khalifa therapy for ACL. Thirty patients with complete ACL rupture, magnetic resonance imaging (MRI) verified, were included. Study examinations (e.g., international knee documentation committee (IKDC) score) were performed at inclusion (t0). Patients were randomized to receive either standardised physiotherapy (ST) or additionally 1 hour of Khalifa therapy at the first session (STK). Twenty-four hours later, study examinations were performed again (t1). Three months later control MRI and follow-up examinations were performed (t2).

Initial status was comparable between both groups. There was a highly significant difference of mean IKDC score results at t1 and t2. After 3 months, 47% of the STK patients, but no ST patient, demonstrated an end-to-end homogeneous ACL in MRI. Clinical and physical examinations were significantly different in t1 and t2. ACL healing can be improved with manual therapy. Physical activity could be performed without pain and nearly normal range of motion after one treatment of specific pressure.

The authors of this study concluded that spontaneous healing of ACL rupture is possible within 3 months after lesion, enhanced by Khalifa therapy. The effect sizes of 1.6 and 2.0 standard deviations after treatment and after 3 months are considerable and prompt further work. Further progress in understanding the underlying mechanisms including placebo will be possible when more experience with the manual pressure therapy has been gathered by other therapists.

The authors of this RCT state that according to common knowledge, it (ACL) does not heal spontaneously. Other authors disagree with this notion:

Observations on 14 patients with ACL, for instance, indicated an acutely injured ACL may eventually spontaneously heal without using an extension brace, allowing return to athletic activity. Another study suggested that an acutely injured ACL has healing capability. It also suggests that conservative management of the acute ACL injury can yield satisfactory results in a group of individuals who have low athletic demands and continuous ACL on MRI, provided the patients are willing to accept the slight risk of late ACL reconstruction and meniscal injury.

So yes, the authors of the new RCT are correct in stating: spontaneous healing of ACL rupture is possible within 3 months … but the healing might indeed be SPONTANEOUS, i.e. unrelated to the Khalifa therapy. Before we can accept that Khalifa therapy is anything but a theatrical placebo, this RCT needs independent replication. Generally speaking, it seems a bad idea to make exaggerated claims on the basis of one single trial, particularly for treatments that are as implausible as this one.

It is almost 10 years since Prof Kathy Sykes’ BBC series entitled ALTERNATIVE MEDICINE was aired. I had been hired by the BBC as their advisor for the programme and had tried my best to iron out the many mistakes that were about to be broadcast. But the scope for corrections turned out to be narrow and, at one stage, the errors seemed too serious and too far beyond repair to continue with my task. I had thus offered my resignation from this post. Fortunately this move led to some of my concerns being addressed after all, and the BBC convinced me to remain in post.

The first part of the series was on acupuncture, and Kathy presented the opening scene of a young woman undergoing open heart surgery with the aid of acupuncture. All the BBC had ever shown me and asked me to advise on was the text – I had never seen the images. Kathy’s text included the statement that the patient was having the surgery “with only needles to control the pain.” I had not objected to this statement in the firm belief that the images of the film would back up this extraordinary claim. As it turned out, they did not; the patient clearly had all sorts of other treatments given through intravenous lines and, in the film, these were in full view of Kathy Sykes.

This overt contradiction annoyed not just me but several other people as well. One of them was Simon Singh who filed an official complaint against the BBC for misleading the public, and eventually won his case.

The notion that acupuncture can serve as an alternative to anaesthesia, or can otherwise help with surgical conditions, crops up with amazing regularity. It is important not least because it is often used as a promotional tool with the implication that, IF ACUPUNCTURE CAN ACHIEVE SUCH DRAMATIC EFFECTS, IT MUST BE AN INCREDIBLY USEFUL TREATMENT! It is therefore relevant to ask what the scientific evidence tells us about this issue.

This was the question we wanted to address in a recent publication. Specifically, our aim was to summarise recent systematic reviews of acupuncture for surgical conditions.

Thirteen electronic databases were searched for relevant reviews published since 2000. Data were extracted by two independent reviewers according to predefined criteria. Twelve systematic reviews met our inclusion criteria. They related to the prevention or treatment of post-operative nausea and vomiting as well as to surgical or post-operative pain. The reviews drew conclusions which were far from uniform; specifically for surgical pain the evidence was not convincing. We concluded that “the evidence is insufficient to suggest that acupuncture is an effective intervention in surgical settings.”

So, Kathy Sykes’ comment was misguided in more than just one way: firstly, the scene she described in the film did not support what she was saying; secondly, the scientific evidence fails to support the notion that acupuncture can be used as an alternative to analgesia during surgery.

This story has several positive outcomes all the same. After seeing the BBC programme, Simon Singh contacted me to learn my views on the matter. This prompted me to support his complaint against the BBC and helped him to win this case. Furthermore, it led to a co-operation and friendship which produced our book TRICK OR TREATMENT.

The news that the use of Traditional Chinese Medicine (TCM) positively affects cancer survival might come as a surprise to many readers of this blog; but this is exactly what recent research has suggested. As it was published in one of the leading cancer journals, we should be able to trust the findings – or shouldn’t we?

The authors of this new study used the Taiwan National Health Insurance Research Database to conduct a retrospective population-based cohort study of patients with advanced breast cancer between 2001 and 2010. The patients were separated into TCM users and non-users, and the association between the use of TCM and patient survival was determined.

A total of 729 patients with advanced breast cancer receiving taxanes were included. Their mean age was 52.0 years; 115 patients were TCM users (15.8%) and 614 patients were TCM non-users. The mean follow-up was 2.8 years, with 277 deaths reported to occur during the 10-year period. Multivariate analysis demonstrated that, compared with non-users, the use of TCM was associated with a significantly decreased risk of all-cause mortality (adjusted hazards ratio [HR], 0.55 [95% confidence interval, 0.33-0.90] for TCM use of 30-180 days; adjusted HR, 0.46 [95% confidence interval, 0.27-0.78] for TCM use of > 180 days). Among the frequently used TCMs, those found to be most effective (lowest HRs) in reducing mortality were Bai Hua She She Cao, Ban Zhi Lian, and Huang Qi.

The authors of this paper are initially quite cautious and use adequate terminology when they write that TCM-use was associated with increased survival. But then they seem to get carried away by their enthusiasm and even name the TCM drugs which they thought were most effective in prolonging cancer survival. It is obvious that such causal extrapolations are well out of line with the evidence they produced (oh, how I wish that journal editors would finally wake up to such misleading language!).

Of course, it is possible that some TCM drugs are effective cancer cures – but the data presented here certainly do NOT demonstrate anything like such an effect. And before such a far-reaching claim is made, much more and much better research would be necessary.

The thing is, there are many alternative and plausible explanations for the observed phenomenon. For instance, it is conceivable that users and non-users of TCM in this study differed in many ways other than their medication, e.g. severity of cancer, adherence to conventional therapies, life-style, etc. And even if the researchers have used clever statistical methods to control for some of these variables, residual confounding can never be ruled out in such observational studies.
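How a confounder can manufacture an apparent benefit is easy to demonstrate with invented numbers. In the sketch below, all counts are hypothetical and have nothing to do with the Taiwanese data; it also uses simple risk ratios rather than the hazard ratios of the paper. Within each severity group the ‘treatment’ makes no difference whatsoever, yet the pooled comparison looks strongly favourable simply because users tend to have milder disease.

```python
# Hypothetical cohort, invented for illustration only: (patients, deaths).
cohort = {
    "mild":   {"users": (100, 20), "non_users": (200, 40)},
    "severe": {"users": (20, 12),  "non_users": (400, 240)},
}

def risk(patients, deaths):
    """Proportion of patients who died."""
    return deaths / patients

def pooled(arm):
    """Add up patients and deaths for one arm across all severity groups."""
    patients = sum(groups[arm][0] for groups in cohort.values())
    deaths = sum(groups[arm][1] for groups in cohort.values())
    return patients, deaths

# Risk ratio within each severity group (users vs. non-users).
stratum_rr = {
    name: risk(*groups["users"]) / risk(*groups["non_users"])
    for name, groups in cohort.items()
}

# Crude risk ratio when severity is ignored.
crude_rr = risk(*pooled("users")) / risk(*pooled("non_users"))

for name, rr in stratum_rr.items():
    print(f"{name:6s} disease: risk ratio = {rr:.2f}")
print(f"all patients pooled: risk ratio = {crude_rr:.2f}")
```

Statistical adjustment can remove this distortion only for confounders that were actually measured and recorded, which is why an association of this kind cannot establish that the treatment prolongs survival.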

Correlation is not causation, they say. Neglect of this elementary axiom makes for very poor science – in fact, it produces dangerous pseudoscience which could, as in the present case, lead a cancer patient straight up the garden path towards a premature death.

Do you think that chiropractic is effective for asthma? I don’t – in fact, I know it isn’t because, in 2009, I published a systematic review of the available RCTs which showed quite clearly that the best evidence suggested chiropractic was ineffective for that condition.

But this is clearly not true, might some enthusiasts reply. What is more, they can even refer to a 2010 systematic review which indicates that chiropractic is effective; its conclusions speak a very clear language: …the eight retrieved studies indicated that chiropractic care showed improvements in subjective measures and, to a lesser degree objective measures… How on earth can this be?

I would not be surprised if chiropractors claimed the discrepancy is due to the fact that Prof Ernst is biased. Others might point out that the more recent review includes more studies and thus ought to be more reliable. The newer review does, in fact, have about twice as many studies as mine.

How come? Were plenty of new RCTs published during the 12 months that lay between the two publications? The answer is NO. But why then the discrepant conclusions?

The answer is much less puzzling than you might think. The ‘alchemists of alternative medicine’ regularly succeed in smuggling non-evidence into such reviews in order to beautify the overall picture and confirm their wishful thinking. The case of chiropractic for asthma does by no means stand alone, but it is a classic example of how we are being misled by charlatans.

Anyone who reads the full text of the two reviews mentioned above will find that they do, in fact, include exactly the same number of RCTs. The reason why they arrive at different conclusions is simple: the enthusiasts’ review added NON-EVIDENCE to the existing RCTs. To be precise, the authors included one case series, one case study, one survey, two randomized controlled trials (RCTs), one randomized patient and observer blinded cross-over trial, one single blind cross study design, and one self-reported impairment questionnaire.

Now, there is nothing wrong with case reports, case series, or surveys – except THEY TELL US NOTHING ABOUT EFFECTIVENESS. I would bet my last shirt that the authors know all of that; yet they draw fairly firm and positive conclusions about effectiveness. As the RCT-results collectively happen to be negative, they even pretend that case reports etc. outweigh the findings of RCTs.

And why do they do that? Because they are interested in the truth, or because they don’t mind using alchemy in order to mislead us? Your guess is as good as mine.
