MD, PhD, FMedSci, FRSB, FRCP, FRCPEd

methodology

One phenomenon that can be noted more frequently than any other in alternative medicine research is that studies arrive at wrong or misleading conclusions. This is more than a little disappointing, not least because it is the conclusion of a trial that is often picked up by health writers and others, who in turn mislead the public. On this blog, we must have seen hundreds of examples of this irritating phenomenon.

Here is yet another one. This study, a randomized, parallel, open-label exploratory trial, evaluated and compared the effects of systemic manual acupuncture, periauricular electroacupuncture and distal electroacupuncture for treating patients with tinnitus. Patients who had suffered from idiopathic tinnitus for more than two weeks were recruited and divided into three groups:

  1. systemic manual acupuncture group (MA),
  2. periauricular electroacupuncture group (PE),
  3. distal electroacupuncture group (DE).

Nine acupoints (TE17, TE21, SI19, GB2, GB8, ST36, ST37, TE3 and TE9), two periauricular acupoints (TE17 and TE21) and four distal acupoints (TE3, TE9, ST36 and ST37) were selected for the MA, PE and DE groups, respectively. The treatment sessions were performed twice weekly for a total of 8 sessions over 4 weeks. Outcome measures were the tinnitus handicap inventory (THI) score and the visual analogue scales (VAS) for loudness and for discomfort. Demographic and clinical characteristics of all participants were compared between the groups on admission using one-way analysis of variance (ANOVA). One-way ANOVA was also used to evaluate the THI, VAS loud and VAS uncomfortable scores, with the least significant difference test as a post-hoc test.

In total, 39 subjects were eligible for analysis. No differences in THI and VAS loudness scores were observed between groups. The VAS uncomfortable scores decreased significantly in MA and DE compared with those in PE. Within groups, all three treatments showed some effect on THI, VAS loudness and VAS uncomfortable scores after treatment, except DE on THI.

The authors concluded that there was no statistically significant difference between systemic manual acupuncture, periauricular electroacupuncture and distal electroacupuncture in tinnitus. However, all three treatments had some effect on tinnitus within the group before and after treatment. Systemic manual acupuncture and distal electroacupuncture have some effect on VAS.

None of the three treatments tested in this study has previously been proven to work. Therefore, it is quite simply nonsensical to compare them. Comparative studies are indicated only with therapies that have a solid evidence-base. They are called ‘equivalence trials’ and require a different statistical approach as well as much larger sample sizes. In other words, this study was an unethical waste of resources from the outset.
With this in mind, there is only one conclusion that fits the data: there was no statistically significant difference between the three types of acupuncture. The data are therefore in keeping with the notion that all three are placebos. Alternatively one might conclude more clearly for those who are otherwise resistant to learning a lesson: POORLY DESIGNED CLINICAL TRIALS ARE UNETHICAL AND NEVER LEND THEMSELVES TO MEANINGFUL CONCLUSIONS.
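The claim that trials designed to demonstrate equivalence need much larger samples than ordinary comparisons can be illustrated with a minimal Python sketch, using the textbook normal-approximation formulas for comparing two group means. The effect size of 0.5 standard deviations, the equivalence margin of 0.2 standard deviations, α = 0.05 and 80% power are purely illustrative assumptions, not values taken from the tinnitus trial, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function

def n_superiority(delta_sd, alpha=0.05, power=0.80):
    """Per-group n to detect a true difference of delta_sd (in SD units)
    with a two-sided test at level alpha (normal approximation)."""
    z_sum = Z(1 - alpha / 2) + Z(power)
    return math.ceil(2 * z_sum ** 2 / delta_sd ** 2)

def n_equivalence(margin_sd, alpha=0.05, power=0.80):
    """Per-group n for two one-sided tests (TOST) against +/- margin_sd,
    assuming the true difference between treatments is exactly zero."""
    beta = 1 - power
    z_sum = Z(1 - alpha) + Z(1 - beta / 2)
    return math.ceil(2 * z_sum ** 2 / margin_sd ** 2)

print(n_superiority(0.5))   # 63 per group
print(n_equivalence(0.2))   # 429 per group
```

Under these assumptions, a trial looking for a moderate difference needs about 63 patients per group, whereas one aiming to show that two treatments are equivalent within a narrow margin needs about 429 per group. Either figure dwarfs the 39 patients analysed, across three arms, in the tinnitus study.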

Acupuncture is often recommended as a treatment for shoulder pain, but its effectiveness is far from proven. A new study has just been published; but does it change this uncertainty?

A total of 227 patients with subacromial pain syndrome were recruited to this RCT. The patients were allocated to three groups who received either A) group exercise, B) group exercise plus acupuncture or C) group exercise plus electro-acupuncture. The primary outcome measure was the Oxford Shoulder Score. Follow-up was post treatment, and at 6 and 12 months. Data were analysed on intention-to-treat principles with imputation of missing values.

Treatment groups were similar at baseline. All treatment groups demonstrated improvements over time. Between-group estimates were, however, small and non-significant.

The authors concluded that neither acupuncture nor electro-acupuncture were found to be more beneficial than exercise alone in the treatment of subacromial pain syndrome. 

Well, that was to be expected!… I hear the rationalists amongst us exclaim.

Actually, I am not so sure.

One could easily have expected that the acupuncture groups (B and C) would show a significant advantage over group A.

Why?

Because acupuncture is a ‘theatrical placebo’, a ritual that impresses patients and thus impacts on results, particularly on subjective outcomes like pain. If the results had shown a benefit for acupuncture + exercise (groups B and C) versus exercise alone (group A), what would we have made of it? Acupuncture fans would surely have claimed that it is evidence confirming acupuncture’s effectiveness. Sceptics, on the other hand, would have rightly insisted that it demonstrates nothing of the sort – it merely confirms that placebo effects can affect clinical outcomes such as pain.

As it turned out, however, this trial's results happened to indicate that these placebo effects can be so small that they fail to reach the level of statistical significance.

I think there is one noteworthy message here: RCTs with such a design (no adequate control for placebo effects) can easily generate false-positive results (in this case, this did not happen, but it was nevertheless a possible outcome). Such studies are popular but utterly useless: they don’t advance our knowledge one single iota. If that is so, we should not waste our resources on them because, in the final analysis, this is not ethical. In other words, we must stop funding research that has little or no chance of advancing our knowledge.

This new RCT was embargoed until today; so, I had to wait until I was able to publish my comments. Here are the essentials of the study:

The Swedish investigators compared the effect of two types of acupuncture versus no acupuncture in infants with colic in public child health centres (CHCs). The study was designed as a multicentre, randomised controlled, single-blind, three-armed trial (ACU-COL) comparing two styles of acupuncture with no acupuncture, as an adjunct to standard care. Among 426 infants whose parents sought help for colic and registered their child’s fussing/crying in a diary, 157 fulfilled the criteria for colic and 147 started the intervention.

Parallel to usual care, study participants visited the study CHC twice a week for 2 weeks. Thus, all infants received usual care plus 4 extra visits to a CHC, during which parents met a nurse for 20–30 min and were able to discuss their infant’s symptoms. Together these were considered to represent gold standard care. The nurse listened, and gave evidence-based advice and calming reassurance. Breastfeeding mothers were encouraged to continue breastfeeding. At each visit, the study nurse carried the infant to a separate treatment room where they were left alone with the acupuncturist for 5 min.

The acupuncturist treated the baby according to group allocation and recorded the treatment procedures and any adverse events. Disposable stainless steel 0.20×13 mm Vinco needles (Helio, Jiangsu Province, China) were used. Infants allocated to group A received standardised MA at LI4. One needle was inserted to a depth of approximately 3 mm unilaterally for 2–5 s and then withdrawn without stimulation. Infants allocated to group B received semi-standardised individualised acupuncture, mimicking clinical TCM practice. Following a manual, the acupuncturists were able to choose one point, or any combination of Sifeng, LI4 and ST36, depending on the infant’s symptoms, as reported in the diary. A maximum of five insertions were allowed per treatment. Needling at Sifeng consisted of 4 insertions, each to a depth of approximately 1 mm for 1 s. At LI4 and ST36, needles were inserted to a depth of approximately 3 mm, uni- or bilaterally. Needles could be retained for 30 seconds. De qi was not sought, therefore stimulation was similarly minimal in groups A and B. Infants in group C spent 5 min alone with the acupuncturist without receiving acupuncture.

The effect of the two types of acupuncture was similar and both were superior to gold standard care alone. Relative to baseline, there was a greater relative reduction in time spent crying and colicky crying by the second intervention week (p=0.050) and follow-up period (p=0.031), respectively, in infants receiving either type of acupuncture. More infants receiving acupuncture cried <3 hours/day, and thereby no longer fulfilled criteria for colic, in the first (p=0.040) and second (p=0.006) intervention weeks. No serious adverse events were reported.

The authors concluded that acupuncture appears to reduce crying in infants with colic safely.

Notice that the investigators are cautious and state in the abstract that “acupuncture appears to reduce crying…” Their conclusions from the actual article are, however, quite different; here they state the following:

Among those initially experiencing excessive infant crying, the majority of parents reported normal values once the infant’s crying had been evaluated in a diary and a diet free of cow’s milk had been introduced. Therefore, objective measurement of crying and exclusion of cow’s milk protein are recommended as first steps, to avoid unnecessary treatment. For those infants that continue to cry >3 hours/day, acupuncture may be an effective treatment option. The two styles of MA tested in ACU-COL had similar effects; both reduced crying in infants with colic and had no serious side effects. However, there is a need for further research to find the optimal needling locations, stimulation and treatment intervals.

Such phraseology is much more assertive and seems to assume acupuncture caused specific therapeutic effects. Yet, I think, this assumption is not warranted.

In fact, I believe, the study shows almost the opposite of what the authors conclude. Both minimal and TCM acupuncture seemed to reduce the symptoms of colic compared to no acupuncture at all. I think this confirms previous research showing that acupuncture is a ‘theatrical placebo’. The study was designed without an adequate placebo group. It would have been easy to use some form of sham acupuncture in the control group. Why did the authors not do that? Heaven knows, but one might speculate that they were aiming for a positive result – and what better way to ensure it than with a ‘no treatment’ control group?

There are, of course, numerous other flaws. For instance, Prof David Colquhoun FRS, Professor of Pharmacology at University College London, criticised the study because of its lousy statistics:

START OF QUOTE

“It is truly astonishing that, in the 21st century, the BMJ still publishes a journal devoted to a form of pre-scientific medicine which after more than 3000 trials has still not been able to produce convincing evidence of efficacy[1]. Like most forms of alternative medicine, acupuncture has been advocated for a vast range of problems, and there is little evidence that it works for any of them. Colic has not been prominent in these claims. What parent would think that sticking needles into their baby would stop it crying? The idea sounds bizarre. It is. This paper certainly doesn’t show that it works.

“The statistical analysis in the paper is incompetent. This should have been detected by the referees, but wasn’t.  For a start, the opening statement, ‘A two-sided P value ≤0.05 was considered statistically significant’ is simply unacceptable in the light of all recent work about reproducibility.  Still worse, Table 1 uses the description ‘statistical tendency towards significance (p=0.051–0.1)’.

“Worst of all, Table 1 reports 24 different P values, of which three are (just) below 0.05. Yet no correction has been used for multiple comparisons. This is very bad practice. It’s highly unlikely that, if the proper correction had been done, any of the results would have given a type 1 error rate below 5%.

“Even were it not for this, most of the ‘significant’ P values are marginal (only slightly less than 0.05). It is now well known that the type 1 error rate gives an optimistic view. What matters is the false positive rate – the chance that a ‘significant’ result is a false positive. A p-value close to 0.05 implies that there is at least a 30% chance that they are false positives. If one thought, a priori, that the chance of colic being cured by sticking needles into a baby was less than 50%, the false positive rate could easily be greater than 80%[2]. It is now recognised that this misinterpretation of p-values is a major contributor to the crisis of reproducibility.

“Other problems concern the power calculation.  A priori calculations of power are well-known to be overoptimistic, because small trials usually overestimate the effect size.  In this case the initial estimated sample size was not attained, and a rather mysterious recalculation of power was used.

“Another small problem: the discussion points out that ‘the majority of infants in this cohort did not have colic’.

“The nature of the control group is not very clear. An appropriate control might have been to cuddle the baby – this was used in a study in which another implausible treatment, chiropractic, was shown not to work.  This appears not to have been done.

“Lastly, p-values are reported in the text without mention of effect sizes. This is contrary to all statistical advice.

“In conclusion, the design of the trial is reasonable (apart from the control group) but the statistical analysis is appalling.  It’s very likely that there aren’t any real effects of acupuncture at all. This paper serves more to muddy the waters than to add useful information. It’s a model for the sort of mistakes that have led to the crisis in reproducibility.  The BMJ should not be publishing this sort of stuff, and the referees seem to have no understanding of statistics.”

END OF QUOTE
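Two of the statistical points in this critique can be checked with a few lines of Python. The sketch below first computes the chance of at least one spurious ‘significant’ result among 24 tests at P ≤ 0.05 when no real effect exists, treating the tests as independent (an assumption; outcomes reported in one table are usually correlated), and applies a Bonferroni correction to the P values quoted in the abstract above. It then estimates the false positive risk of a result with P = 0.05 from a normal-approximation likelihood ratio, assuming 80% power. This second part is a simplified approximation of the argument, not necessarily the calculation behind the figures in the quote, and all function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def familywise_error_rate(alpha, m):
    """P(at least one false positive) among m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m

def bonferroni_survivors(p_values, alpha, m):
    """P values that remain significant after a Bonferroni correction for m tests."""
    return [p for p in p_values if p <= alpha / m]

def false_positive_risk(p_obs, prior, alpha=0.05, power=0.8):
    """Probability that a result with p exactly equal to p_obs is a false positive,
    for a two-sided z-test (normal approximation, 'p-equals' view)."""
    z_obs = STD.inv_cdf(1 - p_obs / 2)                       # |z| giving the observed p
    delta = STD.inv_cdf(1 - alpha / 2) + STD.inv_cdf(power)  # true effect (SE units) giving ~this power
    dens_h1 = STD.pdf(z_obs - delta) + STD.pdf(z_obs + delta)  # density of |Z| if the effect is real
    dens_h0 = 2 * STD.pdf(z_obs)                               # density of |Z| under the null
    likelihood_ratio = dens_h1 / dens_h0
    prior_odds = prior / (1 - prior)                         # prior odds that the effect is real
    return 1 / (1 + likelihood_ratio * prior_odds)

reported = [0.050, 0.031, 0.040, 0.006]   # P values quoted in the ACU-COL abstract summary
print(round(familywise_error_rate(0.05, 24), 3))        # 0.708
print(bonferroni_survivors(reported, 0.05, 24))         # []
print(round(false_positive_risk(0.05, prior=0.5), 2))   # 0.29
print(round(false_positive_risk(0.05, prior=0.1), 2))   # 0.79
```

With these inputs, the chance of at least one false alarm among 24 independent tests is about 71%, none of the quoted P values survives a Bonferroni threshold of 0.05/24 ≈ 0.002, and the false positive risk for P = 0.05 comes out at roughly 29% with even prior odds and roughly 79% with a prior probability of 10%, rising above 80% for lower priors.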

Despite these rather obvious – some would say fatal – flaws, the editor of ACUPUNCTURE IN MEDICINE (AIM) thought this trial to be so impressively rigorous that he issued a press-release about it. This, I think, is particularly telling, perhaps even humorous: it shows what kind of a journal AIM is, and also provides an insight into the state of acupuncture research in general.

The long and short of it is that conclusions about specific therapeutic effects of acupuncture are not permissible. We know that colicky babies respond even to minimal attention, and this trial confirms that even a little additional TLC in the form of acupuncture will generate an effect. The observed outcome is most likely unrelated to acupuncture.

If you want to scientifically investigate this question, it might be a good idea NOT to start with the following sentence: “Auricular acupuncture (AA) is effective in the treatment of preoperative anxiety”. Yet, this is exactly what the authors did in their recent publication.

The aim of this new study was to investigate whether AA can reduce exam anxiety as compared to placebo and no intervention. Forty-four medical students were randomized to receive AA, placebo, or no intervention in a crossover manner. Subsequently they completed three comparable oral anatomy exams with an interval of one month between the exams/interventions.

A licensed acupuncturist with more than five years of experience with this technique applied AA at the acupuncture points MA-IC1 (Lung), MA-TF1 (ear Shenmen), MA-SC (Kidney), MA-AT1 (Subcortex) and MA-TG (Adrenal gland) bilaterally. Indwelling fixed ‘New Pyonex’ needles embedded in a skin-coloured adhesive tape were used for AA. The participants were instructed by the acupuncturist to stimulate the auricular needles for 3–5 minutes if they felt anxious. For the placebo procedure, ‘New Pyonex’ placebo needles were attached to five sites on the helix of the auricle bilaterally. ‘New Pyonex’ placebo needles have the same appearance as AA needles but consist of self-adhesive tape only. In order to avoid potential physiologic effects of acupressure, the participants were not instructed to stimulate the attached ‘New Pyonex’ placebo needles. AA and placebo needles were left in situ until the next day and were removed out of sight of the participants after the exam by the investigator, who was not involved in the acupuncture procedure.

Levels of anxiety were measured using a visual analogue scale before and after each intervention as well as before each exam. Additional measures included the State-Trait-Anxiety Inventory, duration of sleep at night, blood pressure, heart rate and the extent of participant blinding.

All included participants finished the study. Anxiety levels were reduced after AA and placebo intervention compared to baseline and the no intervention condition (p < 0.003). Moreover, AA was also better at reducing anxiety than placebo in the evening before the exam (p = 0.018). Participants were able to distinguish between AA and placebo intervention.

The authors concluded that both auricular acupuncture and placebo procedure were shown to be effective in reducing levels of exam anxiety in medical students. The superiority of verum AA over placebo AA and no intervention is considered to be due to stimulation of cranial nerves, but may have been increased in effect by insufficient participant blinding.

Here are just three of the major concerns I have about this study:

  • The trial design seems odd: a crossover study can only work well if there is a stable baseline. This may not be the case with three consecutive exams; the anxiety experienced by students is bound to decrease as time goes by. I think anyone who has passed a series of exams will confirm that there is a large degree of habituation.
  • It seems inadequate to employ just one acupuncturist; it means that the trial might end up testing not acupuncture per se but the skills of the therapist.
  • The placebo used for this study cannot possibly have fooled anyone into believing that it was real AA; volunteers were not even instructed to ‘stimulate’ the placebo devices. The difference from the ‘real thing’ must have been very clear to all involved. This means that the control for placebo-effects was woefully incomplete. In turn, this means that the observed outcomes are most likely due to residual bias.

In view of these concerns, allow me to re-phrase the authors’ conclusions:

THE RESULTS OF THIS POORLY-DESIGNED STUDY ARE DIFFICULT TO INTERPRET. MOST LIKELY THEY SHOW THAT ACUPUNCTURE IS NOT EFFECTIVE BUT MERELY WORKS THROUGH A PLACEBO-RESPONSE.

This meta-analysis was performed “to ascertain the effectiveness of oral aloe vera consumption on the reduction of fasting blood glucose (FBG) and hemoglobin A1c (HbA1c).”

PubMed, CINAHL, Natural Medicines Comprehensive Database, and Natural Standard databases were searched. The searches were limited to clinical trials or observational studies conducted in humans and published in English. Studies of aloe vera’s effect on FBG, HbA1c, homeostasis model assessment-estimated insulin resistance (HOMA-IR), fasting serum insulin, fructosamine, and oral glucose tolerance test (OGTT) in prediabetic and diabetic populations were examined.

Nine studies were included in the FBG parameter (n = 283); 5 of these studies included HbA1c data (n = 89). Aloe vera decreased FBG by 46.6 mg/dL (p < 0.0001) and HbA1c by 1.05% (p = 0.004). Significant reductions of both endpoints were maintained in all subgroup analyses. Additionally, the data suggested that patients with an FBG ≥200 mg/dL may see a greater benefit. A mean FBG reduction of 109.9 mg/dL was observed in this population (p ≤ 0.0001). There was evidence of publication bias with FBG but not with HbA1c.

The authors concluded that the results of this meta-analysis support the use of oral aloe vera for significantly reducing both FBG (46.6 mg/dL) and HbA1c (1.05%) in prediabetic and diabetic patients. However, given the current overall quality and relative scarcity of data, further clinical studies that are more robust and better controlled are warranted to confirm and further explore these findings.

Oh no, the results do not support the use of aloe vera at all!!

Why?

Because this ‘meta-analysis’ is of unacceptably poor quality. Here are just some of the flaws that render it totally useless, particularly for issuing advice such as above:

  • The authors included uncontrolled observational studies which make no attempt to control for non-specific effects.
  • In several studies, the use of concomitant anti-diabetic medications was allowed; therefore it is not possible to establish cause and effect by aloe vera.
  • The search strategy was woefully inadequate; for instance non-English publications were not considered.
  • There was no assessment of the scientific rigor of the included studies; this totally invalidates the reliability of the conclusions.
  • The included studies used widely different aloe vera preparations, and there is no way of knowing the dose of the active ingredients.

Diabetes is a serious condition that affects millions worldwide. If some of these patients are sufficiently gullible to follow the conclusions of this paper, they might be dead within a matter of days. This makes this article one of the most dangerous papers that I have seen in the ‘peer-reviewed’ literature of alternative medicine.

Who publishes such utter and irresponsible rubbish?

You may well ask.

The journal has been discussed on this blog before for the junk that regularly appears in its pages, and so has its editor in chief. The authors (and the reviewers) are not known to me, but one thing is for sure: they don’t know the first thing about conducting a decent systematic review/meta-analysis.

Acupuncture for hot flushes?

What next?

I know, to rational thinkers this sounds bizarre – but, actually, there are quite a few studies on the subject. Enough evidence for me to have published not one but four different systematic reviews on the subject.

The first (2009) concluded that “the evidence is not convincing to suggest acupuncture is an effective treatment of hot flash in patients with breast cancer. Further research is required to investigate whether there are specific effects of acupuncture for treating hot flash in patients with breast cancer.”

The second (also 2009) concluded that “sham-controlled RCTs fail to show specific effects of acupuncture for control of menopausal hot flushes. More rigorous research seems warranted.”

The third (again 2009) concluded that “the evidence is not convincing to suggest acupuncture is an effective treatment for hot flush in patients with prostate cancer. Further research is required to investigate whether acupuncture has hot-flush-specific effects.”

The fourth (2013), a Cochrane review, “found insufficient evidence to determine whether acupuncture is effective for controlling menopausal vasomotor symptoms. When we compared acupuncture with sham acupuncture, there was no evidence of a significant difference in their effect on menopausal vasomotor symptoms. When we compared acupuncture with no treatment there appeared to be a benefit from acupuncture, but acupuncture appeared to be less effective than HT. These findings should be treated with great caution as the evidence was low or very low quality and the studies comparing acupuncture versus no treatment or HT were not controlled with sham acupuncture or placebo HT. Data on adverse effects were lacking.”

And now, there is a new systematic review; its aim was to evaluate the effectiveness of acupuncture for treatment of hot flash in women with breast cancer. The searches identified 12 relevant articles for inclusion. The meta-analysis without any subgroup or moderator failed to show favorable effects of acupuncture on reducing the frequency of hot flashes after intervention (n = 680, SMD = − 0.478, 95 % CI −0.397 to 0.241, P = 0.632) but exhibited marked heterogeneity of the results (Q value = 83.200, P = 0.000, I² = 83.17, τ² = 0.310). The authors concluded that “the meta-analysis used had contradictory results and yielded no convincing evidence to suggest that acupuncture was an effective treatment of hot flash in patients with breast cancer. Multi-central studies including large sample size are required to investigate the efficiency of acupuncture for treating hot flash in patients with breast cancer.”
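The figures reported for this review can be checked for internal consistency. The Python sketch below assumes that the 95% CI was computed in the usual normal-theory way (estimate ± 1.96 SE) and back-calculates the point estimate, z and P from the interval; it also computes the degrees of freedom implied by the reported Q and I², using I² = (Q − df)/Q. It is a consistency check under that assumption, not a re-analysis, and the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def from_ci(lower, upper, level=0.95):
    """Recover point estimate, SE, z and two-sided P from a symmetric
    normal-theory confidence interval."""
    z_crit = STD.inv_cdf(1 - (1 - level) / 2)
    estimate = (lower + upper) / 2          # a symmetric CI is centred on the estimate
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - STD.cdf(abs(z)))
    return estimate, se, z, p

def implied_df(q, i_squared_pct):
    """Degrees of freedom implied by Cochran's Q and I^2 = (Q - df) / Q."""
    return q * (1 - i_squared_pct / 100)

est, se, z, p = from_ci(-0.397, 0.241)
print(round(est, 3), round(se, 3), round(z, 2), round(p, 3))  # -0.078 0.163 -0.48 0.632
print(round(implied_df(83.200, 83.17)))                        # 14
```

Under that assumption, the interval −0.397 to 0.241 is centred on about −0.078 rather than −0.478, and it yields z ≈ −0.48 and P ≈ 0.632, matching the reported P value; the figure given as the SMD may therefore be a z statistic. The implied df of about 14 would correspond to 15 pooled comparisons rather than 12 articles, which is possible if some articles contributed more than one comparison, but that is an inference, not something the review states.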

What follows from all this?

  • The collective evidence does NOT seem to suggest that acupuncture is a promising treatment for hot flushes of any aetiology.
  • The new paper is unimpressive, in my view. I don’t see the necessity for it, particularly as it fails to include a formal assessment of the methodological quality of the primary studies (contrary to what the authors state in the abstract) and because it merely includes articles published in English (with a therapy like acupuncture, such a strategy seems ridiculous, in my view).
  • I predict that future studies will suggest an effect – as long as they are designed such that they are open to bias.
  • Rigorous trials are unlikely to show an effect beyond placebo.
  • My own reviews typically state that MORE RESEARCH IS NEEDED. I regret such statements and would today no longer issue them.

The aim of a new meta-analysis was to estimate the clinical effectiveness and safety of acupuncture for amnestic mild cognitive impairment (AMCI), the transitional stage between the normal memory loss of aging and dementia. Randomised controlled trials (RCTs) of acupuncture versus medical treatment for AMCI were identified using six electronic databases.

Five RCTs involving a total of 568 subjects were included. The methodological quality of the RCTs was generally poor. Participants receiving acupuncture had better outcomes than those receiving nimodipine with greater clinical efficacy rates (odds ratio (OR) 1.78, 95% CI 1.19 to 2.65; p<0.01), mini-mental state examination (MMSE) scores (mean difference (MD) 0.99, 95% CI 0.71 to 1.28; p<0.01), and picture recognition score (MD 2.12, 95% CI 1.48 to 2.75; p<0.01). Acupuncture used in conjunction with nimodipine significantly improved MMSE scores (MD 1.09, 95% CI 0.29 to 1.89; p<0.01) compared to nimodipine alone. Three trials reported adverse events.

The authors concluded that acupuncture appears effective for AMCI when used as an alternative or adjunctive treatment; however, caution must be exercised given the low methodological quality of included trials. Further, more rigorously designed studies are needed.

Meta-analyses like this one are, in my view, perfect examples of the ‘rubbish in, rubbish out’ principle of systematic reviews. This may seem like an unfair statement, so let me justify it by explaining the shortfalls of this specific paper.

The authors try to tell us that their aim was “to estimate the clinical effectiveness and safety of acupuncture…” While it might be possible to estimate the effectiveness of a therapy by pooling the data of a few RCTs, it is never possible to estimate its safety on such a basis. To conduct an assessment of therapeutic safety, one would need sample sizes two or three orders of magnitude larger than those of RCTs. Thus, safety assessments are best done by evaluating all the available evidence, including case reports, epidemiological investigations and observational studies.
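The sample-size point can be made concrete with the ‘rule of three’: if no adverse event is seen in n patients, the 95% upper confidence limit for its rate is roughly 3/n. The minimal Python sketch below uses the 568 subjects of this review; the event rate of 1 in 10,000 is an illustrative assumption, not a figure for acupuncture, and the function names are hypothetical.

```python
import math

def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence bound on an event rate when
    0 events are seen in n patients: solve (1 - p)**n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

def prob_at_least_one(rate, n):
    """Chance of observing at least one event of the given rate among n patients."""
    return 1 - (1 - rate) ** n

def n_to_detect(rate, probability=0.95):
    """Patients needed for the given chance of seeing at least one event."""
    return math.ceil(math.log(1 - probability) / math.log(1 - rate))

n = 568  # total subjects in the five RCTs of the AMCI review
print(round(upper_bound_zero_events(n), 4), round(3 / n, 4))  # 0.0053 0.0053
print(round(prob_at_least_one(1 / 10_000, n), 3))             # 0.055
print(n_to_detect(1 / 10_000))                                # 29956
```

Even if all 568 patients had been monitored carefully and no adverse event had occurred, events rarer than about 1 in 190 could not be ruled out; an event affecting 1 in 10,000 patients would be observed with only about a 5.5% chance, and roughly 30,000 patients would be needed for a 95% chance of seeing it even once.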

The authors tell us that “two studies did not report whether any adverse events or side effects had occurred in the experimental or control groups.” This is a common and serious flaw of many acupuncture trials, and another important reason why RCTs cannot be used for evaluating the risks of acupuncture. Too many such studies simply don’t mention adverse effects at all. If they are then included in systematic reviews, they inevitably generate a falsely positive picture of the safety of acupuncture. The absence of adverse-effect reporting is a serious breach of research ethics. In the realm of acupuncture, it is so common that many reviewers do not even bother to discuss this violation of medical ethics as a major issue.

The authors conclude that acupuncture is more effective than nimodipine. This sounds impressive – unless you happen to know that nimodipine is not supported by good evidence either. A Cochrane review provided no convincing evidence that nimodipine is a useful treatment for the symptoms of dementia, either unclassified or according to the major subtypes – Alzheimer’s disease, vascular, or mixed Alzheimer’s and vascular dementia.

The authors also conclude that acupuncture used in conjunction with nimodipine is better than nimodipine alone. This too might sound impressive – unless you realise that all the RCTs in question failed to control for the effects of placebo and the added attention given to the patients. This means that the findings reported here are consistent with acupuncture itself being totally devoid of therapeutic effects.

The authors are quite open about the paucity of RCTs and their mostly dismal methodological quality. Yet they arrive at fairly definitive conclusions regarding the therapeutic value of acupuncture. This is, in my view, a serious mistake: on the basis of a few poorly designed and poorly reported RCTs, one should never arrive at even a tentatively positive conclusion. Any decent journal would not have published such misleading phraseology, and it is noteworthy that the paper in question appeared in a journal that has a long history of being hopelessly biased in favour of acupuncture.

Any of the above-mentioned flaws could already be fatal, but I have kept the most serious one for last. All 5 RCTs included in the analyses were conducted in China by Chinese researchers and published in Chinese journals. It has been shown repeatedly that such studies hardly ever report anything other than positive results; no matter what condition is being investigated, acupuncture turns out to be effective in the hands of Chinese trialists. This means that the result of such a study is clear even before the first patient has been recruited. Little wonder, then, that virtually all reviews of such trials – and there are dozens of them – arrive at conclusions similar to those formulated in the paper before us.

As I already said: rubbish in, rubbish out!

This post is dedicated to Mel Koppelman.

Those who followed the recent discussions about acupuncture on this blog will probably know her; she is an acupuncturist who (thinks she) knows a lot about research because she has several higher qualifications (but was unable to show us any research published by herself). Mel seems very quick in lecturing others about research methodology. Yesterday, she posted this comment in relation to my previous post on a study of aromatherapy and reflexology:

Professor Ernst, This post affirms yet again a rather poor understanding of clinical trial methodology. A pragmatic trial such as this one with a wait-list control makes no attempt to look for specific effects. You say “it is quite simply wrong to assume that this outcome is specifically related to the two treatments.” Where have specific effects been tested or assumed in this study? Your statement in no way, shape or form negates the author’s conclusions that “aromatherapy massage and reflexology are simple and effective non-pharmacologic nursing interventions.” Effectiveness is not a measure of specific effects.

I am most grateful for this comment because it highlights an issue that I had wanted to address for some time: the meanings of the two terms ‘efficacy’ and ‘effectiveness’ and their differences as seen by scientists and by alternative practitioners/researchers.

Let’s start with the definitions.

I often use the excellent book by Alan Earl-Slater entitled THE HANDBOOK OF CLINICAL TRIALS AND OTHER RESEARCH. In it, EFFICACY is defined as ‘the degree to which an intervention does what it is intended to do under ideal conditions’. EFFECTIVENESS is ‘the degree to which a treatment works under real life conditions’. An EFFECTIVENESS TRIAL is a trial that ‘is said to approximate reality (i. e. clinical practice). It is sometimes called a pragmatic trial’. An EFFICACY TRIAL ‘is a clinical trial that is said to take place under ideal conditions.’

In other words, an efficacy trial investigates the question ‘can the therapy work?’, and an effectiveness trial asks ‘does this therapy work?’ In both cases, the questions relate to the therapy per se and not to the plethora of phenomena that are not directly related to it. It seems logical that, where possible, the first question should be addressed before the second: it makes little sense to test for effectiveness if efficacy has not been ascertained, and effectiveness without efficacy does not seem to be possible.

In my 2007 book entitled UNDERSTANDING RESEARCH IN COMPLEMENTARY AND ALTERNATIVE MEDICINE (written especially for alternative therapists like Mel), I adopted these definitions and added: “It is conceivable that a given therapy works only under optimal conditions but not in everyday practice. For instance, in clinical practice patients may not comply with a therapy because it causes adverse effects.” I should perhaps have added that adverse effects are by no means the only reason for non-compliance, and that non-compliance is not the only reason why an efficacious treatment might not be effective.

Most scientists would agree with the above definitions. In fact, I am not aware of any debate about them in scientific circles. But they are not something alternative practitioners tend to like. Why? Because, by these strict definitions, many alternative therapies have been proven neither efficacious nor effective.

What can be done about this unfortunate situation?

Simple! Let’s re-formulate the definitions of efficacy and effectiveness!

Efficacy, according to some alternative medicine proponents, refers to the therapeutic effects of the therapy per se, in other words, its specific effects. (That almost coincides with the scientific definition of the term, except, of course, that it tells us nothing about the boundary conditions [optimal or real-life conditions].)

Effectiveness, according to the advocates of alternative therapies, refers to the therapy’s specific effects plus its non-specific effects. Some researchers have even introduced the term ‘real-life effectiveness’ for this.

This is why the authors of the study discussed in my previous post could conclude that “aromatherapy massage and reflexology are simple… effective… interventions… to help manage pain and fatigue in patients with rheumatoid arthritis.” On the basis of their data, neither aromatherapy nor reflexology has been shown to be effective. They might appear to be effective because patients expected to get better, or because patients in the no-treatment control group felt worse for not getting the extra care. In studies of this nature, giving patients £10 or a box of chocolates might also turn out to be “simple… effective… interventions… to help manage pain and fatigue in patients with rheumatoid arthritis.” With these definitions of efficacy and effectiveness, there are hardly any limits to bogus claims for any old quackery.

Such obfuscation suits proponents of alternative therapies fine because, with such definitions, virtually every treatment anyone might ever think of can be shown to be effective! Wishful thinking, it seems, can fulfil almost any dream; it can even turn the truth upside down.

Or can anyone name an alternative treatment that cannot even generate a placebo response when administered with empathy, sympathy and care? Compared with doing nothing, virtually every ineffective therapy might generate outcomes that make the treatment look effective. Even the anticipation of an effect might do the trick on its own. How often have you had toothache, gone to the dentist and discovered, while sitting in the waiting room, that the pain had gone? Does that mean that sitting in a waiting room is an effective treatment for dental pain?

In fact, some enthusiasts of alternative medicine could soon begin to argue that, with their new definition of ‘effectiveness’, we no longer need controlled clinical trials at all to demonstrate how effective alternative therapies truly are. We can just do observational studies without a control group, note that lots of patients get better, and ‘Bob’s your uncle’!!! This is much faster, saves money, time and effort, and has the undeniable advantage of never generating a negative result.

To most outsiders, all this might seem a bit like splitting hairs. However, I fear that it is far from that. In fact, it turns out to be a fairly fundamental issue in almost any discussion about the value or otherwise of alternative medicine. And, I think, it is also a matter of principle that reaches far beyond alternative medicine: if we allow various interest groups, lobbyists, sects, cults etc. to use their own definitions of fundamentally important terms, any dialogue, understanding or progress becomes almost impossible.

While, over on my post about the new NICE GUIDELINES on acupuncture for back pain, the acupuncturists’ attempts to assassinate my character, competence, integrity and personality are in full swing, I have decided to use my time more fruitfully and comment briefly on a new piece of acupuncture research.

This new Italian study aimed to determine the effectiveness of acupuncture for the management of hot flashes in women with breast cancer.

A total of 190 women with breast cancer were randomly assigned to two groups. Random assignment was performed with stratification for hormonal therapy; the allocation ratio was 1:1. Both groups received a booklet with information about climacteric syndrome and its management to be followed for at least 12 weeks. In addition, the acupuncture group received 10 traditional acupuncture treatment sessions involving needling of predefined acupoints.

The primary outcome was hot flash score at the end of treatment (week 12), calculated as the frequency multiplied by the average severity of hot flashes. The secondary outcomes were climacteric symptoms and quality of life, measured by the Greene Climacteric and Menopause Quality of Life scales. Health outcomes were measured for up to 6 months after treatment. Expectations of, and satisfaction with, the treatment effect, as well as safety, were also evaluated. The analyses were by intention to treat.

Of the participants, 105 were randomly assigned to enhanced self-care and 85 to acupuncture plus enhanced self-care. Acupuncture plus enhanced self-care was associated with a significantly lower hot flash score than enhanced self-care at the end of treatment (P < .001) and at 3- and 6-month post-treatment follow-up visits (P = .0028 and .001, respectively). Acupuncture was also associated with fewer climacteric symptoms and higher quality of life in the vasomotor, physical, and psychosocial dimensions (P < .05).

The authors concluded that acupuncture in association with enhanced self-care is an effective integrative intervention for managing hot flashes and improving quality of life in women with breast cancer.

This hardly needs a comment, as I have gone on about this study design many times before: the ‘A+B versus B’ design can only produce positive findings, because both groups receive B, while only one group gets the extra attention and expectation that come with A. Any such study concluding that ‘acupuncture (or whatever other intervention) is effective’ therefore cannot be a legitimate test of a hypothesis and ought to be categorised as pseudo-science. Sadly, this problem seems more the rule than the exception in the realm of acupuncture research. That’s a pity really… because, if there is potential in acupuncture at all, this sort of thing can only distract from it.
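For readers who like to see the arithmetic, here is a minimal Python sketch of why the ‘A+B versus B’ design favours A. It rests entirely on hypothetical assumptions, not on data from the Italian study: treatment A has no specific effect whatsoever, the extra attention that comes with it adds a modest non-specific benefit, and control patients may do slightly worse out of disappointment. All effect sizes, the standard deviation and the 90 patients per arm are invented for illustration, and the function names (`simulate_trial`, `welch_t`, `share_positive`) are likewise hypothetical.

```python
import random
import statistics


def simulate_trial(n_per_arm=90, specific_effect=0.0, nonspecific_effect=3.0,
                   control_penalty=0.0, sd=10.0, seed=1):
    """Simulate symptom-improvement scores for one hypothetical 'A+B versus B' trial.

    Both arms receive B (e.g. self-care); the add-on arm also receives A.
    All effect sizes are invented, in arbitrary symptom-score units.
    specific_effect defaults to 0.0: A itself does nothing.
    """
    rng = random.Random(seed)
    natural_course = 10.0  # hypothetical improvement everyone shows anyway
    # Control arm (B only): natural course, minus any disappointment at missing out
    control = [rng.gauss(natural_course - control_penalty, sd)
               for _ in range(n_per_arm)]
    # Add-on arm (A+B): natural course plus the non-specific effects of the
    # extra attention and expectation, plus whatever specific effect A has
    add_on = [rng.gauss(natural_course + nonspecific_effect + specific_effect, sd)
              for _ in range(n_per_arm)]
    return add_on, control


def welch_t(a, b):
    """Welch's t statistic for the difference in means (a minus b)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def share_positive(n_trials=500, **kwargs):
    """Fraction of simulated trials in which A+B 'wins' with t > 1.96.

    With roughly 180 patients, t > 1.96 corresponds approximately to
    p < .05 (two-sided) in favour of the add-on arm.
    """
    hits = 0
    for seed in range(n_trials):
        add_on, control = simulate_trial(seed=seed, **kwargs)
        if welch_t(add_on, control) > 1.96:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # A has no specific effect in either run; only the non-specific terms differ.
    print("no non-specific effects:",
          share_positive(nonspecific_effect=0.0, control_penalty=0.0))
    print("attention + disappointment:",
          share_positive(nonspecific_effect=3.0, control_penalty=2.0))
```

With both non-specific terms switched off, only a few per cent of simulated trials favour A+B, which is what chance alone produces. With a 3-point attention effect and a 2-point disappointment penalty, the large majority do (roughly nine in ten under these settings), although A has no specific effect at all. Changing the invented numbers changes the proportion, but not the direction of the bias.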

I think the JOURNAL OF CLINICAL ONCOLOGY, its editors and reviewers, should be ashamed of having published such misleading rubbish.

Reiki is one of the most popular types of ‘energy healing’. Reiki healers believe they can channel ‘healing energy’ into patients’ bodies, thus enabling them to become healthy. If Reiki were not such a popular treatment, one could brush such claims aside and think “let the lunatic fringe believe what they want”. But as Reiki so effectively undermines consumers’ sense of reality and rationality, I feel I should continue informing the public about this subject, despite the fact that I have already reported on it several times before, for instance here, here, here, here, here and here.

A new RCT, published in a respected journal, looks interesting enough to merit a further blog post on the subject. The main aim of the study was to investigate the effectiveness of two psychotherapeutic approaches, cognitive behavioural therapy (CBT) and the complementary medicine method Reiki, in reducing depression scores in adolescents. The researchers from Canada, Malaysia and Australia recruited 188 depressed adolescents, who were randomly assigned to CBT, Reiki or a wait-list. Depression scores were assessed before and after 12 weeks of treatment or waiting. CBT showed a significantly greater decrease in Child Depression Inventory (CDI) scores across treatment than both Reiki (p<.001) and the wait-list control (p<.001). Reiki also showed greater decreases in CDI scores across treatment relative to the wait-list control condition (p=.031). Male participants showed a smaller treatment effect for Reiki than did female participants. The authors concluded that both CBT and Reiki were effective in reducing the symptoms of depression over the treatment period, with the effect of CBT being greater than that of Reiki.

I find it most disappointing that these days even respected journals publish such RCTs without the necessary critical input. This study may appear to be rigorous but, in fact, it is hardly worth the paper it was printed on.

The findings show that Reiki produced worse results than CBT. That I can well believe!

However, the findings also suggest that Reiki was nevertheless “effective in reducing the symptoms of depression”, as the authors put it in their conclusions. This statement is misleading!

It is based on the comparison of Reiki with doing nothing. As Reiki involves lots of attention, it can be assumed to generate a sizable placebo effect. As some of the patients in the wait-list group are probably disappointed at not getting such attention, they can be assumed to experience the adverse effects of their disappointment. The two phenomena combined can easily explain the result without any “effectiveness” of Reiki per se.

If such considerations are not fully discussed and made amply clear even in the conclusions of the abstract, it seems reasonable to accuse the journal of being less than responsible and the authors of being outright misleading.

As with so many papers in this area, one has to ask: WHERE DOES SLOPPY RESEARCH END AND WHERE DOES SCIENTIFIC MISCONDUCT BEGIN?
