Highly diluted homeopathic remedies are pure placebos! This is what the best evidence clearly shows. Ergo, a rigorous study should not be able to show effects that differ from placebo. But now there is a study that seems to contradict this widely accepted conclusion.

Can someone please help me to understand what is going on?

In this double-blind, placebo-controlled RCT, 60 patients suffering from insomnia were treated with either individualised homeopathy (IH) or placebo for 3 months. A patient-administered sleep diary and the Insomnia Severity Index (ISI) were used as the primary and secondary outcome measures respectively, assessed at baseline and after 3 months.

Five patients dropped out (verum: 2, control: 3). The intention-to-treat sample (n = 60) was analysed. Trial arms were comparable at baseline. In the verum group, all outcomes except sleep diary item 3 (P = 0.371) improved significantly (all P < 0.01). In the control group, there were significant improvements in diary item 6 and the ISI score (P < 0.01) and a just significant improvement in item 5 (P = 0.018). Group differences were significant for items 4, 5 and 6 (P < 0.01) and just significant (P = 0.014) for the ISI score, with moderate to large effect sizes, but non-significant (P > 0.01) for the rest of the outcomes.

The authors concluded that in this double-blind, randomized, prospective, placebo-controlled, two-parallel-arm clinical trial conducted on 60 patients suffering from insomnia, there were statistically significant differences in sleep efficiency, total sleep time, time in bed, and ISI score in favour of homeopathy over placebo, with moderate to large effect sizes. Group differences were non-significant for the rest of the outcomes (i.e. latency to fall asleep, minutes awake in the middle of the night and minutes awake too early). Individualized homeopathy seemed to produce a significantly better effect than placebo. Independent replications and adequately powered trials with enhanced methodological rigor are warranted.

I have studied this article in some detail; its methodology is nicely and fully described in the original paper. To my amazement, I cannot find a flaw that is worth mentioning. Sure, the sample was small, the treatment time short, the outcome measure subjective, the paper comes from a dubious journal, the authors have a clear conflict of interest, even though they deny it – but none of these limitations has the potential to conclusively explain the positive result.

In view of what I stated above and considering what the clinical evidence so far tells us, this is most puzzling.

A 2010 systematic review authored by proponents of homeopathy included 4 RCTs comparing homeopathic medicines to placebo. All involved small patient numbers and were of low methodological quality. None demonstrated a statistically significant difference in outcomes between groups.

My own 2011 review, which is not Medline-listed (Focus on Alternative and Complementary Therapies, Volume 16(3), September 2011, 195–199), included several additional studies. Here is its abstract:

The aim of this review was the critical evaluation of evidence for the effectiveness of homeopathy for insomnia and sleep-related disorders. A search of MEDLINE, AMED, CINAHL, EMBASE and Cochrane Central Register was conducted to find RCTs using any form of homeopathy for the treatment of insomnia or sleep-related disorders. Data were extracted according to pre-defined criteria; risk of bias was assessed using Cochrane criteria. Six randomised, placebo-controlled trials met the inclusion criteria. Two studies used individualised homeopathy, and four used standardised homeopathic treatment. All studies had significant flaws; small sample size was the most prevalent limitation. The results of one study suggested that homeopathic remedies were superior to placebo; however, five trials found no significant differences between homeopathy and placebo for any of the main outcomes. Evidence from RCTs does not show homeopathy to be an effective treatment for insomnia and sleep-related disorders.

It follows that the new trial contradicts previously published evidence. In addition, it clearly lacks plausibility, as the remedies used were highly diluted and therefore should be pure placebos. So, what could be the explanation of the new, positive result?

As far as I can see, there are the following possibilities:

  • fraud,
  • coincidence,
  • some undetected/undisclosed bias,
  • homeopathy works after all.

I would be most grateful if someone could help solve this puzzle for me (if needed, I can send you the full text of the new article for assessment).

99 Responses to A new study of homeopathy suggests that highly diluted remedies are better than placebos (and I cannot fault it)

  • In my amateur view, and I am happy to be corrected, this is easily explained purely by the sample size.

    With a ‘normal’ bell curve, the results from a sample of 55 would be expected to be repeatable, with 95% confidence, at +/- 27%. If you apply this range to probable outcomes it is very difficult to say that the results are statistically significant.

    I have worked in statistics in marketing all my (long) life, where sample sizes are actually significant and results are cross-referenced to predictions. This is one field where statistical theory in modelling predictability has been thoroughly tested. I understand why medical statistics are sometimes inherently bound by small samples; however, there is no need to use such minuscule sample sizes for such a widespread condition, except to hope for a randomly ‘positive’ result, which would imply bias.

    I can only hope this is scientifically published, and repeated in order for the statistics to be… statistical.

    • thanks
      I am, however, not sure that this is entirely correct.
      yes, a too small sample size renders any result less reliable, but it would more likely work the other way: it should hide a group difference where one exists, rather than producing one where none exists.

      • My understanding is not that it would hide or show a group difference, but that the probability of the observed results being repeatable is reduced.

        The smaller the sample, the more it is just an exercise in maths, and the less likely any real-world predictions can be made of it – just ask the Polling companies that, even when unbiased, using a sample of 1,000 (+/-6.3%), are still wildly inaccurate.

        • thanks; I am most thankful for being corrected. the reason I did assume what I assumed was this: In statistical hypothesis testing, a type II error, often caused by too small sample sizes, is the failure to reject a false null hypothesis. More simply stated, a type II error is to falsely infer the absence of something that is present.
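The type II point can be put in numbers with a standard back-of-envelope power calculation (a sketch, not taken from the paper; the effect size d = 0.5 below is purely an illustrative assumption):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(d, n_per_arm):
    """Approximate power of a two-sided two-sample test (alpha = 0.05)
    for a true standardised effect size d with n patients per arm."""
    return normal_cdf(d * sqrt(n_per_arm / 2.0) - 1.96)

# With a moderate true effect (d = 0.5) and ~30 patients per arm,
# roughly half of all such trials would miss it (type II error ~50%):
print(round(power_two_sample(0.5, 30), 2))   # ~0.49
print(round(power_two_sample(0.5, 200), 2))  # ~1.0
```

On this approximation, a trial of ~30 per arm misses a genuine moderate effect about half the time, which is exactly the point: small samples are chiefly prone to false negatives, not false positives.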

      • I am not a statistician but from my readings, yes YOU ARE MISTAKEN.

        Francis (above) is a much better authority, but if you have a small n-size and a noisy measurement we can assume there is a lot of randomness in the results. It can go either way.

        From your description the study looks methodologically good but it would need a much larger sample size plus replications to give us more confidence in the results.

        As Cohen (1990) says “Less is more except of course for sample size”.

      • I’m afraid I haven’t read the full paper, only the abstract, as I don’t have free access to the journal and I don’t really want to pay for it. The abstract doesn’t really tell us anything much about how the study was conducted, what statistical analysis was performed, what comparisons were actually made, so we aren’t really in any position to judge whether the authors’ conclusions were justified.

        With such a small sample size, differences may well have been due to chance, particularly as so many different measures were used (a total of seven – the six diary items plus the ISI), giving multiple opportunities to get a statistically significant result.
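The multiple-testing point is easy to quantify. Assuming seven independent tests at alpha = 0.05 (independence is an assumption here; the diary items surely correlate, and the authors' p < 0.01 threshold partly addresses this):

```python
# Familywise error rate: the chance of at least one nominally
# "significant" result among k independent tests when every
# null hypothesis is in fact true.
def familywise_error(k, alpha=0.05):
    return 1.0 - (1.0 - alpha) ** k

# Seven outcome measures (six diary items plus the ISI):
print(round(familywise_error(7), 2))  # ~0.3
```

So even with nothing going on at all, roughly a 30% chance of at least one "positive" outcome measure.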

        Both groups showed statistically significant improvement over time in most of the measures used. This is to be expected, of course, since most things improve with time, and in any case the insomniac subjects would have sought medical help at a time when their problem was worse than usual. Also we don’t know when the study was performed – in Southern India the tropical climate means that it is easier to sleep at some times of year than others.

        It seems that at least some of the people commenting have seen more of the paper than just the abstract, as it seems that the verum group was atypical on some measures at baseline compared to the end of the trial and also when compared to the placebo group. If that is the case then the differences could all be explained by regression to the mean.

        With regard to the effects of sample size, smaller samples would certainly be more liable to show an apparent difference where none exists (you are more likely to throw all sixes with four dice than with eight). On the other hand, this should be reflected in wider confidence intervals.
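The dice analogy is worth making exact (a trivial check):

```python
# Probability of throwing all sixes: far likelier with four dice
# than with eight -- the kind of fluke that small samples permit.
p4 = (1 / 6) ** 4
p8 = (1 / 6) ** 8
print(p4, p8)  # ~7.7e-4 vs ~6.0e-7
```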

        The abstract tells us that for some of the diary items (time to fall asleep, minutes awake in the middle of the night, minutes awake too early) there was no significant difference between the groups, and for others (total hours in bed, total hours asleep and sleep efficiency) there was. It is interesting that the first three, showing no difference, seem to be continuous variables (i.e. numbers of minutes), whereas the last three, showing a difference between the groups, seem to be categorical variables (six hours, seven hours, eight hours; efficiency category A, category B… or whatever). OK, I am inferring this from a not very detailed abstract, but if this is how they were recorded, then the statistical analysis needs to differ between items 1, 2, 3 and items 4, 5, 6. We have no information at all concerning how they actually were analysed, or indeed whether the P-values quoted relate to appropriate statistical tests.

        I would certainly concur with the conclusion at the end of the abstract:
        “Rigorous trials and independent replications are warranted”

  • Homeopathic medicines are not diluted as stated here and elsewhere, they are potentised in a unique process not employed in any other form of manufacture as far as I am aware. That is why they are not inert.

    • true, this is why some have referred to them as magic shaken water [the notion that potentisation does anything is a myth]

    • If you could detail this potentising process, Nick. And also explain how it works and what might happen if the potentiation were carried out incorrectly, and what would constitute incorrect potentisation. We need to be clear on this.


    • @ Nick Biggins

      I responded recently to your (unevidenced) assertion that potentizing is something special. We know about succussion perfectly well. You apparently could not see anything hilarious in the videos linked to, or comprehend why the process is self-evidently ridiculous, or you’d not be back now with the same message.

      Let me spell it out for you. Since Hahnemann himself couldn’t figure out how many shakes and at what force were required for optimal succussion, here we are, >150 years later, with the professional manufacturers of homeopathic medicines all clearly differing in the methods they use to put in the magic. And here we are, >150 years later, without a shred of evidence that succussed fluids differ in any detectable way from unpotentized fluids.

      You said “they are potentised in a unique process not employed in any other form of manufacture as far as I am aware”. Well, this 3 minute video is not about homeopathy, but it demonstrates a ‘potentizer’ that “takes the bad stuff” out of foods. So homeopathy is not unique in its insanity. Indeed, this potentizer has taken a lot more ingenuity in its set-up than mere shaking/banging about of fluids in bottles.

      • “And here we are, >150 years later, without a shred of evidence that succussed fluids differ in any detectable way from unpotentized fluids”

        This is blatantly false in regard to research, please update your software:

        • perhaps you did not read this article to the end: “… although many studies do suggest it, the residual presence of the initial ingredient in the ultramolecular dilutions remains to be proved, as the question of contamination cannot be ignored”

          • Perhaps it is extremely disingenuous to conflate the standard disclaimer “remain to be proven etc” with absence of facts.

            Facts such as: there is in vitro evidence of the effect of homeopathic dilutions, and Darwin himself had to accept that something he thought was quackery wasn’t.
            “The reader will best realize this degree of dilution by remembering that 5,000 ounces would more than fill a thirty-one gallon cask [barrel]; and that to this large body of water one grain of the salt was added; only half a drachm, or thirty minims, of the solution being poured over a leaf. Yet this amount sufficed to cause the inflection of almost every tentacle, and often the blade of the leaf. … My results were for a long time incredible, even to myself, and I anxiously sought for every source of error. … The observations were repeated during several years. Two of my sons, who were as incredulous as myself, compared several lots of leaves simultaneously immersed in the weaker solutions and in water, and declared that there could be no doubt about the difference in their appearance. … In fact every time that we perceive an odor, we have evidence that infinitely smaller particles act on our nerves”

            Now, as there are in vitro effects, there must be some kind of agent left in the dilutions, and the silicate nanostructures are a step towards understanding what’s happening with homeopathy.

          • @Victor Nickel

            Please provide an example of convincing evidence to support your contention that “there is in vitro evidence of the effect of homeopathic dilutions”. Everything I’ve seen so far is experimentally incompetent or done at concentrations of substances much higher than the 30C dilutions so widely sold by homeopathic sources.

            Your Darwin example falls into the latter category: half a drachm (886 mg) in 5000 oz (142 L) is ~6 mg/L (6 µg/mL), around homeopathic 2–3C (assuming 1C = 10 g/L or 10 mg/mL). That’s not evidence for homeopathic magic!
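Frank's dilution arithmetic can be recomputed directly, using the figures stated in his comment and his stated assumption that a 1C stock holds 10 g/L, each further C step diluting by a factor of 100:

```python
from math import log10

# Figures from the comment above: ~886 mg of salt in ~142 L of water.
mass_mg, volume_l = 886.0, 142.0
conc_mg_per_l = mass_mg / volume_l

# Assumed scale: 1C = 10 g/L = 10,000 mg/L; each C divides by 100.
c_equivalent = 1.0 + log10(10_000.0 / conc_mg_per_l) / 2.0

print(round(conc_mg_per_l, 1))  # ~6.2 mg/L
print(round(c_equivalent, 1))   # ~2.6, i.e. between 2C and 3C
```

So Darwin's solution sits around 2-3C, vastly more concentrated than a 30C remedy.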

            Remember, Hahnemann reckoned potentization by succussion had no limits and that 200C was even more potent than 30C. Please provide your evidence for effects of homeopathic dilutions in vitro.

          • @Frank Odds

            Indeed, and Darwin established that there was an orthodox dose-response relationship, with the effects becoming weaker and more difficult to observe as the concentration decreased, and a limit beyond which no effect was observed (at a level corresponding to something like 7X, apparently).

        • @Victor Nickel

          Sorry, but this review by Demangeat is mere nanobabble, uncritically accepting the results of dubious experiments* done with insoluble metals. As Edzard has already noted, the authors state (forced by a competent peer reviewer?) “the residual presence of the initial ingredient [i.e. the stuff that’s supposed to be the medicine] in the ultramolecular dilutions remains to be proved”.

          This doesn’t stop the author from barreling on “But if this was definitely established, the notion of “Memory of water” would definitively lapse. Homeopathy would then be reduced to microdose pharmacology, exhibiting hormetic-type responses which justifies the simile therapeutic principle [129], and would be included in the present panoply of nanomedicine.”

          But hormesis is by no means a universal phenomenon, it’s seen at easily measurable concentrations of substances that do exhibit the phenomenon and it’s gone long before the high dilutions that represent the highest potency in homeopathy. The author seems to think that microdoses and nanomedicine are the same thing, thus falling into the trap of wanting to sound ‘sciency’ while not possessing enough scientific acumen to understand that a 30C ‘medicine’ is well below any ‘nano-‘ level: it’s even far lower than the level for which there’s any SI prefix (‘yocto-‘ — 10^–24).

          *For example, the paper by Chikramane et al., Langmuir (2012) 28: 15864–75, about which Dana Ullman so often salivates on this blog, involves successive dilution and succussion of colloidal gold particles. They left each dilution to stand for 1 hour [!] then sampled the top layer and middle layer for gold particles. That they found gold particles in the top layers is about as remarkable as the discovery that the Pope holds Roman Catholic beliefs. Even worse, they let each dilution stand for 45 minutes then took off a sample from the top layer for the next dilution step: surely that’s not how homeopathic dilutions are normally done? Demangeat simply accepts this risible experimentation at face value.

    • That is what homeopathists say. There is zero credible evidence that dilution and twerking has any objective effect, though.

      The most likely explanation in this study is the obvious: fraud.

  • The paper states:

    General management: All the participants were encouraged to develop good sleep hygiene and habits such as not using bed for anything except sleep, maintaining regular sleep timings, avoiding behaviors such as napping after 3:00 pm, caffeine after lunchtime etc. which may interfere with sleep physiology.

    Yet I can’t see consideration of how this advice influenced participants. I think it would have needed a baseline measure of whether this advice affected those in the verum and placebo groups. Without knowing that, surely we can’t assume it was the homeopathy rather than the sleep advice? It could be the two groups were not similar at baseline after all.

    • they did mention baseline measurements, I think. and any advice given to all patients should affect both groups in the same way.

      • Possibly, but if they did not measure the effects of the advice, then we don’t know whether some were more amenable to the advice than others. Does the paper say whether the baseline measurements were taken before or after the sleep advice was given? It could be the study was actually measuring the effects of two separate treatments.

  • Until it’s replicated, along with more objective outcomes (e.g. fMRI), I would consider it a statistical anomaly.

  • Well, we all know that about 5% of all studies are simply false positive. Maybe that’s one of them?

  • The trial registration is here. The brief description states:

    Insomnia is the most common sleep-related complaint with a prevalence of 6-18% in the general population. In South India, 18.6% of respondents reported insomnia. Chronic insomnia, if untreated, can have social, economic and occupational impacts on the individual. Insomnia is associated with impaired day-time functioning, reduced Quality of Life, increased risk of morbidity and substantial societal cost. There are multiple placebo controlled trials with results supporting the efficacy of homoeopathic medication in insomnia; still, one systematic review recommended that future trials of homeopathy and insomnia be conducted using adequate and rigorous study designs. In this trial, the investigators intend to assess the efficacy of individualized homoeopathic treatment for insomnia on 60 patients in a double blind, randomized, parallel arm, placebo controlled design in the outpatients of National Institute of Homoeopathy, Salt Lake, Kolkata 700106, West Bengal, India. The patients will be prescribed either individualized homoeopathic medicines or identical placebo, and will be followed up for 3 months. Data will be gathered at baseline and after 3 months using sleep diary and insomnia severity index questionnaire. Randomization will be pharmacy-controlled. Code will be broken at the end of the trial after the database is frozen. The Intention to treat (ITT) population will be subjected to statistical analysis.

    • “The patients will be prescribed either individualized homoeopathic medicines or identical placebo,”

      An amusingly frank admission in there; identical indeed.

      • yes, I noticed that too – they mean ‘identically looking’, of course, but ended up amusing us a little – so their study was good for something!

  • Let us put the findings in some perspective focussing on the results claimed significant.

    First is item 4, “Hours spent in Bed”. The IH group started with 6.6 hours at baseline and improved by 0.4 hours to 7.0 hours in bed. This gain is, for sure, significantly bigger than in the control group, but in the end is still worse than the control (baseline 7.6, end 7.4 hours). If long hours in bed are considered positive, then the gain after three months in the active group did not even reach the overall average. Not much of an achievement, I would say.

    Item 5, “Total sleep time”, increased in the verum group from a baseline of 2.5 hours to 3.4 hours, which is not much better than the 3.3 hours achieved under control after starting at 3.1. Again, the gain is bigger under treatment, but in the end the patients are at roughly the same status; the difference amounts to 6 minutes of sleeping time. (And we have not even discussed how exact the assessment of the sleeping hours could have been, with the patients apparently writing their estimates into their diaries in the morning.)

    Item 6, “Sleep efficiency”: I do not know what exactly is behind this item. I would assume some rating of the patient’s subjective satisfaction with their sleep. If so, then this might be connected with the improvement in sleeping time. Patients with an increase of nearly one hour of sleep may well tend to be more satisfied and feel more relieved than people with a gain of a mere 12 minutes.

    My conclusion: The success of the homeopathic remedies might well be negligible. All that was gained is the amount by which the treatment group was under par at baseline – and even this was not reached in every item.

    And this may have come about with the “sleeping hygiene” advice.

    Only this and nothing more.

    • if it were the advice, should this not have benefitted both groups? in this case, there would be no inter-group differences.

      • But there were, from the beginning. In items 4 and 5 there were differences in the baseline data: 6.6 vs. 7.6 hours and 2.5 vs. 3.1 hours respectively. These differences might not have been significant to start with, and neither would be 7.0 vs. 7.4 and 3.4 vs. 3.3 hours respectively at the end. But the trick is to rate the differences in gain: then a gain of a few minutes grows in relative magnitude – and eventually becomes significant.

        The advice may or may not have had some impact. And if it had, then I would doubt that it would be the same in both groups down to the minute. Remember: the treatment group was (a little) more off the mark than the control group.
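Norbert's "differences in gain" point can be checked directly from the hours he quotes, taking those figures at face value:

```python
# Total sleep time in hours, as quoted upthread.
verum_base, verum_end = 2.5, 3.4
ctrl_base, ctrl_end = 3.1, 3.3

gain_diff_min = ((verum_end - verum_base) - (ctrl_end - ctrl_base)) * 60
endpoint_diff_min = (verum_end - ctrl_end) * 60
baseline_gap_min = (ctrl_base - verum_base) * 60

print(round(gain_diff_min))      # 42: verum gained 42 min more...
print(round(endpoint_diff_min))  # 6: ...yet finished only 6 min ahead...
print(round(baseline_gap_min))   # 36: ...because it started 36 min behind
```

A 42-minute advantage in gain, but only a 6-minute advantage in outcome: almost all of the "effect" is the baseline imbalance being closed.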

      • My reading of Norbert’s extract (which also sounds like a pseudo medicine 🙂 ) is that one group started at 3.1hrs total sleep, and the other at 2.5hrs total sleep.

        If this is the case, the advice surely would have benefitted both groups as you say, and they achieved equivalent total sleep time as a result.

        The fact that one increased more is only due to the imbalance at the start.

        I apologise, unreservedly, for not having read the report yet due to time pressure.
        Maybe I need some sleep…

    • I think you are right. If, for “Group differences were significant for items 4, 5 and 6 (P < 0.01)”, they actually compared the sizes of the within-group changes (substantial) rather than comparing the actual outcomes, then a) they have compared the wrong thing for it to be meaningful or clinically significant, and b) they have probably done a lot more statistical tests than they corrected for by using p < 0.01 to control for multiple comparisons. It looks like they did this, because the between-group actual outcome differences for items 4 & 5 quoted by Dr Aust are less significant than the baseline differences.
      "Sleep efficiency" is defined as hours asleep divided by hours in bed, so it is a direct correlate of the increase in item 5.
      I think time in bed is a more reliable measure than actual sleep time, which is why its increase counts as positive despite more time lying awake in bed being an apparent negative. In fact, both groups report less time lying awake in bed.
      I’d be fascinated to read the full study, although I don’t remember enough statistics to analyze it properly.
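If sleep efficiency really is hours asleep divided by hours in bed, as defined above, it can be recomputed from the figures quoted upthread (a rough check, assuming those rounded hours):

```python
# Sleep efficiency (%) = total sleep time / time in bed,
# using diary items 5 and 4 as quoted in the thread.
def efficiency_pct(sleep_h, bed_h):
    return 100.0 * sleep_h / bed_h

print(round(efficiency_pct(2.5, 6.6)))  # verum baseline:   ~38
print(round(efficiency_pct(3.4, 7.0)))  # verum end:        ~49
print(round(efficiency_pct(3.1, 7.6)))  # control baseline: ~41
print(round(efficiency_pct(3.3, 7.4)))  # control end:      ~45
```

As a ratio of items 4 and 5, it is indeed not an independent outcome: the "efficiency" difference simply restates the sleep-time difference.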

  • Even if we can’t find an error, we KNOW that there is one. When, in 2011, scientists in the OPERA experiment found neutrinos propagating faster than light, the scientific community was not very enthusiastic (the media were). Scientists KNEW that there were errors either in the experiment or in the interpretation. They were not even eager to know what the error would be. There could be MANY errors in an experiment, with probabilities much, much higher than that of faster-than-light propagation (and, in our case, of the homeopathic explanation).

    A year later it was reported that a loose cable had caused the anomalous results, but this made no big news. It was obvious that there was an error. See more here:

    The proof of homeopathy will not come from sporadic clinical trials. It should come from basic science experiments, in such a way that different kinds of experiments all point in a common direction. In that way even a very small effect could be proven. In homeopathy no such convergent small proofs can be found. In fact, the sporadic positive results diverge; they do not point in any particular direction. This is exactly the pattern we expect in the case of noise. This is pure anomaly hunting.

    This is why we say that it is not worth pushing these kinds of trials. If one is very interested in proving homeopathy, one should put together the existing evidence and show that the anomalous results all point in a common direction AND that this relates to the rules of homeopathy. No one has been able to do that so far, and this experiment does not add to this work either.

  • It is always fun to watch the SPIN that goes on here, though three cheers for Edzard for reminding and re-reminding people here that this was a randomized, double-blind and placebo-controlled trial.

    It is also fun and funny to notice how some of you prefer to use the “implausibility” ruse to dismiss good scientific evidence. It is NO LONGER accurate to say that homeopathy is “implausible.” The ONLY people who can say that are people who are ill-informed about recent evidence about how homeopathic medicines may work OR people who are simply being mistruthful. That gambit is done now. People who use the “implausibility” of homeopathy are simply waving a red flag to say that they are either ignorant or lying (which is it?).

    And for the record, there have been numerous trials on homeopathy and insomnia, including some with objective measures…and the 2 below are just 2 of several:

    A double-blind randomized and placebo controlled trial was conducted with 30 patients with primary insomnia, in accordance with DSM-IV TR (2000) criterion 307.42 Primary Insomnia (Naudé, Marcelline, Couchman, et al., 2010). The measurement tools used were a Sleep Diary (SD) and the Sleep Impairment Index (SII).
    After an initial consultation, 2 follow-up consultations at 2-week intervals took place. Homeopathic medication was prescribed at the first and second consultations. The SII was completed at each consultation and participants were instructed at the first consultation to start the SD.
    Sleep Diary data revealed that verum treatment resulted in a significant increase in duration of sleep throughout the study, compared to the placebo treatment which resulted in no significant increase in duration of sleep. A significant improvement in SII summary scores and number of improved individual questions was found in the verum group, responses to all 11 questions having improved significantly upon completion of the study. An initial improvement occurred in the placebo group, but was not sustained. Comparison of results between the groups revealed a statistically significant difference.
    The researchers concluded that the homeopathic simillimum treatment of primary insomnia was effective when compared to placebo. Homeopathy is a viable treatment modality for this condition and further research is justified.

    A single-blind and double-blind study was conducted for a month with 54 young adults of both sexes (ages 18-31) with above-average scores on standardized personality scales for either cynical hostility or anxiety sensitivity (but not both) and a history of coffee-induced insomnia (Bell, Howerter, Jackson, 2010). At-home polysomnographic recordings were obtained on successive pairs of nights once per week for a total of eight recordings (nights 1, 2, 8, 9, 15, 16, 22, 23). All subjects received placebo pellets on night #8 (single-blind) and verum pellets on night #22 (double-blind) in 30C doses of one of two homeopathic remedies, Nux Vomica or Coffea Cruda. Subjects completed daily morning sleep diaries and weekly Pittsburgh sleep quality index scales, as well as profile of mood states scales at bedtime on polysomnography nights.
    The study found that those patients who received either of the homeopathic medicines had significantly increased PSG total sleep time and NREM, as well as awakenings and stage changes. Changes in actigraphic and self-rated scale effects were not significant.

    Bell IR, Howerter A, Jackson N, Aickin M, Baldwin CM, Bootzin RR. Effects of homeopathic medicines on polysomnographic sleep of young adults with histories of coffee-related insomnia. Sleep Med. 12(2011):505-511.

    Naudé DF, Marcelline I, Couchman S, and Maharaj A. Chronic primary insomnia: Efficacy of homeopathic simillimum. Homeopathy. Volume 99, Issue 1, January 2010, 63-68.

    • “And for the record, there have been numerous trials on homeopathy and insomnia, including some with objective measures…”
      you should make a habit of reading before writing; I cite 2 systematic reviews of RCTs in the post.

    • Dana

      Watching you do this is like watching a one-legged man trying to participate in an arse-kicking contest.

      So. All this supposedly overwhelming evidence from eight years ago concerning how great homeopathy is in treating sleep disorders. Seemingly now backed up. Any explanations as to why it hasn’t taken over as the prime treatment for this common problem?

      And how about the lack of internal validity in the second trial? Oh.. because homeopathy needs to be individualised apart from when it doesn’t? Again?

      And that’s before we start on the risible numbers in these exercises in noise-torturing,

    • Dana, would you care to respond to my assertion that a sample size that small, also present in your cited tests, is too small to make any statistically valid conclusions?

      I would also be fascinated to read your view of my assertion that the differences between the 2 samples at baseline account for all the results, as opposed to treatment?

      Lies, Damn Lies…

      • As we all know, it is much more difficult to achieve statistical significance in small trials except when the differences between the treatment and the control group are TRULY significant.

        The fact that there were small differences between the base statistics between the treatment and control group shows that the researchers were NOT fudging their data…and the bottomline is that these differences were NOT significant.

        You folks try to damn homeopathic research whether the study is well done or not…and you then forget that the instruction provided does NOT pose ANY significance to the trial because this instruction was provided to BOTH groups…and then you use the “implausibility” gambit. And then there are always the ad homs, in which you have a black belt…and if not an ad hom, you feign outrage.

        Your tricks are transparent…and fun to watch.

        • I agree that the baseline differences imply the data was not fudged.

          To say that the differences were not significant is simply wrong: one group’s total sleep time was 24% higher than the other’s.

          And to ignore this difference when interpreting the results is pure snake oil marketing.

          • I cannot imagine that researchers who fudge their data are not clever enough to make sure they appear real.
            The trial was not designed to test for the significance of baseline differences; this means that tests of statistical significance are not legitimate here. Or am I mistaken [again]?

          • You are correct Edzard, again.

            The increase in sleep time is indeed significant; however, the resulting difference in total sleep time is not.

            It would appear that very selective use of statistics has been applied here, something statisticians are unfortunately prone to when biased.

            My summary is that the results show no significant difference between the groups once baseline differences are accounted for, and when sample size is considered the non-result is also meaningless in the real world.

            Lies, Damn Lies…

          • I wholeheartedly agree with Francis.
            The bottom line also appears to be that after treatment the two groups’ differences were smaller than before treatment. That means the study is consistent with the null hypothesis.

        • I would flatly contradict your statement that achieving significance is more difficult with small sample sizes than with big ones. See here:

          • That is a fascinating and very helpful article, Norbert, thank you, as someone who occasionally has to use reason against the ‘loony zealots’.

          • Thanks for the link to this study! The present topic falls neatly within the reported results. The law of small numbers and publication bias are all that is necessary to consider here. It would have been nice to see a power analysis confirming what effect size the study was capable of reliably detecting.
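
To put the “law of small numbers” in concrete terms for a trial of roughly this size, here is an illustrative simulation (my own sketch, not from the paper or the linked study; it assumes normally distributed outcomes and uses a normal approximation in place of a proper t-test):

```python
import math
import random
import statistics

random.seed(0)
N_PER_ARM, TRUE_D, SIMS = 30, 0.5, 10_000   # ~this trial's size; a "medium" true effect
Z_CRIT = 1.96                                # two-sided 5%, normal approximation

significant_effects = []
for _ in range(SIMS):
    placebo = [random.gauss(0.0, 1.0) for _ in range(N_PER_ARM)]
    verum = [random.gauss(TRUE_D, 1.0) for _ in range(N_PER_ARM)]
    # observed standardised effect (Cohen's d with pooled SD)
    pooled_sd = math.sqrt((statistics.variance(placebo) + statistics.variance(verum)) / 2)
    d_hat = (statistics.mean(verum) - statistics.mean(placebo)) / pooled_sd
    if abs(d_hat) * math.sqrt(N_PER_ARM / 2) > Z_CRIT:
        significant_effects.append(abs(d_hat))

power = len(significant_effects) / SIMS
exaggeration = statistics.mean(significant_effects) / TRUE_D
print(f"empirical power: {power:.2f}")                    # roughly a coin flip
print(f"'significant' trials overstate d by {exaggeration:.2f}x")
```

Two things follow: a 60-patient trial has roughly even odds of missing a true medium effect, and the runs that do cross the significance threshold overstate the effect size, which is exactly how small trials plus publication bias generate impressive-looking literatures.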

    • “Randomized double-blind and placebo-controlled trial”, you say. Well, if you knew what you were talking about, you would have discovered that the so-called ‘gold standard’ for clinical trials is not always what it seems. Dig deep and you often uncover conflicts of interest, poorly conducted or skewed methodology, or a biased ‘agenda’… do I need to go on?

  • I have a few possibilities.

    Multiple endpoints. They have a lot of endpoints. That doesn’t invalidate the results, but I find myself wondering if they had even more endpoints they haven’t shared.

    There was a study in 1994 of homeopathy for diarrhea. IIRC, the treatment arm got counseling/individualized remedies but the placebo group did not. But that would be a gross error and you said there weren’t any of those.

    Blinding slip. It’s not hard to do. The person who packages the remedies might have been around doing other work for example.

    • I think their choice of p < 0.01 is a response to multiple endpoints, so if the 7 endpoints are all the endpoints they actually looked at, that would be reasonable. But it is possible they did different statistical analyses, giving more comparisons – e.g. the actual value of outcome 4 in the treatment vs placebo groups as well as the change in outcome 4 in the treatment vs placebo groups. In that case the first appears not significant, while the second is torturing the data until it confesses.
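
The arithmetic behind that reading is easy to check (an illustration assuming 7 independent endpoints, which is optimistic since sleep outcomes are correlated):

```python
def familywise_rate(k, alpha):
    """Chance of at least one spurious 'significant' result among
    k independent tests when no real effect exists."""
    return 1 - (1 - alpha) ** k

print(f"7 endpoints at alpha = 0.05: {familywise_rate(7, 0.05):.0%} fluke risk")
print(f"7 endpoints at alpha = 0.01: {familywise_rate(7, 0.01):.0%} fluke risk")
print(f"Bonferroni threshold for 7 endpoints: {0.05 / 7:.4f}")  # close to 0.01
```

Testing 7 endpoints at the conventional 0.05 gives about a 30% chance of at least one fluke “hit”; the Bonferroni-corrected threshold 0.05/7 ≈ 0.0071 is close to the 0.01 the authors chose.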

  • Oh no! Many wise opinions regarding the effectiveness – or otherwise – of homeopathy. Let’s get real here… all subjects participating in a clinical trial are consciously aware they are participating in a trial (stating the obvious) and will NOT then respond, consciously or sub-consciously, as they might in their normal, everyday lives. It’s called UHB (Unpredictable Human Behaviour). We know that placebo sugar pills prescribed to a selected group of patients will produce a statistically significant result – this is not opinion, this is recorded fact.

    In any event there are two specific influences going on in the quoted trial and other similar trials: 1) the homeopathic ‘formulation’ itself – i.e. how it is prepared, and 2) the influence or ‘energy’ being created by the practitioner/deliverer that is possibly skewing the outcome in some way. As we know, homeopathy is often used with positive outcomes on pet animals – so no placebo effect should be possible – yet similar results have been recorded.

    This forum contains many opinions, however it is ONLY the real, measurable benefit experienced by the subject themselves that matters at the end of the day… regardless of the treatment. Hope this view doesn’t offend any of you academics!

    • wrong on so many levels!

      • Edzard, you are a very lazy responder to comments you disagree with. You said “wrong on so many levels” which is a silly soundbite and an example of your throw-away irrational expressions that reflect someone with a limited intellect. “Wrong on so many levels” adds nothing useful to the debate – are you really that dumb?

        • If it helps, I believe Edzard provided several of the levels you ask for in his response to my request for evidence from you, which you have not even replied to.

          I will restate my question to you more precisely, ‘Helen’: please can you provide peer-reviewed evidence that supports your assertion about homeopathy having positive outcomes for animals? I found none, but plenty to the contrary.

          • Francis I am pleased to try and help you with your troubled mind. Firstly, I assume you have at your fingertips every peer reviewed published trial, observational study and medical paper in every language since ancient times – has anyone counted them? There are many millions I suspect. Just because you have not found any ‘evidence’ does not mean it doesn’t exist – come on this is basic stuff! And don’t forget, one person’s evidence is often another person’s bogus assertion… agreed? In any event, evidence is not always PROOF of something is it? I don’t need to prove anything to anyone here. I am a ‘skilled detective’ amongst other talents and some of the papers I have uncovered are far too explosive to disclose on this very juvenile blog (nothing personal).

          • Sadly not ‘Helen’, although your estimate of many millions of scientific papers rather overstates the importance of this subject.

            It is a shame you are unable to share a single one of them, even more so that you feel we are too delicate to review your explosive findings.

            As a ‘skilled detective’, whose namesake has previously supported Multi-Level Marketing (a.k.a. the illegal practice of pyramid selling), may I suggest your colours have been shown, and we shall agree to disagree.

            Or should I just say “Yes, your Royal Highness”.

            Good luck in your world, I will continue to seek out the facts.

          • ‘Helen’ said:

            some of the papers I have uncovered are far too explosive to disclose on this very juvenile blog

            The Nobel Committees can be contacted at:

            The Nobel Foundation
            P.O. Box 5232, SE-102 45 Stockholm, Sweden
            Street address: Sturegatan 14, Stockholm
            Tel. +46 (0)8 663 09 20
            Fax +46 (0)8 660 38 47

        • “‘energy’ being created by the practitioner/deliverer that is possibly skewing the outcome”
          homeopathy is often used with positive outcomes on pet animals
          no placebo effect should be possible
          it is ONLY the real, measurable benefit experienced by the subject themselves that matters at the end of the day
          Hope this view doesn’t offend any of you academics!

    • While not an academic, I recognise a duck when I see one, Helen. Your assertion that ‘As we know, homeopathy is often used with positive outcomes on pet animals’ troubles me, as I for one do not know that, and my search for peer-reviewed evidence has surprisingly found nothing!

      • Francis, your closed-minded attitude is very strange. If you are not an academic, as you say, then what exactly are you? Very sorry I have caused you to be “troubled”! Will you be able to sleep tonight, I wonder? As you may or may not know, peer-reviewed published trials are regularly challenged and found to be flawed when subjected to truly impartial and objective scrutiny – don’t you know that? Do you automatically accept that all peer-reviewed published clinical trials and studies are absolute evidence without question? Move on everyone, nothing to see here.

        • I believe the process you are referring to as regular challenges is peer review, is it not?

          Like most of the commenters on this blog I seek to know the truth, not to dogmatically assert it. Possibly this is a tactic you should try. I note that you still did not answer my request, should I assume there is no such evidence supporting your claims?

      • Francis, you said: “who’s namesake has previously supported Multi-Level Marketing (a.k.a. the illegal practice of pyramid selling), may I suggest your colours have been shown, and we shall agree to disagree”. … what are you saying? My namesake? You are scraping the barrel with that remark. If you found another Helen Murray and assumed she was me, then you are rather dumb I would say. As it happens I don’t “support MLM” and have no strong views on it either way – although you are deluded in your assertion that MLM is illegal, because pyramid selling no longer exists and has been illegal now for many years, didn’t you know? You talk in riddles and most of it is irrelevant to the main debate. If anyone wishes to study the ‘evidence’ I have uncovered that I choose not to publish on this blog, then you can request an invitation to attend one of our clinics. We will show you all the published data we have, but only after you have completed a sanity test.

    • Hi Helen

      You didn’t respond the first time so I’ll ask again: would you like to say whether or not you have previously commented here using a different name?



  • Hi Lenny, you guys must be really bored – wanting to know if I have used a different name? There are bigger issues you may want to focus on I would have thought? Do you attend the annual Rupert Bear reunion bash by any chance?

    • I think this is a YES

    • Helen

      You have used two different names with the same email address. Can I suggest you acquaint yourself with the rules of this blog, which state:

      Please use the same name you’ve used before when commenting — it doesn’t have to be your real name, but it helps others follow the discussions.

      I hope you do not have a problem with this simple level of transparency.

      • I wouldn’t get too worried. Sandra’s brand of cloth-headed pseudointellectual foolishness combined with high-handed pearl-clutching is so distinctive she can call herself whatever she wants on each post she makes and we’d still recognise her.

        • Lenny, you are making no sense. “we’d still recognise her” is a meaningless and very juvenile comment. Who do you think I really am? Just say what you mean by that comment. You seem to have far too much free time for pursuing this dead-end issue. Good luck with your other endeavours whatever they might be.

        • Really? No ‘skilled detective’ with ‘explosive research’, how dull.

          • Hello Francis, I’m sure you are aware that only ‘little people’ with a low level of intellect need to have the last word – it gives them some kind of warped satisfaction. You must be a very sad person. I expect a reply as you have this desperate need to continue a meaningless conversation as others are doing here. You know nothing about my real identity and I am now bored with you so I hope you enjoy talking to yourself, I have moved on.

      • Dear Admin, you seem to insist on pursuing this very minor problem that you have but I don’t. Did you know that two or more people can use the same email address? It isn’t illegal. Hope that helps.

        • ‘Helen’

          No one has alleged any illegality whatsoever so I’m not sure why you chose to bring that up. Transparency, openness and integrity may seem like ‘minor problems’ to you, but I have pointed you to the rules of this website.

      • word******[email protected] by any chance?

  • Edzard has very kindly emailed me a copy of the full paper.

    First of all, can I say that I don’t find all the name-calling in the most recent posts either very interesting or enlightening, and it makes it harder to find posts that are actually saying something.

    Secondly, there are some very good points that have been made by Norbert Aust and Christine Rose, both of whom I suspect have better training in statistics than I have. There are also some valid points made by Francis, though there are some differences between clinical trials and marketing in terms of what statistical tests tell us.

    Gabor Hrasko finds parallels with an experiment a few years ago which appeared to show neutrinos travelling faster than light before a more mundane explanation was found. Personally I don’t find this very helpful, since assuming that we know the result of the trial without looking at the data is taking a similar position to the homeopathy camp (though I agree that extraordinary claims require extraordinary evidence).

    Helen Murray does not appear to have understood what this discussion is about, and Dana Ullman is spouting his or her usual nonsense.

    Coming back to the trial, I think, really, that it all boils down to sample size. There is a brief discussion of it in section 2.7 where the authors begin by estimating the minimal required size of the trial. They are off to a bad start, as they don’t seem to know what size of effect they are looking for (normally you might have some idea from previous trials or from clinical experience what sort of treatment effect is likely to be found or actually useful in practice). As a sort of cop-out they have used a method (Cohen?) which supplies numbers for “small”, “medium” and “large” effects which can then be put into the calculation of sample size. A small effect here is something that would not be apparent simply to the naked eye (such as trying to work out whether there is a difference in height between two similar groups). A large effect would be reasonably obvious (continuing the height example, maybe comparing Norwegians and Italians), whereas a medium effect is somewhere in between. Another way of looking at it is that a small effect would probably not be clinically useful whereas a medium effect might be.

    Having done their sample size calculation, they say that the minimum sample size required to have an 80% chance of detecting a “medium” effect at a significance level of 90% is 128. This is clearly a lot more than the 60 which they used, so this can really only be regarded as a pilot study to investigate the feasibility of a larger trial (they conclude that it would be feasible, which I am sure is true, and they explain that they were limited as they had to complete the study within a year, probably because that was the period allocated for the thesis, though they use some nonsense about ethical considerations to justify not continuing for longer).

    In order to do this sort of calculation, you need to estimate the mean and standard deviation that you expect to find in your data. However, I suspect that they actually did the calculation afterwards using the actual data they had collected, which isn’t really valid. Indeed, they state at the end of the power discussion that: “post hoc power analysis revealed a power compromise up to 60.4%.” I have been unable to find out whether “power compromise” is a recognised statistical term, and if so, what it means. I am more confident in saying that any post-hoc power analysis adds nothing to the data, and its only usefulness is in designing future trials.

    So… We know that their trial was underpowered to find any effect of clinically useful magnitude, and at the same time any significant effects it did find may well not be real (type I errors).

    Actually there is a clue from the baseline data, where we find that there are quite large differences between the groups in some of the measures used (hours in bed 6.6 vs 7.6, hours asleep 2.5 vs. 3.1) suggesting that there is already an important random element here.

    So – the improvements within each group were probably due to regression to the mean (though there may well have been an additional placebo effect). Any other changes seen were explicable by chance.

    Actually, there is more that we don’t know about the data, such as whether or not it followed a normal distribution – this is assumed by the statistical tests that they used, and invalidated if there were outliers.

    So my conclusion is an underpowered trial and a flawed statistical analysis.
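
As a back-of-the-envelope check on the authors’ figure of 128, the standard normal-approximation formula for comparing two means reproduces it almost exactly (a sketch only; the authors’ exact method is not stated beyond the Cohen conventions, and an exact t-test calculation gives one or two patients more per group):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two means
    (normal approximation; an exact t-test gives slightly more)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for two-sided 5%
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Cohen's conventional effect sizes
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    n = n_per_group(d)
    print(f"{label:>6} (d = {d}): {n} per group, {2 * n} in total")
```

A medium effect (d = 0.5) gives 63 per group, 126 in total, essentially the 128 quoted; a small effect would need nearly 800 patients, and even a large one needs around 50.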

    • Thank you Dr Julian.

      I have found (most of) this discussion really interesting. It is specifically very helpful for my learning to find that better and more formally trained statisticians agree with my rather shallower understanding, albeit so much more eloquently.

      All those years in junk mail were not wasted after all.

      I will continue to follow Edzard’s blog with interest, although I will try to only comment if I have something of value to offer.

      • Thank you, but I don’t know that I am a better statistician. I was always good at maths, and I had to take a statistics course as part of my training to be an oncologist (along with radiobiology and radiation physics, as well as the more obvious subjects such as tumour pathology). I have found it useful when reading scientific papers in medical journals, particularly since the authors often don’t have a terribly good grasp of statistics themselves. Statistics, of course, is all about probability and the behaviour of numbers, both areas where most people’s intuition is very misleading.

        Funnily enough, the two subjects I studied at school which I have found most useful are Latin (only up to O-level, unfortunately) and physics (A-level).

  • “I’m sure you are aware that only ‘little people’ with a low level of intellect need to have the last word – it gives them some kind of warped satisfaction.” Imagine that; it would be similar to someone on Twitter blocking someone who disagrees with them, but then stalking their timeline to criticise them without any chance of recourse.

  • I think the outcome measure is not only subjective, but not even reliable in any way. Researchers from Freiburg University, Germany, stated this:

    “The majority of patients suffering from severe insomnia sleep around 80 percent of their normal quota in the sleep laboratory,” said Dr. Bernd Feige, research group leader at the Department of Psychiatry and Psychotherapy at the Freiburg University Medical Centre. Scientists have been searching for the reason for this discrepancy between subjective perception and objectively measurable sleep duration for about 20 years.

    For their study, the Freiburg researchers invited 27 volunteers with severe sleep disorders and 27 healthy sleepers to the sleep laboratory. During the first two nights, the volunteers got used to the environment. During the two subsequent nights, the researchers woke the test persons up with a signal tone from the REM phase.

    As soon as they woke up, the study participants pressed a button and a study assistant interviewed them in the darkened room. The first question was, “Did you just sleep or were you awake?”

    The astonishing result: “Although all test persons were woken from dream sleep, every sixth test person with sleep problems was sure to have been awake,” says Dr. Feige. Healthy volunteers almost never thought they were awake. – What might this tell us about the reliability of the outcomes delivered by the patients themselves?

    It should also be borne in mind that insomnia is classified in many organic and non-organic causal relationships. ICD-10 alone provides a number of classification options.

    The Freiburg researchers also point out that patients whose sleep problems are caused by dysfunctional habits, incorrect information and poor sleep hygiene often only need one consultation session. Well…

    It may be that the solution to the puzzle lies quite simply on this factual level.

    • Often in hospital patients will complain of having hardly slept all night when the nurses have seen them snoring away for most of the time. I know this is anecdote but it does concur with the findings of the Freiburg study that you mention.

  • I would suggest that the prior probability (not easily quantifiable) also be taken into account (not easy either) when interpreting the results of any single trial. We may not always have the privilege of such knowledge, but with homeopathy, after some hundreds of years, we do, and it is close to zero (I’m open to debate about that), if I want to be rather optimistic, that is. P < 0.01 isn’t nearly good enough for homeopathy; I would only really start to take notice below 0.001, although I have personally settled on a zero prior probability, so until a homeopathy trial with a negative p comes up, homeopathy trials are an exercise in futility…

    Of course I know it’s not wise to discard results that have been gathered with trouble and effort, but because statistics never give you zero (unless one is bold enough to set the prior probability to absolute zero), the tiny chance of making a type II error (and, oftentimes, misunderstanding of how statistics works and how it should be interpreted) has kept homeopathy within the scientific realm for more than two centuries now.
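
The prior-probability argument can be made quantitative with Bayes’ theorem. The sketch below (my own illustration; the 80% power and the prior values are assumed for the sake of the example) computes the “false positive risk”, i.e. the probability that a result significant at p < alpha reflects no real effect:

```python
def false_positive_risk(prior, alpha=0.01, power=0.80):
    """P(no real effect | p < alpha), via Bayes' theorem.
    prior = assumed probability that the treatment really works."""
    true_positives = power * prior           # real effect, and detected
    false_positives = alpha * (1 - prior)    # no effect, but p < alpha anyway
    return false_positives / (true_positives + false_positives)

for prior in (0.5, 0.1, 0.01):
    print(f"prior {prior:>4}: false positive risk = {false_positive_risk(prior):.2f}")
```

With a 1% prior, even a p < 0.01 result from a well-powered trial is more likely false than true (risk ≈ 0.55), which is the sense in which p < 0.01 is not nearly good enough for an implausible treatment.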

  • “In this double-blind, placebo-controlled RCT”, I don’t trust a word of it. I see nine dubious Indian purported academics behind one Anglicised name, and my spidey-senses are tingling.

    Homeopathy is a *fraud*, and much of Indian medicine is quackery.

  • I just read about this study in the German “Natur med Depesche” and independently from your article looked at the paper itself some days ago between consultation hours.

    I am wondering whether a primary endpoint can be so ill-defined as it was here. In conventional medicine, you have e.g. “progression-free survival” or “major adverse cardiac event, such as…”, but a primary endpoint to my knowledge is never “quality of sleep, duration of sleep, duration of wake phases” and so on, all together. I would have advised the investigators to focus on one outcome measure as the primary endpoint (such as the difference in ISI score, which is hardly significant at the suggested level of significance) and use the others as secondary endpoints. I don’t even know if I would regard the primary endpoint “sleep diary” as fulfilled, as only half of the sub-endpoints seemed to be significant. Shouldn’t more than half of the sub-endpoints yield significance?

    When talking about significance: the calculation of sample size was conducted at an alpha of 0.05. The study didn’t give a *** about this calculation regarding sample size, as mentioned in other comments. And additionally, the level of significance was set at 0.01 despite the calculation being done at 0.05. Would this be reasonable?
    Also, I am not sure whether p-values of 0.014 and 0.018 would yield “just significant” results if you put the cut-off at 0.01.

    Maybe this is useful input? If not, I still thank you for your time!

    • thanks – most useful!

    • Henrik,

      I didn’t read that part of the paper very thoroughly, but you are quite right. Apart from not being particularly useful clinically, this sort of subgroup analysis is completely meaningless statistically, as if you analyse enough factors some of them are bound to be significant regardless of any real effect.

      Richard Peto explained this very well in an interview with Jim Al-Khalili on the BBC’s “The Life Scientific” radio programme (which is available to download as a podcast). He had just submitted a paper to The Lancet reporting the outcome of a study examining the effects on mortality of giving aspirin as secondary prevention following acute cardiac events (he found quite a big improvement in survival in the aspirin group). The Editor of The Lancet said that he would only publish the paper if Peto and his fellow authors included a subgroup analysis (essentially they wanted to know whether there were any people in whom aspirin might make a greater or lesser difference than the population as a whole). This might seem a reasonable question, but Peto knew that any such analysis would find spurious correlations that were nevertheless statistically significant. The Editor insisted, so he resubmitted the paper, breaking down the outcomes by astrological star sign. He found that patients born under Pisces and Gemini (or whatever it was) were twice as likely to benefit from aspirin, but for those born under Aries and Libra it made no difference.

      Lancet: “What is this nonsense?”
      Peto: “You wanted a subgroup analysis”
      Lancet: “But we all know that these can’t be real effects.”
      Peto: “Exactly. But that is what you get with subgroup analysis. It would have been equally meaningless if I had broken it down by age, sex or pre-existing conditions”

      The editor took the point and accepted the paper in its original form. Peto was happy, because he knew that clinicians might otherwise have based their practice on spurious subgroup results, and that people would have died as a result. A lot of people, since acute coronary events are very common.

      This anecdote, and the homeopathy paper that we are discussing, illustrate how widespread the problem is of clinical studies which do not involve a statistician at the design phase, or are conducted and written up by clinicians who have not taken (or not understood) a course in medical statistics. Unfortunately, very often neither have the doctors reading about them.
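
Peto’s star-sign demonstration is easy to reproduce in silico (an illustrative sketch, not his actual analysis; the subgroup sizes and the simple z-test on pure noise are my assumptions):

```python
import math
import random
import statistics

random.seed(1)

def spurious_hit_rate(n_subgroups=12, n_per_arm=40, z_crit=1.96, sims=2000):
    """Fraction of simulated null trials (no real effect anywhere) in which
    at least one subgroup comes out 'significant' at the 5% level."""
    hits = 0
    for _ in range(sims):
        for _ in range(n_subgroups):
            a = [random.gauss(0, 1) for _ in range(n_per_arm)]
            b = [random.gauss(0, 1) for _ in range(n_per_arm)]
            # z-test on the difference in means (true SD = 1 by construction)
            z = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(2 / n_per_arm)
            if abs(z) > z_crit:
                hits += 1
                break   # one spurious subgroup is enough to mislead
    return hits / sims

rate = spurious_hit_rate()
print(f"null trials with >= 1 'significant' subgroup: {rate:.0%}")
```

Nearly half of all pure-noise trials produce at least one “significant” subgroup, matching the analytic value 1 − 0.95^12 ≈ 46%.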
