Guest post by Norbert Aust
Edzard invited me to review a recent paper on the homeopathic treatment of women with premenstrual syndrome (PMS), which he recently discussed on this blog. This is what I found:
This study by Yakir et al. evidently meets all requirements for a low risk of bias (blinding, randomisation, allocation concealment etc.), which would make it a high-quality study. However, I would like to raise three concerns, in order of increasing severity:
- The publication history seems odd: a paper published today reports a trial from twenty years ago.
- It is unclear what the reported scores say about the condition of the women, and therefore what the change from before to after treatment really implies.
- A deeper analysis of the data points towards small study bias.
(1) Trial and publication history
- A first test of homeopathy for PMS was performed in the years 1992 to 1994 by the same team of researchers: 19 women finished the test in an outpatient clinic in Jerusalem/Israel before the trial was aborted for lack of funding.
- A second trial was performed between 1996 and 1999 where 96 women completed the test.
- The first trial of 19 women was published as a pilot study in 2001, well after data collection for the main study was completed.
- In 2001 or 2002 the main trial was reported at a conference, “Future Directions and Current Issues of Research in Homeopathy”, which took place in Freiburg/Germany in 2002. The report was included in the proceedings book.
- In the years 2015 to 2017 a Dutch team updated the original manuscript.
- Now, a full 20 years after data collection was completed and 18 years after data evaluation was done, the current paper has been published in a peer-reviewed journal.
This seems odd. The homeopathic community is desperately looking for high-quality trials with positive outcomes, and here is one that was left ignored for a very long time. It is also unclear why the authors elected to publish the pilot study in 2001, when the main study was at least very near completion. The authors state that the study was part of the PhD thesis of the lead author in 2002, but do not explain why the paper was not published in a peer-reviewed journal at that time.
Honestly, I do not know what to make of it.
(2) Implication of data
The main outcome measure was assessed with the modified Moos Menstrual Distress Questionnaire (MDQ), a 37-item questionnaire which the women completed daily for five months: two months before treatment for baseline data and three months after treatment for follow-up. Each item had five options rated 0 to 4. Data from the last 12 days before menses were used for evaluation. However, the authors are not very clear about their procedure:
“Premenstrual symptom (PM) scores were defined as the total scores of 37 pre-specified symptoms in the MDQ during the last 12 premenstrual days ... The mean scores of all women for the various MDQ parameters were calculated before and after treatment, separately in both intervention groups.”
So for each woman the authors averaged 24 questionnaires before treatment (2 cycles × 12 days) and 36 after treatment (3 cycles × 12 days), each with a total rating between 0 and 4 × 37 = 148 points. But the scores as reported range between 0 and 2.00. So what are they? And if we do not know what they represent, what does the difference from before to after treatment indicate?
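The mismatch in scale can be made explicit with a little arithmetic. The sketch below assumes nothing about the authors' actual procedure; it only lists the upper bounds that a few plausible normalisations would produce. The function name `score_ranges` and the choice of normalisations are my own assumptions, not taken from the paper.

```python
ITEMS = 37            # symptoms in the modified MDQ
MAX_RATING = 4        # each item is rated 0 to 4
DAYS_PER_CYCLE = 12   # premenstrual days evaluated per cycle


def score_ranges(items=ITEMS, max_rating=MAX_RATING, days=DAYS_PER_CYCLE):
    """Upper bound of the score under three hypothetical normalisations."""
    return {
        "sum over items, one day": items * max_rating,
        "sum over items and days, one cycle": items * max_rating * days,
        "mean per item and day": float(max_rating),
    }


ranges = score_ranges()
for name, upper in ranges.items():
    print(f"{name}: 0 to {upper}")

# Questionnaires entering each average: 2 baseline and 3 follow-up cycles.
baseline_forms = 2 * DAYS_PER_CYCLE
followup_forms = 3 * DAYS_PER_CYCLE
print(baseline_forms, followup_forms)
```

Of these variants, only a mean per item and day could produce values no larger than 2.00, but the paper does not state that this is what was done.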
Besides the implausible magnitude of the scores, there is the question of what they indicate. Apparently the scores combine two dimensions, namely the intensity and the duration of the symptoms. The data included 12 days for each cycle, and it is hard to believe that any woman would suffer from her PMS condition for that long. So a given score might reflect either a long period of low intensity or the opposite, a short period of extreme intensity. The score offers no clue as to which of the two happened, and a reduction might not indicate an improvement at all: if a long period of low intensity before treatment turns into a very short period of intense symptoms afterwards, the woman might not really consider it an improvement, even if the score is smaller.
The authors could have used other measures, such as the maximum rating per cycle or the number of days with a score exceeding a certain threshold. Then their data would tell us much more about what really happened.
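To illustrate the point, here is a minimal sketch with two invented daily profiles. The numbers are hypothetical and not taken from the trial, the 0 to 4 daily scale and the threshold of 2.0 are assumptions, and `summarise` is just an illustrative helper. Both profiles have the same mean over the 12 days, yet the peak rating and the number of days above the threshold tell them apart.

```python
from statistics import mean


def summarise(daily_scores, threshold=2.0):
    """Three summaries of one cycle of daily scores (assumed 0 to 4 scale)."""
    return {
        "mean": mean(daily_scores),
        "peak": max(daily_scores),
        "days_above": sum(1 for s in daily_scores if s > threshold),
    }


# Hypothetical daily scores for the 12 evaluated days, not trial data.
long_mild = [1.0] * 12                 # mild symptoms on every day
short_severe = [0.0] * 9 + [4.0] * 3   # three days of maximal symptoms

for name, profile in [("long_mild", long_mild), ("short_severe", short_severe)]:
    print(name, summarise(profile))
```

A mean score alone cannot distinguish these two women, while either of the alternative measures can.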
By the way: there is no information on how the scores of the individual women scattered over time, which would allow us to judge how they developed. And the sample size calculation seems to have been performed post hoc: it was not included in the conference paper and yields exactly the number of patients included in the study.
(3) Data evaluation
Let us assume for now that the scores carry some real meaning and that a reduced score corresponds to an improved condition of the patient. Even then there are questions.
The authors report that the mean of this ominous PMS score was reduced in the verum group from 0.443 to 0.287, a reduction by 0.156 score points, whereas the placebo group saw a smaller improvement from 0.426 at baseline to 0.340, a reduction by only 0.086 points. This implies a solid improvement of 35 % with verum, while the control group achieved an improvement of only 20 %. The authors include an image with the mean PMS scores of all the women, arranged by verum and control group (Fig. 3 of the current paper). The black squares represent the baseline data for each woman, the light squares give the score after treatment.
Even at first glance, the data do not look that different. Even with placebo, some women show marked improvement, though this is more frequent with verum. However, more women under verum than under placebo experienced an aggravation, although a few women in the control group deteriorated more strongly. About the same number of women in both groups reached low final scores.
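The percentages quoted above follow directly from the reported group means. A minimal check, where `relative_reduction` is just an illustrative helper name:

```python
def relative_reduction(before, after):
    """Reduction of a score as a fraction of its baseline value."""
    return (before - after) / before


# Group means as reported in the paper.
verum = relative_reduction(0.443, 0.287)
placebo = relative_reduction(0.426, 0.340)

print(f"verum: {verum:.1%}, placebo: {placebo:.1%}")
```

This reproduces the rounded figures of 35 % and 20 %. Note that it compares group means only and says nothing about how the individual women fared.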
To put this on a more solid footing, I digitized the figure and ran a short analysis of the PMS scores. Here is what I found:
|mean **||0.43||0.34||0.09|
* Please note: The data given for the difference are quartiles and medians of the individual changes that occurred, not the changes of the quartiles and medians.
** The mean values are given as a check on the accuracy of the digitization, which is good.
This table shows that the inter-quartile portion of the women in both groups fared about the same, with a slight advantage for placebo below the median and for verum above it. The difference in the mean values is due to the outliers, namely a surplus of five women in the verum group who experienced a marked improvement and a surplus of three women who suffered a marked deterioration under placebo.
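The distinction made in the first table note matters, and a small sketch with invented numbers shows why. The scores below are hypothetical (given in hundredths of a score point to keep the arithmetic exact) and are not the digitized trial data. Four women improve slightly and one deteriorates strongly:

```python
from statistics import mean, median

# Hypothetical scores in hundredths of a score point, not trial data.
before = [10, 20, 30, 40, 50]
after = [50, 10, 20, 30, 40]

# Positive change = reduction of the score = improvement.
changes = [b - a for b, a in zip(before, after)]

median_of_changes = median(changes)                  # typical individual change
change_of_medians = median(before) - median(after)   # shift of the group median
mean_of_changes = mean(changes)
change_of_means = mean(before) - mean(after)

print("median of changes:", median_of_changes)
print("change of medians:", change_of_medians)
print("mean of changes:  ", mean_of_changes)
print("change of means:  ", change_of_means)
```

In this example the median individual change is an improvement of 10, while the group median does not move at all. For means the two views always coincide, and the single strong deterioration is enough to cancel the four small improvements. The same reasoning applies to the quartiles.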
Mere chance or not? Small study bias, where the impact of outliers dominates the outcome? Hard to say. This result differs greatly from the pilot study, where most of the women under verum achieved marked improvements and no deterioration occurred, while under placebo only minor improvements and some aggravations were recorded.
The quality of the study looks fine, but the result is not as solid as it seems. The positive outcome is driven by the development of only about 10 % of the combined study population. This might indicate some kind of small study bias and would require replication in a rigorous trial with a larger number of participants, preferably one that is independent of the original team.
Yakir M, Klein-Laansma CT, Kreitler S et al.: A Placebo-Controlled Double-Blind Randomized Trial with Individualized Homeopathic Treatment Using a Symptom Cluster Approach in Women with Premenstrual Syndrome. Homeopathy, doi: 10.1055/s-0039-1691834 https://www.ncbi.nlm.nih.gov/pubmed/31434111
Yakir M, Kreitler S, Brzezinski A et al.: Effects of homeopathic treatment in women with premenstrual syndrome: a pilot study. British Homeopathic Journal (2001); 90: 148-153 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1014.5054&rep=rep1&type=pdf
Yakir M, Kreitler S, Brzezinski A, Vithoulkas G, Bentwich Z: ‘Successful treatment of premenstrual syndrome by classical homeopathy’ in: Walach H.: ‘Future directions and current issues of research in homeopathy’ (Conference proceedings), Freiburg/Germany 2002, pp. 134-143