MD, PhD, FMedSci, FSB, FRCP, FRCPEd

scientific misconduct

This post has an odd title and addresses an odd subject. I am sure some people reading it will ask themselves “has he finally gone potty; is he a bit xenophobic, chauvinistic, or what?” I can assure you none of the above is the case.

For many years, I have been asked to peer-review Chinese systematic reviews and meta-analyses of TCM-trials submitted to various journals and to the Cochrane Collaboration for publication, and I estimate that around 300 such articles are available today. Initially, I thought they were a valuable contribution to our knowledge, particularly for the many of us who cannot read Chinese languages. I hoped they might provide reliable information about this huge and potentially important section of the TCM-evidence. After doing this type of work for some time, I became more and more frustrated; now I have decided not to accept this task any longer – not because it is too much trouble, but because I have come to the conclusion that these articles are far less helpful than I had once assumed; in fact, I now fear that they are counter-productive.

In order to better understand what I mean, it might be best to use an example; this recent systematic review seems as good for that purpose as any.

Its Chinese authors “hypothesized that the eligible trials would provide evidence of the effect of Chinese herbs on bone mineral density (BMD) and the therapeutic benefits of Chinese medicine treatment in patients with bone loss.” Randomized controlled trials (RCTs) were thus retrieved for a systematic review from Medline and 8 Chinese databases. The authors identified 12 RCTs involving a total of 1816 patients. The studies compared Chinese herbs with placebo or standard anti-osteoporotic therapy. The pooled data from these RCTs showed that the change of BMD in the spine was more pronounced with Chinese herbs compared to the effects noted with placebo. Also, in the femoral neck, Chinese herbs generated significantly higher increments of BMD compared to placebo. Compared to conventional anti-osteoporotic drugs, Chinese herbs generated greater BMD changes.

In their abstract, the part of the paper that most readers access, the authors reached the following conclusions: “Our results demonstrated that Chinese herb significantly increased lumbar spine BMD as compared to the placebo or other standard anti-osteoporotic drugs.” In the article itself, we find this more detailed conclusion: “We conclude that Chinese herbs substantially increased BMD of the lumbar spine compared to placebo or anti-osteoporotic drugs as indicated in the current clinical reports on osteoporosis treatment. Long term of Chinese herbs over 12 months of treatment duration may increase BMD in the hip more effectively. However, further studies are needed to corroborate the positive effect of increasing the duration of Chinese herbs on outcome as the results in this analysis are based on indirect comparisons. To date there are no studies available that compare Chinese herbs, Chinese herbs plus anti-osteoporotic drugs, and anti-osteoporotic drug versus placebo in a factorial design. Consequently, we are unable to draw any conclusions on the possible superiority of Chinese herbs plus anti-osteoporotic drug versus anti-osteoporotic drug or Chinese herb alone in the context of BMD.”

Most readers will feel that this evidence is quite impressive and amazingly solid; they might therefore advocate routinely using Chinese herbs for the common and difficult-to-treat problem of osteoporosis. The integration of TCM might avoid lots of human suffering, prolong the life of many elderly patients, and save us all a lot of money. Why then am I not at all convinced?

The first thing to notice is the fact that we do not really know which of the ~7000 different Chinese herbs should be used. The article tells us surprisingly little about this crucial point. And even if we manage to study this question in more depth, we are bound to get thoroughly confused; there are simply too many herbal mixtures and patent medicines to easily identify the most promising candidates.

The second and more important hurdle to making sense of these data is the fact that most of the primary studies originate from inaccessible Chinese journals and were published in Chinese languages which, of course, few people in the West can understand. This is entirely our fault, some might argue, but it does mean that we have to believe the authors, take their word at face value, and cannot check the original data. You may think this is fine; after all, the paper has gone through a rigorous peer-review process where it has been thoroughly checked by several top experts in the field. This, however, is a fallacy; like you and me, the peer-reviewers might not read Chinese either! (I don’t, and I reviewed quite a few of these papers; in some instances, I even asked for translations of the originals to do the job properly but this request was understandably turned down.) In all likelihood, the above paper and most similar articles have not been properly peer-reviewed at all.

The third and perhaps most crucial point could only be fully appreciated if we were able to access and understand the primary studies; it relates to the quality of the original RCTs summarised in such systematic reviews. The abstract of the present paper tells us nothing at all about this issue. In the paper, however, we do find a formal assessment of the studies’ risk of bias which shows that the quality of the included RCTs was poor to very poor. We also find a short but revealing sentence: “The reports of all trials mentioned randomization, but only seven described the method of randomization.” This remark is much more significant than it may seem: we have shown that such studies use such terminology in a rather adventurous way; reviewing about 2000 of these allegedly randomised trials, we found that many Chinese authors call a trial “randomised” even in the absence of a control group (one cannot randomise patients and have no control group)! They seem to like the term because it is fashionable and makes publication of their work easier. We thus have good reason to fear that some/many/most of the studies were not RCTs at all.

The fourth issue that needs mentioning is the fact that very close to 100% of all Chinese TCM-trials report positive findings. This means that either TCM is effective for every indication it is tested for (most unlikely, not least because there are many negative non-Chinese trials of TCM), or there is something very fundamentally wrong with Chinese research into TCM. Over the years, I have had several Chinese co-workers in my team and was invariably impressed by their ability to work hard and efficiently; we often discussed the possible reasons for the extraordinary phenomenon of 0% negative Chinese trials. The most plausible answer they offered was this: it would be most impolite for a Chinese researcher to produce findings which contradict the opinion of his/her peers.

In view of these concerns, can we trust the conclusions of such systematic reviews? I don’t think so – and this is why I have problems with research of this nature. If there are good reasons to doubt their conclusions, these reviews might misinform us systematically, they might not further but hinder progress, and they might send us up the garden path. This could well be in the commercial interest of the Chinese multi-billion dollar TCM-industry, but it would certainly not be in the interest of patients and good health care.

On January 27, 1945, the concentration camp in Auschwitz was liberated. By May of the same year, around 20 similar camps had been discovered. What they revealed is so shocking that it is difficult to put it in words.

Today, on ‘HOLOCAUST MEMORIAL DAY’, I quote (shortened and slightly modified) from articles I published many years ago (references can be found in the originals) to remind us of the unspeakable atrocities that occurred during the Nazi period and of the crucial role the German medical profession played in them.

The Nazis’ euthanasia programme, also known as “Action T4”, started in specialized medical departments in 1939. Initially, it was aimed at children suffering from “idiocy, Down’s syndrome, hydrocephalus and other abnormalities”. By the end of 1939, the programme was extended to adults “unworthy of living.” We estimate that, when it was stopped, more than 70,000 patients had been killed.

Action T4 (named after its address: Tiergarten Strasse 4) was the Berlin headquarters of the euthanasia programme. It was run by approximately 50 physicians who, amongst other activities, sent questionnaires to (mostly psychiatric) hospitals urging them to return lists of patients for euthanasia. The victims were transported to specialized centers where they were gassed or poisoned. Action T4 was thus responsible for medically supervised, large-scale murder. Its true significance, however, lies elsewhere. Action T4 turned out to be nothing less than a “pilot project” for the extinction of millions of prisoners of the concentration camps.

The T4 units had developed the technology for killing on an industrial scale. It was only with this know-how that the total extinction of all Jews of the Reich could be planned. This truly monstrous task required medical expertise.

Almost without exception, those physicians who had worked for T4 went on to take charge of what the Nazis called the ‘Final Solution’. While Action T4 had killed thousands, its offspring would murder millions under the trained instructions of Nazi doctors.

The medical profession’s role in these crimes was critical and essential. German physicians had been involved at all levels and stages. They had created and embraced the pseudo-science of race hygiene. They were instrumental in developing it further into applied racism. They had generated the know-how of mass extinction. Finally, they also performed outrageously cruel and criminal experiments under the guise of scientific inquiry [see below]. German doctors had thus betrayed all the ideals medicine had previously stood for, and had become involved in criminal activities unprecedented in the history of medicine (full details and references on all of this are provided in my article, see link above).

Alternative medicine

It is well-documented that alternative medicine was strongly supported by the Nazis. The general belief is that this had nothing to do with the sickening atrocities of this period. I believe that this assumption is not entirely correct. In 2001, I published an article which reviews this subject; I take the liberty of borrowing from it here.

Based on a general movement in favour of all things natural, a powerful trend towards natural ways of healing had developed in the 19th century. By 1930, this had led to a situation in Germany where roughly as many lay-practitioners of alternative medicine as conventional doctors were in practice. This had led to considerable tensions between the two camps. To re-unify German medicine under the banner of ‘Neue Deutsche Heilkunde’ (New German Medicine), Nazi officials eventually decided to create the profession of the ‘Heilpraktiker’ (healing practitioner). Heilpraktiker were not allowed to train students and their profession was thus meant to become extinct within one generation; Goebbels spoke of having created the cradle and the grave of the Heilpraktiker. However, after 1945, this decision was challenged in the courts and eventually over-turned – and this is why Heilpraktiker are still thriving today.

The ‘flag ship’ of the ‘Neue Deutsche Heilkunde’ was the ‘Rudolf Hess Krankenhaus’ in Dresden (which was renamed Gerhard Wagner Krankenhaus after Hess’ flight to the UK). It represented a full integration of alternative and orthodox medicine.

‘Research’

An example of systematic research into alternative medicine is the Nazi government’s project to validate homoeopathy. The data of this massive research programme are now lost (some speculate that homeopaths made them disappear) but, according to an eye-witness report, its results were entirely negative (full details and references on alt med in 3rd Reich are in the article cited above).

There is, of course, plenty of literature on the subject of Nazi ‘research’ (actually, it was pseudo-research) and the unspeakable crimes it entailed. By contrast, there is almost no published evidence that these activities included in any way alternative medicine, and the general opinion seems to be that there are no connections whatsoever. I fear that this notion might be erroneous.

As far as I can make out, no systematic study of the subject has so far been published, but I found several hints and indications that the criminal experiments of Nazi doctors also involved alternative medicine (the sources are provided in my articles cited above or in the links provided below). Here are but a few leads:

Dr Wagner, the chief medical officer of the Nazis, was a dedicated and most active proponent of alternative medicine.

Doctors in the alternative “Rudolf Hess Krankenhaus” [see above] experimented on speeding up the recovery of wounded soldiers, on curing syphilis with fasting, and on various other projects to help the war effort.

The Dachau concentration camp housed the largest plantation of medicinal herbs in Germany.

Dr Madaus (founder of the still existing company for natural medicines by the same name) experimented on the sterilisation of humans with herbal and homeopathic remedies, a project that was deemed of great importance for controlling the predicted population growth in the East of the expanding Reich.

Dr Grawitz infected Dachau prisoners with various pathogens to test the effectiveness of homeopathic remedies.

Schuessler salts were also tested on concentration camp inmates.

So, why bring all of this up today? Is it not time that we let grass grow over these most disturbing events? I think not! For many years, I actively researched this area (you can find many of my articles on Medline) because I am convinced that the unprecedented horrors of Nazi medicine need to be told and re-told – not just on HOLOCAUST MEMORIAL DAY, but continually. This, I hope, will minimize the risk of such incredible abuses ever happening again.

As I am drafting this post, I am in a plane flying back from Finland. The in-flight meal reminded me of the fact that no food is so delicious that it cannot be spoilt by the addition of too many capers. In turn, this made me think about the paper I happened to be reading at the time, and I arrived at the following theory: no trial design is so rigorous that it cannot be turned into something utterly nonsensical by the addition of a few amateur researchers.

The paper I was reading when this idea occurred to me was a randomised, triple-blind, placebo-controlled cross-over trial of homeopathy. Sounds rigorous and top quality? Yes, but wait!

Essentially, the authors recruited 86 volunteers who all claimed to be suffering from “mental fatigue” and treated them with Kali-Phos 6X or placebo for one week (X-potencies signify dilution steps of 1:10, and 6X therefore means that the salt had been diluted 1:1,000,000). Subsequently, the volunteers were crossed-over to receive the other treatment for one week.
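The arithmetic behind the potency label can be checked in a few lines. This is only a sketch: the function name is hypothetical, and it assumes nothing beyond what is stated above, namely that each X step is a 1:10 dilution.

```python
# Sketch: overall dilution implied by a decimal ("X") potency.
# Assumption: every X step is a 1:10 dilution, as described in the text.

def x_potency_dilution(steps: int) -> int:
    """Return N such that the overall dilution is 1:N after `steps` 1:10 steps."""
    return 10 ** steps

dilution_6x = x_potency_dilution(6)
print(f"6X corresponds to 1:{dilution_6x:,}")  # prints "6X corresponds to 1:1,000,000"
```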

The results failed to show that the homeopathic medication had any effect (not even homeopaths can be surprised about this!). The authors concluded that Kali-Phos was not effective but cautioned that, because of the possibility of a type-2-error, they might have missed an effect which, in truth, does exist.

In my view, this article provides an almost classic example of how time, money and other resources can be wasted in a pretence of conducting reasonable research. As we all know, clinical trials usually are for testing hypotheses. But what is the hypothesis tested here?

According to the authors, the aim was to “assess the effectiveness of Kali-Phos 6X for attention problems associated with mental fatigue”. In other words, their hypothesis was that this remedy is effective for treating the symptom of mental fatigue. This notion, I would claim, is not a scientific hypothesis, it is a foolish conjecture!

Arguably any hypothesis about the effectiveness of a highly diluted homeopathic remedy is mere wishful thinking. But, if there were at least some promising data, some might conclude that a trial was justified. By way of justification for the RCT in question, the authors inform us that one previous trial had suggested an effect; however, this study did not employ just Kali-Phos but a combined homeopathic preparation which contained Kalium-Phos as one of several components. Thus the authors’ “hypothesis” does not even amount to a hunch, not even to a slight inkling! To me, it is less than a shot in the dark fired by blind optimists - nobody should be surprised that the bullet failed to hit anything.

It could even be that the investigators themselves dimly realised that something is amiss with the basis of their study; this might be the reason why they called it an “exploratory trial”. But an exploratory study is one without a hypothesis, and the trial in question does have a hypothesis of sorts – only that it is rubbish. And what exactly did the authors mean to explore anyway?

That self-reported mental fatigue in healthy volunteers is a condition that can be medicalised such that it merits treatment?

That the test they used for quantifying its severity is adequate?

That a homeopathic remedy with virtually no active ingredient generates outcomes which are different from placebo?

That Hahnemann’s teaching of homeopathy was nonsense and can thus be discarded (he would have sharply condemned the approach of treating all volunteers with the same remedy, as it contradicts many of his concepts)?

That funding bodies can be fooled to pay for even the most ridiculous trial?

That ethics-committees might pass applications which are pure nonsense and which are thus unethical?

A scientific hypothesis should be more than a vague hunch; at its simplest, it aims to explain an observation or phenomenon, and it ought to have certain features which many alt med researchers seem to have never heard of. If they test nonsense, the result can only be nonsense.

The issue of conducting research that does not make much sense is far from trivial, particularly as so much (I would say most) of alt med research is of such or even worse calibre (if you do not believe me, please go on Medline and see for yourself how many of the recent articles in the category “complementary alternative medicine” truly contribute to knowledge worth knowing). It would be easy therefore to cite more hypothesis-free trials of homeopathy.

One recent example from Germany will have to suffice: in this trial, the only justification for conducting a full-blown RCT was that the manufacturer of the remedy allegedly knew of a few unpublished case-reports which suggested the treatment to work – and, of course, the results of the RCT eventually showed that it didn’t. Anyone with a background in science might have predicted that outcome – which is why such trials are so deplorably wasteful.

Research-funds are increasingly scarce, and they must not be spent on nonsensical projects! The money and time should be invested more fruitfully elsewhere. Participants of clinical trials give their cooperation willingly; but if they learn that their efforts have been wasted unnecessarily, they might think twice next time they are asked. Thus nonsensical research may have knock-on effects with far-reaching consequences.

Being a researcher is at least as serious a profession as most other occupations; perhaps we should stop allowing total amateurs to waste money while playing at being professional. If someone driving a car does something seriously wrong, we take away his licence; why is there not a similar mechanism for inadequate researchers, funders and ethics-committees which prevents them doing further damage?

At the very minimum, we should critically evaluate the hypothesis that the applicants for research-funds propose to test. Had someone done this properly in relation to the two above-named studies, we would have saved about £150,000 per trial (my estimate). But as it stands, the authors will probably claim that they have produced fascinating findings which urgently need further investigation – and we (normally you and I) will have to spend three times the above-named amount (again, my estimate) to finance a “definitive” trial. Nonsense, I am afraid, tends to beget more nonsense.

 

In my last post, we discussed the “A+B versus B” trial design as a tool to produce false positive results. This method is currently very popular in alternative medicine, yet it is by no means the only approach that can mislead us. Today, let’s look at other popular options with a view to protecting ourselves against trialists who naively or willfully might fool us.

The crucial flaw of the “A+B versus B” design is that it fails to account for non-specific effects. If the patients in the experimental group experience better outcomes than the control group, this difference could well be due to effects that are unrelated to the experimental treatment. There are, of course, several further ways to ignore non-specific effects in clinical research. The simplest option is to include no control group at all. Homeopaths, for instance, are very proud of studies which show that ~70% of their patients experience benefit after taking their remedies. This type of result tends to impress journalists, politicians and other people who fail to realise that such a result might be due to a host of factors, e.g. the placebo-effect, the natural history of the disease, regression towards the mean or treatments which patients self-administered while taking the homeopathic remedies. It is therefore misleading to make causal inferences from such data.
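One of these factors, regression towards the mean, is easy to demonstrate with a toy simulation. The sketch below is purely illustrative: the symptom scale, the enrolment threshold and all numbers are invented assumptions, and the "treatment" has no effect whatsoever. People enrol only on a bad day, and most of them then look "improved" at follow-up simply because their symptoms drift back towards their usual level.

```python
import random

def simulate_uncontrolled_study(n_people=10_000, threshold=7.0, seed=1):
    """Toy model (all parameters hypothetical): an inert treatment, no control group."""
    rng = random.Random(seed)
    enrolled = 0
    improved = 0
    for _ in range(n_people):
        usual = rng.gauss(5.0, 1.0)               # long-term average symptom score
        baseline = usual + rng.gauss(0.0, 1.5)    # score on the day help is sought
        if baseline < threshold:
            continue                              # only people having a bad day enrol
        followup = usual + rng.gauss(0.0, 1.5)    # later score; treatment effect is zero
        enrolled += 1
        if followup < baseline:
            improved += 1
    return enrolled, improved / enrolled

enrolled, fraction_improved = simulate_uncontrolled_study()
print(f"{enrolled} enrolled, {fraction_improved:.0%} 'improved' with an inert treatment")
```

Without a control group, there is no way to tell this artefact apart from a genuine treatment effect.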

Another easy method to generate false positive results is to omit blinding. The purpose of blinding the patient, the therapist and the evaluator of the outcomes in clinical trials is to make sure that expectation is not the cause of or contributor to the outcome. They say that expectation can move mountains; this might be an exaggeration, but it can certainly influence the result of a clinical trial. Patients who hope for a cure regularly do get better even if the therapy they receive is useless, and therapists as well as evaluators of the outcomes tend to view the results through rose-tinted spectacles, if they have preconceived ideas about the experimental treatment. Similarly, the parents of a child or the owners of an animal can transfer their expectations, and this is one of several reasons why it is incorrect to claim that children and animals are immune to placebo-effects.

Failure to randomise is another source of bias which can make an ineffective therapy look like an effective one when tested in a clinical trial. If we allow patients or trialists to select or choose which patients receive the experimental and which get the control-treatment, it is likely that the two groups differ in a number of variables. Some of these variables might, in turn, impact on the outcome. If, for instance, doctors allocate their patients to the experimental and control groups, they might select those who will respond to the former and those who don’t to the latter. This may not happen with malicious intent but through intuition or instinct: responsible health care professionals want those patients who, in their experience, have the best chances to benefit from a given treatment to receive that treatment. Only randomisation can, when done properly, make sure we are comparing comparable groups of patients, and non-randomisation is likely to produce misleading findings.

While these options for producing false positives are all too obvious, the next possibility is slightly more intriguing. It refers to studies which do not test whether an experimental treatment is superior to another one (often called superiority trials), but to investigations attempting to assess whether it is equivalent to a therapy that is generally accepted to be effective. The idea is that, if both treatments produce the same or similarly positive results, both must be effective. For instance, such a study might compare the effects of acupuncture to a common pain-killer. Such trials are aptly called non-superiority or equivalence trials, and they offer a wide range of possibilities for misleading us. If, for example, such a trial has not enough patients, it might show no difference where, in fact, there is one. Let’s consider a deliberately silly example: someone comes up with the idea to compare antibiotics to acupuncture as treatments of bacterial pneumonia in elderly patients. The researchers recruit 10 patients for each group, and the results reveal that, in one group, 2 patients died, while, in the other, the number was 3. The statistical tests show that the difference of just one patient is not statistically significant, and the authors therefore conclude that acupuncture is just as good for bacterial infections as antibiotics.
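To make the silly example concrete, here is a minimal sketch of such a significance test. It assumes Fisher's exact test, which is my choice for illustration (the example does not name a test), and it uses only the Python standard library; the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in row 1 with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)

# Rows: the two treatment groups; columns: died, survived (2 vs 3 deaths out of 10 each)
p_value = fisher_exact_two_sided(2, 8, 3, 7)
print(f"p = {p_value:.2f}")
```

A large p-value here says only that 20 patients cannot distinguish the two treatments; the data are just as compatible with a large difference in mortality as with none. Absence of a significant difference is not evidence of equivalence.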

Even trickier is the option to under-dose the treatment given to the control group in an equivalence trial. In our hypothetical example, the investigators might subsequently recruit hundreds of patients in an attempt to overcome the criticism of their first study; they then decide to administer a sub-therapeutic dose of the antibiotic in the control group. The results would then apparently confirm the researchers’ initial finding, namely that acupuncture is as good as the antibiotic for pneumonia. Acupuncturists might then claim that their treatment has been proven in a very large randomised clinical trial to be effective for treating this condition, and people who do not happen to know the correct dose of the antibiotic could easily be fooled into believing them.

Obviously, the results would be more impressive, if the control group in an equivalence trial received a therapy which is not just ineffective but actually harmful. In such a scenario, the most useless or even slightly detrimental treatment would appear to be effective simply because it is equivalent to or less harmful than the comparator.

A variation of this theme is the plethora of controlled clinical trials which compare one unproven therapy to another unproven treatment. Predictably, the results indicate that there is no difference in the clinical outcome experienced by the patients in the two groups. Enthusiastic researchers then tend to conclude that this proves both treatments to be equally effective.

Another option for creating misleadingly positive findings is to cherry-pick the results. Most trials have many outcome measures; for instance, a study of acupuncture for pain-control might quantify pain in half a dozen different ways; it might also measure the length of the treatment until pain has subsided, the amount of medication the patients took in addition to receiving acupuncture, the days off work because of pain, the partner’s impression of the patient’s health status, the quality of life of the patient, the frequency of sleep being disrupted by pain etc. If the researchers then evaluate all the results, they are likely to find that one or two of them have changed in the direction they wanted. This can well be a chance finding: with the typical statistical tests, one in 20 outcome measures would produce a significant result purely by chance. In order to mislead us, the researchers only need to “forget” about all the negative results and focus their publication on the ones which by chance have come out as they had hoped.
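How quickly the "one in 20" adds up can be worked out directly. The sketch below rests on two simplifying assumptions that are mine, not the trialists': the outcome measures are statistically independent, and the treatment truly has no effect on any of them. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null outcomes is 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_outcomes

for k in (1, 6, 12, 20):
    print(f"{k:2d} outcomes: {prob_at_least_one_false_positive(k):.0%}")
```

Under these assumptions, a trial with 20 outcome measures has roughly a 64% chance of yielding at least one spuriously "significant" result to build a publication around. Real outcome measures are usually correlated, which changes the exact figure but not the lesson.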

One fail-proof method for misleading the public is to draw conclusions which are not supported by the data. Imagine you have generated squarely negative data with a trial of homeopathy. As an enthusiast of homeopathy, you are far from happy with your own findings; in addition you might have a sponsor who puts pressure on you. What can you do? The solution is simple: you only need to highlight at least one positive message in the published article. In the case of homeopathy, you could, for instance, make a major issue about the fact that the treatment was remarkably safe and cheap: not a single patient died, most were very pleased with the treatment which was not even very expensive.

And finally, there is always the possibility of overt cheating. Researchers are only human and are thus not immune to temptation. They may have conflicts of interest or may know that positive results are much easier to publish than negative ones. Certainly they want to publish their work – “publish or perish”! So, faced with disappointing results of a study, they might decide to prettify them or even invent new ones which are more pleasing to them, their peers, or their sponsors.

Am I claiming that this sort of thing only happens in alternative medicine? No! Obviously, the way to minimise the risk of such misconduct is to train researchers properly and make sure they are able to think critically. Am I suggesting that investigators of alternative medicine are often not well-trained and almost always uncritical? Yes.

Since it was first published, the “Swiss government report” on homeopathy has been celebrated as the most convincing proof so far that homeopathy works. On the back of this news, all sorts of strange stories have emerged. Their aim seems to be that consumers become convinced that homeopathy is based on compelling evidence.

Readers of this blog might therefore benefit from a brief and critical evaluation of this “evidence” in support of homeopathy. Recently, not one, two, three but four independent critiques of this document have become available.

Collectively, these articles [only one of which is mine] suggest that the “Swiss report” is hardly worth the paper it was written on; one of the critiques published in the Swiss Medical Weekly even stated that it amounted to “research misconduct”! Compared to such outspoken language, my own paper concluded much more conservatively: “this report [is] methodologically flawed, inaccurate and biased”.

So what is wrong with it? Why is this document not an accurate summary of the existing evidence? I said this would be a brief post, so I will only mention some of the most striking flaws.

The report is not, as often claimed, a product by the Swiss government; in fact, it was produced by 13 authors who have no connection to any government and who are known proponents of homeopathy. For some unimaginable reason, they decided to invent their very own criteria for what constitutes evidence. For instance, they included case-reports and case-series, re-defined what is meant by effectiveness, were highly selective in choosing the articles they happened to like [presumably because of the direction of the result] while omitting lots of data that did not seem to confirm their prior belief, and assessed only a very narrow range of indications.

The report quotes several of my own reviews of homeopathy but, intriguingly, it omitted others for no conceivable reason. I was baffled to realise that the authors reported my conclusions differently from the original published text in my articles. If this had occurred once or twice, it might have been a forgivable error - but this happened in 10 of 22 instances.

Negative conclusions in my original reviews were thus repeatedly turned into positive verdicts, and evidence against homeopathy suddenly appeared to support it. This is, of course, a serious problem: if someone is too busy to look up my original articles, she is very unlikely to notice this extraordinary attempt to cheat.

To me, this approach seems similar to that of an accountant who produces a balance sheet where debts appear as credits. It is a simple yet dishonest way to generate a positive result where there is none!

The final straw for me came when I realised that the authors of this dubious report had declared that they were free of conflicts of interest. This notion is demonstrably wrong; several of them earn their living through homeopathy!

Knowing all this, sceptics might take any future praise of this “Swiss government report” with more than just a pinch of salt. Once we are aware of the full, embarrassing details, it is not difficult to understand how the final verdict turned out to be in favour of homeopathy: if we convert much of the negative data on any subject into positive evidence, any rubbish will come out smelling of roses – even homeopathy.

 
