
In my last post, we discussed the “A+B versus B” trial design as a tool to produce false positive results. This method is currently very popular in alternative medicine, yet it is by no means the only approach that can mislead us. Today, let’s look at other popular options with a view to protecting ourselves against trialists who, naively or willfully, might fool us.

The crucial flaw of the “A+B versus B” design is that it fails to account for non-specific effects. If the patients in the experimental group experience better outcomes than those in the control group, this difference could well be due to effects that are unrelated to the experimental treatment. There are, of course, several further ways to ignore non-specific effects in clinical research. The simplest option is to include no control group at all. Homeopaths, for instance, are very proud of studies which show that ~70% of their patients experience benefit after taking their remedies. This type of result tends to impress journalists, politicians and other people who fail to realise that such a result might be due to a host of factors, e.g. the placebo-effect, the natural history of the disease, regression towards the mean or treatments which patients self-administered while taking the homeopathic remedies. It is therefore misleading to make causal inferences from such data.
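
To illustrate how far such uncontrolled “response rates” can mislead, here is a minimal simulation sketch in Python; all numbers are invented, and the only point is that patients tend to enrol when they feel worse than usual, so a second measurement later looks better even without any treatment effect (regression towards the mean).

import random

random.seed(1)
n = 1000
improved = 0
for _ in range(n):
    usual_level = random.gauss(50, 10)                    # patient's long-term average symptom score
    at_enrolment = usual_level + abs(random.gauss(0, 8))  # patients seek treatment on a bad day
    at_follow_up = usual_level + random.gauss(0, 8)       # an ordinary day a few weeks later
    if at_follow_up < at_enrolment:                       # lower score = fewer symptoms
        improved += 1

print(f"{100 * improved / n:.0f}% 'improved' without any effective treatment")  # roughly 75%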

Another easy method to generate false positive results is to omit blinding. The purpose of blinding the patient, the therapist and the evaluator of the outcomes in clinical trials is to make sure that expectation is not the cause of, or a contributor to, the outcome. They say that expectation can move mountains; this might be an exaggeration, but it can certainly influence the result of a clinical trial. Patients who hope for a cure regularly do get better even if the therapy they receive is useless, and therapists as well as evaluators of the outcomes tend to view the results through rose-tinted spectacles if they have preconceived ideas about the experimental treatment. Similarly, the parents of a child or the owners of an animal can transfer their expectations, and this is one of several reasons why it is incorrect to claim that children and animals are immune to placebo-effects.

Failure to randomise is another source of bias which can make an ineffective therapy look like an effective one when tested in a clinical trial. If we allow patients or trialists to choose which patients receive the experimental treatment and which get the control treatment, it is likely that the two groups will differ in a number of variables. Some of these variables might, in turn, impact on the outcome. If, for instance, doctors allocate their patients to the experimental and control groups, they might select those who will respond for the former and those who won’t for the latter. This may not happen with malicious intent but through intuition or instinct: responsible health care professionals want those patients who, in their experience, have the best chances to benefit from a given treatment to receive that treatment. Only randomisation, when done properly, can make sure we are comparing comparable groups of patients, and non-randomisation is likely to produce misleading findings.
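
The effect of letting clinicians rather than chance allocate patients can be made visible with another toy simulation in Python (again with invented numbers): the treatment below does nothing at all, yet it “wins” whenever the better-prognosis patients are steered towards it.

import random

random.seed(2)
# each patient's chance of recovering on their own, regardless of treatment
patients = [random.uniform(0.2, 0.8) for _ in range(2000)]

def recovery_rate(group):
    return sum(random.random() < p for p in group) / len(group)

# Non-random allocation: the better-prognosis half is steered to the "new" therapy
sorted_patients = sorted(patients)
control, experimental = sorted_patients[:1000], sorted_patients[1000:]
print(f"clinician-allocated: {recovery_rate(experimental):.0%} vs {recovery_rate(control):.0%}")

# Random allocation of the very same patients: the difference disappears
random.shuffle(patients)
control, experimental = patients[:1000], patients[1000:]
print(f"randomised:          {recovery_rate(experimental):.0%} vs {recovery_rate(control):.0%}")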

While these options for producing false positives are all too obvious, the next possibility is slightly more intriguing. It refers to studies which do not test whether an experimental treatment is superior to another one (so-called superiority trials) but which attempt to assess whether it is equivalent to a therapy that is generally accepted to be effective. The idea is that, if both treatments produce the same or similarly positive results, both must be effective. For instance, such a study might compare the effects of acupuncture to those of a common pain-killer. Such trials are called non-inferiority or equivalence trials, and they offer a wide range of possibilities for misleading us. If, for example, such a trial does not have enough patients, it might show no difference where, in fact, there is one. Let’s consider a deliberately silly example: someone comes up with the idea to compare antibiotics to acupuncture as treatments of bacterial pneumonia in elderly patients. The researchers recruit 10 patients for each group, and the results reveal that, in one group, 2 patients died, while, in the other, the number was 3. The statistical tests show that the difference of just one patient is not statistically significant, and the authors therefore conclude that acupuncture is just as good for bacterial infections as antibiotics.
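
For the hypothetical pneumonia trial above, a two-sided Fisher’s exact test shows just how uninformative 2 versus 3 deaths out of 10 per group really is. The sketch below uses only Python’s standard library; the absence of a significant difference reflects a hopeless lack of statistical power, not equivalence.

from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    def p_table(x):  # hypergeometric probability of a table with top-left cell = x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# deaths vs survivors in each 10-patient arm of the hypothetical trial
print(fisher_exact_2x2(2, 8, 3, 7))  # ~1.0: no evidence of a difference, and none of equivalence either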

Even trickier is the option to under-dose the treatment given to the control group in an equivalence trial. In our hypothetical example, the investigators might subsequently recruit hundreds of patients in an attempt to overcome the criticism of their first study; they then decide to administer a sub-therapeutic dose of the antibiotic in the control group. The results would then apparently confirm the researchers’ initial finding, namely that acupuncture is as good as the antibiotic for pneumonia. Acupuncturists might then claim that their treatment has been proven in a very large randomised clinical trial to be effective for treating this condition, and people who do not happen to know the correct dose of the antibiotic could easily be fooled into believing them.

Obviously, the results would be more impressive, if the control group in an equivalence trial received a therapy which is not just ineffective but actually harmful. In such a scenario, the most useless or even slightly detrimental treatment would appear to be effective simply because it is equivalent to or less harmful than the comparator.

A variation of this theme is the plethora of controlled clinical trials which compare one unproven therapy to another unproven treatment. Predictably, the results indicate that there is no difference in the clinical outcome experienced by the patients in the two groups. Enthusiastic researchers then tend to conclude that this proves both treatments to be equally effective.

Another option for creating misleadingly positive findings is to cherry-pick the results. Most trials have many outcome measures; for instance, a study of acupuncture for pain-control might quantify pain in half a dozen different ways; it might also measure the length of treatment until the pain has subsided, the amount of medication the patients took in addition to receiving acupuncture, the days off work because of pain, the partner’s impression of the patient’s health status, the quality of life of the patient, the frequency of sleep being disrupted by pain, etc. If the researchers then evaluate all the results, they are likely to find that one or two of them have changed in the direction they wanted. This may well be a chance finding: with the typical statistical tests, one in 20 outcome measures would produce a significant result purely by chance. In order to mislead us, the researchers only need to “forget” about all the negative results and focus their publication on the ones which, by chance, have come out as they had hoped.
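
The arithmetic behind this is worth spelling out. Assuming, for simplicity, independent outcome measures and the usual 5% significance threshold, the chance of at least one spuriously “significant” result grows quickly with the number of outcomes:

# chance of at least one false positive among k independent outcomes at alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: {1 - 0.95 ** k:.0%} chance of at least one false positive")  # 5%, 23%, 40%, 64%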

One foolproof method for misleading the public is to draw conclusions which are not supported by the data. Imagine you have generated squarely negative data with a trial of homeopathy. As an enthusiast of homeopathy, you are far from happy with your own findings; in addition, you might have a sponsor who puts pressure on you. What can you do? The solution is simple: you only need to highlight at least one positive message in the published article. In the case of homeopathy, you could, for instance, make a major issue of the fact that the treatment was remarkably safe and cheap: not a single patient died, and most were very pleased with the treatment, which was not even very expensive.

And finally, there is always the possibility of overt cheating. Researchers are only human and are thus not immune to temptation. They may have conflicts of interest or may know that positive results are much easier to publish than negative ones. Certainly they want to publish their work – “publish or perish”! So, faced with disappointing results of a study, they might decide to prettify them or even invent new ones which are more pleasing to them, their peers, or their sponsors.

Am I claiming that this sort of thing only happens in alternative medicine? No! Obviously, the way to minimise the risk of such misconduct is to train researchers properly and make sure they are able to think critically. Am I suggesting that investigators of alternative medicine are often not well-trained and almost always uncritical? Yes.

14 Responses to How to fool people with clinical trials

  • Very good entry, except for one point. The study you note:

    Homeopathic treatment for chronic disease: a 6-year, university-hospital outpatient observational study.

    It is an observational study; you can find more references in biostatistics and medical-statistics articles, books and manuals.

    The criticism made of this study is poor, just as Alan Hess’s criticism of the Bracho et al. study on leptospirosis was: he argued that the study had no placebo group when the research design clearly says <>.

  • M: I am not sure what you are trying to say.

  • M is saying you have described all the ruses used in conventional medical research as well as what you postulate in CAM research – but of course you cannot comment on this response as you are merely emeritus prof in CAM, old friend.

  • “Am I claiming that this sort of thing is only happening in alternative medicine? No!” Learn to read, Andrew.

  • There is still another way to cheat the trusting observer: beautify your results by using relative risk or odds ratio as the outcome measure.

    Imagine this setting: You have a placebo group of 234 patients out of which 24 recover (10.3 %). The active drug group has 228 patients in it out of which 39 recover (17.1 %). This is a significant result, as p = 0.03 by an independence test.

    How to rate this?

    Note: I have a degree in mechanical engineering with 25+ years of standing in R&D and quality management. In this branch of industry you would come to the conclusion that only 6.8 % of the patients (17.1 – 10.3) got some benefit for their money. 10.3 % would have recovered anyway and nearly 83 % (100 – 17.1) did not recover at all. About nineteen out of twenty patients threw their money out of the window. Anybody would call this useless. If you presented this statistic to your boss or your customer and claimed it a success, you might well be out of a job or short one customer, depending on your position in the organisation.

    Anybody would call this useless. That is, anybody outside the medical profession. To my very great astonishment they manage to turn this failure of a test into a success story without being accused of downright cheating. You just compute the relative risk of healing and you get RR = 1.66. Wow. The verum group fared 66 % better than placebo. Good news, this one here.

    Imagine another set of data:

    The verum group has 20 patients out of which 13 react positive (65 %); placebo has 19 patients out of which 6 ‘react’ positive (32 %). Okay, in this setting only two out of three patients would flush their money down the drain, which surely is a much better situation than above, but I would say still somewhat unsatisfying. But the OR is 4.33, that is, the odds in the verum group are more than four times those in the placebo group. Double wow. That is the spirit I like. (Both examples are re-computed after the references below.)

    As I said before, I am a retired engineer and took to looking into homeopathic trials as a kind of pastime after I found my wife had spent a lot of money on useless homeopathic treatment. If you are new to the field you would not believe researchers could get away with this. But they can, and claim it to be science; see the sources below. These beautifying tricks are not even mentioned in my comprehensive textbooks of engineering statistics.

    Do I claim this to be a characteristic of homeopathic trials alone? No.
    Do I fear this happens in medical science in general? Yes.
    Do I think I now understand how it can be that a drug which is claimed in its advertising to be clinically tested shows no effect on me? Yes.

    (1) Ferley, JP, et al. ‘A controlled evaluation of a homeopathic preparation in the treatment of influenza-like syndromes’, Br. J. Clin. Pharm. (1989), 27, 329-335

    (2) Jacobs, J, et al. ‘Treatment of acute childhood diarrhea with homeopathic medicine: a randomized clinical trial in Nicaragua’, Pediatrics (1994), 93, 719-725
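
For readers who want to check the two examples above, here is a minimal Python sketch re-computing them directly from the raw counts, using the standard definitions of relative risk (RR), odds ratio (OR), absolute risk difference and number needed to treat (NNT). The helper function summarise is purely illustrative, and because the figures quoted above are based on rounded percentages, the values printed here may not match them exactly.

def summarise(label, n_verum, recovered_verum, n_placebo, recovered_placebo):
    p_v = recovered_verum / n_verum
    p_p = recovered_placebo / n_placebo
    rr = p_v / p_p                                   # relative "risk" (here: chance) of recovery
    odds_ratio = (p_v / (1 - p_v)) / (p_p / (1 - p_p))
    abs_diff = p_v - p_p                             # absolute difference in recovery rates
    print(f"{label}: RR = {rr:.2f}, OR = {odds_ratio:.2f}, "
          f"absolute difference = {abs_diff:.1%}, NNT = {1 / abs_diff:.0f}")

summarise("Example 1", 228, 39, 234, 24)   # striking RR, yet only about 7 percentage points absolute
summarise("Example 2", 20, 13, 19, 6)      # large OR, absolute gain roughly one patient in three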

  • Nice to hear from you – it is difficult helping people who come to surgeries wanting to be helped and respected and to have the best available EBM, especially when there is no evidence pertinent to them and their condition. Doing no harm is a noble action indeed. Fortunately ‘no EBM’ means doctors and allied registered health professionals can still vocationally act in the best interests of their patients… unlike ‘no FT – no comment’.

  • It is a mistake to assume that randomisation fully controls for selection bias, since a selection process still occurs in recruiting for the trial in the first place.

    First of all, only those patients who feel that the intervention (whether pharmacological or non-pharmacological) will significantly benefit them will choose to participate. (I feel this probably has a greater effect on non-pharma trials.)

    Secondly, selection biases can also enter the equation in trials which focus on “severe” cases or on syndrome-based diagnoses, e.g. chronic pain or cancer-related fatigue (as per the recent blog post); patients can be arbitrarily excluded for having “insufficient” symptoms.

    Thirdly, it has been suggested that pre-publishing protocols leads to less cherry-picking, but I’ve seen papers published with changes to virtually every measure of their protocol: very low thresholds of change, changes in the scales used and cherry-picking of secondary measures, presumably because their data were much less impressive than expected. Surprisingly, some of these papers are still published in journals like The Lancet in spite of this.

  • Sure, randomisation and other measures only minimise bias; they do not always eliminate it completely. As soon as we have a better methodology, we will adopt it. Until then we should work with what we have. The worst thing would be to say, “as this is not 100% fool-proof, we do not control for bias at all”.
