Science has seen its steady stream of scandals which are much more than just regrettable, as they undermine much of what science stands for. In medicine, fraud and other forms of misconduct of scientists can even endanger the health of patients.
Against this background, it would be handy to have a simple measure which would give us some indication about the trustworthiness of scientists, particularly clinical scientists. Might I be as bold as to propose such a method, the TRUSTWORTHINESS INDEX (TI)?
A large part of clinical science is about testing the efficacy of treatments, and it is the scientist who does this type of research whom I want to focus on. It goes without saying that, occasionally, such tests will have to generate negative results such as “the experimental treatment was not effective” [actually “negative” is not the right term, as it is clearly positive to know that a given therapy does not work]. If this never happens with the research of a given individual, we could be dealing with false positive results. In such a case, our alarm bells should start ringing, and we might begin to ask ourselves, how trustworthy is this person?
Yet, in real life, the alarm bells rarely do ring. This absence of suspicion might be due to the fact that, at one point in time, one single person tends to see only one particular paper of the individual in question – and one result tells him next to nothing about the question whether this scientist produces more than his fair share of positive findings.
What is needed is a measure that captures the totality of a researcher’s output. Such parameters already exist; think of the accumulated “Impact Factor” or the “H-Index”, for instance. But, at best, these citation metrics provide information about the frequency or impact of this person’s published papers and totally ignore his trustworthiness. To get a handle on this particular aspect of a scientist’s work, we might have to consider not the impact but the direction of his published conclusions.
If we calculated the percentage of a researcher’s papers arriving at positive conclusions and divided this by the percentage of his papers drawing negative conclusions, we might have a useful measure. A realistic example might be the case of a clinical researcher who has published a total of 100 original articles. If 50% had positive and 50% negative conclusions about the efficacy of the therapy tested, his TI would be 1.
Depending on what area of clinical medicine this person is working in, 1 might be a figure that is just about acceptable in terms of the trustworthiness of the author. If the TI goes beyond 1, we might get concerned; if it reaches 4 or more, we should get worried.
An example would be a researcher who has published 100 papers of which 80 are positive and 20 arrive at negative conclusions. His TI would consequently amount to 4. Most of us equipped with a healthy scepticism would consider this figure highly suspect.
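For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the calculation described above. The function name `trustworthiness_index` and its handling of the edge cases are my own assumptions for illustration, not an established method. Because both percentages share the same denominator (the total number of papers), the TI reduces to the number of positive papers divided by the number of negative ones.

```python
import math


def trustworthiness_index(positive: int, negative: int) -> float:
    """Return the TI: share of positive papers divided by share of negative papers.

    Hypothetical helper for illustration only. Both shares are taken over the
    same total, so the ratio simplifies to positive / negative.
    """
    if positive < 0 or negative < 0:
        raise ValueError("paper counts must not be negative")
    if positive + negative == 0:
        raise ValueError("at least one paper is needed to compute a TI")
    if negative == 0:
        # Assumption: a researcher with no negative conclusions at all is
        # treated as having an unbounded TI, the most suspicious case.
        return math.inf
    return positive / negative


# The two examples from the text: 100 papers each.
print(trustworthiness_index(50, 50))  # 50% positive, 50% negative
print(trustworthiness_index(80, 20))  # 80 positive, 20 negative
```

The first call gives a TI of 1 and the second a TI of 4, matching the two examples. How papers get classified as "positive" or "negative" is left entirely to the person doing the counting, which is of course where the real difficulty lies.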
Of course, this is all a bit simplistic, and, like all other citation metrics, my TI does not provide us with any level of proof; it is merely a vague indicator that something might be amiss. And, as stressed already, the cut-off point for any scientist’s TI very much depends on the area of clinical research we are dealing with. The lower the plausibility and the higher the uncertainty associated with the efficacy of the experimental treatments, the lower the point where the TI might suggest something to be fishy.
A good example of an area plagued with implausibility and uncertainty is, of course, alternative medicine. Here one would not expect a high percentage of rigorous tests to come out positive, and a TI of 0.5 might perhaps already be on the limit.
So how does the TI perform when we apply it to my colleagues, the full-time researchers in alternative medicine? I have not actually calculated the exact figures, but as an educated guess, I estimate that it would be very hard, even impossible, to find many with a TI under 4.
But surely this cannot be true! It would be way above the acceptable level which we just estimated to be around 0.5. This must mean that my [admittedly slightly tongue in cheek] idea of calculating the TI was daft. The concept of my TI clearly does not work.
The alternative explanation for the high TIs in alternative medicine might be that most full-time researchers in this field are not trustworthy. But this hypothesis must be rejected out of hand – or mustn’t it?
But the rule still applies: “As soon as you begin using anything as a metric, it becomes useless as a metric.”
Researchers would design studies that would deliberately generate negative results, in order to keep their TI in the appropriate range.
The only people for whom the metric would be useful are those who have some integrity and therefore don’t need the metric.
and of course your TI would find that nearly every scientist that regularly publishes would be somewhere above 4, because there is a massive bias against publishing negative results. negative results, as you correctly point out, are still positive in that they add to knowledge, but attempting to publish such data is sisyphean at best. a colleague of mine repeatedly expresses a desire for “the journal of negative results”; it would save us a lot of time in needless and unknown repetition.
ed yong has a great talk on publication bias (link), and ben goldacre has this ground well covered also.
here is a new, important and much more sophisticated paper on a similar subject:
http://rsos.royalsocietypublishing.org/content/5/1/171511