Page 23 of Bad Science


  Now, let’s say your chances of having a heart attack in the next year are tiny (you can probably see where I’m going, but I’ll do it anyway). Let’s say that four people out of 1,000 like you will have a heart attack in the next year, but if they are all on statins, then only two of them will have such a horrible event. Expressed as relative risk reduction, that’s still a 50 per cent reduction. Expressed as absolute risk reduction, it’s a 0.2 per cent reduction, which sounds much more modest.
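
  The arithmetic behind those two figures can be written out as a minimal Python sketch. The function name and its arguments are illustrative choices, not part of any standard library; the numbers are the ones from the example above, and both groups are assumed to be the same size.

```python
def risk_reductions(events_untreated, events_treated, group_size):
    """Return (relative, absolute) risk reduction as fractions.

    Both groups are assumed to contain `group_size` people.
    """
    risk_untreated = events_untreated / group_size
    risk_treated = events_treated / group_size
    absolute = risk_untreated - risk_treated
    relative = absolute / risk_untreated
    return relative, absolute


# Four heart attacks per 1,000 people untreated, two per 1,000 on statins.
relative, absolute = risk_reductions(4, 2, 1000)
print(f"Relative risk reduction: {relative:.0%}")
print(f"Absolute risk reduction: {absolute:.1%}")
```

  The same two events prevented per 1,000 people produce both figures; only the denominator changes. The relative figure divides by the small baseline risk, while the absolute figure divides by everyone treated.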

  There are many people in medicine who are preoccupied with how best to communicate such risks and results, a number of them working in the incredibly exciting field known as ‘shared decision-making’.35 They have created all kinds of numerical tools to help clinicians and patients work out exactly what benefit they would get from each treatment option when presented with, say, different choices for chemotherapy after surgery for a breast tumour. The advantage of these tools is that they take doctors much closer to their future role: a kind of personal shopper for treatments, people who know how to find evidence, and can communicate risk clearly, but who can also understand, in discussion with patients, their interests and priorities, whether those are ‘more life at any cost’ or ‘no side effects’.

  Research has shown that if you present benefits as a relative risk reduction, people are more likely to choose an intervention. One study, for example, took 470 patients in a waiting room, gave them details of a hypothetical disease, then explained the benefits of two possible treatment options.36 In fact, both these treatments were the same, offering the same benefit, but with the risk expressed in two different ways. More than half of the patients chose the medication for which the benefit was expressed as a relative risk reduction, while only one in six chose the one whose benefit was expressed in absolute terms (most of the rest were indifferent).

  It would be wrong to imagine that patients are unique in being manipulated by the way figures on risk and benefit are presented. In fact, exactly the same result has been found repeatedly in experiments looking at doctors’ prescribing decisions,37 and even the purchasing decisions of health authorities,38 where you would expect to find numerate doctors and managers, capable of calculating risk and benefit.

  That is why it is concerning to see relative risk reduction used so frequently in reporting the modest benefits of new treatments, both in mainstream media and in professional literature. One good recent example comes, again, from the world of statins, in the coverage around the Jupiter trial.

  This study looked at the benefits of an existing drug, rosuvastatin, for people with low risk of heart attack. In the UK most newspapers called it a ‘wonder drug’ (the Daily Express, bless it, thought it was an entirely new treatment,39 when in reality it was a new use, in low-risk patients, of a treatment that had been used in moderate- and high-risk patients for many years). Every paper reported the benefit as a relative risk reduction: ‘Heart attacks were cut by 54 per cent, strokes by 48 per cent and the need for angioplasty or bypass by 46 per cent among the group on Crestor compared to those taking a placebo or dummy pill,’ said the Daily Mail. In the Guardian, ‘Researchers found that in the group taking the drug, heart attack risk was down by 54 per cent and stroke by 48 per cent.’40

  The numbers were entirely accurate, but as you now know, presenting them as relative risk reductions overstates the benefit. If you express the exact same results from the same trial as an absolute risk reduction, they look much less exciting. On placebo, your risk of a heart attack in the trial was 0.37 events per one hundred person years. If you were taking rosuvastatin, it fell to 0.17 events per one hundred person years. And you have to take a pill every day. And it might have side effects.

  Many researchers think the best way to express a risk is by using the ‘numbers needed to treat’. This is a very concrete method, where you calculate how many people would need to take a treatment in order for one person to benefit from it. The results of the Jupiter trial were not presented, in the paper reporting the final findings, as a ‘number needed to treat’, but in that low-risk population, working it out on the back of an envelope, I calculate that a few hundred people would need to take the pill to prevent one heart attack. If you want to take rosuvastatin every day, knowing that this is the likelihood of you receiving any benefit from the drug, then that’s entirely a matter for you. I don’t know what decision I would make, and everyone is different, as you can see from the fact that some people with low risk choose to take a statin, and some don’t. My concern is only whether those results are explained to them clearly, in the newspapers, in the press release, by their doctor, and in the original academic journal article.
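
  As a rough check on that back-of-envelope figure, the calculation can be sketched in Python from the two event rates quoted above. Because those rates are given per hundred person-years, the direct result is a number of person-years of treatment per heart attack prevented. Turning that into a number of people requires a treatment duration, and the two years used here is purely an illustrative assumption, not a figure taken from the trial report. The function names are hypothetical.

```python
def person_years_per_event_prevented(rate_control, rate_treated, per=100):
    """Person-years of treatment needed to prevent one event.

    Rates are events per `per` person-years.
    """
    absolute_rate_reduction = (rate_control - rate_treated) / per
    return 1 / absolute_rate_reduction


def number_needed_to_treat(person_years, years_on_treatment):
    """People who must take the treatment for the given number of years."""
    return person_years / years_on_treatment


# Heart attack rates quoted above, per one hundred person-years.
person_years = person_years_per_event_prevented(0.37, 0.17)

# ASSUMPTION: two years on treatment, chosen only for illustration.
nnt = number_needed_to_treat(person_years, years_on_treatment=2)

print(f"Person-years per heart attack prevented: {person_years:.0f}")
print(f"Number needed to treat over two years: {nnt:.0f}")
```

  With these inputs the sketch gives roughly 500 person-years per heart attack prevented, or about 250 people treated for two years, which is in line with ‘a few hundred’. A shorter assumed duration gives a larger number of people, and a longer one gives a smaller number.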

  Let’s consider one final example. If your trial results really were a disaster, you have one more option. You can simply present them as if they were positive, regardless of what you actually found.

  A group of researchers in Oxford and Paris set out to examine this problem systematically in 2009.41 They took every trial published over one month that had a negative result, in the correct sense of the word, meaning trials which had set out in their protocol to detect a benefit on a primary outcome, and then found no benefit. They then went through the academic journal reports of seventy-two of these trials, searching for evidence of ‘spin’: attempts to present the negative result in a positive light, or to distract the reader from the fact that the main result of the trial was negative.

  First they looked in the abstracts. These are the brief summaries of an academic paper, on the first page, and they are widely read, either because people are too busy to read the whole paper, or because they cannot get access to it without a paid subscription (a scandal in itself). Normally, as you scan hurriedly through an abstract, you’d expect to be told the ‘effect size’ – ‘0.85 times as many heart attacks in patients on our new super-duper heart drug’ – along with an indication of the statistical significance of this result. But in this representative sample of seventy-two trials, all with unambiguously negative results for their main outcome, only nine gave these figures properly in the abstract, and twenty-eight gave no numerical results for the main outcome of the trial at all. The negative results were simply buried.

  It gets worse: only sixteen of these negative trials reported the main negative outcome of the trial properly anywhere, even in the main body of the text.

  So what was in these trial reports? Spin. Sometimes the researchers found some other positive result in the spreadsheets, and pretended that this was what they had intended to count as a positive result all along (a trick we have already seen: ‘switching the primary outcome’). Sometimes they reported a dodgy subgroup analysis – again, a trick we’ve already seen. Sometimes they claimed to have found that their treatment was ‘non-inferior’ to the comparison treatment (when in reality a ‘non-inferiority’ trial requires a bigger sample of people, because you might have missed a true difference simply by chance). Sometimes they just brazenly rambled on about how great the treatment was, despite the evidence.

  This paper is not a lone finding. In 2009 another group looked at papers reporting trials on prostaglandin eyedrops as a treatment for glaucoma42 (as always, the specific condition and treatment are irrelevant; it’s the principle that is important). They found thirty-nine trials in total, with the overwhelming majority, twenty-nine of them, funded by industry. The conclusions were chilling: eighteen of the twenty-nine industry-funded trials presented a conclusion in the abstract that misrepresented the main outcome measure. All of the non-industry-funded studies were fine.

  All this is shameless, but it is possible because of structural flaws in the information architecture of academic medicine. If you don’t make people report the primary outcome in their paper, if you accept that they routinely switch outcomes, knowing full well that this distorts statistics, you are permitting results to be spun. If you don’t link protocols clearly to papers, allowing people to check one against the other for ‘bait and switch’ with the outcomes, you permit results to be spun. If editors and peer reviewers don’t demand that pre-trial protocols are submitted alongside papers, and checked, they are permitting outcome switching. If they don’t police the contents of abstracts, they are collaborators in this distortion of evidence, that distorts clinical practice, makes treatment decisions arbitrary rather than evidence-based, and so they play their part in harming patients.

  Perhaps the greatest problem is that many of those who read the medical literature implicitly assume that such precautions are taken by all journal editors. But they are wrong to assume this. There is no enforcement for any of what we have covered, everyone is free to ignore it, and so commonly – as with newspapers, politicians and quacks – uncomfortable facts are cheerfully spun away.

  Finally, perhaps most worryingly of all, similar levels of spin have been reported in systematic reviews and meta-analyses, which are correctly regarded as the most reliable form of evidence. One study compared industry-funded reviews with independently-funded reviews from the Cochrane Collaboration.43 In their written conclusions, the industry-funded reviews all recommended the treatment without reservation, while none of the Cochrane meta-analyses did. This disparity is striking, because there was no difference in their numerical conclusions on the treatment effect, only in the narrative spin of the discussion in the conclusions section of the review paper.

  The absence of scepticism in the industry-funded reviews was also borne out in the way they discussed methodological shortcomings of the studies they included: often, they simply didn’t. Cochrane reviews were much more likely to consider whether trials were at risk of bias; industry-funded studies brushed over these shortcomings. This is a striking reminder that the results of a scientific paper are much more important than the editorialising of the discussion section. It’s also a striking reminder that the biases associated with industry funding penetrate very deeply into the world of academia.

  5

  Bigger, Simpler Trials

  So, we have established that there are some very serious problems in medicine. We have badly designed trials, which suffer from all kinds of fatal flaws: they’re conducted in unrepresentative patients, they’re too brief, they measure the wrong outcomes, they go missing if the results are unflattering, they get analysed stupidly, and often they’re not done at all, simply because of expense, or lack of incentives. These problems are frighteningly common, both for the trials that are used to get a drug on the market, and for the trials that are done later, all of which guide doctors’ and patients’ treatment decisions. It feels as if some people, perhaps, view research as a game, where the idea is to get away with as much as you can, rather than to conduct fair tests of the treatments we use.

  However we view the motives, this unfortunate situation leaves us with a very real problem. For many of the most important diseases that patients present with, we have no idea which of the widely used treatments is best, and, as a consequence, people suffer and die unnecessarily. Patients, the public, and even many doctors live in blissful ignorance of this frightening reality, but in the medical literature, it has been pointed out again and again.

  Over a decade ago, a BMJ paper on the future of medicine described the staggering scale of our ignorance. We still don’t know, it explained, which of the many current treatments is best, for something as simple as treating patients who’ve just had a stroke. But the paper also made a disarmingly simple observation: strokes are so common, that if we took every patient in the world who had one, and entered them into a randomised trial comparing the best treatments, we would recruit enough patients in just twenty-four hours to answer this question. And it gets better: many outcomes from stroke – like death – become clear in a matter of months, sometimes weeks. If we started doing this trial today, and analysed the results as they came in, medical management of stroke could be transformed in less time than it takes to grow a sunflower.

  The manifesto implicit in this paper was very straightforward: wherever there is genuine uncertainty about which treatment is best, we should conduct a randomised trial; medicine should be in a constant cycle of revision, gathering follow-up data and improving our interventions, not as an exception, but wherever that is possible.

  There are technical and cultural barriers to doing this kind of thing, but they are surmountable, and we can walk through them by considering a project I’ve been involved in, setting up randomised trials embedded in routine practice, in everyday GP surgeries.1 These trials are designed to be so cheap and unobtrusive that they can be done whenever there is genuine uncertainty, and all the results are gathered automatically, at almost no cost, from patients’ computerised notes.

  To make the design of these trials more concrete, let’s look at the pilot study, which compares two statins against each other, to see which is best at preventing heart attack and death. This is exactly the kind of trial you might naïvely think has already been done; but as we saw in the previous chapter, the evidence on statins has been left incomplete, even though they are some of the most widely prescribed drugs in the world (which is why, of course, we keep coming back to them in this book). People have done trials comparing each statin against a placebo, a rubbish comparison treatment, and found that statins save lives. People have also done trials comparing one statin with another, which is a sensible comparison treatment; but these trials all use cholesterol as a surrogate outcome, which is hopelessly uninformative. We saw in the ALLHAT trial, for example, that two drugs can be very similar in how well they treat blood pressure, but very different in how well they prevent heart attacks: so different, in fact, that large numbers of patients died unnecessarily over many years before the ALLHAT trial was done, simply because they were being prescribed the less effective drug (which was, coincidentally, the newer and more expensive one).

  So we need to do real-world trials, to see which statin is best at saving lives; and I would also argue that we need to do these trials urgently. The most widely used statins in the UK are atorvastatin and simvastatin, because they are both off patent, and therefore cheap. If one of these turned out to be just 2 per cent better than the other at preventing heart attacks and death, this knowledge would save vast numbers of lives around the world, because heart attacks are so common, and because statins are so widely used. Failing to know the answer to this question could be costing us lives, every day that we continue to be ignorant. Tens of millions of people around the world are taking these drugs right now, today. They are all being exposed to unnecessary risk from drugs that haven’t been appropriately compared with each other, but they’re also all capable of producing data that could be used to gather new knowledge about which drug is best, if only they were systematically randomised, and their outcomes followed up.

  Our large, pragmatic trial is very simple. Everything in GPs’ offices today is already computerised, from the appointments to the notes to the prescriptions, as you will probably already know, from going to a doctor yourself. Whenever a GP sees a patient and decides to prescribe a statin, normally they click the ‘prescribe’ button, and are taken to a page where they choose a drug, and print out a prescription. For GPs in our trial, one extra page is added. ‘Wait,’ it says (I’m paraphrasing). ‘We don’t know which of these two statins is the best. Instead of choosing one, press this big red button to randomly assign your patient to one or the other, enter them into our trial, and you’ll never have to think about it ever again.’
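
  The logic behind that extra page can be illustrated with a short Python sketch. It is not the software used in the trial: the function, the record fields, the patient identifier and the choice of Python’s `secrets` module are all assumptions made for illustration, and a real system would also need consent handling, audit logging and integration with the prescribing software.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone

# The two drugs being compared in the pilot study described in the text.
ARMS = ("atorvastatin", "simvastatin")


@dataclass(frozen=True)
class Allocation:
    """A hypothetical record of one randomised prescription."""
    patient_id: str
    drug: str
    allocated_at: datetime


def randomise(patient_id: str) -> Allocation:
    """Assign a patient to one of the two arms with equal probability."""
    drug = secrets.choice(ARMS)  # unpredictable, so nobody can guess the arm
    return Allocation(patient_id, drug, datetime.now(timezone.utc))


allocation = randomise("patient-0001")  # made-up identifier
print(f"{allocation.patient_id} -> {allocation.drug}")
```

  This is simple randomisation, the most basic scheme: each patient has an even chance of either drug, independent of everyone else. Whether the real trial balances the arms in some more elaborate way is not stated in the text, so nothing of that kind is sketched here.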

  The last part of that last sentence is critical. At present, trials are a huge and expensive administrative performance. Many struggle to recruit enough patients, and many more struggle to recruit everyday doctors, as they don’t want to get involved in the mess of filling out patient report forms, calling patients back for extra appointments, doing extra measurements and so on. In our trial there is none of that. Patients are followed up, their cholesterol levels, their heart attacks, their weird idiosyncratic side effects, their strokes, their seizures, their deaths: all of this data is taken from their computerised health records, automatically, without anybody having to lift a finger.

  These simple trials have one disadvantage, which you may already have spotted, in that they aren’t ‘blinded’, so the patients know the name of the drug they’ve received. This is a problem in some studies: if you believe that you’ve been given a very effective medicine, or that you’ve been given a rubbish one, then the power of your beliefs and expectations can affect your health, through a phenomenon known as the placebo effect. If you’re comparing a painkiller against a dummy sugar pill, then a patient who knows they’ve been given a sugar pill for pain is likely to be annoyed and in more pain. But it’s harder to believe that patients have firm beliefs about the relative benefits of atorvastatin and simvastatin, and that these beliefs will then impact on cardiovascular mortality five years later. In all research, we make a trade-off between what is ideal and what is practical, giving careful consideration to the impact that any methodological shortcomings will have on a study’s results.

  So, alongside this shortcoming, it’s worth taking a moment to notice how many of the serious problems with trials can be addressed by our study design of simple trials in electronic health records. Setting aside the assumption that they will be analysed properly, without the dubious tricks mentioned in the previous chapter, there are other, more specific benefits. Firstly, as we know, trials are frequently conducted in unrepresentative ‘ideal patients’, and in odd settings. But the patients in our simple pragmatic trials are exactly like real-world patients, because they are real-world patients. They are all the people that GPs prescribe statins to. Secondly, because trials are expensive, stand-alone administrative entities, and because they struggle to recruit patients, they are often small. Our pragmatic trial, meanwhile, is vanishingly cheap to run, because almost all of the work is done using existing data – it cost £500,000 to set up this first trial, and that included building the platform that can be used to run any trial you like in the future. This is exceptionally cheap in the world of trials. Thirdly, trials are often brief, and fail to look at real-world outcomes: our simple trial runs forever, and we can collect follow-up data and monitor whether people have had a heart attack, or a stroke, or died, for decades to come, at almost no cost, by following their progress through the computerised health records that are being produced by their doctors anyway.