2008 - Bad Science
Randomisation
Let’s take this out of the theoretical, and look at some of the trials which homeopaths quote to support their practice. I’ve got a bog-standard review of trials for homeopathic arnica by Professor Edzard Ernst in front of me, which we can go through for examples. We should be absolutely clear that the inadequacies here are not unique; I do not imply malice, and I am not being mean. What we are doing is simply what medics and academics do when they appraise evidence.
So, Hildebrandt et al. (as they say in academia) looked at forty-two women taking homeopathic arnica for delayed-onset muscle soreness, and found it performed better than placebo. At first glance this seems to be a pretty plausible study, but if you look closer, you can see there was no ‘randomisation’ described. Randomisation is another basic concept in clinical trials. We randomly assign patients to the placebo sugar pill group or the homeopathy sugar pill group, because otherwise there is a risk that the doctor or homeopath—consciously or unconsciously—will put patients who they think might do well into the homeopathy group, and the no-hopers into the placebo group, thus rigging the results.
Randomisation is not a new idea. It was first proposed in the seventeenth century by John Baptista van Helmont, a Belgian radical who challenged the academics of his day to test their treatments like blood-letting and purging (based on ‘theory’) against his own, which he said were based more on clinical experience: ‘Let us take out of the hospitals, out of the Camps, or from elsewhere, two hundred, or five hundred poor People, that have Fevers, Pleurisies, etc. Let us divide them into half, let us cast lots, that one half of them may fall to my share, and the other to yours…We shall see how many funerals both of us shall have.’
It’s rare to find an experimenter so careless that they’ve not randomised the patients at all, even in the world of CAM. But it’s surprisingly common to find trials where the method of randomisation is inadequate: they look plausible at first glance, but on closer examination we can see that the experimenters have simply gone through a kind of theatre, as if they were randomising the patients, but still leaving room for them to influence, consciously or unconsciously, which group each patient goes into.
In some inept trials, in all areas of medicine, patients are ‘randomised’ into the treatment or placebo group by the order in which they are recruited onto the study—the first patient in gets the real treatment, the second gets the placebo, the third the real treatment, the fourth the placebo, and so on. This sounds fair enough, but in fact it’s a glaring hole that opens your trial up to possible systematic bias.
Let’s imagine there is a patient who the homeopath believes to be a no-hoper, a heart-sink patient who’ll never really get better, no matter what treatment he or she gets, and the next place available on the study is for someone going into the ‘homeopathy’ arm of the trial. It’s not inconceivable that the homeopath might just decide—again, consciously or unconsciously—that this particular patient ‘probably wouldn’t really be interested’ in the trial. But if, on the other hand, this no-hoper patient had come into clinic at a time when the next place on the trial was for the placebo group, the recruiting clinician might feel a lot more optimistic about signing them up.
The same goes for all the other inadequate methods of randomisation: by last digit of date of birth, by date seen in clinic, and so on. There are even studies which claim to randomise patients by tossing a coin, but forgive me (and the entire evidence-based medicine community) for worrying that tossing a coin leaves itself just a little bit too open to manipulation. Best of three, and all that. Sorry, I meant best of five. Oh, I didn’t really see that one, it fell on the floor.
There are plenty of genuinely fair methods of randomisation, and although they require a bit of nous, they come at no extra financial cost. The classic is to make people call a special telephone number, where someone is sitting with a computerised randomisation programme (and that call is not made until the patient is fully signed up and committed to the study). This is probably the most popular method amongst meticulous researchers, who are keen to ensure they are doing a ‘fair test’, simply because you’d have to be an out-and-out charlatan to mess it up, and you’d have to work pretty hard at the charlatanry too. We’ll get back to laughing at quacks in a minute, but right now you are learning about one of the most important ideas of modern intellectual history.
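To make the idea concrete, here is a minimal sketch, in Python, of the kind of concealed, computerised allocation a central randomisation service might run. The use of permuted blocks (to keep the group sizes balanced) is my own illustrative choice, not something any particular trial described here necessarily used, and the function names are hypothetical.

```python
import random

def permuted_block_sequence(n_patients, block_size=4, arms=("homeopathy", "placebo")):
    """Pre-generate a concealed allocation sequence using randomly permuted
    blocks, so the groups stay roughly equal in size and nobody at the
    clinic can predict the next assignment."""
    assert block_size % len(arms) == 0
    sequence = []
    while len(sequence) < n_patients:
        block = list(arms) * (block_size // len(arms))
        random.shuffle(block)  # each block is a fresh random ordering
        sequence.extend(block)
    return sequence[:n_patients]

# Held only by the central service, never by the recruiting clinician.
_allocations = permuted_block_sequence(42)
_next = 0

def allocate_next_patient():
    """Reveal an allocation only once the patient is fully signed up --
    the caller cannot see or influence it beforehand."""
    global _next
    arm = _allocations[_next]
    _next += 1
    return arm
```

The point is not the particular algorithm: it is that the sequence is generated and held away from the person doing the recruiting, so there is nothing for them to game.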
Does randomisation matter? As with blinding, people have studied the effect of randomisation in huge reviews of large numbers of trials, and found that the ones with dodgy methods of randomisation overestimate treatment effects by 41 per cent. In reality, the biggest problem with poor-quality trials is not that they’ve used an inadequate method of randomisation, it’s that they don’t tell you how they randomised the patients at all. This is a classic warning sign, and often means the trial has been performed badly. Again, I do not speak from prejudice: trials with unclear methods of randomisation overstate treatment effects by 30 per cent, almost as much as the trials with openly rubbish methods of randomisation.
In fact, as a general rule it’s always worth worrying when people don’t give you sufficient details about their methods and results. As it happens (I promise I’ll stop this soon), there have been two landmark studies on whether inadequate information in academic articles is associated with dodgy, overly flattering results, and yes, studies which don’t report their methods fully do overstate the benefits of the treatments, by around 25 per cent. Transparency and detail are everything in science. Hildebrandt et al., through no fault of their own, happened to be the peg for this discussion on randomisation (and I am grateful to them for it): they might well have randomised their patients. They might well have done so adequately. But they did not report on it.
Let’s go back to the eight studies in Ernst’s review article on homeopathic arnica—which we chose pretty arbitrarily—because they demonstrate a phenomenon which we see over and over again with CAM studies: most of the trials were hopelessly methodologically flawed, and showed positive results for homeopathy; whereas the couple of decent studies—the most ‘fair tests’—showed homeopathy to perform no better than placebo.*
So, Pinsent performed a double-blind, placebo-controlled study of fifty-nine people having oral surgery: the group receiving homeopathic arnica experienced significantly less pain than the group getting placebo. What you don’t tend to read in the arnica publicity material is that forty-one subjects dropped out of this study. That makes it a fairly rubbish study. It’s been shown that patients who drop out of studies are less likely to have taken their tablets properly, more likely to have had side-effects, less likely to have got better, and so on. I am not sceptical about this study because it offends my prejudices, but because of the high drop-out rate. The missing patients might have been lost to follow-up because they are dead, for example. Ignoring drop-outs tends to exaggerate the benefits of the treatment being tested, and a high drop-out rate is always a warning sign.
The study by Gibson et al. did not mention randomisation, nor did it deign to mention the dose of the homeopathic remedy, or the frequency with which it was given. It’s not easy to take studies very seriously when they are this thin.
There was a study by Campbell which had thirteen subjects in it (which means a tiny handful of patients in both the homeopathy and the placebo groups): it found that homeopathy performed better than placebo (in this teeny-tiny sample of subjects), but didn’t check whether the results were statistically significant, or merely chance findings.
Lastly, Savage et al. did a study with a mere ten patients, finding that homeopathy was better than placebo; but they too did no statistical analysis of their results.
These are the kinds of papers that homeopaths claim as evidence to support their case, evidence which they claim is deceitfully ignored by the medical profession. All of these studies favoured homeopathy. All deserve to be ignored, for the simple reason that each was not a ‘fair test’ of homeopathy, on account of these methodological flaws.
I could go on, through a hundred homeopathy trials, but it’s painful enough already.
So now you can see, I would hope, that when doctors say a piece of research is ‘unreliable’, that’s not necessarily a stitch-up; when academics deliberately exclude a poorly performed study that flatters homeopathy, or any other kind of paper, from a systematic review of the literature, it’s not through a personal or moral bias: it’s for the simple reason that if a study is no good, if it is not a ‘fair test’ of the treatments, then it might give unreliable results, and so it should be regarded with great caution.
There is a moral and financial issue here too: randomising your patients properly doesn’t cost money. Blinding your patients to whether they had the active treatment or the placebo doesn’t cost money. Overall, doing research robustly and fairly does not necessarily require more money, it simply requires that you think before you start. The only people to blame for the flaws in these studies are the people who performed them. In some cases they will be people who turn their backs on the scientific method as a ‘flawed paradigm’; and yet it seems their great new paradigm is simply ‘unfair tests’.
These patterns are reflected throughout the alternative therapy literature. In general, the studies which are flawed tend to be the ones that favour homeopathy, or any other alternative therapy; and the well-performed studies, where every controllable source of bias and error is excluded, tend to show that the treatments are no better than placebo.
This phenomenon has been carefully studied, and there is an almost linear relationship between the methodological quality of a homeopathy trial and the result it gives. The worse the study—which is to say, the less it is a ‘fair test’—the more likely it is to find that homeopathy is better than placebo. Academics conventionally measure the quality of a study using standardised tools like the ‘Jadad score’, a seven-point tick list that includes things we’ve been talking about, like ‘Did they describe the method of randomisation?’ and ‘Was plenty of numerical information provided?’
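As a rough illustration of how such a tick list works, here is a simplified scoring function in Python. It is not the published Jadad instrument, just a sketch of the same idea: each methodological safeguard a paper actually reports earns a point. The field names are hypothetical.

```python
def quality_score(report):
    """Crude quality tick list in the spirit of the Jadad score: one point
    per safeguard the paper reports (illustrative only, not the real scale)."""
    checks = [
        report.get("randomisation_described", False),
        report.get("randomisation_method_adequate", False),
        report.get("blinding_described", False),
        report.get("blinding_method_adequate", False),
        report.get("dropouts_and_withdrawals_reported", False),
    ]
    return sum(checks)

# A trial that mentions randomisation but gives no detail of the method,
# no blinding information and no account of drop-outs scores just 1.
print(quality_score({"randomisation_described": True}))
```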
This graph, from Ernst’s paper, shows what happens when you plot Jadad score against result in homeopathy trials. Towards the top left, you can see rubbish trials with huge design flaws which triumphantly find that homeopathy is much, much better than placebo. Towards the bottom right, you can see that as the Jadad score tends towards the top mark of 5, as the trials become more of a ‘fair test’, the line tends towards showing that homeopathy performs no better than placebo.
There is, however, a mystery in this graph: an oddity, and the makings of a whodunnit. That little dot on the right-hand edge of the graph, representing the ten best-quality trials, with the highest Jadad scores, stands clearly outside the trend of all the others. This is an anomalous finding: suddenly, only at that end of the graph, there are some good-quality trials bucking the trend and showing that homeopathy is better than placebo.
What’s going on there? I can tell you what I think: some of the papers making up that spot are a stitch-up. I don’t know which ones, how it happened, or who did it, in which of the ten papers, but that’s what I think. Academics often have to couch strong criticism in diplomatic language. Here is Professor Ernst, the man who made that graph, discussing the eyebrow-raising outlier. You might decode his Yes, Minister diplomacy, and conclude that he thinks there’s been a stitch-up too.
There may be several hypotheses to explain this phenomenon. Scientists who insist that homeopathic remedies are in every way identical to placebos might favour the following. The correlation provided by the four data points (Jadad score 1-4) roughly reflects the truth. Extrapolation of this correlation would lead them to expect that those trials with the least room for bias (Jadad score = 5) show homeopathic remedies are pure placebos. The fact, however, that the average result of the 10 trials scoring 5 points on the Jadad score contradicts this notion, is consistent with the hypothesis that some (by no means all) methodologically astute and highly convinced homeopaths have published results that look convincing but are, in fact, not credible.
But this is a curiosity and an aside. In the bigger picture it doesn’t matter, because overall, even including these suspicious studies, the ‘meta-analyses’ still show that homeopathy is no better than placebo. Meta-analyses?
Meta-analysis
This will be our last big idea for a while, and this is one that has saved the lives of more people than you will ever meet. A meta-analysis is a very simple thing to do, in some respects: you just collect all the results from all the trials on a given subject, bung them into one big spreadsheet, and do the maths on that, instead of relying on your own gestalt intuition about all the results from each of your little trials. It’s particularly useful when there have been lots of trials, each too small to give a conclusive answer, but all looking at the same topic.
So if there are, say, ten randomised, placebo-controlled trials looking at whether asthma symptoms get better with homeopathy, each of which has a paltry forty patients, you could put them all into one meta-analysis and effectively (in some respects) have a four-hundred-person trial to work with.
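The arithmetic behind that pooling can be sketched very simply. The snippet below uses inverse-variance weighting, one standard fixed-effect approach, to combine per-trial effect estimates; the numbers are made up for illustration, and this is not a reconstruction of any specific meta-analysis discussed here.

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Combine per-trial effect estimates (e.g. difference from placebo)
    by weighting each trial by the inverse of its variance, so more
    precise trials count for more."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Ten small, individually inconclusive hypothetical trials:
effects = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.10, -0.02, 0.08]
std_errors = [0.30] * 10
estimate, ci = fixed_effect_pool(effects, std_errors)
print(estimate, ci)  # one pooled estimate, with a much narrower interval
```

The pooled standard error shrinks as trials are added, which is exactly why ten forty-patient trials, taken together, can answer a question that none of them could answer alone.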
In some very famous cases—at least, famous in the world of academic medicine—meta-analyses have shown that a treatment previously believed to be ineffective is in fact rather good, but because the trials that had been done were each too small, individually, to detect the real benefit, nobody had been able to spot it.
As I said, information alone can be life-saving, and one of the greatest institutional innovations of the past thirty years is undoubtedly the Cochrane Collaboration, an international not-for-profit organisation of academics, which produces systematic summaries of the research literature on healthcare, including meta-analyses.
The logo of the Cochrane Collaboration features a simplified ‘blobbogram’, a graph of the results from a landmark meta-analysis which looked at an intervention given to pregnant mothers. When people give birth prematurely, as you might expect, the babies are more likely to suffer and die. Some doctors in New Zealand had the idea that giving a short, cheap course of a steroid might help improve outcomes, and seven trials testing this idea were done between 1972 and 1981. Two of them showed some benefit from the steroids, but the remaining five failed to detect any benefit, and because of this, the idea didn’t catch on.
Eight years later, in 1989, a meta-analysis was done by pooling all this trial data. If you look at the blobbogram in the logo on the previous page, you can see what happened. Each horizontal line represents a single study: if the line is over to the left, it means the steroids were better than placebo, and if it is over to the right, it means the steroids were worse. If the horizontal line for a trial touches the big vertical ‘nil effect’ line going down the middle, then the trial showed no clear difference either way. One last thing: the longer a horizontal line is, the less certain the outcome of the study was.
Looking at the blobbogram, we can see that there are lots of not-very-certain studies, long horizontal lines, mostly touching the central vertical line of ‘no effect’; but they’re all a bit over to the left, so they all seem to suggest that steroids might be beneficial, even if each study itself is not statistically significant.
The diamond at the bottom shows the pooled answer: that there is, in fact, very strong evidence indeed for steroids reducing the risk—by 30 to 50 per cent—of babies dying from the complications of immaturity. We should always remember the human cost of these abstract numbers: babies died unnecessarily because they were deprived of this life-saving treatment for a decade. They died, even when there was enough information available to know what would save them, because that information had not been synthesised together, and analysed systematically, in a meta-analysis.
Back to homeopathy (you can see why I find it trivial now). A landmark meta-analysis was published recently in the Lancet. It was accompanied by an editorial titled: ‘The End of Homeopathy?’ Shang et al. did a very thorough meta-analysis of a vast number of homeopathy trials, and they found, overall, adding them all up, that homeopathy performs no better than placebo.
The homeopaths were up in arms. If you mention this meta-analysis, they will try to tell you that it was a stitch-up. What Shang et al. did, essentially, like all the previous negative meta-analyses of homeopathy, was to exclude the poorer-quality trials from their analysis.
Homeopaths like to pick out the trials that give them the answer that they want to hear, and ignore the rest, a practice called ‘cherry-picking’. But you can also cherry-pick your favourite meta-analyses, or misrepresent them. Shang et al. was only the latest in a long string of meta-analyses to show that homeopathy performs no better than placebo. What is truly amazing to me is that despite the negative results of these meta-analyses, homeopaths have continued—right to the top of the profession—to claim that these same meta-analyses support the use of homeopathy. They do this by quoting only the result for all trials included in each meta-analysis. This figure includes all of the poorer-quality trials. The most reliable figure, you now know, is for the restricted pool of the most ‘fair tests’, and when you look at those, homeopathy performs no better than placebo. If this fascinates you (and I would be very surprised), then I am currently producing a summary with some colleagues, and you will soon be able to find it online at badscience.net, in all its glorious detail, explaining the results of the various meta-analyses performed on homeopathy.
Clinicians, pundits and researchers all like to say things like ‘There is a need for more research,’ because it sounds forward-thinking and open-minded. In fact that’s not always the case, and it’s a little-known fact that this very phrase has been effectively banned from the British Medical Journal for many years, on the grounds that it adds nothing: you may say what research is missing, on whom, how, measuring what, and why you want to do it, but the hand-waving, superficially open-minded call for ‘more research’ is meaningless and unhelpful.