Efficacy or effectiveness?

Published on 09/12/08 at 03:43pm

The question of 'generalisability' - the extent to which clinical trial results can be applied to the real world - has been an issue for some time. It certainly seems to be a limitation of many trials carried out for marketing authorisation.

Professor Sir Michael Rawlins' Harveian oration on the value of randomised controlled trials, at the Royal College of Physicians in October, provided some thought-provoking words on the subject.

Prof Rawlins was widely misquoted by the media as saying randomised controlled trials (RCTs) are not effective. In fact his message was intended as a challenge to the received wisdom on RCTs - challenging established beliefs being what science is all about.

When I subsequently asked him for clarification he was very clear: "Anyone who says that I am opposed to RCTs hasn't read the monograph or listened to my oration," he said.

His speech drew attention to types of evidence other than RCTs. He emphasised that science is based on two types of evidence - experiment and observation - and that the regulatory framework for medicines largely ignores the latter.

The simplest and least reliable type of observational data is the case report - otherwise known as anecdotal data. This is often the favourite of those who support complementary and alternative medicine, because they usually have a great deal of it, compared with very little reliable RCT data.

There is a role for individual case reports as signposts to more structured research, and they are sometimes assembled into a case series, but on their own they are not convincing.

Then there are more structured forms of observational data, including cohort studies, case-control studies, and cross-sectional surveys. Trisha Greenhalgh's excellent book How to Read a Paper sets out a hierarchy of evidence, putting meta-analyses and systematic reviews (assemblages of RCTs) near the top, just behind the 'gold standard' of RCTs.

There is a lot of justification for this: the controlled experiment, of which the RCT is one form, is common to most if not all branches of science - in short, because it works.

But one of the problems RCTs face in medical research is that biological systems are highly variable, a feature which emerges from their huge complexity. So much so that we can't possibly know everything about them (at least not yet), so our trials have to be adjusted to try to minimise the confounding effects of unknown factors.

This is one limitation of RCTs, and there is another that I find curious in view of how they came to be the fulcrum on which regulators' decisions pivot. Most regulation of medicines arose because of safety problems, but the RCT as currently implemented is not very good at measuring safety, because it is usually not designed to do so.

So on what factors does the calculation of sample size depend? Not on safety data but on efficacy. We determine the size of the subject population in order to see whether our drug is more effective than placebo (for example) or an established comparator. We don't adjust the size in order to have confidence in detecting side effects.
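As an illustration of how sample size follows from efficacy alone, here is a minimal sketch using the usual normal-approximation formula for comparing two response rates. The response rates, significance level and power are assumptions chosen for the example, not figures from any trial, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def subjects_per_arm(p_control: float, p_treatment: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm needed to detect a difference in response rates.

    Assumes a two-sided test, equal allocation and a normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value giving the desired power
    # Sum of the binomial variances of the two arms
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control               # difference we want to detect
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Assumed response rates: 40% on comparator, 50% on the new drug
    print(subjects_per_arm(0.40, 0.50))
```

Nothing in the inputs refers to adverse events: with these assumed rates the answer is a few hundred subjects per arm, whatever the frequency of side effects may be.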

Even large-scale phase III trials therefore have little chance of detecting the less common side effects - and these can easily include serious ones. For this reason pharmacovigilance depends much more on observational data than does efficacy testing. Even first-into-man studies - which are not RCTs, and are usually designed with safety testing paramount - can only hope to detect common and prominent adverse events.
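The arithmetic behind this can be sketched in a few lines. Assuming each subject independently has the same probability p of experiencing a given side effect (a simplification), the chance of seeing at least one case among n exposed subjects is 1 - (1 - p)^n. The incidence and trial size below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def prob_at_least_one_case(n_subjects: int, incidence: float) -> float:
    """Probability of observing at least one case among n independent subjects."""
    return 1.0 - (1.0 - incidence) ** n_subjects


def subjects_needed(incidence: float, confidence: float = 0.95) -> int:
    """Smallest number of subjects giving the stated chance of seeing one case."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - incidence))


if __name__ == "__main__":
    incidence = 1 / 10_000   # assumed rate of a rare side effect
    exposed = 3_000          # assumed number of subjects receiving the drug
    print(f"Chance of seeing at least one case: {prob_at_least_one_case(exposed, incidence):.2f}")
    print(f"Subjects needed for a 95% chance: {subjects_needed(incidence)}")
```

Under these assumptions, a side effect occurring in 1 in 10,000 patients would most likely not appear at all among 3,000 exposed subjects, and roughly 30,000 would be needed for a 95% chance of seeing a single case.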

All this is not at all well understood by the lay public, which has been led to believe that the regulation of medicines ensures safety when treatments reach the market.

What are we measuring?

Prof Rawlins' argument was directed more at efficacy testing, something that is particularly relevant to the role of NICE. Efficacy is what we measure in RCTs - how well the drug works under tightly controlled conditions. Effectiveness, by contrast, is judged by how well the drug performs in the real world of clinical practice. It is related to the quality of life measures, which have now become essential in drug development.

This leads us to cost-effectiveness, that is, the relationship between impact on health and the cost of achieving that. Clearly the starting point is efficacy, which means that the RCT is the foundation stone. NICE can't assess cost-effectiveness without knowing what effectiveness is, and that can't be known without measuring efficacy.

"Experiment, observation and mathematics - individually and collectively - have a crucial role to play in providing the evidential basis for modern therapeutics," Prof Rawlins said in his speech. "Arguments about the relative importance of each are an unnecessary distraction. Hierarchies of evidence should be replaced by accepting - indeed embracing - a diversity of approaches."

There is already considerable discussion of the use of biomarkers to obtain early data on efficacy, instead of the traditional reliance on phase II proof of concept studies. Surrogate endpoints are as old as the hills (nobody asks for mortality data when evaluating an anti-hypertensive), and still have development mileage.

Would such tools, combined with large-scale controlled observational studies on a post-marketing basis, be the way forward? I'm not sure, but the fact remains that public confidence in the established drug development process is being eroded.

Serious, though admittedly rare, safety problems, together with the conflict between what NICE has to work with and what patients expect in terms of new treatments for serious diseases, are leading us to a situation where the status quo seems unsustainable. The time is right for a paradigm change, and maybe it's up to the pharma industry to drive it forward.

Les Rose is a freelance clinical scientist and writer.

