Randomized controlled trials are not all that matters

by The Incidental Economist on February 5, 2014

Look, I’m a big fan of randomized controlled trials. I even made a video talking about how critical they are. But they have their limits. Many of these limits were ignored in a recent article in the New York Times by Gina Kolata:

The idea seemed transformative. The Affordable Care Act would fund a new research outfit evocatively named the Innovation Center to discover how to most effectively deliver health care, with $10 billion to spend over a decade.

But now that the center has gotten started, many researchers and economists are disturbed that it is not using randomized clinical trials, the rigorous method that is widely considered the gold standard in medical and social science research. Such trials have long been required to prove the efficacy of medicines, and similarly designed studies have guided efforts to reform welfare-to-work, education and criminal justice programs.

But they have rarely been used to guide health care policy — and experts say the center is now squandering a crucial opportunity to develop the evidence needed to retool the nation’s troubled health care system in a period of rapid and fundamental change.

When you want to find out if one thing causes another, there is simply no better tool than a randomized controlled trial. If you want to determine the efficacy of a treatment in a specified population, there is no better design than a randomized controlled trial. But those studies have limits.

They almost always have strict inclusion and exclusion criteria to make sure that there is the best chance of seeing a significant result. Patients who are the least likely to comply are often prohibited from taking part. They often involve incentives and environments that help the study, but bear no resemblance to the real world.

Randomized controlled trials are great at determining efficacy. In other words, they are fantastic at seeing whether a certain therapy has the potential to produce a desired effect.

What they aren’t so good at is determining effectiveness. In other words, they aren’t nearly as good at telling us how these therapies work in the real world.

This is because in the real world, we rarely exclude patients. We want to treat as many as possible. We can’t deny therapy to those least likely to comply; they are, in fact, those who likely need the most help. And we can’t create perfect environments. We often have to care for people in under-resourced settings.

Even the example Kolata uses to shore up support for the RCT makes my point. She discussed the RAND Health Insurance Experiment:

In health care, a seminal, large randomized study by the RAND Corporation in 1982 found that people used health care less, but that their health was not affected, when they had to pay a small amount — as compared with nothing — for doctor visits.

It’s true that this was the major finding of the RAND HIE. But what it misses is that, for the most part, the RAND HIE studied pretty healthy people who had jobs. That’s not necessarily who needs health care the most. In the real world, people are often poor and have chronic illnesses. And what the RAND HIE also found is that poorer people with high blood pressure who had to pay more had significantly higher mortality rates. (I made a video about this, too.)

In other words, the efficacy of cost-sharing was determined by the RAND HIE. But the effectiveness of it in the real world was not necessarily the same thing.

Moreover, even randomized controlled trials are sometimes wrong.

The Innovation Center is focused on effectiveness. They want to know how we can change the delivery of health care in order to make it work better in the real world. They are focusing much of their efforts on designs that are not randomized controlled trials.

Some, including the NYT piece, argue that this is “one-sided”. This is true only if you ignore the fact that the vast majority of NIH money goes not to effectiveness research, but to efficacy work, like RCTs.

The Innovation Center is trying to correct the balance in some small way. We should give it a chance to succeed. Randomized controlled trials are awesome, but they’re not perfect, and we also need to know what works in actual practice.




Don Goldmann February 5, 2014 at 1:14 pm

I would love to see an additional statement about the value of real time evaluation and adaptation in demonstration projects, as articulated in papers by Shrank and Parry (http://www.academicpedsjnl.net/article/S1876-2859(13)00099-5/fulltext). This is at the core of CMMI’s approach. The implementation plan must be sensitive to the context in which the intervention is being applied, and real time feedback from the field is essential in making adjustments that increase the chance of both results and learning.

The key in both RCTs (such as some of those performed by Esther Duflo and her Lab and others) and demonstration projects is to have an adaptive design, NOT a fixed-protocol design that is rigid and impermeable to learning from the field. Such designs seldom show durable results.

erik February 6, 2014 at 2:00 pm

Interesting post. It seems the real threat to the validity of the RAND study was not necessarily the selection of healthy, working individuals, although that was certainly an issue, but rather attrition bias: many of the individuals who needed care simply dropped out of the experiment in order to have it covered by insurance.

Mike Stoto February 7, 2014 at 11:50 am

Here’s a letter I’ve submitted to the New York Times regarding Kolata’s article:

Dr. Berlin and the other experts quoted in Gina Kolata’s article (“Method of Study is Criticized in Group’s Health Policy Tests,” Feb. 3, 2014) are right to hold the randomized clinical trial, or “RCT” as it is commonly called, in high regard. Most of what we know about what works in treating individual patients comes from RCTs. But when the focus shifts to evaluating changes in the delivery system itself, as the Centers for Medicare and Medicaid Services Innovation Center is charged with doing, different methods are necessary and appropriate.
RCTs are essential in studying the effect of medications and health services provided to individuals because patient outcomes vary and are unpredictable. Only a fraction of patients respond to the most effective medications, and some people get better even without medication. In addition, outcomes typically vary by income, education, insurance status and other factors, which also influence who gets that medication. RCTs deal with both of these problems by creating a control group that tells us the “counterfactual” – what would have been the outcome in those who received the medication if they hadn’t received it. And because randomization is used to decide which patients are in the treatment and the control groups, as long as the sample is large enough researchers can be reasonably sure that the difference in outcomes is associated with the treatment and not any other factor.
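[Ed. note: the logic in the paragraph above — that randomization breaks the link between who gets treated and confounders like income — can be illustrated with a small simulation. The numbers and the toy outcome model below are entirely hypothetical, not HIE data.]

```python
import random

random.seed(0)
N = 100_000
TRUE_EFFECT = 5.0  # hypothetical benefit of the treatment

def outcome(income, treated):
    # Outcome improves with income AND with treatment, plus noise.
    return income * 2.0 + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)

# Observational data: higher-income patients are more likely to be treated,
# so income confounds the comparison of treated vs. untreated.
obs = []
for _ in range(N):
    income = random.random()            # 0 = poorest, 1 = richest
    treated = random.random() < income  # access tracks income
    obs.append((treated, outcome(income, treated)))

# Randomized trial: a coin flip decides treatment, independent of income.
rct = []
for _ in range(N):
    income = random.random()
    treated = random.random() < 0.5
    rct.append((treated, outcome(income, treated)))

def effect_estimate(data):
    # Naive difference in mean outcomes between treated and untreated.
    t = [y for was_treated, y in data if was_treated]
    c = [y for was_treated, y in data if not was_treated]
    return sum(t) / len(t) - sum(c) / len(c)

print(f"observational estimate: {effect_estimate(obs):.2f}")  # biased upward by income
print(f"randomized estimate:    {effect_estimate(rct):.2f}")  # close to the true 5.0
```

The observational estimate overstates the effect because the treated group is richer on average; the randomized estimate recovers the true effect because the coin flip is independent of income.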
But sometimes the counterfactual is clear, making randomization unethical. This was brought home with the humorous paper published in The Lancet about an RCT of parachute jumping. Everyone knows what happens to people who jump from planes without a parachute.
For system-level changes of the sort that the Innovation Center is studying, the counterfactual is sometimes known. In 2003, for instance, Dr. Peter Pronovost partnered with the Michigan Health and Hospital Association to introduce a checklist of best practices known to reduce the risk of central line-associated blood infections into more than one hundred Michigan ICUs. Three months after the intervention was implemented, the median rate of catheter-related infections per 1,000 catheter days decreased from 2.7 infections at baseline to zero, and a median rate of zero infections was sustained for the remaining 15 months of follow-up.
Although it’s always possible that infections dropped dramatically by chance just as the program was implemented, lack of such dramatic progress in other states is a more reasonable counterfactual against which to compare the Michigan results. You don’t need an RCT to tell you this, and indeed withholding the infection control practices, or even the checklist, from the control sites would have been unethical.

Michael A. Stoto, PhD
Professor of Health Systems Administration and Population Health
Georgetown University
e-mail: [email protected]

Emmett Keeler, Ph.D. February 12, 2014 at 4:19 pm

While I agree with Aaron’s comments in general, he repeated some common misconceptions about the design and results of the RAND Health Insurance Experiment (HIE), which I would like to correct here.

We were initially funded by OEO, the Office of Economic Opportunity, and oversampled poor people so that we would have adequate sample to look at whether they were adversely affected by cost-sharing. By design, we excluded homeless people and people in jails, people 62+ (who were or would be covered by Medicare) and rich people.

Income effects on total spending were small. Poor people got less outpatient care and more inpatient care, and a graph of spending against income was a very shallow U shape.

Most likely, poor people spent as much as the others because they faced less cost-sharing in absolute terms: our cost-sharing limits were income related.

Given that equal spending, it is unlikely that modest cost-sharing adversely affected the health of poor people, and mainly it did not. When I go to Kaiser here, many people who look poor to me seem happy to pay the $5 visit charge.

The HIE could not study mortality as there were very few deaths in the study (23 as I recall). I used the Framingham risk index to aggregate several measures related to heart disease — maybe that is what Aaron is thinking about. Free care did better on this index for those at elevated risk, and in particular poor people at elevated risk. This was mainly due to hypertension, and a large part of that difference was due to a few people with high blood pressure on the cost sharing plan who never visited a physician over the course of the study — an argument for having a regular place of care, where you occasionally go.

Of the 23 physiological measures we tested, free care did significantly (not corrected for multiple comparisons) better overall on three (vision, blood pressure, and oral health). No other measures were significantly related to cost-sharing. Cost-sharing did insignificantly better on 13/23.

The problem with restricting the analysis to high risk or poor people is you lose sample. Restricting to high risk people, 1/23 measures was significantly better with free care, and 1 was significantly worse.

Restricting to poor high risk people (between 5 and 10% of the sample depending on the measure), health may have been affected by cost sharing, but we were not powered to detect small effects: 9/13 measures favor free care, but the only significant result, acne, favored cost-sharing. Diastolic BP was 2 units worse with cost-sharing, larger than in the overall sample, but it was not significant.
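[Ed. note: Keeler’s caveat about uncorrected multiple comparisons is worth making concrete. Testing 23 independent measures at a 0.05 significance level, some “significant” results are expected even if cost-sharing has no true effect on anything. A quick back-of-the-envelope check, assuming independent tests:]

```python
# With 23 independent tests at alpha = 0.05 and no true effects anywhere,
# how likely is at least one "significant" result by chance alone?
alpha, k = 0.05, 23

p_at_least_one = 1 - (1 - alpha) ** k  # complement of "zero false positives"
expected_false_positives = alpha * k   # expected count under the null

print(f"P(at least one false positive): {p_at_least_one:.2f}")      # about 0.69
print(f"expected false positives:       {expected_false_positives:.2f}")  # about 1.15
```

So finding three significant measures out of 23 is more than chance would predict on average, but the uncorrected threshold makes any single result much weaker evidence than it first appears.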

It is logical that insurance should be most helpful to people who are poor and sick, but everyone in the HIE had some insurance, and health differences between different kinds of insurance seem to be small. This is reinforced by a great recent HSR study — the Baicker-Finkelstein evaluation of Oregon’s Medicaid expansion. With many more people eligible and applying than available slots, the state conducted a lottery. People who were randomized to Medicaid did better on many dimensions (less financial worry, had a regular source of care etc.) than the control group of unlucky applicants.
