Limitations: The Achilles' Heel of Single-Study Relevance

by The Incidental Economist on May 7, 2013

The recent macroeconomic debate over the meaning of the work of Harvard economists Carmen Reinhart and Kenneth Rogoff is an opportunity for epistemic reflection, as is the flood of analysis (some of it mine) around the latest work on Oregon's Medicaid expansion by lottery. What can really be learned from a single study, and how do we collectively improve our communication around limitations for lay audiences?

Of course the answer depends on many things, and, in particular, the nature of the study’s methods. Yet, even when methods are about as good as can be, we probably should never trust a single study with high confidence. Most serious consumers of the products of medical or social science know that the chances of a single study being wrong or, to be more precise, not being fully right, are significant. Even a well done study has limitations, which is why one is on safer ground examining a body of work, composed of many studies using a diversity of methods and data sources.

The full set of limitations of each individual study is rarely clearly articulated beyond the original publication itself, if it is fully articulated even there. Yet the limitations are as important as the study itself. Even the perfect study — if ever there were such a thing — is only perfect in a narrow range of experience, and perhaps only laboratory-reproducible experience.

Take, for example, the randomized controlled trial (RCT). It's reasonably considered the gold standard of social science methods. When you read the results of a well-conducted RCT, does that mean you can take them and run with them? Not so fast. They may not apply outside the population studied. Sometimes that population is narrower than you presume it to be, or than even the authors tell you. Sometimes it is too small to draw useful conclusions.

Researchers should be forthcoming about this and other limitations, and many are. Yet one factor, the nonenrollment rate, or the proportion of individuals considered for but omitted from the trial, is not reported for a sizable proportion of RCTs.
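The rate itself is simple arithmetic once a trial reports how many individuals were considered; the problem is that those screening counts often go unreported. A minimal sketch, using made-up numbers chosen purely for illustration:

```python
# Hypothetical screening counts; not drawn from any actual trial.
screened = 1000  # individuals considered for the trial
enrolled = 599   # individuals actually enrolled

# Nonenrollment rate: proportion of those considered who were omitted.
nonenrollment_rate = (screened - enrolled) / screened
print(f"Nonenrollment rate: {nonenrollment_rate:.1%}")  # Nonenrollment rate: 40.1%
```

Without the `screened` count, the rate cannot be computed at all — which is exactly the gap in reporting described below.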

In a Research Letter in JAMA Internal Medicine, Keith Humphreys and colleagues report on their study of the 20 most influential RCTs pertaining to each of 14 prevalent chronic disorders, 280 studies in total.

Only 145 studies (51.8%) provided sufficient information to allow calculation of the nonenrollment rate. These RCTs had a mean (SD) nonenrollment rate of 40.1% (23.7%). For 6 of the 14 diseases, the influential trials included at least 1 study with a nonenrollment rate higher than 90%. [...]

Only 35.0% of studies (n=98) provided sufficient information to categorize reasons for nonenrollment. In these studies, an average of 27.3% of participants did not meet eligibility criteria, 11.2% refused participation, and 3.7% were not enrolled owing to other reasons.

They do report that the trend in reporting nonenrollment rates is improving. In 2010, for example, three-quarters of studies examined did report it.

There are often very good reasons not to enroll certain patients in studies. Children and elderly individuals, for example, are particularly vulnerable, and it may, for this reason, be considered unethical to offer certain experimental treatments. Nonenrollment isn't always the researcher's fault, either. Many patients refuse to participate in a trial, which is their right. Sometimes the sample you have is all you can reasonably obtain. Still, nonenrollment imposes limitations that need to be considered in evaluating the results.

Nonenrollment does threaten external validity. Matters are even worse when it or the reasons for it are not reported. It’s hard to fathom a good, patient-centered reason to withhold such information from study reports.

I have not touched on any of the other, well-known threats to study validity. Whether based on RCTs or other methods, studies are only as good as fallible humans can make them, and rarely even that good. Reporting of them in the media is often even worse, famously confusing correlation for causation, for example, or overstating the import of results that are not statistically significant. From data registries to publicly reporting study hypotheses in advance of analysis, we know of many ways to potentially improve the reliability and credibility of studies.

To all this, I want to add another point. The language of journal articles, and the press releases and author interviews that accompany them, is usually constructed to do more than convey science. It is also a means of self- and institutional promotion. In this less scientific context, careful delineation of limitations and issues of causality or statistical significance often takes a back seat. They're just not the most exciting way to promote academic work, and we all know it. This is part of the art of academic communication. And it has its dangers, as Reinhart and Rogoff learned the hard way.

When it comes to medical science, the connection between a misleading or misunderstood study and patient care may be more direct and, therefore, more harmful. I don’t have a ready solution to this challenge. At least we should discuss it more. Are we all as honest as we could be in disseminating our work or as vigilant as we might be in getting out in front of the media’s misunderstanding of it? What would it take to be more so? Would our careers suffer if we were?

Austin Frakt, Ph.D.

