Yves Smith sends us to Howard Brody who describes a paper by his colleague, Donald Light, a medical sociologist who "delivered [a paper] recently at the American Sociological Association annual meeting in Atlanta, August 17 (session 487), entitled, "Pharmaceuticals: A Two-Tiered Market for Producing 'Lemons' and Serious Harm."
"[T]he pharmaceutical market for 'lemons' differs from other markets for lemons in that companies develop and produce the lemons. Evidence in this paper indicates that the production of lemon-drugs with hidden dangers is widespread and results from the systematic exploitation of monopoly rights and the production of partial, biased information about the efficacy and safety of new drugs. The institutional practices differ profoundly from car manufacturers working tirelessly to produce safe cars but inadvertently discovering a serious problem, like Toyota discovering its sticking accelerator problem in 2009. The massive reaction, Congressional investigations, and recalls [in the Toyota case] involved less than 1/10,000 as many deaths as are attributed to prescription drugs every year."
It's always thrilling to see a sociologist using economic principles and concepts. In this case, the underpinnings of the paper are George Akerlof's seminal 1970 paper, "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism" (Akerlof shared the 2001 Nobel Memorial Prize in Economic Sciences).
I only wish that Light had looked a bit farther afield than the recent Toyota accelerator problem before asserting that "The institutional practices differ profoundly from car manufacturers working tirelessly to produce safe cars but inadvertently discovering a serious problem." Specifically, I wish he had considered the 1970s Ford Pinto case. It is a classic example of the abuse of cost-benefit analysis unfettered by ethical considerations, since the "fix" for the problem of fiery death and injury from rear-end collisions was estimated at only $11 per car. Oddly enough, even in the absence of moral sentiments, it's hard to see how one could conclude that because the total cost of a fix exceeds the total imputed dollar value of deaths and injuries averted, one should disregard the rather low marginal cost ($11 per Pinto) to buyers, who presumably would have been willing to pay some amount for a decrement in fiery mortality/injury risk to themselves and their families. I have to think that (assuming only money matters here) the right calculation would involve buyers' willingness to pay for a decrease in risk, coupled with the likely contraction in demand from the $11 increment in price (along with the expansion in demand from the safety improvement). The correct (purely monetary) comparison would then be between the resulting change in profits and the reduced costs of litigating liability claims, yes?
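For concreteness, here is the aggregate comparison Ford is said to have made, using the widely cited figures from its internal cost-benefit memo. Treat the exact numbers as approximations; the point is the structure of the calculation, not the decimals.

```python
# Illustrative sketch of the Pinto aggregate cost-benefit comparison, using
# the widely cited figures from Ford's internal memo (numbers approximate).
unit_cost = 11.00             # estimated cost of the fix, dollars per vehicle
vehicles = 12_500_000         # cars and light trucks affected

deaths, injuries, burned_cars = 180, 180, 2_100
value_per_death = 200_000     # the memo's imputed societal cost per death
value_per_injury = 67_000
value_per_car = 700

total_fix_cost = unit_cost * vehicles
total_benefit = (deaths * value_per_death
                 + injuries * value_per_injury
                 + burned_cars * value_per_car)

print(f"cost of fix:     ${total_fix_cost:,.0f}")
print(f"imputed benefit: ${total_benefit:,.0f}")
```

Because the aggregate cost of the fix exceeds the aggregate imputed benefit, the memo's logic says don't fix it. That is exactly the step that ignores what individual buyers would willingly have paid for the $11 reduction in risk.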
Seems almost like a corporate death panel, doesn't it? But I digress.
I concede that car safety is probably different from drug safety, but I don't share Light's opinion that the "institutional practices" of the two industries differ that much. And that's the problem. Our legal, moral, and governmental institutions aren't really designed to deal with the causal and scientific uncertainty that characterizes the drug industry.
It's much easier over time to draw an evidence-based causal link from car crash/failure to fiery death-watery grave-family of five killed on turnpike as such incidents happen repeatedly. Car crashes have the added advantage that when due to faulty equipment they tend to be nondiscriminatory, i.e., they don't just kill the already sick and infirm. They kill kids, young adults, and the middle-aged. All are groups that we can be reasonably certain were healthy and would have lived in the absence of the car crash.
If a car bursts into flame with higher than average frequency when compared to similar cars in similar crashes, it suggests a causal link between a specific car and the fiery outcome. If that car also happens to place its gas tank differently (closer to the rear of the car) than other similar cars and the gas tank ruptures with higher frequency than other similar cars in similar accidents, then it should be possible within less than seven years to conclude with some confidence that the car is the problem and it should be fixed. One can haggle over who pays, but it should be fixed.
Causal links between drug/device use and death or injury are harder to establish, partly because people who take prescription drugs or use medical devices are...well...sick and (often) old. Many of them have several comorbidities that place them at higher risk for negative health sequelae and make it difficult to know for sure what exactly caused them to have, for example, a heart attack.
A randomized controlled trial (RCT) can be helpful: if the randomization is successful, and if selection into the trial and attrition from it are random in both the experimental and control groups, we can compare death rates or heart attack rates between otherwise similar people. The problem is that when randomized controlled trials are designed to test drug efficacy, they frequently exclude those who are very ill and those with multiple comorbidities. However, when a drug or device is marketed, these same restrictions often do not apply.
The point here is that absence of evidence of elevated risk in an RCT does not guarantee that the risk is not there, because of the selection of relatively healthy people to participate in such trials. (Note: this is not an essay against RCTs; it is an essay about how information from even a well-designed RCT needs to be subjected to critical review and interpretation.) A second problem with the types of RCTs conducted for drug approval is that they are designed to be just large enough (i.e., to have enough people in the treatment group and the control group) to detect clinically meaningful differences in treatment response between the two groups. They are not designed, and therefore are often not large enough, to identify low-frequency, high-severity adverse side effects. When combined with conventionally abused methods of statistical inference, this last can be (literally) lethal.
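To make the power problem concrete, here is a small Monte Carlo sketch with every number hypothetical: two arms of 2,750 people, a true adverse-event rate of 1 per 1,000 that the drug doubles, and a Fisher's exact test at the conventional .05 level.

```python
# Hypothetical illustration: an efficacy-sized trial usually fails to flag
# a genuine doubling of a rare (1-in-1,000) adverse-event risk.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)
n_per_arm = 2_750
p_control, p_treated = 0.001, 0.002   # assumed rates; treated risk doubled
sims = 2_000

rejections = 0
for _ in range(sims):
    a = rng.binomial(n_per_arm, p_treated)    # adverse events, treated arm
    b = rng.binomial(n_per_arm, p_control)    # adverse events, control arm
    table = [[a, n_per_arm - a], [b, n_per_arm - b]]
    if fisher_exact(table)[1] < 0.05:         # two-sided p-value
        rejections += 1

power = rejections / sims
print(f"chance the trial detects the doubled risk: {power:.0%}")
```

Under these assumptions the doubling is real, yet the trial flags it only a small minority of the time; the absence of a "significant" safety signal is mostly a statement about sample size.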
Let's say that you design an RCT with sufficient statistical power to detect a clinically meaningful difference in some outcome like symptomatic pain relief, and let's say that requires about 5,500 people in your study. Half are treated with the new drug, half with an existing drug. Now suppose that while you're conducting this study, you notice that 5 of the treated group develop a myocardial infarction (heart attack) while only 1 of the comparison group does. Should we be concerned that the drug in question poses elevated risk for heart attacks?
Let's take an actual case: Vioxx, a Cox-2 inhibitor manufactured by Merck that offered the potential for pain relief without the negative gastro-intestinal side effects associated with non-steroidal anti-inflammatory drugs (NSAIDs). (For most of the following discussion I draw on Ziliak and McCloskey's far superior treatment of this topic.) In the RCT of Vioxx against Naproxen Sodium (Aleve) reported in the Annals of Internal Medicine, roughly 5,500 individuals with osteoarthritis were enrolled, roughly half in the treated-with-Vioxx group and half in the treated-with-Naproxen group. Five of the Vioxx group were reported to have experienced a heart attack, but only 1 in the Naproxen group.
The p-value associated with the 5 to 1 difference was .20. The authors of the report followed medical journals' (and, unfortunately, many researchers') convention in concluding that the absence of "statistical significance" (because the p-value exceeded the conventional, but arbitrary, threshold of .05) indicated that the difference was a "null" finding. By "null finding" I mean that they concluded that there was no substantive difference in heart attack rates between the two groups, despite the whopping 5 to 1 difference.
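The reported counts are enough to reproduce this arithmetic with a quick Fisher's exact test. The arm sizes here are approximate, and the published analysis may have used a different procedure, so take this as a rough reconstruction.

```python
# Rough reconstruction: 5 heart attacks among ~2,750 Vioxx patients vs.
# 1 among ~2,750 Naproxen patients.
from scipy.stats import fisher_exact

n_per_arm = 2_750
table = [[5, n_per_arm - 5],    # Vioxx arm: events, non-events
         [1, n_per_arm - 1]]    # Naproxen arm: events, non-events
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio ~ {odds_ratio:.1f}, two-sided p ~ {p_value:.2f}")
```

This gives a two-sided p of roughly .2, in the neighborhood of the reported value: a five-fold difference in a lethal outcome, waved through because six events cannot clear an arbitrary threshold.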
One problem with using p-values as the sole determinant of the importance of a research finding is that a p-value can be made smaller (thereby achieving statistical significance) simply by increasing the sample size. So if the trial had included 550,000 people and we had observed 500 heart attacks in the Vioxx group and 100 in the Naproxen group, the same 5 to 1 elevation in risk would have achieved statistical significance. Conversely, the p-value can be made larger by decreasing the sample size. If one makes a p-value smaller by adding to the sample size (with no change in the relative difference between the two groups), one can then claim "statistical significance." If one makes a p-value larger by reducing one's sample size (without altering the relative difference in the two groups' outcomes), I'm sorry to say that one can almost always get away with claiming that there is no difference between the two groups, despite the fact that one may have insufficient statistical power to make any such claim (failing to detect a real difference this way is also known as a Type 2 error).
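The same point in code, with the larger trial a pure hypothetical: an identical 5-to-1 ratio of heart attacks at two different sample sizes produces wildly different p-values.

```python
# Same relative difference (5:1), two sample sizes: the chi-square p-value
# collapses as n grows, though the clinical story is unchanged.
from scipy.stats import chi2_contingency

results = {}
for n_per_arm, treated_mi, control_mi in [(2_750, 5, 1), (275_000, 500, 100)]:
    table = [[treated_mi, n_per_arm - treated_mi],
             [control_mi, n_per_arm - control_mi]]
    chi2, p, _, _ = chi2_contingency(table)
    results[n_per_arm] = p
    print(f"n = {n_per_arm:>7,} per arm: p = {p:.2g}")
```

The small trial yields a p around .2; the 100-times-larger hypothetical trial drives p below 10^-50. Nothing about the drug changed; only the sample size did.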
As Ziliak and McCloskey repeatedly emphasize: size matters. The size of the effect coupled with the potential severity, not some arbitrary and sizeless threshold of statistical significance, should determine how we decide if a difference matters or not.
In the case of the Vioxx study, it turned out that the difference did matter. In fact, it later came to light that the difference in heart attack risk between Vioxx takers and Naproxen takers was 8 to 1, not 5 to 1. According to the New England Journal of Medicine, three observations were suppressed by Merck. Had the 8 to 1 difference been reported, it would have tipped the statistical-significance-mindless-use-of-p-values scale to p<.05 and the authors would no longer have been able to pretend that the heart attack difference was a null finding.
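Rerunning the same kind of calculation with the later-revealed count of 8 heart attacks against 1 (again assuming arms of roughly 2,750) shows how the suppressed data would have flipped the mechanical verdict.

```python
# The revised count: 8 heart attacks vs. 1, arm sizes approximate.
from scipy.stats import fisher_exact

n_per_arm = 2_750
table_revised = [[8, n_per_arm - 8],
                 [1, n_per_arm - 1]]
_, p_revised = fisher_exact(table_revised)
print(f"8 vs 1: two-sided p ~ {p_revised:.3f}")
```

The p-value comes out just under .05, so even by the mindless-use-of-p-values standard the heart attack difference could no longer have been called a null finding.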
Scientific uncertainty is why we use statistical inference to help us interpret the results of a scientific study. In the above example, two things stand out. First, the journal's editors and reviewers were content to accept the authors' conclusion that a 5 to 1 difference did not rise to a level that would require further investigation, at least not before the article was published. Second, the suppression of data that would have changed that conclusion went undetected until well after publication. The Vioxx case is an example of how scientific uncertainty combined with a rather laissez-faire approach to statistical inference can lead to what I think (based on Brody's blog) Light calls a "lemon" drug.
But lemon drugs are different from lemon cars in other ways besides causal and scientific uncertainty. When a car is a lemon, nobody benefits from driving it. No matter who owns it, it breaks down, it explodes on impact, it costs the owner a fortune in repair bills and, sometimes, it costs the owner her job (if she's late for work repeatedly due to car failure).
When a drug is a lemon (at least in cases like Vioxx), some people actually benefit from taking the drug. Others pay the ultimate price for pain relief. Many drugs have this characteristic, benefiting some, harming others. Sometimes docs can guess pretty accurately who will benefit and who will be harmed. Unfortunately, drug manufacturers spend a lot of money telling docs about the drugs they manufacture. As with Vioxx, they have strong incentives to shape that information in ways that benefit the manufacturers and misrepresent the safety of the drugs. Instead of leading private interests to a publicly beneficial result ("as if led by an invisible hand"), the self-interest of drug manufacturers can lead to less benefit. But how are we to evaluate this?
Some 80 million people took Vioxx, generating $2.5 Bn in annual sales. Eric Topol, writing in the New England Journal of Medicine, puts the possible number of Vioxx deaths at 160,000. Of course, it could be much higher. Prescriptions were written for many people who were most likely older and sicker than those who were screened for participation in the RCTs, which means that the elevation in heart attack risk could have been higher than 8 to 1 in "the real world." Notice how high-severity, small-probability events can become big problems with large losses when 10,000,000 prescriptions a month are being written.
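The exposure arithmetic is worth making explicit. The excess-risk figure below is hypothetical, chosen only to show how a small per-patient risk scales across a user base of this size.

```python
# Hypothetical back-of-the-envelope: small individual risk, huge exposure.
users = 80_000_000        # people who took the drug (figure from the text)
excess_risk = 1 / 1_000   # assumed excess heart-attack risk per user

excess_events = users * excess_risk
print(f"expected excess events: {excess_events:,.0f}")
```

Even an excess risk of one-tenth of one percent per user, essentially invisible in a 5,500-person trial, translates into tens of thousands of events at this scale.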
Institutions matter. The lawsuits cost Merck nearly $5 Bn. Regulation clearly has a role in policing a market where lemons can harm people, especially markets where both the lemons and the causal pathways are complex. But there are several other messages that I believe the Vioxx episode sends us. One is that science isn't science when arbitrary thresholds derived from a dominant, but possibly inferior, framework for statistical inference are mindlessly accepted as indicators of the presence or absence of clinically or socially important effects. This is little different from Ford using cost-benefit analysis without ethical consideration of the "recommendations" of the analysis (nor, as far as I can tell, a real understanding of the welfare economics that underlies it).
Another message from the Vioxx scandal is that we as scientists, physicians, and ethicists must figure out how to quantify and communicate competing risks so that consumers (and their physicians) can make informed decisions about drugs that both help and harm. For each person who suffered a heart attack from Vioxx, there were many more who benefited from the pain relief. How are policy makers to evaluate this? A purely utilitarian perspective would trade off quality-adjusted life years gained by the many against the lives lost by the few. This seems unsatisfactory, and not unlike making decisions based solely on net benefits calculated from a cost-benefit analysis or on a p-value from an RCT. What should a new framework look like?
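To see why the purely utilitarian tally feels unsatisfactory, here is a deliberately crude version of it, with every number hypothetical.

```python
# Hypothetical utilitarian ledger: QALYs gained by the many vs. life-years
# lost by the few. All inputs are made up for illustration.
beneficiaries = 1_000_000
qaly_gain_each = 0.05          # assumed quality-of-life gain per beneficiary

harmed = 2_000
life_years_lost_each = 10      # assumed loss per person harmed

qalys_gained = beneficiaries * qaly_gain_each
life_years_lost = harmed * life_years_lost_each
print(f"QALYs gained: {qalys_gained:,.0f}; "
      f"life-years lost: {life_years_lost:,.0f}")
```

On these made-up numbers the ledger comes out "positive" (50,000 gained against 20,000 lost), yet the calculation treats a statistical death as just another line item, which is precisely the discomfort a new framework would have to address.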
A final message is that we consumers don't get younger or healthier over time. We're the ones most likely to be affected by side effects not captured in the RCTs that lead to drug approval. We can increase the size of drug trials, but it's not clear that the added expense is a wise use of societal resources. A pharmacologist friend of mine once advised me never to take any new drug for which there was a close substitute that had been on the market for at least 5 years. Maybe if we all adopted that policy, the incentives to create new copycat drugs would be substantially reduced and the risks with them.
Pharma is like finance in that the opportunities for "innovation" that yield positive (but sometimes quite small) marginal benefit to most, great harm to some, and great profit to the innovators are rampant. Information asymmetry is compounded by causal and scientific uncertainty. Causal links are difficult to establish. Not all risks can be identified and managed in timely ways. There are unknown unknowns even after the results of RCTs are in and drugs have been approved. The risks for us as consumers are to our health and well-being, both from the negative sequelae of unforeseen side effects and through the distortions in manpower and capital that are drawn into a sector whose output may not yield benefits commensurate with its costs.