Challenging the ‘White Hat Bias’

What’s At Stake With the Subpoena of EPA Data

Last month Republicans in the US House of Representatives launched a new offensive in the long-running battle over the Environmental Protection Agency’s regulation of air pollution under the Clean Air Act. For the first time in 21 years the House Committee on Science, Space, and Technology issued a subpoena requiring the EPA to hand over the data from two scientific studies, which provide the basis for most of the regulations.

According to the committee’s chairman, Rep. Lamar Smith (R–TX), the subpoena comes after nearly two years of requests for the EPA to make the data available. He argues that the scientific evidence used to justify new regulations that will cost billions of dollars should be accessible to independent researchers and other groups, rather than hinging on the analysis and interpretation of a small group of academic researchers. In an op-ed in the Wall Street Journal, he charged the EPA with basing its rule-making on “secret science.”

The committee, however, split along party lines on whether to subpoena the data. The ranking Democratic member, Eddie Bernice Johnson (D–TX), characterized Chairman Smith’s action as an attempt to make the data available to “industry hacks” in order to discredit the research and weaken clean air regulation.

While a subpoena backed by only one party can easily be viewed as a political move, there is a vital issue at stake, and it must not be hijacked by politics.

The two studies in question are the Harvard Six Cities Study (HSCS) and the American Cancer Society’s Cancer Prevention Study II (CPS II). Both studies were initiated in the 1980s and both linked individual information on cohort members to air pollution data obtained from monitoring stations. The HSCS enrolled 8,111 adults in six US cities and followed the cohort for 14 to 16 years. The CPS II used data on roughly 500,000 adults for whom air pollution data for metropolitan areas throughout the US was available and followed the cohort for 16 years.

Both studies showed that exposure to fine particle air pollution (that is, particles with a diameter of less than 2.5 microns, or PM2.5) was linked with increased mortality. Their results provide the basis for most EPA regulations targeting air quality and have been used to make claims of large numbers of lives saved due to regulation.

However, the association observed in these studies was a weak one and raises a number of questions that have not been adequately explored. First, the association of PM2.5 with mortality shows geographic heterogeneity – no such association is seen in the western US, where the climate is dry and PM2.5 make-up differs from that in the eastern US.

Second, the results of the studies have been presented in a way that focuses narrowly on PM2.5 and precludes putting the association in perspective relative to other predictors of mortality, including cigarette smoking, income, and other factors.

Third, reports from these two studies tend to cite only supporting studies and to ignore studies that have not found an association of PM2.5 with mortality.

These points have been presented in a thoughtful and temperate analysis by Stanley Young and Jesse Xia of the National Institute for Statistical Sciences.

It should also be mentioned that the prevailing view on the health effects of air pollution is set by a small group of researchers who both carried out the studies used by the EPA as the basis for regulation and are also involved in the implementation of EPA policy, giving the appearance of a closed loop.

Others have pointed out that using particle size as the criterion for health effects is a crude approach, reflecting our lack of understanding of the effects of specific pollutants.

Finally, rather than focusing narrowly on the effects of air pollution, one has to consider other influences on health. It may be that alternative actions such as reducing smoking rates, improving access to and affordability of fresh produce, reducing obesity, and boosting employment would do more to improve health than making further reductions in PM2.5.

Beyond these issues, there are other relevant considerations. The studies in question are observational studies, and such studies have well-known limitations. Everyone agrees that these studies alone cannot establish causality; however, once results are highlighted in influential papers in prestigious journals, this inconvenient caveat is forgotten, and researchers and regulators proceed to act as if the studies demonstrate causality.

As Young and Xia point out, “But if that analysis is not correct, and small-particle air pollution is not causing excess statistical deaths, then the faulty science is punishing society through increased costs and unnecessary regulation.”

In the abstract, all parties seem to be in agreement that the data used in rule-making should be publicly available. A statement of EPA Scientific Integrity Policy, as of February 2012, reads: “Scientific research and analysis comprise the foundation of all major EPA policy decisions. Therefore, the Agency should maintain vigilance toward ensuring that scientific research and results are presented openly and with integrity, accuracy, timeliness, and the full public scrutiny demanded when developing sound, high-quality environmental science.”

The main sticking point in the current standoff between the committee and the EPA appears to involve the protection of subject confidentiality. Opponents of the subpoena argue that if third parties are given access to the data, the identity of study participants could become public, in violation of the researchers’ guarantee of confidentiality. The lead researcher on the CPS II study has made this argument. Supporters of the subpoena argue that the dataset could be stripped of personal identifiers.

In fact, the issue of confidentiality appears to be a dodge since large datasets of this type are routinely stripped of personal identifiers to protect subject confidentiality and enable use by researchers. What is much more likely to be the real reason for withholding the data is that when researchers have spent years obtaining grant support, designing a study, collecting and analyzing the data, and publishing “definitive” results, they are understandably reluctant to have other researchers poring over the data and conducting alternative analyses which may yield different results and a different interpretation.

Typically, in studies of this sort, one conducts many analyses examining the impact of different variables and different analytic strategies. In the end, one reports only what one considers to be the most informative of these analyses. However, researchers are human and their selection of which results to present may be influenced by subjective factors. Some analyses may be favored because they provide stronger support for the researchers’ hypothesis. In addition, journals are more likely to be interested in strongly framed results that appear to have important public health implications. Finally, publication of such high-impact results can make careers. All of these factors are only reinforced when those presenting the data see their role as advocating for a societal good – a phenomenon referred to as “White Hat bias.”

This brings us to the crux of the matter. As the psychologist Brian Nosek and colleagues recently wrote: “Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results.” In other words, the emphasis and rewards are on novel findings rather than on getting it right. Hand-in-hand with this emphasis goes a low value placed on the mere replication of previous findings, which is viewed as unglamorous and not leading to career advancement.

However, replication of results is essential to the scientific project. Without replication, what we have are unverified, and possibly false, results that persist in the literature unchallenged because no one has taken the time and effort to attempt to verify them.

The tolerance of false results is much greater in relatively “soft” sciences, such as epidemiology, where a different culture prevails from that in “harder” scientific disciplines. You would never have neuroscientists, geneticists, or molecular biologists, let alone physicists, accept findings without checking and replication. Peer review and publication in a prestigious journal do not substitute for replication.

Thus, at the heart of this political standoff lie fundamental questions about scientific evidence, its interpretation, and its use in making policy. Policy and the formulation of regulations must be based on solid science. Therefore, the first step is making sure that the science is sound and says what we think it says. Politics, political opinions, and ideologies have no place in the vetting of the science. Once the science has been hashed out in the open, the process of taking costs and benefits into account can be carried out in a more meaningful way.

Asking that the underlying data from these studies be made available to the research community for independent re-analysis should not be regarded as some extreme demand – it should be a matter of course.

The science regarding the health effects of air pollution is difficult enough to get right without amplifying the confusion by imposing politics on it.

Geoffrey Kabat is an epidemiologist at the Albert Einstein College of Medicine who has studied a wide range of lifestyle and environmental exposures in relation to cancer and other chronic diseases. In addition to his scientific work, he is interested in risk perception and the public understanding of science. He is the author of Hyping Health Risks: Environmental Hazards in Daily Life and the Science of Epidemiology and writes the “Risk-omics” column at Forbes. This article is reprinted with his permission.

Photo Credit: Sierra Club