The EPA’s Rule to Restrict Science Could Compromise Your Confidential Research Data

March 5, 2020 | 10:00 am
Anita Desikan
Senior Analyst

It is a nice story that the tobacco industry came up with in the 1990s: in order to be transparent, scientific studies informing policies affecting the tobacco industry should be “sound.” And tobacco companies could better determine this “soundness” if they were allowed to reanalyze the underlying data these studies used. Numerous internal documents have shown that the tobacco industry’s primary reason for making this suggestion to policymakers was to obtain the raw data of the studies linking smoking to lung cancer, for the purpose of casting doubt on the science underlying this finding. It was never about the “soundness” of science; it was a tactic to delay rulemaking based on the science. Previously, it was my job to analyze and keep safe the private health information of thousands of research participants, so it makes my skin crawl to hear that tobacco companies wanted to grab this data from magnanimous people for what turned out to be a for-profit scheme.

This is the origin story of the Environmental Protection Agency’s (EPA) misleadingly titled proposed rule “Strengthening Transparency in Regulatory Science,” which, as we have noted before, would diminish the role of science at the EPA and endanger our health and safety. On March 3, the EPA announced that it will issue a supplement to this rule, open for a 30-day comment period, which appears to be far worse than the original rule.

Like the tobacco industry, the EPA is falsely claiming that this rule will increase scientific transparency by restricting the agency’s attention to only “certain” scientific studies: those in which scientists have relinquished their data and are willing to make it available for public scrutiny. If a study’s authors aren’t willing to surrender their data, then the EPA cannot reference the study when setting health-protective rules on topics such as air pollution and chemical safety. This will likely keep the agency from using science in its decision-making process, by opening the door to a series of endless and biased re-analyses of the data by entities with an economic stake in casting doubt on the science. The result will be an agency that is paralyzed, unable to carry out scientifically justified measures to protect us all from environmental hazards.

But the EPA’s rule is gaslighting all of us in a different way. Since the rule’s new supplement lists possible privacy protections for only three types of data (confidential business information, proprietary data, and data that cannot be anonymized), we can assume the agency believes that a simple anonymization process is enough to fully protect people’s confidential data. Scientists who collect and analyze human data, in fields like childhood lead poisoning, see the EPA’s rule as woefully at odds with the security and protection that research participants’ data deserve. It is worth examining in closer detail why this is.

This rule would endanger people’s personal information…

Prior to joining UCS, I worked for over five years at two different public health labs: a Californian lab that collected data on alcohol and drug binging from Indigenous people and people of color, and a British lab that collected hospital medical records from people who lived in an impoverished part of town and had recently experienced a stroke. People who donate their time and personal information to scientific labs are usually carrying out an altruistic act: they are providing researchers with valuable scientific knowledge that does not directly benefit them. I am grateful that thousands of people trusted me and my colleagues with the most private information imaginable, since human health research is entirely dependent on these acts of benevolence. And so, like other public health scientists, I took my promise to protect this data very seriously, because I felt that I had a duty to these charitable individuals.

People trust scientists with their private information because of the legal and ethical framework that has developed to protect human participants in scientific studies; for instance, the Health Insurance Portability and Accountability Act (HIPAA) requires that medical information be protected by a set of administrative, physical, and technical safeguards. Safeguards can include processes like anonymization. Anonymizing a dataset means removing the highly sensitive information that could readily identify people, such as names, addresses, and Social Security numbers. But it is not uncommon for scientists to go further to protect the data from threats such as hacking. Measures employed by scientists could include working on a computer that is not connected to the Internet, encrypting the data, password-protecting a database, storing medical or biological information in a locked room, or even using a physical safe.

The EPA seems to believe that the process of anonymization is all that is required to protect people’s data. But anonymized data is not the be-all and end-all of data protection: many people can be identified from anonymized databases. Simple tools like an Internet search and a database query have been enough to re-identify research participants. One paper found that with just 15 characteristics (like age, gender, and marital status), a machine learning program could re-identify 99.98 percent of Americans from an anonymized database. Another paper found that a research participant’s region of residence could be deduced from prominent environmental studies with 80 to 98 percent accuracy. Therefore, a simple anonymization process, like that in the EPA rule, is not protective enough to safeguard people’s personal information from the public.
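To make the re-identification risk concrete, here is a minimal sketch in Python of how a so-called linkage attack can work. Everything in it is an assumption made for illustration: the records are entirely synthetic, the field names (`age`, `gender`, `region`, `marital`, `condition`) and helper functions are hypothetical, and this is not the method used in the papers mentioned above. It only shows the basic idea that a combination of ordinary characteristics can single a person out even after names are removed.

```python
from collections import Counter

# Entirely synthetic "anonymized" study records: names, addresses, and
# Social Security numbers have already been removed.
ANONYMIZED = [
    {"age": 34, "gender": "F", "region": "North", "marital": "single", "condition": "A"},
    {"age": 34, "gender": "F", "region": "North", "marital": "single", "condition": "B"},
    {"age": 51, "gender": "M", "region": "North", "marital": "married", "condition": "C"},
    {"age": 67, "gender": "F", "region": "South", "marital": "widowed", "condition": "D"},
]

# Hypothetical public list that does carry names (also synthetic).
PUBLIC = [
    {"name": "Person 1", "age": 51, "gender": "M", "region": "North", "marital": "married"},
    {"name": "Person 2", "age": 67, "gender": "F", "region": "South", "marital": "widowed"},
    {"name": "Person 3", "age": 34, "gender": "F", "region": "North", "marital": "single"},
]

# Characteristics that are not identifying on their own but may be in combination.
QUASI_IDENTIFIERS = ("age", "gender", "region", "marital")


def key(record, fields=QUASI_IDENTIFIERS):
    """Return the combination of characteristics used to compare records."""
    return tuple(record[f] for f in fields)


def unique_fraction(records, fields=QUASI_IDENTIFIERS):
    """Fraction of records whose combination of characteristics is one of a kind."""
    counts = Counter(key(r, fields) for r in records)
    unique = sum(1 for r in records if counts[key(r, fields)] == 1)
    return unique / len(records)


def link(anonymized, public, fields=QUASI_IDENTIFIERS):
    """Pair names with sensitive data where the match is unambiguous on both sides."""
    counts = Counter(key(r, fields) for r in anonymized)
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person, fields), []).append(person["name"])
    matches = []
    for record in anonymized:
        k = key(record, fields)
        names = names_by_key.get(k, [])
        if counts[k] == 1 and len(names) == 1:
            matches.append((names[0], record["condition"]))
    return matches


if __name__ == "__main__":
    print(unique_fraction(ANONYMIZED))
    print(link(ANONYMIZED, PUBLIC))
```

In this toy dataset, half of the records are unique on just four characteristics, and those are exactly the ones that can be matched to a name. Adding more characteristics can only make more records unique, which is why stripping names alone offers limited protection.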

…Which is why scientists find this rule so appalling

Let’s imagine a scenario where several scientists, wanting to have their work inform the agency’s policymaking process, decided to follow the EPA’s restricted science rule and hand over their research data. What would be the repercussions?

Depending on their study’s informed consent process and the requirements of their Institutional Review Board (IRB) approval, a scientist who releases raw anonymized data to the public could be in violation of data privacy regulations. And the data may include participants who would never be able to provide consent for such a release, like research participants who have died. The EPA’s rule may also prove to be a disincentive for future research participants. If people think scientists will publish their private data (in an anonymized form) for the whole world to see, they may be less willing to sign up for research studies.

Since anonymization alone is not a completely secure process, in this scenario it is almost inevitable that some research participants will be identified by name. This could be highly damaging. Think of how a person’s employment, insurance, or personal relationships would be affected if information on the following topics were linked back to them: child abuse, sexual assault, illegal drug use, mental illness, immigration status, sexual orientation, genetic information, or sexually transmitted diseases. And in cases such as genetic information, this data could reveal sensitive information about the relatives of the research participants, including those who have yet to be born.

You can do something to stop this

In a rare joint statement, the editors of six prominent scientific journals sounded an alarm on the EPA’s rule, saying that the “proposal’s push for ‘transparency’ would be used as a mechanism for suppressing the use of relevant scientific evidence in policy-making, including public health regulations.” They also state that some data “cannot be shared openly; even anonymized personal data can be subject to re-identification, and it has been a longstanding practice for agencies and journals to acknowledge the value of data privacy adjustments.”

The EPA’s restricted science rule will have the effect of barring the agency from considering rigorous, comprehensive, and influential studies during its rulemaking process. And we the people will pay the price for this reduced scientific input with our health and safety. But there is something you can do to change this. The comment period for the EPA’s rule is about to open, and we strongly encourage you to write to the agency, using our public comment guide, to tell them what you really think of a rule that robs science of the ability to inform health and environmental policies at the EPA.