Practical Approaches to Big Data Privacy Over Time
Authored by Micah Altman, Alexandra Wood, David O’Brien, and Urs Gasser
The Berkman Klein Center is pleased to announce a new publication from the Privacy Tools project, authored by a multidisciplinary group of project collaborators from the Berkman Klein Center and the Program on Information Science at MIT Libraries. This article, titled "Practical approaches to big data privacy over time," analyzes how privacy risks multiply as large quantities of personal data are collected over longer periods of time, draws attention to the relative weakness of data protections in the corporate and public sectors, and provides practical recommendations for protecting privacy when collecting and managing commercial and government data over extended periods of time.
Increasingly, corporations and governments are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis are yielding deeper understandings of individuals’ characteristics, behavior, and relationships. It is now possible to measure human activity at more frequent intervals, collect and store data relating to longer periods of activity, and analyze data long after they were collected. These developments promise to advance the state of science, public policy, and innovation. At the same time, they create heightened privacy risks by increasing the potential to link data to individuals and to apply data to new uses that were unanticipated at the time of collection. Moreover, these risks multiply rapidly through the combination of long-term data collection and the accumulation of increasingly “broad” data measuring dozens or even thousands of attributes relating to an individual.
Existing regulatory requirements and privacy practices in common use are not sufficient to address the risks associated with long-term, large-scale data activities. In practice, organizations often rely on a limited subset of controls, such as notice and consent or de-identification, rather than drawing from the wide range of privacy interventions available. There is a growing recognition that privacy policies often do not adequately inform individuals about how their data will be used, especially over the long term. The expanding scale of personal data collection and storage is eroding the feasibility and effectiveness of techniques that aim to protect privacy simply by removing identifiable information.
Recent concerns about commercial and government big data programs parallel earlier conversations regarding the risks associated with long-term human subjects research studies. For decades, researchers and institutional review boards have intensively studied long-term data privacy risks and developed practices that address many of the challenges associated with assessing risk, obtaining informed consent, and handling data responsibly. Longitudinal research data carry risks similar to those associated with personal data held by corporations and governments. In general, however, personal information is protected more strongly when used in research than when it is used in the commercial and public sectors, even in cases where the risks and uses are nearly identical.
Combining traditional privacy approaches with additional safeguards identified from exemplar practices in long-term longitudinal research, together with new methods emerging from the privacy literature, can offer more robust privacy protection. Corporations and governments may consider adopting review processes like those implemented by research ethics boards to systematically analyze the risks and benefits associated with data collection, retention, use, and disclosure over time. Rather than relying on a single intervention such as de-identification or consent, corporate and government actors may explore new procedural, legal, and technical tools for evaluating and mitigating risk, balancing privacy and utility, and providing enhanced transparency, review, and accountability as potential components of data management programs. Adopting new technological solutions to privacy can help ensure stronger protection for individuals and greater adaptability in responding to emerging, sophisticated attacks on data privacy. Risks associated with long-term big data management can be mitigated by combining sets of privacy and security controls, such as notice and consent, de-identification, ethical review processes, differential privacy, and secure data enclaves, when these are tailored to the risk factors present in a specific case and informed by the state of the art and practice.
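To make one of the technical controls named above more concrete, the following is a minimal, illustrative sketch of differential privacy using the Laplace mechanism for a simple counting query. The function names, parameters, and data are hypothetical examples, not drawn from the article, and real deployments involve considerably more care (privacy budgets, composition, and sensitivity analysis).

```python
# Minimal sketch of the Laplace mechanism, one formal approach to
# differential privacy. Names, parameters, and data are illustrative only.
import numpy as np

def laplace_count(records, predicate, epsilon=0.5):
    """Release a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one individual's
    record changes the count by at most 1), so Laplace noise with scale
    1/epsilon satisfies epsilon-differential privacy for this query.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: a noisy count of participants over age 65 in a small dataset.
ages = [34, 71, 58, 67, 29, 80, 45]
print(laplace_count(ages, lambda age: age > 65, epsilon=0.5))
```

Smaller values of epsilon add more noise and provide stronger privacy protection at the cost of accuracy; choosing that trade-off is part of tailoring controls to the risks present in a specific case.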
This article was published by Oxford University Press in International Data Privacy Law, available at https://doi.org/10.1093/idpl/ipx027. The research underlying this article was presented at the 2016 Brussels Privacy Symposium on Identifiability: Policy and Practical Solutions for Anonymization and Pseudonymization, hosted by the Brussels Privacy Hub of the Vrije Universiteit Brussel and the Future of Privacy Forum, on November 8, 2016. This material is based upon work supported by the National Science Foundation under Grant No. CNS-1237235, the Alfred P. Sloan Foundation, and the John D. and Catherine T. MacArthur Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Alfred P. Sloan Foundation, or the John D. and Catherine T. MacArthur Foundation.
About the Privacy Tools for Sharing Research Data Project
Funded by the National Science Foundation and the Alfred P. Sloan Foundation, the Privacy Tools for Sharing Research Data project is a collaboration among the Berkman Klein Center for Internet & Society, the Center for Research on Computation and Society (CRCS), the Institute for Quantitative Social Science, and the Data Privacy Lab at Harvard University, and the Program on Information Science at MIT Libraries. The project seeks to develop methods, tools, and policies to facilitate the sharing of data while preserving individual privacy and data utility.
Executive Director and Harvard Law School Professor of Practice Urs Gasser leads the Berkman Klein Center's contribution to this exciting initiative, bringing the Center's institutional knowledge and practical experience to bear on the legal and policy issues in the larger project.
More information about the project is available on the official project website.