“Big data” can be defined as a problem-solving philosophy that leverages massive datasets and algorithmic analysis to extract “hidden information and surprising correlations.”[1] Not only does big data pose a threat to traditional notions of privacy, but it also compromises socially shared information. This point remains underappreciated because our so-called public disclosures are not nearly as public as courts and policymakers have argued—at least, not yet. That is subject to change once big data becomes user-friendly.

Most social disclosures and details of our everyday lives are meant to be known only to a select group of people.[2] Until now, technological constraints have favored that norm, limiting the circle of communication by imposing transaction costs—which can range from effort to money—onto prying eyes. Unfortunately, big data threatens to erode these structural protections, and the common law, which is the traditional legal regime for helping individuals seek redress for privacy harms, has some catching up to do.[3]

To make our case that the legal community is under-theorizing the effect big data will have on an individual’s socialization and day-to-day activities, we will proceed in four steps.[4] First, we explain why big data presents a bigger threat to social relationships than privacy advocates acknowledge, and construct a vivid hypothetical case that illustrates how democratized big data can turn seemingly harmless disclosures into potent privacy problems. Second, we argue that the harm democratized big data can inflict is exacerbated by decreasing privacy protections of a special kind—ever-diminishing “obscurity.” Third, we show how central common law concepts might be threatened by eroding obscurity and the resulting difficulty individuals have gauging whether social disclosures in a big data context will sow the seeds of forthcoming injury. Finally, we suggest that one way to stop big data from causing big, unredressed privacy problems is to update the common law with obscurity-sensitive considerations.

I. Big, Social Data

The threat big data poses to social interaction has not been given its due, and for an understandable reason. Privacy debates have primarily focused on the scale of big data and on concentrations of power—what big corporations and big governments can do with large amounts of finely analyzed information. These concerns are legitimate and pressing, which is why scholars and policymakers focus on Fair Information Practice Principles (FIPPs), deidentification techniques, sectoral legislation protecting particular datasets, and regulatory efforts to improve data security and ensure safe international data transfers.[5]

This trajectory fails to address the full scope of big data as a disruptive force across nearly every sector of the United States’ patchwork approach to privacy protection. Individuals eventually will be able to harness big datasets, tools, and techniques to dramatically expand the number and magnitude of privacy harms to themselves and others, perhaps without even realizing it.[6] This is problematic in an age when so many aspects of our social relationships are turned into data.

Consider web-scraping companies that dig up old mugshots and showcase them online, hoping embarrassed or anxious citizens will pay to have their images taken down. It isn’t hard to imagine that the next generation of these businesses will cast a wider net, capitalizing on stockpiles of aggregated and filtered data derived from diverse public disclosures. Besides presenting new, unsettling detail about behavior and proclivities, such services might even display predictive inferences couched in litigation-buttressing weasel wording—e.g., “correlations between X and Y have been known to indicate Z.” Everyone, then, will be at greater risk of unintentionally leaking sensitive personal details. Everyone will be more susceptible to providing information that gets taken out of its original context, integrated into a new profile, and subsequently used to harm a friend, family member, or colleague.
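
To make the mechanics of this scenario concrete, the sketch below shows the crudest possible version of such an aggregation-and-inference service. It is purely illustrative: the records, sources, keyword rules, and hedged phrasings are all invented, and no real scraping operation is depicted.

```python
# A minimal, hypothetical sketch: merging individually innocuous "public"
# records into a profile and dressing inferences in hedged language.
from collections import defaultdict

# Invented records scraped from disparate public sources.
public_records = [
    {"name": "J. Doe", "source": "county_court", "fact": "2011 traffic citation"},
    {"name": "J. Doe", "source": "checkin_feed", "fact": "weekly visits near a clinic"},
    {"name": "J. Doe", "source": "social_feed", "fact": "liked three sleep-aid pages"},
]

# Invented "weasel-worded" inference rules keyed on keywords in the facts.
inference_rules = {
    "clinic": "patterns like X have been known to indicate ongoing medical treatment",
    "sleep-aid": "interest in Y has been associated with insomnia or anxiety",
}

def build_profiles(records):
    """Group scraped facts by person to form a composite profile."""
    profiles = defaultdict(list)
    for record in records:
        profiles[record["name"]].append(f'{record["fact"]} (via {record["source"]})')
    return profiles

def hedge_inferences(profiles):
    """Attach non-committal 'correlation' statements to each profile."""
    report = {}
    for name, facts in profiles.items():
        hits = [text for keyword, text in inference_rules.items()
                if any(keyword in fact for fact in facts)]
        report[name] = {"facts": facts, "suggested_correlations": hits}
    return report

print(hedge_inferences(build_profiles(public_records)))
```

Even this trivial logic illustrates where the harm arises: not in any single record, but in out-of-context aggregation dressed up as insight.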

Inevitably, those extracting personal details from big data will argue that the information was always apparent and that the law should not protect information that exists in plain sight.[7] The law struggled with protecting privacy in public long before big data. We envision a tipping point, however, at which some pro-publicity precedent will appear more outdated than wise.

II. More Data, Less Obscurity

Socialization and related daily public disclosures have always been protected by varying layers of obscurity, a concept that we previously defined as follows:

Obscurity is the idea that when information is hard to obtain or understand, it is, to some degree, safe. Safety, here, doesn’t mean inaccessible. Competent and determined data hunters armed with the right tools can always find a way to get it. Less committed folks, however, experience great effort as a deterrent.

Online, obscurity is created through a combination of factors. Being invisible to search engines increases obscurity. So does using privacy settings and pseudonyms. Disclosing information in coded ways that only a limited audience will grasp enhances obscurity, too. Since few online disclosures are truly confidential or highly publicized, the lion’s share of communication on the social web falls along the expansive continuum of obscurity: a range that runs from completely hidden to totally obvious.[8]

In the past, individuals could roughly gauge whether aspects of their daily routines and personal disclosures would remain appropriately protected by estimating (sometimes implicitly) the likelihood that their information would be discovered or understood by third parties with exploitative or undesirable interests. In the age of big data, however, the confidence level associated with this kind of privacy prognostication has decreased considerably, even when conscientious people exercise due diligence.

Increasingly powerful and often secretive (proprietary and governmental) algorithms, combined with numerous and massive datasets, are eroding the structural and contextual protections that once imposed high transaction costs on finding, understanding, and aggregating socially shared information. Consumers got a taste of how easily and powerfully these processes can operate when Facebook rolled out Graph Search, denied it had privacy implications, and then revealed how readily what we “like” gets translated into who we are.
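
The point about “likes” can be made concrete with a toy calculation. The following sketch uses invented pages, users, and trait labels rather than anything drawn from Facebook; it shows how even naive frequency statistics over public “likes” can yield a confident guess about an undisclosed trait.

```python
# A toy illustration with invented data: naive statistics over public "likes"
# yielding a guess about a private trait. No real platform data is modeled.
from collections import defaultdict

# Hypothetical training examples: (pages a user liked, whether the user is
# known to have the trait), as might be compiled from any existing dataset.
training = [
    ({"page_a", "page_b"}, True),
    ({"page_a", "page_c"}, True),
    ({"page_d"}, False),
    ({"page_c", "page_d"}, False),
]

def like_trait_rates(data):
    """For each page, the fraction of its likers known to have the trait."""
    counts, positives = defaultdict(int), defaultdict(int)
    for likes, has_trait in data:
        for page in likes:
            counts[page] += 1
            positives[page] += has_trait
    return {page: positives[page] / counts[page] for page in counts}

def trait_score(likes, rates, default=0.5):
    """Average the per-page rates over a new user's likes; crude but telling."""
    known = [rates[page] for page in likes if page in rates]
    return sum(known) / len(known) if known else default

rates = like_trait_rates(training)
print(trait_score({"page_a", "page_c"}, rates))  # 0.75: likes alone suggest the trait
```

Real systems use far more sophisticated models, but the structure of the inference is the same: public signals in, intimate guesses out.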

Maintaining obscurity will be even more difficult once big data tools, techniques, and datasets become further democratized and made available to the non-data-scientist masses for free or at low cost. Given recent technological trends, this outcome seems to be gradually approaching inevitability. At the touch of a button, Google’s search engine can already unearth an immense amount of information that not too long ago took considerable effort to locate. Looking ahead, companies like Intel are not shy about letting the public know they believe “data democratization is a good bet.”[9]

Decreasing confidence in our ability to judge the privacy value of disclosures threatens to deepen the problem of “bounded rationality” and, relatedly, what Daniel Solove identified as the problems of scale, aggregation, and assessing harm.[10] Courts, it appears, will need to grapple with a new wave of allegations of harm arising from behavior that yielded unintended and unforeseeable consequences.

As a thought experiment that crystallizes our guiding intuitions, consider a big data update to the problems that occurred when college students were revealed to be gay to their disapproving parents after a third party added them as members of Facebook’s Queer Chorus group.[11] In the original instance, the salient tension was between how Facebook described its privacy settings and what users expected when using the service. But what if someday a parent, teacher, or other authority figure wanted to take active steps to determine whether a child, student, or employee was gay? With democratized big data, such a person could canvass a range of individually trivial but collectively potent information. Geolocation data conveyed when the child, or, crucially, his or her friends, used services like Foursquare, combined with increasingly sophisticated analytical tools, could lead to a quick transition from checking in to being outed. People-search services like Spokeo are well positioned to offer such user-friendly big data services.
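
A bare-bones sketch of this hypothetical follows. Every person, venue, and threshold in it is invented; the point is only how little analytical machinery the inference requires once check-in data is pooled across a person and his or her friends.

```python
# Purely hypothetical sketch of the thought experiment above: individually
# trivial location check-ins, pooled across a person and that person's friends,
# screened against a list of venues associated with a particular community.
# All names, venues, and thresholds are invented for illustration.
from collections import Counter

# Venues an observer might (rightly or wrongly) associate with the community.
flagged_venues = {"queer_chorus_rehearsal_hall", "pride_center"}

# (person, venue) check-ins, as a people-search service might aggregate them
# from public feeds; "friend_1" and "friend_2" are the target's known friends.
checkins = [
    ("target", "coffee_shop"),
    ("target", "queer_chorus_rehearsal_hall"),
    ("friend_1", "queer_chorus_rehearsal_hall"),
    ("friend_1", "pride_center"),
    ("friend_2", "queer_chorus_rehearsal_hall"),
]

def flagged_counts(checkins, venues):
    """Count flagged-venue check-ins per person."""
    return Counter(person for person, venue in checkins if venue in venues)

def outing_guess(checkins, target, friends, venues, threshold=3):
    """Crude inference: combine the target's and friends' flagged check-ins."""
    counts = flagged_counts(checkins, venues)
    combined = counts[target] + sum(counts[friend] for friend in friends)
    return combined >= threshold  # True: the tool "outs" the target

print(outing_guess(checkins, "target", ["friend_1", "friend_2"], flagged_venues))
```

A consumer-facing service would of course be slicker, but the underlying inference need not be any more complicated than this.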

III. The Common Law Privacy Implications of Big Data for Everyone

Once big data is democratized and obscurity protections are further minimized, peer-to-peer interactions are poised to challenge many traditional common law concepts. Because courts already rule inconsistently on what constitutes a reasonable expectation of privacy, tort law is especially vulnerable.[12]

Here are a few of the fundamental questions we expect the courts will struggle to answer:

What Constitutes a Privacy Interest? A crucial question for both the tort of public disclosure of private facts and the tort of intrusion upon seclusion is whether the plaintiff had a privacy interest in a certain piece of information or context. This determination has varied wildly among the courts, and it is unclear how ubiquitous big data will alter it. For example, some courts have found that a privacy interest exists in involuntary exposure in public.[13] Other courts have found that overzealous surveillance in public that reveals confidential data can violate a privacy interest.[14] Will invasive “dataveillance” trigger the same protections?[15] Finally, courts have found, albeit inconsistently, a privacy interest in information known only to, and likely to stay within, a certain social group.[16] Does an increased likelihood that such information might be ascertained by outsiders destroy the privacy interest in information shared discreetly in small groups?[17]

What Actions Are Highly Offensive? Directly revealing or gaining access to certain kinds of information has been found to be highly offensive for purposes of the disclosure, intrusion, and false light torts.[18] In an age of data-driven predictions, would indirect disclosures of private information also be considered highly offensive? If not, does the law need to better articulate these limits? Does it matter whether the eventual revelation of highly offensive information was predictable? Regarding the intrusion tort, can information gleaned from “public” big datasets ever be considered “secluded,” and, if so, would using tools to unearth such data ever be considered highly offensive to a reasonable person?[19]

What Kinds of Disclosures Breach a Confidence? When has a confidant disclosed enough indirect information to effectively breach a confidence? If revealing a friend’s location more than once a week allows others to determine that he is visiting a doctor for treatment of a communicable disease—a secret you promised to keep confidential—have you breached your promise? Courts would likely be hesitant to find a breach if the link between the disclosure and the revealed confidential information were speculative, though inevitably some indirect disclosures will be so likely to compromise the confidentiality of other information as to amount to a de facto disclosure of that information itself. Should contracts with privacy-protective terms between individuals and small groups contemplate potential big data uses? To what lengths must confidants go to protect facts from being uncovered via big data techniques?

IV. Regulating the Big Impact of Small Decisions

Given the contentious debate over large-scale regulation of big data, safeguarding smaller, peer-to-peer interactions may prove to be the most feasible and significant privacy-related protection against big data.[20] The concept of obscurity can help guide the common law’s evolution. If embraced as part of the disclosure and intrusion privacy torts, obscurity would allow socially shared information to fall within the ambit of “private facts” and “secluded” contexts. Contracts could also be used to protect the obscurity of individuals by targeting big data analysis designed to reveal socially shared but largely hidden information. Those charged with interpreting broad privacy-related terms should keep in mind the structural and contextual protections that those whose privacy was to be protected may have relied upon.

Those shaping the common law now face a choice between two paths. They can cling to increasingly ineffective and strained doctrines that were created when structural and contextual protections were sufficient for most of our socialization and obscure public activities. Or they can recognize the debilitating effect big data has on an individual’s ability to gauge whether social disclosures and public activity will later harm that individual or others, and evolve the common law to keep small acts of socialization and our day-to-day activities from becoming big problems.