Legal debates over the “big data” revolution currently focus on the risks of inclusion: the privacy and civil liberties consequences of being swept up in big data’s net. This Essay takes a different approach, focusing on the risks of exclusion: the threats big data poses to those whom it overlooks. Billions of people worldwide remain on big data’s periphery. Their information is not regularly collected or analyzed, because they do not routinely engage in activities that big data is designed to capture. Consequently, their preferences and needs risk being routinely ignored when governments and private industry use big data and advanced analytics to shape public policy and the marketplace. Because big data poses a unique threat to equality, not just privacy, this Essay argues that a new “data antisubordination” doctrine may be needed.
* * *
The big data revolution has arrived. Every day, a new book or blog post, op-ed or white paper surfaces casting big data, for better or worse, as groundbreaking, transformational, and “disruptive.” Big data, we are told, is reshaping countless aspects of modern life, from medicine to commerce to national security. It may even change humanity’s conception of existence: in the future, “we will no longer regard our world as a string of happenings that we explain as natural or social phenomena, but as a universe comprised essentially of information.”
This revolution has its dissidents. Critics worry the world’s increasing “datafication” ignores or even smothers the unquantifiable, immeasurable, ineffable parts of human experience. They warn of big data’s other dark sides, too: potential government abuses of civil liberties, erosion of long-held privacy norms, and even environmental damage (the “server farms” used to process big data consume huge amounts of energy).
Legal debates over big data focus on the privacy and civil liberties concerns of those people swept up in its net, and on whether existing safeguards—minimization, notice, consent, anonymization, the Fourth Amendment, and so on—offer sufficient protection. It is a perspective of inclusion. And that perspective makes sense: most people, at least in the industrialized world, routinely contribute to and experience the effects of big data. Under that conception, big data is the whale, and we are all of us Jonah.
This Essay takes a different approach, exploring big data instead from a perspective of exclusion. Big data also poses risks to those persons who are not swallowed up by it—whose information is not regularly harvested, farmed, or mined. (Pick your anachronistic metaphor.) Although proponents and skeptics alike tend to view this revolution as totalizing and universal, the reality is that billions of people remain on its margins because they do not routinely engage in activities that big data and advanced analytics are designed to capture.
Whom does big data exclude? What are the consequences of exclusion for them, for big data as a technology, and for societies? These are underexplored questions that deserve more attention than they receive in current debates over big data. And because these technologies pose unique dangers to equality, and not just privacy, a new legal doctrine may be needed to protect those persons whom the big data revolution risks sidelining. I call it data antisubordination.
* * *
Big data, for all its technical complexity, springs from a simple idea: gather enough details about the past, apply the right analytical tools, and you can find unexpected connections and correlations, which can help you make unusually accurate predictions about the future—how shoppers decide between products, how terrorists operate, how diseases spread. Predictions based on big data already inform public- and private-sector decisions every day around the globe. Experts project big data’s influence only to grow in coming years.
If big data, as both an epistemological innovation and a new booming industry, increasingly shapes government and corporate decisionmaking, then one might assume much attention is paid to who and what shape big data—the “input.” In general, however, experts express a surprising nonchalance about the precision or provenance of data. In fact, they embrace “messiness” as a virtue. Datasets need not be pristine; patterns and trends, not granularity or exactness, are the goal. Big data is so big—terabytes, petabytes, exabytes—that, proponents claim, the sources or reliability of particular data points cease to matter.
Such sentiments presume that the inevitable errors creeping into large datasets are random and absorbable, and can be factored into the ultimate analysis. But there is another type of error that can infect datasets, too: the nonrandom, systemic omission of people who live on big data’s margins, whether due to poverty, geography, or lifestyle, and whose lives are less “datafied” than the general population’s. In key sectors, their marginalization risks distorting datasets and, consequently, skewing the analysis on which private and public actors increasingly depend. They are big data’s exclusions.
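The statistical distinction drawn above—between random error, which washes out in aggregate, and the systemic omission of a subpopulation, which does not—can be illustrated with a toy simulation. All numbers here are hypothetical, chosen only to make the contrast visible: a population in which a marginalized quarter has systematically different preferences, measured first with random noise and then with that quarter excluded entirely.

```python
import random

random.seed(0)

# Hypothetical population of 10,000 people, each with a "preference" score.
# A marginalized quarter has systematically different preferences
# (mean 20) than the rest (mean 50).
population = [random.gauss(50, 10) for _ in range(7500)] + \
             [random.gauss(20, 10) for _ in range(2500)]
true_mean = sum(population) / len(population)  # close to 42.5

# Random error: noisy measurements of a random sample still center on the
# true mean, because the noise cancels out in aggregate.
random_sample = random.sample(population, 2000)
noisy = [x + random.gauss(0, 5) for x in random_sample]
noisy_mean = sum(noisy) / len(noisy)  # close to the true mean

# Systematic omission: sampling only the "datafied" 7,500 people biases the
# estimate toward their preferences, no matter how large the sample grows.
datafied_only = random.sample(population[:7500], 2000)
excluded_mean = sum(datafied_only) / len(datafied_only)  # close to 50

print(round(true_mean, 1))
print(round(noisy_mean, 1))
print(round(excluded_mean, 1))
```

The point of the sketch is that the excluded-group estimate stays biased regardless of sample size: more data of the same skewed kind does not correct the skew, which is why "messiness" tolerances designed for random error do not answer the exclusion problem.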
Consider two hypothetical people.
The first is a thirty-year-old white-collar resident of Manhattan. She participates in modern life in all the ways typical of her demographic: smartphone, Google, Gmail, Netflix, Spotify, Amazon. She uses Facebook, with its default privacy settings, to keep in touch with friends. She dates through the website OkCupid. She travels frequently, tweeting and posting geotagged photos to Flickr and Instagram. Her wallet holds a debit card, credit cards, and a MetroCard for the subway and bus system. On her keychain are plastic barcoded cards for the “customer rewards” programs of her grocery and drugstore. In her car, a GPS sits on the dash, and an E‑ZPass transponder (for bridge, tunnel, and highway tolls) hangs from the windshield.
The data that she generates every day—and that governments and companies mine to learn about her and people like her—are nearly incalculable. In addition to information collected by companies about her spending, communications, online activities, and movement, government agencies (federal, state, local) know her well: New York has transformed itself in recent years into a supercharged generator of big data. Indeed, for our Manhattanite, avoiding capture by big data is impossible. To begin even to limit her exposure—to curb her contributions to the city’s rushing data flows—she would need to fundamentally reconstruct her everyday life. And she would have to move, a fate anathema to many New Yorkers. Thus, unless she takes relatively drastic steps, she will continue to generate a steady data flow for government and corporate consumption.
Now consider a second person. He lives two hours southwest of Manhattan, in Camden, New Jersey, America’s poorest city. He is underemployed, working part-time at a restaurant, paid under the table in cash. He has no cell phone, no computer, no cable. He rarely travels and has no passport, car, or GPS. He uses the Internet, but only at the local library on public terminals. When he rides the bus, he pays the fare in cash.
Today, many of big data’s tools are calibrated for our Manhattanite and people like her—those who routinely generate large amounts of electronically harvestable information. A world shaped by big data will take into account her habits and preferences; it will look like her world. But big data currently overlooks our Camden subject almost entirely. (And even he, simply by living in a U.S. city, has a much larger data footprint than someone in Eritrea, for example.) In a future where big data, and the predictions it makes possible, will fundamentally reorder government and the marketplace, the exclusion of poor and otherwise marginalized people from datasets has troubling implications for economic opportunity, social mobility, and democratic participation. These technologies may create a new kind of voicelessness, where certain groups’ preferences and behaviors receive little or no consideration when powerful actors decide how to distribute goods and services and how to reform public and private institutions.
This might sound overheated. It is easy to assume that exclusion from the big data revolution is a trivial concern—a matter simply of not having one’s Facebook “likes” or shopping habits considered by, say, Walmart. But the consequences of exclusion could be much more profound than that.
First, those left out of the big data revolution may suffer tangible economic harms. Businesses may ignore or undervalue the preferences and behaviors of consumers who do not shop in ways that big data tools can easily capture, aggregate, and analyze. Stores may not open in their neighborhoods, denying them not just shopping options, but also employment opportunities; certain promotions may not be offered to them; new products may not be designed to meet their needs, or priced to meet their budgets. Of course, poor people and minority groups are in many ways already marginalized in the marketplace. But big data could reinforce and exacerbate existing problems.
Second, politicians and governments may come to rely on big data to such a degree that exclusion from data flows leads to exclusion from civic and political life—a barrier to full citizenship. Political campaigns already exploit big data to raise money, plan voter-turnout efforts, and shape their messaging. And big data is quickly making the leap from politics to policy: the White House, for example, recently launched a $200 million big data initiative to improve federal agencies’ ability “to access, organize, and glean discoveries from huge volumes of digital data.”
Just as U.S. election districts—and thus U.S. democracy—depend on the accuracy of census data, so too will policymaking increasingly depend on the accuracy of big data and advanced analytics. Exclusion or underrepresentation in government datasets, then, could mean losing out on important government services and public goods. The big data revolution may create new forms of inequality and subordination, and thus raises broad democracy concerns.
* * *
“There is no caste here,” Justice Harlan said of the United States, “no superior, dominant, ruling class of citizens.” But big data has the potential to solidify existing inequalities and stratifications and to create new ones. It could restructure societies so that the only people who matter—quite literally the only ones who count—are those who regularly contribute to the right data flows.
Recently, some scholars have argued that existing information privacy laws—whether the U.S. patchwork quilt or Europe’s more comprehensive approach—may be inadequate to confront big data’s privacy risks. But big data threatens more than just privacy. It could also jeopardize political and social equality by relegating vulnerable people to an inferior status.
U.S. equal protection doctrine, however, is ill suited to the task of policing the big data revolution. For one thing, the poor are not a protected class, and thus the doctrine would do little to ensure, either substantively or procedurally, that they share in big data’s benefits. And the doctrine is severely limited in its ability to “address disadvantage that cannot readily be traced to official design or that affects a diffuse and amorphous class.” Moreover, it is hard to imagine what formal equality or “anticlassification” would even look like in the context of big data.
Because existing equality law will not adequately curb big data’s potential for social stratification, it may become necessary to develop a new equality doctrine—a principle of data antisubordination. Traditionally, U.S. antisubordination theorists have argued “that guarantees of equal citizenship cannot be realized under conditions of pervasive social stratification,” and “that law should reform institutions and practices that enforce the secondary social status of historically oppressed groups.” This antisubordination approach—what Owen Fiss called the “group-disadvantaging principle”—may need to be revised, given big data’s potential to impose new forms of stratification and to reinforce the status of already-disadvantaged groups.
A data antisubordination principle would, at minimum, provide those who live outside or on the margins of data flows some guarantee that their status as persons with light data footprints will not subject them to unequal treatment by the state in the allocation of public goods or services. Thus, in designing new public-safety and job-training programs, forecasting future housing and transportation needs, and allocating funds for schools and medical research—to name just a few examples—public institutions could be required to consider, and perhaps work to mitigate, the disparate impact that their use of big data may have on persons who live outside or on the margins of government datasets. Similarly, public actors relying on big data for policymaking, lawmaking, election administration, and other core democratic functions could be required to take steps to ensure that big data’s marginalized groups continue to have a voice in democratic processes. That a person might make only limited contributions to government data flows should not relegate him to political irrelevance or inferiority.
Data antisubordination could also (or alternatively) provide a framework for judicial review of congressional and executive exploitation of big data and advanced analytics. That framework could be modeled on John Hart Ely’s “representation-reinforcing approach” in U.S. constitutional law, under which “a court’s ability to override a legislative judgment ought to be calibrated based on the fairness of the political process that produced the judgment.” In the context of big data, rather than mandating any particular substantive outcome, a representation-reinforcing approach to judicial review could provide structural, process-based safeguards and guarantees for those people whom big data currently overlooks, and who have had limited input in the political process surrounding government use of big data.
To be most effective, however, a data antisubordination principle would need to extend beyond state action. Big data’s largest private players exert an influence on societies, and a power over the aggregation and flow of information, that in previous generations not even governments enjoyed. Thus, a data antisubordination principle would be incomplete unless it extended, in some degree, to the private sector, whether through laws, norms, or standards.
Once fully developed as theory, a data antisubordination principle—at least as it applies to state action—could be enshrined in law by statute. Like GINA, the Genetic Information Nondiscrimination Act of 2008, it would be a civil rights law designed for potential threats to equal citizenship embedded in powerful new technologies—threats that neither the Framers nor past civil rights activists could have envisioned.
As lines between the physical and datafied worlds continue to blur, and as big data and advanced analytics increasingly shape governmental and corporate decisionmaking about the allocation of resources, equality and privacy principles will grow more and more intertwined. Law must keep pace. In “The Right to Privacy,” their 1890 Harvard Law Review article, a young Louis Brandeis and co-author Samuel Warren recognized that “[r]ecent inventions and business methods call attention to the next step which must be taken for the protection of the person.” The big data revolution, too, demands “next steps,” and not just in information privacy law. Brandeis and Warren’s “right to be let alone”—which Brandeis, as a Supreme Court justice, would later call the “most comprehensive of rights and the right most valued by civilized men”—has become an obsolete and insufficient protector. Even more modern information privacy principles, such as consent and the nascent “right to be forgotten,” may turn out to have only limited utility in an age of big data.
Surely revised privacy laws, rules, and norms will be needed in this new era. But they are insufficient. Ensuring that the big data revolution is a just revolution, one whose benefits are broadly and equitably shared, may also require, paradoxically, a right not to be forgotten—a right against exclusion.