Hitting the Books: America Needs a New Public Data System


Earlier this month, the Trump administration stripped the CDC of its control over the country’s coronavirus data. By insisting that all case reports go through the White House, the administration has further undermined public confidence in its response to the pandemic and tainted any future release of information with the prospect of politicization. But incidents like this are symptomatic of a deeper problem, says Julia Lane, a professor of public policy at NYU, in her new book, Democratizing Our Data: A Manifesto. She argues that the steady decline in the quality of government-produced data in recent years is a threat not only to our information-based economy but to the very foundations of our democracy itself.

In the excerpt below, Lane illustrates the challenges government employees face when they receive incomplete or biased data but are still expected to do their jobs, along with the huge benefits we can reap when data is used efficiently and ethically for the public good. Democratizing Our Data is already available on Amazon Kindle and goes on sale in print on September 1.


From Democratizing Our Data: A Manifesto by Julia Lane. Reprinted with permission from The MIT Press. Copyright 2020. On sale now as an ebook; on sale in print September 1, 2020.


These days, when people have an appointment across town, their calendar app helpfully predicts how long it will take to get there. When they search Amazon for books that might interest them, Amazon makes helpful suggestions and asks for feedback on how to improve its platform. When they select photos in Google Photos, the app suggests who to send them to, surfaces other photos it thinks are similar to the ones selected, and warns if the zip file is going to be particularly large. Our applications today are aware of the multiple dimensions of the data they manage for us; they update this information in real time and suggest options and possibilities based on those dimensions. In other words, the private sector is setting itself up for success by using data to provide us with useful products and services.

The government, not so much. The poor state of government data makes Joe Salvo’s job much more difficult. He is New York City’s chief demographer, and he uses data from the Census Bureau’s American Community Survey (ACS) to prepare for emergencies like Hurricane Sandy. He must use data to decide how to get elderly residents to physically accessible shelters; operationally, that might mean directing a fleet of fifty buses to pick up and evacuate seniors. He needs data on the characteristics of the local population for the Mayor’s Office for People with Disabilities. He must identify areas with large populations of older people to tell the Metropolitan Transportation Authority where to send the buses. He must identify neighborhoods with significant vulnerable populations so that the Department of Health and Mental Hygiene can install emergency generators in its facilities. But the products of the federal statistical system are not giving him the value he needs. The most recent data from the leading source on the U.S. population, the ACS, is released two years after collection, and itself reflects five-year moving averages.
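As a rough illustration of why that matters for timeliness, here is a minimal pandas sketch, with made-up population figures, showing how a five-year moving average of the kind the ACS publishes smooths over recent change:

```python
import pandas as pd

# Hypothetical annual population estimates for one neighborhood (made-up numbers).
annual = pd.Series(
    [50_000, 50_800, 51_500, 52_900, 54_200, 55_100],
    index=range(2014, 2020),
)

# A five-year estimate averages the five most recent years of collection,
# so a sudden shift shows up only gradually in the published figures.
five_year = annual.rolling(window=5).mean().dropna()
print(five_year)
```

The last raw figure (55,100) barely moves the published average, which is exactly the lag an emergency planner like Salvo has to work around.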

Creating value for the consumer is the key to success in the private sector. The challenge for statistical agencies is to set themselves up for success in the same way and produce high-quality data, measured against the same checklist, while providing access to the data and protecting privacy and confidentiality.

The problem is that the checklist for agencies is even longer, with additional requirements, so that Joe Salvo and his counterparts can do their jobs. One requirement, given that the United States is a democracy, is that the statistics be as unbiased as possible, so that all residents, regardless of their characteristics, are counted and treated equally. Correcting the inevitable bias in source data is an important role for statistical agencies. Another requirement is that the data collection be cost effective, so that the taxpayer gets a good deal. A third is that the information collected be consistent over time, so that trends can be easily spotted and addressed. Agencies need outside help from stakeholders and experts to ensure that all of these requirements are met. That requires access to the data, which in turn requires addressing privacy issues.

The value generated when government agencies can provide direct access and produce new measures can be substantial. For example, the same people who bring you the National Weather Service and its weather forecasts, the National Oceanic and Atmospheric Administration, or NOAA, have provided scientists and entrepreneurs with access to data to develop new products, such as forecasting forest fires and providing real-time intelligence services for natural disasters in the United States and Canada. Transit agencies share transportation data with private-sector app developers, who produce high-quality apps offering real-time maps of bus locations, expected arrival times at bus stops, and so on.
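To give a concrete flavor of that kind of data sharing, here is a minimal Python sketch that reads vehicle positions from a GTFS-Realtime feed, the open format many transit agencies use to publish live bus data. The feed URL is a hypothetical placeholder; each agency publishes its own endpoint:

```python
import requests
from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

# Hypothetical endpoint; substitute a real agency's published feed URL.
FEED_URL = "https://transit.example.com/gtfs-realtime/vehicle-positions"

feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(requests.get(FEED_URL, timeout=10).content)

# Print each reporting vehicle's ID and last known position.
for entity in feed.entity:
    if entity.HasField("vehicle"):
        v = entity.vehicle
        print(v.vehicle.id, v.position.latitude, v.position.longitude)
```

A dozen lines of parsing is all it takes for a developer to turn an agency's open feed into a live bus map, which is the point of the example.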

But other cases, where the government holds confidential data, as most statistical agencies do, are different. We need to be able to rely on our government to keep some data very private, but that often means we have to forgo granularity in the data the government produces. If, for example, the IRS released so much information about taxpayers that it was possible to figure out how much money a given individual was making, the public would be outraged.

So government agencies have to worry about two things at once: (1) producing data that is valuable and (2) ensuring that the privacy of the people the data describe is protected. This can be done. Some (smaller) governments have been more successful than others in creating data systems that live up to the checklist of desired features while protecting privacy.

Take the children’s services system, for example. To put children’s services into context: nearly four in ten American children will be referred to their local government for possible child abuse or neglect before the age of eighteen. That’s nearly four million referrals per year. Frontline social workers need to make quick decisions about these referrals. If they’re wrong one way or the other, the potential downside is huge: children who are poorly screened because of inadequate or inaccurate data could be ripped from loving families. Or, conversely, also because of insufficient data, children could be left in abusive homes and die. Additionally, there could be bias in the decisions, leaving Black or LGBTQ parents more likely to be penalized, for example.

In 2014, the Office of Children, Youth and Families (CYF) in Allegheny County, Pennsylvania mobilized to use its internal data prudently and ethically to help social workers do their jobs better. The results gained national attention, as reported in a New York Times Magazine article. CYF hired academic experts to design an automated risk assessment tool that summarizes information about a family to help the social worker make better decisions. The risk score, a number between 1 and 20, draws on much of the family information in the county’s systems, such as child protection records, jail records, and behavioral health records, to predict adverse events that may lead to a child being placed in foster care.
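To make the mechanics concrete, here is a minimal sketch of how a predictive model’s output might be binned into a 1-to-20 score. This is illustrative only, with synthetic data and made-up features; it is not Allegheny County’s actual model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row is a past referral, each column a
# feature drawn from county records (e.g., prior referrals, jail records,
# behavioral health contacts); labels mark whether out-of-home placement
# followed. All values here are synthetic.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = (X @ np.array([2.0, 1.5, 1.0]) + rng.normal(0, 0.5, 1000) > 2.5).astype(int)

model = LogisticRegression().fit(X, y)

def risk_score(features):
    """Map the model's predicted placement probability onto a 1-20 scale."""
    prob = model.predict_proba([features])[0, 1]
    return max(1, int(np.ceil(prob * 20)))

print(risk_score([0.9, 0.8, 0.7]))  # a high-risk referral scores near 20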

An analysis of the effectiveness of this tool showed that a child given the highest possible score at referral (20) is twenty-one times more likely to be admitted to the hospital for a self-inflicted injury, seventeen times more likely to be admitted for physical assault, and 1.4 times more likely to be admitted for an accidental fall than a child with a risk score of 1, the lowest possible. An independent evaluation found that social worker decisions based on the score were more accurate (cases were more likely to be correctly identified as needing help and less likely to be mistakenly identified as not needing it), caseloads decreased, and racial bias was likely reduced. On the eight-item checklist, Allegheny County hit every item. It produced a new product that was used, was cost effective, and yielded real-time, accurate, complete, relevant, accessible, interpretable, granular, and consistent data. And CYF did not violate confidentiality. More important, Allegheny County worked carefully and openly with parent, child, and civil rights advocates to ensure the program was not built behind closed doors. It worked, in other words, to ensure that the new measures were developed and used democratically.

The story of Allegheny County illustrates how new technology can be used to democratize the decision about the ever-present trade-off between the usefulness of a new measure and the risk of compromising privacy. The county took advantage of the potential to create useful information that people and policymakers need while protecting privacy. That potential can be realized in other contexts by making the value of data clearer to the public. While this utility/cost trade-off has typically been made by a small group of experts within an agency, many new tools can democratize the decision-making by providing more information to the public. This chapter discusses in more detail the challenges of, and new approaches to, the utility/cost trade-off. There are many lessons to be learned from past experiences.


