Why I am a Data Policy Nerd

Sam Wu
Published in City as a Service
4 min read · Aug 6, 2019


The rise of insights.

There’s a great, big assumption that with the growing ability to create and collect data, organizations will be able to know more about themselves and how to improve their operations and services.

From Microsoft’s internal inquiry into email usage to the analysis of employees’ facial-recognition data to discover indicators of high performance, the crunching of numbers to uncover trends or hidden solutions affects a wide swath of industries and individuals. But along with this reliance on data-driven insights come many stories of insights that were later found to be unreliable or even biased.

Until the recent data breaches, there had been little interrogation by data analysts or the general public of the collection and usage of personal, behavioral data. Part of that is because it’s difficult to know that you are creating data or that it is being collected. Another aspect is that software development teams don’t often investigate whether they have the right to the product analytics data that users generate when they interact with the organization or its services.

I fully admit that I did not question product development practices in the past, as my intentions were good: I wanted to improve our service, make sure our changes worked as intended, and so on, and that was the job. There was some light effort to ensure we collected the right data points for the questions we were asking to define features, but it wasn’t incredibly thorough or scientific. Nor did we communicate how we were tracking user interactions or what we were trying to learn from our users.

Detail of the Stae Open Data Portal, where governments can share their civic data for the public to explore, as well as publish via API for developers to build with: Stae.co

Now, enter the government.

Over the decades, this push for analytics has spread beyond businesses into our public institutions. People want their tax dollars to fund things that can be proven to work. Philanthropy also relies on the collection of data points and the narratives that can be crafted around them.

And with the rise of data-generating and cloud-computing technologies, this push is happening faster than it did in the business world. Governments, from small cities to intergovernmental institutions, are being pushed to build out analytic capabilities now. Their impact is questioned when they don’t have evidence to back their budgeting.

This isn’t to say that governments don’t already use data in their daily operations. Police departments, after all, were some of the early adopters of a “data-driven” mindset. But most government datasets are operational, collected physically, and highly sensitive. Information collected for the management of social services or public schools cannot be handed around without care.

At the same time, there is a race to become “smart” by installing tiny computers across cities and using the insights they collect to help redesign them. Instead of our data creation being limited to our usage of a specific service, it will come from our every move.

Governments are now on the hook not only to transition from physical to digital, operational to analytic, and internal to inter-agency, but also to integrate data from new technologies, craft meaningful insights and narratives, and all the while improve their services and grapple with equity concerns. This is because, unlike in the earlier experiments in the corporate realm, the users are the general public. And now we know that we are creating data, that it’s being collected, and that it’s being used. The public has the right to government data, with some limitations, but what exactly does that mean? Where is the line between the right to know how government is operating and the right to personal privacy?

Members of the Stae team working with Jasmine McNealy, lawyer and data ethicist, to co-create an ethics framework for our data product and practice. (Source: Stae.co, 2019)

Building a shared ethics.

While the business world has established some frameworks for the safe collection and usage of data, generally referred to as “data governance,” it gets more complicated in the context of political governance. This is exactly the challenge I want to dig into and research in my role at Stae.

What does a helpful and systemic governance structure look like for government-managed data? How does it account for the variety of sources, sensitivity, collection methods, and usage of government data? When can data be shared with the public, across departmental agencies, with a private company, or with no one at all? Should all data be collected? Should all data be released from private to public hands?

Overall, building a robust, flexible data policy is a critical public priority as services and technology increasingly rely upon this data. This is why, in my role leading data strategy and governance at Stae, I review our own internal analytics process and find ways to make it more transparent to our users, so they know what we’re tracking, why, and where that data goes. If we don’t figure out better frameworks now, we will be relying upon very shaky foundations. I look forward to sharing my findings and implementations in future posts :)

Sam Wu is the Head of Data Strategy and Governance for Stae. She has worn multiple hats in the technology space including software engineer, data analyst, product manager, and now policy researcher.


I've worked as a software engineer, data analyst, product manager, policy advisor, etc. Also an activist, with the NYC NAPAWF chapter. https://www.sampswu.com/