The Advantages of a Doppelgänger City

Marketing companies own vast amounts of data gleaned from our smartphone apps, which reveal exactly where we’ve been and when. A dataset reviewed by the New York Times shows more than 235 million locations captured from 1.2 million devices in the New York area over a three-day period alone. In a noteworthy article and podcast episode published in December, the newspaper does an excellent job of shaking us out of our complacency, revealing the tragedy of vulnerable individuals whose privacy gets sold “en masse” to the highest bidder. It raises many issues concerning a lack of policy and oversight in the field of location tracking, and exposes its personal, societal, institutional and corporate dimensions.

The staggering numbers involved in location services data gathering could well represent an Orwellian nightmare in the making. As a mobility researcher in the age of big data, however, one eventually becomes inured to such numbers. Instead of viewing them as the doomsday of privacy, one can focus on the promise they hold for building better mobility models. For it’s indeed possible to use such data while protecting people’s privacy.

Privacy protection arms race

There are several approaches to doing this. In the initial phase of their investigation, the journalists’ queries to data providers were met with claims that data were being aggregated or anonymized. Generally, this means that either data points are bundled together so that individuals cannot be told apart, or that identifying information about them is “masked”, i.e. deliberately altered.
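To make the distinction concrete, here is a minimal Python sketch of both ideas. It is an illustration only, not the method of any data provider mentioned above: the function names, the grid cell size, the minimum device count and the noise range are all assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def aggregate_to_grid(points, cell_size=0.01, min_devices=2):
    """Aggregation: count distinct devices per grid cell and drop sparse cells.

    `points` is a list of (device_id, lat, lon) tuples. The cell size and
    the suppression threshold are illustrative assumptions.
    """
    devices_per_cell = defaultdict(set)
    for device, lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        devices_per_cell[cell].add(device)
    # Only cells shared by enough devices are published.
    return {
        cell: len(devices)
        for cell, devices in devices_per_cell.items()
        if len(devices) >= min_devices
    }


def mask(points, max_offset=0.005, seed=0):
    """Masking: drop the device id and shift each point by a random offset."""
    rng = random.Random(seed)
    return [
        (lat + rng.uniform(-max_offset, max_offset),
         lon + rng.uniform(-max_offset, max_offset))
        for _device, lat, lon in points
    ]


# Hypothetical sample: two devices close together, one on its own.
points = [
    ("dev1", 40.7128, -74.0060),
    ("dev2", 40.7135, -74.0055),
    ("dev3", 40.7580, -73.9855),
]
grid_counts = aggregate_to_grid(points)
masked_points = mask(points)
```

In this sketch the lone third device disappears from the aggregated output, while masking keeps every point but blurs its position.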

When it comes to data on people’s movement, however, anonymization is trickier. As the technologies for protecting privacy and anonymizing individual trajectories advance, so do the de-anonymizing algorithms for reconstructing the traces of individuals. This means that a responsible data collector might invest in an array of certified devices, only to find the privacy protection defeated sometime later, in an unending privacy protection arms race.

Synthetic data as an alternative

This is what motivates our team at the Future Cities Laboratory to develop an alternative to typical location masking techniques. What if we could create synthetic location data streams that resemble what is actually sensed through devices, without compromising resolution in time and space and without reproducing any actual trajectory?

In practice, there are very few circumstances in which someone who wants to analyze mobility data needs access to the detailed original data of a specific person. And it’s also possible to work with a deliberately modified data set. In our research, we focus on building synthetic data streams, using techniques that intentionally restrict the actual raw data to machine-eyes-only.

There are several steps to generating synthetic location data. First, raw data from mobile devices are transmitted in a secure and encrypted manner and used to produce audited and certified data aggregates. These can then be deployed to generate synthetic mobility data, which do not differ in their statistical characteristics from the real data. In our lab we’re currently working on two distinct techniques to implement this.
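The general idea of these steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not either of the lab's two techniques: the trip format, the zone labels and the function names are hypothetical, and the secure transmission and auditing steps are left out.

```python
import random
from collections import Counter


def aggregate(raw_trips):
    """Reduce raw trips to counts per (origin, destination, hour) cell.

    `raw_trips` holds (device_id, origin_zone, destination_zone, hour)
    tuples. Device ids are discarded here and never leave this step.
    """
    return Counter((o, d, h) for _device, o, d, h in raw_trips)


def synthesize(aggregates, n, seed=0):
    """Draw n synthetic trips whose cell frequencies follow the aggregates."""
    rng = random.Random(seed)
    cells = list(aggregates)
    weights = [aggregates[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n)


# Hypothetical raw data: four trips made by three devices.
raw_trips = [
    ("dev1", "A", "B", 8),
    ("dev2", "A", "B", 8),
    ("dev3", "B", "C", 9),
    ("dev1", "B", "A", 18),
]
aggregates = aggregate(raw_trips)
synthetic_trips = synthesize(aggregates, n=1000)
```

The synthetic trips are sampled from the aggregate counts alone, so they follow the same origin, destination and time-of-day distribution as the raw data without being tied to any device. A realistic pipeline would also need to guard against cells with very small counts, which this sketch does not do.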

These techniques not only represent advances in privacy preservation; more importantly, they stretch the potential of transport modelling. By feeding this synthetic data into state-of-the-art mobility simulation programmes, it’s possible to create an entire “doppelgänger city” to test, probe and experiment with policy decisions, while leaving people in the real world safe and surveillance-free.

Editor’s note: This article was originally published by ETH Zurich and republished here with permission.