Data Ethics

by Heather Dewey-Hagborg and Joerg Blumtritt
at Strata + Hadoop World Conference Singapore 2016

We generate data wherever we go and whatever we do—and not only through our digital and mobile actions, like searches, purchases, preferences, and interests. In the Internet of Things, we leave behind a broad trace of all kinds of data that is often far more telling than the results of classic social, psychological, or medical research, and we can hardly prevent this data from being collected incidentally, while passing a WiFi router, for instance. Since a multitude of dimensions is tracked, the resulting profiles are so unique that they can no longer be anonymized. Persistent images of ourselves are created that we cannot control.

However, most people do not want to forgo the comfort and opportunities of our data-driven economy (benefits include online shopping, distributed energy production, and precision medicine, to name a few). Data sharing can create huge economic and social value. For example, compared to typical study samples of a few hundred participants, real-world data could support medical research in unprecedented ways. Thus data sharing should be made attractive, but for that to happen, people must have confidence that their goodwill will not be turned against them.

Joerg Blumtritt and Heather Dewey-Hagborg show how to deal with data in a way that is both ethical and economically sound, covering three main threads.

The first level in implementing data ethics is about shaping applications. Privacy by design is already a well-established concept, but it must be extended to data ethics by design, incorporating built-in prevention of potential discrimination, misclassification, and abuse. The design follows the simple patterns of data courtesy—being kind to people and avoiding presumptions. Such design can also be cast into law. In Europe, health insurance companies are legally prevented from using gender to determine pricing; likewise, it is illegal to use data from social media profiles to calculate credit risk.

Second, and even more important, we must be empowered to make use of our data ourselves. We should own our data and decide about its proliferation and use. Data should be as open as possible and shared as simply as possible. Collecting data has to be done in a fair way. Of course, no one can attend to their data explicitly all the time; thus, we need algorithmic agents to handle this task on our behalf.

Third, we need to work even harder to maintain a just and liberal democratic system that offers legal remedies to everyone and enforces good conduct. Malign political leadership on digital steroids might be far worse than its predigital counterparts. At the same time, big data promises nothing less than a smart society with distributed, noncentralized infrastructure that could offer much more freedom to people. Even more than our data-driven economy, we should actively shape our data-driven public sphere.