Ethical AI requires ethical data

Last summer, the EU appointed a High-Level Expert Group on Artificial Intelligence, chaired by Pekka Ala-Pietilä, who also leads Finland’s artificial intelligence programme. The Expert Group has now published its Draft Ethics Guidelines for Trustworthy AI. The first draft, published on 18 December, is open for consultation until 1 February.

The draft text deals with the ethics of the use of artificial intelligence, dividing it into 1) the ethical purpose of AI’s use, 2) guidelines on the use and reliability of data and 3) assessment criteria for practical implementation. The aim has been for the draft to focus strictly on the reliable use of artificial intelligence, and Ala-Pietilä’s group has done an excellent job in that respect. Unfortunately, the draft does not review how data is actually collected or the ethics of that collection, even though both are inseparable from the use of artificial intelligence.

Recent news has indicated that the use of data is still akin to something out of the Wild West. Even some governments collect data and use it with no regard for the privacy of citizens. Such operations are only possible using data collected about us and used against us. Thus, a discussion about the ethics of data is more necessary than ever.

Ethical artificial intelligence depends on the data it uses

Before data can be refined using diverse analytics tools, of which artificial intelligence is just one, it must first be collected, organised, edited and stored. All of these phases involve numerous ethical questions.

Artificial intelligence cannot be ethical if its raw material, data, is not ethical.

As a rule, collecting an individual’s data without permission should not be allowed, as the right to privacy is a fundamental right.

Should digital data about us be given privacy protection similar to the confidentiality of correspondence and telephone calls?

Under the General Data Protection Regulation (GDPR), which has applied since May 2018, companies and other organisations in Europe must in many cases request the user’s consent when they collect data and profile people. It is a good start, but follow-up work is required.

Problems with the collection and use of data

The first problem has to do with how consent for the collection of data is requested. Few users understand what they are actually agreeing to when they accept the terms of use of a service. It is the original sin of the lawyers preparing terms of use that the documents they create are only understood by other lawyers. Reading the terms of use of every service is also practically impossible in terms of time. According to researchers at Carnegie Mellon University, it would take the average American 76 work days to read the privacy policies of all of the services they use in a year.

Therefore, particular attention should be paid to the comprehensibility of terms of use, and consent should not be requested for data other than that absolutely required for providing the service.

When granting consent, the person giving consent must get a clear view of what data is collected and how it will be used in the future.

The user should also be informed that the service provider can use the data to make its own interpretations and classifications, and that the information can be combined with other data and with data about other people. The derived data can also be used for commercial purposes.

In the case of some services, the user might have no option other than to accept the terms of use in order to gain access to services they absolutely need, for work or studying, for example. Even in Finland, many public bodies force individuals to use YouTube and Google Docs without understanding that they are simultaneously forcing the user to disclose their own data to these companies for marketing purposes, among other things. Is forcing users to accept an agreement that is unfavourable to them an ethically sustainable practice?

The second question relating to the collection of data is how long the company retains the right to collect information about the user after they have approved the terms of use a single time. It is not reasonable to assume that the user understands, even after reading the terms of use of the service, that they are consenting to data collection that can go on for years.

The third question relating to the collection of data is the amount of data collected. Collecting data about everything people do reduces their ability to remain anonymous, and anonymity also offers protection to the individual. In the future, we might even end up in a situation in which people begin to restrict what they do because they do not want data to be collected about them. This would weaken the protection of individuals and could therefore have direct consequences for our society.

The fourth question, one that should be considered in societal decision-making in particular, relates to how representative the collected data is. How can it be ensured that the data used in analyses is not already biased? Groups that use digital services less than average, such as those in the weakest position in society, might be excluded from data collection. Unless attention is paid to this, social decisions based on the analysis of data might further weaken the position of the individuals outside the scope of data collection.

Rapid development requires self-regulation as regulations lag behind

Legislation and legal practice set boundary conditions for the collection and use of data. The GDPR is a good step in the right direction. The regulation also forces organisations to consider ethical questions about data and its use. However, it is only a signpost, leaving many practical procedures open. Therefore, self-regulation is also needed, and it is now necessary to invest as much as possible in developing the ethics of artificial intelligence and data use through this self-regulation.

We hope that the discussion about digital ethics will reach both decision-makers and ordinary people as broadly as possible and force us all to consider the impacts of data collection and its use on our lives.

Digital ethics should be placed on the agenda of companies and the government before the trust of individuals and customers is undermined. Building trust takes a long time, but it can be destroyed in a moment. Yet trust is vital in a digital society: it enables data to move and allows new digital services that make our day-to-day lives easier to emerge.

Let’s think together about the value choices we would make if we were to design a digital society for future generations from scratch.

#IHAN

Writers

Antti Larsio
