Personal data is any information that can be used to identify a living person, directly or indirectly. This is laid down in the European privacy regulation (GDPR), known in the Netherlands as the AVG. Examples of personal data are:
It is important to know that there are special categories of personal data, also called sensitive data:
By law, processing sensitive personal data is forbidden by default, unless your research meets one of the legal conditions.
The following types of data are always considered sensitive data:
If possible, you should ask for the consent of the persons whose data you will collect or process. These persons are called 'respondents' or 'research persons'. Besides asking whether they consent to participate in your research, you are also obliged to provide information about your research project. This is called 'informed consent'. Explain in the information letter:
Finally, ask for written consent to participate in the research, because you must be able to prove that all respondents/research persons gave their informed consent. Hanze UAS provides a template for informed consent (in Dutch), which researchers can use as an example.
Radboud University, 2017. This video is designed by Rikkert Veldman.
Anonymization
In some cases, it is possible to anonymize data with personal information. This means that you remove all information that can be used to identify individual human beings. Anonymization implies that the removal of this information is irreversible: it won't be possible to restore the original dataset. Don't forget to also remove the data from the trash bin of your personal devices.
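As a rough illustration of the removal step, the sketch below drops identifying fields from a small dataset in Python. The field names and sample records are made up for this example; which fields count as identifying depends on your own data, and removing the obvious ones may not be enough if the remaining fields still identify someone indirectly.

```python
# Hypothetical field names: adjust to the identifying fields in your own dataset.
IDENTIFYING_FIELDS = {"name", "email", "date_of_birth", "postal_code"}


def anonymize(records, identifying_fields=IDENTIFYING_FIELDS):
    """Return new records with every identifying field removed.

    The input records are left untouched, so the original dataset
    still has to be deleted separately for the removal to be irreversible.
    """
    return [
        {key: value for key, value in record.items() if key not in identifying_fields}
        for record in records
    ]


# Made-up sample data, for illustration only.
survey = [
    {"name": "A. Jansen", "email": "a.jansen@example.org", "age_group": "30-39", "score": 7},
    {"name": "B. de Vries", "email": "b.devries@example.org", "age_group": "20-29", "score": 9},
]

anonymized = anonymize(survey)
print(anonymized)
```

Note that the function only produces a cleaned copy. The data counts as anonymized only once the original file, and any copies of it, have been permanently deleted.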
Not all data can be anonymized, for example because doing so would prevent you from analyzing the data, or because personal information is required for further research. An alternative to anonymization is pseudonymization of the data.
Pseudonymization
To pseudonymize a dataset, you create a 'key' to the information that can be used to identify individual persons. You do this by linking a unique code to the information of each respondent. After that, you will create two different sets of data: