Documentation to make data (re)usable

To keep data usable in the long run, it is important to accompany it with rich metadata and documentation. This matters not only for later reuse, but also during the research project itself, for example when new colleagues join the project team.

To analyse the data correctly, it is important to clearly describe the goals and context of data collection, so that the data is not interpreted incorrectly. Annotate the research concepts, measurements and variables in a clear way. For reuse of the data, document who collected it and under which conditions it may be reused.

The following elements are important:

  1. Purpose of data collection
  2. Particularities or restrictions of the data 
  3. Date of data collection
  4. Context of data collection (experimental setting, lab conditions or context of interviews)
  5. Description of the variables
  6. Name and version of the software used for data analysis
  7. Explanation of the file names (unless they are self-explanatory)
  8. Clearly specified and documented version of the archived and/or reused data
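
The elements above can be captured in a single machine-readable record that is stored alongside the data. A minimal sketch in Python; all field names and values here are illustrative examples, not part of any metadata standard:

```python
import json

# Hypothetical documentation record covering the eight elements above;
# field names and values are illustrative, not from a standard.
dataset_documentation = {
    "purpose": "Measure the effect of sleep duration on reaction time",
    "restrictions": "Participants aged 18-30 only; no clinical population",
    "collection_date": "2023-05-10 to 2023-07-21",
    "collection_context": "Lab experiment under controlled light conditions",
    "variables": {
        "reaction_ms": "Mean reaction time in milliseconds",
        "sleep_h": "Self-reported sleep duration in hours",
    },
    "analysis_software": "R 4.3.1",
    "file_naming": "rawdata_<participantID>_<session>.csv",
    "data_version": "v1.2, archived 2023-09-01",
}

# Serialising to JSON keeps the record both human- and machine-readable.
print(json.dumps(dataset_documentation, indent=2))
```

Saving such a record as a JSON file next to the data files means the documentation travels with the dataset.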

Different types of metadata

Metadata about the data set:

  • Descriptive metadata such as creator, title, summary
  • Contextual metadata such as location, time, methods of data collection
  • Description of the different types of data and how they should be opened or analysed 
  • Who should or should not be allowed to access the data
  • Contact details of the person who created the data

Metadata can be applied at the dataset level (for example using Dublin Core) but also at the variable level. For FAIR data, it is crucial that a dataset has good metadata about its variables, so that the different values in the data can be interpreted.
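
To illustrate the two levels, dataset-level metadata using Dublin Core element names can be combined with a variable-level description. The Dublin Core element names (`dc:*`) are real; the variable names, codes and values below are hypothetical:

```python
import json

# Dataset-level metadata using Dublin Core element names (dc:*).
dataset_metadata = {
    "dc:creator": "J. Jansen",
    "dc:title": "Sleep and reaction time survey 2023",
    "dc:description": "Reaction-time measurements under varying sleep duration",
    "dc:date": "2023-09-01",
}

# Variable-level metadata: what each column means, its unit, and the
# allowed values. Variable names and codes are illustrative examples.
variable_metadata = {
    "sex": {"label": "Sex of participant",
            "values": {"1": "female", "2": "male"}},
    "sleep_h": {"label": "Self-reported sleep duration", "unit": "hours"},
}

record = {"dataset": dataset_metadata, "variables": variable_metadata}
print(json.dumps(record, indent=2))
```

Without the variable-level entries, a reuser could not tell what the codes `1` and `2` in the `sex` column mean.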

Using metadata standards, or a controlled vocabulary, for the variable-level metadata makes it easier to combine different datasets into a larger one. This is common practice in the field of data science.
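
The benefit of a shared vocabulary can be sketched with two small hypothetical studies: because both code the `sex` variable with the same controlled values, their records can be pooled without recoding.

```python
# Two datasets from different studies. Both use the same variable name
# ("sex") and the same controlled vocabulary ("female"/"male"), so the
# records can be pooled directly. All data here is made up.
study_a = [{"sex": "female", "sleep_h": 7.5},
           {"sex": "male", "sleep_h": 6.0}]
study_b = [{"sex": "female", "sleep_h": 8.0}]

combined = study_a + study_b

# Every pooled record conforms to the shared vocabulary.
assert all(r["sex"] in {"female", "male"} for r in combined)
print(len(combined), "records in the combined dataset")
```

If one study had instead coded the values as `F`/`M`, the datasets would first need to be harmonised before pooling.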

Often, supplementary materials are necessary to interpret and analyse the data, for example the questionnaire or a description of the context in which the data were collected. Such information usually cannot be derived from the raw data files and should be documented explicitly.

Examples of supplementary documentation: 

  • Project plan, description of methodology
  • Codebook
  • Questionnaires
  • Export of Electronic Lab Notebook (ELN)
  • README file that describes the relation between the different folders and files, and other metadata
  • Information about missing data, changes in the data
  • Code or software developed specifically for the research, or a reference to where the software can be found (for example on GitHub)
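
A README file of the kind listed above can be written alongside the data. A minimal sketch; the project name, folder layout, missing-value code and contact address are all placeholders:

```python
from pathlib import Path

# Minimal README template. Folder names, the missing-value code and the
# contact address are placeholders to be replaced per project.
readme_text = """\
# Project: Sleep and reaction time (example)

## Folder structure
- raw/        original, unmodified data files
- processed/  cleaned data used for analysis
- docs/       codebook, questionnaire, methodology

## Missing data
Missing values are coded as -999 in all numeric columns.

## Contact
data.steward@example.org
"""

# Store the README at the top level of the data package.
Path("README.md").write_text(readme_text, encoding="utf-8")
```

Keeping the README in plain text (or Markdown) ensures it stays readable without any special software, long after the project ends.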

To make data truly FAIR, it is important that metadata can be read by both humans and machines (machine-readable). Using metadata standards further increases the findability of the data. These standards are agreements within a (research) community about how a dataset is structured, how information is coded and how the content should be interpreted. A well-known metadata standard is the Dublin Core Metadata Element Set.
