Publishing data: an introduction

Publishing research data means making (meta)data findable, citable, and (re)usable under a license that clearly states how the data can be used. It is also often referred to as 'data sharing' (RDNL, 2022).


Publishing data has several advantages: 

  • Increasing the impact of your work by making it available for other research, such as meta-analyses
  • Increasing the visibility of researchers and their work, leading to possible new (interdisciplinary) collaboration
  • Reducing the duplication of data collection
  • Enhancing the transparency of research by making it verifiable and reproducible

Self-check: can I publish my data?

Important aspects when publishing data

Access to data can be restricted because of special conditions that apply to the data, such as privacy regulations or a non-disclosure agreement. There can also be other reasons not to publish the data fully open. Want to know whether your data can be made available with or without restrictions? Please contact the Information Specialist of your research centre.


An alternative that is often allowed is to publish only the metadata of the dataset, or to make the dataset available 'on request'. This protects sensitive data while you keep control over who can use the data and under which conditions.

A license communicates the conditions under which a dataset can be used, for example whether the data can be read, reused, shared and altered. One condition that is often applied to datasets is the requirement to cite the creator of the dataset.


To improve the reusability of the data, we advise choosing a license which:

  • makes the data available to as large an audience as possible;
  • allows as many types of use as possible.

The Open Science policy recommends the use of Creative Commons licenses. If your project is externally funded, also check the license requirements of the funding agency.

One of the advantages of publishing a dataset is that it makes your work citable by yourself and others. A Persistent Identifier (PID) makes this possible: it is a lasting reference to a document, file, webpage or other (physical or digital) object. Many data repositories use the Digital Object Identifier (DOI), which is also widely used by academic journals.


You can also link a dataset to your personal identifier, such as ORCID. This increases the online findability and visibility of your work even more.
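
As an illustration of how a DOI works as a persistent reference: a DOI becomes a web link when it is placed after the resolver address https://doi.org/. The minimal Python sketch below accepts a DOI in a few common notations and returns that link. The function name doi_to_url and the loose format check are assumptions made for this example; they are not part of DataverseNL or any other repository's software.

```python
import re

# Loose shape check (an assumption for this sketch): "10.", a numeric
# registrant code, a slash, and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ link for a DOI given in a common notation."""
    cleaned = doi.strip()
    # Strip a leading resolver address or "doi:" label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


if __name__ == "__main__":
    # The DOI of the example dataset cited further down this page.
    print(doi_to_url("doi:10.34894/7WGFVP"))
```

The check only looks at the shape of the identifier. It does not confirm that the DOI is actually registered.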

If you want to make your data available according to the FAIR principles, it's important to find a suitable repository. The following criteria may be relevant:

  • How well known is the repository among researchers in your field, and how easily can they find it?
  • Does the repository provide unique identifiers, such as DOIs?
  • Does the repository work with metadata standards (such as Dublin Core)?
  • Does the repository guarantee that the data will be archived and remain available in the long term?

We advise depositing research data in a data archive that is certified as a Trusted Digital Repository (TDR). This means that the repository is recognized as trustworthy by an independent third party. A well-known certificate for data repositories is the CoreTrustSeal, initiated by KNAW-DANS.

Where to publish research data

Hanze UAS uses the national data repository DataverseNL for publishing and reusing datasets. DataverseNL makes it possible to store, share and publish data during the research project and for the recommended ten years after the project has been completed. DataverseNL is a Dutch network of data repositories that uses the Dataverse software developed by Harvard University (USA). The software is used worldwide. DataverseNL is managed by Data Archiving and Networked Services (DANS).


Support at Hanze UAS

Hanze UAS has its own DataverseNL account where researchers can make their datasets available. This is an excellent way to comply with the FAIR data requirements of funding agencies. When you publish your dataset, it receives a DOI (a unique, persistent identifier) that makes the dataset citable by yourself and others.

We support this process by preparing the data and metadata for publication, in collaboration with the researcher. Once you have provided the dataset and metadata, the Information Specialist of your research centre takes care of publishing the dataset, after a final review by the researcher. Read more about this process here.



DANS EASY is an online archiving system for the deposition and reuse of research data. EASY contains datasets from the humanities, health sciences, social and behavioral sciences, oral history and geoscience. Moreover, EASY provides access to secured microdata of Statistics Netherlands (CBS) and serves as the E-depot for Dutch archeology.


4TU.ResearchData is an online data repository with a focus on technical research domains. 4TU.ResearchData provides long-term data storage, archiving, access to and curation of research data. The repository was developed in 2010 and aims to provide an online environment where researchers can upload and share data, as well as download and use research data.

In some research domains, it is preferable to publish datasets in a domain-specific repository. One reason could be that such a repository is well known and often used to retrieve data in your field of research. If this is the case, publishing a dataset in a domain-specific repository could add more to your visibility than publishing it in a general-purpose repository such as DataverseNL.


In the Registry of Research Data Repositories, Re3data.org, you can search for suitable repositories in your field of research. You can filter on subject, on whether they provide DOIs or other persistent unique identifiers, or on whether they use certain metadata standards.


A data article or data paper describes a dataset, with details on how the data were collected or generated as well as other properties of the dataset. The purpose of a data article is:


"a data paper describes a dataset, giving details of its collection, processing, software, file formats etc, without the requirement of novel analyses or ground breaking conclusions. It allows the reader to understand the when, how and why data was collected and what the data-product is."


A well-known journal for publishing data articles is Elsevier's Data in Brief.

Data citation

When citing a dataset, it is best to use a well-known international citation style, with elements from the internationally recognized DataCite standard. DataverseNL, DANS EASY and 4TU.ResearchData all use this standard.


Elements of data citation:

  • Name(s) and organisation(s) of the people who produced the dataset
  • Year in which the dataset was produced
  • The title of the dataset
  • The name of the organisation where the dataset is archived
  • The persistent identifier of the dataset


Example of data citation:

Polstra, Louis; Klumpenaar, Desiree; Veldboer, Lex; De Lange, Meta; Keinemans, Sabrina; Potting, Marianne, 2021, "Bridging Differences", https://doi.org/10.34894/7WGFVP, DataverseNL, V2
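
The elements listed above can be combined into a citation string automatically. The following Python sketch is one possible way to do this; it reproduces the order and punctuation of the example citation above. The class name DatasetCitation and its field names are chosen for this illustration and are not taken from the DataCite standard or from DataverseNL.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    authors: list[str]   # each name written as "Family name, Given name"
    year: int            # year in which the dataset was produced
    title: str
    doi: str             # bare DOI, without the https://doi.org/ prefix
    repository: str      # organisation or repository that archives the data
    version: str = ""    # optional version label, e.g. "V2"

    def format(self) -> str:
        # Order and separators follow the example citation on this page.
        parts = [
            "; ".join(self.authors),
            str(self.year),
            f'"{self.title}"',
            f"https://doi.org/{self.doi}",
            self.repository,
        ]
        if self.version:
            parts.append(self.version)
        return ", ".join(parts)


if __name__ == "__main__":
    citation = DatasetCitation(
        authors=[
            "Polstra, Louis", "Klumpenaar, Desiree", "Veldboer, Lex",
            "De Lange, Meta", "Keinemans, Sabrina", "Potting, Marianne",
        ],
        year=2021,
        title="Bridging Differences",
        doi="10.34894/7WGFVP",
        repository="DataverseNL",
        version="V2",
    )
    print(citation.format())
```

Other citation styles order these elements differently, so the format method would need to be adapted to the style your journal or institution requires.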

Sharing data supplementary to a journal article

When you submit your paper to a scientific journal, the publisher might request you to make the underlying data available. There are different options to fulfill this request:

  • Depositing in an online data repository.
  • Making the data available as part of the 'supplementary materials' or as a part of the article. This is only advisable in the case of Open Access journals. When you publish the data in a regular journal, there is a risk that you transfer the usage rights of the data to the publisher, in which case you will need to ask the publisher's permission for future use of these data.


Data availability statement

A 'data availability statement' (also called 'data access statement') is a short paragraph in an article that describes where the data underlying the research results can be accessed. It tells the reader where the data are available and under which conditions they can be accessed. If the data are deposited in an online repository, you can refer to this repository in the data availability statement and add the URL or DOI.

Even if the publisher does not specifically request this, it is advisable to include such a statement in every publication. This makes it easier to validate, cite and reuse the data from your research.
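
For research groups that prepare many publications, such a statement can be generated from a fixed template. The Python sketch below shows one way to do that. The wording of the template and the function name data_availability_statement are assumptions made for this illustration; they are not prescribed by Hanze UAS or by any publisher, so always check the journal's own requirements.

```python
def data_availability_statement(repository: str, doi: str, conditions: str = "") -> str:
    """Build a short data availability statement (illustrative wording only)."""
    statement = (
        "The data that support the findings of this study are available in "
        f"{repository} at https://doi.org/{doi}."
    )
    # Optionally append the access conditions, e.g. a license or 'on request'.
    if conditions:
        statement += f" {conditions.strip()}"
    return statement


if __name__ == "__main__":
    # The DOI of the example dataset cited on this page, used here purely
    # as sample input; the access condition is a made-up example.
    print(
        data_availability_statement(
            repository="DataverseNL",
            doi="10.34894/7WGFVP",
            conditions="Access to the data files is granted on request.",
        )
    )
```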


Example text:

You can find several example texts for different scenarios on this page of Taylor & Francis.
