Scholarly Publishing

Data Sharing

Preparing your data for shareability

Effective data sharing requires data to be organized, well documented, and appropriately preserved. Human subjects research requires the informed consent of study participants before data can be shared, so make sure data sharing is addressed in your IRB protocol and informed consent forms. It is also crucial to de-identify data before sharing to reduce the risk that individuals can be identified from a dataset.

Licensing Your Data

When you share your data, it’s important to include a license. A license tells others exactly how they can use your data and how to give you credit. Without a license, people may be unsure what they’re allowed to do, which can lead to confusion and discourage reuse.

Licensing data is different from licensing other open access materials. Because datasets are often combined, reused, and built from many sources, requiring detailed attribution can quickly become complicated and make your data harder to reuse.

To avoid these issues, many researchers choose a license that doesn’t require attribution, such as CC0 or the Open Data Commons Public Domain Dedication and License (PDDL). These licenses make it easier for others to reuse your data without legal uncertainty.

The resources linked below can help you understand what needs to be considered when licensing your data.

Where can data be shared?

Domain Specific Repositories

The NIH supports a large number of domain-specific data sharing repositories. These repositories are described in two lists: one for repositories that allow open submission and access, and one for repositories that may restrict submission and access to specific researchers. Best practices and many policies dictate that data be shared via a domain-specific repository when one is available.

Generalist Repositories

The repositories listed below accept datasets from all research disciplines and are appropriate when a domain-specific repository does not exist. They also accept deposits of other scholarly outputs, such as preprints and software. 

 

Reusing Existing Data

Does permission need to be obtained to reuse data?

In the U.S., facts can't be copyrighted, so most raw data isn’t protected. However, curated data, tables, or visualizations may be protected by copyright.

Even without copyright, licenses help others understand how they can use your data. Some licenses may:

  • Require attribution
  • Limit commercial use
  • Protect participant privacy

Proprietary datasets may have restrictions or fees.

Publishing data under a public domain license (like CC0 or PDDL) supports broad reuse and avoids issues like license conflicts and attribution stacking, where the credit requirements accumulated from many combined sources make reuse difficult.

Find more information about data licensing in the Sharing Data section of this guide.

How should reused datasets be cited?

When citing data, the following elements should be included:

  • who created the dataset
  • what the dataset is named
  • what year the dataset was published or released
  • what version of the dataset was used
  • where the dataset is hosted
  • what unique identifiers have been assigned to the dataset, such as a Digital Object Identifier (DOI) or Archival Resource Key (ARK)
  • what date the dataset was accessed

Note that many data repositories offer features that automatically generate formatted citations for the data they host, which can save you the work of creating the citation from scratch.
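
If a repository does not generate a citation for you, the elements listed above can be assembled by hand or with a short script. The Python sketch below is only an illustration: the `DatasetCitation` class, the `format_citation` function, the ordering and punctuation of the output, and all of the example values (including the DOI) are hypothetical placeholders, not an official citation style. Check the style guide required by your publisher or discipline for the exact format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetCitation:
    """Holds the citation elements listed in this guide (hypothetical helper)."""
    creators: str            # who created the dataset
    title: str               # what the dataset is named
    year: int                # year the dataset was published or released
    version: Optional[str]   # version of the dataset that was used, if any
    host: str                # where the dataset is hosted
    identifier: str          # unique identifier, such as a DOI or ARK
    accessed: date           # date the dataset was accessed


def format_citation(c: DatasetCitation) -> str:
    """Join the elements into one string; the layout is an assumed example."""
    version = f" (Version {c.version})" if c.version else ""
    return (
        f"{c.creators} ({c.year}). {c.title}{version} [Data set]. "
        f"{c.host}. {c.identifier}. Accessed {c.accessed.isoformat()}."
    )


# Placeholder values for illustration only; this is not a real dataset.
example = DatasetCitation(
    creators="Doe, J.",
    title="Example Survey Responses",
    year=2023,
    version="2.1",
    host="Example Repository",
    identifier="https://doi.org/10.1234/example",
    accessed=date(2024, 1, 15),
)
print(format_citation(example))
```

Keeping each element in its own field makes it easy to spot a missing piece, such as the version or access date, before the citation is written out.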