FAIR Data Assessment

Note: This tool is not yet WCAG 2.1 AA compliant, nor optimised for mobile devices.

Welcome to the FAIR Process Framework (FPF) Data Self-Assessment Tool. This tool helps you evaluate and enhance the FAIRness (Findable, Accessible, Interoperable, Reusable) of your dataset or publication (Knowledge Product). It provides actionable insights and recommendations to help you improve your FAIR score. We recommend using this tool before depositing your dataset into repositories like CGSpace or Dataverse to maximise its accessibility and value.

The 12 questions use the FAIR data principles as a guide. As you answer the questions, the score per letter is displayed underneath each letter. The more complete the letters become, the more FAIR your dataset is. An overall score is provided at the end of the page, which you can print or download.

Want to know more? Please click here 

Dataset Information

Respondent Information


If you have any questions, please let us know by sending an e-mail 



The asset has a defined title and contributor or author.
The asset has a defined abstract and issue date.
The asset metadata are described using Dublin Core or CG Core vocabularies.
The asset has further metadata beyond title, author, abstract and issue date.




  The intended data catalog doesn’t offer retention of dataset metadata for public access





Open Standards or W3C Data Formats (e.g., XML, JSON, RDF)
Non-Proprietary Domain-Specific Formats (e.g., NetCDF, FASTA)
Proprietary Domain-Specific Formats (e.g., SAS files, GenBank)
General Proprietary Formats (e.g., Excel, SPSS, MATLAB)
General Open Formats (e.g., CSV, ODS)



AGROVOC
CAB Thesaurus and/or National Agricultural Library Thesaurus
Agronomy Ontology
ENVO and/or Plant Experimental Conditions Ontology
NCBI Taxonomy, EPPO, Mycobank
GAZ and/or Geonames

Origin of data
Citations for reused data
Workflow description for collecting data (machine readable)
Processing history of data
Version history of data



INTRODUCTION
The CABI FPF Data Assessment Tool is based on the DANS SATIFYD tool, originally developed by Eliane Fankhauser, Jerry de Vries, Nina Westzaan and Vesa Åkerman in 2019. It is designed for manual assessment by a data practitioner, prior to publishing a dataset in a shared data pool and data catalog service such as a public data catalog or CG data space.

12 questions

The 12 questions of the tool, three per letter, are based on the FAIR data principles. Questions which typically concern machine readability are those about (meta)data standards such as controlled vocabularies, ontologies and taxonomies. Some questions are posed more than once (e.g. on metadata and data standards or usage licenses), because the topics are relevant to more than one letter. Explanatory text can be found with each question by clicking on the “i” symbol.

Scores

After you answer the questions, each letter is filled with green according to its percentage score. An overall score is provided at the end of the page. The degree to which a dataset can be made FAIR differs per domain: some domains have more standards and shared vocabularies available than others. To do justice to this difference, options to indicate that standards are not available have been added to the tool. Tips to improve the FAIRness of the dataset can be found at each letter by clicking on ‘Want to improve?’.

Background

In 2024 CABI released the first iteration of the FAIR Process Framework (FPF), a collaboration with CGIAR and the Bill & Melinda Gates Foundation. The framework is flexible, sensible, accessible, and evidence-based. It is informed by best practice and designed with people in mind: because it does not have to be followed as a strict series of steps, it helps practitioners tackle a wide range of situations and problems.

Credits

FAIR Data Framework: Data Assessment Tool, CABI

SATIFYD was developed at DANS by Eliane Fankhauser, Jerry de Vries, Nina Westzaan and Vesa Åkerman in 2019.

Now that you have finished your research project, you are on the brink of depositing your research data in a trustworthy long-term repository. Findability is one of the four pillars of the FAIR Guiding Principles.

If you take care of the findability of your data, you will enable search engines to find it and possibly also link it to related sources on the web. Moreover, you will improve the exposure of your research and help researchers to find and potentially reuse your data. Findability generally comes down to giving a proper description of your dataset. This description can be divided into three elements:
  • Persistent links / Persistent Identifiers (provided by CGIAR or other trusted providers)
  • Rich and detailed metadata and additional information
  • Standards: the more standardized terms you use, the more findable your data are. Some domains have specific standards; for other domains there are more generic standards like the Getty Thesaurus of Geographical Names. Using standards will enable peers to find your data through (domain-specific) search engines. This webpage has a comprehensive list of the most used ontologies in agriculture and generic ontologies & thesauri

Persistent Identifiers (PIDs) are critical for ensuring that datasets are findable, accessible, and citable. The repository or system that issues the PID often determines the long-term trustworthiness and visibility of the dataset.

Reputable repositories such as Dataverse, CGIAR GARDIAN, or Agrimetrics offer well-documented metadata standards, active curation, and integration with trusted PID handlers such as DOI (DataCite).

However, datasets may also be hosted in repositories that are newer or have more limited adoption. For instance:

  • Figshare: Widely used by individual researchers and institutions, but may lack domain-specific community adoption.
  • ARK (Archival Resource Key): A flexible system primarily used for digital archives, but less common in research data ecosystems.
  • PURL (Persistent Uniform Resource Locator): Primarily used in government and library systems but less robust for research data workflows.
  • GenBank: The gold standard for genomic data but lacks support for metadata beyond sequence data, making its PIDs less ideal for general datasets.
  • Zenodo Instances: While Zenodo is widely used, some custom instances of Zenodo may be less known or poorly curated.
These repositories and their PIDs can be valuable but may present challenges in terms of domain integration or global interoperability. It is recommended to consider repositories that align with established FAIR principles to maximise the dataset’s impact and reuse potential.

Tip: Where possible, aim to publish datasets in repositories that are widely adopted in your domain or region, with support for rich metadata and integration with trusted PID handlers. For example, trusted repositories like CGIAR GARDIAN or Dataverse are excellent choices.
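A quick way to verify that a DOI-based PID is live is to check that it resolves through the public doi.org resolver. The short Python sketch below is illustrative only; the DOI shown is a placeholder, not a real dataset identifier.

```python
# Illustrative sketch: check that a DOI-based PID resolves via the public doi.org resolver.
# The DOI used in the example is a placeholder, not a real dataset identifier.
import requests

def doi_resolves(doi: str, timeout: int = 10) -> bool:
    """Return True if the DOI redirects to a live landing page (HTTP 2xx after redirects)."""
    response = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=timeout)
    return response.ok

if __name__ == "__main__":
    print(doi_resolves("10.1234/example-dataset"))  # placeholder DOI
```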
Metadata is information that describes an object such as a dataset. It gives context to the research data, providing information about the URL where the dataset is stored, the creator, provenance, purpose, time period, geographic location, access conditions, and terms of use of a data collection. The extent to which metadata is provided for a dataset can vary greatly and affects how findable the dataset is. In a public data catalog or CG data space a minimum number of metadata fields is required in order to successfully deposit a dataset. The minimum metadata, however, is not comprehensive enough to fulfil the requirements of FAIR. The following list covers the items that should be included when aiming for sufficient metadata (a small machine-readable example follows the list):

  • A globally unique Persistent Identifier (PID) e.g. a DOI (provided by public data catalog or CG data space)
  • A title
  • Related people, i.e. the creator of the dataset
  • Other related people who contributed to the dataset
  • Date on which the dataset was completed
  • A description of how the data were created (contextual information)
  • Target group for the dataset deposited (i.e. scientific disciplines)
  • Keywords that describe your data (use controlled vocabularies if available for your field)
  • A license that clearly states the extent to which the data is accessible (a list to choose from is given in public data catalog or CG data space)
  • Temporal coverage: the period of time to which the data relate
  • Spatial coverage: Geographical location of the research area or site
  • Related datasets, resources like publications, websites etc. (digital or analogue)
  • File formats used in the dataset
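As an illustration of what such a record can look like in machine-readable form, the Python sketch below expresses the fields above using Dublin Core-style term names. All values are placeholders, and the exact field names required by your catalog (e.g. Dataverse or a CG data space) may differ.

```python
# Illustrative sketch: the minimum and recommended metadata fields listed above,
# expressed with Dublin Core-style term names. All values are placeholders.
import json

dataset_metadata = {
    "identifier": "https://doi.org/10.1234/example-dataset",   # placeholder PID
    "title": "Example maize trial dataset",
    "creator": "Jane Doe",
    "contributor": ["John Smith"],
    "date": "2024-06-30",                                       # date the dataset was completed
    "description": "How the data were created (contextual information).",
    "audience": "Agronomy researchers",                         # target group
    "subject": ["maize", "crop yield"],                         # keywords, ideally from a controlled vocabulary
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "temporal": "2023-01/2023-12",                              # temporal coverage
    "spatial": "Example research site, Kenya",                  # spatial coverage
    "relation": ["https://doi.org/10.1234/related-paper"],      # related resources (placeholder)
    "format": ["text/csv"],                                     # file formats used
}

print(json.dumps(dataset_metadata, indent=2))
```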

An ORCID ID (Open Researcher and Contributor ID) is a free, unique, and persistent digital identifier that allows researchers to be uniquely and unambiguously identified. It is increasingly required by publishers, funders, and institutions for tracking research outputs.

By linking researchers’ profiles, publications, and datasets across systems, ORCID IDs help make research more findable and ensure easy disambiguation, especially when a researcher shares a common name. ORCID is an international, interdisciplinary, and non-profit organization created by the research community to benefit all stakeholders in the research ecosystem.

Click here for the ORCID website.
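If you want to catch typos before adding an ORCID ID to your metadata, note that the final character of an iD is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below illustrates that calculation; the sample iD is used only as an example.

```python
# Sketch of the ISO 7064 MOD 11-2 check-digit calculation used for ORCID iDs,
# handy for catching typos before adding an iD to dataset metadata.
def orcid_check_digit(base_digits: str) -> str:
    """Compute the final check character from the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def orcid_is_valid(orcid: str) -> bool:
    """Validate an iD written as four hyphen-separated blocks, e.g. 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    return len(compact) == 16 and orcid_check_digit(compact[:15]) == compact[15]

print(orcid_is_valid("0000-0002-1825-0097"))  # sample iD -> True
```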
CONTRADICTION
You answered question 1 with “no metadata”. This won’t allow you to answer the following two questions in F.
CONFLICT
You answered question 1 with “no PID”. This won’t allow you to use Dublin Core or CG Core in F.
Advice to improve Findability
Excellent! Your dataset has a PID (Persistent Identifier) that is managed by CGSpace. This ensures that your dataset is not only findable but also linked to a trusted repository within the CGIAR network, increasing its discoverability and credibility. The use of PIDs managed by trusted authorities like CGSpace promotes long-term accessibility, stability, and reliable citation of your dataset. Well done!
Your dataset has a PID, which is great for ensuring that it can be persistently identified and referenced by others. However, consider migrating your PID to CGSpace for increased visibility within the CGIAR ecosystem. PIDs from trusted, domain-specific repositories like CGSpace enhance the discoverability of your dataset and improve its integration into relevant knowledge-sharing networks.
Your dataset currently lacks a PID (Persistent Identifier). Without a PID, it may be challenging for others to find, access, and cite your dataset reliably. PIDs act as a persistent, globally unique link to your dataset, ensuring it remains findable over time, regardless of changes in storage locations. We strongly encourage you to obtain a PID, preferably through CGSpace, to improve the findability and reusability of your dataset.
You have not yet selected whether your dataset has a PID. Persistent Identifiers (PIDs) are essential for making datasets findable and reliably accessible over time. Please choose an appropriate option to ensure your dataset is properly referenced and cited. If you don’t have a PID yet, we recommend obtaining one, preferably from CGSpace, to enhance the findability of your dataset.
You haven't provided the minimum metadata (title, author). Also, consider adding an abstract and issue date to improve findability.
Well done for including metadata beyond title, author, abstract and issue date. However, you have not included:
  • title
  • contributor/author
  • abstract and issue date
These are critical additions to your dataset's metadata. You should add these as minimum metadata.

    If you haven’t already, consider linking your metadata to shared vocabularies like Dublin Core or CG Core to make it more discoverable and interoperable.
    Great! You've provided metadata using Dublin Core and/or CG Core vocabularies, plus further metadata beyond title, author, abstract and issue date. However, you have not included:
  • title
  • contributor/author
  • abstract and issue date
    These are critical additions to your dataset's metadata. You should add these as minimum metadata to improve findability.
    You haven't provided the minimum metadata (title, author). Also, consider adding an abstract and issue date to improve findability.
    You have defined basic metadata (title and contributor/author) but haven’t yet added an abstract or issue date. To further improve the findability of your dataset, we recommend adding these fields. Additionally, linking this metadata to shared vocabularies like Dublin Core or CG Core will standardize your dataset and make it more accessible and interoperable.
    You’ve added metadata beyond the basics, which is excellent! However, make sure that your metadata (like title, author, and abstract) is linked to shared vocabularies such as Dublin Core or CG Core for better standardization. This will improve your dataset's findability and machine-readability across different systems.
    You’ve defined the basic metadata fields and used shared vocabularies such as Dublin Core or CG Core to standardize your dataset and make it more interoperable and easier to find. Consider adding more detailed metadata, e.g. citation information, spatial coverage, access protocols, curator details and language.
    You have defined basic metadata (title and contributor/author) and used shared vocabularies such as Dublin Core or CG Core to standardize your dataset and make it more interoperable and easier to find, but haven’t yet added an abstract or issue date. Also, consider adding more detailed metadata, e.g. citation information, spatial coverage, access protocols, curator details and language.
    Fantastic! You have provided comprehensive metadata, and you are also using shared vocabularies such as Dublin Core or CG Core to describe it. This makes your dataset highly findable, standardized, and interoperable, ensuring that it is both accessible and reusable by other researchers and systems.
    Good! You've added the ORCID ID, which provides a persistent link to authorship and enhances the findability of the dataset.
    Add an ORCID ID so that a person who contributed to this dataset can be found and unambiguously identified. This will make your dataset more findable and reusable.

    The accessibility of a dataset and its corresponding metadata is essential for researchers to assess and potentially reuse a dataset. The questions that you will find under Accessibility concern the accessibility of the metadata over time, meaning that the repository guarantees that the metadata will be available even if the data itself is no longer available. The automated accessibility of metadata and data by machines is also covered under Accessibility in the FAIR Principles.

    Metadata is a description of your data and is associated with your dataset. It contains key descriptive information such as the title, author, date, and keywords, making your data more findable and ensuring its existence is acknowledged in the research community.

    Making your dataset metadata publicly accessible in trusted data catalogs ensures that the data is findable and reusable. Even if your data is restricted, publishing metadata about it enhances its discoverability and aligns with FAIR principles. If you are unsure whether the metadata exists in a catalog, or if the catalog doesn’t support public metadata retention, this can significantly impact accessibility.
    Personal data refers to any information related to an identified or identifiable living individual. This includes direct identifiers such as names, addresses, and phone numbers, as well as indirect identifiers that, when combined, could lead to the identification of a person. Protecting personal data is critical to comply with ethical guidelines and data protection regulations such as the General Data Protection Regulation (GDPR).

    If your dataset contains personal data, you may need to anonymise or pseudonymise the data before sharing it publicly, or deposit it in a restricted-access repository to safeguard privacy. Additionally, providing clear metadata about the nature of the personal data and any ethical clearance or consent obtained for its use is important for ensuring responsible reuse of the data.

    Click here to learn more about data protection rules and how they affect the sharing of personal data in research.
    When selecting a repository for publishing your dataset, consider the following factors:

    1. Choose a repository that specializes in your field or discipline to ensure your dataset reaches the right audience and is more discoverable by relevant researchers.
    2. Ensure the repository supports FAIR principles, offering good metadata standards, persistent identifiers (PIDs), and clear licensing options.
    3. Select a repository with a proven track record of long-term data preservation, ensuring your dataset will remain accessible and usable in the future.
    4. Consider if the repository allows open access, controlled access, or embargo options, based on your need for data protection, privacy, or timing of data release.
    5. Ensure the repository supports standardized metadata schemas and domain-specific vocabularies, making your data more easily integrated with other systems and datasets.
    6. A good repository will provide a DOI or other Persistent Identifier (PID), ensuring that your dataset can be cited correctly and tracked.
    7. Look for repositories that offer user support, thorough documentation, and guidelines for depositing datasets, ensuring smooth submission and management processes.
    8. Consider whether the repository is recommended or required by your institution, funder, or research community.
    CONTRADICTION
    If your data contains personal data, you won’t be able to choose the CC0 license for your dataset.
    Advice to improve Accessibility
    Well done! Your metadata is publicly accessible in a trusted public data catalog, like CG Datapool (and GARDIAN), Dataverse, Agrimetrics, or similar. This ensures that even if the data itself is no longer available, the metadata remains accessible and can guide researchers to important details about your dataset. Metadata persistence is crucial for long-term research reproducibility and data discoverability.
    Your metadata is not yet publicly accessible in a trusted public data catalog, such as Dataverse, GARDIAN or Agrimetrics. To enhance the findability and accessibility of your dataset, it is essential to make your metadata publicly available in such a repository. This will ensure that even if the dataset is no longer available, its key descriptive information remains discoverable for future research.
    Your intended data catalog doesn’t currently offer retention of dataset metadata for public access. This limits the long-term discoverability and accessibility of your dataset. We recommend considering a different repository that ensures metadata is retained even when the dataset itself is no longer available. Trusted repositories like CG Datapool (and GARDIAN), Dataverse, Agrimetrics, or similar alternatives provide robust metadata preservation and align with best practices for data stewardship. If no alternatives are available, engage with your data management team to explore other solutions.
    Excellent! Your dataset is stored in a trusted data repository like Dataverse, Agrimetrics, or CG Datapool. These repositories ensure long-term preservation, accessibility, and discoverability through the use of persistent identifiers (PIDs) and metadata standards. Keeping your data in a trusted repository aligns well with FAIR principles and enhances its potential for reuse and impact.
    Great! Your dataset is in a public repository like Figshare, Zenodo, or ARK, which provides accessibility and preservation. However, to further enhance its discoverability and ensure long-term management, consider whether a trusted repository like Dataverse, Agrimetrics, or CG Datapool could offer additional benefits aligned with FAIR principles and strategic goals.
    Your data is currently unpublished and stored within your research institute. While this may be necessary for ongoing research, it’s important to plan for its eventual sharing to align with FAIR principles. Check with your investment programme to identify a trusted repository for long-term storage and accessibility. If no repository has been defined, reach out to your Programme Officer to update the Data Management and Access Plan (DMAP) to include a trusted repository.
    It looks like you haven’t yet decided where to store your dataset. It’s crucial to plan for data preservation and accessibility. Consider using a trusted repository like Dataverse, Agrimetrics, or CG Datapool to maximise the impact of your data. Consult your investment programme or Programme Officer to define a strategic repository for your data and ensure alignment with FAIR principles.
    It looks like you haven’t yet decided where to store your dataset. It’s crucial to plan for data preservation and accessibility. We encourage you to consider sharing your data openly to benefit both research and society. Check with your investment programme to find out which data repository they use strategically. If this hasn’t been defined, ask your Programme Officer to update the Data Management and Access Plan to include a repository like CG Datapool, Dataverse or Agrimetrics for long-term storage and accessibility.
    "If you want other researchers to reuse your data, it is important that your data can be integrated in other data(sets). This process of exchanging information between different information systems such as applications, storage or workflows is called interoperability here FAIR Principles.
    The following actions will improve the interoperability of your data:

    • Use standardized controlled vocabularies, taxonomies and/or ontologies (see Question 2) both in describing your data (metadata level) and in your dataset (data level)
    • Use preferred formats (see Question 7) in your dataset
    • Link to other/relevant (meta)data that are online resolvable
    • Add contextual information to your dataset
    • Add files that explain the context in which the research was performed: for example notebooks, version logs, software documentation, documentation about the data collection describing the hypotheses, project history and objectives, documentation of methods used (sampling, data collection process, etc.), and information on access and terms of use
    • Add documentation about the structure of the dataset, for instance a readme.txt file
    • Add documentation about the content of the dataset. Provide a description on the data level, such as a codebook (see the sketch after this list)
    • Add scientific links between your dataset and other datasets (e.g. links to datasets/research papers used within your project, ORCIDs to identify people who worked on the project, persistent links (PIDs) to related research/datasets)
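As an example of data-level documentation, the sketch below writes a simple codebook as a CSV file using Python's standard csv module. The variable names, units and allowed values are placeholders for illustration only.

```python
# Illustrative sketch: write a simple codebook (data-level documentation) as a CSV file.
# Variable names, units and allowed values below are placeholders.
import csv

codebook = [
    {"variable": "plot_id",   "description": "Unique trial plot identifier", "unit": "",     "allowed_values": "P001-P120"},
    {"variable": "yield_kg",  "description": "Grain yield per plot",         "unit": "kg",   "allowed_values": ">= 0"},
    {"variable": "sowing_dt", "description": "Sowing date",                  "unit": "date", "allowed_values": "ISO 8601 (YYYY-MM-DD)"},
]

with open("codebook.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["variable", "description", "unit", "allowed_values"])
    writer.writeheader()
    writer.writerows(codebook)
```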

    File and data formats play a crucial role in ensuring the interoperability of datasets. The choice of format directly affects how easily your data can be exchanged, understood, and reused across different systems and domains.

    Open standards formats (e.g., XML, JSON, RDF) are highly interoperable because they are widely accepted, non-proprietary, and designed to facilitate machine-readable data sharing.

    W3C data formats (e.g., Turtle, SPARQL) are specifically optimized for the semantic web and linked data, offering a high degree of interoperability in web-based and data-driven applications.

    Non-proprietary domain-specific formats (e.g., NetCDF, FASTA) are valuable in specific disciplines, ensuring that data can be understood and reused within a particular scientific community.

    Proprietary domain-specific formats (e.g., SAS files, GenBank) can support interoperability within specific domains but may limit broader usability due to reliance on proprietary software.

    General proprietary formats (e.g., Excel, SPSS, MATLAB) are widely used but pose challenges for interoperability as they are tied to specific software tools, making it harder to share and integrate data with other systems.

    General open formats (e.g., CSV, ODS) are accessible and non-proprietary, making them an excellent choice for basic interoperability, though they may lack some of the richer metadata capabilities of other formats.

    When selecting file formats for your dataset, aim for those that are open, widely recognized, and appropriate for your domain. This will maximize the interoperability and reuse potential of your data.
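For instance, if your data currently live in a proprietary spreadsheet, conversion to open formats can be as simple as the hedged Python sketch below. It assumes a single-sheet Excel file with a placeholder filename and requires pandas (with the openpyxl engine for .xlsx files).

```python
# Minimal sketch: convert a proprietary spreadsheet to open formats.
# Assumes a single-sheet Excel file; the filename is a placeholder.
# Requires pandas plus the openpyxl engine for .xlsx files.
import pandas as pd

df = pd.read_excel("survey_data.xlsx")                        # proprietary input (placeholder name)
df.to_csv("survey_data.csv", index=False)                     # general open format
df.to_json("survey_data.json", orient="records", indent=2)    # open standards format
```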

    Select the primary domain of your dataset to get more specific advice on how to improve interoperability.
    The use of controlled vocabularies, thesauri, and ontologies significantly improves the interoperability of your data. Shared vocabularies like AGROVOC and domain-specific ontologies (e.g., Crop Ontology, Agronomy Ontology) ensure that the metadata and keywords associated with your dataset are understood across different systems. Using common, standardised vocabularies makes your dataset more discoverable and easier to integrate with other datasets.

    If you are using vocabularies such as DCAT, PROV-O, or Dublin Core, you are contributing to making your metadata machine-actionable and interoperable across a wide range of systems. Combining domain-specific and general-purpose vocabularies will further improve the chances of your dataset being found and reused.

    This webpage has a comprehensive list of the most used ontologies in agriculture and generic ontologies & thesauri:

    To learn more about AGROVOC and its application in agricultural research, click here for the UN FAO AGROVOC website.

    Reference data such as identifiers, codes, or metadata schemas from shared vocabularies or ontologies are crucial for the interoperability of your dataset. By aligning your dataset with common reference data like AGROVOC or the CAB Thesaurus, you make it easier for others to understand and integrate your data. Domain-specific ontologies, such as the Crop Ontology or Animal Trait Ontology for Livestock, provide a rich context for the dataset, enabling more specialised research applications.

    To maximise the reuse and interoperability of your dataset, ensure that the reference data you use is openly accessible and widely recognized in your research field. Using persistent identifiers (PIDs) like DOIs for your reference data ensures that they remain accessible over time, increasing the long-term value of your dataset.

    Click here to learn more about the Crop Ontology and its application in agricultural research.
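As an illustration of how keywords can be linked to a shared vocabulary in machine-readable metadata, the Python sketch below uses rdflib with DCAT and Dublin Core terms. The dataset URI and the AGROVOC concept identifier are placeholders, not verified identifiers.

```python
# Hedged sketch: link dataset keywords to controlled-vocabulary concepts using rdflib.
# The dataset URI and the AGROVOC concept ID below are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")
AGROVOC = Namespace("http://aims.fao.org/aos/agrovoc/")

g = Graph()
dataset = URIRef("https://example.org/dataset/123")   # placeholder dataset URI

g.add((dataset, DCTERMS.title, Literal("Example maize trial dataset")))
g.add((dataset, DCAT.keyword, Literal("maize")))       # free-text keyword
g.add((dataset, DCAT.theme, AGROVOC["c_0000000"]))     # placeholder AGROVOC concept URI

print(g.serialize(format="turtle"))
```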
    Advice to improve Interoperability
    Excellent choice! By selecting Open Standards and W3C formats, you are ensuring the highest level of interoperability and accessibility for your dataset. This approach guarantees long-term usability and compatibility with global data systems.
    Consider using Open Standards and W3C formats to enhance your dataset’s interoperability and accessibility. These formats are widely supported and ensure your data remains reusable across diverse platforms and systems.
    Well done for using domain-specific formats! These formats ensure that your dataset is highly interoperable within your field. For broader accessibility, consider complementing these with general open standards like CSV or JSON.
    Domain-specific formats (e.g., FASTA, NetCDF) enhance interoperability within a research domain. Using these formats can improve your dataset's relevance and reuse potential. Consider adopting them if applicable to your research.
    Your dataset includes proprietary formats, which may limit interoperability and accessibility. To improve reusability, we recommend converting these formats to open standards like CSV, JSON, or XML.
    Proprietary formats can restrict access and limit interoperability. Avoid these formats where possible, or convert them to open standards to ensure broader accessibility and reuse.
    While office documents (e.g., Word, Excel) are common, they limit interoperability and reusability. Converting these files to open formats like CSV or XML will improve accessibility and ensure long-term usability.
    Office documents like Word or Excel can reduce interoperability. Avoid using these formats for datasets, and prefer open standards like CSV or XML instead.
    Using PDF files for datasets may severely limit interoperability and reusability. We strongly recommend converting PDF content into open, machine-readable formats like CSV or JSON to enhance future accessibility.
    PDF formats are not ideal for data sharing or interoperability. Avoid using PDF for datasets, and opt for open, machine-readable formats like CSV or JSON.
    Excellent! You're using AGROVOC, common shared vocabularies, domain-specific vocabularies like Crop Ontology, and non-domain vocabularies like Dublin Core and DCAT. This combination provides a comprehensive coverage of both general and domain-specific terms, greatly enhancing the interoperability and findability of your dataset. Keep up the great work!
    Great! You're using AGROVOC, common shared vocabularies, and domain-specific vocabularies. This ensures strong interoperability within the general and domain-specific contexts. You might want to consider adding non-domain vocabularies like Dublin Core to further enhance the reach of your metadata.
    Well done! You're using AGROVOC, common shared vocabularies, and shared non-domain vocabularies like Dublin Core. This combination provides solid interoperability. Consider also adding domain-specific vocabularies like Crop Ontology for keywords to enhance your dataset's relevance to specific research areas and searches.
    Good job! You're using AGROVOC and common shared vocabularies, which are widely recognized. To further enhance the domain specificity of your metadata, consider incorporating domain-specific vocabularies like Crop Ontology or Agronomy Ontology for keywords.
    You're using AGROVOC along with domain-specific and non-domain vocabularies, which ensures both broad and deep coverage of your metadata. This is a strong combination. Keep in mind that adding common shared vocabularies could further enhance interoperability in agricultural contexts.
    You're using AGROVOC and domain-specific vocabularies, which is great for ensuring interoperability both within and across domains. Consider adding common shared vocabularies for additional general metadata coverage.
    You're using AGROVOC along with non-domain vocabularies like Dublin Core, which enhances the general interoperability of your metadata. Consider adding keywords from domain-specific vocabularies to increase relevance within specific research contexts.
    You're using AGROVOC for your keywords, which is a great start! AGROVOC is a widely recognized vocabulary in agriculture. To further improve interoperability and domain specificity, consider adding other vocabularies like common shared vocabularies, Crop Ontology, or Dublin Core.
    You're using common shared vocabularies along with domain-specific and non-domain vocabularies. This provides both general and specific metadata coverage, enhancing the findability and interoperability of your dataset. Consider adding AGROVOC for even broader agricultural relevance.
    You're using common shared vocabularies and domain-specific vocabularies. This ensures strong domain and general metadata coverage. Adding AGROVOC could further enhance the agricultural context and broaden interoperability.
    You're using common shared vocabularies and non-domain vocabularies like Dublin Core for your keywords, which provides a good balance between general and specific metadata. Adding AGROVOC or domain-specific vocabularies will enhance both general and specialized coverage.
    You're using common shared vocabularies for your keywords, which is a solid general vocabulary base. Consider adding AGROVOC for more agricultural relevance, and domain-specific vocabularies to improve the precision of your metadata.
    You're using domain-specific and non-domain vocabularies, which improves interoperability in certain contexts. However, adding general vocabularies like AGROVOC or common shared vocabularies would enhance the overall findability of your dataset.
    You're using domain-specific vocabularies like Crop Ontology for your keywords, which is excellent for domain-level interoperability. However, adding general vocabularies like AGROVOC or common shared vocabularies will ensure broader reuse of your metadata.
    You're using non-domain vocabularies like Dublin Core, which improves general interoperability. However, adding domain-specific vocabularies and agricultural vocabularies like AGROVOC will greatly increase the relevance and findability of your dataset within the agricultural context.
    You haven’t selected any vocabularies that are being used for your dataset keywords. To make your dataset more interoperable and findable, we recommend using shared vocabularies like AGROVOC or common shared vocabularies, as well as domain-specific vocabularies like Crop Ontology.
    Excellent work! You are using shared/public vocabularies, domain-specific vocabularies, and cross-domain vocabularies. This combination ensures that your dataset is highly interoperable, as it leverages both general and specialized reference data. This approach significantly increases the chances of your data being reused and understood across multiple research disciplines.
    Great job! By using shared/public vocabularies and domain-specific vocabularies for your reference data, you’ve taken significant steps toward ensuring interoperability within your field. To further enhance interoperability, consider integrating cross-domain vocabularies that can broaden the applicability and visibility of your dataset across different domains.
    Well done! You are using shared/public vocabularies and cross-domain vocabularies, which boosts the interoperability of your dataset across different fields. Consider also using domain-specific vocabularies for keywords to further enhance the relevance of your data within your research domain.
    Good work! You are using shared/public vocabularies, which increases the interoperability of your dataset. To further improve, you could consider adding domain-specific or cross-domain vocabularies to provide more precise reference data relevant to your field of research.
    You’re on the right track by using domain-specific and cross-domain vocabularies, but consider also using shared/public vocabularies such as AGROVOC. This would help your dataset achieve even greater interoperability and ensure it can be easily integrated with other datasets across the global research community.
    You are using domain-specific vocabularies, which is great for enhancing interoperability within your field. To further increase the reach of your dataset, consider integrating shared/public vocabularies, which can help your data be more widely understood and reused across different research communities.
    You are using cross-domain vocabularies, which can boost interoperability across multiple fields. However, to make your dataset even more widely accessible and reusable, consider adding shared/public vocabularies and domain-specific vocabularies to improve precision and relevance.
    It seems you haven’t selected any vocabularies. For best practices in interoperability, we highly recommend using shared/public vocabularies like AGROVOC, domain-specific vocabularies, and cross-domain vocabularies. These can significantly improve the discoverability and reuse potential of your dataset across various research disciplines.
    The ultimate goal in making data FAIR is to foster reusability. Whether or not datasets are reusable by other researchers is dependent on a number of aspects. One of the preconditions is that the dataset has a usage license which clarifies under which circumstances the data may be reused.

    In order to gain insight into the process of data generation, it is important to describe the data and metadata in as much detail as possible. Think of questions like: Under which circumstances did I/we collect the data? Where does the data come from?

    Moreover, similar to aspects in Findable, Accessible and Interoperable, it is important that you meet the standards in your discipline when describing your data and metadata.

    See the FAIR Principles.
    Provenance information is essential to ensure the reuse and credibility of your data. By including detailed provenance information such as the origin of the data, citations for reused data, workflow descriptions for collecting and processing data, and a version history, you enable future users to understand the context and lineage of your dataset.

    Workflow descriptions in particular help others understand how the data were created and processed, improving transparency and reproducibility. Provenance information also enhances the trustworthiness of the dataset by clearly documenting how the data has evolved over time.

    If you haven't already, consider adding citations for any reused datasets and clearly describe any workflow or versioning applied to the data.

    Click here to learn more about the importance of provenance in data management.
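A minimal machine-readable way to record such provenance is to express it with PROV-O and Dublin Core terms in the dataset's metadata. The Python sketch below (using rdflib) is illustrative only; all URIs and descriptions are placeholders.

```python
# Hedged sketch: record basic provenance with PROV-O and Dublin Core terms via rdflib.
# All URIs and descriptions below are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
dataset = URIRef("https://example.org/dataset/123")           # placeholder dataset URI
source = URIRef("https://doi.org/10.1234/source-dataset")     # placeholder reused dataset
cleaning = URIRef("https://example.org/activity/cleaning-1")  # placeholder processing step

g.add((dataset, PROV.wasDerivedFrom, source))         # origin of data / citation for reused data
g.add((dataset, PROV.wasGeneratedBy, cleaning))       # processing history
g.add((cleaning, DCTERMS.description, Literal("Outlier removal and unit harmonisation")))
g.add((dataset, DCTERMS.hasVersion, Literal("1.1")))  # version history

print(g.serialize(format="turtle"))
```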
    The type of license attached to your dataset plays a significant role in its reusability. Licenses such as Creative Commons (CC0) or Public Domain licenses are ideal for maximising the reuse of your dataset, as they impose very few restrictions on users. More restrictive licenses, such as those that require attribution or restrict access, can limit the reuse potential of your dataset, but may still be necessary in certain cases (e.g., for sensitive data).

    Ensure that the license you choose is clear and widely recognized so that potential users understand the terms under which the data can be reused. The more open the license, the easier it is for others to reuse your data, thus increasing its overall value to the research community.

    Click here to learn more about Creative Commons licenses and how they promote data reuse.
    It is more likely that other researchers reuse your data if the metadata contains domain-specific standards, i.e. (meta)data has the same type, is organised in a standardized way, follows a commonly used or community accepted template, etc. Within different communities and domains minimal standards have been described but, unfortunately, not every domain has standards yet. You could use more generic standards if there are no domain-specific standards. Most of the standards come with instructions on how to use them. A list of standards can be found here:

    FAIR sharing standards

    It is better to use a standard, even if it is imperfect.
    Advice to improve Reusability
    When datasets are reusable, they can be exploited by your domain partners to further global research causes, and the contributor (you) is recognized as contributing to that further research.
    You haven’t included the Origin of data. Adding this information would significantly help other researchers understand the source of your dataset, ensuring better trust in its reusability. Provenance details such as where the data comes from are crucial in establishing the dataset's reliability and encouraging reuse.
    You haven’t included Citations for reused data. Providing citations for any reused data is important for traceability and ensuring that credit is appropriately given. This also helps other researchers track down the original datasets and further increases the transparency and reusability of your dataset.
    You haven’t included a Workflow description for collecting data (machine-readable). Documenting how the data was collected, especially in a machine-readable format, improves reproducibility and allows others to understand and replicate your research process. This is key to ensuring that the dataset remains valuable and reusable in the long term.
    You haven’t included a Processing history of the data. Providing a clear processing history helps users understand any transformations the data has undergone and ensures that they can accurately interpret the data. This information is essential for validating the dataset’s quality and increases the potential for reuse.
    You haven’t included a Version history of the data. A version history can help users identify the most up-to-date and accurate version of the dataset, especially if the data is updated regularly. Tracking versioning also allows for reproducibility, ensuring others can replicate earlier stages of the data if needed.
    You’ve chosen the Creative Commons License (CC0), which is ideal for maximizing reuse. This license removes almost all restrictions on your dataset, allowing anyone to use it freely while promoting wide adoption and reuse. By choosing this license, you’ve ensured that your data is easily reusable, aligning with open science best practices.
    You’ve chosen Public Domain, which is highly recommended for making your data as reusable as possible. Public domain status ensures that anyone can use your data without restrictions, encouraging wider adoption and integration into future research projects. This approach aligns with global trends towards open data and transparency.
    You’ve selected Restricted Access. While this ensures that your data is controlled, it may limit its reusability. Consider transitioning to a more open license, such as a Creative Commons license, to increase the discoverability and reuse of your dataset. By doing so, you'll make it easier for others to build on your work.
    You’ve chosen the Open Government License, which provides some openness but still places conditions on the reuse of your data. To further improve the reusability and accessibility of your dataset, consider shifting to a Creative Commons license. This will increase trust in your data and make it more appealing to a broader audience.
    You’ve selected a Data Sharing Agreement. While this approach ensures controlled use of your dataset, it restricts its reusability. To encourage wider reuse and trust in your data, consider publishing it under a more open license, such as Creative Commons, that facilitates greater sharing and interoperability.
    You’ve chosen an Embargo on your dataset. While this may be necessary in some cases (e.g., pending publications), it significantly limits the current reusability of your data. Once the embargo period is over, consider releasing your dataset under a Creative Commons or Public Domain license to promote open access and reuse.
    You’ve selected Other Access, which indicates that your dataset may not be easily accessible for reuse. To improve its reusability, consider moving your dataset to a repository that supports open licenses like Creative Commons. This will enhance its discoverability and ensure that it can be reused by a wider audience.
    You haven’t selected a specific license, or you are unsure about the applicable license for your dataset. Not having a clear usage license significantly limits the reusability of your dataset. To avoid confusion and ensure legal clarity, choose an open license such as Creative Commons, which will help others understand how they can reuse your data appropriately.
    Not only has it become easier to find your data (see under 'F'), but your metadata now also meets the requirements for proper and correct reuse. This is an important step for your data to become more FAIR.
    Your data meet domain standards to a certain extent. FAIR data should at least contain minimal information standards. Try to look for ways to improve. The more your data and metadata are organized in a standardized way, the better they are suited for re-use! Always try to keep in mind the user-perspective.
    Generic metadata standards are widely adopted. Domain standards, however, are much richer in vocabulary and structure and therefore will help researchers within your discipline to reuse your data. Check whether your domain has specific metadata standards.
    FAIR data should at least contain minimal information standards. Did you check whether there are metadata standards available for your domain? The more your metadata and data are organized in a standardized way, the better they are suited for re-use. Always try to keep in mind the user-perspective.