Data Management Plan

Planning data management

Research data and research publications are among the most important outputs of publicly funded research. By default, therefore, all research data and materials produced with funding from the Academy of Finland are openly available.

The degrees of data openness may justifiably vary, ranging from fully open to strictly confidential. If the research data cannot be made openly available in full, the metadata must be stored in a Finnish or international data finder. From a research ethical and legislative point of view, research data should normally be stored for research verification purposes.

The planning of data management enables the opening of research data, reduces the risk of the loss of research data and is an essential part of good scientific practice. Data management plans must be feasible at the site of research, and the measures taken under them must comply with good data management practice.

Data management plans are submitted to the Academy of Finland at two stages:

1. At the application stage

At the application stage, all applicants shall briefly describe their data management in section 4.3 of the research plan (‘Open science’).

Describe the following:

  • where the data will be stored and how they will be backed up during the project
  • how any legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved
  • where the data or a publishable portion of them will be made available after the end of the project
  • if the project does not collect or produce any data fully or partially suitable for reuse, why the data cannot be made available even partially.

In the application’s funding plan, the applicant should note that the costs associated with storing and sharing research data and material are regarded as overheads for the project’s host organisation, but they may also be legitimately accepted as research costs to be covered with Academy research funding.

2. After a positive funding decision

A researcher who has received a positive funding decision must submit the actual data management plan within eight weeks of the funding decision. Make sure to ask your organisation’s data expert for help in good time when writing the data management plan. We recommend that you use the DMPTuuli tool to draft the data management plan. The plan should be no more than approximately two pages long.

The actual data management plan shall be submitted in the online services in connection with applicant approval but before the approval of the representative of the site of research. The site of research commits to ensuring that the data management plan can be implemented at the site of research, and that the measures to be taken comply with good data management practice. The funds can be paid only after the applicant and the representative of the site of research have accepted them. The system will then notify the finance administration of the site of research of the funding, whereupon the funds will be ready to use.

These detailed guidelines are the same as those published in DMPTuuli that concern calls by the Academy of Finland. NB! The Academy’s research infrastructure calls have their own data management guidelines.

DMP questions, Academy of Finland guidelines and best practices

Contents:

  1. General description of data
  2. Ethical and legal compliance
  3. Documentation and metadata
  4. Storage and backup during the research project
  5. Opening, publishing and archiving the data after the research project
  6. Data management responsibilities and resources

How do I write a data management plan?

First read all of the questions!

Avoid redundancies with the research plan.

  • The research plan describes the scientific, analytical and methodological processing of data.
  • The data management plan describes the technical and administrative management of data.
  • To avoid redundancy, refer to your research plan in your data management plan.
  • Use the data management plan as a risk assessment document. Show that you can recognise, anticipate and handle the risks related to your data management workflow.
  • The data management plan should be drawn up from the perspective of your own research project; do not copy and paste examples from somewhere else.
  • Only write sentences you yourself understand.
  • Answer the questions where applicable. If a question is not applicable in your case, justify why not.
  • Answer at least the main categories of the questions. Each sub-question does not need to be answered separately.
  • Include background information such as the name of the applicant and the project title, the project number, the funding decision identifier and the version of the data management plan.
  • Demonstrate your data management and version control skills, for example, when naming the data management plan.
  • Follow the organisation’s or funder’s requirements.

Why should I manage my research data and write a data management plan?

  • It is good scientific practice.
  • You will reduce the risk of losing your data.
  • You will be able to anticipate complex ownership and user rights issues in advance.
  • It helps you support open access and create productive future collaborations.
  • You will meet your funder’s requirements.
  • It helps you save time and money.
  • Your data management plan reflects your managerial skills as a project leader.

In the data management context, ‘data’ is understood as a broad term. Data cover all of the information and material resources your research results are based on. In the plan, you can concentrate on the data for which you are responsible.

The data management plan should describe how the data will be managed throughout the lifecycle of the research. The plan is a living document, which should be updated as the research progresses.

Research data management practices should aim to follow FAIR principles, that is, the data should be Findable, Accessible, Interoperable and Re-usable.

1 General description of data

1.1 What kinds of data is your research based on? What data will be collected, produced or reused? What file formats will the data be in? Additionally, give a rough estimate of the size of the data produced/collected.

Briefly describe what types of data you are collecting or producing. In addition, explain what kinds of existing data you will (re)use. List, for example, the types of texts, images, photographs, measurements, statistics, physical samples or codes.

Categorise your data in a table or with a clear list, for example:

A) data collected for this project

B) data produced as an outcome of the project

C) previously collected data reused in this project

D) documents related to research management.

The categorisation follows the licence policy of your datasets. Briefly describe the licence that entitles you to (re)use the data. The categorisation (item 1.1) can form a general structure for the rest of the data management plan.

List the file formats. In some cases, the file formats used during the research project may differ from those used in archiving the data after the project. List both. The file format is a primary factor in the accessibility and reusability of your data in the future.

In the plan, describe the required disk space, not how many informants participated in the project. A rough estimation of the size of the data is sufficient (e.g., less than 100 GB, approx. 1 TB or several petabytes).

Tips for best practices

  • Use a table or bullet points for a concise way to present data types, file formats, the software used and the size of the data.
  • Examples of file formats: .csv, .txt, .docx, .xlsx and .tif.
  • Make sure to describe any special or uncommon software necessary to view or use the data, especially if the software is coded or produced in your project.
  • You can also estimate the amount of data production or collection during the project for a specific time period, such as per week: “The project is producing/collecting approximately 100 GB of data per week.”
  • Avoid overlaps with the research plan! Data analysis and methodological issues related to data and materials should be described in your research plan.
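
If you already hold part of the data, a short script can give you the rough size estimate and the list of file formats asked for in item 1.1. The following is a minimal sketch in Python, not an official tool: the function names are illustrative, and the script simply adds up file sizes per file extension under a folder you choose.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(root):
    """Return the total number of bytes per file extension under root."""
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are compared in lower case; files without an
            # extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            totals[ext] += path.stat().st_size
    return dict(totals)


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} PB"
```

Calling size_by_extension on your own project folder and passing each total to human_readable gives figures precise enough for the plan, where an order of magnitude is sufficient.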

1.2 How will the consistency and quality of data be controlled?

Explain how the data collection, analysis and processing methods used may affect the quality of the data and how you will minimise the risks related to data accuracy.

Data quality control ensures that no data are accidentally changed and that the accuracy of data is maintained over their entire life cycle. Quality problems may emerge due to the technical handling, converting or transferring of data, or during their contextual processing and analysis.

Tips for best practices

  • Transcriptions of audio or video interviews should be checked by someone other than the transcriber.
  • Analogue material should be digitised in the highest resolution possible for accuracy.
  • In all conversions, maintaining the original information content should be ensured.
  • Checksum software should be used.
  • Organise training sessions and set guidelines to ensure that everyone on your research team can implement quality control and anticipate the risks related to the quality of the data.
  • Avoid overlaps with the research plan! Issues related to data analysis, methods and tools should be described in your research plan. Do not include, for example, instrument calibration descriptions in the data management plan.
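
As one illustration of the checksum tip above, the sketch below records a SHA-256 checksum for every file in a folder and later reports which files no longer match. It uses only the Python standard library. The function names and the idea of keeping a manifest are assumptions made for this example; your organisation may recommend a dedicated checksum tool instead.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not fill the memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files whose checksum differs from, or is missing in, new_manifest."""
    return sorted(
        name
        for name, checksum in old_manifest.items()
        if new_manifest.get(name) != checksum
    )
```

Build a manifest before a conversion or transfer, build another one afterwards, and compare the two: an empty list from changed_files means that no recorded file was altered or lost.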

2 Ethical and legal compliance

2.1 What legal issues are related to your data management (e.g., GDPR and other legislation affecting data processing)?

All types of research data involve questions of rights and legal and ethical issues. Demonstrate that you are aware of the relevant legislation related to your data processing. If you are handling personal or sensitive information, describe how you will ensure privacy protection and data anonymisation or pseudonymisation.

Tips for best practices

  • Check your institutional ethical guidelines, data privacy guidelines and data security policy, and prepare to follow the instructions that are given in these guidelines.
  • If your research is to be reviewed by an ethical committee, outline in your data management plan how you will comply with the protocol (e.g., how to remove personal or sensitive information from your data before sharing them to ensure privacy protection).
  • Will you process personal data? If you intend to do so, please detail what type of personal data you will collect.
  • All data related to an identified or identifiable person is personal data. Information such as names, telephone numbers, location data and information on the congenital diseases of the individual’s grandparents is personal data.
  • Read more on the website of the Office of the Data Protection Ombudsman.
  • Avoid overlaps with the research plan! Detailed research ethical aspects, statements of ethics committees and the use of laboratory animals, etc., are described in the research plan.
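
Pseudonymisation can be carried out in several ways. The sketch below shows one common approach, a keyed hash (HMAC with SHA-256) from the Python standard library, which replaces a direct identifier with a stable code. The function names, the field name and the 16-character code length are assumptions made for this example. Note that this is pseudonymisation, not anonymisation: anyone who holds the key can link the codes back to the persons, and other fields in the data may still identify an individual.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Map a direct identifier to a stable pseudonym using a keyed hash."""
    mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # A shortened code is easier to handle; 16 hex characters is an
    # arbitrary choice made for this example.
    return mac.hexdigest()[:16]


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with id_field replaced by a pseudonym."""
    result = []
    for record in records:
        cleaned = dict(record)  # leave the original record untouched
        cleaned[id_field] = pseudonymise(str(record[id_field]), secret_key)
        result.append(cleaned)
    return result
```

The same identifier always receives the same code under the same key, so records can still be linked within the project. Store the key separately from the data and restrict access to it.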

2.2 How will you manage the rights of the data you use, produce and share?

Describe how you will agree upon the rights of use related to your research data, including the collected, produced and (re)used data of your project. Here, you can employ your categorisation from Question 1.1, as each of its categories involves different rights and licences. Describe the transfer of rights procedures relevant to your project. Describe confidentiality issues if applicable in your project.

Tips for best practices

  • Check your organisational data policy for ownership, right of use and right to distribute.
  • Have you gained consent for data preservation and sharing?
  • Agreements on ownership and rights of use should be made as early as possible in the project lifecycle.
  • Consider the funder’s policy.
  • It is recommended to make all research data, code and software created within a research project available for reuse, for example, under Creative Commons, GNU, MIT or another relevant licence.

3 Documentation and metadata

3.1 How will you document your data to make them findable, accessible, interoperable and re-usable for you and others?  What kinds of metadata standards, README files or other documentation will you use to help others understand and use your data?

Data documentation enables datasets and files to be found, used and properly cited by other users (human or computer). Documentation includes essential information regarding the data, for example, where, when, why and how the data were collected, processed and interpreted. Without the proper documentation, your data are useless. Describe the tool, such as Qvain, that you will use to describe your datasets. Do not mention metadata standards if you do not intend to use them. You can anticipate the open accessibility of your data and their description here. In section 5 below, describe in detail which part of your data can be made openly available.

Avoid overlaps with the research plan!

The data-level documentation and details about experiments, analytical methods and the research context belong in the research plan.

In the data management plan, you should concentrate on the study-level documentation.

Tips for best practices

  • Describe all types of documentation (README files, metadata, etc.) you will provide to help secondary users find, understand and reuse your data.
  • Following the FAIR principles will help you ensure the Findability, Accessibility, Interoperability and Re-usability of your data.
  • Know the minimum requirements for data documentation; see, for example, the Qvain Light User Guide.
  • Use research instruments that automatically create standardised metadata formats.
  • Identify the types of information that should be captured to enable other researchers to discover, access, interpret, use and cite your data.
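
A README file can be generated from a small set of fields so that no dataset is shared without basic documentation. The sketch below is an illustration only: the field list is an assumption based on the where, when, why and how questions mentioned above, and it is not the schema of Qvain or of any metadata standard.

```python
# Hypothetical minimum set of documentation fields for this example.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "description",
    "collected",
    "methods",
    "licence",
)


def missing_fields(metadata):
    """Return the required fields that are absent or left empty."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field, "")).strip()
    ]


def render_readme(metadata):
    """Return README text, or raise ValueError if documentation is incomplete."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("Missing documentation fields: " + ", ".join(missing))
    lines = [f"{field.upper()}: {metadata[field]}" for field in REQUIRED_FIELDS]
    return "\n".join(lines) + "\n"
```

Refusing to produce a README when a field is empty is a deliberate choice in this sketch: it turns incomplete documentation into a visible error instead of a silent gap.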

4 Storage and backup during the research project

4.1 Where will your data be stored, and how will they be backed up?

Describe where you will store and back up your data during your research project. Section 5 is the place to explain in more detail how you will preserve and share your data after the project has ended.

Consider who will be responsible for backup and recovery. If there are several researchers involved, create a plan with your collaborators and ensure safe transfer between participants.

Show that you are aware of the storing solutions provided by your organisation. Do not merely refer to the IT services. In the end, you are responsible for your data, not the IT department or the organisation.

Tips for best practices

  • The use of a safe and secure storage provided and maintained by your organisation’s IT support is preferable.
  • Do not use external hard drives as the main storing option.
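
A backup is only useful if the copy is intact. The sketch below copies one file to a backup folder under a time-stamped name and then compares the copy with the original byte by byte. It is a minimal illustration using the Python standard library and does not replace the backup service of your organisation; the function name and the naming pattern are assumptions made for this example.

```python
import filecmp
import shutil
from datetime import datetime, timezone
from pathlib import Path


def back_up_file(source, backup_dir):
    """Copy source into backup_dir under a time-stamped name and verify it."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # UTC time stamp, for example 20240131T120000Z, keeps versions apart.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    # shallow=False forces a comparison of the file contents.
    if not filecmp.cmp(source, target, shallow=False):
        raise IOError(f"Backup of {source} does not match the original")
    return target
```

Two backups of the same file made within the same second would receive the same name, so the later copy would overwrite the earlier one. For routine use, a scheduled service maintained by your organisation remains preferable.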

4.2 Who will be responsible for controlling access to your data, and how will secured access be controlled?

It is essential to consider data security issues, especially if your data include sensitive data, personal data, politically sensitive information or trade secrets. Describe who has access to your data, what they are authorised to do with the data and how you will ensure the safe transfer of data to your collaborators.

Tips for best practices

  • Access controls should always be in line with the level of confidentiality involved.

5 Opening, publishing and archiving the data after the research project

5.1 What part of the data can be made openly available or published? Where and when will the data, or their metadata, be made available?

Describe whether you will make openly available or publish all your data or only parts of the data. If your data or parts of them cannot be opened, please explain why.

For sensitive data that cannot be opened, describe how their metadata will be opened. Describe the secured preservation procedure of sensitive data in section 5.2.

The openness of research data promotes their reuse.

Tips for best practices

  • You can publish a description (i.e., the metadata) of your data without making the data themselves openly available, which enables you to restrict access to the data.
  • Publish your data in a data repository or a data journal.
  • Check re3data.org to find a repository for your data.
  • Remember to check the funder, disciplinary or national recommendations for data repositories.
  • It is recommended to make all research data, code and software created within a research project available for reuse, for example, under Creative Commons, GNU, MIT or another relevant licence.
  • Consider using repositories or publishers that provide persistent identifiers (PID) to enable access to the data via a persistent link (e.g. DOI, URN).
  • Avoid overlaps with the publication plan! The research article publication does not equal data publication. The data journal is a publication forum specialised in publishing research data.

5.2 Where will data with long-term value be archived, and for how long?

Briefly describe what part of your data you will preserve and for how long. Categorise your datasets according to the anticipated preservation period:

A) data to be destroyed upon the end of the project

B) data to be archived for a verification period, which varies across disciplines (e.g., 5–15 years)

C) data to be archived for potential re-use (e.g., for 25 years)

D) data with long-term value to be archived by a curated facility for future generations for tens or hundreds of years.

Describe which parts of the data you will dispose of after the project and how you will destroy them. Describe the access policy to the archived data. Consider using archives with a curation policy.

Tips for best practices

  • Remember to check funder, disciplinary or national recommendations for data archives.

6 Data management responsibilities and resources

6.1 Who (e.g., role and institution) will be responsible for data management (i.e., the data steward)?

Summarise here all the roles and responsibilities described in the previous answers.

Tips for best practices

  • Outline the roles and responsibilities for data management/stewardship activities, for example, data capture, metadata production, data quality, storage and backup, data archiving, and data sharing. Name the responsible individual(s) where possible.
  • For collaborative projects, explain the coordination of data management responsibilities across partners.
  • Indicate who is responsible for implementing the data management plan and for ensuring that it is reviewed and, if necessary, revised.
  • Consider scheduling regular updates of the data management plan.
  • Consider who will be responsible for the data resulting from your project after your project has ended.

6.2 What resources will be required for your data management procedures to ensure that the data can be opened and preserved according to FAIR principles (Findable, Accessible, Interoperable, Re-usable)?

Estimate the resources needed (e.g., time and money) to manage, preserve and share the data. Consider the additional computational facilities and resources that need to be accessed, and what the associated costs will amount to.

Tips for best practices

  • Remember to specify your data management costs in the budget, according to funder requirements.
  • Carefully consider and justify any resources needed to share, store and curate the data. These may include storage costs, hardware, staff time, the costs of preparing data for deposit and repository charges. Note that preparing the data for open access also entails costs and takes time. Explain how all of the above points have been taken into account in the cost estimate.
