Data management plan

Planning data management

Research data and research publications are among the most important outputs of publicly funded research. By default, therefore, all research data and materials produced with funding from the Research Council of Finland are openly available.

The degrees of data openness may justifiably vary, ranging from fully open to strictly confidential. If the research data cannot be made openly available in full, the metadata must be stored in a Finnish or international data finder. From a research ethical and legislative point of view, research data should normally be stored for research verification purposes.

Data management planning enables the opening of research data, reduces the risk of data loss and is an essential part of good scientific practice. Data management plans must be feasible at the site of research, and the measures taken under the plans must comply with good data management practice.

Data management plans are submitted to the Research Council of Finland at two stages:

A. At the application stage

At the application stage, all applicants shall briefly describe their data management in section 4.3 of the research plan (‘Open science’).

Describe the following:

  • where the data will be stored and how they will be backed up during the project
  • how any legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved
  • where the data or a publishable portion of them will be made available after the end of the project
  • If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially.

In the application’s funding plan, the applicant should consider that the costs of storing and sharing research data and material are regarded as overheads of the project’s host organisation, but they may also be accepted as research costs covered by Research Council research funding. If, for instance, a project focuses on processing large amounts of data, or if processing the project’s data requires an exceptional amount of work or time, the researcher may apply for funding for the salary costs of the data processing.

B. After a positive funding decision

A researcher who has received a positive funding decision must submit the actual data management plan within eight weeks of the funding decision. Ask your organisation’s data expert for help in good time when writing the data management plan. We recommend that you use the DMPTuuli tool to draft the plan. The plan should be no more than approximately three pages long.

The actual data management plan is submitted in the online services when the applicant approves the funding, before the representative of the site of research gives their approval. A consortium has a single joint data management plan, and only the consortium leader submits it when approving the funding. The site of research commits to ensuring that the data management plan can be implemented at the site of research and that the measures to be taken comply with good data management practice. The funds can be paid only after both the applicant and the representative of the site of research have accepted the funding. The system will then notify the finance administration of the site of research of the funding, whereupon the funds will be ready to use.

These detailed guidelines are the same as those published in DMPTuuli that concern calls by the Research Council of Finland. NB! The Research Council's research infrastructure calls have their own data management guidelines.

DMP questions, Research Council of Finland guidelines and best practices

Contents:

  1. General description of data
  2. Ethical and legal compliance
  3. Documentation and metadata
  4. Storage and backup during the research project
  5. Opening, publishing and archiving the data after the research project
  6. Data management responsibilities and resources

Why a data management plan?

Your DMP is a living document where you describe how you will manage your data throughout the research life cycle. Update the plan as your project progresses.

To avoid redundancy, refer to your research plan in your DMP and vice versa. The research plan describes the scientific, analytical and methodological processing of data, whereas the DMP describes the technical and administrative management of data.

In the DMP context, ‘data’ is understood as a broad term. Data covers all the information and research material your results are based on.

How do I write a DMP?

  • Read all of the questions first! Answer the questions where applicable and at least the main categories – if a certain question is not applicable in your case, justify why not.
  • Include background information such as the name of the applicant and the project, the project number, the funding programme and the version of the DMP. Demonstrate your data management and version control skills, for example, when considering the name of the DMP file.
  • Use the DMP as a risk evaluation document – it shows that you can recognise, anticipate and handle the risks related to your data management workflow.
  • The DMP should be drawn from your own research project – do not copy from somewhere else and write only sentences you understand.
  • Follow the organisation’s or funder’s requirements.
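
The advice on naming the DMP file can be illustrated with a small sketch. The naming pattern below (project number, short project name, "DMP", version and ISO date) is only one possible convention and is an assumption, not a Research Council requirement. The function name and the example values are hypothetical.

```python
import re
from datetime import date


def dmp_filename(project_number: str, project_name: str,
                 version: str, day: date, ext: str = "pdf") -> str:
    """Build a sortable, self-describing file name for one DMP version.

    The pattern is an illustrative assumption, not a funder requirement.
    """
    # Keep only letters and digits from the project name, joined by hyphens.
    slug = "-".join(re.findall(r"[A-Za-z0-9]+", project_name)).lower()
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain text.
    return f"{project_number}_{slug}_DMP_v{version}_{day.isoformat()}.{ext}"


# Hypothetical project number and name, for illustration only.
print(dmp_filename("000000", "Lake Sediment Study", "1.2", date(2025, 3, 14)))
# prints 000000_lake-sediment-study_DMP_v1.2_2025-03-14.pdf
```

Putting the version and date in the name makes it easy to see which DMP is the latest when the plan is updated during the project.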

Why should you manage your research data and write a data management plan (DMP)?

  • It is good research practice and helps you to save time and money!
  • You will reduce the risk of losing your data.
  • You will be able to anticipate complex ownership and user rights issues in advance.
  • A DMP supports making your data FAIR: Findable, Accessible, Interoperable and Re-usable. This will increase data reuse as well as the visibility of your project.
  • You will meet your funder’s requirements.
  • Your DMP reflects your researcher skills.

1. General description of data

1.1 What kinds of data is your research based on? What data will be collected, produced or reused? What file formats will the data be in? Additionally, give a rough estimate of the size of the data produced/collected.

Briefly describe what types of data you are collecting or producing. In addition, explain what kinds of existing data you will (re)use. List, for example, the types of texts, images, photographs, measurements, statistics, physical samples or code.

Categorise your data in a table or with a clear list, for example:

A) previously collected existing data which is being reused in this project,

B) data collected for this project,

C) data produced as an outcome of the research process.

The categorisation can form a general structure for the rest of the DMP.

List the file formats for each data set. In some cases, the file formats used during the research project may differ from those used in archiving the data after the project. List both. The file format is a primary factor in the accessibility and reusability of your data in the future.

In the DMP, what is important is to describe the required disk space, not how many informants participated in the project. A rough estimation of the size of the data is sufficient, for example, less than 100 GB, approx. 1 TB or several petabytes.

Tips for best practices

  • Use a table or bullet points for a concise way to present data types, file formats, the software used and the size of the data.
  • Examples of file formats: .csv, .txt, .docx, .xlsx and .tif.
  • Make sure to describe any special or uncommon software necessary to view or use the data, especially if the software is coded or produced in your project.
  • You can also estimate the amount of data production or collection during the project for a specific time period, such as per week: “The project is producing/collecting approximately 100 GB of data per week.”
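
One way to fill in such a table is to summarise an existing working folder by file format and to project growth from a weekly rate. The following is a minimal sketch. The function names, and the example of 100 GB per week over a four-year project, are illustrative assumptions.

```python
from collections import defaultdict
from pathlib import Path


def size_by_format(root: str) -> dict[str, int]:
    """Return the total number of bytes per file extension under root."""
    totals: dict[str, int] = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            totals[path.suffix.lower() or "(none)"] += path.stat().st_size
    return dict(totals)


def projected_terabytes(gb_per_week: float, weeks: int) -> float:
    """Project the total data volume in TB (decimal units, 1 TB = 1000 GB)."""
    return gb_per_week * weeks / 1000


if __name__ == "__main__":
    # Summarise the current folder, one line per file format.
    for ext, size in sorted(size_by_format(".").items()):
        print(f"{ext:10s} {size / 1e9:10.3f} GB")
    # Example: 100 GB per week over a four-year project (208 weeks).
    print(f"Projected total: {projected_terabytes(100, 208):.1f} TB")
```

For the DMP, only the rounded result matters, for example "approx. 20 TB over the whole project".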

Avoid overlaps with the research plan! Data analysis and methodological issues related to data and materials should be described in your research plan.

1.2 How will the consistency and quality of data be controlled?

Explain how the data collection, analysis and processing methods used may affect the quality of the data and how you will minimise the risks related to data accuracy.

Data quality control ensures that no data are accidentally changed and that the accuracy of the data is maintained over their entire life cycle. Quality problems may emerge from the technical handling, conversion or transfer of data, or during their contextual processing and analysis.

Tips for best practices

  • Adopt and enforce formal version control processes. This can mean, for example, simply using shared and documented file naming conventions, or everyone in the team working in Git repositories.
  • Transcriptions of audio or video interviews should be checked by someone other than the transcriber.
  • Analogue material should be digitised at the highest resolution possible for accuracy.
  • In all conversions, ensure that the original information content is maintained.
  • Organise training sessions and set guidelines to ensure that everyone in your research group can implement quality control and anticipate the risks related to the quality of the data.
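
Accidental changes during handling, conversion or transfer can be detected by recording a checksum for every file and comparing the records later. The following is a minimal sketch of that idea using SHA-256. The function names are hypothetical, and this is one common technique rather than a procedure prescribed by these guidelines.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were modified, added or removed between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

A manifest built before a transfer or format migration can be compared with one built afterwards. An empty result from changed_files means that the file contents are byte-for-byte identical.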

Avoid overlaps with the research plan! Issues related to data analysis, methods and tools should be described in your research plan, that is, do not include, for example, instrument calibration descriptions here.

2. Ethical and legal compliance

2.1 What legal issues are related to your data management (for example, GDPR and other legislation affecting data processing)?

All types of research data involve questions of rights and legal and ethical issues. Demonstrate that you are aware of the relevant legislation related to your data processing. If you are handling personal or sensitive information, describe how you will ensure privacy protection and data anonymisation or pseudonymisation.

Tips for best practices

  • Check your institutional ethical guidelines, data privacy guidelines and data security policy, and prepare to follow the instructions that are given in these guidelines.
  • If your research is to be reviewed by an ethical committee, outline in your data management plan how you will comply with the protocol (e.g., how to remove personal or sensitive information from your data before sharing them to ensure privacy protection).
  • Will you process personal data? If you intend to do so, please detail what type of personal data you will collect.
  • All data related to an identified or identifiable person is personal data. Information such as names, telephone numbers, location data and information on the congenital diseases of the individual’s grandparents is personal data.
  • Read more on the website of the Office of the Data Protection Ombudsman.
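
As an illustration of pseudonymisation, direct identifiers can be replaced with codes derived from a secret key, so that the same person always receives the same code but the code cannot be recomputed without the key. The following is a minimal sketch using a keyed hash (HMAC-SHA256). The function name, the key and the sample record are hypothetical. Note that pseudonymised data are still personal data for as long as re-identification is possible, and replacing names alone rarely makes a data set anonymous.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from a direct identifier with HMAC-SHA256."""
    mac = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # A shortened hex digest is easier to handle; longer codes collide less.
    return mac.hexdigest()[:length]


# Hypothetical key: in practice, generate it randomly and store it
# separately from the data, with restricted access.
key = b"replace-with-a-randomly-generated-secret"

# Hypothetical record, for illustration only.
records = [{"name": "Example Person", "answer": 4}]

# The shared version keeps the answers but replaces the name with a code.
shared = [{"id": pseudonym(r["name"], key), "answer": r["answer"]} for r in records]
print(shared)
```

Describe in the DMP who holds the key and when it will be destroyed, since destroying the key removes this particular route to re-identification.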

Avoid overlaps with the research plan! Detailed research ethical aspects, statements of ethics committees and the use of laboratory animals, etc., are described in the research plan.

2.2 How will you manage the rights of the data you use, produce and share?

Describe how you will agree upon the rights of use related to your research data – including the collected, produced and (re)used data of your project. Here, you can employ your categorisation from Question 1.1, as each of its categories involves different rights and licences. Describe the transfer of rights procedures relevant to your project. Describe confidentiality issues if applicable in your project. License your data!

Tips for best practices

3. Documentation and metadata

How will you document your data to make them findable, accessible, interoperable and re-usable for you and others? What kinds of metadata standards, README files or other documentation will you use to help others understand and use your data?

Data documentation enables data sets and files to be discovered, used and properly cited by other users (human or computer). Without sufficient documentation the data cannot be reused.

Documentation includes essential information regarding the data, for example a) core metadata (for discovery and identification) stating where, when, why and how the data were collected, and b) descriptive information on how to interpret the data correctly, provided using metadata standards, vocabularies and, for example, README files.

Tips for best practices

  • Describe all the types of documentation (README files, metadata standards, vocabularies etc.) you will provide to help secondary users to understand and reuse your data. Repositories often require the use of a specific metadata standard. Check whether a discipline-specific metadata schema or standard exists that can be adopted.
  • Consider how the data will be organised during the project. Describe, for example, your file-naming conventions, version control and folder structure.
  • Use research instruments that create metadata in standardised formats automatically.
  • Identify the types of information that should be captured to enable other researchers to discover, access, interpret, use and cite your data. See, for example, the Qvain requirements (https://www.fairdata.fi/en/user-guides/qvain-user-guide/#QvainDataset).
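
Core metadata can be kept next to each data file in a machine-readable form and checked for completeness before deposit. The following is a minimal sketch. The field names and the ".metadata.json" suffix are hypothetical and do not follow Qvain or any particular metadata standard, so replace them with the schema required by your repository.

```python
import json
from pathlib import Path

# Hypothetical minimum set of fields; adapt to your repository's schema.
REQUIRED_FIELDS = ("title", "creator", "description",
                   "collected", "method", "license")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def write_sidecar(data_file: Path, record: dict) -> Path:
    """Write the record next to the data file as <name>.metadata.json."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"Missing metadata fields: {', '.join(gaps)}")
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Checking for missing fields at the time of writing catches gaps while the person who collected the data can still fill them in.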

4. Storage and backup during the research project

4.1 Where will your data be stored, and how will they be backed up?

Describe where you will store and back up your data during your research project. Consider who will be responsible for backup and recovery. If there are several researchers involved, create a plan with your collaborators and ensure safe transfer between participants.

Show that you are aware of the storage solutions provided by your organisation. Do not merely refer to IT services. In the end, you are responsible for your data, not the IT department or the organisation.

Describe in more detail in Section 5 how you will preserve and share your data after your research project has ended.

Tips for best practices

  • Preferably use safe and secure storage provided and maintained by your organisation’s IT support or by another reliable IT provider, such as CSC.
  • Do NOT use external hard drives as the main storage option.
  • Follow your institution's data security requirements.
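
Where researchers make additional copies themselves, each copy should be verified rather than assumed to be intact. The following is a minimal sketch of a copy-and-verify step. The function name and folder layout are hypothetical, and such a script supplements the backup service of your organisation rather than replacing it.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source: Path, backup_root: Path) -> list[str]:
    """Copy source into backup_root and return files that failed verification.

    Files deleted from source after an earlier run are not removed from
    the backup.
    """
    target = backup_root / source.name
    # dirs_exist_ok=True lets repeated runs refresh an existing backup.
    shutil.copytree(source, target, dirs_exist_ok=True)
    failed = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = target / original.relative_to(source)
            # shallow=False compares file contents, not only size and time.
            if not copy.is_file() or not filecmp.cmp(original, copy, shallow=False):
                failed.append(original.relative_to(source).as_posix())
    return sorted(failed)
```

An empty list means that every file in the source folder has an identical copy in the backup location.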

4.2 Who will be responsible for controlling access to your data, and how will secured access be controlled?

It is essential to consider data security issues, especially if your data include sensitive data, personal data, politically sensitive information or trade secrets. Describe who has access to your data, what they are authorised to do with the data and how you will ensure the safe transfer of data to your collaborators.

Tips for best practices

  • Access controls should always be in line with the level of confidentiality involved.

5. Opening, publishing and archiving the data after the research project

5.1 What part of the data can be made openly available or published? Where and when will the data, or their metadata, be made available?

Describe how you will make data available and findable for reuse. If your data or parts of the data cannot be opened, explain why you publish only metadata.

In the case of sensitive data, which cannot be opened, describe the opening of their metadata. Describe the secured preservation procedure of sensitive data in section 5.2.

The openness of research data promotes its reuse.

Tips for best practices

  • You can publish a description (i.e., the metadata) of your data without making the data itself openly available, which enables you to restrict access to the data.
  • Publish your data in a data repository or a data journal.
  • Check re3data.org (https://www.re3data.org/) to find a repository for your data.
  • Prefer repositories or publishers that provide persistent identifiers (PIDs) to enable access to and citation of the data via a persistent link (e.g. DOI, URN).
  • Remember to check the funder, institutional, disciplinary or national recommendations for data repositories.
  • It is recommended to make all of the research data, code and software created within a research project available for reuse, for example, under a Creative Commons (https://creativecommons.org/choose/), GNU (https://www.gnu.org/licenses/gpl-3.0.en.html) or MIT license (https://opensource.org/licenses/MIT), or under another relevant license.

Avoid overlaps with the publication plan! The research article publication does not equal data publication. The data journal is a publication forum specialised in publishing research data.

5.2 Where will data with long-term value be archived, and for how long?

Briefly describe what part of your data you will preserve, where it will be preserved, and for how long. Long-term preservation means that data is preserved for as long as necessary, which may be several decades or even centuries.

You can categorise your data sets according to the anticipated preservation period:

A) data to be destroyed upon the end of the project

B) data to be archived for a verification period, which varies across disciplines (e.g., 5–15 years)

C) data to be archived for potential re-use (e.g., for 25 years)

D) data with long-term value to be archived by a curated facility for future generations for tens or hundreds of years.

You will need to decide which of your research data to preserve and which to dispose of. Data that is unique or difficult to replicate might have long-term value and be fit for preservation. Special long-term data repositories should be used for digital preservation.

Tips for best practices

  • Decisions about preserving data should begin during the data management planning stage, and should take into account e.g. institutional guidance and requirements.
  • Use data repositories with a commitment to long-term curation. For example, the Fairdata Digital Preservation Service is dedicated to research datasets that have significant value to the organisation or at the national level, now and especially in the future. Contact your home organisation for further information.
  • Remember to check funder, disciplinary or national recommendations for data archives.

6. Data management responsibilities and resources

6.1 Who (e.g., role and institution) will be responsible for data management?

Summarise here all the roles and responsibilities described in the previous answers. Also, consider who will be responsible for the data resulting from your project after your project has ended.

Tips for best practices

  • Outline the roles and responsibilities for data management/stewardship activities, for example, data capture, metadata production, data quality, storage and backup, data archiving, and data sharing. Name the responsible individual(s) where possible.
  • For collaborative projects, explain the co-ordination of data management responsibilities across partners.
  • Indicate who is responsible for implementing the DMP and for ensuring that it is reviewed and, if necessary, revised.
  • Consider scheduling regular updates of the DMP.

6.2 What resources will be required for your data management procedures to ensure that the data can be opened and preserved according to FAIR principles (Findable, Accessible, Interoperable, Re-usable)?

Estimate the resources, such as time and financial costs, needed to manage, share and preserve the data. These may include storage costs, hardware, staff time, the costs of preparing data for deposit and repository charges.

Tips for best practices

  • Consider whether there will be additional costs from computational facilities or resources that need to be accessed.
  • Account for the resources, both time and money, needed to prepare the data for sharing and preservation (data curation).
  • Remember to specify your data management costs in the budget, according to funder requirements.
