Data management plan
Planning data management
Research data and research publications are among the most important outputs of publicly funded research. By default, therefore, all research data and materials produced with funding from the Research Council of Finland are openly available.
The degrees of data openness may justifiably vary, ranging from fully open to strictly confidential. If the research data cannot be made openly available in full, the metadata must be stored in a Finnish or international data finder. From a research ethical and legislative point of view, research data should normally be stored for research verification purposes.
Planning data management enables open access to research data, reduces the risk of losing research data and is an essential part of good scientific practice. The data management plan must be feasible at the site of research, and the measures taken under the plan must comply with good data management practice.
Data management plans are submitted to the Research Council of Finland at two stages:
A. At the application stage
At the application stage, all applicants shall briefly describe their data management on the application form.
Describe the following:
- where the data will be stored and how they will be backed up during the project
- how any legal and ethical issues related to the sharing of data (e.g. ownership, copyright, sensitivity) will be resolved
- where the data or a publishable portion of them will be made available after the end of the project
- If the project does not collect or produce any data fully or partially suitable for reuse, justify why the data cannot be made available even partially.
In the application’s funding plan, the applicant should consider that the costs associated with storing and sharing research data and material are regarded as overheads for the project’s host organisation, but they may also be legitimately accepted as research costs to be covered with Research Council research funding.
If, for instance, a project focuses on processing large amounts of data, or if processing the project’s data requires an exceptional amount of work or time, the researcher may apply for funding for the salary costs of the data processing.
B. After a positive funding decision
A researcher who has been granted funding must submit the actual data management plan within eight weeks of the funding decision.
Make sure to ask your organisation’s data expert for help in good time when writing the data management plan. We recommend that you use the DMPTuuli tool to draft the plan. The plan should be no more than approximately three pages long.
The full data management plan shall be submitted in the online services in connection with applicant approval but before the approval of the representative of the site of research. The consortium has a joint data management plan, and only the consortium PI will submit the plan when approving the funding.
The site of research commits to ensuring that the data management plan can be implemented at the site of research, and that the measures to be taken comply with good data management practice.
The funds can be paid only after both the applicant and the representative of the site of research have accepted them. The system will then notify the finance administration of the site of research of the funding, whereupon the funds will be ready to use.
These detailed guidelines are the same as those published in DMPTuuli that concern calls by the Research Council of Finland. (NB! The Research Council’s research infrastructure calls have their own data management guidelines.)
DMP questions, Research Council of Finland guidelines and best practices
Contents:
- General description of data
- Ethical and legal compliance
- Documentation and metadata
- Storage and backup during the research project
- Opening, publishing and archiving the data after the research project
- Data management responsibilities and resources
Why a data management plan?
Your DMP is a living document where you describe how you will manage your data throughout the research life cycle. Update the plan as your project progresses.
To avoid redundancy, refer to your research plan in your DMP and vice versa. The research plan describes the scientific, analytical and methodological processing of data, whereas the DMP describes the technical and administrative management of data.
In the data management context, ‘data’ is understood as a broad term.
Data cover all of the information and material resources your research results are based on. Not all research produces research data that can be reused, but every research process generates and uses data that should be managed properly and systematically throughout the process.
How do I write a DMP?
- First read all of the questions! Answer the questions where applicable and at least the main categories – if a certain question is not applicable in your case, justify why not.
- Include background information such as the name of the applicant and the project, the project number, the funding programme and the version of the DMP. Demonstrate your data management and version control skills, for example, when naming the data management plan.
- Use the data management plan as a risk assessment document. Show that you can recognise, anticipate and handle the risks related to your data management workflow.
- The data management plan should be drawn up from the perspective of your own research project – do not copy/paste examples from somewhere else.
Why should you manage your research data and write a data management plan (DMP)?
- It is good scientific practice and it helps you save time and money!
- You will reduce the risk of losing your data.
- You will be able to anticipate complex ownership and user rights issues in advance.
- The plan supports making your data FAIR: Findable, Accessible, Interoperable and Reusable. This will increase data reuse as well as the visibility of your project.
- You will meet your funder’s requirements.
- Your data management plan reflects your skills as a project leader.
In the plan, you can concentrate on the data for which you are responsible.
The data management plan should describe how the data will be managed throughout the lifecycle of the research. The plan is a living document, which should be updated as the research progresses.
Research data management practices should aim to follow FAIR principles, that is, the data should be Findable, Accessible, Interoperable and Re-usable.
1. General description of data
1.1 What kinds of data is your research based on? What data will be collected, produced or reused? What file formats will the data be in? Additionally, give a rough estimate of the size of the data produced/collected.
Briefly describe what types of data you are collecting or producing. In addition, explain what kinds of existing data you will (re)use. List, for example, the types of texts, images, photographs, measurements, statistics, physical samples or code.
Categorise your data in a table or with a clear list, for example:
A) previously collected data reused in this project
B) data collected for this project,
C) data produced as an outcome of the project.
The categorisation (item 1.1) can form a general structure for the rest of the data management plan.
List the file formats. In some cases, the file formats used during the research project may differ from those used in archiving the data after the project. List both. The file format is a primary factor in the accessibility and reusability of your data in the future.
In the plan, describe the required disk space, not how many informants participated in the project. A rough estimation of the size of the data is sufficient (e.g., less than 100 GB, approx. 1 TB or several petabytes).
Tips for best practices
- Use a table or bullet points for a concise way to present data types, file formats, the software used and the size of the data.
- Examples of file formats: .csv, .txt, .docx, .xlsx and .tif.
- Make sure to describe any special or uncommon software necessary to view or use the data, especially if the software is coded or produced in your project.
Avoid overlaps with the research plan! Data analysis and methodological issues related to the data and materials have been described in the research plan.
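The rough size estimate and the list of file formats asked for in 1.1 can be drawn from an existing data folder. The following Python sketch is only an illustration, not part of the Research Council's guidelines. The function names are hypothetical, and it assumes that the data sit in one directory tree on a local or mounted drive.

```python
from collections import defaultdict
from pathlib import Path


def summarise_by_format(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    summary = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group ".CSV" and ".csv" together; files without a suffix get a label.
            ext = path.suffix.lower() or "(no extension)"
            summary[ext][0] += 1
            summary[ext][1] += path.stat().st_size
    return {ext: tuple(values) for ext, values in summary.items()}


def human_readable(num_bytes):
    """Format a byte count using decimal units (1 GB = 1000**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} PB"


# Usage (the folder name "project_data" is a placeholder):
# for ext, (count, total) in sorted(summarise_by_format("project_data").items()):
#     print(ext, count, human_readable(total))
```

The per-format totals can be copied into the table of data types, file formats and sizes recommended in the tips above. For data that have not yet been collected, an estimate based on a pilot dataset is usually enough.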
1.2 How will the consistency and quality of data be controlled?
Explain how the data collection, analysis and processing methods used may affect the quality of the data and how you will minimise the risks related to data accuracy.
Data quality control ensures that no data are accidentally changed and that the accuracy of data is maintained over their entire life cycle. Quality problems may emerge due to the technical handling, converting or transferring of data, or during their contextual processing and analysis.
Tips for best practices
- Adopt and enforce formal version control processes. This can mean, for instance, shared and documented file naming conventions, or everyone working in Git repositories.
- Transcriptions of audio or video interviews should be checked by someone other than the transcriber.
- Analog material should be digitised in the highest resolution possible for accuracy.
- In all conversions, maintaining the original information content should be ensured.
Avoid overlaps with the research plan! Issues related to data analysis, methods and tools have been described in the research plan. Do not include, for example, instrument calibration descriptions in the data management plan.
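A shared file naming convention is easier to enforce when it can be checked automatically. The sketch below assumes a hypothetical convention, `project_description_YYYYMMDD_vNN.ext`, which is not prescribed by these guidelines. Adapt the pattern and the function names to whatever your project has agreed on.

```python
import re

# Hypothetical convention: project_description_YYYYMMDD_vNN.ext
# e.g. "lakes_watertemp_20240115_v02.csv"
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return the parsed name parts, or None if the name breaks the convention."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def latest_versions(filenames):
    """Map each (project, description, ext) to its highest-version file name."""
    latest = {}
    for name in filenames:
        parts = check_name(name)
        if parts is None:
            continue  # non-conforming names are skipped here; report them separately
        key = (parts["project"], parts["description"], parts["ext"])
        version = int(parts["version"])
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, name)
    return {key: name for key, (version, name) in latest.items()}
```

For example, `check_name("Final version (2).xlsx")` returns `None`, which flags the file for renaming. A check of this kind complements, rather than replaces, a version control system such as Git.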
2. Ethical and legal compliance
2.1 What legal issues are related to your data management (for example, GDPR and other legislation affecting data processing)?
All types of research data involve questions of rights and legal and ethical issues. Demonstrate that you are aware of the relevant legislation related to your data processing. If you are handling personal or sensitive information, describe how you will ensure privacy protection and data anonymisation or pseudonymisation.
Tips for best practices
- Check your institutional ethical guidelines, data privacy guidelines and data security policy, and prepare to follow the instructions that are given in these guidelines.
- If your research is to be reviewed by an ethical committee, outline in your data management plan how you will comply with the protocol (e.g., how to remove personal or sensitive information from your data before sharing them to ensure privacy protection).
- Will you process personal data? If you intend to do so, please detail what type of personal data you will collect.
- All data related to an identified or identifiable person is personal data. Information such as names, telephone numbers, location data and information on the congenital diseases of the individual’s grandparents is personal data.
- Read more on the website of the Office of the Data Protection Ombudsman.
Avoid overlaps with the research plan! Detailed research ethical aspects, statements of ethics committees and the use of laboratory animals, etc., have been described in the research plan.
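As an illustration of pseudonymisation, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256 from the Python standard library). This is one possible technique, not a method required by these guidelines, and the function and field names are hypothetical. Note that pseudonymised data are still personal data as long as the key or any other means of re-identification exists, so the key must be stored separately from the data and under restricted access.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace a direct identifier with a keyed, repeatable pseudonym.

    secret_key must be bytes. The same identifier and key always give the
    same pseudonym, so records of one person can still be linked.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # 16 hex characters (64 bits) keep the pseudonym short; use the full
    # digest if the dataset is very large and collisions are a concern.
    return digest.hexdigest()[:16]


def pseudonymise_records(records, secret_key, id_field="name"):
    """Return copies of the records with id_field replaced by a pseudonym."""
    result = []
    for record in records:
        cleaned = {key: value for key, value in record.items() if key != id_field}
        cleaned["pseudonym"] = pseudonymise(record[id_field], secret_key)
        result.append(cleaned)
    return result
```

Removing the name alone rarely makes data anonymous: indirect identifiers such as location data or rare health information may still identify a person, so check the remaining fields as well.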
2.2 How will you manage the rights of the data you use, produce and share?
Describe how you will agree upon the rights of use related to your research data – including the collected, produced and (re)used data of your project. Here, you can employ your categorisation in Question 1.1, as each of its categories involves different rights and licences. Describe the transfer of rights procedures relevant to your project. Describe confidentiality issues if applicable in your project.
License your data!
Tips for best practices
- Agreements on ownership and rights of use should be made as early as possible in the project lifecycle.
- Have you gained consent for data preservation and sharing?
- Follow the funder’s policies.
- It is recommended to make all research data, code and software created within a research project available for reuse, for example, under Creative Commons, GNU, MIT or another relevant licence.
3. Documentation and metadata
How will you document your data to make them findable, accessible, interoperable and re-usable for you and others? What kinds of metadata standards, README files or other documentation will you use to help others understand and use your data?
Data documentation enables datasets and files to be found, used and properly cited by other users (human or computer). Without sufficient documentation, the data cannot be reused.
Documentation includes essential information about the data, for example: a) core metadata for discovery and identification (where, when, why and how the data were collected) and b) descriptive information on how to interpret the data correctly, provided using metadata standards, vocabularies and README files, for instance.
Tips for best practices
- Describe all types of documentation (Readme files, metadata, etc.) you will provide to help secondary users find, understand and reuse your data. Repositories often require the use of a specific metadata standard. Check whether a discipline-specific metadata schema or standard exists that can be adopted.
- Consider how the data will be organised during the project. Describe, for example, your file-naming conventions, version control and folder structure.
- Use research instruments that automatically create standardised metadata formats.
- Identify the types of information that should be captured to enable other researchers to discover, access, interpret, use and cite your data. Know the minimum requirements for data documentation; see, for example, the Qvain User Guide.
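The documentation described above can be kept as a small structured record from which both a human-readable README and a machine-readable file are produced. The following sketch is an assumption-laden example: the field names are illustrative and do not follow any particular metadata standard, so replace them with the schema your repository or discipline requires.

```python
import json

# Illustrative field names only; not taken from any metadata standard.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "description",
    "collection_period",
    "methods",
    "licence",
    "file_formats",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def render_readme(record):
    """Render the record as plain README text, one 'Field: value' line each."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("metadata incomplete, missing: " + ", ".join(missing))
    lines = []
    for field in REQUIRED_FIELDS:
        value = record[field]
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        label = field.replace("_", " ").capitalize()
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


def render_json(record):
    """Serialise the same record for machine reading."""
    return json.dumps(record, indent=2, ensure_ascii=False)
```

Keeping one source record avoids the README and the deposited metadata drifting apart as the project progresses.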
4. Storage and backup during the research project
4.1 Where will your data be stored, and how will they be backed up?
Describe where you will store and back up your data during your research project. Consider who will be responsible for backup and recovery. If there are several researchers involved, create a plan with your collaborators and ensure safe transfer between participants.
Show that you are aware of the storing solutions provided by your organisation. Do not merely refer to the IT services. In the end, you are responsible for your data, not the IT department or the organisation.
Explain in more detail in section 5 how you will preserve and share your data after your research project has ended.
Tips for best practices
- The use of a safe and secure storage provided and maintained by your organisation’s IT support or other reliable IT provider such as CSC is preferable.
- Do not use external hard drives as the main storing option.
- Follow your institution’s data security requirements.
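A backup is only useful if it can be shown to match the original. One common way to check this, sketched below under the assumption that both the working copy and the backup are reachable as directories, is to compare SHA-256 checksums of every file. The function names are hypothetical, and your organisation's storage service may already provide equivalent integrity checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Return {relative_path: checksum} for every file under root."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare(original, backup):
    """Return (missing, changed): files absent from or different in the backup."""
    missing = sorted(set(original) - set(backup))
    changed = sorted(
        name
        for name in original
        if name in backup and original[name] != backup[name]
    )
    return missing, changed
```

Running `compare(build_manifest(data_dir), build_manifest(backup_dir))` after each backup, where both directory names are placeholders, gives a simple record that recovery has been tested rather than assumed.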
4.2 Who will be responsible for controlling access to your data, and how will secured access be controlled?
It is essential to consider data security issues, especially if your data include sensitive data, personal data, politically sensitive information or trade secrets. Describe who has access to your data, what they are authorised to do with the data and how you will ensure the safe transfer of data to your collaborators.
Tips for best practices
- Access controls should always be in line with the level of confidentiality involved.
5. Opening, publishing and archiving the data after the research project
5.1 What part of the data can be made openly available or published? Where and when will the data, or their metadata, be made available?
Describe how you will make data available and findable for reuse. If your data or parts of the data cannot be opened, explain why you publish only metadata.
In the case of sensitive data, which cannot be opened, describe the opening of their metadata. Describe the secured preservation procedure of sensitive data in section 5.2.
The openness of research data promotes its reuse.
Tips for best practices
- You can publish a description (i.e., the metadata) of your data without making the data themselves openly available, which enables you to restrict access to the data.
- Publish your data in a data repository or a data journal.
- Check a registry of research data repositories to find a repository for your data.
- Prefer repositories or publishers that provide persistent identifiers (PID) to enable access and citation to the data via a persistent link (e.g. DOI, URN).
- Remember to check the funder, disciplinary or national recommendations for data repositories.
- It is recommended to make all research data, code and software created within a research project available for reuse, for example, under Creative Commons, GNU, MIT or another relevant licence.
- Avoid overlaps with the publication plan! The research article publication does not equal data publication. A data journal is a publication forum specialised in publishing research data.
5.2 Where will data with long-term value be archived, and for how long?
Briefly describe what part of your data you will preserve, where they will be preserved, and for how long. Long-term preservation means that data are preserved for as long as necessary, which may be several decades or even centuries.
You can categorise your data sets according to the anticipated preservation period:
A) data to be destroyed upon the end of the project
B) data to be archived for a verification period, which varies across disciplines (e.g., 5–15 years)
C) data to be archived for potential re-use (e.g., for 25 years)
D) data with long-term value to be archived by a curated facility for future generations for tens or hundreds of years.
You will need to decide which of your research data to preserve and which to dispose of. Data that are unique or difficult to replicate might have long-term value and be fit for preservation. Special long-term data repositories should be used for digital preservation.
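The preservation categories A to D above can be turned into concrete review dates for each dataset. The sketch below is illustrative only. The retention periods are assumptions chosen from the example ranges given above (10 years for category B, 25 years for category C), and the actual periods depend on your discipline and institution.

```python
from datetime import date

# Illustrative values: B uses 10 years as a mid-range pick from "5-15 years",
# C uses the 25 years given as an example, D has no planned end date.
RETENTION_YEARS = {"A": 0, "B": 10, "C": 25, "D": None}


def retention_end(category, project_end):
    """Return the date until which data are kept, or None if kept indefinitely."""
    years = RETENTION_YEARS[category]
    if years is None:
        return None
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return project_end.replace(year=project_end.year + years, day=28)
```

For a project ending on 31 August 2027, `retention_end("B", date(2027, 8, 31))` gives 31 August 2037 under these assumed periods. Listing each dataset with its category and end date makes it clear who must act, and when, after the project has ended.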
Tips for best practices
- Decisions about preserving data should begin during the data management planning stage, and should take into account, for example, institutional guidance and requirements.
- Use data repositories with a commitment to long-term curation. For example, the Fairdata Digital Preservation Service is dedicated to research datasets that have significant value to the organisation or at the national level, both now and especially in the future. Contact your host organisation for further information on the Service.
6. Data management responsibilities and resources
6.1 Who (e.g., role and institution) will be responsible for data management?
Summarise here all the roles and responsibilities described in the previous answers. Consider who will be responsible for the data resulting from your project after your project has ended.
Tips for best practices
- Outline the roles and responsibilities for data management/stewardship activities, for example, data capture, metadata production, data quality, storage and backup, data archiving, and data sharing. Name the responsible individual(s) where possible.
- For collaborative projects, explain the coordination of data management responsibilities across partners.
- Indicate who is responsible for implementing the data management plan and for ensuring that it is reviewed and, if necessary, revised.
- Consider scheduling regular updates of the data management plan.
6.2 What resources will be required for your data management procedures to ensure that the data can be opened and preserved according to FAIR principles (Findable, Accessible, Interoperable, Re-usable)?
Estimate the resources, such as time and financial costs, needed to manage, share and preserve the data. These may include storage costs, hardware, staff time, the costs of preparing data for deposit and repository charges.
Tips for best practices
- Consider the additional computational facilities and resources that need to be accessed, and what the associated costs will amount to.
- Carefully consider the resources needed to share, store and curate the data.
- Remember to specify your data management costs in the budget, according to funder requirements.