It is recommended to follow the FAIR principles (Findable, Accessible, Interoperable, and Reusable) in research data management.
Research funders and publishers require opening the background materials of a study as far as possible, taking into account ethical and legal constraints. According to the University of Eastern Finland’s data policy, research material related to research carried out with public funding is, in principle, open. Research data can be opened by depositing it in a national or international data repository or archive. If the research data cannot be opened for further use, then the descriptive information, i.e. metadata, should be shared openly.
Opening research data:
- improves the extensive usability and further utilisation of the research results
- advances research and enables new observations and phenomena to be discovered
- promotes research co-operation
- provides researchers equal opportunities to utilise research data
For a researcher, opening the data is a scientific merit, because achievements in producing and sharing research data count as scientific and societal impact in research work. Opening data is worth adding to your CV. Open data earns a researcher merit through citations to open access research data and to publications produced from the data. Publishing unused data is also recommended. To make research data findable and usable, you should link sufficiently accurate descriptive information, i.e. metadata, to the research data.
UEF eRepository (UEF eRepo) automatically compiles information on research data produced by researchers at the University of Eastern Finland from several different services, such as Etsin, Zenodo, EUDAT and Dryad. It is recommended that research data produced by researchers at the University of Eastern Finland be described in the national Etsin service using the Qvain tool. Even when the data cannot be shared openly, the descriptive metadata can be shared. A public description increases awareness of existing data and can, for example, create opportunities for cooperation, even if the data itself cannot be shared openly. Services suitable for opening research data are presented below.
When making research data openly accessible, the data needs to be in a form that an outsider can understand and reuse. A thoroughly devised data management plan provides guidance as early as the data-gathering phase and makes it easier to open the research data. The shared data must form a clear, self-contained whole. It is advisable to publish your research data in a data archive where potential users can locate it easily.
Opening data step by step:
- Plan data sharing from the beginning of the research. Describe data sharing in your data management plan so that its requirements are already taken into account when you are gathering the data. Define the phase of the research project in which you wish to open your data.
- Find out if there are any ethical, legal or contractual limitations to the opening of data. In some cases, the data can be shared for further use with restrictions, for example, so that the material is made available for research use through an application for a permit. If you cannot open the data, publish descriptive information about it, i.e. metadata, openly.
- Prepare the data for sharing and reuse, for instance, anonymise it if necessary.
- It is recommended to choose an open file format which requires no commercial software and which is compatible with as many operating systems as possible.
- Choose a suitable and reliable data repository or archive in which to deposit the data and make it openly available (national or international, discipline-based or general data repository; more information is available after the next section). Make sure that the chosen repository assigns persistent identifiers (e.g. URN, DOI) to your data.
- Describe and document your data so it can be found and reused. Remember to mention the University of Eastern Finland as the organisation in the descriptive information.
- Define the rights to use your data by licensing your research data. Guidelines for using Creative Commons licenses for research data are available in the next section.
- Advertise your open data in publications or social media. Add information about the open data into your CV, publication list and to your website. You can also write a publication about your data in a data journal.
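The preparation steps above (removing identifying information and choosing an open file format) can be illustrated with a minimal Python sketch. The column names `name` and `email` and the sample rows are hypothetical, and dropping direct identifiers is only a first step: it is an assumption of this sketch, not a complete anonymisation procedure, since indirect identifiers may still allow re-identification.

```python
import csv
import io

# Hypothetical column names treated as direct identifiers (an assumption for this sketch).
DIRECT_IDENTIFIERS = {"name", "email"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows without the direct-identifier columns."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


def to_csv_text(rows):
    """Serialise rows as CSV text, an open format that needs no commercial software."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data, for illustration only.
sample = [
    {"name": "Respondent A", "email": "a@example.org", "age_group": "30-39", "score": "4"},
    {"name": "Respondent B", "email": "b@example.org", "age_group": "40-49", "score": "5"},
]

shareable = strip_identifiers(sample)
csv_text = to_csv_text(shareable)
print(csv_text)
```

In practice, the result would be written to a `.csv` file and checked against the ethical and legal limitations identified earlier before it is deposited.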
Licensing is recommended when opening your data. Licenses let researchers themselves define the rights to use their data: what may be used, when, and by whom. It is important to take into account the requirements of funders and research organisations in addition to laws and research ethics.
The research data’s author, or the person to whom the author has assigned the user rights, determines the terms of use, except where legal or ethical aspects (such as data protection, trade secrets or other confidential information) limit data use and publishing. Furthermore, the Copyright Act and good scientific practice require that the author’s name is mentioned in a proper manner.
For research data and its metadata, the license can be defined, for example, with a Creative Commons license. Creative Commons is designed directly for authors, and licensing has been made easy. The use of a CC license is free of charge and does not require separate permission or registration. The Creative Commons website has a CC license chooser, which helps you choose an appropriate license. You can add a license to your data as text, as an icon (including audio), or in a machine-readable format. A CC0 license (the author renounces all rights) or a CC BY 4.0 license (the author must always be mentioned) is recommended for open research data, and a CC0 license for open metadata. Although the CC0 license does not require a reference to the author, crediting the author is part of good scientific practice. A CC license may not be applicable to all research outputs. For computer software and source code, for example, an MIT license or a GPL license is often recommended.
For more information:
How to License Research Data / Digital Curation Centre
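As a rough illustration of stating a license in machine-readable form, the following Python sketch records the license for the data and for the metadata in a small JSON description. The field names (`data_license`, `metadata_license`, `license_id`, `license_url`) and the dataset title are hypothetical and do not follow the schema of any particular repository; the identifiers are the SPDX short names for the two licenses recommended above.

```python
import json

# SPDX short identifiers and license URLs for the two licenses recommended in the text.
LICENSES = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def license_statement(spdx_id):
    """Return a small machine-readable license statement for a known identifier."""
    if spdx_id not in LICENSES:
        raise ValueError(f"Unknown license identifier: {spdx_id}")
    return {"license_id": spdx_id, "license_url": LICENSES[spdx_id]}


def describe(title, data_license, metadata_license="CC0-1.0"):
    """Build a description with separate licenses for the data and its metadata.

    The structure is a hypothetical example, not a repository schema.
    """
    return {
        "title": title,
        "data_license": license_statement(data_license),
        "metadata_license": license_statement(metadata_license),
    }


record = describe("Example survey dataset", "CC-BY-4.0")
record_json = json.dumps(record, indent=2)
print(record_json)
```

Data repositories normally offer their own license fields, so a file like this would at most accompany the data as supplementary documentation.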
There are several reliable national and international data services that researchers can use to deposit their research data. The researcher can choose the repository that best suits the data and the discipline. It is recommended to use a service which is reliable and stable and enables open access.
Research data can also be opened by publishing it, for example, in a data-focused journal. Data journals are a new form of publication that focuses on publishing research data and information about it. In this case, the data is stored in a repository (recommended by the journal or chosen by the researcher), and in the data article, it is described in more detail, for example, what the data is, how it was collected, and how it can be reused.
Factors affecting repository selection:
• the repository produces an internationally recognised persistent identifier (e.g. DOI, URN)
• discipline-specific approach (e.g. competence in data formats typical of the field of science, the data is findable for other researchers in the same field)
• reliability (e.g. data security, data location, internationally recognised certificates such as CoreTrustSeal)
• a maintenance background organisation (attention should be paid to the terms and conditions of commercial operators)
• the need for long-term preservation (not all data repositories are suitable for long-term preservation)
• data management, i.e. curation (the data repository takes care of measures related to content maintenance and updates to ensure data integrity and usability).
If your field of science has no recommendations for a suitable and reliable data repository, you can use a so-called general repository, such as Zenodo or Dryad. We will briefly introduce discipline-specific and general data repositories, as well as directory services.
National data repositories and archives
Fairdata.fi: IDA
• research data storage service especially during research (the project using the service must be active), but is also available for publishing data
• Finnish higher education institutions and state research institutes (external researchers may be involved in the projects)
• DOI Identifier if the data stored in IDA is described using the Qvain tool
• free for users
• data in Finland
• the service is provided by CSC and organised by the Ministry of Education and Culture
Fairdata.fi: Fairdata Digital Preservation Service (DPS)
• research data preservation service for long-term preservation
• researchers interested in using the service should contact their own organisation's data support (see section on long-term preservation (DPS) below)
• the service can be used by Finnish higher education institutions and state research institutes
• the service is provided by CSC and organised by the Ministry of Education and Culture
The Finnish Social Science Data Archive (FSD) and the Aila service portal
• focuses on social science data, also data from other relevant fields such as Arts and Humanities, Education, and Health Sciences
• data should be anonymised
• curated, certified (CoreTrustSeal)
• DOI/URN Identifier
• free of charge
• data in Finland
• operates in connection with Tampere University and is a Finnish service provider of CESSDA (Consortium of European Social Science Data Archives)
The Language Bank of Finland
• text and speech data
• curated, certified (CoreTrustSeal)
• URN Identifier
• free basic use
• data in Finland
• the service is provided by the national FIN-CLARIN consortium (Finnish universities and research organisations in the background)
International data repositories and archives
EUDAT
• services for data sharing, storage during research and archiving (e.g. B2SHARE, B2DROP, B2SAFE)
• general repository (not suitable for sensitive data)
• not curated
• DOI Identifier
• free (storage limit)
• data in Europe
• maintained by EUDAT CDI (a European network of research organisations); main funders include the European Commission
The European Bioinformatics Institute (EMBL-EBI)
• molecular data resources and bioinformatics services to the scientific community
• several data repositories and a wizard that helps to find the right archive for submitting data
• an intergovernmental organisation of several European countries, including Finland
GBIF, the Global Biodiversity Information Facility
• an international network and data infrastructure funded by the world's governments
• biodiversity data about all types of life on Earth (metadata about undigitized resources, checklist data, occurrence data, sampling-event data).
• DOI Identifier
• free of charge
• only publishes datasets directly from organizations. UEF researchers who wish to publish data in GBIF should first contact datasupport@uef.fi.
Dryad
• general repository (not suitable for sensitive data)
• curated
• DOI Identifier
• data is published exclusively under CC0 licence
• small data publishing charge to recover the core costs of curating and preserving data.
• data in the United States
• managed by a non-profit corporation registered in the United States
Pangaea
• discipline-specific data repository for Earth & Environmental Science
• curated, certified (CoreTrustSeal, World Data System, WMO Information System)
• DOI Identifier
• free of charge
• maintained by German research institutes
Zenodo
• general repository
• not curated
• DOI Identifier
• free of charge
• data in Europe
• maintained by CERN; main funders include the European Commission (OpenAIRE)
Directory Services
CESSDA
• consortium of European Social Science Data Archives
Data repositories
• a list of open data repositories and databases by discipline
• part of an Open Access Directory (OAD) wiki
OpenAIRE
• European open access infrastructure
OpenDOAR
• directory service for data repositories
• lists free and open research data repositories
• joint project involving the University of Nottingham and Lund University; governing body Jisc (Joint Information Systems Committee, UK)
re3data.org
• directory service for data repositories
• various filtering options
• funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
Further use of existing research data is economical and saves limited resources. It is worthwhile for researchers to utilise existing research data in their studies, because
- it speeds up the process
- existing data can serve as reference material for their own data, or their own data can be merged with existing data
- it saves research resources, as they do not need to produce everything themselves.
When using data produced by others, the terms of use for the data must be considered. Terms of use are usually defined by a licence. Data can either be used completely freely or be subject to restrictions.
Research data can be searched via search services, data repositories, archives and portals. When searching, it is useful to utilise general subject headings (the Finto service provides subject headings from different fields of science) that may have been used when describing data. A list of research data services is in the previous section.
Various public actors open the data they produce for reuse:
- Avoindata.fi / open data regarding public administration
- Finnish Meteorological Institute – Open data / Finnish Meteorological Institute’s open data and open source code
- Finnish Transport Infrastructure Agency – Open data / open data regarding traffic and traffic networks
- National Land Survey of Finland – Open data service / self-service for drawing up maps (in Finnish)
- Finnish Institute for Health and Welfare (THL) – Open data / National Institute for Health and Welfare’s open data
- Statistics Finland – Open data and interfaces / open data files and their interfaces
- Traficom – Open data / Finnish Transport and Communications Agency's open data
- Finnish Tax Administration – Open data / Open data provided by Finnish Tax Administration (in Finnish)
- European Union Open Data Portal
Data citation
Research data must be cited like all other sources used in research. According to the national data citation information model, a data reference should consist of the following elements:
creator, title, host organisation, publication time and/or date and persistent identifier.
Useful additional elements are: version, resource type, license status, ORCID, embargo information.
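The elements listed above can be assembled into a reference string, as in the following Python sketch. The class name `DataCitation`, the field names, the order and punctuation of the output, and the sample values (including the DOI) are hypothetical choices for illustration; they are not the layout prescribed by the national model or by any repository, whose own citation guidelines take precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    # Required elements named in the text.
    creator: str
    title: str
    host_organisation: str
    published: str
    persistent_identifier: str
    # Two of the optional elements named in the text.
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def __post_init__(self):
        # Reject a citation in which any required element is missing or empty.
        required = {
            "creator": self.creator,
            "title": self.title,
            "host_organisation": self.host_organisation,
            "published": self.published,
            "persistent_identifier": self.persistent_identifier,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"Missing required elements: {', '.join(missing)}")

    def format(self) -> str:
        """Return the reference as one line (the layout is an assumption)."""
        title = self.title
        if self.version:
            title += f" (version {self.version})"
        if self.resource_type:
            title += f" [{self.resource_type}]"
        return (
            f"{self.creator} ({self.published}). {title}. "
            f"{self.host_organisation}. {self.persistent_identifier}"
        )


# Invented example values, including a placeholder DOI.
citation = DataCitation(
    creator="Researcher, Example",
    title="Example interview dataset",
    host_organisation="Example Data Archive",
    published="2024",
    persistent_identifier="https://doi.org/10.1234/example",
    version="1.0",
    resource_type="dataset",
)
reference = citation.format()
print(reference)
```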
Data repositories and archives usually have guidelines for data citation. Publishers may also have their own guidelines on how to refer to data in journals.
- Tracing data: Data citation roadmap for Finland / FCRD
- FSD guidelines for data citation
- How to Cite Datasets and Link to Publications / Digital Curation Centre