4. Main policy gaps hindering access to data


This chapter uses as a starting point the OECD Recommendation of the Council concerning Access to Research Data from Public Funding and the OECD Principles and Guidelines for Access to Research Data from Public Funding. A survey conducted by the OECD Committee for Scientific and Technological Policy (CSTP) in 2017 investigated the continued relevance of those principles, as well as potential additional principles for the future.

In March 2018, a joint CSTP-OECD Global Science Forum workshop was held under the title: “Towards new principles for enhanced access to public data for science, technology and innovation”. The workshop brought together 30 experts from government bodies, private companies, academia and non-governmental entities to take stock of current policy practices and discuss future policy needs to support enhanced access to data.

Further, the CSTP produced specific case studies of policies that illustrated good practice in policy making promoting enhanced access to data.


Relevance of the 2006 OECD Recommendation of the Council concerning Access to Research Data from Public Funding

The OECD Recommendation of the Council concerning Access to Research Data from Public Funding (OECD, 2006) (hereafter “the Recommendation”) is based on a set of underlying principles (Box 4.1).

Box 4.1. Principles contained in the OECD Recommendation of the Council concerning Access to Research Data from Public Funding

The principles can be summarised as follows:

  • Openness: open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.

  • Flexibility: flexibility requires taking into account the rapid and often unpredictable changes in information and communication technologies (ICTs); the characteristics of different research fields; and the diversity of research systems, legal frameworks and cultures in each member country.

  • Transparency: information on research data and data-producing organisations, and documentation on the data and conditions attached to their use, should be internationally available in a transparent way, ideally through the Internet.

  • Legal conformity: data-access arrangements should respect the legal rights and legitimate interests of all stakeholders in the public enterprise. Access may be restricted for reasons of national security, privacy and confidentiality; trade secrets and intellectual property rights (IPRs); protection of rare, threatened or endangered species; and legal processes.

  • Protection of intellectual property (IP): data-access arrangements should consider the applicability of copyright and other intellectual property laws that may be relevant to publicly funded research databases (as in the case of public-private partnerships).

  • Formal responsibility: access arrangements should promote rules and regulations regarding the responsibilities of the parties involved. They should be developed in consultation with stakeholders and consider such factors as the characteristics of the data, e.g. their potential value for research purposes. Data-management plans and long-term sustainability should also be considered.

  • Professionalism: institutional arrangements for the management of research data should be based on the relevant professional standards and values, embodied in the codes of conduct of the scientific communities involved.

  • Interoperability: access arrangements should consider the relevant international data standards.

  • Quality: the value and utility of data depend on the quality of the data themselves. Particular attention should be paid to compliance with explicit quality standards.

  • Security: attention should be paid to supporting the use of techniques and instruments that guarantee the integrity and security of research data.

  • Efficiency: a central goal of promoting data access and sharing is to improve the efficiency of publicly funded scientific research, to avoid expensive and unnecessary duplication of effort. This also involves performing cost-benefit analysis to define data-retention protocols, engaging data-management specialist organisations, and developing new reward structures for researchers and database producers.

  • Accountability: the performance of data-access arrangements should be subjected to periodic evaluation by user groups, the responsible institutions and research-funding agencies.

  • Sustainability: due consideration should be given to the sustainability of access to publicly funded research data as a key element of the research infrastructure.

Source: OECD (2006), Recommendation of the Council concerning Access to Research Data from Public Funding.

The 2017 Committee for Scientific and Technological Policy (CSTP) survey on policy practice related to access to data for science, technology and innovation (STI) (see Chapter 2) asked respondents to comment on the continued relevance of the Recommendation and implementation issues related to each principle, as well as propose potential new focus areas based on the evolving needs of stakeholders and policymakers. Figure 4.1 shows the survey results.

Interoperability ranked among the most relevant principles cited by survey respondents. Among the principles contained in the Recommendation (hereafter “the Principles”), interoperability included the explicit mention of the standards used, the adoption of best practices by professional organisations active in data collection and preservation, and the consideration of more general ICT standards.

While progress has been made in ensuring interoperability within disciplines, cross-disciplinary interoperability remains undeveloped. Interoperability is also a component of the Findable, Accessible, Interoperable and Reusable (FAIR) principles put forward by the European Union. Respondents proposed that the Recommendation provide guidance on ontologies and translation. The establishment of supranational open-science clouds, such as the European, Australian, African and National Institutes of Health (NIH) Commons in the United States, will generate a leap in findability for scientists in those regions. Going a step further, interoperability between those clouds needs to be established to develop global access.

Quality also ranked among the most relevant principles. This principle comprises quality control through peer review, documenting the origin of sources, linking to original research materials and datasets, and data citation practices. Survey respondents felt that more needed to be done on overall quality assurance, by defining explicit and verifiable quality standards that could be captured quantitatively where possible. A potential future revision of the Recommendation could provide guidelines for determining data quality, as well as a standard for labelling datasets with a confidence value.

Openness was cited as one of the most relevant principles by respondents. It is defined in the Recommendation as “access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based” (OECD, 2006).

The European Union promotes an “open-by-default”, efficient and cross-disciplinary research-data environment. It allows for proportionate limitations only in duly justified cases relating to personal-data protection, confidentiality, IPRs, national security or similar concerns (i.e. “as open as possible and as closed as necessary”). Australia’s Open Government National Action Plan includes open access to data. However, respondents emphasise the need to limit openness in cases where legitimate reasons exist to keep data closed, and warn against a potential disincentive to data acquisition under an “open-by-default” policy. Further, they emphasise the need for “cultural change” among researchers to achieve more openness.

Transparency ranks very high in terms of relevance. The Principle defines it through the following aspects: i) information on data-producing organisations and their holdings, and documentation on available datasets; ii) dissemination of information on research-data policies to stakeholders; iii) agreements on standards for cataloguing data, and application thereof; and iv) information on data-management and access conditions, to be shared among data archives and data-producing institutions.

Sustainability is also considered highly relevant, and should be ensured throughout the successive evolutions in technology and standards. Hence, datasets need to be preserved across technology changes. However, user expectations should be managed, to ensure that they understand the scope and reuse potential of data – notably older data, which do not conform to the latest data standards. In this respect, regular evaluations of electronic infrastructures and services are needed, and the data-lifetime and deletion policy should be specified.

Security is another highly ranked principle. In the Principles, security encompassed both integrity (completeness and absence of errors) and security (protection against loss, destruction, modification and unauthorised access). Respondents see security as essential to fostering trust. They proposed the introduction of new guidelines on data provenance for repositories; versioning should be introduced to address data integrity. The guidelines should address not only the benefits, profits and advantages of enhanced access to data, but also its possible disadvantages and risks, to identify ways of overcoming them.

Figure 4.1. Assessment of the relevance of the principles from the OECD Recommendation of the Council concerning Access to Research Data from Public Funding

Notes: IP = intellectual property; IT = information technology; FAIR = Findable, Accessible, Interoperable and Reusable. The 2017 survey asked respondents to assess the relevance of the 13 principles cited in the original Recommendation (OECD, 2006) on a Likert scale (5 = very high relevance; 0 = no relevance). Responses were received from 55 organisations in 27 countries.

Source: OECD (2018a), OECD Science, Technology and Innovation Outlook 2018. The last additional principle was quoted from Soete (2016), “A sky without horizons. Reflections: 10 years after”.

Legal conformity under the Recommendation includes aspects of national security, privacy and confidentiality, IPRs, protection of biodiversity and legal processes. Respondents agreed this is one of the high-priority principles to be reinforced in future revisions of the Recommendation. The current principle mentions privacy, which may need an enhanced focus going forward.

Protection of IP is considered highly relevant, and the need to include it in future revisions of the Recommendation is considered moderate to high. The 2006 Recommendation considers the applicability of copyright or other IP laws that may apply to publicly funded research databases – including to data resulting from balanced public-private partnerships – while facilitating broad access to data where appropriate for public research or other public-interest purposes, duly considering the protection of commercial interests.

Some respondents stated that when partnering with private parties, the public-good nature of publicly funded research data should not be compromised – meaning that it must be freely available for the use of all – and that private parties co-operating with the public sector must acknowledge this special status and abide by these principles. They also see IP protection of public research as a complex issue and suggest that data from public research could be protected under Creative Commons, allowing data to be open, but restricting any derivative or commercial use.

Other principles are seen as having moderate to high relevance, and the need to include them in any future revision of the Recommendation is equally moderate to high (as shown by their position in Figure 4.1):

  • Flexibility applies to technological evolution, the evolving needs of scientific disciplines, and different countries’ legal systems and cultures.

  • Professionalism is defined in the Recommendation as the use of codes of conduct to simplify the regulatory burden on access to data, inducing mutual trust between researchers, institutions and other stakeholders involved, and setting clear rules for temporary exclusive use of data.

  • Efficiency covers cost effectiveness, cost-benefit analysis, engagement of data-management specialists as appropriate and reward structures for researchers.

  • Formal responsibility includes explicit rules and regulations (pertaining to authorship, producer credits, ownership, dissemination, usage restrictions, financial arrangements, ethical rules, licensing terms, liability and sustainable archiving) delineating the responsibilities of parties involved in data-related activities. If included in a future revision of the Recommendation, it should provide guidelines on the handover of responsibility for data curation from a national laboratory to the principal investigator’s institution. Respondents note that some countries do not possess formal agreements on terms of access and data use, which discourages researchers and increases their personal burden.

  • Accountability implies project evaluation according to overall public-investment criteria, managing the performance of data-collection and archival agencies, monitoring the extent of reuse of datasets and the knowledge generated from reuse of existing data, and predicting future needs related to data preservation and reuse.

Respondents were also asked to cite important additional principles that are not present in the Recommendation. Below are some of their responses:

  • Responsibility and ownership, including legal and ethical issues, should be defined.

  • An explicit recognition and reward system for data authorship should be established.

  • Publicly funded research data should be treated as commons: licensing under Creative Commons could be used to provide a framework that ensures openness, while restricting reuse as needed.

  • An embargo period should be set for the exclusive use of data, to reassure authors: the embargo could vary according to the nature of the output.

  • The implications of blockchain technologies on enhanced access to data should be investigated: blockchain is a potential tool to improve inventions’ traceability, providing a way to trace the sources of innovation back into the network of public collaborative science and innovation (Soete, 2016).

Progress has been achieved in the decade since the publication of the Recommendation. Data associated with publications are now expected to be made available and the “reproducibility” agenda is getting more attention, e.g. in clinical trials. A number of research funders are now requiring open access to data.

Addressing challenges in access to research data

Nevertheless, data sharing still seems less widespread than could be expected and is limited to a small number of research fields. This seems to stem from disincentives to sharing research data, including a lack of reward or credit for sharing data; the substantial effort needed to upload and maintain data in a form that is usable by others; risks of misinterpretation or misuse; and IP and personal-data protection issues, i.e. the need to anonymise data samples. In addition, demand for shared data seems limited to a few scientific disciplines (OECD, 2015a).

The OECD Global Science Forum (GSF) identified nine challenges related to data-driven and evidence-based research in social and economic sciences (OECD, 2013). Box 4.2 synthesises those findings.

Box 4.2. Challenges to data-driven and evidence-based research identified in social sciences

A. Infrastructure and skills (or lack thereof) at the country level:

A.1 Lack of data-management planning to make datasets available for reuse

A.2 Investments in the personnel and infrastructure needed for data creation and curation.

B. Legal and regulatory barriers/challenges at the country level:

B.1 Lack of information on what data exist and lack of adoption of international standards for data documentation

B.2 Individual privacy issues and absence of a recognised framework governing the use of personal data

B.3 Legal, cultural, language and proprietary rights barriers.

C. Researchers’ incentives and careers at the country level:

C.1 Incentives for researchers to ensure effective data sharing.

D. Data quality and characteristics:

D.1 Reliability, statistical validity and generalisability of different data sources

D.2 Need for greater harmonisation of social sciences data sources across countries

D.3 Increasing need for international co-ordination

D.4 Increasing need for interdisciplinary co-ordination for global challenges.

Source: Adapted from OECD (2013), “New data for understanding the human condition: International perspectives”, OECD Global Science Forum Report on Data and Research Infrastructure for the Social Sciences.

In the OECD CSTP 2017 survey, respondents were asked to assess the relevance of each of these challenges, as well as the current policy effort related to those challenges (Figure 4.2).

The most relevant of all challenges is C1 – researchers’ incentives to ensure effective data sharing – but the consensus is that policy efforts to overcome that challenge are still weak. Cultural change is a long process. The perceived barriers and risks of enhanced access to data need to be counterbalanced by appropriate acknowledgement and reward systems. Data citation does not seem to have been widely implemented, and some respondents point out that prerequisites for it are still missing (such as data citation standards and metrics). Some respondents (e.g. Brazil, Canada, the European Commission, Japan and the Netherlands) also shared the view that open science should be embedded in evaluation systems, to ensure that researchers who provide high-quality research data are rewarded. Training in data literacy and data management is also an important aspect. Australia mentioned an interesting initiative: its Department of Employment organises an annual “GovHack” competition to reuse and remix government data, raising awareness and communicating about a “cultural shift” towards data stewardship and sharing throughout the research-data lifecycle.

Figure 4.2. Assessment of challenges related to data-driven and evidence-based research

Note: An average score was computed from the responses on a Likert scale (1 = “none”; 2 = “slight”; 3 = “moderate”; 4 = “high”; 5 = “very high”). The error bars show the statistical error on the mean score.

Source: OECD CSTP survey results from OECD and partner delegations.
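The note to Figure 4.2 describes an average Likert score with a statistical error on the mean. As a minimal sketch of how such values could be computed, the example below assumes that the error is the standard error of the mean (the source does not say which estimator was used). The function name `likert_summary` and the responses are hypothetical and are not survey data.

```python
import math
from statistics import mean, stdev


def likert_summary(scores):
    """Return (mean, standard error of the mean) for Likert responses coded 1 to 5."""
    if len(scores) < 2:
        raise ValueError("need at least two responses to estimate an error")
    if any(s not in (1, 2, 3, 4, 5) for s in scores):
        raise ValueError("scores must be integers from 1 to 5")
    avg = mean(scores)
    # Assumption: the error bar is the sample standard deviation divided by sqrt(n).
    sem = stdev(scores) / math.sqrt(len(scores))
    return avg, sem


# Hypothetical responses for a single challenge (illustrative only).
responses = [5, 4, 4, 5, 3, 4]
avg, sem = likert_summary(responses)
print(f"mean = {avg:.2f}, standard error = {sem:.2f}")
```

With only a handful of respondents per challenge, the standard error is large relative to the differences between mean scores, which is one reason to read rankings of closely scored challenges with caution.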


The strongest policy effort goes into Challenge A1: the lack of data-management planning to make datasets available for reuse. Most governments (e.g. Australia, Canada, the Netherlands and Sweden) report they are addressing the issue by making recurrent research funding contingent on data-sharing and data-management plans. They also cite adequate training in data-management planning as an important issue.

Another highly relevant challenge is Challenge A2: investments in the personnel and infrastructure needed for data creation and curation. An important initiative in this respect is the Global Open Findable, Accessible, Interoperable and Reusable (GO FAIR) initiative, led by Germany and the Netherlands, which is a proposed approach for establishing the European Open Science Cloud. The initiative rests on three pillars: i) GO CHANGE, to foster culture change, promote open science and establish reward systems; ii) GO TRAIN, to promote education and training; and iii) GO BUILD, to build technical infrastructure (ZBW – Leibniz Information Center for Economics, 24 January 2016).

Individual privacy issues (Challenge B2) also command high policy effort and are seen as highly relevant. Governments strive to strike a balance between maximising data sharing and ensuring the privacy and security of information, particularly through “anonymised”, “non-sensitive” data. This issue is taken up in many projects and policies at the level of government as well as funding agencies and data centres. In an effort to harmonise data protection across Europe, the European Union adopted the General Data Protection Regulation (GDPR) in 2016 (European Commission, 2016), which became applicable on 25 May 2018. There are concerns that those stricter rules may have a negative effect on an increasingly collaborative and data-intensive scientific-research sector.

Challenge B3 – legal, cultural, language and proprietary rights barriers – is equally relevant, but has received slightly less policy effort, although some countries have modified copyright law accordingly. Respondents point to the necessity of clarifying and addressing the legal uncertainty of open access to research data, as well as the correct legal implementation of FAIR principles. Issues of ownership should also be addressed, particularly where institutions have created services and resources. Australia points to the necessity of harmonising legislation across data custodians, which often operate under varying legal frameworks governing the collection and use of sensitive data.

Challenge D2 – greater international harmonisation of data sources across countries – is still highly relevant, but receiving only moderate policy effort. Respondents reported addressing the issue within the Research Data Alliance (RDA), as well as by applying FAIR principles, which should be the future reference for data access technical standards. They agreed that more needs to be done in this respect.

Challenge D1 – reliability, statistical validity and generalisability of different data sources – has received even less policy effort, despite its importance. The respondent from the European Commission proposed implementing an accreditation or certification mechanism based on agreed processes to ensure FAIR compliance, as well as establishing an accreditation or certification body to maintain an up-to-date, accessible catalogue of certified repositories.

Finally, the group of less-relevant challenges includes:

  • D3: international co-ordination: countries report participation in the RDA, the Committee on Data (CODATA) of the International Council for Science, the Document, Discover and Interoperate Alliance, and networks of repositories (e.g. the Confederation of Open Access Repositories and LA Referencia in Latin America) to work on this issue. Such co-ordination should serve to define global standards for implementing FAIR principles.

  • D4: increasing need for interdisciplinary co-ordination for global challenges: respondents propose establishing cross-disciplinary agreements and protocols, inspired directly by relevant domain-specific needs, that will lead to specific standards. Variations across scientific disciplines, and their specific efforts to make research data open and FAIR should be respected. Developing best-practice interdisciplinary co-ordination projects could help address global challenges.

  • B1: lack of information on what data exist and lack of adoption of international standards for data documentation: respondents recommend raising awareness of RDA standards, better implementation of data-management plans and conducting a landscape analysis of data repositories. The respondent from the European Commission proposes creating catalogues for datasets, services and standards, based on machine-readable metadata and identifiable by a common and persistent identification mechanism that will make research data findable.

Respondents were also asked to provide additional challenges not covered in Box 4.2. The challenges cited include:

  • Measurement of the status quo of data access: general and specific indicators need to be established to measure sharing and reuse of data. Such measurement would: i) demonstrate the value added of enhanced access to data; ii) provide a basis for acknowledging and rewarding the researchers and institutions involved; and iii) help monitor the quality and sustainability of the datasets.

  • Large infrastructure solutions to address big data nationally and internationally, with adequate governance arrangements: existing physical infrastructures need to be strengthened and new ones created to accommodate rapidly growing needs in terms of big data. Repositories featuring tools for publishing datasets are preferable to read-only portals.

  • Data reuse, data portability and interoperability: physical infrastructures need to be complemented by internationally accepted and agreed standards, which need to be widely disseminated. An overarching recommendation on enhanced access to data could pave the way towards a more uniform political vision of these issues to trigger the needed action at the national level.

  • Funding models: responsibility for data curation is implicitly transferred to the researcher’s home institution, which may not have appropriate repositories for the specific data type. This calls for establishing mechanisms for cross-institutional and cross-border use and compensating the costs involved. Some respondents consider it problematic that publishers provide access to data at a cost.

  • Cost-benefit analysis and priority setting of enhanced access to data: scarcity of resources implies that not all data can be made openly accessible in the short term. Hence, efforts should focus on enhanced access to the data that are most likely to provide impact, to the extent that such impact can effectively be predicted (which is not always the case). Justifying investments in infrastructure needs to be based on the value expected from enhanced access to data, which is directly related to the issue of measurement and funding models. Another related issue is selecting data for long-term preservation; this involves complex decisions about what constitutes priority data, as well as the data-preservation timespan (some research communities use data that may be centuries old).

  • Operationalise FAIR principles in a pragmatic and technology-neutral way, encompassing equally all four FAIR dimensions: FAIR principles should be applied to all digital research objects, including data-related algorithms, tools, workflows, protocols and services. Interoperable registries of FAIR data resources should allow one to build portals to data relevant to various user needs. FAIR principles should be promoted, and the associated FAIR services should be maintained sustainably.

  • Statistical and methodological training in use and interpretation of data, data management, and training for data standards: increasing data access and enhancing the impacts from data will require new skills. Delivering these skills will require actions from policymakers, data producers and users, and higher education institutions in the form of co-operation and partnerships, training, new education programmes and curricula – and possibly digital learning and massive open online courses. The required skills are not only technical: they include a wide range of skills in statistics, computer science, information science (e.g. for data librarians), law and other social sciences. Many countries report limited curricula to meet those skill needs.

  • Building trust between all stakeholders, e.g. scientific communities, e-infrastructures, research infrastructures and funders, to “look outside the organisational boxes and work together”: respondents suggested integrating open and FAIR access to research data in the wider context of open science (for example, the Dutch National Plan for Open Science interconnects open access, open data and reward systems).

  • Data ownership and control: some publishers require that researchers hand over the data supporting the published article. Others offer platforms facilitating the research process, where all research elements – including annotations, methods, data and publications – can be disseminated. In the short term, this is a positive development that enhances data accessibility, but there is no guarantee that the data will always be freely available.

  • Accessibility to content mining: where research data are available through privately held platforms, the proprietors frequently hinder automated content mining, with the justification that these platforms present technical limitations.

  • Integrating enhanced access to data within a broader open-science approach, including citizen science, with citizens as both providers and users of data: as a debt to taxpayers and citizens, respondents proposed transposing primary research data and scientific information into simpler representations to make them intelligible to the general public.

Synthesis of policy gaps in promoting enhanced access to data for STI

This section builds on the previous analysis to synthesise the policy gaps identified as the most critical to promoting enhanced access to data for STI, as follows:

  • balancing the potential public benefits and risks of sharing – addressing privacy, confidentiality, quality and ethical issues

  • technical standards and practices

  • recognition and reward systems for data authors

  • definition of responsibility and ownership

  • business models for open-data provision

  • building human capital

  • exchange of sensitive data across borders.

Balancing the potential public benefits and risks of sharing: Addressing privacy, confidentiality, quality and ethical issues

The common heading of “balancing the potential public benefits and risks of sharing” (i.e. specifically addressing privacy, confidentiality, quality and ethical issues) was also a strong focus of discussions at the joint CSTP-GSF workshop “Towards New Principles for Enhanced Access to Public Data for Science, Technology and Innovation”, held at the OECD headquarters in Paris on 13 March 2018. The workshop gathered 30 experts from government, the private sector, data repositories, academia, non-governmental organisations, international data networks and libraries (OECD, 2018b).

Balancing the potential public benefits and risks of sharing research data is a critical issue for data governance. Sound data governance is required to ensure trust from both data providers and users, and promote a culture of sharing, with the aim of making data “as open as possible and as closed as necessary”.

Sharing data presents multiple risks, related to: i) individual privacy (e.g. in the case of clinical research data); ii) misuse (e.g. data about rare and endangered species, or rare minerals); iii) misinterpretation (particularly as concerns datasets of uncertain quality, and/or lacking the appropriate metadata); and iv) national security (e.g. data from research with potential military applications).1 More granular data often have higher potential research value, with a concurrent increase in risk.

Providing access to personal or human-subject data is a particular challenge (OECD, 2013). Although anonymisation techniques can remove personally identifiable information from individual datasets, true anonymisation becomes very difficult as more and more data from different sources are integrated (President’s Council of Advisors on Science and Technology, 2014). Moreover, the research value of personal data often stems from the ability to link them back to individual characteristics. In the United Kingdom, linking information from hospitals with the cancer-data repository and data from various screening programmes has made it possible to recommend changes in medical protocols that are likely to improve cancer survival rates. Rules and laws can be a disincentive to breaching anonymity, but the financial incentives to do so can be high in certain industries, and legal regimes are very difficult to implement across national jurisdictions.
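The point that anonymisation weakens as datasets are integrated can be illustrated with a small sketch. It uses group size over quasi-identifiers (the idea behind k-anonymity) as a stand-in measure of re-identification risk. This measure is an illustrative choice and is not taken from the sources cited above. The function `smallest_group`, the field names and the records are all hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique on those attributes
    and could therefore be singled out.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


# Hypothetical records with direct identifiers already removed.
records = [
    {"age_band": "30-39", "postcode": "AB1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode": "AB1", "occupation": "teacher"},
    {"age_band": "40-49", "postcode": "AB2", "occupation": "nurse"},
    {"age_band": "40-49", "postcode": "AB2", "occupation": "nurse"},
]

# Before linkage: only age band and postcode are known for each record.
k_before = smallest_group(records, ["age_band", "postcode"])
# After linkage: a second source contributes occupation for the same records.
k_after = smallest_group(records, ["age_band", "postcode", "occupation"])
print(k_before, k_after)
```

In this toy example every record shares its age band and postcode with one other record, but once occupation is linked in, two records become unique. The same effect, at much larger scale, is what makes true anonymisation difficult when many sources are combined.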

Alongside anonymisation, informed consent is the second pillar underlying the use of personal data in research. Consent is a right recognised in many countries and enshrined in legislation, such as the recent GDPR2 (European Commission, 2016). However, situations exist where consent to use data for specific research purposes is impossible or impractical to obtain, particularly if these purposes were not envisaged when the data were originally collected. For example, when analysing new forms of data from social networks in ways the collector had not anticipated, it might be unfeasible to go back to all the individuals concerned to ask for consent. Notably, the GDPR makes exceptions for the use of data in research, where consent is one consideration, but is not prescribed as the legal basis for data use. Recent OECD work on the subject stressed the need for properly constituted independent ethics review bodies, outlining their role in evaluating applications to access publicly funded personal data for research purposes and building trust (OECD, 2016). This recent work also emphasised the importance of public engagement in defining norms on the use of personal data in research. The approach adopted by the Australian Government, which aims to achieve value creation with open data while transparently managing risk, is one example (Box 4.3).

Box 4.3. In my view: Trust is the key to unlocking data

The Hon. Michael Keenan MP, Minister for Human Services and Digital Transformation, Australian Government

Data are the fuel powering our new digital economy. However, news of data breaches and misuse of personal information erodes trust and leads the public to believe that data are bad or something to be feared.

If these negative perceptions become entrenched, we risk missing out on the enormous opportunities and benefits data offer to improve people’s lives, help grow the economy and become more successful as a nation.

As a government, we have a responsibility to use data to make the best possible decisions to improve people’s lives. In May 2018, the Australian Government announced reforms to simplify the way public data can be shared and used, and clarify accountabilities around the management of data. These reforms are made up of four components:

  • a Consumer Data Right, to give Australians greater access and control over their data, to enable them to get a better deal from their bank, energy and telecommunications companies

  • a National Data Commissioner, to manage the integrity and improve how the Australian Government manages and uses data

  • a new National Data Advisory Council, to provide advice on ethical data use, technical best practice, and industry and international developments

  • the Data Sharing and Release Act, to improve the use and reuse of data, while strengthening security and privacy protections for personal and sensitive data.

These reforms represent a tremendous opportunity to unlock national productivity. However, we will only seize this opportunity if public data are used in a safe and transparent manner, and citizens trust their privacy and security are valued and protected at all times.

To achieve that, we are working hard to secure the trust of the public at the core of our reforms.

This is the only way we can ensure the benefits of data and insights are driving effective outcomes for all people and organisations and indeed, for the entire economy and society.

Data are the fuel of growth, and trust is the key that will enable us to get ahead.

If the full benefits of enhanced access to data are to be realised, trust is required at multiple levels – not just as it relates to personal data. Power relations between individuals, institutions and countries are a critical component of trust, and need to be considered when developing data-access policies. The reality is that open research data can be more readily exploited by more advanced companies, institutions and countries that master the technology and algorithms needed to analyse and extract value from the data. Less empowered stakeholders can easily be reduced to simple data providers, while the (research and monetary) value is captured elsewhere.

In order to secure public trust and accountability, the socio-economic impacts of open research data need to be monitored. Over time, such impact assessments should help society evaluate the value of open-data initiatives. The 2006 OECD Recommendation suggested considering a few core aspects for external evaluation, including overall public investments, the management performance of data collection, and the extent to which existing datasets are used and reused (OECD, 2006). This provides useful starting guidance. Nevertheless, it must be noted that such assessments are quite challenging to implement, since the methodologies are not yet well-developed and standardised.

Data integration is another major opportunity. For example, New Zealand’s Integrated Data Infrastructure3 allows registered researchers to access microdata about people and households, including data on education, income and work, benefits and social services, population, health, justice and housing. Such an integrated dataset enables social-science research on issues such as the life outcomes of socially disadvantaged groups, linking their educational attainment to income, health and crime outcomes.

Balancing the potential public benefits and risks of sharing: Cross-cutting learnings from the case studies

Balancing the potential public benefits and risks of sharing is a cross-cutting theme among the 17 case studies contributed by OECD member countries and partner economies in 2018, illustrating policy practice supporting enhanced access to data for STI.

The case studies address this central issue through a consensus approach that data be “as open as possible, as closed as necessary”. In those cases where the default is set to open, such as in the European Union, France and Slovenia, clear alternatives for opt-out are provided, on the condition that well-articulated reasons for opting out are formulated (European Commission, 2018; French Ministry of Higher Education, Research and Innovation, 2018; Tramte, 2018). The idea is not to put pressure on researchers to open data at all costs, but to make them think about what justifies not opening the data, and make the data accessible when no such justification exists. This is sound practice in the research-data management cycle. Although it is not yet anchored in the scientific community, it is becoming a requirement, e.g. for submitting Horizon 2020 proposals.

Some of the case studies report the development of specific agreements between data producers and users that allow the data producers to control the degree to which they share their data. At the Korea Research Institute of Chemical Technology chemical library, for example, data producers decide the degree of openness of their data (Shin, 2018). In the case of national repositories and portals, such as in Mexico, each participating institution signs an agreement that determines the degree of openness and the governance model for trust, privacy, confidentiality and ethical issues (CONACYT, 2018). In the case of Argentina’s Science and Technology Information Portal, each participating institution also signs a co-operation agreement (Luchilo, D’Onofrio and Tignino, 2018).

The UK Concordat (UK Research and Innovation, 2016) addresses the issue through specific principles:

  • Principle 2: there are sound reasons why the openness of research data may need to be restricted, but any restrictions must be justified and justifiable.

  • Principle 5: use of others’ data should always conform to legal, ethical and regulatory frameworks, including appropriate acknowledgement.

The Concordat further specifies: “Individual researchers are responsible for compliance with ethical, legal and professional frameworks, while it is the role of employers to support researchers in this through clear policies, awareness raising and providing clear advice and guidance” (Bruce, 2018).

In Canada, all heads of the government institutions are responsible for the effective, well-coordinated, and protective management of personal information in accordance with the Privacy Act and Privacy Regulations within their institutions (Treasury Board of Canada Secretariat, Open Government Team, 2018). The Canadian federal Privacy Act specifically addresses privacy protection and the right to access as follows:

  • “Personal information may be collected only when it relates directly to an operating program or activity of the institution;

  • Personal information must be collected directly from the person to whom it relates, with limited exceptions;

  • Individuals have a right to their own personal information with limited and specific exemptions; and

  • Restrictions are placed on the use and/or disclosure of personal information, subject to limited exceptions.”

The Swedish case study, “Infrastructures for Register-based Research – a government commission to the Swedish Research Council” (Eriksson and Nilsson, 2018), addresses one of the most challenging issues: providing simultaneous access to two or more sensitive datasets for cross-disciplinary research. Clinical data, for example, can be combined with population statistics containing sensitive personal information, such as country of birth and citizenship. It is easy to see the value of legitimate and ethical research projects, as well as the potential risks for malevolent use. Since such research multiplies the risk of identifying individuals, researchers currently need to follow a long and complex protocol (ethical approval, harms test to evaluate sensitivity of the data, definition of scope and post-disclosure protection measures). Such a process can last many months – or even years – before the researcher can access the data. Infrastructures for Register-based Research aims to streamline this process and improve access to such sensitive data, while preserving the interests of the data providers (individual citizens). A key component is separating metadata and semantics from the sensitive data (Figure 4.3). This allows a rich dialogue between the register holder and the researcher, to define exact needs met by using non-sensitive metadata before granting access to any sensitive data.

Figure 4.3. Sweden: Separating metadata and semantics from sensitive data

Source: Eriksson and Nilsson (2018), OECD Case study report RUT,

France has a strong ethical tradition related to the treatment of personal data, dating back to the 1978 Law on Information Technologies and Freedom, and the National Commission on Information Technologies and Freedom, whose role is to raise awareness about legal rights and obligations when dealing with personal information. This tradition is now being reinforced by the creation of a national Chief Data Officer, charged with orchestrating the circulation and reuse of public-sector data, with the goal of stimulating research and innovation while protecting the personal data and secrets safeguarded by the law (French Ministry of Higher Education, Research and Innovation, 2018).

Technical standards and practices

Technical aspects, such as dealing with discoverability/findability, machine readability and data standards, are another recurring theme in the survey findings, the workshop and the case studies.

As the volume and variety of research data increase, the resources required by data providers to make their data available, and the time invested by users to discover available data, also increase proportionally (OECD, 2015a). Insufficient information exists on what data are available, both for and from research. When data can be found, they are not always usable, because they do not conform to standards, lack metadata or are not machine-readable.

At the national scale, a large variety of institutional and domain-specific data catalogues, search engines and repositories are being established to enhance data findability (Boxes 4.1 and 4.2). At the international scale, increased efforts to co-ordinate and support global data networks are necessary (OECD, 2017) to provide the foundation for future open-science cloud initiatives that will facilitate data usage (Box 4.1).

Scientific publications are another major channel of discoverability. Many researchers first read about potentially interesting data in a journal article; the question is then how to gain access to that data. Persistent links should appear in published articles, which should also include a permanent identifier for the data, as well as the code and digital artefacts underpinning the published results. Data citation should be standard practice. Broken links or inadequate metadata are common challenges, especially as journals tend not to enforce data requirements for fear of losing good papers to competing journals. Several publishers have recently developed data journals, which can play an important role in promoting the use of published datasets. Links to data can also be included in standard publications when there are reliable sustainable services to deposit and curate data, which is already the case in several disciplines.

Formal standard-setting, through bodies such as the International Organization for Standardization (ISO), is a slow iterative negotiation process that can last several years. As a result, proactive commercial or public players in a position of power can set de facto standards. One example is Google's General Transit Feed Specification, a common format for public transportation schedules and associated geographic information (OECD, 2018b).

The research community can turn this into an advantage if it takes the lead in developing appropriate standards and in so doing, consults fully with all concerned stakeholders. This is the approach taken by organisations that are helping to build the social and technical infrastructure to enable open sharing of data across national and disciplinary borders. For example, the RDA produces recommendations – which can be adopted as standards – on a broad range of issues related to interoperability, data citation, data catalogues or workflows for publishing research data (RDA, 2017).

Good metadata and the use of shared formats are essential for data interoperability and reuse. Provenance information tracks the history of a dataset and is an essential part of metadata, necessary to understand both the source of the information and the history of the dataset (it is also important for incentivising data access, as discussed in the section below). In this regard, the Open Archival Information System (OAIS) reference model is of particular interest. OAIS was initially developed to archive data from space missions. It is designed to preserve information over the long term and disseminate it to a designated community, which should be able to understand the data independently, in the form in which it is preserved. OAIS covers the steps of ingesting, preserving and disseminating the data. It is universally accepted as the common language of digital preservation (Lavoie, 2014). An increasing number of repositories strive to be OAIS-compliant, which ensures the possibility of reusing data in the long term.

Discoverability/findability, machine readability and data standards: Cross-cutting learnings from the policy case studies

The Slovenian National Strategy for Open Access states: “Open research data has to be discoverable, accessible, assessable, intelligible, reusable, and, wherever possible, interoperable to specific quality standards.” (Tramte, 2018) The case studies frequently cite standards as a major element that structures the initiatives and needs to be further developed. One problem is the fragmentation of standards, which makes it difficult to select the relevant standard to implement. The Korean case study mentions this difficulty with respect to standards for genome data (Shin, 2018). The Argentine Science and Technology Information Portal uses several standards, including the Open Archives Initiative – Protocol for Metadata Harvesting (OAI-PMH), and the Dublin Core and Darwin Core standards; for genome data, it uses GenBank and FAST (FIX Adapted for Streaming); further, it is planning to introduce DataCite (Luchilo, D’Onofrio and Tignino, 2018). The case study does not specify whether such a combination of standards gives satisfactory results. The Mexican Open Institutional Repositories Programme reports using OpenAIRE for the technical framework, OAI-PMH for harvesting processes, and Dublin Core and DataCite for metadata management (CONACYT, 2018).
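To give a sense of how OAI-PMH harvesting and Dublin Core metadata fit together, the following Python sketch builds a `ListRecords` request and extracts a few Dublin Core fields from a response. The repository address, the sample response and the helper names are invented for illustration; only the protocol verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications. It is not a description of how the Argentine or Mexican portals are implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract the identifier and a few Dublin Core fields from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        records.append({
            "identifier": record.findtext(
                f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"),
            "title": record.findtext(f".//{{{DC_NS}}}title"),
            "creator": record.findtext(f".//{{{DC_NS}}}creator"),
            "type": record.findtext(f".//{{{DC_NS}}}type"),
        })
    return records


# Invented response, shaped like an OAI-PMH ListRecords reply (abridged).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Example, A.</dc:creator>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; no network request is made in this sketch.
url = list_records_url("https://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
```

Because every compliant repository answers the same request in the same envelope, a national portal can harvest many institutional repositories with one small client, which is why these two standards recur across the case studies.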

Sweden studied several statistical metadata standards and frameworks before finally selecting the Generic Statistical Information Model (GSIM) (HLG on Modernisation of Statistics, 2013). GSIM is a standard that specifies which types of metadata from a register should be included to sufficiently describe its detailed contents, so that metadata – rather than the sensitive data themselves – can be used to engage in dialogue with researchers, thus protecting privacy and confidentiality (Eriksson and Nilsson, 2018).
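The principle of conducting the dialogue on metadata alone can be sketched in a few lines of Python. The class, field and variable names below are hypothetical; they are not drawn from GSIM or from the Swedish infrastructure, and serve only to show that a register can describe its contents without exposing any individual-level values until access has been approved.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """A register whose variable descriptions are kept apart from its rows."""
    name: str
    variables: dict  # variable name -> plain-language description (non-sensitive)
    rows: list = field(default_factory=list, repr=False)  # sensitive microdata

    def describe(self):
        """Return metadata only; safe to share while needs are being defined."""
        return {"register": self.name, "variables": dict(self.variables)}

    def extract(self, requested_variables, approved):
        """Release only the requested variables, and only after approval."""
        if not approved:
            raise PermissionError("access to sensitive data has not been approved")
        unknown = set(requested_variables) - set(self.variables)
        if unknown:
            raise KeyError(f"unknown variables: {sorted(unknown)}")
        return [{name: row[name] for name in requested_variables}
                for row in self.rows]


# Made-up register content for illustration.
register = Register(
    name="example_population_register",
    variables={
        "birth_country": "Country of birth",
        "citizenship": "Current citizenship",
    },
    rows=[{"birth_country": "A", "citizenship": "B"}],
)

catalogue_entry = register.describe()
```

A researcher can refine a request against `catalogue_entry` as often as needed; the sensitive rows are touched only once, for the variables finally agreed and after approval.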

France defines data quality through completeness, up-to-datedness and reliability. These are addressed through measurement of the delay between the occurrence of an event and its publication, the availability of the infrastructure (targeted at 99.5%) and the use of open standards, facilitating reuse. France addresses findability through ScanR, a dedicated search engine for science and innovation that allows searching among datasets from 35 000 public research institutions and private enterprises in France. The search engine enables simultaneous searches of publications, projects and patent databases (French Ministry of Higher Education, Research and Innovation, n.d.).

The UK Concordat addresses the issue through specific principles:

  • “Principle 7: data curation is vital to make data useful for others and for long-term preservation of data.”

    • This principle quotes adherence to community-specific data formats and standards as a possible (but not exclusive) avenue to curation. Non-proprietary formats are encouraged wherever possible. Where not possible, the proprietary software needed to process the research data should be indicated. Specialised search tools and catalogues are envisioned to enhance findability.

  • “Principle 8: data supporting publications should be accessible by the publication date and should be in a citeable form.”

    • “[…] The dataset should be citable in itself, for example through the use of persistent identifiers, such as Digital Object Identifiers (DOIs), to ensure clarity of which exact dataset is under discussion or examination.” (Bruce, 2018)
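As an illustration of what a citeable form can look like in practice, the short Python sketch below assembles a dataset citation around a DOI. The citation layout, the function name and the example values (including the DOI) are assumptions made for this illustration; the Concordat does not prescribe a format. The only fixed element is that a DOI can be expressed as a resolvable link by prefixing it with `https://doi.org/`.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Assemble a human-readable dataset citation ending in a resolvable DOI link."""
    authors = "; ".join(creators)
    return f"{authors} ({year}). {title} [Data set]. {publisher}. https://doi.org/{doi}"


# Made-up dataset and DOI, for illustration only.
citation = format_data_citation(
    creators=["Example, A.", "Sample, B."],
    year=2018,
    title="Example survey microdata",
    publisher="Example Repository",
    doi="10.5555/example.1234",
)
```

Because the identifier points to one exact dataset, a reader of the article can locate the same data that the authors used, even if the repository later reorganises its website.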

The Canadian Open Government Portal provides technical and policy guidance to individual departments and agencies, ensuring consistency, quality, accessibility and discoverability. These include: i) the Standard on Metadata;4 ii) Open Data Release Checklist within the Open Government Guidebook; iii) Open Data Registry and User Guide; iv) Standard on Geospatial Data; and v) the Open Government Guidebook (Treasury Board of Canada Secretariat, Open Government Team, 2018).

The Netherlands, together with Germany and France, initiated the GO FAIR initiative to create an environment where data are: i) findable – easily found by humans and machines alike; ii) accessible – as open as possible, as closed as necessary; iii) interoperable – datasets need to be combinable with other datasets; and iv) reusable – it must be possible to reuse data in future research projects and process these data further (Ministry of Education, Culture and Science, 2018). As the Colombian case study points out, however, the standard is a necessary support framework, but is not a strategy for building an initiative (Escobar, Hernández and Agudelo, 2018).

Recognition and reward systems for data authors

Data sharing entails cultural change among researchers in many scientific fields. Appropriate acknowledgement and reward systems need to counterbalance the perceived barriers and risks of enhancing access to data. The emphasis on competition in research, including the way in which it is evaluated and funded, can be a strong disincentive to openness and sharing.

Researchers have incentives to publish (preferably positive) scientific results. Incentives to publish data are less developed, and usually seen as a constraint imposed by funding agencies and/or publishers. Data citation has not been widely implemented. Although the prerequisites for achieving this (e.g. standard formats and citation metrics) already exist, they are not being broadly adopted. Data activities (including those related to negative results) need to be embedded in evaluation systems, to ensure that researchers who provide high-quality research data are rewarded.

Despite the progress achieved, sharing of research data remains suboptimal. In a 2016 OECD pilot survey of scientific authors, only 20% to 25% of corresponding authors had been asked to share data after publication. If asked, a significant share (30% to 50%) said they would grant access to the data, or at least undertake steps to grant it; about 30% of authors said they would seek to clarify the request. Depending on the discipline, 10% to 20% of authors would refuse to share data on legal grounds (Boselli and Galindo-Rueda, 2016). Authors of scientific papers are more reluctant to share their data openly than to access data from other research groups (Elsevier and CSTS, 2017).

Demand for data reuse also appears to be weak. A recent study in the health sector revealed a low degree of awareness of open data (Martin, Helbig and Birkhead, 2015). At the same time, it also showed the limited usefulness of open anonymised data where researchers need individually identifiable data, indicating a need for tiered data release (as discussed in Chapter 1).

The Transparency and Openness Promotion (TOP) guidelines (TOP, 2014) recognise data citation as one of the levers for incentivising data sharing. They propose making data citation mandatory, as well as citing and referencing all datasets and codes used in a publication with a digital object identifier (TOP, 2014). The adoption by researchers of unique digital identifiers, such as the Open Researcher and Contributor ID (ORCID), is also important in this context, as it would greatly simplify provenance mapping and data citation.

Adopting data citation as a standard practice, so that it can be used to incentivise and reward data sharing, also requires developing appropriate metrics for data citation. These could then be used alongside other assessment measures – such as bibliometrics – in recruitment and evaluation processes (OECD, 2018b). The approach adopted by the National Science Foundation (NSF) in the United States is an interesting example: over the past decade, the NSF has implemented an incremental strategy for accessing research data. Since 2013, datasets and publications have been treated equally as products in an individual researcher’s “biosketch”. In 2016, the NSF added to the proposal section a requirement to discuss evidence of research products (including data) and their availability in prior NSF-funded research. In France, the newly published national Open Science Plan (French Ministry of Higher Education, Research and Innovation, n.d.) has adopted similar principles, pleading for a more qualitative (rather than purely quantitative) approach to evaluating researchers. The Open Science Plan is based on the San Francisco Declaration on Research Assessment, which calls for a more holistic evaluation of scientists that considers all their research outputs, including data and software (DORA, 2012).

Although recognising data citation and data products (such as datasets and databases offering enhanced or open access) in academic evaluation processes may incentivise researchers, it does not necessarily value the critical contribution of data stewards. These are the people who curate and manage data, and ensure their long-term availability and usability. Career paths for this cohort of data professionals (which include both data scientists and researchers) are unclear. Mechanisms to assess their performance should be distinct from the evaluation mechanisms applied to researchers, but should be linked to the data they manage. New measures, incentives and reward systems will be required for data stewards.

Going forward, possible policy measures to incentivise and promote data sharing by researchers include:

  • developing new indicators/measures for data sharing, and incorporating them into institutional-assessment and individual researcher-evaluation processes

  • promoting the use of unique digital identifiers for individual researchers and datasets, to enable citation and accreditation

  • developing attractive career paths for data professionals, who are necessary to the long-term stewardship of research data and the provision of services.

Recognition and reward systems for data authors: Cross-cutting learnings from the policy case studies

Most of the initiatives mention the very strong cultural barrier to sharing data among the researcher community. They cite insufficient skills for data management among researchers and insufficient resources to perform the additional workload of cleaning and curating the data for others to reuse. Above all, they highlight the perceptions of risks related to making the data available, including risks of scientific competition (another researcher acting more quickly to analyse and publish valuable results) and accrued risk to professional reputation (easier verification may increase the likelihood of uncovering errors in a researcher’s analysis). To overcome these barriers, specific recognition and reward systems are needed to create incentives for data sharing.

The only initiatives that mention the inclusion of data sharing as a potential criterion for assessing scientists are the National Open Access Strategy in Slovenia, the National Plan Open Science (NPOS) in the Netherlands, and the UK Concordat:

  • The Slovenian strategy states that: “research data that has undergone the scientific judgement and has been as such deposited at the authorised data centre is recognised as a scientific publication in the evaluation of the results of the programme or project”. The action plan (Activity VI.1) envisages recognition of research data in research evaluation (Tramte, 2018).

  • The Netherlands’ NPOS (Signatories of the NPOS, 2017) states that: “[o]pen science invites a broader set of evaluation criteria than just research output and research quality, including, for example, the quality of education, valorisation, leadership and good data stewardship”. Moreover, the Platform for Open Science has a working group on “Researcher recognition and rewarding”. The group has issued the following recommendations: i) include (realised and expected) contributions to open science as selection criteria when hiring new researchers and support staff; ii) incorporate open science into policies on the development, support, rewarding and appreciation of scientific staff; and iii) ensure that assessment of research proposals incorporates positive rewarding of a researcher or research group’s open-science track record (open-access publication, FAIR data sharing, engaging societal stakeholders); and train reviewers accordingly (Ministry of Education, Culture and Science, 2018).

  • In the UK Concordat (UK Research and Innovation, 2016), Principle 5 calls for appropriate acknowledgement when using others’ data, notably: “production of open research data should be acknowledged formally as a legitimate output of the research process and should be recognised as such by employers, research funders and others in contributing to an individual’s professional profile in relation to promotion, research assessment and research-funding decisions. Such formal recognition should be accompanied by the development and use of responsible metrics that allow the collection and tracking of data use and impact. In general, data citations should be accorded appropriate importance in the scholarly record relative to citations of other research objects, such as publications.” (Bruce, 2018)

None of the other initiatives mention any rewards for researchers who contribute access to quality data. The Canadian Open Government Portal includes an automatic dataset format rating system (out of five “stars”) for each dataset resource, based on Tim Berners-Lee’s five-star deployment scheme for Open Data. Users are also able to rate datasets with a five-maple-leaf rating system. However, this user rating system is not used for evaluation or reward, and users are not given specific parameters for what they are evaluating (Treasury Board of Canada Secretariat, Open Government Team, 2018). The Korean case study explicitly states that such incentives do not exist in Korean policy making (Shin, 2018). In Mexico, the funding agency focuses on institutional rather than individual incentives; hence, grants are conditioned on the inclusion of a target number of databases in the National Repository at the end of the project (CONACYT, 2018). Conversely, the Argentine Science and Technology Information Portal reports the existence of sanctions for researchers who do not comply with the open-access policies (Luchilo, D’Onofrio and Tignino, 2018).
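An automatic format rating of the kind mentioned above for the Canadian portal could, in principle, work along the following lines. This Python sketch is an assumption-laden illustration of the five-star scheme (open licence; machine-readable structure; non-proprietary format; use of open web standards such as RDF; links to other data). The format table and function name are hypothetical and do not reproduce the portal's actual rules.

```python
# Hypothetical mapping from file format to the star level that the format
# alone can justify under the five-star scheme.
FORMAT_STARS = {
    "pdf": 1,  # on the web under an open licence, but not structured
    "xls": 2,  # structured and machine-readable, but proprietary
    "csv": 3,  # structured and non-proprietary
    "rdf": 4,  # uses open web standards and URIs to identify things
}


def star_rating(file_format, open_licence=True, links_to_other_data=False):
    """Return a 0-5 star rating for a dataset resource.

    Zero stars means the resource is not openly licensed; the fifth star
    requires a four-star format that also links to other datasets.
    """
    if not open_licence:
        return 0
    # Unknown formats are treated as unstructured (one star) in this sketch.
    stars = FORMAT_STARS.get(file_format.lower(), 1)
    if stars == 4 and links_to_other_data:
        stars = 5
    return stars


csv_stars = star_rating("CSV")
linked_rdf_stars = star_rating("rdf", links_to_other_data=True)
```

Such a rating describes technical reusability only; as the paragraph above notes, it is not a measure of a dataset's scientific quality and is not used to reward its authors.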

The French case study mentions the development of data papers as a formal vehicle for data sharing (French Ministry of Higher Education, Research and Innovation, 2018). Such data papers generate bibliography and citations in the traditional sense, and can contribute to a researcher’s evaluation.

Definition of responsibility and ownership

Issues of ownership and responsibility – including IPRs – need to be considered when enhancing access to public-research data, as they can have important implications for how – and by whom – data can be used, while ensuring full respect for the rights of data owners. Data creators may not necessarily hold the ownership of the data they collect: in the case of human-subject data, for example, the participants themselves may hold those rights.

Most saliently, any IPR associated with research data, and the licensing arrangements for the use of that data, must be clearly specified. In the absence of such specification, data acquire the statutory IPR of the jurisdiction in which they are used. This may include copyright and sui generis5 database rights (e.g. in Europe), as well as local laws addressing confidentiality, privacy, trade secrets, patents and competition law, which can inhibit the further use of data. Such protections arise automatically, unless expressly excluded, waived or modified (Doldirina et al., 2018). Box 4.4 discusses how greater clarity in IP and data-management policies can promote open-data practice, using the example of the University of Cape Town.

Box 4.4. In my view: Greater clarity in IP and data-management policies can promote open-data practice

Michelle Willmers, Curation and Dissemination Manager of the Global South Research on Open Educational Resources for Development (ROER4D) project, University of Cape Town (UCT), South Africa

The ability of researchers to legally share outputs arising from their work is dictated by institutional IP policies, which are in turn largely influenced by national copyright acts. In the African context, many universities have nascent policy environments, meaning that they may not have an IP policy, or it is out of date and inadequate to cover the intricacies of online content sharing – particularly as it relates to open-data transfer and publication. There are also instances in which policy environments provide conflicting or contradictory stipulations. This situation makes for confusion on the part of academics in terms of what their actual rights are in the context of data sharing; in some cases, it may lead to flagrant disregard for policies and mandates.

Both the IP Policy and the Research Data Management Policy of UCT state that research data are owned by UCT, unless otherwise agreed in research contracts. This may lead many academics to assume they do not have the legal rights to share their data, which is not the case. UCT promotes the use of Creative Commons licensing in its IP policy, and has a concerted campaign underway to promote responsible data sharing at all levels of the academic enterprise.

Possible confusion in this regard is compounded by the fact that the institutional terms of deposit for sharing data in repositories state that: “UCT grants the Principal Investigator (PI) of a research project the right to upload UCT research data supporting a publication required by a journal publisher or a funder and all UCT project data where this is a specific funder requirement, as long as the data complies with any ethics requirements (e.g. patient confidentiality, consent, etc.)”.

This caveat raises questions about the rights of academics who are not operating in research contexts led by PIs, or are functioning in a context where there is no publisher or funder requirement in this regard. The fact that the caveat only exists on a website designed to promote data sharing and is not captured in any of the formal institutional policies regulating data sharing makes the institutional open-data policy landscape confusing for academics to navigate and may serve to build reluctance and confusion, rather than promote a culture of sharing where academics are certain of their legal rights.

Grant agreements and repository deposit terms do increasingly provide exceptions and caveats to restrictive or confusing IP policies, but these agreements are often not adequately scrutinised by academics, and the lack of cohesion between institutional policies, the dictates of funding entities and the intricacies of repository terms and conditions can ultimately amplify the distrust of – and therefore the reluctance to engage with – open-data practice.

National and regional initiatives to assess and revise institutional IP policies so that they are conducive to open-data sharing and form part of a set of clear, cohesive institutional stipulations would be extremely valuable in terms of promoting open-data practice, and ensuring a functional understanding of the legal and ethical aspects of the process – the uncertainty of which often inhibits academics’ practice in this regard.

Legislation and other rules for managing research data are not harmonised across organisations and countries. Data custodians often operate under various legal frameworks governing the collection and use of research data (e.g. Box 4.4 on South Africa). In the United States, different research-funding agencies have different IPR policies. For example, the Department of Defense has an “open-by-default” policy, while the Department of Energy takes a cost/benefit-analysis approach to data sharing, combined with a mandatory Data Management Plan, and the Defense Advanced Research Projects Agency has no identified data-sharing policy (EARTO, 2016). In the European Union, copyright can be claimed on data that may not be copyrightable in other jurisdictions (such as the United States),6 with implications for the use of text and data mining in research. According to a review of the UK IP framework (Hargreaves, 2011), “[c]opyright, once the exclusive concern of authors and their publishers, is today preventing medical researchers studying data and text in pursuit of new treatments”.

Considering that one of the main drivers for enhanced access to data is to improve knowledge transfer and innovation, tensions between public and private-sector actors over access to research data are a concern. Enormous potential exists for combining public-research data with private-sector data (including data derived from social media). Achieving this, however, requires IPR and/or licensing arrangements guaranteeing both adequate protection of legitimate commercial interests, and the openness and transparency necessary to promote reproducibility and public confidence (OECD, 2016). The OECD Recommendation of the Council concerning Access to Research Data from Public Funding (OECD, 2006) states:

  • Under Principle D – Legal Conformity: “Data-access arrangements should respect the legal rights and legitimate interests of all stakeholders in the public research enterprise. Access to, and use of, certain research data will necessarily be limited by various types of legal requirements”.

  • Under Principle E – Protection of intellectual property: “Consideration should be given to measures that promote non-commercial access and use while protecting commercial interests, such as delayed or partial release of such data or the voluntary adoption of licensing mechanisms.”

Definition of responsibility and ownership: Cross-cutting learnings from the policy case studies

Several case studies touch upon the issue of responsibility and ownership.

By default, French law does not allow public administration to protect IP, and thus avoids creating a barrier (unique right of data producer) to free reuse of data. An exception is made for data produced in the context of a state-owned industrial or commercial activity in competitive markets (French Ministry of Higher Education, Research and Innovation, 2018).

In the United Kingdom, the “Research Council common principles on data” state that: “Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner” (Bruce, 2018).

In addition, the UK case study notes that data-licensing agreements can make it complex to open up research data, creating legitimate and genuine difficulties for researchers and research organisations.

The ownership and responsibility of resources on the Canadian Open Government portal remain with the publishing department or agency. Resources published on the portal are freely reusable under the Open Government License (Treasury Board of Canada Secretariat, Open Government Team, 2018).

The Korean case study presents several use cases and approaches in different disciplines. While there exists no specific policy on “ownership” of data in the chemical library, materials data is opened only after publications and patents are released; for catalyst data, the Korean Institute of Science and Technology allows data creators to withhold information until they achieve their objectives (Shin, 2018).

In Mexico, research institutions preserve “ownership” of the data, while the national registry only aggregates registries (rather than datasets) (CONACYT, 2018). However, a survey among researchers in Mexico identified the most important barriers to data sharing as “the significant lack of knowledge about copyright and publishing policies, and the unawareness of the benefits provided by a more open research model” (Rodriguez, 2016). Argentina recommends relying on Creative Commons licences for the reuse of data (Luchilo, D’Onofrio and Tignino, 2018).

Business models for providing enhanced access to data

“Enhanced” – or even “open access” – does not necessarily imply “free of charge”. However, many experts agree that public-research data should ideally be free at the point of usage, as discussed during the joint CSTP-GSF workshop “Towards New Principles for Enhanced Access to Public Data for Science, Technology and Innovation”, held at the OECD headquarters in Paris on 13 March 2018 (OECD, 2018b), implying that the costs of stewardship and provision are assimilated by the data provider or repository. These costs can be substantial and require long-term financial commitment, often over several decades. Ultimately, most of the funding for open research data is likely to come from the public purse, although alternative revenue streams exist for some types of data (OECD, 2017). A key question from the science policy or funder perspective is how best to allocate this funding. The answer depends on deriving a full understanding of the business models and value propositions of specific data repositories, and the networks in which they are integrated (Figure 4.4).

Such an understanding requires considering multiple factors, including the role of the repository, and national and domain contexts; the repository's development or lifecycle phase; the characteristics of the user community; and the data product required by this community (influencing the level of investment necessary to curate and enhance the data). Business models are constrained by – and need to be aligned with – policy regulation (mandates) and incentives (including funding) (OECD, 2017).

Many different kinds of data repositories provide a large variety of services, ranging from raw data to complex online analyses. Institutional repositories, national repositories, domain-specific repositories and international repositories are all components of a complex landscape. This landscape is constantly changing as valuable new data resources arise from projects and transition into longer-term sustainable infrastructures, with longer-term funding requirements. At the level of the individual research system, potential economies of scale can be obtained by centralising or federating the management of data resources; this is common practice in some fields. However, not all data can be transferred across institutional or national boundaries, for legal, proprietary or ethical reasons. Moreover, a certain amount of redundancy in the system can also present some advantages, by making it more resilient. Federated networks can provide some benefits of scale, while respecting diversity (OECD, 2017).

Even when business models are well-developed and long-term funding is identified, there exist limits on how data repositories can operate to provide FAIR access to increasing volumes of data. Priorities need to be established and choices made, e.g. between providing immediate online access or putting data into deep storage. In the case of very big data from experimental facilities (such as the Square Kilometre Array telescope), it is impossible to provide open online access to all users; thus, tiered access systems have been developed. Prioritisation and data selection will be an increasingly significant challenge in the future. Addressing this challenge will require dialogue between repositories on one hand and data providers and users on the other, as well as more systematic cost-benefit analyses (bearing in mind that data that may have little value today can be very valuable tomorrow, and today’s users may be different tomorrow).

Figure 4.4. Creating a value proposition for data repositories

Source: OECD (2017), “Business models for sustainable research data repositories”, OECD Science, Technology and Industry Policy Papers, No. 47, OECD Publishing, Paris.

Research-data repositories and services can also be developed as public-private partnerships. Some private companies are opening their data for non-monetary gain (e.g. for recruiting, improving their image or exchanging data). For instance, medical researchers may want to combine data about people’s medical history, genomics, food intake and mobility. Here, medical and genomic data may come from the public sector, whereas mobility and food data might depend on access to private-sector data. Provided IPR and ethical issues can be agreed, public-private partnerships built around such themes should be encouraged, as they can drive the development of data infrastructure and value-added services. The governance arrangements of such public-private partnerships need to be carefully designed to promote trust among all stakeholders, ensuring transparency and accountability (OECD, 2016).

Business models for providing enhanced access: Cross-cutting learnings from the policy case studies

Financing of data sharing is significant and long term, combining government, institutional and project funding. In Korea, the genomic data repository is government-funded; the chemical library is institutionally funded, complemented by project funding; and the materials pilot project runs on project funding (Shin, 2018). While the Korean pilot project on artificial intelligence data depends on project financing, public-private partnerships should be envisaged for the future, since the private sector values such data. The Colombian Biodiversity Information System, for its part, is financed by the Colombian National Ministry of Environment and Sustainable Development (Escobar, Hernández and Agudelo, 2018).

France considers data infrastructure entirely as a public investment that needs to be rendered sustainable (French Ministry of Higher Education, Research and Innovation, 2018). France is considering following the Danish example of centralised financing for data infrastructure through an interministerial platform. It also plans to establish a performance contract, with fixed targets for ministries. Finally, legal measures promoting enhanced access to data (complimentary access, reuse licences, open standards) will accelerate the data infrastructure build-up.

Principle 3 of the UK Concordat states: “Open access to research data carries a significant cost, which should be respected by all parties” (Bruce, 2018). The full text of the principle develops the business model:

  • “Whilst the benefits of open research data are real and achievable, the necessary costs – for IT infrastructure and services, administrative and specialist support staff, training and for researchers’ time – are significant. It is therefore vital that consideration of costs (both capital and recurrent) forms an important part of any obligation arising from the move to open research data recognising that such costs may fall outside of the defined time period of a particular project. Such costs should be proportionate to real benefits. It is recognised that the benefits and costs of open research data must be tensioned with those of the research portfolio as a whole.”

  • “It is UK policy that research organisations undertaking publically funded research are able to access resources for all legitimate costs through the so-called dual support system. It is therefore reasonable that appropriate costs of making research data open are met through those mechanisms whilst recognising the obligation to reduce costs through efficiency and sensible design of both obligations and infrastructure. All research-funding organisations that impose a requirement for open research data must do so in a manner that is consistent with available cost recovery mechanisms.”

  • “For research organisations such as Universities or Research Institutes, these costs are likely to be a prime consideration in the early stages of the move to making research data open – particularly where the required cost recovery mechanism is not yet in place. Both IT infrastructure costs and the ongoing costs of training for researchers and for specialist staff, such as data curation experts, are expected to be significant over time. Significant costs will also arise from Principle #10 regarding the undertaking of regular reviews of progress towards open access to research data. All of these costs must be balanced with the benefits to the research portfolio as a whole.”

The Canadian Directive on Open Government requires each department and agency to maximise the release of government information and data of business value. Each of the federal government’s Science-Based Departments and Agencies releases open data and/or information resources via the Open Government portal. Additionally, the Government of Canada has recently established the position of Chief Science Advisor of Canada. The Chief Science Advisor works to ensure that government science is fully available to the public and that government scientists can speak freely about their work. The resources on the portal are freely shared and useable under the Open Government Licence (Treasury Board of Canada Secretariat, Open Government Team, 2018).

Building human capital

Depending on the scientific domain, researchers normally have some training in data analysis, but often lack data-management skills. Users (who may be from different academic sectors, or from the private sector) do not always have the appropriate skills to interpret and analyse the data correctly. The effective operation of data repositories requires specialised skills in data curation and stewardship. Various other skills – related to ethical, legal and security concerns, as well as risk management, communication and design – should be included in any well-functioning open-data ecosystem. A lack of these skills breeds a lack of trust.

“Data science” and “data scientists” are overarching terms encompassing a wide range of skill needs. The National Institute of Standards and Technology Big Data Interoperability Framework (Volume 1)7 defines a data scientist as “a practitioner who has sufficient knowledge in the overlapping regimes of business needs, domain knowledge, analytical skills and software and systems engineering to manage the end-to-end processes in the data life cycle.” In reality, very few individuals in most scientific fields fit this definition and are leaders in each of these skill areas. Research increasingly depends on collaboration and co-operation between individuals with different data skillsets. Defining the needs and gaps for these skillsets in different scientific fields is an ongoing challenge.

Several detailed analyses exist of the data-skill requirements for science, e.g. the Data Science Framework developed by the EU-funded EDISON project8 (Figure 4.5).

Figure 4.5. Data skills

Note: This diagram illustrates the main competence groups within data science, as defined in the EDISON project: data-science analytics, data-science engineering, and domain knowledge and expertise. Data management, including curation and long-term stewardship, is sometimes classified as part of data science or as a separate competence group. These various competences need to be integrated into the different aspects of the research process, from design to experimentation, analysis and reporting.

As regards data skills, different scientific domains are equipped to varying degrees. Traditionally data-intensive fields, such as experimental physics or astronomy, are generally well-positioned (although competition for data scientists from commercial actors is affecting recruitment and retention in academia). Other areas, such as medical research, have significant skill gaps. Moreover, the additional burden of curating and stewarding data to make it available for secondary use creates a human-resource challenge that cuts across all areas of science.

Identifying skill needs and gaps across different research domains is a necessary and challenging first step. Meeting these needs is an even greater challenge: it requires retraining existing personnel (e.g. retraining librarians and archivists to perform data-stewardship functions) and providing them with the relevant new skills, as well as creating new education and training opportunities for researchers and for those in other professional research-data support roles. Many such initiatives are already underway; they yield considerable opportunities for mutual learning across countries and scientific domains.

Data scientists are in high demand in industry, with the result that academic research competes for the best talent. An urgent need exists to develop recognition and reward structures, as well as attractive career paths, for all the specialists needed to exploit the value of data derived from public research. As in other research areas, workforce diversity will be an important determinant of success, to be considered at the outset when developing human-resource strategies for the digital research age.

Building human capital: Cross-cutting learnings from the policy case studies

Most case studies mention human capital and skills, highlighting the support researchers need to comply with open-access requirements:

  • Slovenia’s Open Access Strategy stipulates that public research organisations should establish support mechanisms for researchers regarding compliance with open-data requirements (Tramte, 2018).

  • Belgium’s DMPonline focuses exclusively on such support, having developed a specific template for data-management planning (Laureys, 2018).

One objective of the Korean strategy is to provide education and training on data skills for data scientists/engineers, and to hire data-management professionals (Shin, 2018). Argentina encourages training staff responsible for the institutional digital science-and-technology repositories (Luchilo, D’Onofrio and Tignino, 2018). Mexico mostly organises capacity-building through seminars and workshops (CONACYT, 2018).

The UK Concordat addresses human capital and institutional capabilities in Principle 9: “Support for the development of appropriate data skills is recognised as a responsibility for all stakeholders.” Based on the “recognition that curating, archiving, manipulating and analysing data requires a set of skills distinct from those utilised to collect, generate, or measure the data in the first place”, the Concordat calls upon research institutions to provide researcher-training opportunities in an organised and professional manner, with adequate funding from funding agencies. It further calls on institutions to ensure well-designed and sustainable career paths for data scientists in the realm of research-data management (Bruce, 2018).

Canada launched the Open Government Learning Hub9 in 2017 to provide guidance and resources to departments. Since July 2016, the Treasury Board of Canada Secretariat Open Government team has delivered 34 events on open government with approximately 1 800 learners from at least 26 federal organisations. Canada’s 2018-2020 National Action Plan on Open Government includes a commitment to keep raising awareness and skills in the public service by building on these activities (Treasury Board of Canada Secretariat, Open Government Team, 2018).

Exchange of sensitive data across borders

Data access and sharing should not be considered a “binary concept” opposing closed and open access to data. Rather, it is a continuum of openness, ranging from internal access and reuse only by the data holder (also known as closed access), through restricted (unilateral and multilateral) external access and sharing, to open access to the public (open data) as the extreme form of data sharing. Such a continuum makes it possible to address different risk-benefit trade-offs (OECD, 2015b). Low-sensitivity datasets will be candidates for open access, while more sensitive datasets can be shared on a more restricted basis with trusted and certified users.
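The continuum described above can be illustrated with a short sketch. The Python below is only a minimal illustration under stated assumptions, not an implementation of any OECD or national framework: the three access tiers come from the text, while the three-level sensitivity scale, the `user_is_certified` flag and the function name `access_tier` are hypothetical names introduced here.

```python
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical three-level scale; real schemes may use more levels.
    LOW = 1
    INTERMEDIATE = 2
    HIGH = 3


class AccessTier(Enum):
    # The three broad positions on the continuum named in the text.
    CLOSED = "internal access and reuse by the data holder only"
    RESTRICTED = "restricted external access and sharing"
    OPEN = "open access to the public"


def access_tier(sensitivity: Sensitivity, user_is_certified: bool) -> AccessTier:
    """Return the widest access tier this sketch would grant.

    Assumption: low-sensitivity data are open to everyone, while more
    sensitive data are shared only with trusted and certified users.
    """
    if sensitivity is Sensitivity.LOW:
        return AccessTier.OPEN
    if user_is_certified:
        return AccessTier.RESTRICTED
    return AccessTier.CLOSED
```

In practice, the decision would depend on many more factors (legal basis, consent, jurisdiction); the sketch only shows that access can be graded rather than binary.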

Exchange of sensitive data across borders: Cross-cutting learnings from the policy case studies

The Swedish registry data are an example of restricted access to high-sensitivity data. Another example is the Secure Research Service provided by the UK Office for National Statistics, which gives certified researchers access to sensitive datasets (Office of National Statistics UK, n.d.). This service provides two levels of sensitive datasets: i) very high-sensitivity “secure” datasets; and ii) intermediate-sensitivity “safeguarded data”, offered under end-user licence. The secure datasets are never released to the end user; rather, they can be consulted in a “Five Safes” framework (Table 1.2). This means that only approved researchers can access the data within a specific environment, analyse them without extracting the actual sensitive data and submit the results of their research for approval. Those results will be tested to determine whether they pose a disclosure risk. If the results are considered “safe”, they will be authorised for use by the researcher; if they are considered “unsafe”, the researcher will need to devise a way of further anonymising the results.

UK legislation, notably the 2017 Digital Economy Act (Her Majesty’s Government, 2017), defines the different categories of personal information and the exact rules for sharing personal data, and enables the Secure Research Service to operate lawfully. However, such a service can only function on UK soil under UK legislation. It could never be provided to a researcher located in a different country, owing to a lack of international legal frameworks ensuring the same level of legal protection against misuse.

In the United States, the NIH ensures that human genomic data resulting from funded research are shared through a controlled-access mechanism, unless study participants have explicitly consented to sharing their data through unrestricted access mechanisms. Since 2007, more than 6 600 investigators from 46 countries have submitted 43 372 requests to access these data; approximately 63% of these requests were approved. On 1 November 2018, NIH updated its policy to allow unrestricted access to genomic summary results from most such studies after 1 May 2019. However, results for some study populations, such as those from isolated geographic regions or with rare or potentially stigmatising traits, may be made available only through restricted access (US Government, 2019).


Boselli, B. and F. Galindo-Rueda (2016), “Drivers and implications of scientific open access publishing: Findings from a pilot OECD international survey of scientific authors”, OECD Science, Technology and Industry Policy Papers, No. 33, OECD Publishing, Paris,

Bruce, R. (2018), “UK case study: The concordat on open research data”, case study for the OECD project on enhanced access to data,

CONACYT (2018), “Mexican open science policy – Case study: Open institutional repositories program”, case study for the OECD project on enhanced access to data,

Doldirina, C. et al. (2018), “Legal approaches for open access to research data”, Law ArXiv Papers,

DORA (2012), San Francisco Declaration on Research Assessment (DORA), (accessed on 7 July 2019).

EARTO (2016), “EARTO background note: Overview of US Federal Agencies data sharing policies”, (accessed on 11 March 2020).

Elsevier and CSTS (2017), Open Data: The Researcher Perspective, (accessed on 26 July 2019).

Eriksson, M. and M. Nilsson (2018), “OECD case study report RUT”, case study for the OECD project on enhanced access to data,

Escobar, D., A. Hernández and M. Agudelo (2018), “SiB Colombia – case study: Enhanced access to public data for science, technology and innovation”, case study for the OECD project on enhanced access to data,

European Commission (2018), “Case study of policy initiative for open access to research data: Horizon 2020 open research data (ORD) pilot and data management plan”, case study for the OECD project on enhanced access to data,

European Commission (2016), Regulation (EU) 2016/679 of the European Parliament and of the Council – of 27 April 2016 – on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation), (accessed on 13 September 2017).

French Ministry of Higher Education, Research and Innovation (n.d.), Explore the World of French Research and Innovation with ScanR, search engine, (accessed on 27 February 2020).

French Ministry of Higher Education, Research and Innovation (2018), “Ouverture des données publiques et de recherche”, case study for the OECD project on enhanced access to data,

Hargreaves, I. (2011), Digital Opportunity: A Review of Intellectual Property and Growth, (accessed on 4 July 2019).

Her Majesty’s Government (2017), “Digital Economy Act 2017”, United Kingdom, (accessed on 5 October 2019).

Laureys, E. (2018), “Belgian case study on open access to data: DMP Belgium consortium”, case study for the OECD project on enhanced access to data,

Lavoie, B. (2014), “The Open Archival Information System (OAIS) reference model: Introductory guide (2nd edition)”, Digital Preservation Coalition Watch Series, October, (accessed on 11 March 2020).

Luchilo, L., M. D’Onofrio and M. Tignino (2018), “Case study: The Argentine science and technology information portal”, case study for the OECD project on enhanced access to data,

Martin, E., N. Helbig and G. Birkhead (2015), “Opening health data: What do researchers want? Early experiences with New York’s Open Health Data Platform”, Journal of Public Health Management and Practice, Vol. 21/5, pp. E1-E7,

Netherlands Ministry of Education, Culture and Science (2018), “The Netherlands – National Plan Open Science (NPOS)”, case study for OECD project on enhanced access to data,

OECD (2018a), OECD Science, Technology and Innovation Outlook 2018, OECD Publishing, Paris,

OECD (2018b), “Enhanced Access to Publicly Funded Data for Science, Technology and Innovation”, webpage, OECD, Paris, (accessed on 9 January 2020).

OECD (2018c), OECD Expert Workshop on Enhanced Access to Data: Reconciling Risks and Benefits of Data Re-Use, webpage, OECD, Paris, (accessed on 11 March 2020).

OECD (2017), “Business models for sustainable research data repositories”, OECD Science, Technology and Industry Policy Papers, No. 47, OECD Publishing, Paris, (accessed on 9 March 2020).

OECD (2016), “Research ethics and new forms of data for social and economic research”, OECD Science, Technology and Industry Policy Papers, No. 34, OECD Publishing, Paris,

OECD (2015a), “Making open science a reality”, OECD Science, Technology and Industry Policy Papers, No. 25, OECD Publishing, Paris,

OECD (2015b), Data-Driven Innovation: Big Data for Growth and Well-Being, OECD Publishing, Paris,

OECD (2013), “New data for understanding the human condition: International perspectives”, OECD Global Science Forum Report on Data and Research Infrastructure for the Social Sciences, OECD, Paris, (accessed on 12 September 2019).

OECD (2006), Recommendation of the Council concerning Access to Research Data from Public Funding, OECD, Paris, (accessed on 27 February 2020).

Office of National Statistics UK (n.d.), “Secure research service”, webpage,

President’s Council of Advisors on Science and Technology (2014), “Report to the President: Big Data and Privacy: A Technological Perspective”, The White House, Washington DC, United States, (accessed on 27 June 2019).

RDA (2017), “All recommendations and outputs”, webpage, Research Data Alliance, (accessed on 25 September 2019).

Rodriguez, C.E. (2016), Encuesta a los investigadores en el SNI 2015. Módulo: Acceso abierto a la información científica, report, Foro Consultivo Científico y Tecnológico, A.C., Mexico City,

Shin, E. (2018), “Korean case report on enhanced access to research data”, case study for the OECD project on enhanced access to data,

Soete, L. (2016), “A sky without horizons. Reflections: 10 years after”, keynote presentation at the OECD Blue Sky Forum, Ghent, (accessed on 28 February 2020).

TOP Guidelines Committee (2014), Transparency and Openness Promotion (TOP) Guidelines, Open Science Framework, Centre for Open Science, (accessed on 9 March 2020).

Tramte, P. (2018), “Case study on research data management and openness in Slovenia”, case study for the OECD project on enhanced access to data,

Treasury Board of Canada Secretariat, Open Government Team (2018), “Case study: Canada’s Open Government Portal”, case study for the OECD project on enhanced access to data,

UNECE High-Level Group for the Modernisation of Official Statistics (HLG-MOS) (2013), Generic Statistical Information Model (GSIM) Specification – webpage,

UK Research and Innovation (2016), “Concordat on Open Research Data”, case study for the OECD project on enhanced access to data, (accessed on 26 February 2020).

US Government (2019), “Public access to Federally funded research in the United States”, case study for the OECD project on enhanced access to data.

ZBW – Leibniz Information Center for Economics (24 January 2016), “GO-FAIR: A member states-up strategy for the EOSC implementation”, ZBW MediaTalk blog (accessed on 28 February 2020).


← 1. Risks related to intellectual property will be discussed in the section “Definition of responsibility and ownership” below.

← 2. Regulation 2016/679 defines “consent” of the data subject as “any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to the data subject.”

← 3.

← 4. The metadata used on is aligned to the Government of Canada Standard on Metadata, but a specific Open Government Metadata Element Set was created, which is much more robust and comprehensive than the existing Standard. The Element Set is treated internally as official policy, and it enables Canada to offer a robust, consolidated search across open data and open information resources. In addition, because Canada added microdata, those records were part of a pilot developed by Google (Google Dataset Search), which consolidates open data records from various repositories to develop a true “federation” of open data.

← 5. “Sui generis” means “of its own kind” or unique.

← 6. The US Copyright Act applies only to the expression of a work and does not extend to an idea, procedure, concept or discovery. In Germany, by contrast, copyright protects authors in their intellectual and personal relationship to the work and in respect of the use of the work. A detailed discussion can be found in Doldirina et al. (2018).

← 7.

← 8.

← 9.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2020

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at