Chapter 11. Case Study 6. Digital innovations to facilitate farm level data analysis while preserving data confidentiality

The case study objective is to show how recent innovations such as “confidential computing” can improve access to farm-level data for agricultural and agri-environmental policy or research, while appropriately maintaining data confidentiality and security. While recognising that there are many relevant innovations around the globe, this case study provides examples drawn from the experience of Australia’s Commonwealth Scientific and Industrial Research Organisation (CSIRO), a world leader in these emerging technologies.


Context: Farm-level data is crucial for policy analysis, but high confidentiality requirements limit the ability to use it

Micro-level agricultural data (for example, farm level or field level data) is needed for evaluating the effectiveness and efficiency of agricultural and agri-environmental policies. It also allows an understanding of how policy impacts differ across dimensions such as location, production practices, industry or sector, and socio-economic status.

Agricultural censuses and surveys conducted by or on behalf of government agencies have long been a key source of such data. Most countries, including OECD countries, have a long history of gathering such data and using it to underpin policy decisions. However, in general, the authorising legislation or regulation that enables this data collection also imposes strict confidentiality requirements on the public release of records which could (whether intrinsically or when combined with other data) identify individuals or individual businesses (farms).

In addition to such datasets, administrative data,1 usually gathered and held by government agencies, is an important source of information relevant for policy-making. Berg and Li (2015[1]) identify the following sources of administrative data for agriculture: soils information; crop insurance and subsidiary programmes; land registration and cadastral records; government regulations and monitoring programmes; private and non-governmental organizations and sources of operations; reporting systems (e.g. periodic crop condition reporting); and taxation data. Access to administrative data is often even more limited than access to farm level survey or census data.

Data confidentiality requirements are often cited in the literature as a limiting factor in micro-level agricultural and agri-environmental analysis (Martínez-Blanco et al., 2014[2]; Tukker and Dietzenbacher, 2013[3]; VanderZaag et al., 2013[4]). Further, researchers and analysts often need to be able to link different datasets in order to conduct policy-relevant analysis. In the agriculture context, one crucial type of linking is to tie data on physical characteristics (e.g. location-specific data on soil type, precipitation, temperature, proximity to water bodies, etc.) with data on economic characteristics (e.g. farm performance attributes such as farm profit, farm costs, subsidies received, input use, etc.). This linkage generally needs to occur at the farm or field level in order to evaluate the microeconomic and environmental impacts of policy (Jones et al., 2017[5]; Petsakos and Jayet, 2010[6]). Woodard (2016, p. 385[7]) sums up the situation: “Some work cannot be feasibly accomplished without being able to link together different databases at low levels of aggregation.” While confidentiality requirements for individual organisations or individual datasets are often the reason that desired linkages across datasets cannot be made, a range of other factors also contribute, including: the absence of common linking variables (which enable record matching) (Lubulwa et al., 2010[8]); high costs or lack of resources or expertise needed to perform the linkages (Hand, 2018[9]); and lack of interoperability between datasets (e.g. different definitions with no rule to “translate” definitions in one dataset to match up with another) (Hand, 2018[9]).
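To make the linkage problem concrete, the following is a minimal sketch of joining physical and economic records on a common linking variable. The field names (`farm_id`, `soil_type`, `profit` and so on) and all values are hypothetical and chosen only for illustration; real datasets frequently lack such a shared key, which is exactly the obstacle described above.

```python
# Hypothetical records; field names and values are illustrative only.
physical = [
    {"farm_id": "F001", "soil_type": "clay", "annual_rain_mm": 540},
    {"farm_id": "F002", "soil_type": "loam", "annual_rain_mm": 610},
    {"farm_id": "F003", "soil_type": "sand", "annual_rain_mm": 380},
]
economic = [
    {"farm_id": "F001", "profit": 82000, "subsidy": 5000},
    {"farm_id": "F003", "profit": 41000, "subsidy": 12000},
    {"farm_id": "F004", "profit": 67000, "subsidy": 0},
]


def link_records(left, right, key):
    """Inner-join two lists of dicts on a shared key.

    Returns the linked records plus the sorted keys that appear in
    only one of the two datasets (records that could not be matched).
    """
    right_by_key = {row[key]: row for row in right}
    linked = [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]
    left_keys = {row[key] for row in left}
    unmatched = sorted(left_keys ^ set(right_by_key))
    return linked, unmatched


linked, unmatched = link_records(physical, economic, "farm_id")
for row in linked:
    print(row)
print("unmatched keys:", unmatched)
```

The linked output is itself a unit-level record combining location-specific and financial attributes, which is why linkage tends to raise the re-identification risk and not only the analytical value of the data.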

Use of digital technologies to overcome the impasse

The problems

Efforts to increase the accessibility and reusability of agricultural micro data, and to link different sources of agricultural micro data, seek to address issues arising from information gaps and information asymmetries.2 However, in doing so, they create new issues. At a conceptual level, ethical and practical3 questions about the appropriate level of confidentiality or privacy protection for farmers (and other entities to which data refer) must be considered.4 This then presents additional technical issues about:

  • how to appropriately protect farmers from the misuse of data that pertains to them (a question with both competitiveness (economic) and ethical dimensions);

  • how to ensure farmers are able to exercise their right to privacy or confidentiality;

  • how to make datasets interoperable.

Researchers from the USDA’s ERS succinctly define the challenge:

In essence, the trade-offs involve the desire to get the highest return possible for substantial data collection costs and respondent burden to gather information necessary to produce official statistics and support economic research on one hand and the requirement to uphold the pledge of confidentiality and ensure the future participation of respondents. (Towe and Morehart, n.d., p. 2[10])

Digital solutions5

Existing approaches to improving data access and reuse while preserving confidentiality

Technology solutions such as anonymisation and data obfuscation techniques have been developed over many years to make more data available for use. This activity continues today, with many exciting technologies for improved data availability on the horizon. Existing methods for protecting data include simple approaches such as aggregation and suppression: for example, releasing data only at postcode level, and only where the number of records counted is sufficiently large. These methods can be augmented with perturbation methods,6 which protect tables released from unit level census data from re-identification attacks.7
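Aggregation with suppression can be sketched in a few lines. The example below groups hypothetical farm incomes by postcode and withholds any group with fewer records than a minimum count. The threshold `MIN_COUNT = 3`, the postcodes and the values are assumptions made for illustration, not a rule used by any particular agency.

```python
from collections import defaultdict

MIN_COUNT = 3  # hypothetical disclosure threshold, for illustration only

# (postcode, farm income) pairs; values are made up.
records = [
    ("2600", 100.0),
    ("2600", 200.0),
    ("2600", 300.0),
    ("2650", 150.0),
    ("2650", 250.0),
]


def aggregate_with_suppression(rows, min_count=MIN_COUNT):
    """Return per-postcode count and mean, suppressing small groups.

    Groups with fewer than `min_count` records map to None so that no
    statistic is published for them.
    """
    groups = defaultdict(list)
    for postcode, value in rows:
        groups[postcode].append(value)

    table = {}
    for postcode, values in groups.items():
        if len(values) >= min_count:
            table[postcode] = {
                "count": len(values),
                "mean": sum(values) / len(values),
            }
        else:
            table[postcode] = None  # suppressed
    return table


table = aggregate_with_suppression(records)
print(table)
```

Here postcode 2650 is suppressed because only two farms fall within it. The sketch also shows the cost noted in this section: the published table is coarser than the underlying records and can no longer be linked at farm level.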

There are a large number of proposed approaches to confidentialisation to facilitate data sharing for research while protecting privacy. All of these have been used in successful, large scale implementations in Australia and internationally (O’Keefe and Rubin, 2015[11]; Reiter and Kohnen, 2005[12]). Relevant arrangements include:

  • De-identified open data access – the analyst downloads the data directly (e.g. datasets accessible via the GODAN initiative8).

  • User agreements for offsite use (licensing), in which users are required to register with a custodian agency, and sign a user agreement, before receiving data to be analysed offsite.

  • Remote analysis systems, in which the analyst submits statistical queries through an interface, analyses are carried out on the original data in a secure environment and the user then receives the (confidentialised) results of the analyses.

  • Virtual Data Centres (VDCs), which are similar to remote analysis systems, except that the user has full access to the data, and are similar to on‐site data centres, except that access is over a secure link on the internet from the researcher’s institution (e.g. the USDA-ERS data enclave platform provided by NORC,9 and the Australian Bureau of Statistics DataLab10). VDCs may also make use of containerisation, where the analyst can access the data in a limited way, on a secure platform through a containerised application (e.g. the SURE platform used by the Sax Institute11).

  • Secure, on‐site data centres, in which researchers access confidential data in secure, on‐site research data centres (e.g. the Secure Access Data Center, France12).

Each arrangement makes data available at a specified level of detail, where sensitive detail can be reduced by methods including:

  • Removal of identifying information.

  • Confidentialisation of the data by one of a range of methods, including aggregation, suppression or the addition of random “noise”.

  • Replacement of sensitive variables or data with synthetic (“made‐up”) data.

Unfortunately, with the exception of the open data approach, these mechanisms greatly restrict the number of people that can access the data, and the convenience of that access. Also, some techniques may reduce the value of data for policy analysis, for example by reducing the level of granularity, introducing bias into the dataset, or reducing the ability to link individual records in different datasets.

Recent technological advances: Confidential Computing, Multi-party Computation and Synthetic Data Release

The Commonwealth Scientific and Industrial Research Organisation (CSIRO), a corporate entity of the Australian Government, is currently leading research into several innovative techniques for allowing researchers to make use of confidential data such as farm level records, without actually being able to see or access the raw data. These innovations rely on advances in digital encryption to preserve confidentiality.

Confidential computing and multi-party computation

CSIRO has expertise in homomorphic encryption, which enables calculations to be done on data while the data is encrypted, and in secure multi-party computation, which allows data to be shared between and computed on by multiple parties without any party having sufficient information to reconstruct the data itself. Both of these approaches are considered very promising as a long-term solution to the data protection problem. However, “fully homomorphic encryption”, which is a recently discovered capability, is not yet practical for large-scale data analysis problems.
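The idea behind secure multi-party computation can be illustrated with additive secret sharing, a standard textbook technique. The sketch below is not a description of CSIRO's implementation: the function names, the modulus and the example values are assumptions. Each party splits its private value into random-looking shares, and only the combination of all partial sums reveals the total.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (illustrative choice)


def make_shares(value, n_parties):
    """Split a non-negative integer into additive shares modulo PRIME.

    Any subset of fewer than n_parties shares is uniformly random and
    reveals nothing about the value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Compute the sum of the parties' values without pooling raw data."""
    n = len(private_values)
    # Each party i splits its value and sends share j to party j.
    distributed = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; each partial sum looks random.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Publishing and adding the partial sums yields only the total.
    return sum(partial_sums) % PRIME


# Three organisations each hold one confidential figure.
print(secure_sum([120, 45, 300]))
```

In this simulation all parties run in one process. In practice each party would run on its own infrastructure and exchange shares over a network, and the protocol would need further safeguards against parties that deviate from it.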

As part of its “confidential computing” platform, CSIRO Data61 is developing a combination of “partial homomorphic encryption” (which is more limited but more efficient than fully homomorphic encryption), distributed computing and machine learning. This platform enables the provision of services that allow individual organisations (both public and private) to do joint analysis of data without exposing their own data to any other party. These methods are being applied to federal government data within the Australian Government National Innovation and Science Agenda13 (NISA) framework as a proof of concept.
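As an illustration of what a partially homomorphic scheme permits, the following toy sketch uses the Paillier cryptosystem, a well-known additively homomorphic scheme. The source does not state which scheme the platform uses, so the choice of Paillier is an assumption, as are the function names. The primes are tiny and offer no security; they serve only to show that two encrypted values can be added without decrypting either.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny, insecure demo primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_a = encrypt(n, 120000)  # e.g. one organisation's confidential total
c_b = encrypt(n, 45000)   # another organisation's confidential total
c_sum = add_encrypted(n, c_a, c_b)
print(decrypt(n, private_key, c_sum))
```

Only addition (and multiplication by a known constant) is available on ciphertexts in a scheme of this kind, which is consistent with the point that partial schemes are more limited but more efficient than fully homomorphic encryption.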

The “Confidential Computing” platform allows access to a prescribed set of analytics functions that are performed over encrypted data that is not disclosed to the data scientist or analyst. As of September 2018, the analytics functions available through this approach include aggregation and other simple statistical functions, and simple supervised and unsupervised machine learning approaches. Methods such as Random Forests and Deep Learning are currently excluded because they are incompatible with the reduced set of operations available from the underlying cryptographic representation of the data. Confidential Computing enables a new, low-friction method of doing exploratory linkage and analysis of datasets. This approach may allow the discovery of new connections between datasets or attributes and insights without the overhead of the training, authorisation and provision of current approaches, while still maintaining the confidentiality of the data. More expensive direct access to the data can still be obtained through current methods, particularly if justified through exploratory analysis over encrypted data. This capability is equally relevant to intra‐government data collaboration, government‐private data collaboration, and private‐private data collaboration.

Synthetic data release

There is a recent advance in privacy technology known as Differential Privacy, introduced by Dr Cynthia Dwork at Microsoft. Differential Privacy is a quantifiable measure of the privacy of certain data analytics techniques that involve random perturbation of either the data being analysed or the analysis itself. CSIRO Data61 is working on a variety of differentially private mechanisms to allow the release of synthetic unit record datasets that contain statistically similar data to the original data, but can guarantee that the released data cannot be re‐identified. Data61 is undertaking investigation of these methods within the NISA framework to potentially allow the release of government datasets with fewer restrictions than are currently needed to ensure confidentiality. These techniques involve adding noise to the data, and so have some impact on the utility of the data for analytics.
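The role of random perturbation can be illustrated with the Laplace mechanism, a standard building block of differential privacy. This sketch applies it to a single counting query and does not attempt the harder task of releasing a full synthetic dataset. The function names, the `epsilon` value and the records are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical unit records: 40 of the 100 farms are irrigated.
farms = [{"farm_id": i, "irrigated": i % 5 < 2} for i in range(100)]

rng = random.Random(2018)  # seeded only to make the demo repeatable
noisy = private_count(farms, lambda f: f["irrigated"], epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the utility cost mentioned above. In a real deployment the noise source would need to be cryptographically sound and the total privacy budget across queries would have to be tracked.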

Lessons learned for the use of innovative digital technologies to improve access to and reusability of farm level data for policy-relevant analysis

Lesson 1. Agricultural micro data, and the ability to “tie” farm level financial data to physical data, including location-specific attributes, are crucial for developing more efficient, spatially-targeted policies

Given the weight of evidence from existing economic analyses that untargeted agricultural policies are inefficient (see, for example, Arbuckle, 2013[13]; Lankoski, 2016[14]; OECD, 2008[15]; OECD, 2012[16]; Rabotyagov et al., 2014[17]; Weersink and Pannell, 2017[18]; Whittaker et al., 2017[19]), the usefulness of micro-data for effective and efficient policy design, implementation and evaluation will only increase.

Governments need to recognise that access to agricultural micro data, including the ability to link different agricultural micro datasets (as well as other relevant data such as environmental data) is a crucial source of value-added, and is needed to produce robust and targeted policy analysis and advice.

Lesson 2. Even though governments may be moving towards more open data approaches, access to farm level data collected by public agencies is generally limited by enabling legislation and is underpinned by trust

Many governments have decided to pursue more “open data” approaches or enact general data privacy regulations which will shape governance on the use of agricultural micro data. Many have also committed to the principle that published data should conform to FAIR standards14 of being findable, accessible, interoperable and re-useable.15 However, it is important to appreciate that government organisations are often legally required to maintain confidentiality in relation to raw data (particularly, in the agriculture context, where the raw data pertains to individuals or individual farms), and hence that commitments to open data or FAIR principles may not be considered relevant for access to farm-level data.

Moreover, most agricultural data is collected via trust-based relationships between farmers and government agencies or researchers. In a voluntary context, there is a clear link between trust in the data collector’s commitment and ability to preserve confidentiality and the willingness to participate. In a mandatory context, while arguably participation could be more easily regulated, provision of complete, correct data may nevertheless be difficult to ensure.

The fact that government agencies’ (and researchers’) ability to collect farm-level data is based on trust and on often longstanding legislative commitments to maintaining confidentiality has several implications:

  • Government may have limited ability to lessen these legislated confidentiality guarantees, especially in relation to existing datasets. This suggests that an open data approach for agricultural micro data may not be an achievable or desirable end goal.

  • Government should consider how collection of data via new methods which do not require direct participation from farmers (e.g. collection of data via remote sensing, or automated collection of data from “smart” agricultural machinery or infrastructure) impacts on the existing trusted relationships with farmers. Interactions may not be straightforward – for example, increased use of remote sensing may induce farmers to become more relaxed about (certain aspects of) confidentiality because data is available to all; conversely, it could engender a more wary approach and resistance to what could be perceived as government overreach.

Lesson 3. By facilitating analysis of the data without the analyst being able to see the data, confidential computing can solve the confidentiality-reuse dilemma

CSIRO’s N1 confidential computing platform provides an example of how technology can be used to bypass the traditional dilemma between the benefits of allowing access to highly disaggregated “true” data (including individual records) and the need to aggregate or perturb the data in order to preserve confidentiality. However, these technologies are still new and have yet to be applied in a context involving agricultural data.

Lesson 4. Improving access to agricultural micro data needs a coherent, tiered data dissemination strategy

Existing arrangements for access to agricultural micro data for policy-related research and analysis are cumbersome and often fail to adequately provide researchers and analysts with the data they need. This results in duplication of effort (e.g. universities conducting their own surveys because they cannot access farm-level data collected by government statistical agencies) and limits the ability of researchers to provide targeted, disaggregated policy analysis and advice.

As demonstrated in this case study, there are a range of institutional and technological solutions which can facilitate access to agricultural micro data while preserving individual respondent confidentiality. It is not clear that one particular solution is superior; rather, government agencies (and other organisations who collect agricultural micro data) can take a graduated approach which takes into account both the benefits of allowing access to specific data for specific purposes and the potential harm caused if confidentiality is breached. Data dissemination strategies should explicitly recognise the trade-offs of different data access options.

It is suggested that governments take a tiered approach, as follows:

  • Start from the position of open data and take a “Why not?” approach: that is, reasons why data cannot be openly provided should be clearly articulated. Pre-existing legislative requirements to protect confidentiality should be open to periodic, transparent review.16

  • Invest in data services such as providing linked datasets to increase the usefulness of government data collections. One important aspect of this is to link farm financial datasets with physical data such as soils, precipitation, and other climate variables.

  • Increase use of secure remote access mechanisms to reduce transactions costs of allowing trusted researchers to access micro data.

  • Explore greater use of new technologies such as confidential computing that avoid the traditional confidentiality-accessibility dilemma.

Organisations who collect or house data should work together with data providers (e.g. farmers in the context of traditional agricultural surveys) and data users to establish a clear framework governing data access.

Finally, while this case study has not considered broader issues about data ownership, data use rights and requirements to obtain consent to use and reuse data, it is important to emphasise that frameworks governing access to agricultural micro data should be coherent with broader policies governing such issues, as well as with underlying legislation authorising government agencies to collect agricultural data. For example, if a government were to take an approach that gives farmers ownership of agricultural data which pertains to them, the data dissemination strategies of government agencies who collect, store or disseminate agricultural data need to be consistent with this broader approach. Another example relates to consistency across jurisdictions: organisations that are part of the Farm Accountancy Data Network (FADN)17 should ensure their data dissemination strategies are as consistent as possible, to facilitate analysis across FADN countries.


[13] Arbuckle, J. (2013), “Farmer Attitudes toward Proactive Targeting of Agricultural Conservation Programs”, Society and Natural Resources.

[1] Berg, E. and J. Li (2015), Improving the Methodology for Using Administrative Data in an Agricultural Statistics System. Technical Report 2: Administrative Data and the Statistical Programmes of Developed Countries, Center for Survey Statistics and Methodology, Iowa State University, (accessed on 21 August 2018).

[23] Chapman, D. (2016), CSIRO Submission 16/560 to Productivity Commission Inquiry on Data Availability and Use, (accessed on 21 August 2018).

[22] Domingo-Ferrer, J. and L. Franconi (eds.) (2006), LNCS 4302 - Privacy in Statistical Databases, Springer (accessed on 27 August 2018).

[21] Eurostat (2006), Monographs of official statistics: Work session on statistical data confidentiality, (accessed on 27 August 2018).

[9] Hand, D. (2018), “Statistical challenges of administrative and transaction data”, Journal of the Royal Statistical Society: Series A (Statistics in Society), Vol. 181/3, pp. 555-605.

[5] Jones, J. et al. (2017), “Brief history of agricultural systems modeling”, Agricultural Systems, Vol. 155, pp. 240-254.

[14] Lankoski, J. (2016), “Alternative Payment Approaches for Biodiversity Conservation in Agriculture”, OECD Food, Agriculture and Fisheries Papers, No. 93, OECD Publishing, Paris.

[8] Lubulwa, M. et al. (2010), Statistical integration in designing Australian farm surveys, ABARE-BRS Conference Paper 10.13, Speke Resort, Kampala, Uganda, (accessed on 21 August 2018).

[2] Martínez-Blanco, J. et al. (2014), “Application challenges for the social Life Cycle Assessment of fertilizers within life cycle sustainability assessment”, Journal of Cleaner Production, Vol. 69, pp. 34-48.

[16] OECD (2012), Evaluation of Agri-environmental Policies: Selected Methodological Issues and Case Studies, OECD Publishing, Paris.

[15] OECD (2008), Agricultural policy design and implementation - a synthesis, OECD Publishing, Paris.

[20] OECD (n.d.), Short-Term Economic Statistics (STES) Administrative Data: Two Frameworks of Papers, (accessed on 21 August 2018).

[11] O’Keefe, C. and D. Rubin (2015), “Individual privacy versus public good: protecting confidentiality in health research”, Statistics in Medicine, Vol. 34/23, pp. 3081-3103.

[6] Petsakos, A. and P. Jayet (2010), Evaluating the efficiency of a N-input tax under different policy scenarios at different scales, (accessed on 19 April 2018).

[17] Rabotyagov, S. et al. (2014), “Cost-effective targeting of conservation investments to reduce the northern Gulf of Mexico hypoxic zone”, Proceedings of the National Academy of Sciences.

[12] Reiter, J. and C. Kohnen (2005), “Categorical data regression diagnostics for remote access servers”, Journal of Statistical Computation and Simulation, Vol. 75/11, pp. 889-903.

[10] Towe, C. and M. Morehart (n.d.), Improving researcher access to USDA’s Agricultural Resource Management Survey, (accessed on 21 August 2018).

[3] Tukker, A. and E. Dietzenbacher (2013), “Global Multiregional Input-Output Frameworks: An Introduction and Outlook”, Economic Systems Research, Vol. 25/1, pp. 1-19.

[4] VanderZaag, A. et al. (2013), “Towards an inventory of methane emissions from manure management that is responsive to changes on Canadian farms”, Environmental Research Letters, Vol. 8, p. 035008.

[18] Weersink, A. and D. Pannell (2017), Payments versus Direct Controls for Environmental Externalities in Agriculture, Oxford University Press.

[19] Whittaker, G. et al. (2017), “Spatial targeting of agri-environmental policy using bilevel evolutionary optimization”, Omega (United Kingdom).

[7] Woodard, J. (2016), “Data Science and Management for Large Scale Empirical Applications in Agricultural and Applied Economics Research: Table 1.”, Applied Economic Perspectives and Policy, Vol. 38/3, pp. 373-388.


← 1. OECD (n.d.[20]) defines “administrative data” to have the following features:

  • the agent that supplies the data to the statistical agency and the unit to which the data relate are usually different: in contrast to most statistical surveys;

  • the data were originally collected for a definite non-statistical purpose that might affect the treatment of the source unit;

  • complete coverage of the target population is the aim;

  • control of the methods by which the administrative data are collected and processed rests with the administrative agency.

← 2. McCaa and Esteve (in Eurostat (2006[21]) citing Julia Lane, 2003) highlight “five classes of benefits which accrue from broader access to microdata: address more complex questions, calculate marginal effects, replicate findings, assess data quality and build new constituencies or stakeholders. Replication is extremely important because there is an overwhelming temptation for scientists to misrepresent results when the data are unlikely to be available to others.”

← 3. “Beyond law and ethics, there are also practical reasons for statistical agencies and data collectors to invest in this topic: if individual and corporate respondents feel their privacy guaranteed, they are likely to provide more accurate responses.” (Domingo-Ferrer and Franconi, 2006[22])

← 4. It is acknowledged that in some cases there may be little scope, at least in the short to medium term, to alter existing levels of protection provided by confidentiality requirements already set in data collecting agencies’ authorizing legislation. Nevertheless, as opportunities to review such legislation arise, the appropriate level of protection should be carefully considered and not taken as a “given”.

← 5. The material in this section is taken from CSIRO’s 2016 Submission to the Australian Government Productivity Commission’s Inquiry into Data Access and Use (Chapman, 2016[23]), with minor editorial modifications and addition of examples that are relevant to the agri-environmental context. Changes have been approved by the original authors.

← 6. Perturbation methods such as swapping, recoding, etc. make “exceedingly unlikely the identification of individuals, families or other entities in the data. Technical [perturbation] measures have the additional benefit that any assertion of absolute certainty in identifying anyone in the data is false.” (Eurostat, 2006[21])

← 7. Re-identification attacks are methods of analysing aggregated data to extract the details of a single individual or a group of individuals with a common characteristic. A notable example is the re-identification of the Netflix public dataset as performed by Narayanan and Shmatikov.

← 8. The Global Open Data for Agriculture and Nutrition (GODAN) initiative promotes “the proactive sharing of open data to make information about agriculture and nutrition available, accessible and usable”. See, accessed August 2018.

← 9. “The [United States] Economic Research Service (ERS) and the National Agricultural Statistics Service (NASS), in coordination with the Food and Nutrition Service (FNS) utilise the [University of Chicago’s NORC] Data Enclave to provide authorised researchers secure remote access to data collected as part of the Agriculture Resource Management Survey (ARMS), the primary source of information to the US Department of Agriculture and the public on a broad range of issues about US agricultural resource use, costs, and farm sector financial conditions.” See, accessed August 2018.

← 10. “The DataLab is the data analysis solution for high-end users who want to undertake interactive (real time) complex analysis of microdata. Within the DataLab, users can view and analyse unit record information using up to date analytical software with no code restrictions, while the files remain in the secure ABS environment. All analytical outputs are checked by the ABS before being provided to the researcher.” See, accessed August 2018.

← 11. SURE is “Australia’s only remote-access data research laboratory for analysing routinely collected [health-related] data, allowing researchers to log in remotely and securely analyse data from sources such as hospitals, general practice and cancer registries.” See, accessed August 2018.

← 12. See, accessed September 2018. This is the channel for accessing agricultural micro-level data in France, including FADN data, but also surveys of farm practices. The CASD has been in place since 2012 and contains various types of sensitive data (e.g. health, taxation, business surveys, and administrative data such as agri-environmental measures).

← 13. See, accessed August 2018.

← 14. See, accessed August 2018.

← 15. In the Australian context, the Australian Government released in 2015 the Australian Government Public Data Policy Statement. The Policy Statement states: “The Australian Government commits to optimise the use and reuse of public data; to release non sensitive data as open by default; and to collaborate with the private and research sectors to extend the value of public data for the benefit of the Australian public. Public data includes all data collected by government entities for any purposes including; government administration, research or service delivery. Non-sensitive data is anonymised data that does not identify an individual or breach privacy or security requirements.” (emphasis added).

See, accessed August 2018. See also the Policy Statement on F.A.I.R. Access to Australia's Research Outputs, at, accessed August 2018.

← 16. Note that this recommendation does not presume that an open data approach will be appropriate in all cases. Rather, it is recommended as a useful conceptual starting point so that the case for confidentiality requirements can be re-evaluated and transparently made.

← 17. See, accessed August 2018.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2019

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at
