1. What is evaluation and why do it?

The definition of evaluation used in this edition of the Framework continues to be that used by (Papaconstantinou and Polt, 1997[1]):

“Evaluation refers to a process that seeks to determine as systematically and objectively as possible the relevance, efficiency and effectiveness of an activity in terms of its objectives, including the analysis of the implementation and administrative management of such activity.”

This definition has four distinct elements, each of which is now discussed in turn.

The point made at the outset of the definition is that evaluation should be seen as a continuous or semi-continuous process and not as a “one-off” activity. So, rather than evaluation being limited to reviewing a completed programme to assess whether it should be closed, expanded or modified, the (Papaconstantinou and Polt, 1997[1]) definition emphasises its continuous nature in the policy process, where evaluation has a key role to play in each of the main policy stages below:

  • Prior to a programme being announced.

  • Whilst a programme is in operation.

  • When reaching a judgement on whether or not a programme has been effective.

  • When programmes with similar objectives to others that have already been implemented are under consideration in future.

The role of evaluation is now discussed for each stage in turn.

A first evaluation-related task for policymakers prior to a programme being announced is the formulation of Objectives and their associated Targets. These will enable evaluation to make assessments against previously fixed milestones. The Objectives and Targets help all stakeholders (politicians, programme managers, support agencies, beneficiary firms and entrepreneurs etc.) to be clear about what the programme seeks to achieve. They also help to ensure that the outcome claimed for a programme by an evaluation is related to what it set out to do – rather than allowing any change in related areas to be claimed as an impact of the programme.

Broad and vague Objectives, such as “making the country more entrepreneurial,” or “the creation of an enterprise culture” are inappropriate for impact evaluation. This is because these phrases can be interpreted in many ways such as: more business creations; more high-growth firms; more innovators; more social enterprises; a stronger international focus etc.

Instead, the Objectives have to be specified in terms of identifiable outcomes that are expected to change as a result of the policy. Examples of more appropriate Objectives might be:

  • To raise rates of new business creation by the unemployed or by disadvantaged groups.

  • To increase the proportion of SMEs which grow rapidly.

  • To increase the total number of social enterprises in the economy.

  • To promote international sales by SMEs by subsidising their participation at trade fairs.

Policymakers also need to specify more detailed Targets before the commencement of a programme. Here, the policymaker “converts” the specified Objectives to Targets by linking them to a statement on both their magnitude and their timescale. Examples of “converting” the above Objectives into Targets are:

  • To raise rates of new business creation by the unemployed or by disadvantaged groups by 20% over five years.

  • To increase the proportion of SMEs defined as high-growth SMEs from 3% to 4% over five years.

  • To raise the number of social enterprises in the economy by 50% over a decade.

  • For subsidised participation of SMEs at trade fairs to lead to increased overseas sales by these SMEs of 100% in the following 12 months.
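The conversion of an Objective into a Target can be illustrated with a minimal Python sketch. The `Target` class, the `relative_target` helper and all baseline figures are hypothetical and introduced only for illustration; they are not part of the Framework. The sketch simply records the indicator, the magnitude and the timescale, so that a later evaluation can test the outcome against a previously fixed milestone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """An Objective made measurable: an indicator, a magnitude and a timescale."""
    indicator: str       # outcome being measured (illustrative label)
    baseline: float      # value of the indicator when the programme starts
    target_value: float  # value the indicator is expected to reach
    years: int           # timescale for reaching it

    def is_met(self, observed: float, years_elapsed: int) -> bool:
        """True if the observed value reached the target within the timescale."""
        return years_elapsed <= self.years and observed >= self.target_value


def relative_target(indicator: str, baseline: float,
                    pct_increase: float, years: int) -> Target:
    """Build a Target from a percentage increase over the baseline."""
    return Target(indicator, baseline, baseline * (1 + pct_increase / 100), years)


# Baseline figures below are invented for illustration only.
high_growth = Target("share of high-growth SMEs (%)",
                     baseline=3.0, target_value=4.0, years=5)
social = relative_target("number of social enterprises",
                         baseline=10_000, pct_increase=50, years=10)

print(social.target_value)
print(high_growth.is_met(observed=4.2, years_elapsed=5))
print(high_growth.is_met(observed=3.6, years_elapsed=5))
```

Recording the baseline alongside the target value matters because a percentage Target such as “by 50% over a decade” cannot be checked later unless the starting value was fixed before the programme began.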

A related task is to specify the form of evidence which will be used in an evaluation so as to assess whether the Objectives and Targets have been met. For this, the emphasis should be on outcome evidence.

In contrast, input-based evidence is insufficient. Two instances of input-based evidence are shown below:

  • To increase the number of SMEs participating in training programmes.

  • To increase the % of SMEs satisfied with business advice programmes.

These types of measures can provide policymakers with useful information on programme relevance and management performance, but are only crude measures even in these respects.

The first type of measure focuses on a programme’s take-up, market penetration rate and continued participation rate. However, these are influenced by a range of factors aside from the relevance of the programme to solving a problem or how well it is managed, such as the scale of the subsidy1, awareness by SMEs of the existence of the subsidy and perhaps SMEs’ experience of previous programmes. The second type of measure focuses on the satisfaction reported by programme participants. This is also an unreliable indicator of programme relevance and management quality because these measures rely on, possibly biased, samples of respondents.

Most importantly, input-based measures in themselves say nothing about the impact of the support provided on SME and entrepreneurship performance. Participation in the programme and the satisfaction reported by the participant may be unrelated to any identifiable changes in the firms or entrepreneurs on which support is focused. So, for example, the number of firms participating in training programmes might increase, but with no economic consequences for participating enterprises if the training is ineffective – and yet assessing effectiveness is the core issue in impact evaluation. Equally, Research and Development (R&D) tax credits may mean that enterprises undertake more R&D. However, an impact evaluation has to identify if the additional R&D enhanced the performance of the enterprises or generated wider social benefits. For all these reasons, reliable evaluation requires Objectives to be specified as outcomes such as the survival of the participant firms or changes in their levels of employment or output. We develop this further below.

This Framework does favour the collection of some input-based data alongside output data. Such data have the merit of being easy and relatively cheap to acquire. They can also be used to improve the delivery of programmes once these become operational. Finally, as we note later, they are often used as an input into evaluation. Nevertheless, for the reasons outlined above, input data can only play a modest role in impact evaluation.

Prior to a programme being announced, Objectives and output-based Targets have to be agreed and specified in a form that makes them potentially open to impact evaluation. This provides the basis for reaching a reliable judgement on programme effectiveness.

An important part of the evaluation process is to collect and review monitoring data over the course of implementation of a programme. Such data can include the characteristics of the individuals and businesses that applied for and participated in the programme. They can also include feedback from participants and from those delivering the programme.

Monitoring information can help ensure that a programme is delivered to the intended recipients in an efficient manner. For example, if it becomes clear that take-up is either low, or focused on the wrong groups, various strategies, such as higher subsidies or enhanced marketing can help to re-focus the programme.

However, a second function of monitoring data, with particular relevance for impact evaluation, is that the characteristics of participants, as established by programme monitoring, can be used to create a control group of otherwise similar individuals/businesses that did not participate in the programme. These can be used as the counterfactual in an impact evaluation when a comparison is made between recipients and otherwise similar non-recipients.2 In programmes where not all applicants are successful, or where some businesses participate to a greater extent than others, the non-participant or low level participant businesses can, with care, also be used as the counterfactual.3
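One way in which monitoring data on participant characteristics might be used to select otherwise similar non-participants is nearest-neighbour matching. The sketch below is a simplified illustration under stated assumptions, not a method prescribed by the Framework: the function name `match_controls`, the characteristics chosen (`employees`, `age_years`) and all firm records are hypothetical. Real evaluations typically match on many more characteristics and test the quality of the match.

```python
from math import sqrt


def match_controls(participants, non_participants, keys):
    """For each participant, pick the closest non-participant on the given
    characteristics (nearest neighbour, with replacement)."""
    # Scale each characteristic by its range so that no single one dominates.
    pool = participants + non_participants
    spans = {}
    for k in keys:
        values = [firm[k] for firm in pool]
        spans[k] = (max(values) - min(values)) or 1.0

    def distance(a, b):
        return sqrt(sum(((a[k] - b[k]) / spans[k]) ** 2 for k in keys))

    return {
        p["id"]: min(non_participants, key=lambda c: distance(p, c))["id"]
        for p in participants
    }


# Invented monitoring records, for illustration only.
participants = [
    {"id": "P1", "employees": 5, "age_years": 2},
    {"id": "P2", "employees": 40, "age_years": 12},
]
non_participants = [
    {"id": "N1", "employees": 6, "age_years": 3},
    {"id": "N2", "employees": 38, "age_years": 10},
    {"id": "N3", "employees": 120, "age_years": 25},
]

matches = match_controls(participants, non_participants,
                         keys=["employees", "age_years"])
print(matches)
```

Matching on observed characteristics does not remove differences in unobserved ones, such as motivation, which is one reason why the use of non-participants as a counterfactual requires care.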

Reaching a judgement on whether or not a programme has been effective is the widely accepted contribution expected of an impact evaluation. This role is discussed in more depth when the concepts of “systematically and objectively” from our evaluation definition are set out in Section 1.2 below.

Informing the design of future programmes is a very important function of evaluation because SME and entrepreneurship programmes with very similar stated objectives, and very similar modes of delivery, are found in different countries, and even within the same country some years apart. Ideally, when a new programme is under review, for whatever reason, evaluations of similar programmes in other countries, or in earlier time periods, should be used to learn lessons on how such a programme can be designed and delivered effectively.

The number of broadly similar programmes across countries has multiplied in recent years, which increases the scope to learn from the evaluations of other programmes when planning a new intervention.

One option is to access individual evaluations that have been undertaken in the programme area and review the relevant findings. However, the exercise of comparison of results is facilitated by the fact that “overviews” or meta-evaluations are increasingly common in the area of SME and entrepreneurship policy.4 Some reviews have focussed on “single issue” policies such as science parks or support for youth enterprise, whereas others have examined a wide range of SME and entrepreneurship policies. Chapter 3 discusses the results of some of these evaluations.

Nevertheless, it must be recognised that there is often diversity amongst programmes even within the same overall policy categories, which means that comparisons across individual evaluations may not always be comparing like with like. Furthermore, there may be differences in the reliability of the evaluation methods used for the different programmes and this could corrupt the conclusions. In particular, there must be doubts about the reliability of evaluations undertaken without control groups and at low steps in the Six Steps to Heaven Framework.

For this reason, conclusions reached on programme effectiveness from individual and meta-evaluations have to be carefully interpreted when making inputs to decision-making on new initiatives, with particular regard to the reliability of the evaluation and to the features of the programmes implemented.

Our approach is to focus heavily upon those evaluations that are the most sophisticated, and hence the most reliable. This means less credence is given, for example, to studies with an exclusive reliance on the views of small samples of programme recipients or of managers of the programmes.

(OECD, 2007[2]) argued that a “good” evaluation was one that was able to determine, as systematically and objectively as possible, the impact of participating in a public programme on targeted SMEs and entrepreneurs. Good evaluations minimised the risk of bias by comparing the performance of the treatment group with otherwise similar non-recipients. This provided policymakers with the confidence that the findings could be taken as reliable.

The sophistication/reliability of evaluations was categorised as six steps, with Step I being the least, and Step VI being the most, sophisticated, and with a distinction being made between monitoring and evaluation. It argued that monitoring, Steps I to III, was the collection of information from the recipients of the programme or those delivering it. In contrast, the key element of evaluation was a comparison with a control group of firms or entrepreneurs/potential entrepreneurs that did not participate in the programme, but were identical to the recipients in all other respects.

It was inferred that the impact of the programme was the difference between the performance over time of the recipients, or treatment group, and the control group. Evaluation therefore applies only when there is a valid control group.
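The inference described above, that impact is the difference between the change over time in the treatment group and the change in the control group, can be written as a short calculation. The sketch below is illustrative only: the function name `estimated_impact` and the employment figures are hypothetical, and it assumes that the control group is a valid counterfactual for the recipients.

```python
from statistics import mean


def estimated_impact(treated, control):
    """Difference between the average change over time in the treatment
    group and in the control group.

    Each argument is a list of (before, after) pairs, one pair per firm.
    """
    treated_change = mean(after - before for before, after in treated)
    control_change = mean(after - before for before, after in control)
    return treated_change - control_change


# Invented employment figures (jobs per firm before and after the programme).
treated = [(10, 14), (20, 23), (8, 13)]   # programme recipients
control = [(11, 12), (19, 21), (9, 12)]   # otherwise similar non-recipients

impact = estimated_impact(treated, control)
print(impact)
```

Subtracting the change in the control group removes growth that would have occurred without the programme, which a simple before-and-after comparison of recipients would wrongly attribute to it.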

The Box below sets out all six steps, distinguishing between monitoring and evaluation. In 2007, SME and entrepreneurship policy evaluations based on Randomised Control Trials (RCTs) were rare, and no examples were included in the volume. However, these have become more common in the field over the last 15 years, and are now added as an example of Step VI evaluations that in principle do not suffer from selection bias.

The phrase “relevance, efficiency and effectiveness” in the definition highlights that impact evaluations not only have to provide guidance on whether policy objectives are met, but also on whether they are met in a cost-effective manner.

For example, assume some years previously a decision was made that new and small firms should be able to access publicly-funded business advice and that the impact of this programme should be evaluated. The purpose of the evaluation should be, first, to determine whether participating firms out-performed – according to previously agreed metrics – otherwise similar firms that did not receive this advice. Second, that evidence should, in conjunction with the programme budget, be used to estimate the cost-effectiveness of the programme. Third, these findings should be placed alongside those of other relevant and comparable policy options.

To continue with the example, the purpose of business advice might be to lead to additional job creation amongst the recipients of the advice and quantified in terms of “cost per job” created. If evaluations of other SME and entrepreneurship programmes have been conducted, this enables policymakers to compare the efficiency and effectiveness of business advice – in terms of cost per job – with these other policy options.
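The “cost per job” comparison can be sketched as follows. All budgets and job figures are invented for illustration, and the function name `cost_per_job` is hypothetical. The sketch assumes that the number of additional jobs comes from an impact estimate relative to a control group, rather than from the gross number of jobs in supported firms.

```python
def cost_per_job(budget: float, additional_jobs: float) -> float:
    """Programme cost divided by the jobs attributed to the programme.

    `additional_jobs` should be the impact estimate relative to a control
    group, not the gross number of jobs created in supported firms.
    """
    if additional_jobs <= 0:
        raise ValueError("no additional jobs attributed to the programme")
    return budget / additional_jobs


# Invented figures for two hypothetical policy options.
options = {
    "business advice": cost_per_job(budget=2_000_000, additional_jobs=400),
    "loan guarantees": cost_per_job(budget=5_000_000, additional_jobs=625),
}

# Rank the options from the lowest to the highest cost per job.
ranked = sorted(options.items(), key=lambda item: item[1])
for name, cost in ranked:
    print(f"{name}: {cost:,.0f} per additional job")
```

Such a ranking is only as reliable as the impact estimates behind it, so options evaluated with weaker methods should not be compared on equal terms with those evaluated against a control group.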

In recent years several studies of entrepreneurship and SME policy (Arshed, Carter and Mason, 2014[3]); (Jurado and Battisti, 2019[4]); (Kitching, 2019[5]) have argued that policy outcomes can be strongly influenced by the role of key players – normally public servants – in the details of the formulation and implementation of policy. The role played by these “institutional entrepreneurs” therefore needs to be identified in any policy evaluation because what might appear to be “administrative” decisions can powerfully change the outcomes of a policy5.

As an example, many governments have loan guarantee programmes that are intended to ensure that risky, but worthy, SMEs are able to obtain loan funding. However, although the objective is simple, these programmes have terms and conditions and modes of delivery that vary considerably6. These include the percentage of the loan that is guaranteed; the interest rates payable; the maximum size of the loan; and the sectoral, legal form and geographical restrictions on eligibility. Frequently, the setting of these terms and conditions is seen as an administrative, rather than a strategic or political, decision, appropriately made by “institutional entrepreneurs”. Yet these apparently minor variations in eligibility can make a considerable difference to take-up rates, and hence to the success or otherwise of a loan guarantee programme. For these reasons it is important to understand the decisions made on policy design and implementation, the key influences upon them, and the impacts of any adjustments, as part of an evaluation7.

The central justification for undertaking evaluations of SME and entrepreneurship policy was made more than thirty years ago when such policies were in their infancy.

A conference, organised by the European Commission’s DG V in Brussels in March 1988 concluded, following a review of employment trends and policy initiatives:

“A great deal of emphasis was placed on the fact that the effectiveness of policy and financial intervention must be assessed, both because the means are limited and in order to improve targeting. The European Community does not have money to burn and has to convince the Member States of the effectiveness of any project before any allocations can be made.” (European Commission, 1988[6]).

This case remains unchanged. It is that governments have a responsibility to their taxpayers to ensure, as a minimum, that the funds used achieve the objectives set out for them. The case made here is that this can only be achieved through appropriate evaluation.

This justification is particularly important in the case of SME and entrepreneurship policy because not only are the sums of public money considerable, but the scale and delivery of this budget can also be opaque. This is because expenditure at a country level is incurred by a diverse range of actors. These include virtually every ministry or department of both national and regional government. In many countries they also include international organisations such as the European Union. Decisions on priorities for expenditure are taken by individual ministries of government and so inevitably reflect ministry priorities. In the absence of co-ordination mechanisms, this risks an approach which lacks cohesion across government.

An example of the scale and diversity of SME and entrepreneurship policy expenditure is provided in the Box below.

Information on public expenditure on SME and entrepreneurship policy makes it possible to link evaluation evidence on policy impact to the scale of the policy expenditures made, and hence to assess the cost-effectiveness of policy interventions. Documenting the scale and components of all public SME and entrepreneurship policy expenditure provides a context for assessments of the SME and entrepreneurship policy mix. For example, expenditure information combined with policy impact evidence would help to establish whether the provision of business advice is more cost-effective in raising SME employment than lowering corporation tax.

Making decisions about the relative cost-effectiveness of different interventions requires co-ordination of information and evaluation efforts across government. This needs to involve all the ministries and agencies of central government with SME and entrepreneurship policy expenditures. A co-ordination group for SME and entrepreneurship policy evaluation could be set up with a focal point from each of the relevant ministries (finance, economy, employment etc.) with significant policy expenditures impacting on SMEs and entrepreneurship. Their work could be led by a central monitoring and evaluation unit in the ministry with lead responsibility for SME and entrepreneurship policy. The focal points would promote evaluation in their ministries and agencies and share information on evaluation methods and findings. This would help to ensure that decisions on future policies make use of evaluation findings.

Recommendation: Governments should establish a central monitoring and evaluation unit and a co-ordination process for the monitoring and evaluation of SME and entrepreneurship policy across government ministries and bodies.   

In addition to justifying value for public expenditure, evaluation evidence is critical in helping policymakers learn how to strengthen the relevance, effectiveness and efficiency of policy by identifying which types of policy work well, and which do not, in which contexts and with which designs and delivery methods. Again, the greatest benefits are achieved when evaluation is undertaken comprehensively, across many policy interventions, and lessons are drawn from them.


[3] Arshed, N., S. Carter and C. Mason (2014), “The ineffectiveness of entrepreneurship policy: is policy formulation to blame?”, Small Business Economics, Vol. 43/3, https://doi.org/10.1007/s11187-014-9554-8.

[13] Brault, J. and S. Signore (2019), “The real effects of EU loan guarantee schemes for SMEs: A pan-European assessment”, EIF working paper No. 2019/56.

[9] Caselli, S. et al. (2019), “Public Credit Guarantee Schemes and SMEs’ Profitability: Evidence from Italy”, Journal of Small Business Management, Vol. 57/S2, https://doi.org/10.1111/jsbm.12509.

[14] Cowling, M. (2010), Economic Evaluation of the Small Firms Loan Guarantee (SFLG) Scheme, Institute for Employment Studies.

[6] European Commission (1988), Employment creation in Small Firms: Trends and New Developments.

[17] Georgiadis, A. and C. Pitelis (2016), “The Impact of Employees’ and Managers’ Training on the Performance of Small- and Medium-Sized Enterprises: Evidence from a Randomized Natural Experiment in the UK Service Sector”, British Journal of Industrial Relations, Vol. 54/2, https://doi.org/10.1111/bjir.12094.

[4] Jurado, T. and M. Battisti (2019), “The evolution of SME policy: the case of New Zealand”, Regional Studies, Regional Science.

[5] Kitching, J. (2019), “Regulatory reform as risk management: Why governments redesign micro company legal obligations”, International Small Business Journal: Researching Entrepreneurship, Vol. 37/4, https://doi.org/10.1177/0266242618823409.

[16] Lecluyse, L., M. Knockaert and A. Spithoven (2019), “The contribution of science parks: a literature review and future research agenda”, Journal of Technology Transfer, Vol. 44/2, https://doi.org/10.1007/s10961-018-09712-x.

[7] Lundström, A. et al. (2014), “Measuring the Costs and Coverage of SME and Entrepreneurship Policy: A Pioneering Study”, Entrepreneurship: Theory and Practice, Vol. 38/4, https://doi.org/10.1111/etap.12037.

[8] Martín-García, R. and J. Morán Santor (2021), “Public guarantees: a countercyclical instrument for SME growth. Evidence from the Spanish Region of Madrid”, Small Business Economics, Vol. 56/1, https://doi.org/10.1007/s11187-019-00214-0.

[2] OECD (2007), OECD Framework for the Evaluation of SME and Entrepreneurship Policies and Programmes, OECD Publishing, Paris, https://doi.org/10.1787/9789264040090-en.

[1] Papaconstantinou, G. and W. Polt (1997), “Policy evaluation in innovation and technology: An overview”, in Policy Evaluation in Innovation and Technology: Towards Best Practices, OECD, Paris.

[15] Riding, A., J. Madill and G. Haines (2007), “Incrementality of SME loan guarantees”, Small Business Economics, Vol. 29/1-2, https://doi.org/10.1007/s11187-005-4411-4.

[10] Rotger, G., M. Gørtz and D. Storey (2012), “Assessing the effectiveness of guided preparation for new venture creation and performance: Theory and practice”, Journal of Business Venturing, Vol. 27/4, https://doi.org/10.1016/j.jbusvent.2012.01.003.

[11] Sara, R. (2016), Start-Up Support for Young People in the EU: From Implementation to Evaluation, Eurofound.

[12] Storey, D. (1994), Understanding the Small Business Sector, Routledge, London.


← 1. See a North Jutland case in Denmark examined by (Rotger, Gørtz and Storey, 2012[10]).

← 2. If the programme is evaluated using Randomised Control Trials (RCTs) then the control group is normally established prior to, rather than after, the programme has become operational. A notable exception is (Georgiadis and Pitelis, 2016[17]), where demand for the programme was considerably higher than expected and so support was randomly “rationed”.

← 3. As we show later, the assumption that non-applicants or rejected applicants are always a suitable control group is open to question.

← 4. For example, policy-makers seeking evidence of the impact of science parks can turn to a review of 175 journal articles evaluating science parks by (Lecluyse, Knockaert and Spithoven, 2019[16]). An equally authoritative review of start-up support for young people in the European Union is provided by (Sara, 2016[11]), which identified 34 broadly similar programmes spanning virtually all EU countries.

← 5. For example (Kitching, 2019[5]) examines UK policies to reduce the disclosure requirements of publicly available accounts of SMEs. He questions whether, despite having a clear policy to “think small first,” the interests of SMEs actually took precedence over those of large enterprises. He concludes that, although these policies were intended to reduce the bureaucratic burdens on SMEs, the prime beneficiaries were actually larger enterprises.

← 6. For reviews of guarantee programmes see (Martín-García and Morán Santor, 2021[8]) for Spain, (Caselli et al., 2019[9]) for Italy; (Brault and Signore, 2019[13]) for the EU, (Cowling, 2010[14]) for the UK and (Riding, Madill and Haines, 2007[15]) for Canada.

← 7. The UK Loan Guarantee Scheme (LGS) provides an example. The LGS varied both the percentage of the loan guaranteed and the interest rate charged. Raising the interest rate premium from 3% to 5% reduced the number of loans from just over 4 000 per year to almost zero within two years. When the premium was lowered to 2.5% loan numbers returned to previous levels within three years (Storey, 1994[12]).

© OECD 2023
