Chapter 6. Evaluating the impact and cost-effectiveness of youth programmes (Module 4)

Developing comprehensive programmes to support youth well-being is becoming an important objective in many countries around the world. Unfortunately, as policy makers consider measures to help young people, they are often challenged by a lack of information on what their options are, what has worked well and what may be the most cost-effective intervention to enhance youth outcomes. In this context, more rigorous evidence is needed on the impact and cost-effectiveness of youth programmes. This section presents tools to carry out impact evaluations.

  

Impact evaluation of a youth programme aims to assess the effectiveness of such a programme in relation to a desired change in the well-being of the target population. Quantitative impact evaluation permits generalisations to be made about large populations on the basis of much smaller samples and can help attribute impact to a particular programme. To determine the most cost-effective options among effective programmes, quantitative impact evaluation needs to be combined with a cost-effectiveness or a cost-benefit analysis. Integrating qualitative and quantitative approaches in programme evaluation can further yield insights that neither approach would produce on its own. For a comprehensive overview of impact evaluation methods, see Hempel and Fiala (2012). For a review of concrete examples of youth programmes targeting different dimensions and their impact evaluation, see Knowles and Behrman (2005).

Experimental methods

Experimental methods are the most robust method to evaluate the impact of a programme. They can be applied only if the following conditions are met: i) the impact evaluation is planned right from the beginning, before the programme starts; ii) there is excess demand in the sense that not all eligible individuals participate in the programme; and iii) the selection of participants and non-participants is based on random assignment or voluntary participation. Experimental methods include lottery, randomised phase-in and randomised promotion.

Lottery design

Lottery design is the most robust type of impact evaluation. It can be conducted only if the selection of participants and non-participants is based on random assignment and if the programme is delivered all at once.

Lottery design is obtained through a randomised controlled trial. First, a representative sample of the eligible population is selected at random. Then, the sample is randomly split between two groups of equal or very similar size: those who will benefit from the programme (treatment group) and those who will not (control group). If the representative sample is large enough, the treatment and control groups will have, on average, similar characteristics. At the end of the programme, the difference in outcomes between the two groups will only be attributable to the intervention because all the other factors affecting outcomes will have, on average, the same effects on the two groups.
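The logic of this comparison can be sketched in a few lines of Python. All figures below, including the sample size, the baseline score and the assumed 5-point effect, are invented purely for illustration:

```python
import random
import statistics

random.seed(0)

# Simulated eligible population: each person has a baseline outcome
# (e.g. a well-being score) plus random individual variation.
eligible = [50 + random.gauss(0, 10) for _ in range(10_000)]

# Randomly split the sample into treatment and control groups.
random.shuffle(eligible)
half = len(eligible) // 2
treatment, control = eligible[:half], eligible[half:]

# Suppose the programme raises the outcome by 5 points on average.
TRUE_EFFECT = 5.0
treatment = [y + TRUE_EFFECT for y in treatment]

# Because assignment was random, the simple difference in mean
# outcomes is an unbiased estimate of the programme's impact.
impact = statistics.mean(treatment) - statistics.mean(control)
print(round(impact, 1))
```

With a large enough sample, the estimated impact converges to the assumed 5-point effect, which is exactly the attribution argument made above.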

When applying a lottery design, the impact evaluation is externally valid, which means that the estimated impact can be generalised to the entire eligible population. This is because the sample of participants and non-participants is randomly selected and therefore has similar characteristics, on average, to the total eligible population.

Randomised phase-in design

Randomised phase-in design applies if the selection of participants and non-participants is based on random assignment and if the programme is rolled out over time. Randomised phase-in design follows the same process as lottery design to determine the treatment and control groups and to evaluate the impact of a programme. The only difference is that, in randomised phase-in design, the process is repeated for each phase of the programme. Each eligible individual has the same chance of receiving the programme under each of the phases.

Randomised phase-in design allows comparing the impact between participants who receive the programme first and participants who receive it later. The long-term impact of a programme cannot be estimated with this method if all eligible individuals end up receiving the programme because, in that case, no comparison group can be established.

Randomised promotion design

Randomised promotion design is used when potential beneficiaries cannot be excluded from a programme, either because it is impossible (the programme is open to all eligible individuals and participation is voluntary) or because it is not desirable (the programme has sufficient resources to cover the entire eligible population).

As with the previous methods, the first step in randomised promotion design is to select a representative sample of the eligible population randomly. Once the sample is obtained, it is not possible to assign individuals to the treatment and control groups randomly, as is done in the previous methods, because the programme cannot control who participates and who does not. To overcome this problem, the randomised promotion design proposes, as a next step, to promote the programme randomly or, said differently, to randomly select the individuals who will be encouraged to take up the programme.

Random promotion of a programme is based on the assumption that there are three types of potential beneficiaries: i) individuals who never participate; ii) individuals who always participate; and iii) individuals who participate only if they are incentivised.

Whatever the incentives or encouragement, there will be always some individuals in the non-promoted group who will take up the programme and some individuals in the promoted group who will not. However, the promotion can be considered effective if enrolment in the programme is found to be relatively higher for the promoted group compared to the non-promoted group. Because the promotion is conducted randomly, both groups will have, on average, similar characteristics.

The impact of the programme cannot be estimated by comparing the outcomes between participants and non-participants because both groups are not random and therefore have characteristics that differ. The impact is estimated instead by comparing the outcomes of those who received the promotion with those who did not receive it. Concretely, the impact is calculated as the difference in average outcomes between the promoted and non-promoted groups, corrected by the difference in enrolment rates.
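This correction can be sketched with simulated data. The shares of the three types of potential beneficiaries and the 5-point effect are assumptions made purely for illustration:

```python
import random
import statistics

random.seed(1)

def simulate_person(promoted):
    # Three types of potential beneficiaries (shares are illustrative):
    # "never" types never enrol, "always" types always enrol, and
    # "compliers" enrol only when encouraged by the promotion.
    kind = random.choices(["never", "always", "complier"], [0.3, 0.2, 0.5])[0]
    enrolled = kind == "always" or (kind == "complier" and promoted)
    outcome = 50 + random.gauss(0, 10) + (5.0 if enrolled else 0.0)
    return enrolled, outcome

promoted = [simulate_person(True) for _ in range(20_000)]
not_promoted = [simulate_person(False) for _ in range(20_000)]

# Difference in average outcomes between the promoted and
# non-promoted groups, corrected by the difference in enrolment rates.
dy = (statistics.mean(y for _, y in promoted)
      - statistics.mean(y for _, y in not_promoted))
dp = (statistics.mean(e for e, _ in promoted)
      - statistics.mean(e for e, _ in not_promoted))
impact = dy / dp
print(round(impact, 1))
```

The raw difference in outcomes (dy) understates the effect because only some of the promoted group actually enrolled; dividing by the difference in enrolment rates (dp) recovers the assumed 5-point effect for those who enrolled because of the encouragement.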

Randomised promotion design is as reliable as the other experimental methods but has two major caveats. First, it requires larger sample sizes in order to obtain statistically significant results; second, the estimated impact is valid only for individuals who participated in the programme because they were encouraged and therefore cannot be generalised to the entire eligible population.

Quasi-experimental methods

Quasi-experimental methods are generally less robust but allow estimating the impact of a programme when experimental methods are not applicable, that is, when the impact evaluation cannot be planned during the design phase or when the selection of participants and non-participants cannot be based on random assignment or voluntary participation. Quasi-experimental methods include discontinuity design, difference-in-difference and matching.

Discontinuity design

Discontinuity design applies when participation in a programme is determined by an eligibility threshold. Potential beneficiaries are given a score and ranked based on specific and measurable criteria (e.g. age, test scores, poverty index). If they score higher/lower than a minimum/maximum threshold, then they are accepted into the programme. Discontinuity design assumes that participants and non-participants who are close to the threshold share similar characteristics. As such, the impact of the programme is obtained by calculating the difference in outcomes between these two groups.
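A minimal sketch of this near-threshold comparison, using simulated scores and an assumed threshold and bandwidth. Note that in practice a local regression is typically used to correct for the remaining trend in the eligibility score, which this naive comparison ignores:

```python
import random
import statistics

random.seed(2)

THRESHOLD = 60   # assumed minimum eligibility score for the programme
BANDWIDTH = 5    # how close to the threshold an observation must be

# Simulated applicants: the outcome rises gently with the eligibility
# score, and the programme adds an assumed 5 points for participants.
people = []
for _ in range(50_000):
    score = random.uniform(0, 100)
    enrolled = score >= THRESHOLD
    outcome = 0.05 * score + random.gauss(0, 3) + (5.0 if enrolled else 0.0)
    people.append((score, outcome))

# Keep only observations within the bandwidth around the threshold,
# where participants and non-participants are assumed comparable.
just_above = [y for s, y in people if THRESHOLD <= s < THRESHOLD + BANDWIDTH]
just_below = [y for s, y in people if THRESHOLD - BANDWIDTH <= s < THRESHOLD]

impact = statistics.mean(just_above) - statistics.mean(just_below)
print(round(impact, 1))
```

The estimate is close to the assumed effect; narrowing the bandwidth reduces the bias from the underlying trend in the score but, as noted above, also reduces the number of observations available around the threshold.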

This method requires a clearly defined threshold and a sample that is large enough to obtain a minimum number of observations around the threshold. The estimated impact is valid only for participants near the threshold and cannot be generalised to other participants or to the eligible population.

Difference-in-difference

Difference-in-difference provides less robust results than the methods presented so far because it is used for programmes in which the treatment and control groups are determined neither randomly nor by clear objective criteria, but according to subjective rules, i.e. the two groups are simply believed to be comparable. In such cases, the treatment and control groups are very likely to have differing characteristics, and the difference-in-difference method can only provide an approximate estimate of the impact. The impact cannot be calculated by simply comparing the outcomes of the treatment and control groups once the programme is completed, because both groups are selected on a subjective basis.

Instead, difference-in-difference compares outcomes both before and after the intervention. The method rests on the assumption that the differences in characteristics between the treatment and control groups, which are very likely to exist, remain constant over time. Under this assumption, the trend of the control group can serve as the counterfactual for estimating the impact of a programme.

The problem with this assumption is that it may be true for observable characteristics (e.g. age, gender) but is quite unrealistic as regards unobservable characteristics, since the latter tend to evolve over time (e.g. non-cognitive skills, preferences). This is indeed problematic because, if differences in characteristics change over time, the estimated impact of the programme will be biased. A way to test this assumption is to observe the trend in outcomes of the treatment and control groups before the implementation of the programme and determine whether they follow a similar pattern.

In difference-in-difference design, outcomes of the treatment and control groups are measured both at the beginning and at the end of the intervention. The impact is then estimated by subtracting the difference in pre-intervention outcomes from the difference in post-intervention outcomes.
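The calculation reduces to simple arithmetic. The figures below are entirely hypothetical (e.g. an employment rate in percentage points):

```python
# Average outcomes measured in each group before and after the
# programme (hypothetical figures for illustration).
treatment_before, treatment_after = 40.0, 52.0
control_before, control_after = 45.0, 50.0

# The change the treatment group would have experienced anyway is
# proxied by the control group's change (parallel-trend assumption).
impact = (treatment_after - treatment_before) - (control_after - control_before)
print(impact)  # 12 - 5 = 7.0 percentage points
```

Note that simply comparing post-intervention outcomes (52 vs. 50) would understate the effect here, because the treatment group started from a lower baseline; the double difference removes that pre-existing gap.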

As noted, difference-in-difference provides less reliable results and rests on a strong assumption that can be difficult to verify. In addition, this method requires more effort in terms of data collection: at least two rounds of data collection before the programme starts to test the assumption, and another at the end to estimate the impact.

Matching

As with difference-in-difference, matching is also used for programmes where the treatment and control groups are subjectively selected. Thus, it constitutes an alternative method for estimating impact in these particular programmes.

In the matching method, the control group is identified by matching each participant of a programme with a non-participant who is as similar as possible, based on observable characteristics. The impact is obtained by calculating the difference in average outcomes between the treatment and control groups.
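A toy sketch of nearest-neighbour matching on two observed characteristics (age and years of schooling). The data and the simple Euclidean distance measure are assumptions for illustration only; real applications match on many more characteristics, often via a propensity score:

```python
import statistics

# Simulated individuals: (age, years_of_schooling, outcome).
participants = [(19, 10, 58.0), (22, 12, 63.0), (25, 9, 55.0)]
non_participants = [(18, 10, 52.0), (21, 12, 57.0), (26, 9, 51.0),
                    (30, 14, 66.0), (20, 8, 49.0)]

def distance(a, b):
    # Euclidean distance on the observed characteristics.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Match each participant with the most similar non-participant.
matched_controls = [min(non_participants, key=lambda c: distance(p, c))
                    for p in participants]

# Impact: difference in average outcomes between participants
# and their matched controls.
impact = (statistics.mean(p[2] for p in participants)
          - statistics.mean(c[2] for c in matched_controls))
print(round(impact, 1))
```

As the caveats below emphasise, this estimate is only credible to the extent that the matched pairs also resemble each other on characteristics that were not observed.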

The matching method has several caveats. First, it requires a sample that is large enough to determine an appropriate control group. Second, it cannot control for unobservable characteristics. If the treatment and control groups differ in unobserved characteristics, as is very likely to happen, the estimated impact will be biased. Third, it can be difficult to find a match for each participant, especially when the sample is small and the number of observed characteristics considered is large. Matching on only a small number of observed characteristics is nonetheless not recommended, because the matching will be less precise and the treatment and control groups less comparable.

Cost-effectiveness and cost-benefit analyses

Impact evaluation is a good instrument to assess the effectiveness of a programme, but it generally does not take into account the full range of benefits that are generated and misses the other side of the coin: the costs. Moreover, different programmes can lead to the same impact, in which case impact evaluation does not suffice to determine which of them is best.

To determine the overall success of a programme, two types of analysis can be conducted: i) cost-effectiveness analysis; and ii) cost-benefit analysis. These two methods mainly differ in that cost-effectiveness analysis denominates benefits in physical units, whereas cost-benefit analysis denominates benefits in money equivalent terms (Hempel and Fiala, 2012).

Cost-effectiveness analysis identifies the total costs of a programme and calculates the level of output or outcome per monetary unit spent to determine if the programme is cost-effective. It allows identifying the most efficient allocation of resources (most cost-effective programme) when alternative programmes exist and are compared.
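As an illustration, consider two hypothetical programmes that pursue the same outcome (here, youths who found a job) at different costs. All figures are invented:

```python
# Two hypothetical programmes with the same outcome metric.
programmes = {
    "A": {"total_cost": 500_000, "jobs_found": 400},
    "B": {"total_cost": 300_000, "jobs_found": 300},
}

# Cost-effectiveness: cost per unit of outcome achieved.
cost_per_job = {name: p["total_cost"] / p["jobs_found"]
                for name, p in programmes.items()}
print(cost_per_job)  # A: 1250.0, B: 1000.0
```

Although programme A places more youths in jobs overall, programme B is the more cost-effective of the two, achieving each placement at a lower cost per unit.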

Cost-benefit analysis weighs the total costs and benefits of a programme and calculates the ratio of benefits to costs. It allows determining if the programme is cost-beneficial, i.e. whether the benefits outweigh the costs or, in other words, whether investing in the programme yields overall positive returns. Cost-benefit analysis requires identifying all the costs and benefits that are directly or indirectly related to a programme. In cost-effectiveness analysis, only the costs are needed.

Total costs are calculated by multiplying the quantity of inputs used by their prices. There are different types of costs. Financial costs refer to inputs that have been purchased and whose prices are therefore known. Economic costs, in turn, refer to inputs that have been used for free and therefore have no explicit monetary value (e.g. volunteering). For such inputs, the price can be proxied by the opportunity cost (i.e. the value of the best alternative use).

Costs can also be either fixed or variable. Fixed costs are expenditures incurred before any output is produced. Variable costs are the costs of actually producing the outputs once the fixed costs have been incurred. If the programme is likely to generate future benefits and costs, it is important to discount them.
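Discounting can be sketched as follows. The five-year cash flows and the 5% annual discount rate are purely hypothetical:

```python
# Hypothetical cash flows for a five-year programme: costs are
# concentrated up front, benefits accrue in later years.
costs    = [100_000, 20_000, 20_000, 20_000, 20_000]
benefits = [0,       40_000, 60_000, 60_000, 60_000]
DISCOUNT_RATE = 0.05  # assumed annual discount rate

def present_value(flows, rate):
    # Discount each year's flow back to year 0.
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

pv_costs = present_value(costs, DISCOUNT_RATE)
pv_benefits = present_value(benefits, DISCOUNT_RATE)

# The programme is cost-beneficial if the ratio exceeds 1.
ratio = pv_benefits / pv_costs
print(round(ratio, 2))
```

Because the benefits arrive later than the costs, discounting reduces them proportionally more; with these figures the benefit-cost ratio remains above 1, so the hypothetical programme would still be judged cost-beneficial.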

Costs can be calculated on average or at the margin. Average costs correspond to the total expenditure divided by the quantity of outputs produced, and marginal costs to the amount of money necessary to produce an additional unit of output. The former are computed using both fixed and variable costs, whereas the latter are computed using only variable costs.
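A minimal numerical illustration of the distinction, with invented figures:

```python
FIXED_COSTS = 50_000            # e.g. setting up training centres
VARIABLE_COST_PER_YOUTH = 200   # e.g. materials and trainer time
n_trained = 1_000

total_cost = FIXED_COSTS + VARIABLE_COST_PER_YOUTH * n_trained
average_cost = total_cost / n_trained   # fixed + variable, per trainee
marginal_cost = VARIABLE_COST_PER_YOUTH # cost of one additional trainee
print(average_cost, marginal_cost)      # 250.0 200
```

The gap between the two shrinks as the programme scales up, since the fixed costs are spread over more trainees while the marginal cost stays constant.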

Mixed quantitative-qualitative approaches

The previous section presented the most commonly used methods to conduct a quantitative impact evaluation. The present section, in turn, emphasises the importance of complementing quantitative results with qualitative information in order to better comprehend the outcomes of a programme and interpret its estimated impact. For a comprehensive discussion on the integration of quantitative and qualitative methods, see Bamberger (2000).

The best way to assess the effectiveness of a programme comprehensively is to rely on mixed approaches to impact evaluation, which combine both quantitative and qualitative methods. Incorporating qualitative methods, such as open-ended surveys, in-depth interviews, case studies, focus groups, participatory tools or direct participant observation can indeed improve the analysis in a number of ways:

  • Quality of programme implementation: The way a programme is implemented necessarily affects its outcomes. It is important to understand how the quality of implementation has affected the outcomes in order to better interpret impact evaluation results. Quality of implementation is best captured through close monitoring of the programme and qualitative methods, including key informant interviews, direct participant observation and focus groups.

  • Impact heterogeneity across target groups: The impact of a programme may vary across different target groups. Quantitative methods can evidence impact heterogeneity but, in contrast to qualitative methods, they offer limited possibilities to understand the drivers behind these varying effects and the way they operate.

  • Contextual information: The outcomes of a programme can be affected by a range of contextual factors that cannot be fully captured through quantitative methods. Mixed approaches can thus be used in order to have a better knowledge of the context in which a programme is implemented and of how the context affects the outcomes.

  • Qualitative outcomes: Some outcomes are difficult to capture with quantitative methods (e.g. mental health, empowerment). In such cases, mixed methods, including qualitative interviews, focus groups and case studies, are useful to identify appropriate qualitative indicators and to better understand the programme impact.

References

Bamberger, M. (ed.) (2000), Integrating Quantitative and Qualitative Research in Development Projects, World Bank, Washington, DC.

Hempel, K. and N. Fiala (2012), Measuring Success of Youth Livelihood Interventions: A Practical Guide to Monitoring and Evaluation, World Bank, Washington, DC.

Knowles, J.C. and J.R. Behrman (2005), The Economic Returns to Investing in Youth in Developing Countries: A Review of the Literature, World Bank, Washington, DC.