Chapter 4. Contingent valuation method

The contingent valuation (CV) method is a stated preference approach in which respondents are asked directly for their willingness to pay (or willingness to accept compensation) for a hypothetical change in the level of provision of a non-market good. CV is applicable to a wide range of situations, including future changes and changes involving non-use values. As this chapter documents, the CV literature now offers a wealth of experience that can guide current thinking about good survey design and robust valuation. This matters because the method’s validity remains the central point of debate, which manifests itself in discussions about specific problems and biases. Increasingly, some of these problems are being investigated in the light of findings from behavioural economics research. Other significant developments, notably the rise of on-line surveys, have enabled more extensive applications and further testing of biases and possible bias-reduction mechanisms.


4.1. Introduction

The contingent valuation (CV) method is a survey-based stated preference technique that elicits people’s intended future behaviour in constructed markets. In a contingent valuation questionnaire, a hypothetical market is described where the good in question can be traded. This contingent market defines the good itself, the institutional context in which it would be provided, and the way it would be financed. Respondents are asked directly for their willingness-to-pay (or willingness-to-accept) for a hypothetical change in the level of provision of the good (Mitchell and Carson, 1989). Respondents are assumed to behave as though they were in a real market.

One of the strengths of stated preference methods lies in their flexibility. Because of its hypothetical nature and non-reliance on existing markets, the contingent valuation method is applicable, in principle, to almost all non-market goods, to past changes and future changes, and is one of the few available methodologies able to capture all types of benefits from a non-market good or service, including those unrelated to current or future use, i.e. so-called non-use values.

The CV idea was first introduced by von Ciriacy-Wantrup (1947) and the first application was undertaken by Davis (1963), who valued the benefits attached to outdoor recreation. Over time, CV became the dominant stated preference method, extensively applied to the valuation of a wide range of non-market changes both in developed and developing countries: water quality, outdoor recreation, species preservation, forest protection, air quality, visual amenity, waste management, sanitation improvements, biodiversity, health impacts, natural resource damage, environmental risk reductions, cultural heritage, and new energy technologies, to list but a few. Much of the impetus for this expansion came from the conclusions of the special panel appointed by the US National Oceanic and Atmospheric Administration (NOAA) in 1993 (Arrow et al., 1993) following the Exxon Valdez oil spill in Alaska in 1989 (Nelson, 2017). The panel concluded that, subject to a number of best-practice recommendations, CV studies could produce estimates reliable enough to be used in a judicial process of natural resource damage assessment. And despite criticism from some quarters at the time (e.g. Diamond and Hausman, 1994), the number of contingent valuation studies has increased substantially since. In 2011, Carson published an annotated bibliography of contingent valuation studies (published and unpublished): it contained over 7 500 entries from over 130 countries (Carson, 2011). And a search on the Web of Science for publications using the search term “contingent valuation” produced almost 6 000 hits as of January 2017.

It is now almost twenty-five years since the NOAA deliberations and it is no exaggeration to say that a discussion of methodological tests and developments in the field of stated preference methods, and contingent valuation in particular, could command several volumes. The intervening years have seen stated preference research being applied routinely in policy. Government-commissioned guidelines now exist for using these methods to inform UK public policy in general (Bateman et al., 2002), and also specific guidance for particular sectors (e.g. Bakhshi et al., 2015, for the UK cultural sector). State-of-the-art guidance on most aspects of non-market (environmental) valuation for the United States has also been published (Champ et al., 2003). The most recent guidance for stated preference studies can be found in Johnston et al. (2017).

Developments have not been restricted only to the application of these tools in the field of environmental economics. There has also been important cross-fertilisation with, for example, health economics and, more recently, cultural economics, sports economics and other areas of public policy. Moreover, research in stated preference methods has also played a role in advancing the whole field of economics. According to Kerry Smith (2006), “Contingent valuation has prompted the most serious investigation of individual preferences that has ever been undertaken in economics” (p. 46). Notably, the recent rise to prominence of behavioural and experimental economics owes much to research investigating anomalies in stated preference methods (Carson and Hanemann, 2005; Carlsson, 2010; Nelson, 2017). Most promisingly, much more is now known about the circumstances in which stated preference methods work well – in terms of resulting in valid and reliable findings – and where problems can be expected. Behavioural economics research has shown that some of the anomalies that were first detected in hypothetical markets also occur in real markets and are an inescapable feature of how people behave and react to incentives and information (rather than resulting from shortcomings specific to CV). Such findings have had an important bearing on progressing best practice in how to design a contingent valuation questionnaire.

However, despite thousands of studies, numerous methodological developments and widespread policy application, contingent valuation remains a source of controversy. Long-time critics, like Jerry Hausman, remain unconvinced about the merits of stated preferences and of CV in particular. In 1994, Diamond and Hausman published a much-cited critique of CV (Diamond and Hausman, 1994), where doubts were expressed about its validity with a focus on scope insensitivity. More recently, in 2012, Hausman updated his concerns in a blunt set of criticisms, where he contends that CV is a “hopeless” technique, despite the wealth of experience and advances in the intervening years (Hausman, 2012). Hausman remains worried about three well-known potential limitations of CV, namely hypothetical bias, insensitivity to scope and disparity between WTP and WTA. After performing a “selective” review of CV studies he concludes that respondents “invent their answers on the fly” and that “no number is still better than a contingent valuation number”. Controversially, Hausman goes on to defend the use of experts for the creation of economic values. Detailed discussion and counterarguments can be found in Kling et al. (2012), Carson (2012) and Haab et al. (2013).

This chapter seeks to distil some of the recent important developments in contingent valuation, and in that light, critically reviews the evidence on its validity. Section 4.2 summarises the conceptual framework. Section 4.3 discusses and evaluates a number of key points that guide good survey design, on the basis that valid and reliable estimates of non-market values are far more likely to emerge from studies which draw on the wealth of experience that can be gleaned from the literature on contingent valuation. Section 4.4 outlines issues related to divergences between mean and median WTP – an issue of particular importance in aggregating the findings from stated preference studies. Section 4.5 discusses the evidence on validity and reliability and critically considers a number of potential problems and biases that have been cited as being amongst the most important challenges facing contingent valuation practitioners. Section 4.6 contains an overview of recent developments, such as the influence of related research on behavioural economics and the rise of on-line surveys. Finally, Section 4.7 offers some concluding remarks and policy guidance.

4.2. Conceptual foundation

The value of a non-market good or service relates to the impact that it has on human welfare, measured in monetary terms. Hicks (1943) proposed four measures of economic value holding utility constant, in contrast to Marshallian consumer surplus which holds income constant. The Hicksian welfare measures comprise compensating variation and compensating surplus, which measure gains or losses relative to the initial utility level (i.e. the implied property right is in the status quo); and equivalent variation and equivalent surplus, which measure gains or losses relative to an alternative utility level (i.e. the implied property right is in the new situation) (Mitchell and Carson, 1989). Variation measures are used for price changes, when as a response the individual can vary the quantity of the good or service of interest, while surplus measures are used for situations involving changes in the quantity or quality of goods and services, and where the individual can only buy fixed amounts (Freeman, 1994). A more detailed explanation of the Hicksian welfare measures can be found in Annex 4.A1. Most environmental applications deal with situations involving fixed increases or decreases in the quantity or quality of a non-market good or service. In such contexts, the relevant welfare measures are therefore the Hicksian welfare surplus measures: compensating and equivalent surplus (Freeman, 1994):

  • Compensating surplus (CS) is the change in income, paid or received, that will leave the individual in his initial welfare position after a change in provision of the good or service;

  • Equivalent surplus (ES) is the change in income, paid or received, that will leave the individual in his subsequent welfare position in the absence of a change in provision of the good or service.

Formally, for a welfare improvement, these welfare measures can be derived as follows (Freeman, 1993):

$$u(M - CS,\ Q^{1}) = u(M,\ Q^{0})$$ [4.1]

$$u(M + ES,\ Q^{0}) = u(M,\ Q^{1})$$ [4.2]

where u is the indirect utility function, M is money or income, Q is the non-market good, CS is the compensating surplus, ES is the equivalent surplus, and the 0 and 1 superscripts refer to before and after provision of the non-market good.
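The two surplus measures can be illustrated numerically. The sketch below is not taken from the source: it assumes a hypothetical utility function u(M, Q) = ln M + a ln Q, with an illustrative weight a = 0.2, income of 30 000 and a doubling of Q, and solves for the CS that satisfies u(M − CS, Q1) = u(M, Q0) and the ES that satisfies u(M + ES, Q0) = u(M, Q1) by simple bisection. All function names and parameter values are assumptions made for illustration only.

```python
import math

def utility(m, q, a=0.2):
    """Hypothetical utility: log of income plus a weighted log of the non-market good."""
    return math.log(m) + a * math.log(q)

def bisect_increasing(f, lo, hi, tol=1e-9):
    """Root of an increasing function f on [lo, hi], assuming f(lo) <= 0 <= f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

def compensating_surplus(m, q0, q1):
    """Payment that returns the individual to initial utility after Q rises from q0 to q1."""
    target = utility(m, q0)
    # target - u(m - cs, q1) rises with cs, so bisection applies directly.
    return bisect_increasing(lambda cs: target - utility(m - cs, q1), 0.0, m * (1 - 1e-9))

def equivalent_surplus(m, q0, q1):
    """Income gain at q0 that yields the same utility as the improvement to q1."""
    target = utility(m, q1)
    hi = m
    while utility(m + hi, q0) < target:  # widen the bracket until it contains the root
        hi *= 2
    return bisect_increasing(lambda es: utility(m + es, q0) - target, 0.0, hi)

cs = compensating_surplus(30_000, 1.0, 2.0)  # WTP to secure the improvement
es = equivalent_surplus(30_000, 1.0, 2.0)    # WTA to forego the improvement
print(round(cs, 2), round(es, 2))
```

Under this assumed functional form the two measures also have closed forms, CS = M(1 − (Q0/Q1)^a) and ES = M((Q1/Q0)^a − 1), which offer a check on the numerical solution; ES exceeds CS here because the hypothetical good is normal.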

Depending on whether the change of interest has a positive or negative effect on welfare, CS and ES can be rephrased in terms of willingness-to-pay (WTP) or willingness-to-accept (WTA). Table 4.1 summarises the four possible measures (Freeman, 1994).

Table 4.1. Hicksian compensating and equivalent surplus measures of welfare

| | Compensating surplus (CS) | Equivalent surplus (ES) |
|---|---|---|
| Welfare gain | (1) WTP to secure the positive change | (2) WTA compensation to forego the positive change |
| Welfare loss | (3) WTA compensation to put up with the negative change | (4) WTP to avoid the negative change |

4.3. Designing a contingent valuation questionnaire

As with other survey techniques, a key element in any CV study is a properly designed questionnaire: i.e. a data-collection instrument that sets out, in a formal way, the questions designed to elicit the desired information (Dillon et al., 1994). Questionnaire design may seem to be a trivial task where all that is required is to put together a number of questions about the subject of interest. But this apparent simplicity lies at the root of many badly designed surveys that elicit biased, inaccurate and useless information, possibly at a great cost. In fact, even very simple questions require proper wording, format, content, placement and organisation if they are to elicit accurate information.1 Moreover, any draft questionnaire needs to be adequately piloted before it can be said to be ready for implementation in the field. In this context, Mitchell and Carson (1989, p. 120) note that:

“the principal challenge facing the designer of a CV study is to make the scenario sufficiently understandable, plausible and meaningful to respondents so that they can and will give valid and reliable values despite their lack of experience with one or more of the scenario dimensions”.

This section introduces the basics of contingent valuation questionnaire design, the typical aim of which is to elicit individual preferences, in monetary terms, for changes in the quantity or quality of a non-market good or service. The questionnaire intends to uncover individuals’ estimates of how much having or avoiding the change in question is worth to them. Expressing preferences in monetary terms means finding out people’s maximum willingness-to-pay (WTP) or minimum willingness-to-accept (WTA) for various changes of interest. In other words, a CV questionnaire is a survey instrument that sets out a number of questions to elicit the monetary value of a change in a non-market good. Typically, the change described is hypothetical.

There are three basic parts to most CV survey instruments.

First, it is customary to ask a set of attitudinal and behavioural questions about the good to be valued as a preparation for responding to the valuation question and in order to reveal the most important underlying factors driving respondents’ attitudes towards the public good.

Second, the contingent scenario is presented and respondents are asked for their monetary evaluations. The scenario includes a description of the commodity and of the terms under which it is to be hypothetically offered. Information is also provided on the quality and reliability of provision, timing and logistics, and the method of payment. Then respondents are asked questions to determine how much they would value the good if confronted with the opportunity to obtain it under the specified terms and conditions. The elicitation question can be asked in a number of different ways as discussed later in this chapter. Respondents are also reminded of substitute goods and of the need to make compensating adjustments in other types of expenditure to accommodate the additional financial transaction. The design of the contingent scenario and of the value elicitation questions are the core elements of the CV method.

Finally, questions about the socio-economic and demographic characteristics of the respondent are asked in order to ascertain the representativeness of the survey sample relative to the population of interest, to examine the similarity of the groups receiving different versions of the questionnaire and to study how willingness-to-pay varies according to respondents’ characteristics.

Econometric techniques are then applied to the survey results to derive the desired welfare measures such as mean or median WTP (and are used to explain what are the most significant determinants of WTP).
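As a minimal illustration of the summary measures mentioned above, the sketch below computes mean and median WTP from a set of open-ended responses. The response values are invented purely for illustration and do not come from any study; a real analysis would normally rely on econometric models suited to the elicitation format rather than raw descriptive statistics.

```python
from statistics import mean, median

# Hypothetical open-ended WTP responses (currency units per household);
# these values are invented for illustration only.
responses = [0, 0, 5, 5, 10, 10, 15, 20, 25, 30, 50, 250]

def summarise_wtp(wtp):
    """Return simple descriptive welfare statistics for stated WTP amounts."""
    return {
        "mean": mean(wtp),
        "median": median(wtp),
        "share_zero": sum(1 for x in wtp if x == 0) / len(wtp),
    }

summary = summarise_wtp(responses)
print(summary)
```

In this invented sample a single large response pulls the mean well above the median, which is one reason the choice between the two measures matters when results are aggregated.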

In the remainder of this section we focus on the second and core part of a CV questionnaire that comprises three interrelated stages: i) identifying the good to be valued; ii) constructing the hypothetical scenario; and iii) eliciting the monetary values.

4.3.1. What is the policy change being valued?

Before starting to design the questionnaire, researchers must have a very clear idea of what policy change they want to value, i.e. which quality or quantity change(s) is of interest and of what particular non-market good(s) or service(s). This is in essence the formulation of the valuation problem. But as fundamental as this is, formulating the problem to be valued may not be straightforward. First, there may be scientific uncertainty surrounding the physical effects of particular changes. Second, it may be unclear how physical changes affect human well-being. Third, the effects of some changes may be difficult to translate into terms and sentences that can be readily understood by respondents. Fourth, some changes are very complex and multidimensional and cannot be adequately described within the timeframe and the means available to conduct the questionnaire. Fifth, textual descriptions of some changes may provide only a limited picture of the reality (e.g. changes in noise, odour or visual impacts). Table 4.2 presents examples of changes that may be difficult to define.

Table 4.2. Examples of possible valuation topics and potential problems

| Change to be valued | Potential problems |
|---|---|
| Damages caused in a river from increased water abstractions | Scientific uncertainty surrounding the physical changes caused by increased abstractions; |
| | Difficulty in describing a wide range of changes in the fauna, flora, visual amenity, water quality and recreational potential, without causing information overload; |
| | Difficulty in isolating abstraction impacts in one river from impacts in other rivers; |
| | The damages may be different in different stretches of the river and in different periods of the year. |
| Reduced risk of contracting a disease or infection | Risk and probability changes are not easily understood; |
| | Difficulties in conveying the idea of small risk changes; |
| | Difficulties in isolating pain and suffering impacts from the cost of medication or of lost wages. |
| Damages caused by traffic emissions on an historical building | Difficulties in isolating the impact of traffic-related air pollution and other sources of air pollution; |
| | Difficulty in explaining the type of damage caused (e.g. soiling of the stone vs. erosion of the stone); |
| | Difficulty in conveying the visual impacts of the change if visual aids are not used. |
| Damages caused by the introduction of a plant pest | Limited scientific information may not permit full identification of the wide range of environmental impacts caused by plant pests; |
| | Difficulty in explaining in lay terms the idea of damages to biodiversity and ecosystems; |
| | The impacts of a pest may be too complex to explain in the limited time that the questionnaire lasts. |

4.3.2. Constructing the hypothetical scenario

As with all surveys, CV surveys are context dependent. That is, the values estimated are contingent on various aspects of the scenario presented and the questions asked. While some elements of the survey are expected to be neutral, others are thought to have a significant influence on respondents’ valuation. These include the information provided about the good, the wording and type of the valuation questions, the institutional arrangements and the payment mechanism. Hence, the design of the hypothetical scenario and the payment mechanism is of crucial importance for the elicitation of accurate and reliable responses.

A hypothetical scenario has three essential elements: i) a description of the policy change of interest; ii) a description of the constructed market; and iii) a description of the method of payment.

Description of the policy change of interest

For single-impact policies, the description of the policy change to be valued entails a number of steps. Clearly, there must be a description of the attributes of the good under investigation in a way that is meaningful and understandable to respondents. Some of those issues outlined in Table 4.2 arise in this context, as these force complex and potentially overwhelmingly large amounts of information to be translated into a few meaningful “headline indicators”. The description of available substitutes for the good (its degree of local, national or global uniqueness) and of alternative expenditure possibilities may affect respondents’ values and should also be part of the scenario description. Lastly, the scenario should include a description of the proposed policy change and of how the attributes of the good of interest will change accordingly.2 In particular, the reference (status quo or baseline level) and target levels (state of the world with the proposed change) of each attribute of interest need to be clearly described.

If a multidimensional policy is to be appraised, questionnaire design faces extra challenges. For example, if the specific change being valued is part of a more inclusive policy that comprises a number of other changes occurring simultaneously (e.g. protecting the white tiger when protection of black rhinos, blue whales, giant pandas and mountain gorillas is also on the agenda), then it is fundamental to present the individual change as part of the broader package. This gives respondents a chance to consider all the possible substitution, complementarity and income effects between the various policy components, which would be impossible if the policy component were presented in isolation. Presentation in isolation risks embedding effects, where respondents equate the value of “part” of a policy change with how they actually value the “whole”, and so leads to an overestimation of the value of the policy component.

One such approach is to follow a top-down procedure, whereby respondents are first asked to value the more inclusive policy and then to partition that total value across its components. There is an obvious limitation to the number of components that can be valued in such a way: as one tries to value an increasing number of policy changes, the description of each becomes necessarily shorter, reducing the accuracy of the scenario, while respondents may also become fatigued or confused. It should be noted that while contingent valuation is in theory applicable to value multidimensional changes, as described above, a more efficient way of dealing with such changes might be to adopt a choice modelling approach (see Chapter 5).

Description of the constructed market

The constructed market refers to the social context in which the hypothetical CV transaction, i.e. the policy change, takes place. A number of elements of the constructed market are important.

The institution that is responsible for providing the good or change is of interest. This can be a government, a local council, a non-governmental organisation or NGO, a research institute, industry, a charity and so on. Institutional arrangements will affect WTP as respondents may hold views about particular institutions’ level of effectiveness, reliability and trust. The technical and political feasibility of the change is a fundamental consideration in the design of the questionnaire. Respondents can only provide meaningful valuations if they believe that the scenario described is feasible.

Conditions for provision of the good include respondents’ perceived payment obligation and respondents’ expectations about provision. Regarding the former, there are several possibilities: respondents may believe they will have to pay the amounts they state; they may think the amount they have to pay is uncertain (more or less than their stated WTP amount); or they may be told that they will pay a fixed amount, or proportion of the costs of provision. Regarding the latter, the basic question is whether respondents believe or not that provision of the good is conditional on their WTP amount. Both types of information are important as each different combination evokes a different type of strategic behaviour (Mitchell and Carson, 1989). In particular, it is important to provide respondents with incentives to reveal their true valuations, i.e. to design an incentive compatible mechanism. This issue is addressed at various points, later in this chapter (see in particular Box 4.2).

The timing of provision – when and for how long the good will be provided – also needs to be explicitly stated. Given individual time preferences, a good provided now will be more valuable than a good provided in 10 years’ time. Also, the amount of time over which the good or service will be provided can be of crucial importance. For example, the value of a programme that saves black rhinos for 50 years is only a fraction of the value of the same programme where protection is afforded indefinitely.
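The effect of timing can be sketched with standard discounting arithmetic. The example below is an illustration under stated assumptions, not a method taken from the source: it assumes a constant annual benefit received at the end of each year and a constant discount rate (3.5% is an arbitrary illustrative figure), and compares the present value of a 50-year programme with that of an otherwise identical programme lasting indefinitely.

```python
def discount_factor(rate, delay_years):
    """Present value today of one unit of benefit received after delay_years."""
    return (1 + rate) ** -delay_years

def present_value(annual_benefit, rate, years=None):
    """Present value of a constant end-of-year benefit stream.

    years=None is treated as provision in perpetuity.
    """
    if rate <= 0:
        raise ValueError("rate must be positive for these formulas")
    if years is None:
        return annual_benefit / rate  # perpetuity
    return annual_benefit * (1 - (1 + rate) ** -years) / rate  # finite annuity

rate = 0.035  # illustrative assumption only
pv_50 = present_value(100.0, rate, years=50)
pv_forever = present_value(100.0, rate)
share = pv_50 / pv_forever  # fraction of the perpetual value captured in 50 years
print(round(share, 3), round(discount_factor(rate, 10), 3))
```

Under these assumptions the share equals 1 − (1 + r)^−50, so it depends only on the discount rate: the lower the rate, the smaller the fraction of the indefinite programme’s value that a 50-year programme captures.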

Description of the method of payment

A number of aspects of the method of payment should be clearly defined in CV questionnaires. Most fundamentally, the benefit measure (WTP or WTA) must be chosen. Box 4.1 notes a further issue regarding the possible existence and elicitation of negative WTP in situations where some respondents could just as conceivably value the status quo.

Box 4.1. Eliciting negative WTP

Policy makers are often concerned with choosing between a proposed environmental change – or a number of proposed changes – and the status quo. To help in making such a decision, stated preference survey techniques such as the CV method may be employed to gauge the size of the welfare benefits of adopting each one of the proposed changes. In the case of changes in the provision of, for example, rural landscapes, opinion could be split, with some respondents favouring the change whilst others wish to indicate a preference for the status quo. In such cases, CV practitioners could consider designing the survey so that respondents can express a monetary value for either their welfare gain or their welfare loss from any particular change.

A number of studies have sought to examine this problem of negative WTP, including Clinch and Murphy (2001) and Bohara et al. (2001). One example of the issues that can arise is illustrated in Atkinson et al. (2004). In this CV study of preferences for new designs for the towers (or pylons), which convey high voltage electricity transmission lines, opinion on the new designs was divided. Some respondents favoured a change, whilst others indicated a preference for the status quo. Indeed, for some respondents, a number of the new designs were considered sufficiently unsightly that they felt the landscape would be visually poorer for their installation.

For those respondents preferring a new design to the current design, WTP was elicited using the payment vehicle of a one-off change of the standing charge of their household electricity bill. For those people preferring the current design to some or all of the new tower designs, the procedure was less straightforward. Respondents could be asked for their willingness-to-accept (WTA) a reduced standing charge as compensation for the disamenity of viewing towers of the new design. This reduction, for example, could be explained as reflecting reductions in the maintenance costs of the newer design. Here a particular respondent might prefer one change to the status quo whilst “dispreferring” another. Yet, within the context of seeking separate values for each of a number of different changes, this would require respondents to believe a scenario in which preferred changes happened to trigger increases in bills but less preferred changes resulted in reductions in bills. Whether respondents would find this credible or not was a question that was considered by the authors.

As an alternative, respondents were asked instead to state which of a number of increasingly arduous tasks they would perform in order to avert the replacement of the current towers with towers of a new design. These tasks are described in the first column in Table 4.3 and involved signing petitions, writing complaint letters or making donations to protest groups. Each intended action can then be given a monetary dimension by relating it to the associated value of time lost (writing letters, signing petitions) or loss of money (donations).

The second column in Table 4.3 describes the results of imputing WTP values to each of the possible actions to avoid replacing the current design where the value in money terms of the time, effort and expense involved in writing a letter of complaint is described by c. A respondent who indicated that he/she would not do anything was assumed to be stating indifference, i.e. a zero WTP to retain the current design. A respondent stating that they would sign a petition but not go as far as writing a letter to their MP was assumed to be indicating that they were not indifferent but would not suffer a sufficient welfare loss to invest the time, effort and expense in writing a letter. Hence, their WTP was larger than zero but less than c. A respondent stating they would write a letter but would not pay GBP 10 to a protest group was indicating that their welfare loss lay in the interval between c (inclusive) and c + GBP 10 (exclusive). Respondents stating they would write a letter and pay GBP 10 to a fighting fund but not pay GBP 30 were indicating that their welfare loss lay in the interval above or equal to c + GBP 10 but below c + GBP 30. For those willing to donate GBP 30, it can be inferred that their maximum WTP is above or equal to c + GBP 30.

Table 4.3. Translating intended actions into WTP estimates

| Intended action | Assumed WTP to retain the current design |
|---|---|
| I wouldn’t do anything as I don’t really care | WTP = 0 |
| I would sign a petition complaining to my MP and local council | 0 < WTP < c |
| I would sign a petition and independently write to my local council and/or MP and/or electricity company in order to complain. | c ≤ WTP < GBP 10 + c |
| As well as signing a petition and writing letters of complaint I would be prepared to donate GBP 10 to a group coordinating protest | GBP 10 + c ≤ WTP < GBP 30 + c |
| As well as signing a petition and writing letters of complaint I would be prepared to donate GBP 30 to a group coordinating protest | WTP ≥ GBP 30 + c |

Notes: c is the value in money terms of the time, effort and expense involved in writing a letter of complaint.

Source: Atkinson et al. (2004).

Given that c is of an unknown magnitude, the assumption was made that it takes an hour to produce and mail such a letter. Put another way, c is the value the household places on one hour of its time. Following some frequently used assumptions concerning the value of non-labour time, c is calculated from the annual after-tax income. Specifically, the value of time is taken as a third of the wage rate, which is approximated as a two‐thousandth of the annual after-tax income of the household.
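The calculation of c and the resulting WTP intervals can be sketched as follows. This is an illustrative reading of the assumptions described in this box (hourly wage approximated as one two-thousandth of annual after-tax household income, value of time equal to one third of the wage, one hour per letter); the function names, the action labels and the example income are hypothetical and are not taken from Atkinson et al. (2004).

```python
import math

def letter_cost(annual_after_tax_income):
    """Value c (GBP) of the one hour assumed necessary to write and mail a letter."""
    wage_rate = annual_after_tax_income / 2000  # approximate hourly wage
    return wage_rate / 3                        # value of non-labour time: a third of the wage

def wtp_interval(action, annual_after_tax_income):
    """Return (lower, upper) bounds in GBP on WTP to retain the current design.

    'nothing' is the point value zero and 'petition' excludes both bounds;
    for the other actions the lower bound is inclusive and the upper exclusive.
    """
    c = letter_cost(annual_after_tax_income)
    bounds = {
        "nothing": (0.0, 0.0),           # WTP = 0
        "petition": (0.0, c),            # 0 < WTP < c
        "letter": (c, c + 10),           # c <= WTP < c + 10
        "donate_10": (c + 10, c + 30),   # c + 10 <= WTP < c + 30
        "donate_30": (c + 30, math.inf)  # WTP >= c + 30
    }
    return bounds[action]

# Hypothetical household with GBP 24 000 annual after-tax income.
c_example = letter_cost(24_000)
print(c_example, wtp_interval("letter", 24_000))
```

Interval data of this kind are typically analysed with interval-censored regression methods rather than by assigning each respondent a single point value.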

With regard to the payment vehicle – how the provision of the good is to be financed – the basic choice is between voluntary or coercive payments. Coercive payment vehicles include taxes, rates, fees, charges or prices. Voluntary payments are donations and gifts. The payment vehicle forms a substantive part of the overall package under evaluation and is generally believed to be a non-neutral element of the survey. Mechanisms such as income taxes and water rates are clearly non-neutral and it is relatively common to find respondents refusing to answer the valuation question on the grounds that they object in principle to paying higher taxes or water rates, in spite of the fact that the proposed change is welfare enhancing. The use of taxes also raises issues of accountability, trust in the government and knowledge that taxes are generally not hypothecated; it excludes non-taxpayers from the sample and may not be credible when the scenario is one of WTA, i.e. corresponding to a tax rebate. Voluntary payments, on the other hand, might encourage free-riding, as respondents have an incentive to overstate their WTP to secure provision while leaving the decision on whether or not to actually pay to a later, voluntary choice (see Box 4.2). The use of prices also poses problems as respondents may agree to pay more but simply adjust the quantities consumed so that the total expenditure remains the same.

Box 4.2. Coercion vs. voluntarism and WTP for a public good

Carson, Groves and Machina (2007) have analysed extensively the conditions under which CV respondents have incentives to free-ride. They conclude that the provision of a public good by means of voluntary contributions is particularly troublesome as there is a strong incentive to overstate WTP in the survey context (if stated WTP is perceived to be unrelated to actual payment). This is because overstating hypothetical WTP increases the chances of provision of the desired public good without having to pay for it. Conversely, respondents may choose to free-ride (state a lower WTP value than they would pay in reality) if stated values were perceived to translate credibly into actual contributions. The implication is that voluntary contribution mechanisms should generally be avoided in CV surveys, as that seems to be the cause of the bias rather than the hypothetical nature of the method. Incentive compatible payment methods should be used to minimise the risk of strategic behaviour.

A study by Champ et al. (2002) has sought to test some of these ideas. The authors examined three types of payment vehicle, which they used to elicit WTP for the creation of an open space in Boulder County, Colorado: (A) voluntary individual contribution to a trust fund; (B) voluntary individual contribution to a trust fund, which would be reimbursed in full if the open space project did not go ahead; and, (C) one-off tax on residents based on the results of a referendum. Assuming that respondents believed their WTP values could form the basis of the charge they would actually face to finance the project, it was hypothesised that theory (as just described) would predict that:

1. WTP(C) ≤ WTP(A)

2. WTP(C) ≤ WTP(B)

3. WTP(A) ≤ WTP(B)

Put another way, the authors reckoned that the relatively coercive form(s) of payment vehicle would be less likely to encourage free-riding than the relatively voluntary form(s). The findings of this study appear to confirm this in part as strong evidence was detected for the first prediction. That is, WTP in the form of a tax (C) was significantly smaller than WTP in the form of voluntary contributions (A). While there was less strong evidence (if any) for the remaining two hypotheses, these findings, nevertheless, provide some support for the conjecture that coercive payment vehicles reduce implicit behaviour that might be interpreted as having some strategic element. However, as the authors note, this is just one desirable criterion of a payment vehicle and, in practice, the credibility of any payment medium will also play a large part in determining its relative merits.

Although there seems to be some consensus that voluntary payment vehicles should generally be avoided due to the insurmountable problem of free-riding, ultimately, the choice of the payment vehicle will depend on the actual good being studied and the context in which it is to be provided. Credibility and acceptability are important considerations here. A simple guideline is to use the vehicle that is likely to be employed in the real-world decision: i.e. if water rates are the method by which the change in provision will be effected then there should be a presumption in favour of using water rates or charges in the contingent market. A caveat to this guide arises where this causes conflict with certain of the criteria set out above. For example, a study by Georgiou et al. (1998) found considerable resistance to the use of a water rates vehicle in the immediate aftermath of the privatisation of the public water utilities in the UK. As a practical matter, in such cases, the use of a different payment vehicle (if credible) might well be justified.

Eliciting monetary values

After the presentation of the hypothetical scenario, the provision mechanism and the payment mechanism, respondents are asked questions to determine how much they would value the good if confronted with the opportunity to obtain it, under the specified terms and conditions.

The elicitation question can be asked in a number of different ways. Table 4.4 summarises the principal formats of eliciting values as applied to the case of valuing changes in landscape around Stonehenge in the United Kingdom (Maddison and Mourato, 2002). The examples in the table all relate to the elicitation of WTP but could easily be framed in terms of WTA.

Table 4.4. Examples of common elicitation formats



Open ended

What is the maximum amount that you would be prepared to pay every year, through a tax increase (or surcharge), to improve the landscape around Stonehenge in the ways I have just described?

Bidding game

Would you pay GBP 5 every year, through a tax increase (or surcharge), to improve the landscape around Stonehenge in the ways I have just described?

If Yes: Interviewer keeps increasing the bid until the respondent answers No. Then maximum WTP is elicited.

If No: Interviewer keeps decreasing the bid until respondent answers Yes. Then maximum WTP is elicited.

Payment card

Which of the amounts listed below best describes your maximum willingness-to-pay every year, through a tax increase (or surcharge), to improve the landscape around Stonehenge in the ways I have just described?


GBP 0.5






GBP 7.5

GBP 10

GBP 14.5

GBP 15

GBP 20

GBP 30

GBP 40

GBP 50

GBP 75

GBP 100

GBP 150

GBP 200

>GBP 200

Single-bounded dichotomous choice

Would you pay GBP 5 every year, through a tax increase (or surcharge), to improve the landscape around Stonehenge in the ways I have just described? (The amount is varied randomly across the sample.)

Double-bounded dichotomous choice

Would you pay GBP 5 every year, through a tax increase (or surcharge), to improve the landscape around Stonehenge in the ways I have just described? (The amount is varied randomly across the sample.)

If Yes: And would you pay GBP 10?

If No: And would you pay GBP 1?

Source: Pearce et al. (2006).

The direct open-ended elicitation format is a straightforward way of uncovering values, does not provide respondents with cues about what the value of the change might be, is very informative as maximum WTP can be identified for each respondent and requires relatively straightforward statistical techniques. Hence, there is no anchoring or starting point bias – i.e. respondents are not influenced by the starting values and succeeding bids used. However, due to a number of problems, CV practitioners have progressively abandoned this elicitation format (although there are instances in which open ended elicitation might work well, see Box 4.3). Open-ended questioning leads to large non-response rates, protest answers, zero answers and outliers and generally to unreliable responses (Mitchell and Carson, 1989).3 This is because it may be very difficult for respondents to come up with their true maximum WTP, “out of the blue”, for a change they are unfamiliar with and have never thought about valuing before. Moreover, most daily market transactions involve deciding whether or not to buy goods at given prices, rather than stating maximum WTP values.

The bidding game was one of the most widely used techniques in the 1970s and 1980s. In this approach, as in an auction, respondents are faced with several rounds of discrete choice questions, with the final question being an open-ended WTP question. This iterative format was reckoned to facilitate respondents’ thought processes and thus encourage them to consider their preferences carefully. A major disadvantage lies in the possibility of anchoring or starting point bias. It also leads to a large number of outliers, that is, unrealistically large bids, and to a phenomenon that has been labelled as “yea-saying”, that is, respondents agreeing to pay the specified amounts to avoid the socially embarrassing position of having to say no. Bidding games have mostly been discontinued in contingent valuation practice.

Payment card approaches were developed as improved alternatives to the open-ended and bidding game methods. Presenting respondents with a visual aid containing a large number of monetary amounts facilitates the valuation task, by providing a context for their bids, while avoiding starting point bias at the same time. The number of outliers is also reduced in comparison to the previous methods. Some versions of the payment card show how the values in the card relate to actual household expenditures or taxes (benchmarks). In on-line surveys, payment cards can be presented as sliding scales, where respondents slide the cursor along to select their value (Figure 4.1). Several variants of the payment card method have also been developed to deal with particular empirical issues, such as the presence of uncertainty in valuations. Box 4.4 presents an example of a specially designed payment card to identify certain and uncertain values. The payment card is nevertheless vulnerable to biases relating to the range of the numbers used in the card and the location of the benchmarks.

Figure 4.1. Example of a payment card sliding scale from an on-line survey

Source: Arold (2016), The Effect of Newspaper Framing on the Public Support of the Paris Climate Agreement, MSc thesis, Department of Geography & Environment, LSE.

Box 4.3. Tailored open-ended WTA elicitation: Valuing land-use change in the Peruvian Amazon

Mourato and Smith (2002) used a tailored open-ended elicitation mechanism to estimate the compensation required by slash-and-burn farmers in the Peruvian Amazon to switch to more sustainable agroforestry systems. A total of 214 farmers in the Campo Verde district, Peru, were surveyed face-to-face. Simple black and white drawings were used to depict the scenario and the elicitation mechanism (Figure 4.2) as most farmers were illiterate.

Figure 4.2. Pictorial elicitation mechanism

Farmers were presented with a possible project in which utility companies in developed countries, driven by the possibility of emission reduction legislation, were willing to compensate farmers who preserved forest by adopting multistrata agroforestry systems. A fixed annual payment would be made for each hectare of agroforestry. Payments would cease if the area was deforested.

With the aid of the drawings in Figure 4.2, farmers were asked about the potential economic impacts of agroforestry in terms of investment, labour, yields, and available products, when compared with the traditional slash-and-burn system. Then, they were asked, in an open-ended procedure, for their minimum annual willingness-to-accept compensation to convert one hectare of primary or secondary forest, destined for slash-and-burn agriculture, to multistrata agroforestry.

Simultaneously, farmers were reminded that they were competing against alternative suppliers of carbon services. Therefore, it was advisable to minimise bids, and there was no guarantee that any bids would be accepted. This mechanism served the dual purpose of increasing the realism of the scenario and minimising the occurrence of over-bidding, which is one of the caveats associated with WTA formats.

The piloting stages of the study had shown that dichotomous choice approaches did not work well: farmers were a close-knit community, and disclosed the bids received to one another, creating general discontent. Instead, using the specially designed procedure described above, farmers were able to think through the costs and benefits of the different land uses and formulate bids in this way. Given the relatively small sample size, this approach was also more informative.

The mean compensation required for adoption of agroforestry from the CV survey was USD 138. This was found to be very close to the average difference in returns between slash-and-burn and agroforestry in the first two years, from experimental data (USD 144). Hence, the estimated compensations from the open-ended WTA elicitation procedure, embedded in a competitive setting, seem to reflect expected economic losses rather than strategic bidding.

Single-bounded dichotomous choice or referendum methods became increasingly popular in the 1990s. This elicitation format is thought to simplify the cognitive task faced by respondents (respondents only have to make a judgement about a given price, in the same way as they decide whether or not to buy a supermarket good at a certain price) while at the same time providing incentives for the truthful revelation of preferences under certain circumstances (that is, it is in the respondent’s strategic interest to accept the bid if his or her WTP is greater than or equal to the price asked and to reject otherwise, see Box 4.2 for an explanation of incentive compatibility). This procedure minimises non-response and avoids outliers. The presumed supremacy of the dichotomous choice approach reached its climax in 1993 when it received the endorsement of the NOAA panel (Arrow et al., 1993). However, enthusiasm for closed-ended formats gradually waned as an increasing number of empirical studies revealed that values obtained from dichotomous choice elicitation were significantly and substantially larger than those resulting from comparable open-ended questions. Some degree of yea-saying is also possible. In addition, dichotomous choice formats are relatively inefficient in that less information is available from each respondent (the researcher only knows whether WTP is above or below a certain amount), so that larger samples and stronger statistical assumptions are required. This makes surveys more expensive and their results more sensitive to the statistical assumptions made.

Double-bounded dichotomous choice formats are more efficient than their single-bounded counterpart as more information is elicited about each respondent’s WTP. For example, one knows that a person’s true value lies between GBP 5 and GBP 10 if she accepted to pay GBP 5 in the first question but rejected GBP 10 in the second. But all the limitations of the single-bounded procedure still apply in this case. An added problem is the possible loss of incentive compatibility due to the fact that the second question may not be viewed by respondents as separate to the choice situation and the added possibility of anchoring and yea-saying biases.
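The way in which two yes/no answers bracket a respondent's WTP can be sketched in a few lines of Python. This is only an illustration of the interval logic described above: the function name and the assumption that WTP is non-negative are ours, and the bid amounts are those of the Stonehenge example in Table 4.4.

```python
import math

def wtp_interval(first_bid, higher_bid, lower_bid, first_yes, second_yes):
    """Return the (lower, upper) bounds on WTP implied by two yes/no answers.

    The follow-up bid is higher_bid after a "yes" and lower_bid after a "no".
    Assumption: WTP cannot be negative, so a no/no response is bounded at zero.
    """
    if first_yes:
        if second_yes:
            return (higher_bid, math.inf)   # yes, yes: WTP at least the higher bid
        return (first_bid, higher_bid)      # yes, no: WTP between the two bids
    if second_yes:
        return (lower_bid, first_bid)       # no, yes: WTP between lower and first bid
    return (0.0, lower_bid)                 # no, no: WTP below the lower bid

# Respondent accepts GBP 5 but rejects GBP 10.
print(wtp_interval(5, 10, 1, first_yes=True, second_yes=False))  # (5, 10)
```

A single-bounded question would only reveal that this respondent's WTP is at least GBP 5; the follow-up narrows the interval, which is the source of the efficiency gain.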

Other developments in elicitation formats include Hanemann and Kanninen’s (1999) proposal of a one and a half bound dichotomous choice procedure whereby respondents are initially informed that costs of providing the good in question will be between GBP X and GBP Y (X < Y), with the amounts X and Y being varied across the sample. Respondents are then asked whether they are prepared to pay the lower amount GBP X. If the response is negative, no further questions are asked; if the response is positive, then respondents are asked if they would pay GBP Y. Conversely respondents may be presented with the upper amount GBP Y initially and asked about amount GBP X if the former is refused.
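The questioning sequence of the one and a half bound procedure can be sketched as follows. This is a minimal illustration under our own assumptions: the function name is hypothetical, the respondent is represented by a simple function that answers truthfully, and WTP is taken to be non-negative.

```python
import math

def one_and_a_half_bound(x, y, start_with_lower, would_pay):
    """Return the (lower, upper) WTP bounds implied by the procedure.

    x, y: lower and upper cost amounts announced to the respondent (x < y).
    start_with_lower: True if the first question uses the lower amount x.
    would_pay: function mapping an amount to True (yes) or False (no).
    """
    if not x < y:
        raise ValueError("the lower amount must be below the upper amount")
    if start_with_lower:
        if not would_pay(x):
            return (0.0, x)                 # no to x: no follow-up question
        return (y, math.inf) if would_pay(y) else (x, y)
    if would_pay(y):
        return (y, math.inf)                # yes to y: no follow-up question
    return (x, y) if would_pay(x) else (0.0, x)

# Hypothetical respondent whose true WTP is GBP 7, with X = 5 and Y = 10.
truthful = lambda amount: 7 >= amount
print(one_and_a_half_bound(5, 10, True, truthful))   # (5, 10)
print(one_and_a_half_bound(5, 10, False, truthful))  # (5, 10)
```

In this sketch the second question is asked only when the first answer leaves the announced range open, which is what distinguishes the format from a full double-bounded design.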

The choice of elicitation format is of dramatic importance as different elicitation formats typically produce different estimates. That is, the elicitation format is a non-neutral element of the questionnaire. Carson et al. (2001) summarise a number of stylised facts regarding elicitation formats. These are described in Table 4.5. Considering the pros and cons of each of the approaches above, contributions such as Bateman et al. (2002) and Champ et al. (2003) typically recommend dichotomous choice approaches and, to some extent, payment cards. The latter are more informative about respondents’ WTP and cheaper to implement than the former and are superior to both direct open-ended questions and bidding games. The former may be incentive compatible and facilitate respondents’ valuation task.4 The newer one and a half bounds approach also shows potential. A final consideration is that while it is important to find out which elicitation format is the more valid and reliable, some degree of flexibility and variety in use of formats should be expected, and consideration needs to be given to the empirical circumstances of each application, as suggested by the examples in Boxes 4.3 and 4.4.

Table 4.5. Elicitation formats – some stylised facts

Elicitation format

Main problems

Open ended

Large number of zero responses, few small positive responses

Bidding game

Final estimate shows dependence on starting point used

Payment card

Weak dependence of estimate on amounts used in the card

Single-bounded dichotomous choice

Estimates typically higher than other formats

Double-bounded dichotomous choice

The two responses do not correspond to the same underlying WTP distribution

Source: Carson et al. (2001), “Contingent Valuation: Controversies and Evidence”, Environmental and Resource Economics, Vol. 19(2), pp. 173-210.

Box 4.4. Value uncertainty in payment cards

It seems plausible that some individuals may not have precise preferences for changes in the provision of certain non-market goods. Within stated preference studies this might manifest itself in respondent difficulty in expressing single and exact values. If so, then it might be worthwhile to allow respondents to express a range of values within which, for example, their WTP would most likely reside. A few studies have attempted to allow respondents in CV surveys to be able to express this uncertainty. For example, Dubourg et al. (1997) and Hanley and Kriström (2003) both adapt a payment card elicitation format in order to assess the significance of this uncertainty.

The latter study describes a CV survey of WTP for improvements in coastal water quality in two locations in Scotland. A payment card (see Table 4.6) with values ranging from GBP 1 to GBP 125 was presented to those respondents in their sample of the Scottish population around these locations who had indicated that their WTP for the improvement was positive. In order to test whether these particular respondents were uncertain about their exact WTP, the authors posed the valuation question in two ways.

Table 4.6. Payment card in CV study of improvements in Scottish coastal waters

GBP per annum

A: I would definitely pay (✓)

B: I would definitely NOT pay (✗)


















Source: Adapted from Hanley and Kriström (2003), What’s It Worth? Exploring Value Uncertainty Using Interval Questions in Contingent Valuation, Department of Economics, University of Glasgow, mimeo.

First, respondents were asked if they would definitely pay the lowest amount on the card (i.e. GBP 1) for improving coastal water quality. If the answer was “yes”, then the respondent was asked whether they would definitely pay the second lowest amount on the card (i.e. GBP 2) and so on and so forth with successively higher amounts being proposed until the respondent said “no” to a particular amount.

Second, in addition to this conventional way of eliciting WTP using a payment card, respondents were then asked to consider whether the highest amount on the payment card (i.e. GBP 125) was too much for them to pay. If “yes” then the respondent was asked whether the second highest amount on the card (i.e. GBP 104) was too much to pay and so on and so forth with successively lower amounts being proposed to the respondent until the respondent stated that they were not sure that a particular amount was too much. An illustration of this process to capture respondent uncertainty is described in Table 4.6. The difference between the ticks and crosses on this payment card indicates how uncertain respondents are about their exact WTP: in this case, the respondent would be prepared to pay GBP 34 for sure, would definitely not pay GBP 60 and is unsure whether he/she would pay amounts ranging between GBP 34 and GBP 60. Understanding more about the source of this uncertainty, that may stem from a number of candidate explanations, and whether it varies depending on the non-market good being valued are clearly important questions for future research of this kind.
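The uncertainty range implied by the ticks and crosses can be read off the card mechanically, as the following sketch illustrates. Only the amounts GBP 1, 2, 34, 60, 104 and 125 are taken from the description above; the remaining card values and the function name are hypothetical and serve only to make the example runnable.

```python
def uncertainty_range(card, definitely_pay, definitely_not_pay):
    """Summarise a two-column payment card response.

    card: amounts on the card, in ascending order.
    definitely_pay: amounts ticked in column A.
    definitely_not_pay: amounts crossed in column B.
    Returns (highest certain amount, lowest rejected amount, unsure amounts).
    """
    highest_tick = max(definitely_pay)
    lowest_cross = min(definitely_not_pay)
    unsure = [a for a in card if highest_tick < a < lowest_cross]
    return highest_tick, lowest_cross, unsure

# Illustrative card: intermediate amounts are assumptions, not the study's values.
card = [1, 2, 5, 10, 13, 20, 26, 34, 40, 52, 60, 70, 104, 125]
ticks = {a for a in card if a <= 34}     # respondent would definitely pay
crosses = {a for a in card if a >= 60}   # respondent would definitely not pay

print(uncertainty_range(card, ticks, crosses))  # (34, 60, [40, 52])
```

The amounts left neither ticked nor crossed are those over which the respondent is unsure; the width of this gap is one possible indicator of value uncertainty.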

Whatever the elicitation format adopted, respondents should be reminded of substitute goods and of their budget constraints (and the possible need to make compensating adjustments in other types of expenditure to accommodate the additional financial transaction implied by the survey). The former reminds respondents that the good in question may not be unique and that this has implications upon its value; the latter reminds respondents of their limited incomes and of the need to trade-off money for environmental improvements. Once the WTP elicitation process is over, debriefing and follow-up questions can help the analyst to understand why respondents were or were not willing to pay for the change presented. These questions are important to identify invalid (e.g. protest) answers: that is, answers that do not reflect people’s welfare change from the good considered.

4.4. Mean versus median willingness-to-pay

In using the findings of a cost-benefit analysis (CBA), a decision-maker accepts measures of individuals’ preferences, expressed as WTP sums, as valid measures of the welfare consequences of a given change in provision of say some public good. Generally, no account is taken of how ability to pay might constrain those WTP sums (i.e. the present distribution of income is taken as given) and those expressing a higher WTP are considered as simply reflecting their higher preferences for the good. (However, see Chapter 11 for a discussion of ways in which to take account of distribution.) In this system, mean WTP is preferred to median WTP as a more accurate reflection of the variance in preferences across the mass of individuals whose aggregation is considered to represent society’s preference.

For a number of environmental and cultural goods, a not uncommon finding is that the distribution of WTP is skewed in that, for example, there are a very small number of respondents bidding very large values and a very large number of respondents bidding very small (or even zero) values. In other words, the problem in such cases is that mean WTP gives “excessive” weight to a minority of respondents who have strong and positive preferences. While mean WTP is the theoretically correct measure to use in CBA, median WTP is the better predictor of what the majority of people would actually be willing to pay (when there is a wide distribution of values). From a practical viewpoint, this is extremely important if a decision-maker wishes to capture some portion of the benefits of a project in order say to recover the costs of its implementation. As median WTP reflects what the majority of people would be willing to pay, passing on no more than this amount to individuals should have a correspondingly greater degree of public acceptability than seeking to pass on an amount which is closer to a mean WTP, which might have been overly influenced by a relatively few very large bids.
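A small numerical sketch makes the contrast concrete. The ten bids below are invented for illustration (many small or zero values and one very large bid) and are not taken from any study.

```python
from statistics import mean, median

# Hypothetical skewed sample of annual WTP bids (GBP).
bids = [0, 0, 0, 2, 2, 5, 5, 5, 10, 250]

mean_wtp = mean(bids)      # pulled upwards by the single large bid
median_wtp = median(bids)  # the amount at least half of respondents would pay

# Share of respondents whose stated WTP covers a charge set at each level.
share_covering_mean = sum(b >= mean_wtp for b in bids) / len(bids)
share_covering_median = sum(b >= median_wtp for b in bids) / len(bids)

print(mean_wtp, median_wtp)                        # 27.9 3.5
print(share_covering_mean, share_covering_median)  # 0.1 0.5
```

In this invented sample a charge set at the mean would exceed the stated WTP of nine respondents in ten, whereas a charge set at the median would be acceptable to half of them, which illustrates why the median is the better guide to public acceptability even though the mean remains the relevant measure for CBA.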

While CBA describes how micro-level project appraisals are evaluated, it does not provide a model for how major political issues are decided – namely, the election of government. Here, if one simplifies to a simple two-option system (to allow comparison with the “project on” or “project off” scenario of the valuation exercise), the decision is based on a simple majority of the relevant constituency. This system is analogous to the median WTP measure of a CV study. This argument, between the dominance of preference values or a referendum, is an ongoing debate within environmental economics, which has yet to be resolved. In short, both the mean and median measures deserve consideration in contemporary decision-making and the management of environmental goods.

4.5. Validity and reliability

Despite numerous methodological improvements and a widespread application, particularly in the field of environmental economics, the contingent valuation method still raises some controversy (e.g. Hausman, 2012). One of the main areas of concern regards the ability of the method to produce valid and reliable estimates of WTP. It is not straightforward to assess the validity (i.e. the degree to which a study measures the intended quantity, or absence of systematic bias) and reliability (i.e. the degree of replicability of a measurement, or absence of random bias) of the estimates produced by contingent valuation studies for the obvious reason that actual payments are unobservable. Nevertheless it is possible to test indirectly various aspects of validity and reliability.

4.5.1. Validity

Face or content validity tests look at the adequacy, realism and neutrality of the survey instrument as well as at respondents’ understanding, perception and reactions to the questionnaire. The former aspects can be checked by having stakeholder meetings at the start of the project, and an advisory board throughout, to advise on various aspects of the policy change and survey design. The latter aspects can be tested in the piloting stages of the questionnaire, which may include focus groups, in-depth interviews and, importantly, field pilots (Bateman et al., 2002). Additionally, the rate of protests provides valuable information on how respondents react to the scenarios and payment mechanisms.

Convergent validity tests compare the estimates derived from a CV study with values for the same or a similar good derived from alternative valuation methods, such as those based on revealed preferences. Carson et al. (1996) conducted a meta-analysis looking at 616 value estimates from 83 studies that used more than one valuation method. The authors concluded that, in general, contingent valuation estimates were very similar to, and somewhat smaller than, revealed preference estimates, with both being highly correlated (with 0.78-0.92 correlation coefficients). As will be discussed in more detail later in this section, a common claim of CV critics is that WTP estimates, obtained through the CV method, represent gross overestimates of respondents’ true values (e.g. Cummings et al., 1986). The findings of the meta-analysis instead lend support to the claim that the values estimated by CV studies provide reasonable estimates of the value of environmental goods, as they are very similar to those based on actual revealed behaviour, in spite of the hypothetical nature of the method. The usefulness of convergent validity testing is, however, restricted to quasi-public goods as only estimates of use values can be compared due to the limited scope of revealed preference techniques. Hence, values for pure public goods cannot be analysed in this way.

Perhaps the most common validity test is to check whether CV results conform to the predictions of economic theory. This corresponds to the concept of theoretical validity (Bateman et al., 2002). In general, theoretical validity tests examine the influence of a number of demographic, economic, attitudinal and locational variables, thought to be WTP determinants, on some measure of the estimated WTP. The test is normally formulated by regressing estimated WTP on these variables and checking whether the coefficients are significant, with the expected sign and size. These tests are now standard CV practice and most studies report them. A common theoretical validity test is to check whether the percentage of respondents willing to pay a particular price falls as the price they are asked to pay increases (in dichotomous choice elicitation). This is similar to a negative price elasticity of demand for a private good and is generally tested by checking whether the price coefficient is negative and significant. The condition is almost universally observed in CV studies (Carson et al., 1996).
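As a rough first look at the price-sensitivity condition, the share of “yes” responses can be tabulated by bid level. The sketch below uses invented dichotomous choice data and hypothetical function names; it is a descriptive check only and does not replace the formal test on the price coefficient described above.

```python
from collections import defaultdict

def acceptance_shares(responses):
    """Map each bid amount to the share of respondents accepting it.

    responses: iterable of (bid, said_yes) pairs, one per respondent.
    """
    counts = defaultdict(lambda: [0, 0])   # bid -> [number of yes, number asked]
    for bid, said_yes in responses:
        counts[bid][0] += int(said_yes)
        counts[bid][1] += 1
    return {bid: counts[bid][0] / counts[bid][1] for bid in sorted(counts)}

def shares_fall_with_price(shares):
    """True if the acceptance share never rises as the bid increases."""
    values = list(shares.values())
    return all(a >= b for a, b in zip(values, values[1:]))

# Hypothetical sample: ten respondents at each of three bid levels (GBP).
responses = (
    [(5, True)] * 8 + [(5, False)] * 2
    + [(10, True)] * 5 + [(10, False)] * 5
    + [(20, True)] * 2 + [(20, False)] * 8
)
shares = acceptance_shares(responses)
print(shares)                          # {5: 0.8, 10: 0.5, 20: 0.2}
print(shares_fall_with_price(shares))  # True
```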

Another common theoretical validity test consists of analysing the relationship between income and WTP. If the environmental good being valued is a normal good, then a positive and significant income coefficient is to be expected.5 A positive income elasticity of WTP that is significantly less than one is the usual empirical finding in CV studies of environmental goods. The small magnitude of this income elasticity has been the focus of some of the criticism directed at contingent values: since most environmental commodities are generally regarded as luxury goods rather than necessity goods, many authors expected to find larger-than-unity income elasticities of WTP. However, as Flores and Carson (1997) point out, CV studies yield income elasticities of WTP for a fixed quantity, which are different from income elasticities of demand, a measure based on varying quantities. The authors show that a luxury good in the demand sense can have an income elasticity of WTP which is less than zero, between zero and one or greater than one. They also analyse the conditions under which the income elasticity of WTP is likely to be smaller than the corresponding income elasticity of demand.
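One simple way to obtain an income elasticity of WTP is the slope of a regression of log WTP on log income. The sketch below is an assumption-laden illustration rather than the procedure of any cited study: it uses a single explanatory variable, requires strictly positive WTP and income (so zero bids would have to be handled separately), and the data are constructed so that the true elasticity is 0.4.

```python
import math

def income_elasticity_of_wtp(incomes, wtps):
    """Ordinary least squares slope of log(WTP) on log(income)."""
    xs = [math.log(v) for v in incomes]
    ys = [math.log(v) for v in wtps]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data generated as WTP = 0.1 * income^0.4.
incomes = [10_000, 20_000, 40_000, 80_000]
wtps = [0.1 * income ** 0.4 for income in incomes]

elasticity = income_elasticity_of_wtp(incomes, wtps)
print(round(elasticity, 3))  # 0.4
```

An estimate between zero and one, as here, corresponds to the usual empirical finding: WTP rises with income, but less than proportionately.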

In a comprehensive overview of 20 years of contingent valuation research in developing countries, Whittington (2010) shows that WTP is typically low in these countries, in absolute terms, as a percentage of income, and also relative to the cost of provision. This finding applies to the wide range of non-market goods and services covered by the review: e.g. improved water infrastructure, sanitation and sewage, household water treatment, ecosystem services and watershed protection, solid waste management, marine turtle conservation, cholera and typhoid vaccines, and preservation of cultural heritage. The result is of course unsurprising in the sense that average ability to pay is very low in developing countries, with many people living at a subsistence level and having very little income to spare. Moreover, Whittington notes that people may have other priorities and pressing needs aside from the non-market goods or services being offered. The policy solution will involve subsidies, international assistance, and other forms of sponsorship, or delaying the projects until incomes rise.

Other tests of theoretical validity involve checking whether values are sensitive to the scope of the good or service being valued, and whether WTP and WTA measures of a similar change are similar. The problems of insensitivity to scope (or embedding bias), as well as the disparity between WTP and WTA, are discussed further below.

Arguably, the most powerful and direct way of checking the validity and accuracy of contingent values is to compare contingent valuation hypothetical values with “true” or “real” values, when these can be discerned in actual behaviour. These criterion validity tests analyse the extent to which the hypothetical nature of the CV systematically biases the results, when all other factors are controlled for. This is the most difficult validity test to perform as it is not feasible for many types of good. Indeed, many of the criterion validity tests have been conducted in a laboratory setting, using simulated “real money” transactions and most have been undertaken with private goods. Many of these studies point towards a tendency to overstate WTP in hypothetical markets. These results are discussed in more detail below, when hypothetical bias is reviewed.

4.5.2. Bias testing and correction

Key areas of concern for empirical methodologies such as contingent valuation relate to their susceptibility to various biases (see Mitchell and Carson, 1989; Bateman et al., 2002; or Champ et al., 2002, for extensive reviews). Validity can also be interpreted as the absence of systematic bias and validity testing often involves checking for the presence of certain biases. Many such biases are not specific to the CV method but are common to most survey based techniques and are largely attributable to survey design and implementation problems. But generally, the further from reality and the less familiar the scenario is, the harder it will be for respondents to act like they would in a real market setting. Importantly, some of the anomalies detected in contingent markets also happen in actual markets and hence are not so much a problem with the method, but a feature of how people actually behave (Carson and Hanemann, 2005). This is discussed in more detail in Section 4.6.

Amongst the most examined problems are hypothetical bias (umbrella designation for problems arising from the hypothetical nature of the CV market); insensitivity to scope (where the valuation is insensitive to the scope of the good); WTP/WTA disparity (where WTA is much higher than WTP); and framing effects/information bias (when the framing of the question unduly influences the answer). These biases are discussed in more detail next.

Hypothetical bias

Unsurprisingly, given the hypothetical nature of stated preference scenarios, the criticism of CV that has perhaps received the most attention is hypothetical bias (Arrow and Solow, 1993; Champ and Bishop, 2001; Hausman, 2012; Loomis, 2014), where individuals have been found to systematically overstate WTP, when compared with actual payments, due to the hypothetical nature of the survey. Foster et al. (1997) conducted a review of the literature in this area covering both field and laboratory experiments. Voluntary payment mechanisms are typically used given the difficulty associated with conducting experiments with taxes. The empirical evidence shows that there is a tendency of hypothetical CV studies to exaggerate actual WTP. Most calibration factors (i.e. ratios of hypothetical to actual WTP) were found to fall in the range of 1.3 to 14. Carson et al. (1997) note that hypothetical bias is more prevalent when voluntary payment mechanisms are used, as respondents have incentives to free-ride. The evidence suggests that there is a strong incentive to overstate WTP in the survey context and to free-ride on actual contributions (a phenomenon known as strategic bias). This is because overstating WTP increases the chances of provision of the desired public good without having to pay for it. In order to explain what accounts for the discrepancy found in their review between real and hypothetical values, Foster et al. (1997) also conducted an experiment comparing data on actual donations, in response to a fund-raising appeal for an endangered bird-species, with CV studies focusing on similar environmental resources. The main finding was that the divergence between the data on real and hypothetical valuations might be due as much to free-riding behaviour – because of the voluntary nature of the payment mechanism – as to the hypothetical nature of the CV approach.

Moreover, hypothetical bias tends to arise most commonly when valuing distant, complex and unfamiliar goods and services, where people may not have well-defined prior preferences and may be unable to establish their preferences within the short duration of a (one-off) survey. It is a problem that might affect particularly some types of non-use values for less known and distant policy changes. Use-related values, and goods and services that people are generally familiar with, are arguably less prone to hypothetical bias. A recent CV survey investigating visitor WTP to access London’s Natural History Museum, via an entry fee, elicited values of just under GBP 7 per visit (Bakhshi et al., 2015). These use values are credible and in line with prices currently charged for paid exhibitions in cultural institutions in the United Kingdom.

A range of counteractive procedures or corrective adjustments have been developed to help minimise hypothetical biases (Loomis, 2014). Many such mechanisms are ex ante, preceding the valuation, and involving developments in the design and implementation of CV surveys. First, as noted above, hypothetical bias is associated with the use of voluntary payment mechanisms, as respondents have incentives to free-ride (Carson et al., 1997). The implication for practitioners is to avoid using voluntary payments where possible and select instead compulsory payment mechanisms such as taxes, fees or prices (see Box 4.2).

Another development has been the use of provision point mechanisms in the contingent scenario, which are designed to reduce free-riding behaviour when using voluntary payment mechanisms as a payment vehicle. In a provision point mechanism, respondents are told that the project will only go ahead if a certain donations threshold is reached (i.e. the provision point). If the total donations collected fail to meet the threshold, then the project does not go ahead and the donations made are refunded to the respondents. As indicated by lab experiments (Bagnoli and McKee, 1991), the mechanism incentivises truth telling, as underbidding might result in the project not going ahead. Poe et al. (2002) found that the provision point design also works in a field contingent valuation study setting, incentivising true revelation of WTP. However, there are also potential caveats with this design. Champ et al. (2002) did not find a difference between a provision point mechanism with money back guarantee and a standard voluntary contribution design, as many respondents did not believe the provision point would be met, and were therefore possibly discouraged from contributing. Similarly, Groothuis and Whitehead (2009) found that those that did not believe the provision point would be met were more likely to reject a dichotomous choice bid amount, as a protest.
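The settlement rule of a provision point mechanism with a money-back guarantee can be sketched in a few lines. This is an illustration under stated assumptions, not the design used in any of the cited studies: the function name and data layout are hypothetical, and the treatment of contributions in excess of the threshold (retained here) is an assumption, since the rule for any surplus is not specified above.

```python
def settle_provision_point(pledges, provision_point):
    """Apply a provision point rule with a money-back guarantee.

    pledges: dict mapping respondent id -> pledged donation.
    provision_point: total donations needed for the project to go ahead.
    """
    total = sum(pledges.values())
    if total >= provision_point:
        # Threshold met: project goes ahead, nothing is refunded.
        # Assumption: any surplus above the threshold is retained.
        refunds = {who: 0.0 for who in pledges}
        return {"funded": True, "collected": total, "refunds": refunds}
    # Threshold missed: project does not go ahead, all pledges are refunded.
    return {"funded": False, "collected": 0.0, "refunds": dict(pledges)}


pledges = {"a": 20.0, "b": 30.0, "c": 10.0}
missed = settle_provision_point(pledges, provision_point=100.0)
met = settle_provision_point(pledges, provision_point=50.0)
```

With a threshold of 100 the pledges total only 60, so every pledge is returned; with a threshold of 50 the project is funded.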

Counteractive (i.e. ex ante) treatments are often employed through so-called entreaties in the survey text. Famously, Cummings and Taylor (1999) developed a cheap talk entreaty for reducing hypothetical bias, whereby a script describes the bias problem and a plea is made to respondents not to overstate their true willingness-to-pay. The evidence suggests that use of cheap talk reduces, but does not completely eliminate, hypothetical bias (e.g. Aadland and Caplan, 2006; Carlsson et al., 2005; Carlsson and Martinsson, 2006; List and Lucking-Reiley, 2000; Lusk, 2003; Murphy et al., 2003).6 Further details of the cheap talk experimental work are described in Box 4.5.

Box 4.5. Hypothetically speaking: Cheap talk and CVM

A small but growing number of studies have sought to investigate the impact on hypothetical bias of adapting “cheap talk” (CT) concepts (defined as the costless transmission of information) in CV-like experiments. These studies include the pioneering experiments of Cummings and Taylor (1999) and Brown et al. (2003).

Hypothetical bias is described in these studies as the difference between what individuals say they would pay in a hypothetical setting and what they pay when the payment context is real. CT adds an additional text or script to the (hypothetical) question posed, explaining the problem of hypothetical bias and asking respondents to answer as if the payment context was real. Put another way, the objective of this approach is to see if people can be talked out of behaving as if the experiment was hypothetical.

Although there are a number of psychological concerns about the effect that this CT information will have on respondents – will it bias them the other way and/or be too blatant a warning? – the results from these studies have been both interesting and important. For example, Cummings and Taylor (1999) only use one bid level which participants are asked to vote “yes” or “no” to. They find the CT-script to work well in reducing hypothetical bias: that is, bringing stated WTP amounts more in line with actual payments. Brown et al. (2003) vary the bid-level across respondents and still find that CT works well on similar terms.

Most of these studies are based on experiments using (paid) university students; i.e. not based on applications in the field amongst the public. This enables the CT-script to be relatively long. One concern is that the script needs to be much shorter if this method is to be widely applied in the field. However, the impacts of script-shortening on survey success do not appear to be encouraging, either in experiments (Loomis et al., 1996) or in the field (Poe et al., 1997).

A recently proposed entreaty is the oath script, which typically asks respondents to agree to promise or swear that they will respond to questions or state values honestly. Within environmental economics, the oath script has seen only a small number of applications (e.g. Carlsson et al., 2013; de-Magistris and Pascucci, 2014; Ehmke et al., 2008; Jacquemet et al., 2013; Stevens et al., 2013; Bakhshi et al., 2015). In an investigation of preferences for insect sushi, De-Magistris and Pascucci (2014) found evidence of efficacy of the oath script in lowering WTP estimates, relative to both a cheap talk script and a control group. In a recent study estimating the value of securing the future of two UK cultural institutions, Bakhshi et al. (2015) found that the oath script reduced mean WTP either alone, or in combination with cheap talk. These results suggest that oath scripts are a promising way to address hypothetical bias in contingent valuation surveys.

Some changes are complex and difficult to convey and respondents might be uncertain about how their welfare might be affected. Uncertainty typically occurs for goods and services which are intricate and unfamiliar. Champ and Bishop (2001) tested the use of certainty questions (e.g. “how certain are you that you would really pay the amount indicated if asked”) in an experiment with real payments. They found that respondents with a higher level of certainty regarding their stated WTP values were more likely to state they would actually pay the amounts when asked to do so. This suggests that the predictive accuracy of results may be increased by, for example, recoding uncertain WTP responses as zero payments (an ex post adjustment). Although typically ignored in many valuation studies, identifying certainty in valuation responses appears to play a role in enhancing their validity.
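The ex post adjustment mentioned above, recoding uncertain responses as zero payments, can be sketched as follows. The 1-to-10 certainty scale and the cut-off of 8 are assumptions made for illustration only; an appropriate scale and threshold would need to be justified for any given study. The function name is hypothetical.

```python
from statistics import mean


def recode_uncertain(responses, min_certainty=8):
    """Recode stated WTP as zero when self-reported certainty is low.

    responses: list of (stated_wtp, certainty) pairs, with certainty on an
    assumed 1-10 scale. min_certainty is an assumed cut-off.
    """
    return [wtp if certainty >= min_certainty else 0.0
            for wtp, certainty in responses]


# Illustrative (made-up) responses: (stated WTP, certainty score).
responses = [(20.0, 9), (50.0, 4), (10.0, 8), (30.0, 6)]
adjusted = recode_uncertain(responses)       # [20.0, 0.0, 10.0, 0.0]
raw_mean = mean(w for w, _ in responses)     # 27.5
adjusted_mean = mean(adjusted)               # 7.5
```

The adjustment can only lower mean WTP, which is why it is treated as a conservative correction for hypothetical bias.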

Other important considerations for reducing hypothetical bias include designing the contingent scenario to be credible, neutral and realistic; making sure, where possible, that surveys are perceived as consequential; i.e. respondents should believe that their responses will matter and have an impact; including reminders of budget constraints and substitute goods; and giving respondents time to think (Arrow et al., 1993; Mitchell and Carson, 1989; Bateman et al., 2002; Whittington, 1992; Carson and Groves, 2007; Haab et al., 2013).

Finally, it is important to note that, despite the potential for problems arising due to the hypothetical nature of CV, this is also its greatest strength, as it allows a degree of flexibility, applicability and scope, that other methods do not have, reliant as they are on existing data.

Insensitivity to scope

Insensitivity to scope7 relates to a lack of sensitivity of respondents’ valuations towards changes in the scope of the good or service being valued. More formally, insensitivity to scope occurs when stated values do not vary significantly (or, more strictly still, proportionally) with the scope of the provided benefit (i.e. broadly, larger benefits should be associated with larger WTP values) (Mitchell and Carson, 1989; Bateman et al., 2002). Compliance of CV estimates with the scope test is one of the most significant controversies in the CV validity debate. The debate can be traced back to two widely cited studies, Kahneman and Knetsch (1992) and Desvousges et al. (1993), which found that individuals’ CV responses did not vary significantly with changes in the scope and coverage of the good being valued. Scope tests can be internal, whereby the same sample is asked to value different levels of the good; or these tests can be external, where different, but equivalent, sub-samples are asked to value different levels of the good. Internal tests of scope typically reject the hypothesis that respondents are not sensitive to the amount of the benefit being provided by the hypothetical policy change. The controversy has focused on the more powerful external scope tests. One important point to note here is that, because of income constraints and sometimes strongly diminishing marginal utility, WTP is not expected to vary linearly with the scope of a change; but it is nevertheless expected to show some variation.
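An external scope test compares WTP across two equivalent sub-samples, each valuing a different level of the good. The sketch below uses a one-sided permutation test on the difference in mean WTP. This test is chosen here only because it needs no distributional assumptions and no external libraries; it is not the procedure used in the studies cited, and the function name and data are hypothetical.

```python
import random
from statistics import mean


def permutation_scope_test(wtp_small, wtp_large, n_permutations=5000, seed=0):
    """One-sided permutation test of an external scope effect.

    Null hypothesis: WTP does not depend on which level of the good was
    valued. Returns the observed difference in means (large minus small)
    and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(wtp_large) - mean(wtp_small)
    pooled = list(wtp_small) + list(wtp_large)
    n_small = len(wtp_small)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign responses to the two groups
        diff = mean(pooled[n_small:]) - mean(pooled[:n_small])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative (made-up) split-sample responses.
small_good = [5, 8, 10, 12, 7, 9, 6, 11]
large_good = [15, 18, 22, 17, 20, 25, 19, 21]
difference, p_value = permutation_scope_test(small_good, large_good)
```

A small p-value rejects insensitivity to scope in the weak sense (some variation); it says nothing about proportionality, which is a stricter requirement.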

A number of explanations have been advanced for this phenomenon. Kahneman and Knetsch (1992) argued that, because individuals do not possess strongly articulated preferences for environmental goods, they tend to focus on other facets of the environment, such as the moral satisfaction associated with giving to a good cause. This “warm glow” effect would be independent of the size of the cause. Avoiding the use of donations as a payment vehicle would clearly minimise this possibility, as paying taxes is unlikely to generate a warm glow. Others have argued that embedding is more an artefact of poor survey design: for example, the use of vague descriptions of the good to be valued, or the failure to adequately convey information about the scope of the change (Carson, Flores and Meade, 2001; Smith, 1992). Another suggestion is that, to make valuation and financial decisions easier, people think in terms of a system of expenditure budgets, or “mental accounts”, to which they allocate their income (Thaler, 1984). For environmental improvements, if the amount allocated to the “environment account” is quite small, then this might result in an inability to adjust valuations substantially in response to changes in the size and scope of an environmental good. Essentially, embedding might be a result of valuations being determined by an income constraint which is inflexible and relatively strict compared with assessments of an individual’s total (or full) income.

To assess the empirical importance of this phenomenon, Carson (1998) undertook a comprehensive review of the literature on split-sample tests of sensitivity to scope. This showed that, since 1984, 31 studies rejected the insensitivity hypothesis while 4 did not. Another way of looking at this issue involves comparing different studies valuing similar goods. A meta-analysis of valuation of air quality improvements (Smith and Osborne, 1996) also rejected the embedding hypothesis and showed that CV estimates from different studies varied in a systematic and expected way with differences in the characteristics of the good. Hence, it seems that early conclusions about the persistence of insensitivity to scope can partly be attributed to the lack of statistical power in the test used to detect differences in values.

Many practitioners have concluded that insensitivity to scope is normally a product of misspecified scenarios or vague and abstract definitions of the policy change that can lead respondents not to perceive any real difference between impacts of varying scope (Carson and Mitchell, 1995). Well-designed surveys should therefore be capable of overcoming to some extent the potential for scope insensitivity. A clear, detailed and meaningful definition of the scope of the proposed policy change is required. A possible design solution involves the adoption of a top-down approach, where respondents are first asked to value the larger good or service, and are subsequently asked to allocate a proportion of that value to the smaller component goods or services. The increase in popularity of online CV surveys makes it arguably easier to communicate information, test understanding and indeed to tailor information to respondents that might be having difficulties understanding the details of what they are being asked to value. Avoiding donations (to avoid warm glows), and giving respondents time to think, so that they can read the scenarios carefully and pick up differences in scope, are other suggestions.

Nevertheless, there are instances where describing the scope of policy changes is particularly difficult. A typical example is the presentation of small changes in health risks (e.g. small percentage changes) where insensitivity to scope has consistently been found, despite researchers’ efforts to convey the information in simple and “respondent-friendly” ways (see Box 4.6). This is because people have difficulty in computing small numbers, and find it cognitively very problematic to distinguish between what are, in absolute terms, very small variations in scope. This limitation is not exclusive to surveys but is a feature of the way people behave in real markets.

Box 4.6. Risk insensitivity in stated preference studies

Past evidence has indicated that respondent WTP, in stated preference surveys, might be insufficiently sensitive to the size of the reduction in risk specified and that this is particularly the case for changes in very small baseline risks (Jones-Lee et al., 1985; Beattie et al., 1998). In a comprehensive review, Hammitt and Graham (1999) concluded that: “Overall, the limited evidence available concerning health-related CV studies is not reassuring with regard to sensitivity of WTP to probability variation.” (p40). Interestingly, however, Corso et al. (2000) found that, on the one hand, there was evidence of risk insensitivity when risk reductions were only communicated verbally to respondents but, on the other hand, there was significant evidence of risk sensitivity when risk changes were also communicated visually. This important finding has led many practitioners to adopt visual aids to better depict the concept of risk changes.

This particular visual variant has been used successfully in a study of the preferences of individuals for reductions in mortality risks in Canada and the United States, by Alberini et al. (2004). Respondents were asked – using a dichotomous choice format – for their WTP to reduce this risk over a 10 year period by either 1 in 1 000 or 5 in 1 000: i.e. an external scope test. In order to assist respondents to visualise these small changes, the authors used the type of risk communication mechanism recommended by Corso et al. (2000), which in this case was a 1 000 square grid where red squares represented the prevalence of risks (used alongside other devices to familiarise respondents with the idea of mortality risk). Initial questions to respondents sought to identify those who had grasped these ideas and those who apparently had not. For example, respondents were asked to compare grids for two hypothetical people and to state which of the two had the higher risk of dying. Interestingly, roughly 12% of respondents in both the United States and Canada failed this test in that they (wrongly) chose the person with the lower risk of dying (i.e. fewer red squares on that hypothetical person’s grid).

The point of this, and other screening questions that the authors used, was to identify those respondents in the sample who “adequately” comprehended risks – in the sense of readiness to answer subsequent WTP questions – and those who did not. The authors’ expectations were that the responses of those in the former group were more likely to satisfy a test of scope (e.g. proportionality of WTP with the size of the change in risk) than those “contaminated” by the responses of those in the latter group. However, while the authors find that restricting the analysis to those who passed risk comprehension tests leads to significantly different WTP amounts for the 1 in 1 000 and 5 in 1 000 risk reductions, this does not result in the sort of proportionality that many demand of this particular scope test: i.e. is WTP for the 5 in 1 000 risk change (about) 5 times WTP for the 1 in 1 000 risk change?

What seems to make a difference in this study is a subsequent self-assessment question based on how confident a respondent felt they were about their WTP response. The results are summarised in Table 4.7. More confident respondents, on balance, appear to state WTP amounts that pass the stricter scope test of proportionality. (The ratios of median WTP are not exactly 5 in either the US or Canadian case. However, the important thing here is that the numbers are not statistically different from this value.) The median WTP values based only on those respondents who were not so confident about their WTP answers, by contrast, did not pass this particular scope test. In other words, these findings appear to provide some important clues in the understanding of WTP and risk insensitivity.

Table 4.7. A scope test for mortality risks
Median WTP, USD

| Risk reduction | Canada: more confident | Canada: less confident | US: more confident | US: less confident |
|---|---|---|---|---|
| 5 in 1 000 | | | | |
| 1 in 1 000 | | | | |

Source: Alberini et al. (2004), “Does the value of statistical life vary with age and health status? Evidence from the US and Canada”, Journal of Environmental Economics and Management, Vol. 48, pp. 769-792.
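The stricter proportionality test discussed in Box 4.6 asks whether the ratio of median WTP for the 5 in 1 000 and 1 in 1 000 risk reductions is statistically different from 5. One way to illustrate the idea is a bootstrap percentile interval for the ratio of medians, as sketched below. This is an assumption-laden illustration and not the procedure used by Alberini et al. (2004); the function name and the data are hypothetical.

```python
import random
from statistics import median


def median_ratio_interval(wtp_small, wtp_large, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap percentile interval for median(wtp_large) / median(wtp_small).

    The two sub-samples are resampled independently, as in an external
    (split-sample) scope test. Assumes strictly positive WTP values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_boot):
        small = rng.choices(wtp_small, k=len(wtp_small))
        large = rng.choices(wtp_large, k=len(wtp_large))
        ratios.append(median(large) / median(small))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    point = median(wtp_large) / median(wtp_small)
    return point, lower, upper


# Illustrative (made-up) WTP responses for the two risk reductions.
wtp_1_in_1000 = [10, 12, 14, 16, 18, 20, 22]
wtp_5_in_1000 = [50, 60, 70, 80, 90, 100, 110]
point, lower, upper = median_ratio_interval(wtp_1_in_1000, wtp_5_in_1000)
proportional = lower <= 5 <= upper  # True if 5 lies inside the interval
```

If the interval contains 5, proportionality cannot be rejected, even when the point estimate of the ratio is not exactly 5.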

WTP-WTA disparity

As explained in Section 4.2, the Hicksian welfare measures that CV studies are designed to estimate can be elicited via either WTP or WTA questions. In theory, both value measures should be similar,8 but in practice empirical evidence consistently shows that WTA values can be significantly larger than the corresponding WTP values. Horowitz and McConnell (2002) reviewed 45 usable studies reporting both WTP and WTA and found significant discrepancies between WTP and WTA (Table 4.8). They found that WTA was on average 7 times higher than WTP and that the further away the good being valued was from being an ordinary private good, the higher was the ratio of WTA to WTP. Importantly, Horowitz and McConnell also found that surveys using real goods showed no lower ratios than surveys with hypothetical goods. This suggests that the disparity between WTP and WTA is not peculiar to the hypothetical contexts that characterise stated preference studies (one of the explanations sometimes advanced for the disparity) but is, once again, an inherent feature of consumers’ real behaviour.

Table 4.8. WTA/WTP for different types of goods

| Type of good | WTA/WTP ratio | Standard error |
|---|---|---|
| Public or non-market | | |
| Health and safety | | |
| Private goods | | |
| All goods | | |

Source: Horowitz and McConnell (2002), “A review of WTA/WTP studies”, Journal of Environmental Economics and Management, Vol. 44, pp. 426-447.

This evidence prompted the NOAA guidelines to favour WTP measures of value (Arrow et al., 1993): given that WTP is bounded by income it is less prone to overstatement. However, WTA measures are often the conceptually correct welfare measures to use. Mitchell and Carson (1989) argue that the choice between WTP and WTA formulations depends on the property rights of the respondent in relation to the good being valued: if the respondent is being asked to give up an entitlement, then the WTA measure is appropriate (Carson, 2000). For example, in the Mourato and Smith (2002) study summarised in Box 4.3, farmers were offered compensation to switch from their preferred land use to an alternative land use, that would not be as profitable for them in the short-term; in this case, it would not make sense to elicit WTP to switch land use.

Several hypotheses have been put forward to explain the disparity between WTP and WTA. Some of the main explanations are discussed in turn. The absence of close substitutes for the valued goods and services will lead to greater disparity between WTP and WTA (Hanemann 1991, 1999). Intuitively, if environmental goods have few substitutes then very high levels of compensation will be required to tolerate a reduction in their quantity. More technically, the ratio of WTA to WTP depends on the ratio of the income effect to the substitution effect.

Another popular explanation for the disparity between WTP and WTA, and the subject of a substantial literature, has developed around the notions of “loss aversion” and “reference dependence” which, if correct, would have major implications for cost-benefit analysis. The basic idea is that the loss of an established property right will require higher compensation than the gain of a new property right. This is because losses are weighted far more heavily than gains, where loss and gain are measured equally in terms of quantities. The point of reference for the loss and gain is an endowment point which is often the bundle of goods, or the amount of a specific good, already owned or possessed, but could be some other point, e.g. an aspiration level. The reference dependency model is due mainly to Tversky and Kahneman (1991) and builds on the earlier “prospect theory” work of Kahneman and Tversky (1979). Many of the seminal works on reference dependency are collected together in Kahneman and Tversky (2000). The explanation of reference dependency is essentially psychological: advocates of the approach argue that it is an observed feature of many gain or loss contexts, so that theory is essentially being advanced as an explanation of observed behaviour. Further behavioural explanations for observed stated preference anomalies are discussed below. Whether substitution effects alone or an endowment effect alone explains the disparity between WTA and WTP is ultimately an empirical issue. Some authors (e.g. Morrison, 1996; 1997; Knetsch, 1989; Knetsch and Sinden, 1984) have argued that both an endowment effect and a substitution effect explain the disparity. Effectively, loss aversion magnifies the substitution effect by shifting the indifference curve.
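The asymmetry between losses and gains described above can be illustrated with a very simple reference-dependent value function. The piecewise-linear form and the loss-aversion coefficient of 2 are assumptions chosen purely for illustration; they are not estimates from the literature cited, and the function name is hypothetical.

```python
def reference_dependent_value(change, loss_aversion=2.0):
    """Value of a change in quantity measured from the reference point.

    Gains (change >= 0) are valued one-for-one; losses are scaled up by
    the loss-aversion coefficient (assumed > 1).
    """
    if change >= 0:
        return change
    return loss_aversion * change


gain_value = reference_dependent_value(1.0)    # 1.0
loss_value = reference_dependent_value(-1.0)   # -2.0

# Compensation needed to offset a one-unit loss, relative to the value
# placed on a one-unit gain, under this stylised model.
disparity = abs(loss_value) / gain_value       # 2.0
```

Under this stylised model, an equal-sized loss and gain produce a WTA/WTP ratio equal to the loss-aversion coefficient, even with no difference in substitution possibilities.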

A number of other explanations have been proposed. Uncertain respondents tend to state low WTP and high WTA values as a result of their unfamiliarity either with the elicitation procedure or with the good (Bateman et al., 2002). Respondents who are asked to state a compensation to forgo their initial property rights may state very high WTA values as a form of protest (Mitchell and Carson, 1989). The disparity between WTP and WTA may also, to some extent, be a product of informational constraints and inexperience. For example, List (2003) found that the behaviour of more experienced traders (in a number of different real markets) showed no signs of an endowment effect. Finally, poor design of WTA studies can lead to an overestimation of stated compensation amounts by failing to remind respondents that the welfare measure required is the minimum compensation that would produce the same (not higher) well-being level as the change they are asked to forgo (in the case of a well-being-enhancing policy) (Bateman et al., 2002).

Framing bias

The quality of CV responses is crucially dependent on the information provided in the contingent scenario, namely on the accuracy and plausibility of the scenarios in order to engage respondents in the revelation of truthful preferences or to incentivise their formation. In recent years, the increased use of online CV surveys has facilitated the presentation of information, expediting the tailoring of information to respondents’ needs (and level of understanding), measuring the time spent reading the information (effort), testing understanding, and enabling the use of alternative media, such as images, film or sound. Nevertheless, despite an extensive literature on information effects in CV (e.g. Hoehn and Randall, 2002; Blomquist and Whitehead, 1998; Ajzen, Brown and Rosenthal, 1996; Bergstrom, Stoll and Randall, 1990, 1989; Samples, Dixon and Gowen, 1986), empirical evidence about the “right” amount of information within a survey remains limited.

Another important area concerning the presentation of information relates to whether policy changes are presented in isolation, in sequence, or simultaneously, as part of a group of changes (Carson and Mitchell, 1995; Carson et al., 2001). Single modes of evaluation can elicit different preference rankings and monetary values from joint or multiple modes of evaluation because information is used differently when a point of comparison is available (Hsee and Zhang, 2004). This can result in preference reversals, and inconsistent value rankings depending on the order in which policy changes are evaluated (e.g. Brown 1984; Gregory et al., 1993; Irwin et al., 1993). Moreover, surveys that focus on a single policy issue run the risk of artificially inflating its importance (also called focussing bias) (Kahneman and Thaler, 2006). This is because, at the time of preference elicitation, people are focusing only on the salient aspects of the policy and this may not reflect how they would actually experience this policy in real life where many other phenomena compete for their attention (Kahneman et al., 2006; Dolan and Kahneman, 2008). Also, people might adapt to certain changes and hence value them differently after some time.

Ultimately, the information presentation in a survey should match how the policy changes are expected to occur in practice, i.e. in isolation, in sequence or simultaneously. To avoid an excessive focus on the policy change being evaluated, stated preference surveys should be careful not to over-emphasise its importance. The changes of interest should be presented within the wider context of people’s lives and experiences. For this purpose, it is important to include in the scenario reminders of substitute goods and services, as well as reminders of budget constraints and other possible expenses (Bateman et al., 2002; Arrow and Solow, 1993). If respondents are not reminded about other similar goods they may overestimate their WTP for a specific good or instead state the value they hold for the general type of good (Arrow and Solow, 1993; Loomis et al., 1994). In this respect, concerns about information overload might arise because, in order to ensure respondents adequately consider substitutes, it is necessary to provide a similar amount of information about substitutes as about the good or service of interest (Rolfe, Bennett and Louviere, 2002).

Moreover, it is beneficial where possible to give respondents extended periods of time to think about the issue, about how much it matters to them, to consider their respective valuations, and to allow an opportunity to discuss it with other relevant people. Whittington et al. (1992) showed that giving respondents the chance to go home and think about the survey for 24 hours had a significant negative impact on WTP values, as respondents were able to reflect on the importance of the issue in the wider context of their lives. With on-line surveys, it can be possible for respondents to interrupt the survey, and continue it at a later time, giving them extra time to think.

4.5.3. Reliability

Reliability is a measure of the stability and reproducibility of a measure. A common test of reliability is to assess the replicability of CV estimates over time (test-retest procedure). McConnell et al. (1997) reviewed the available evidence on temporal reliability tests and found a high correlation between individuals’ WTP over time (generally between 0.5 and 0.9), regardless of the nature of the good and the population being surveyed, indicating that the contingent valuation method appears to be a reliable measurement approach. In addition, the original state-of-the-art Alaska Exxon Valdez questionnaire (Carson et al., 1992) was administered to a new sample two years later: the coefficients on the two regression equations predicting WTP were almost identical (Carson et al., 1997).
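A test-retest check of this kind amounts to correlating the WTP stated by the same individuals on two occasions. The sketch below computes a Pearson correlation coefficient from first principles. The choice of the Pearson coefficient is an assumption (the type of correlation used in the reviewed studies is not stated here), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / sqrt(var_x * var_y)


# Illustrative (made-up) WTP from the same respondents at two survey waves.
wtp_wave_1 = [10.0, 20.0, 30.0, 40.0, 15.0]
wtp_wave_2 = [12.0, 18.0, 33.0, 37.0, 20.0]
reliability = pearson_correlation(wtp_wave_1, wtp_wave_2)
```

Values close to 1 indicate that individuals' stated WTP is stable between the two survey waves.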

4.6. Recent developments and frontier issues

4.6.1. Insights from behavioural economics

The last decade has witnessed a huge increase in popularity of behavioural economics (BE) (see Camerer et al., 2011, for an early review as well as the edited volume by Shafir, 2013) and, in turn, of its influences in environmental economics (e.g. Horowitz et al., 2008; Shogren and Taylor, 2008; Brown and Hagen, 2010). Experimental research in this area has repeatedly identified empirical phenomena that are not adequately explained by traditional neo-classical economic analysis. Rabin (2010) talks about three waves in the development of behavioural economics, from the initial focus on the identification of behavioural anomalies, to formalising alternative theoretical conceptualisations in precise models, to fully integrating these alternatives into economic analysis, thereby improving and reshaping economic principles.

With the growth of behavioural economics, some of the known issues of stated preference methods were recast in the light of these alternative theories of behaviour. According to Shogren and Taylor (2004, p. 29) “Behavioural economics has probably had the biggest impact on environmental economics through research on the nonmarket valuation for environmental goods”. Of note, two special issues of the journal Environmental and Resource Economics (in September 2005 and in June 2010) were dedicated to behavioural economics and the environment and to methods that have been developed to deal with preference anomalies in stated preference valuation studies.

In reality, most stated preference biases had been identified before behavioural economics came into vogue as a way of summarising such findings (Mitchell and Carson, 1989; Carson et al., 2001; Carson and Hanemann, 2005; Kahneman, 1986). For example, what used to be called information effects is now often referred to as framing, priming or focussing anomalies as a result of this general behavioural turn in economics. In fact, the relationship between environmental valuation and behavioural economics is complex and the historical developments of both are deeply intertwined. It is thought that research into perceived anomalies in stated preference studies was a contributing factor in the development and popularity of behavioural economics. Carson and Hanemann (2005, p. 30) noted that: “[t]here is, of course, some irony in that many of the key tenets of what is now often referred to as the behavioural economics revolution were first demonstrated in CV studies and declared anomalies of the method rather than actual economic behavior.” As discussed earlier in this chapter, issues like limited cognitive ability to deal with small numbers, or loss aversion, are found in actual market behaviour and are not simply an artefact of hypothetical markets, as was initially posited by CV critics. Five years later, Carlsson (2010) also argued that the marriage of behavioural economics with non-market valuation techniques was inspired by anomalies that appeared in applied stated preference studies. And, as Horowitz et al. (2008, p. 4) put it, “Valuation is a form of experimentation and this experimentation has played a large role in learning about preferences and by extension, behavioural economics”. More recently, Nelson (2017) argued that it was the oil industry’s efforts to discredit CV after the Exxon Valdez disaster that helped to advance a new generation of behavioural economics.

Anomalies in stated preference data often emerge where bounded rationality exists (typically for complex and ill-understood changes) and can take many forms. This includes preferences which are imprecise or are only learned and constructed during the administration of the survey itself (and so likely to remain incomplete). It also includes factors which should be superfluous to determining respondent preferences but might not be in practice (e.g. context, such as current personal mood or immediate environment); or the change being valued commanding greater importance for respondents at the time of the survey when this is not a true reflection of how they would otherwise view its significance (the focussing illusion). A range of issues related to the complexity of valuation tasks may also be viewed in this way, including choice under risk and uncertainty (Sugden, 2005; Swait and Adamowicz, 2001; Horowitz et al., 2008; Shogren and Taylor, 2008; DellaVigna, 2009; Brown and Hagen, 2010; Carlsson, 2010; Gsottbauer and van den Bergh, 2011; Bosworth and Taylor, 2012).

Sugden (2005, p. 7) states that when trying to comprehend the anomalies found in even the best designed stated preference studies, “we need to take account of evidence from the widest range of related judgement and decision-making tasks”. This evidence includes laboratory experiments by psychologists and behavioural economists, as well as economic behaviour observed outside stated preference studies. Gaining a better understanding from the behavioural economics literature of the source of the problems in stated preferences will make it possible to design better solutions to minimise them, as discussed further below.

Various studies have shown that the purely self-interested behaviour assumed in the Homo economicus view deviates consistently from people’s real-life actions, in that individuals typically care about reciprocity and equality (Fehr and Gächter, 2000; Camerer et al., 2011). Individuals have been found to punish others who operate in an uncooperative manner, whilst rewarding those who act in the communal interest. Cai et al. (2010) found that respondents in an internet-based hypothetical CV study (measuring WTP for climate change mitigation strategies) exhibited increased WTP values when they believed that the negative effects of climate change would fall disproportionately on the world’s poorest people and when larger cost shares were paid by those deemed to shoulder greater responsibility for mitigation. Moreover, the social context in which the valuation takes place also matters, as people care about others and about their approval. Alpizar et al. (2008) found that contributions to a public good stated in public were 25% greater than contributions stated in private.

The role of emotions in the formation of stated preferences for non-market goods has also been investigated (Peters, 2006). Peters, Slovic, and Gregory (2003) looked at the impact of affect on the disparity between WTP and WTA. They found that buyers with stronger positive feelings about the good were willing to pay more for it, while sellers with stronger negative feelings about no longer having the good were willing to accept a greater minimum payment in exchange for it. Similarly, Araña and León (2008) looked at changes in emotion intensity to predict anchoring effects and WTP. And in a recent paper, Hanley et al. (2016) looked at the role of general incidental emotions (happy, sad and neutral, unrelated to the good being valued) on the valuation of changes in coastal water quality and fish populations in New Zealand. In this study, changes in emotional state were found to have no statistically significant effect on WTP.

As discussed above, there is a well-documented difference in derived values between WTP and WTA studies (Horowitz and McConnell, 2002). Knetsch (2010) focused on which of these two elicitation methods to choose, depending on the case in question, highlighting that an understanding of reference states will help to avoid under-valuation of non-market environmental goods. Taking insights from behavioural economics, Bateman et al. (2009) found that by increasing the simplicity of tasks faced by respondents, the difference between WTP and WTA values regarding land use changes was reduced. Horowitz et al. (2008, p. 3) also discussed the divergence of WTP and WTA values in some detail, showing that behavioural knowledge of survey design and context dependence can help to understand the WTP-WTA gap: “making sense of the gaps is an essential component of sustaining the validity of this valuation method”.

The results from the meta-analysis of CV studies by Brander and Koetse (2011) highlighted the effect of study design on value estimates and the authors discussed the need to recognise and accommodate this when using CV results (see also OECD, 2012). They found that the methodological design of a CV study had a sizable influence on results and that values derived using payment vehicles such as donations or taxes tended to be significantly lower than other payment scenarios. As with many other studies, the authors found that the use of dichotomous choice or payment card methods produced significantly reduced values compared with open-ended methods. Brown and Hagen (2010) considered that differences in behaviour might be mitigated by using survey devices such as budget constraint reminders and “time-to-think” procedures.

Akter et al. (2008) discussed whether or not respondent uncertainty could be measured accurately. They looked at seven empirical studies in which WTP value estimates were adjusted with preference uncertainty scores, which quantify on a numerical scale how uncertain respondents stated they were about their WTP (e.g. from 0 to 10, where 0 might be wholly uncertain and 10 might be extremely certain), and then compared with conventional (i.e. unadjusted) double-bounded CV WTP estimates. Contrary to the NOAA Panel (Arrow et al., 1993) advice, the empirical evidence explored in this study suggested that incorporating information on uncertainty led to largely inconsistent (and less efficient) welfare estimates. That said, an awareness of respondents’ valuation confidence can be helpful in understanding survey results. Conversely, Morrison and Brown (2009) investigated techniques for reducing hypothetical bias, such as certainty scales, cheap talk and dissonance minimisation (where respondents are allowed to express support for a programme without having to pay for it). They found that certainty scales and dissonance minimisation were the most effective in reducing the bias.
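
As an illustration of how a certainty scale can enter the analysis, the following minimal Python sketch recodes a “yes” response to a dichotomous choice question as a “no” whenever the respondent’s stated certainty falls below a cut-off. The function names, the sample data and the cut-off of 7 on the 0-10 scale are hypothetical choices made for this example; they are not taken from Akter et al. (2008) or Morrison and Brown (2009), and other adjustment rules exist.

```python
def recode_uncertain_yes(responses, cutoff=7):
    """Treat a 'yes' as a 'no' when stated certainty is below the cut-off.

    responses: list of (said_yes, certainty) pairs, with certainty on a
    0-10 scale (0 = wholly uncertain, 10 = extremely certain).
    The cut-off is an illustrative assumption, not a recommended value.
    """
    return [said_yes and certainty >= cutoff for said_yes, certainty in responses]


def share_yes(answers):
    """Proportion of respondents who accept the bid."""
    return sum(answers) / len(answers)


# Hypothetical answers to a single bid amount: (said_yes, certainty score).
sample = [(True, 9), (True, 4), (False, 8), (True, 7), (False, 2)]

unadjusted = [said_yes for said_yes, _ in sample]
adjusted = recode_uncertain_yes(sample)

print(share_yes(unadjusted))  # share of 'yes' answers before recoding
print(share_yes(adjusted))    # share of 'yes' answers after recoding
```

Because recoding of this kind can only lower the share of “yes” responses, the adjusted estimate cannot exceed the unadjusted one, and how far it falls depends on the cut-off chosen. This may be one reason why adjusted welfare estimates can be sensitive to the adjustment rule.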

Finally, in a seminal paper, Bateman et al. (2008) argued that a key mechanism for anomaly reduction in CV studies lies in providing respondents with opportunities for learning by repetition and experience. The authors tested three alternative conceptualisations of individual preferences: i) a-priori well-formed preferences, that are capable of being elicited via a single dichotomous choice question, as recommended by the NOAA guidelines (Arrow et al., 1993); ii) learned or “discovered” preferences through a process of repetition and experience, based on Plott’s (1996) “discovered preference hypothesis”, where stable and consistent preferences are argued to be not pre-existent, but the product of experience gained through repetition; and iii) internally coherent preferences but strongly influenced by arbitrary anchors, inspired by the work of Ariely et al. (2003). The latter argued that, even when individual choices are internally coherent, they can still be strongly anchored to some arbitrary starting point, and by altering this starting point, values can be arbitrarily manipulated (a type of behaviour coined as “coherent arbitrariness”).

In order to test these alternatives, Bateman et al. (2008) developed the so-called “learning design contingent valuation” method, which is essentially a double-bounded dichotomous choice payment format (Hanemann et al., 1991), applied repeatedly to mutually exclusive goods, to allow for learning and experience in the valuation tasks and for the opportunity to “discover” preferences within the duration of the survey. Their findings support a model in which preferences converge towards standard expectations through a process of repetition and learning, i.e. the discovered preferences hypothesis (Plott, 1996). Knowledge of the operating rules of the contingent market was also found to be a prerequisite for producing reliable and accurate values.

The results of Bateman et al. (2008) suggest a number of practical empirical fixes for common CV issues. First, they support the use of double-bounded dichotomous choice formats rather than one-shot single-bounded designs. Double-bounded designs have the added advantage of permitting a substantial improvement in the statistical efficiency obtained from a given sample relative to a single-bounded format (because the two answers contain more information about respondents’ preferences). As a result, double-bounded CV formats have risen in popularity in recent years and have arguably become one of the most prevalent CV designs. Second, they indicate that it is the last response in a series of valuations which should be attended to rather than the first. Third, they support the use of “practice” questions (such as those described by Plott and Zeiler, 2005), which could then be followed by a single, incentive-compatible, contingent valuation question. Finally, they also highlight the advantage of the increasingly common choice experiment method, discussed in the following chapter, as a means of developing institutional and value learning. The idea of preference learning during repeated choices has also been observed by researchers using choice experiments. Several studies have shown that estimates of both preferences and variance obtained from the initial choices are often out of line with those obtained from subsequent choices (Carlsson et al., 2012; Hess et al., 2012; Czajkowski et al., 2014).
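
The additional information provided by a double-bounded format can be seen in a minimal Python sketch. Each pair of yes/no answers places the respondent’s WTP in an interval, whereas a single-bounded question only reveals whether WTP lies above or below the first bid. The bid amounts, the function names and the logistic distribution assumed for WTP are illustrative assumptions for this sketch and are not taken from Bateman et al. (2008) or Hanemann et al. (1991).

```python
import math


def wtp_interval(first_bid, lower_bid, higher_bid, first_yes, second_yes):
    """Return the (lower, upper) bounds on WTP implied by the two answers."""
    if first_yes:
        # A 'yes' to the first bid is followed up with the higher bid.
        return (higher_bid, math.inf) if second_yes else (first_bid, higher_bid)
    # A 'no' to the first bid is followed up with the lower bid.
    return (lower_bid, first_bid) if second_yes else (-math.inf, lower_bid)


def logistic_cdf(x, location, scale):
    """Cumulative distribution of WTP, assumed logistic for illustration."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


def log_likelihood(intervals, location, scale):
    """Sum over respondents of the log probability that WTP falls in the interval."""
    return sum(
        math.log(logistic_cdf(upper, location, scale)
                 - logistic_cdf(lower, location, scale))
        for lower, upper in intervals
    )


# Hypothetical respondents, all facing bids of 20 (first), 10 (lower), 40 (higher).
answers = [(True, False), (False, True), (True, True), (False, False)]
intervals = [wtp_interval(20, 10, 40, first, second) for first, second in answers]

print(intervals)
print(log_likelihood(intervals, location=20, scale=10))
```

In practice the location and scale parameters would be chosen to maximise this log-likelihood; the narrower intervals are what allow the parameters to be estimated more precisely from the same number of respondents.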

It is hoped that wider adoption of elicitation approaches that provide opportunities for learning might lead to a reduction in issues in stated preference studies which have previously been regarded as insoluble anomalies. In particular, this appears to be a promising avenue for exploring potentially more accurate estimates of non-use values, where unfamiliar goods and ill-formed preferences are especially prone to a range of heuristics and framing effects.

4.6.2. Developments in technology and social media

On a more practical note, much progress has been made in the implementation of stated preference surveys. With the development of the internet, the growth in broadband penetration and the popularity of on-line forums, there has been a strong move towards designing and implementing surveys on-line (Lindhjem and Navrud, 2010). There are now many excellent (both proprietary and open-source) software products that can be used to produce high-quality web surveys (e.g. Qualtrics and SurveyMonkey). Typically the implementation is carried out via a market research company that has access to an on-line panel of respondents, covering a wide range of demographics, who are paid to complete the surveys. Alternatively, there are also new crowdsourcing resources, such as Amazon’s human intelligence task marketplace, Mechanical Turk. Here researchers (the “Requesters”) are able to post tasks directly (in this case surveys). Prospective respondents (the “Turkers”) then browse the existing tasks and choose to complete them for a monetary incentive set by the researchers.

On-line surveys offer many advantages: they are very quick to implement (e.g. it is common to get hundreds of completed surveys back within 24 hours of launching); they are inexpensive (particularly when compared with face-to-face interviews); there is no need to enter the data into a spreadsheet as this is done automatically; interviewer bias is avoided; and respondents are likely to feel more comfortable answering sensitive questions and moving through the survey at their own pace on their own and in familiar surroundings (Bateman et al., 2002; MacKerron et al., 2009). Crucially, these surveys provide a large amount of flexibility in terms of implementation. For example, the questionnaire can be tailored to the respondent, and it is easy to alter the flow of the questions depending on certain responses. Sound and images can be easily presented and it is possible to monitor the time taken on a particular page, or whether extra information was accessed.

Needless to say, there are pitfalls too. Not everyone has access to the internet (although in time this will become less of an issue as broadband reach extends, even to developing countries) and on-line surveys might not be the best option for certain groups such as the very elderly, or illiterate populations (although it is possible to design pictorial surveys to avoid this problem). Moreover, it is not possible to offer clarification if respondents get confused with certain parts of the text or the questions. A sizeable number of studies have investigated the impact of survey mode on stated values (e.g. Dickie et al., 2007; Marta-Pedroso et al., 2007).

Reassuringly, it seems that many reported problems with web-based valuation surveys can potentially be controlled for or avoided altogether. For example, respondents who speed through the survey can easily be detected and, if judged appropriate, discarded from the sample. Questions can be included to check attention and understanding. Learning mechanisms and trial questions can be added if the pilots reveal difficulties, and so on. More positively, some studies suggest that Internet panel surveys have desirable properties along several dimensions of interest (Bell et al., 2011). Importantly, Lindhjem and Navrud (2010) found no significant differences between CV values obtained through Internet and in-person administration. Within this context, these authors envisage a possible mass exodus from in-person interviews, the traditional gold standard in CV survey administration, to the much faster and cost-effective Internet surveys.
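
The screening of respondents who speed through a survey can be sketched in a few lines of Python. The rule used here (flagging completion times below half of the sample median), the function name and the figures are assumptions made purely for illustration; the appropriate threshold, and whether flagged respondents should be discarded at all, remain matters of judgement for each study.

```python
from statistics import median


def flag_speeders(durations_seconds, fraction_of_median=0.5):
    """Return the ids of respondents whose completion time is unusually short.

    durations_seconds: mapping of respondent id -> total completion time.
    fraction_of_median: illustrative cut-off expressed as a share of the
    sample median completion time (an assumption, not a standard).
    """
    cutoff = fraction_of_median * median(durations_seconds.values())
    return {rid for rid, seconds in durations_seconds.items() if seconds < cutoff}


# Hypothetical completion times recorded by the survey software.
durations = {"r1": 600, "r2": 540, "r3": 120, "r4": 660, "r5": 580}

speeders = flag_speeders(durations)
retained = {rid: t for rid, t in durations.items() if rid not in speeders}

print(sorted(speeders))
print(sorted(retained))
```

A relative cut-off of this kind adapts to the length of the questionnaire, but it is only one possible screening device; page-level timings or attention-check questions could be used alongside it.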

4.7. Summary and guidance for policy makers

Although controversial in some quarters, the contingent valuation method has gained increased acceptance amongst many academics and policy makers as a versatile and powerful methodology for estimating the monetary value of non-market impacts of projects and policies. Stated preference methods more generally offer a direct survey approach to estimating consumer preferences and more specifically WTP amounts for changes in provision of (non-market) goods, which are related to respondents’ underlying preferences in a consistent manner. Hence, this technique is of particular worth when assessing impacts on public goods, the value of which cannot be uncovered using revealed preference methods. However, it is worth noting that contingent valuation methods are being used even where a revealed preference option is available.

This growing interest has resulted in research in the field of contingent valuation evolving substantially over the past 25 years or so. For example, the favoured choice of elicitation formats for WTP questions in contingent valuation surveys has already passed through a number of distinct stages, as previously discussed in this chapter. This does not mean that homogeneity across studies in the design of stated preference surveys can be expected any time soon. Nor would this particular development necessarily be desirable. The discussion in this chapter has illustrated findings from studies that show how, for example, legitimate priorities to minimise respondent strategic bias by always opting for incentive-compatible payment mechanisms must be balanced against equally justifiable concerns about the credibility of a payment vehicle. The point is that the answer to this problem is likely to vary across different types of project and policy problems.

As with any empirical methodology, there remain concerns about the validity of the findings of contingent valuation studies, particularly with regard to the measurement of non-use values. Much of the research in this field has sought to construct rigorous tests of the robustness of the methodology across a variety of policy contexts and non-market goods and services. CV has been subject to more stringent testing than any other similar methodology – and has become stronger as a result. The analysis of anomalies first detected in CV led to the realisation that these were not necessarily an artefact of CV but in many cases reflected the way people behaved in reality. Contingent valuation turned out to be fertile ground for the development of behavioural economics.

By and large, the overview provided in the latter part of this chapter has struck an optimistic note about the use of contingent valuation to estimate the value of non-market goods. In this interpretation of recent developments, there is a virtuous circle between translating the lessons from tests of validity and reliability into practical guidance for future survey design. Indeed, many of the criticisms of the technique can be said to be imputable to problems at the survey design and implementation stage (and associated with the way people behave) rather than to some intrinsic methodological flaw. Taken as a whole, the empirical findings largely support the potential validity and reliability of CV estimates.

On the whole, developments in CV research point to the merits (in terms of validity and reliability) of good quality studies and so point to the need for practitioners to follow, in some way, guidelines for best practice. While the NOAA guidelines continue to be a focal point, there are a number of more recent guidelines which also provide useful and state-of-the-art reference points for practitioners: for example, the very recent guidance of Johnston et al. (2017); Bateman et al. (2002), which is intended to guide official applications of stated preference methods in the United Kingdom; and Champ et al. (2003) for the United States.


Aadland, D. and A.J. Caplan (2006), “Cheap talk reconsidered: New evidence from CVM”, Journal of Economic Behavior & Organization, Vol. 60(4), pp. 562-578.

Ajzen, I., T.C. Brown and L.H. Rosenthal (1996), “Information bias in contingent valuation: Effects of personal relevance, quality of information, and motivational orientation”, Journal of Environmental Economics and Management, Vol. 30(1), pp. 43-57.

Akter, S., J. Bennett and S. Akhter (2008), “Preference uncertainty in contingent valuation”, Ecological Economics, Vol. 67(3), pp. 345-351.

Alberini, A. et al. (2004), “Does the value of statistical life vary with age and health status? Evidence from the US and Canada”, Journal of Environmental Economics and Management, Vol. 48, pp. 769-792.

Alpizar, F., F. Carlsson and O. Johansson-Stenman (2008), “Does context matter more for hypothetical than for actual contributions? Evidence from a natural field experiment”, Experimental Economics, Vol. 11, pp. 299-314.

Araña, J.E. and C.J. León (2008), “Do emotions matter? Coherent preferences under anchoring and emotional effects”, Ecological Economics, Vol. 66(4), pp. 700-711.

Ariely, D., G. Loewenstein and D. Prelec (2003), “‘Coherent arbitrariness’: Stable demand curves without stable preferences”, Quarterly Journal of Economics, Vol. 118(1), pp. 73-105.

Arold, B. (2016), The Effect of Newspaper Framing on the Public Support of the Paris Climate Agreement, MSc thesis, Department of Geography & Environment, LSE.

Arrow, K. et al. (1993), Report of the NOAA Panel on Contingent Valuation, National Oceanic and Atmospheric Administration, Washington, DC.

Atkinson, G. et al. (2012), “When to Take No for an Answer? Using Entreaties to Reduce Protest Zeros in Contingent Valuation Surveys”, Environmental and Resource Economics, Vol. 51(4), pp. 497-523.

Atkinson, G. et al. (2004), “‘Amenity’ or ‘Eyesore’? Negative willingness to pay for options to replace electricity transmission towers”, Applied Economics Letters, Vol. 14(5), pp. 203-208.

Bagnoli, M. and M. McKee (1991), “Voluntary Contribution Games: Efficient Private Provision of Public Goods”, Economic Inquiry, Vol. 29(2), pp. 351-366.

Bakhshi, H. et al. (2015), Measuring Economic Value in Cultural Institutions, Arts and Humanities Research Council.

Bateman, I.J. et al. (2009), “Reducing gain-loss asymmetry: A virtual reality choice experiment valuing land use change”, Journal of Environmental Economics and Management, Vol. 58, pp. 106-118.

Bateman, I.J. et al. (2008), “Learning design contingent valuation (LDCV): NOAA guidelines, preference learning and coherent arbitrariness”, Journal of Environmental Economics and Management, Vol. 55, pp. 127-141.

Bateman, I.J. et al. (2002), Economic Valuation with Stated Preference Techniques: A Manual, Edward Elgar, Cheltenham, United Kingdom.

Beattie, J. et al. (1998), “On the contingent valuation of safety and the safety of contingent valuation: Part 1 – Caveat investigator”, Journal of Risk and Uncertainty, Vol. 17, pp. 5-25.

Bell, J., J. Huber and W. Kip Viscusi (2011), “Survey mode effects on valuation of environmental goods”, International Journal of Environmental Research and Public Health, Vol. 8(4), pp. 1222-1243.

Bergstrom, J.C., J.R. Stoll and A. Randall (1989), “Information effects in contingent markets”, American Journal of Agricultural Economics, Vol. 71(3), pp. 685-691.

Bergstrom, J.C., J.R. Stoll and A. Randall (1990), “The impact of information on environmental commodity valuation decisions”, American Journal of Agricultural Economics, Vol. 72(3), pp. 614-621.

Blomquist, G.C. and J.C. Whitehead (1998), “Resource quality information and validity of willingness to pay in contingent valuation”, Resource and Energy Economics, Vol. 20(2), pp. 179-196.

Bosworth, R. and L.O. Taylor (2012), “Hypothetical bias in choice experiments: Is cheap talk effective at eliminating bias on the intensive and extensive margins of choice?”, B.E. Journal of Economic Analysis and Policy, Vol. 12, pp. 1.

Brander, L.M. and M.J. Koetse (2011), “The value of urban open space: Meta-analyses of contingent valuation and hedonic pricing results”, Journal of Environmental Management, Vol. 92(10), pp. 2763-2773.

Brown, T.C. (1984), “The concept of value in resource allocation”, Land Economics, Vol. 60, pp. 231-246.

Brown, T.C., I. Ajzen and D. Hrubes (2003), “Further tests of entreaties to avoid hypothetical bias in referendum contingent valuation”, Journal of Environmental Economics and Management, Vol. 46(2), pp. 353-361.

Brown, G. and D.A. Hagen (2010), “Behavioral economics and the environment”, Environmental and Resource Economics, Vol. 46, pp. 139-146.

Cai, B., T.A. Cameron and G.R. Gerdes (2010), “Distributional preferences and the incidence of costs and benefits in climate change policy”, Environmental and Resource Economics, Vol. 46, pp. 429-458.

Camerer, C.F., G. Loewenstein and M. Rabin (2011), Advances in Behavioral Economics, Princeton University Press, USA.

Carlsson, F. (2010), “Design of Stated Preference Surveys: Is There More to Learn from Behavioral Economics?”, Environmental and Resource Economics, Vol. 46, pp. 167-177.

Carlsson, F., P. Frykblom and C. Lagerkvist (2005), “Using cheap talk as a test of validity in choice experiments”, Economics Letters, Vol. 89(2), pp. 147-152.

Carlsson, F. et al. (2013), “The truth, the whole truth, and nothing but the truth – A multiple country test of an oath script”, Journal of Economic Behavior & Organization, Vol. 89, pp. 105-121.

Carlsson, F. and P. Martinsson (2006), “Do experience and cheap talk influence willingness to pay in an open-ended contingent valuation survey?”, Working Papers in Economics 109, Department of Economics, School of Business, Economics and Law, Göteborg University.

Carlsson, F., M.R. Mørkbak and S.B. Olsen (2012), “The first time is the hardest: A test of ordering effects in choice experiments”, Journal of Choice Modelling, Vol. 5(2), pp. 19-37.

Carson, R.T. (1998), “Contingent Valuation Surveys and Tests of Insensitivity to Scope”, in Kopp, R., W. Pommerehne and N. Schwarz (eds.), Determining the Value of Non-Marketed Goods: Economic, Psychological and Policy Relevant Aspects of Contingent Valuation Methods, Kluwer, Boston.

Carson, R.T. (2000), “Contingent Valuation: A User’s Guide”, Environmental Science and Technology, Vol. 34, pp. 1413-1418.

Carson, R.T. (2011), Contingent Valuation: A Comprehensive Bibliography and History, Edward Elgar, Cheltenham.

Carson, R.T. (2012), “Contingent Valuation: A Practical Alternative When Prices Aren’t Available”, Journal of Economic Perspectives, Vol. 26(4), pp. 27-42.

Carson, R.T. et al. (1996), “Contingent valuation and revealed preference methodologies: Comparing estimates for quasi-public goods”, Land Economics, Vol. 72, pp. 80-99.

Carson, R.T., N.E. Flores and N.F. Meade (2001), “Contingent Valuation: Controversies and Evidence”, Environmental and Resource Economics, Vol. 19(2), pp. 173-210.

Carson, R.T., T. Groves and M.J. Machina (1997), “Stated preference questions: Context and optimal response”, in National Science Foundation Preference Elicitation Symposium, University of California, Berkeley.

Carson, R.T. and T. Groves (2007), “Incentive and Informational Properties of Preference Questions”, Environmental and Resource Economics, Vol. 37(1), pp. 181-210.

Carson, R. and W.M. Hanemann (2005), “Contingent Valuation”, in Mäler, K.-G. and J.R. Vincent (eds.), Handbook of Environmental Economics, Elsevier, Amsterdam.

Carson, R.T. and J.J. Louviere (2011), “A common nomenclature for stated preference elicitation approaches”, Environmental and Resource Economics, Vol. 49(4), pp. 539-559.

Carson, R.T. and R.C. Mitchell (1995), “Sequencing and nesting in contingent valuation surveys”, Journal of Environmental Economics and Management, Vol. 28(2), pp. 155-173.

Carson, R.T. et al. (1992), A Contingent Valuation Study of Lost Passive Use Values Resulting from the Exxon Valdez Oil Spill, Report to the Attorney General of the State of Alaska, prepared by Natural Resource Damage Assessment, Inc., La Jolla, CA.

Champ, P.A. and R.C. Bishop (2001), “Donation payment mechanisms and contingent valuation: An empirical study of hypothetical bias”, Environmental and Resource Economics, Vol. 19(4), pp. 383-402.

Champ, P.A. et al. (2002), “Contingent valuation and incentives”, Land Economics, Vol. 78(4), pp. 591-604.

Champ, P.A., K.J. Boyle and T.C. Brown (eds.) (2003), A Primer on Nonmarket Valuation, Kluwer Academic Publishers, Dordrecht.

Corso, P.S., J.K. Hammitt and J.D. Graham (2001), “Valuing mortality-risk reduction: Using visual aids to improve the validity of contingent valuation”, Journal of Risk and Uncertainty, Vol. 23, pp. 165-84.

Cummings, R.G., D.S. Brookshire and W.D. Schulze (eds.) (1986), Valuing Environmental Goods: An Assessment of the Contingent Valuation Method, Rowman and Allanheld, Totowa, New Jersey.

Cummings, R.G. and L.O. Taylor (1999), “Unbiased value estimates for environmental goods: A cheap talk design for the contingent valuation method”, American Economic Review, Vol. 89(3), pp. 649-665.

Czajkowski, M., M. Giergiczny and W. Greene (2014), “Learning and fatigue effects revisited. The impact of accounting for unobservable preference and scale heterogeneity”, Land Economics, Vol. 90(2), pp. 324-351.

Davis, R. (1963), “Recreation Planning as an Economic Problem”, Natural Resources Journal, Vol. 3, pp. 239-249.

DellaVigna, S. (2009), “Psychology and economics: Evidence from the field”, Journal of Economic Literature, Vol. 47, pp. 315-372.

de-Magistris, T. and S. Pascucci (2014), “The effect of the solemn oath script in hypothetical choice experiment survey: A pilot study”, Economics Letters, Vol. 123, pp. 252-255.

Desvousges, W. et al. (1993), “Measuring Natural Resource Damages with Contingent Valuation: Tests of Validity and Reliability”, in Hausman, J. (ed.) Contingent Valuation: A Critical Assessment, North-Holland, Amsterdam.

Diamond, P.A. and J.A. Hausman (1994), “Contingent Valuation: Is Some Number Better Than No Number?”, Journal of Economic Perspectives, Vol. 8, pp. 45-64.

Dickie, M., S. Gerking and W.L. Goffe (2007), “Valuation of non-market goods using computer-assisted surveys: A comparison of data quality from internet and RDD samples”, presentation at European Association of Environmental and Resource Economists, Thessaloniki, Greece.

Dillon, W.R., T.J. Madden and N.H. Firtle (1994), Marketing Research in a Marketing Environment, 3rd Edition, Irwin, Boston.

Dolan, P. and D. Kahneman (2008), “Interpretations of utility and their implications for the valuation of health”, The Economic Journal, Vol. 118(525), pp. 215-234.

Dubourg, W.R., M.W. Jones-Lee and G. Loomes (1997), “Imprecise preferences and survey design in contingent valuation”, Economica, Vol. 64, pp. 681-702.

Ehmke, M.D., J.L. Lusk and J.A. List (2008), “Is Hypothetical Bias a Universal Phenomenon? A Multinational Investigation”, Land Economics, Vol. 84, pp. 489-500.

Fehr, E. and S. Gächter (2000), “Fairness and retaliation: The economics of reciprocity”, Journal of Economic Perspectives, Vol. 14(3), pp. 159-181.

Flores, N. and R. Carson (1997), “The relationship between income elasticities of demand and willingness to pay”, Journal of Environmental Economics and Management, Vol. 33, pp. 287-295.

Foster, V., I. Bateman and D. Harley (1997), “A non-experimental comparison of real and hypothetical willingness to pay”, Journal of Agricultural Economics, Vol. 48(2), pp. 123-138.

Freeman III, A.M. (1994), The Measurement of Environmental and Resource Values: Theory and Methods, Resources for the Future, Washington, DC.

Georgiou, S. et al. (1998), “Determinants of individuals’ willingness to pay for perceived reductions in environmental health risks: A case study of bathing water quality”, Environment and Planning A, Vol. 30, pp. 577-594.

Gregory, R., S. Lichtenstein and P. Slovic (1993), “Valuing environmental resources: A constructive approach”, Journal of Risk and Uncertainty, Vol. 7(2), pp. 177-197.

Groothuis, P.A. and J.C. Whitehead (2009), “The Provision Point Mechanism and Scenario Rejection in Contingent Valuation”, Agricultural and Resource Economics Review, Vol. 38(2), pp. 271-280.

Gsottbauer, E. and J.C. van den Bergh (2011), “Environmental policy theory given bounded rationality and other-regarding preferences”, Environmental and Resource Economics, Vol. 49, pp. 263-304.

Haab, T.C. et al. (2013), “From Hopeless to Curious? Thoughts on Hausman’s ‘Dubious to Hopeless’ Critique of Contingent Valuation”, Applied Economic Perspectives and Policy, Vol. 35(4), pp. 593-612.

Hammitt, J. and J. Graham (1999), “Willingness to pay for health protection: Inadequate sensitivity to probability?”, Journal of Risk and Uncertainty, Vol. 18, pp. 33-62.

Hanemann, M. (1991), “Willingness to pay and willingness to accept: How much can they differ?”, American Economic Review, Vol. 81, pp. 635-647.

Hanemann, M. (1999), “The economic theory of WTP and WTA”, in Bateman, I. and K. Willis (eds.), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU and Developing Countries, Oxford University Press, Oxford.

Hanemann, W.M. and B. Kanninen (1999), “The Statistical Analysis of Discrete-Response CV Data”, in Bateman, I.J. and K.G. Willis (eds.), Valuing Environmental Preferences: Theory and Practice of the Contingent Valuation Method in the US, EU, and Developing Countries, Oxford University Press, Oxford.

Hanemann, M., J. Loomis and B. Kanninen (1991), “Statistical efficiency of double-bounded dichotomous choice contingent valuation”, American Journal of Agricultural Economics, Vol. 73(4), pp. 1255-1263.

Hanley, N., C. Boyce, M. Czajkowski, S. Tucker, C. Noussair and M. Townsend (2016), “Sad or happy? The effects of emotions on stated preferences for environmental goods”, Environmental and Resource Economics.

Hanley, N. and B. Kriström (2003), What’s It Worth? Exploring Value Uncertainty Using Interval Questions in Contingent Valuation, Department of Economics, University of Glasgow, mimeo.

Hausman, J. (2012), “Contingent Valuation: From Dubious to Hopeless”, Journal of Economic Perspectives, Vol. 26(4), pp. 43-56.

Hess, S., D.A. Hensher and A. Daly (2012), “Not bored yet – Revisiting respondent fatigue in stated choice experiments”, Transportation Research Part A, Vol. 46, pp. 626-644.

Hicks, J.R. (1943), “The four consumer’s surpluses”, Review of Economic Studies, Vol. 11, pp. 31-41.

Hoehn, J.P. and A. Randall (2002), “The effect of resource quality information on resource injury perceptions and contingent values”, Resource and Energy Economics, Vol. 24(1-2), pp. 13-31.

Horowitz, J. and K. McConnell (2002), “A review of WTA/WTP studies”, Journal of Environmental Economics and Management, Vol. 44, pp. 426-447.

Horowitz, J.K., K.E. McConnell and J.J. Murphy (2008), “Behavioral foundations of environmental economics and valuation”, in List, J. and M. Price (eds.), Handbook on Experimental Economics and the Environment, Edward Elgar, Northampton, MA.

Hsee, C.K. and J. Zhang (2004), “Distinction Bias: Misprediction and Mischoice Due to Joint Evaluation”, Journal of Personality and Social Psychology, Vol. 86, pp. 680-695.

Irwin, J.R. et al. (1993), “Preference reversals and the measurement of environmental values”, Journal of Risk and Uncertainty, Vol. 6(1), pp. 5-18.

Jacquemet, N. et al. (2013), “Preference elicitation under oath”, Journal of Environmental Economics and Management, Vol. 65, pp. 110-132.

Johnston, R.J. et al. (2017), “Contemporary guidance for stated preference studies”, Journal of the Association of Environmental and Resource Economists, Vol. 4, pp. 319-405.

Jones-Lee, M.W., M. Hammerton and P.R. Phillips (1985), “The value of safety: Results from a national sample survey”, Economic Journal, Vol. 95, pp. 49-72.

Kahneman, D. (1986), “Comments”, in Cummings, R., D. Brookshire and W. Schulze (eds.), Valuing Environmental Goods: An Assessment of the Contingent Valuation Method, Rowman and Allenheld, Totowa, NJ.

Kahneman, D. and J.L. Knetsch (1992), “Valuing public goods: The purchase of moral satisfaction”, Journal of Environmental Economics and Management, Vol. 22(1), pp. 57-70.

Kahneman, D. et al. (2006), “Would you be happier if you were richer? A focusing illusion”, Science, Vol. 312(5782), pp. 1908-1910.

Kahneman, D. and R.H. Thaler (2006), “Anomalies: Utility maximization and experienced utility”, Journal of Economic Perspectives, Vol. 20(1), pp. 221-234.

Kahneman, D. and A. Tversky (1979), “Prospect Theory: An Analysis of Decision under Risk”, Econometrica, Vol. 47, pp. 263-291.

Kahneman, D. and A. Tversky (eds.) (2000), Choices, Values, and Frames, Cambridge University Press, Cambridge.

Kling, C.L., D.J. Phaneuf and J. Zhao (2012), “From Exxon to BP: Has Some Number Become Better Than No Number?”, Journal of Economic Perspectives, Vol. 26(4), pp. 3-26.

Knetsch, J. (1989), “The endowment effect and evidence of non-reversible indifference curves”, American Economic Review, Vol. 79, pp. 1277-1284.

Knetsch, J. (2010), “Values of gains and losses: Reference states and choice of measure”, Environmental and Resource Economics, Vol. 46(2), pp. 179-188.

Knetsch, J. and J. Sinden (1984), “Willingness to pay and compensation demanded: Experimental evidence of an unexpected disparity in measures of value”, Quarterly Journal of Economics, Vol. 99, pp. 507-521.

Lindhjem, H. and S. Navrud (2010), “Can cheap panel-based internet surveys substitute costly in-person interviews in CV surveys?”, Department of Economics and Resource Management, Norwegian University of Life Sciences.

List, J.A. (2003), “Does market experience eliminate market anomalies?”, Quarterly Journal of Economics, Vol. 118, pp. 41-72.

List, J.A. and D. Lucking-Reiley (2000), “Demand reduction in multiunit auctions: Evidence from a sportscard field experiment”, American Economic Review, Vol. 90(4), pp. 961-972.

Loomis, J. (2014), “Strategies for overcoming hypothetical bias in stated preference surveys”, Journal of Agricultural and Resource Economics, Vol. 39/1, pp. 34-46.

Loomis, J.B., T. Lucero and G. Peterson (1996), “Improving validity experiments of contingent valuation methods: Results of efforts to reduce the disparity of hypothetical and actual willingness to pay”, Land Economics, Vol. 72(4), pp. 450-61.

Loomis, J., A. Gonzalez-Caban and R. Gregory (1994), “Do reminders of substitutes and budget constraints influence contingent valuation estimates?”, Land Economics, Vol. 70, pp. 499-506.

Lusk, J.L. (2003), “Effects of cheap talk on consumer willingness-to-pay for golden rice”, American Journal of Agricultural Economics, Vol. 85(4), pp. 840-856.

MacKerron, G. et al. (2009), “Willingness to Pay for Carbon Offset Certification and Co-Benefits Among (High-)Flying Young Adults in the UK”, Energy Policy, Vol. 37, pp. 1372-1381.

Maddison, D. and S. Mourato (2002), “Valuing different road options for Stonehenge”, in S. Navrud and R. Ready (eds.) Valuing Cultural Heritage, Edward Elgar, Cheltenham.

Marta-Pedroso, C., H. Freitas and T. Domingos (2007), “Testing for the survey mode effect on contingent valuation data quality: A case study of web based versus in-person interviews”, Ecological Economics, Vol. 62, pp. 388-398.

McConnell, K., I.E. Strand and S. Valdes (1997), “Testing temporal reliability and carry-over effect: The role of correlated responses in test-retest reliability studies”, Environmental and Resource Economics, Vol. 12, pp. 357-374.

Mitchell, R.C. and R.T. Carson (1989), Using Surveys to Value Public Goods: The Contingent Valuation Method, Resources for the Future, Washington, DC.

Morrison, G. (1996), “Willingness to pay and willingness to accept: Some evidence of an endowment effect”, Discussion Paper 9646, Department of Economics, Southampton University.

Morrison, G. (1997), “Willingness to pay and willingness to accept: Have the differences been resolved?”, American Economic Review, Vol. 87/1, pp. 236-240.

Morrison, M. and T.C. Brown (2009), “Testing the effectiveness of certainty scales, cheap talk, and dissonance-minimization in reducing hypothetical bias in contingent valuation studies”, Environmental and Resource Economics, Vol. 44/3, pp. 307-326.

Mourato, S. and J. Smith (2002), “Can carbon trading reduce deforestation by slash-and-burn farmers? Evidence from the Peruvian Amazon”, in Pearce, D., C. Pearce and C. Palmer (eds.) (2002), Valuing the Environment in Developing Countries: Case Studies, Edward Elgar, Cheltenham, UK.

Murphy, J.J., T. Stevens and D. Weatherhead (2003), “An empirical study of hypothetical bias in voluntary contribution contingent valuation: Does cheap talk matter?”, Working Paper, University of Massachusetts, Amherst.

Nelson, S.H. (2017), “Containing Environmentalism: Risk, Rationality, and Value in the Wake of the Exxon Valdez”, Capitalism Nature Socialism, Vol. 28(1), pp. 118-136.

OECD (2012), Mortality Risk Valuation in Environment, Health and Transport Policies, OECD Publishing, Paris.

Peters, E. (2006), “The functions of affect in the construction of preferences”, in Lichtenstein, S. and P. Slovic (eds.) (2006) The Construction of Preferences, Cambridge University Press, New York.

Peters, E., P. Slovic and R. Gregory (2003), “The role of affect in the WTA/WTP disparity”, Journal of Behavioral Decision Making, Vol. 16(4), pp. 309-330.

Plott, C.R. (1996), “Rational individual behavior in markets and social choice processes: The discovered preference hypothesis”, in K. Arrow et al. (eds.) Rational Foundations of Economic Behavior, Macmillan, London and St. Martin’s, New York.

Plott, C.R. and K. Zeiler (2005), “The willingness to pay-willingness to accept gap, the ‘endowment effect’, subject misconceptions, and experimental procedures for eliciting valuations”, American Economic Review, Vol. 95(3), pp. 530-545.

Poe, G.L. et al. (2002), “Provision Point Mechanisms and Field Validity Tests of Contingent Valuation”, Environmental and Resource Economics, Vol. 23, pp. 105-131.

Poe, G., M. Welsh and P. Champ (1997), “Measuring the difference in mean willingness to pay when dichotomous choice valuation responses are not independent”, Land Economics, Vol. 73(2), pp. 255-267.

Rabin, M. (2010), “Behavioral Economics”, Lecture to the American Economic Association (AEA) Continuing Education Program in Behavioral Economics, Atlanta, January 5-7.

Rolfe, J., J. Bennett and J. Louviere (2002), “Stated values and reminders of substitute goods: Testing for framing effects with choice modelling”, Australian Journal of Agricultural and Resource Economics, Vol. 46(1), pp. 1-20.

Samples, K.C., J.A. Dixon and M.M. Gowen (1986), “Information disclosure and endangered species valuation”, Land Economics, Vol. 62, pp. 306-312.

Shogren, J. and L. Taylor (2008), “On behavioral-environmental economics”, Review of Environmental Economics and Policy, Vol. 2(1), pp. 26-44.

Smith, V.K. (1992), “Arbitrary values, good causes, and premature verdicts”, Journal of Environmental Economics and Management, Vol. 22, pp. 71-89.

Smith, V.K. (2006), “Fifty years of contingent valuation”, in Alberini, A. and J.R. Kahn (eds.), Handbook on Contingent Valuation, Edward Elgar, Cheltenham.

Smith, V.K. and L. Osborne (1996), “Do Contingent Valuation Estimates Pass a Scope Test? A Meta-Analysis”, Journal of Environmental Economics and Management, Vol. 31, pp. 287-301.

Stevens, T.H., M. Tabatabaei and D. Lass (2013), “Oaths and hypothetical bias”, Journal of Environmental Management, Vol. 127, pp. 135-141.

Sugden, R. (2005), “Anomalies and stated preference techniques: A framework for a discussion of coping strategies”, Environmental and Resource Economics, Vol. 32, pp. 1-12.

Swait, J. and W. Adamowicz (2001), “Choice environment, market complexity, and consumer behavior: A theoretical and empirical approach for incorporating decision complexity into models of consumer choice”, Organizational Behavior and Human Decision Processes, Vol. 86(2), pp. 141-167.

Thaler, R. (1984), “Towards a Positive Theory of Consumer Choice”, Journal of Economic Behaviour and Organisation, Vol. 1, pp. 29-60.

Tversky, A. and D. Kahneman (1991), “Loss aversion in riskless choice: A reference-dependent model”, Quarterly Journal of Economics, Vol. 106(4), pp. 1039-1061.

von Ciriacy-Wantrup, S. (1947), “Capital returns from soil-conservation practices”, Journal of Farm Economics, Vol. 29, pp. 1181-1196.

Whittington, D. (2010), “What have we learned from 20 years of stated preference research in less-developed countries?”, Annual Review of Resource Economics, Vol. 2, pp. 209-236.

Whittington, D. et al. (1992), “Giving respondents time to think in contingent valuation studies: A developing country application”, Journal of Environmental Economics and Management, Vol. 22, pp. 205-225.

Willig, R. (1976), “Consumers’ surplus without apology”, American Economic Review, Vol. 66(4), pp. 589-597.

Annex 4.A1. Hicks’s measures of consumer’s surplus for a price change

Compensating variation (CV)

Consider a price decrease. The individual is better off with the price decrease than without it. CV is then the maximum sum that could be taken away from the individual such that he is indifferent between the post-change (new) situation and the pre-change (original) situation. The reference point is the original level of welfare.

Consider a price increase. The individual is worse off with the price increase than without it. CV is then the compensation required by the individual to make him indifferent between the new and old situations. The reference point is again the original level of welfare.

The CV measures relate to a context in which the change in question takes place, that is, the price actually falls or rises. CV in the context of a price fall thus measures the individual’s maximum willingness to pay rather than relinquish the price reduction. In the context of a price rise, CV is the minimum amount the individual is willing to accept by way of compensation to tolerate the higher price. Note that the implicit assumption about property rights with CV is that the individual is entitled to the pre-change situation.

Equivalent variation (EV)

Consider a price decrease. The individual is better off with the price decrease than without it. EV measures the sum of money that would have to be given to the individual in the original situation to make him as well off as he would be in the new situation. The reference point is the level of welfare in the new situation.

Consider a price increase. EV is now the individual’s willingness to pay to avoid the price increase, i.e. to avoid the decrease in welfare that would arise in the post-change situation. The reference point is the level of welfare in the new situation.

The EV measures relate to a context in which the price change does not take place. EV for a price fall is the minimum willingness to accept to forego the price fall. EV for a price rise is the maximum willingness to pay to avoid the price rise. Note that the implicit assumption about property rights with EV is that the individual is entitled to the post-change situation.
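The verbal definitions of CV and EV can also be written compactly. The following is one common textbook formalisation, offered as a sketch rather than as the notation of this annex: the symbols are assumptions introduced here, with p^0 and p^1 the original and new prices of X, m money income, v(p, m) the indirect utility function and e(p, u) the expenditure function. Under this sign convention both measures are positive for a price fall and negative for a price rise.

```latex
\begin{align*}
u^0 &= v(p^0, m), \qquad u^1 = v(p^1, m) \\
\mathrm{CV} &= m - e(p^1, u^0)
  \quad\Longleftrightarrow\quad v(p^1,\, m - \mathrm{CV}) = u^0 \\
\mathrm{EV} &= e(p^0, u^1) - m
  \quad\Longleftrightarrow\quad v(p^0,\, m + \mathrm{EV}) = u^1
\end{align*}
```

Read this way, CV holds welfare at its original level u^0 and is evaluated at the new price, while EV holds welfare at its new level u^1 and is evaluated at the original price, which matches the reference points stated above.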

Compensating surplus (CS)

The compensating surplus (CS) and equivalent surplus (ES) measures relate to contexts in which the individual is constrained to consume either the new quantity of X (CS) or the old quantity of X (ES) arising from the price change. CS is then defined as the sum that would make the individual indifferent between the original situation and a situation in which he is constrained to buy the quantity of X that results from the price change. If the context is a price decrease, then CS is a measure of the willingness to pay to secure that decrease. If the context is one of a price increase, then CS is a measure of the willingness to accept compensation for the price increase.

Equivalent surplus (ES)

ES is similarly quantity-constrained and is defined as the sum that would make the individual indifferent between the new situation (with the price change) and the old situation if the individual is constrained to buy the quantity of X in the original situation. If the context is a price decrease, then ES is a measure of the willingness to accept compensation to forego the benefit of the price decrease. If the context is one of a price increase, then ES is a measure of the willingness to pay to avoid the increase.

The concepts can be shown diagrammatically, as in Figure A4.1, which shows the situation for a price fall. The following relationships hold for equivalent price changes:

  • CV price fall = –EV price rise.

  • EV price fall = –CV price rise.

  • EV = CV if the income elasticity of demand for X is zero.

  • EV > CV for a price decrease if the income elasticity of demand is positive.

  • EV < CV for a price increase if the income elasticity of demand is positive.

  • The higher the income elasticity of demand for X, the greater the disparity between CV and EV.
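The relationships listed above can be checked numerically. The sketch below is illustrative only and rests on assumptions not found in the annex: a Cobb-Douglas utility function (income elasticity of demand for X equal to one), a quasi-linear utility function (income elasticity of zero), hypothetical prices and income, and a sign convention in which CV and EV are positive for a welfare gain. The function and variable names are invented for this example.

```python
import math


def hicks_measures(v, e, p_old, p_new, m):
    """Return (CV, EV) for a price change p_old -> p_new at income m.

    v(p, m) is an indirect utility function and e(p, u) the matching
    expenditure function. Assumed sign convention: positive = welfare gain.
    """
    u_old = v(p_old, m)
    u_new = v(p_new, m)
    cv = m - e(p_new, u_old)  # sum removable at the new price, keeping old welfare
    ev = e(p_old, u_new) - m  # sum needed at the old price to reach new welfare
    return cv, ev


def make_cobb_douglas(a):
    """u = x^a * y^(1-a), price of y fixed at 1 (assumed functional form)."""
    k = (a ** a) * ((1 - a) ** (1 - a))
    return (lambda p, m: m * k / p ** a), (lambda p, u: u * p ** a / k)


def make_quasi_linear(theta):
    """u = theta*ln(x) + y, valid for m > theta; demand for x ignores income."""
    v = lambda p, m: theta * math.log(theta / p) + m - theta
    e = lambda p, u: u - theta * math.log(theta / p) + theta
    return v, e


# Hypothetical numbers: income 100, price of X falls from 4 to 1 (or rises back).
cd_v, cd_e = make_cobb_douglas(0.5)
cv_fall, ev_fall = hicks_measures(cd_v, cd_e, 4.0, 1.0, 100.0)
cv_rise, ev_rise = hicks_measures(cd_v, cd_e, 1.0, 4.0, 100.0)

ql_v, ql_e = make_quasi_linear(10.0)
ql_cv, ql_ev = hicks_measures(ql_v, ql_e, 4.0, 1.0, 100.0)

print(f"Cobb-Douglas, price fall: CV = {cv_fall:.2f}, EV = {ev_fall:.2f}")
print(f"Cobb-Douglas, price rise: CV = {cv_rise:.2f}, EV = {ev_rise:.2f}")
print(f"Quasi-linear, price fall: CV = {ql_cv:.2f}, EV = {ql_ev:.2f}")
```

With these assumed numbers, the Cobb-Douglas case gives EV greater than CV for the price fall, and the CV of the price fall equals minus the EV of the equivalent price rise. The quasi-linear case gives identical CV and EV, as expected when income effects are absent.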

Note that Figure A4.1 shows the four measures of surplus for a price fall. The same notions will apply to a price rise, giving eight measures in all.

Figure A4.1. Hicks’s four consumer’s surpluses for a price fall


1. There are general principles for writing valid questions and for questionnaire form and layout, as well as guidelines specific to stated preference research. Guidance on these general issues can be found in a number of sources (see, for example, Tourangeau et al., 2000).

2. Describing the good and the policy change of interest may require a combination of textual information, photographs, drawings, maps, charts and graphs. For example, OECD (2012) presents a meta-analysis of studies of WTP for changes in mortality risks using stated preference methods and concludes that: “There is strong indication that if a visual tool or a specific oral or written explanation was used to explain the risk changes to the respondents in the survey, the estimated VSL [value of statistical life] tends to be lower” (p. 70).

3. Protest answers occur when respondents who are positively affected by a policy nevertheless reveal only a zero value for it, in payment card or open-ended elicitation, or reject any bid in a dichotomous choice setting. Outlying answers refer to unrealistically high values expressed typically in open-ended WTP or WTA questions.

4. It is worth mentioning some adjustments that have to be made in the arguments presented above when WTA is used rather than WTP. First, contrary to what happens when WTP is used, under a WTA format, open-ended elicitation procedures will likely produce higher average values than dichotomous choice procedures. Open-ended elicitation may also yield very large outliers. In this case, dichotomous choice is the conservative approach. Given that WTA measures are not constrained by income, respondents may have a tendency to overbid. Attention may have to be given to mechanisms to counteract this tendency.

5. It should be emphasised that the fact that income or ability to pay influences WTP is not a bias of stated preference methods. On the contrary, it shows that WTP accords with theoretical expectations. Such methods attempt to mimic what would happen in a market if a real market existed for the good or service in question. In a real market, ability to pay influences purchases; hence, one would expect the same to happen in hypothetical markets.

6. Entreaties have many other potential uses. Atkinson et al. (2012), for example, use a cheap talk entreaty to reduce protest answers in a CV study eliciting the value of protecting tropical biodiversity amongst distant beneficiaries.

7. Insensitivity to scope is often called the “embedding effect”.

8. In an influential article, Willig (1976) argued that the disparity between WTP and WTA must be small as the income effects are small.
