# 2. A method to estimate school costs and access

While ageing and depopulation will have a considerable impact on the demand for education and will possibly jeopardise accessibility to schools in remote and rural areas, the impact of demographic changes on the geography of school provision and its cost remains unknown.

This chapter fills an important data gap by proposing a method to estimate primary and secondary education costs at the school level. The information is then aggregated spatially to understand differences across types of human settlements, classified according to their degree of urbanisation. The method has two main steps:

• The first step involves simulating school locations using a thresholds-based, bottom-up algorithm that relies on road networks and fine spatial resolution population grids, and assigning students to each school based on a spatial interaction model.

• The second step estimates school costs based on the estimated number of students per school, broken down into costs of teaching staff, non-teaching staff, and other costs.

The method is fine-tuned using publicly available school-level data for England and then applied to the case of France, where there is no publicly available school-level data on costs. The benchmarking exercise focuses on the case of England because it has exceptionally detailed public data at the establishment level from various sources. The method is flexible enough to apply to any country and/or population projection grids to obtain future cost estimates.

The next section of this chapter outlines the simulation allocation method, shows statistics on the composition of schools’ costs and the geographical variation of costs for primary and secondary schools in England, and describes the method for estimating costs at the school level. The third section compares modelled and actual results for England. The fourth section shows an application to the case of France. The last section concludes.

This chapter describes the method and results from a tool to estimate school costs, inspired by the Swedish system for municipal finance equalisation (Tillväxtanalys, 2011[1]). The method is based on three principles:

1. costs arise in facilities (e.g. schools, hospitals), not in areas (e.g. school districts or municipalities)

2. public services are consumed locally, and are provided close to places of residence

3. additional costs arise as a result of transport costs, lack of economies of scale, and/or scope of small facilities.

The method proceeds in three steps. First, it simulates likely school locations based on the distribution of the student-age population and general thresholds. Second, it estimates how many separate schools are likely present in a location and how many students frequent those simulated schools by using a spatial interaction model. And third, it estimates school costs based on school sizes.1 The method has been calibrated using exceptionally detailed public data for schools in England, and uses data for France to validate the results.

This section first describes the method to simulate school locations and continues by defining school costs for this and the next chapter. It then describes the evidence on geographical differences in school costs using available data for England and focuses on primary and secondary schools. The analysis of cost is solely based on primary schools, as the geographical variation of costs per student for secondary schools in the actual data follows an unclear geographical pattern, possibly because the sample of schools is not representative of the universe. The evidence for England guides the modelling choices described in the next section, and Annex 2.C describes the data.

The estimation method discussed in the last part of this section derives school costs based solely on the estimated number of students in each school, without relying on country-specific information or other school information that is often not readily available. This includes costs that depend on school size (i.e. running costs), and does not include (capital) investments or other costs such as building costs. The next section compares estimated schools and school costs for the case of England and France.

The objective of the simulation of school locations is to obtain the number and size of schools in all EU countries with available population projections.2 The method described in this section adapts the facility allocation procedure of Kompil et al. (2019[2]) (see Annex 2.A for a comprehensive description of the adaptations). Figure 2.1 illustrates the placement procedure and Figure 2.2 exemplifies it for the case of Portugal.

To ensure the procedure will yield realistic values for school sizes and transport costs, the first step establishes bounding conditions by setting threshold values for (see Annex 2.B for more details):

1. the distance in kilometres that defines a potential school’s largest allowed catchment area

2. the minimum number of students that a potential school needs in its catchment area

3. the optimum number of students for a school.

The boundaries of independent placement zones are TL3 regions for primary schools, and country borders for secondary schools.3

The simulation models local communities as a network of nodes (distributed on a 1 km² lattice) that are all potential school locations. The approach calculates travel distances via road networks from every community to all other communities within the largest allowed catchment area (TomTom, 2018[3]). Every community can be flagged as having satisfied or unsatisfied demand, and all communities are initially assigned as having unsatisfied demand. After establishing all potential locations of schools, the method chooses the highest utility location in a region4 through an iterative procedure similar to a bidding game, in which local communities compete for the location of a school.5 The procedure stops once demand has been completely met, or when no more potential locations meet the bounding conditions.
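The iterative logic can be illustrated with a deliberately simplified sketch. It is not the procedure of Kompil et al.: it assumes a handful of nodes with precomputed pairwise distances, uses "unsatisfied students within the maximum catchment distance" as a stand-in for locational utility, and ignores the optimum-size threshold. The function name, its parameters and the toy data are all hypothetical.

```python
def place_schools(students, dist, max_dist_km, min_students):
    """Greedy placement: repeatedly open a school at the node that reaches
    the most unsatisfied students within the maximum catchment distance."""
    unsatisfied = dict(students)  # node -> students not yet served
    schools = []
    while any(v > 0 for v in unsatisfied.values()):
        best_node, best_demand = None, 0
        for j in students:
            if j in schools:
                continue
            demand = sum(n for i, n in unsatisfied.items()
                         if dist[i][j] <= max_dist_km)
            if demand > best_demand:
                best_node, best_demand = j, demand
        # Stop when no candidate meets the minimum-demand threshold.
        if best_node is None or best_demand < min_students:
            break
        schools.append(best_node)
        for i in unsatisfied:
            if dist[i][best_node] <= max_dist_km:
                unsatisfied[i] = 0  # demand inside the catchment is now met
    return schools, unsatisfied


# Toy data (hypothetical): four communities and road distances in km.
students = {"A": 120, "B": 40, "C": 15, "D": 5}
dist = {
    "A": {"A": 0, "B": 3, "C": 12, "D": 30},
    "B": {"A": 3, "B": 0, "C": 10, "D": 28},
    "C": {"A": 12, "B": 10, "C": 0, "D": 25},
    "D": {"A": 30, "B": 28, "C": 25, "D": 0},
}
schools, unsatisfied = place_schools(students, dist, max_dist_km=10, min_students=20)
```

In this toy case a single school is opened at B, which reaches A, B and C; the five students in D remain unserved because no remaining location meets the minimum-demand threshold, which mirrors the second stopping condition described above.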

Local school demand arises from children or youth population in every community in the catchment area. Children or youth population distributions are obtained from 1 km LUISA population age grids (Goujon et al., 2021[4]; JRC, 2021[5]; Jacobs-Crisioni et al., 2020[6]; Jacobs-Crisioni et al., n.d.[7]), which in turn are based on EUROSTAT census-based, 1 km population grid (GEOSTAT, 2011[8]), and regional population projections prepared for the 2015 ageing report (EC, 2015[9]).

The analysis considers primary and broad secondary schools separately, setting age ranges at:

• 6 to 11-year-olds for primary school students

• 12 to 17-year-olds for secondary school students.

While European ages of school attendance do not necessarily align with the selected age ranges, a universal classification ensures comparability. The chosen classification aligns with International Standard Classification of Education (ISCED) levels, and the age ranges are close to those in the English educational system.6 The approach also assumes that middle and high schools are always part of secondary schools, which does not hold true in all cases. This assumption is necessary to avoid modelling context-specific school integration choices, and based on the validation of results, does not affect the cost estimates significantly.

The student placements obtained during the procedure described previously are only a rough approximation of final school sizes, as the number of schools at a selected location may be larger than one, and free school choice is not taken into account. The method therefore uses a two-stage approach, described in Annex 2.A, to adjust the number of schools at a location and balance student populations over available schools.

Expenditure on education is composed of current and capital expenditure. In OECD countries, current expenditure represents the largest proportion of total expenditure on education. In 2017, it accounted for 92% of total expenditure, with the remainder devoted to capital expenditure. Current expenditure in education includes:

• Spending on teachers and other staff compensation. The compensation of teachers and non-teaching staff – other pedagogical, administrative, professional and support personnel – comprises gross salaries and contributions, expenditure on retirement, and expenditure on other non-salary compensation (healthcare or health insurance, disability insurance, unemployment compensation, maternity and childcare benefits and other forms of social insurance).

• Spending on the goods and services needed within the current year. Goods and services require recurrent production in order to sustain educational services, such as expenditure on support services, teaching materials and supplies, ordinary maintenance of school buildings, provision of meals and dormitories to students, rental of school buildings and other facilities, among others. These services are obtained from outside providers, unlike the services provided by education authorities or by educational institutions using their own personnel.

For the purposes of the analysis in this report, the term “costs” refers to current school expenditure. It does not include capital expenditure, which refers to spending on assets that last longer than one year, including construction, renovation or major repair of buildings, and new or replacement equipment. Unlike current expenditure, capital expenditure can fluctuate widely over time, with peaks in years when significant investments are undertaken, followed by years of lows. Differences in the allocation of current and capital expenditure indicate the degree of investment by a country in the construction of new buildings, for instance in response to rising enrolment rates, or in the restoration of existing school buildings, resulting from the obsolescence and ageing of existing structures. Nevertheless, capital expenditure accounted for only 9% and 7.7% of total expenditure of primary and secondary schools across OECD countries in 2011 (Santiago et al., 2016[10]), so current school expenditure represents the bulk of educational expenditure in schools.

Moreover, educational expenditure can be from both public and private sources. Final public spending includes direct public purchases of educational resources and payments to educational institutions, and it is this type of expenditure that the analysis in this report captures. The analysis thus does not include final private spending, which comprises all direct expenditure on educational institutions, including tuition fees and other private payments to educational institutions (whether partially covered by public subsidies or not), and expenditure by private companies on the work-based element of school and work-based training of apprentices and students (OECD, 2020[11]).

The data for England illustrate the differences in the average size of schools across degrees of urbanisation. The average number of students per school increases with population density, and the primary schools with the lowest average number of students are located in sparsely populated areas. This means there are more schools in lower-density areas for a comparable number of students. For instance, the number of schools in towns and suburbs is 1.6 times the number of schools in sparse rural areas, even though there are 3.9 times more students in towns and suburbs than in sparse rural areas.

The number of students per teacher is also smaller in rural areas (villages and sparse rural areas) compared to towns and suburbs, and cities. Among rural areas, schools in sparse rural areas have 3.3 fewer students per teacher compared to schools in towns and suburbs (Table 2.1). However, students per teaching staff (which includes both teachers and teaching assistants, all measured in full-time equivalent) differ less across settlement types. This is because the ratio of teaching assistants to teachers is higher in towns and suburbs, and cities compared to rural areas. The variation in the number of students per teacher among schools in rural areas is larger than in towns and suburbs or cities.

In the actual data for England, teaching staff costs make up 57.6% of total school costs. There are no noticeable differences in average cost shares by degree of urbanisation (Table 2.2). This suggests that the average school located in a rural area has a similar cost structure to one located in a town or suburb, or even a city. The implication for the cost estimation exercise is that it suffices to estimate school costs from first principles, i.e. the number of teaching staff required for the school size, instead of explicitly modelling cost differences arising from geographical factors.

The lack of difference in the cost structure of schools does not mean, however, that there are no differences in average cost per student across geographical areas. While the share of costs in teaching staff does not vary significantly across settlements (in line with nationally-set wages), teaching staff annual cost per student is higher in lower-density areas; for example, it is about GBP 700 higher per student in sparse rural areas compared to towns. These differences are reflected in differences in annual cost per student in rural areas compared to the national average, which are as high as GBP 921 per student in mostly uninhabited areas.

Both annual cost per student and annual cost of teaching staff per student are higher in rural versus urban areas, while towns and suburbs hold the lowest costs (Table 2.2). Differences in cost per student across degrees of urbanisation are to a large extent driven by differences in staff cost per student, as they comprise the bulk of school costs. This confirms that cost differences based on school locations are not primarily driven by geographical wage differences or different cost structures between rural and urban schools.

The method to obtain school costs for any school using only the number of students focuses first on deriving staff costs because they represent the bulk of costs in schools. Teaching staff annual cost alone constitutes more than half of total school costs in England (Table 2.2), and in fact compensation of all staff represented 74% and 80% of current expenditure in primary and secondary schools in OECD countries in 2011 (Santiago et al., 2016[10]).

Teaching staff annual cost is the product of the number of teaching staff and their corresponding salaries. To estimate the number of teaching staff in each school, values are drawn from an ordered normal probability distribution of student-to-teacher ratios, following the distribution displayed by the actual data for England (see Annex 2.C). Assigning teaching staff in this way ensures that schools have teaching staff counts proportional to their size, while allowing for some variation in the number of staff across schools in the same size range. Table 2.3 summarises the assumed parameters for the primary and secondary school cost estimation.

The mean student-to-teacher ratios for primary and secondary schools are based on average ratios across the EU in 2017.7 A lower student-to-teacher ratio in secondary schools is in line with OECD average values. Furthermore, the distribution of student-to-teacher ratios in the actual data follows a normal distribution with mean 11.9 and standard deviation 2.3. In mathematical terms, for schools indexed by $j$, the student-to-teacher ratio ${PT}_{j}$ is drawn randomly from a Gaussian distribution. Teaching staff in each school, ${T}_{j}$, is subsequently computed as ${T}_{j}={s}_{j}/{PT}_{j}$, where ${s}_{j}$ is the number of students, and schools in $j$ are ordered by size (measured by number of students).
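A minimal sketch of the ordered draw may help fix ideas. It rests on assumptions not stated in the text: that "ordered" means the smallest school receives the lowest drawn ratio (consistent with the smaller student-to-teacher ratios observed in rural areas), that draws are floored at 1 to avoid division by zero, and that staff counts are fractional full-time equivalents. The function name and the example school sizes are hypothetical; the mean of 13 and standard deviation of 1 are the primary-school values quoted later in the chapter.

```python
import random


def estimate_teaching_staff(students_per_school, mean_ratio, sd_ratio, seed=0):
    """Assign each school a student-to-teacher ratio from ordered normal
    draws and return teaching staff as students / ratio."""
    rng = random.Random(seed)
    # One draw per school, floored at 1.0 (assumption) and sorted ascending.
    ratios = sorted(max(rng.gauss(mean_ratio, sd_ratio), 1.0)
                    for _ in students_per_school)
    # Rank schools by size: the k-th smallest school gets the k-th smallest ratio.
    order = sorted(range(len(students_per_school)),
                   key=lambda j: students_per_school[j])
    staff = [0.0] * len(students_per_school)
    for rank, j in enumerate(order):
        staff[j] = students_per_school[j] / ratios[rank]
    return staff


sizes = [30, 210, 90, 420]  # hypothetical student counts
staff = estimate_teaching_staff(sizes, mean_ratio=13.0, sd_ratio=1.0)
```

Sorting both the draws and the schools keeps staff counts roughly proportional to size while still letting schools of similar size differ in their implied ratio.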

To obtain teaching annual cost, each school is assumed to have one base full-time staff paid at mean school salary levels. This sets a lower bound on the teaching staff in each school. On top of this fixed cost, there is a percentage of the teaching staff paid at half the mean school salaries and the remaining share paid at mean school salaries. This is equivalent to assuming that a share of the teaching workforce in each school has low qualifications and/or experience. In the primary school data for England, teachers make up 56% of the school teaching staff, and teaching assistants make up the remaining 44%, while secondary schools have a lower share of teaching assistants (21%). As a reference, the mean annual gross salary for teachers in England in primary schools is USD PPP (2019) 46 644 and in secondary schools it is USD PPP 63 307. Also, average teachers’ statutory annual salaries after 15 years of experience for primary and higher secondary school levels across OECD countries stand at around USD PPP 46 801 and 50 701 respectively. Furthermore, the distribution of annual cost per teacher also follows a normal distribution (with mean GBP 29 600). The primary school data shows that the mean annual gross salary for teachers is GBP 38 716.

In technical terms, this means that total teaching cost is estimated based on mean school teacher salaries $\overline{TS}_{j}$, which are also normally distributed, as $TE_{j}=\sigma \cdot Ft+0.5\left(\%TA\cdot T_{j}\cdot \overline{TS}_{j}\right)+\left(1-\%TA\right)\left(T_{j}\cdot \overline{TS}_{j}\right)$, where $Ft$ is the number of fixed full-time teaching staff, and $\%TA$ is the share of teaching assistants in the teaching staff. $Ft=1$ and $\%TA=0.6$ if the school is primary, and $Ft=2$ and $\%TA=0.2$ if the school is secondary.
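The formula can be checked with a short numerical sketch. One assumption is made loudly here: the symbol $\sigma$ is not defined in the text, and the sketch interprets it as the mean school salary, since the fixed staff are described as paid at mean school salary levels. The function name and the salary of 30 000 are illustrative only.

```python
def teaching_cost(n_teaching_staff, mean_salary, is_primary):
    """Teaching cost of one school following the formula in the text."""
    fixed_staff = 1 if is_primary else 2            # Ft
    share_assistants = 0.6 if is_primary else 0.2   # %TA
    # Assumption: sigma is read as the mean school salary.
    fixed_cost = mean_salary * fixed_staff
    # Assistants are paid at half the mean salary.
    assistants = 0.5 * (share_assistants * n_teaching_staff * mean_salary)
    # The remaining share is paid at the mean salary.
    teachers = (1 - share_assistants) * (n_teaching_staff * mean_salary)
    return fixed_cost + assistants + teachers


# Illustrative inputs: 10 teaching staff, mean salary 30 000.
primary_cost = teaching_cost(10, 30_000, is_primary=True)
secondary_cost = teaching_cost(10, 30_000, is_primary=False)
```

With these inputs the primary school costs 30 000 + 90 000 + 120 000 = 240 000 and the secondary school 60 000 + 30 000 + 240 000 = 330 000, so the fixed term weighs more heavily on small schools, which is where the scale economies in the method originate.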

To estimate non-teaching staff cost, the method assumes that every four teaching staff require one non-teaching staff member. These proportions follow those observed in the actual data for England. Median salaries per non-teaching staff are set at a lower value than those of teaching staff under the assumption that non-teaching staff require lower qualifications than teaching staff. Total cost of non-teaching staff in each school is equal to the cost of one fixed non-teaching staff member plus the count of non-teaching staff times the mean salary.
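As a sketch of this rule, the function below assumes that the fixed non-teaching staff member is paid the same mean salary as the others and that staff counts may be fractional; neither point is stated explicitly in the text. The function name and the salary figure are hypothetical.

```python
def non_teaching_cost(n_teaching_staff, mean_non_teaching_salary):
    """One fixed non-teaching staff member plus one per four teaching staff."""
    n_non_teaching = n_teaching_staff / 4  # one per four teaching staff
    # Assumption: the fixed staff member is paid the same mean salary.
    return (1 + n_non_teaching) * mean_non_teaching_salary


# Illustrative inputs: 12 teaching staff, mean non-teaching salary 20 000.
example_cost = non_teaching_cost(12, 20_000)
```

Here 12 teaching staff imply 3 non-teaching staff, so the school is charged for 4 salaries in total.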

Finally, the remaining cost, which includes premises, learning material, catering and other costs, is assigned by drawing ordered random values $RES_{j}$ from a normal distribution of remaining cost per student, and computing total remaining school cost as $RE_{j}=s_{j}\cdot RES_{j}$. Total cost is then equal to the sum of teaching and remaining cost, $E_{j}=TE_{j}+RE_{j}$.
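The last step can be sketched as follows. The direction of the ordering is an assumption (the smallest school receives the highest remaining cost per student, in line with the scale economies discussed in the chapter), as are the floor at zero, the function name, and every number in the example.

```python
import random


def remaining_and_total_cost(students, teaching_costs, mean_res, sd_res, seed=0):
    """Draw ordered remaining cost per student (RES_j), then compute
    RE_j = s_j * RES_j and E_j = TE_j + RE_j for each school."""
    rng = random.Random(seed)
    # Ordered draws, highest first, floored at zero (assumption).
    draws = sorted((max(rng.gauss(mean_res, sd_res), 0.0) for _ in students),
                   reverse=True)
    order = sorted(range(len(students)), key=lambda j: students[j])  # smallest first
    res = [0.0] * len(students)
    for rank, j in enumerate(order):
        res[j] = draws[rank]  # assumption: smallest school gets the highest value
    remaining = [s * r for s, r in zip(students, res)]
    total = [te + re for te, re in zip(teaching_costs, remaining)]
    return res, remaining, total


sizes = [30, 210, 90]                      # hypothetical student counts
te = [90_000.0, 480_000.0, 220_000.0]      # hypothetical teaching costs
res, remaining, total = remaining_and_total_cost(sizes, te, 1_200.0, 150.0)
```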

A comparison across degree of urbanisation levels helps to verify whether the adopted method reproduces observed spatial patterns and cost differences across degree of urbanisation types. This section discusses results for primary schools at length and presents a more limited set of results for secondary schools given the limitations of the actual data. While all results in this analysis are based on the degree of urbanisation of the grid cell in which a school is placed, Chapter 4 discusses results based on the place of residence of students.8

The approach mimics well the geographic distribution of the number of teachers and the average number of teachers per school by degree of urbanisation. Estimating the number of teaching staff based on actual student numbers leads to 202 077 estimated teaching staff, 28 404 fewer than the actual count but with a similar geographical distribution (compare columns 3 and 4 of Table 2.4). This lower count of estimated teaching staff results largely from assuming a student-to-teacher ratio (13) higher than the actual one (11.9).

While the simulated placement estimates more schools than the actual data, it correctly captures the increasing number of schools when moving from villages to urban degree of urbanisation types (compare columns 1 and 2 in Table 2.4). The differences in average teaching staff between the actual and simulated data (columns 6 and 7 in Table 2.4) are due to both different numbers of simulated versus actual schools and different teaching staff numbers arising from different student numbers per school. This is because the simulated placement generates more schools in cities and towns and suburbs than those observed in the actual data.

The percentage of small schools across degree of urbanisation types in the sample can be compared to the percentage in the simulated placement using the definition of small schools (an average year group size of less than 21.4 students for primary schools and 100 for secondary schools) used in the block national funding formula of schools eligible for sparsity funding.9 As Table 2.5 shows, although the allocation simulation produces a slightly smaller number of small schools, it also places most of the small schools in rural areas (with a positive bias towards sparse rural areas compared to the actual data). The lower counts of small schools in cities are present in both the actual and simulated data.
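The small-school rule can be expressed as a short check. The thresholds are the ones quoted above; the number of year groups per school is not given in the text and is passed in as a hypothetical input, and the function name is illustrative.

```python
# Thresholds on average year-group size, as quoted in the text.
SMALL_SCHOOL_THRESHOLD = {"primary": 21.4, "secondary": 100.0}


def is_small_school(n_students, n_year_groups, level):
    """True when the average year-group size is below the threshold."""
    average_year_group = n_students / n_year_groups
    return average_year_group < SMALL_SCHOOL_THRESHOLD[level]


# Hypothetical schools: (level, students, year groups).
examples = [("primary", 140, 7), ("primary", 150, 7), ("secondary", 450, 5)]
flags = [is_small_school(n, g, level) for level, n, g in examples]
```

Under the assumed seven year groups, a primary school with 140 students (20 per year group) counts as small, while one with 150 students (about 21.43 per year group) just misses the threshold.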

Table 2.6 compares actual and estimated costs based on simulated school placements by cost type. Despite differences in average teachers per school between the actual and simulated placement, applying the cost estimation approach to the simulated placement still reproduces the size and geographical variation of actual teaching staff costs (see columns 2 and 3 of Table 2.6). Both teaching and non-teaching annual cost, which represent the bulk of school costs, decrease more rapidly with distance in the simulated data compared to the actual data. This largely explains why the differences in estimated cost per student between cities and other areas are larger in the simulated data than in the actual data.

After estimating each of the three cost types, the comparison by degree of urbanisation using actual and simulated placements to actual total costs shows that the proposed approach captures well the levels and geographical variation of cost per student for primary schools, including the variation within categories, with more dispersion and relatively large values in sparse rural areas (Figure 2.3).

The relationship between cost per student and school size (measured by total number of students) captures the extent of scale economies present in primary schools. The plot of this relationship based on actual cost data shows that cost per student decreases quickly from high levels as school size increases. Both the estimated cost based on actual placement and the one based on simulated placement capture this behaviour (Figure 2.4).

As Figure 2.5 shows, the key to getting the geographical differences in school costs lies in successfully reproducing the share of small schools in every degree of urbanisation level. This can be traced back to the introduction of a balancing mechanism in the simulation approach to lower the concentration effects of competition and scale.

Given that the sample of secondary schools with financial information does not have a similar size distribution compared to the universe of schools, this chapter does not present as much detail for secondary schools as for primary schools. This section discusses a limited set of results for secondary schools, summarised in Table 2.7.

The simulated placement allocates a larger share of small secondary schools in rural areas while still preserving some small schools in cities and towns and suburbs even when distance ranges are larger for secondary students (see second column of Table 2.7). This is achieved with the help of the balancing procedure in the allocation of schools (see Annex 2.A) that enables locating small schools in dense areas.

The average number of teachers per school increases with distance and, unlike the case of the actual placement of primary schools, it peaks in cities instead of towns and suburbs. The estimated per-head differences in costs for secondary schools between the most costly (sparse rural areas) and the least costly (cities) are higher, at EUR 1 047 per head (compared to EUR 662 for primary schools). Finally, the relationship between cost per student and school size also shows evidence for scale economies related to the method’s assumption on fixed staff (Figure 2.6).

To verify the validity of the school placement method outside England, the approach is applied to available school data for France. Geolocalised data for each school including number of students per school in France is available for the year 2017 for primary (école élémentaire, ages 6-11) and secondary schools (ages 12-18) (French Ministry of National Education and the Youth, 2021[14]).

The procedure to derive teaching staff counts per school from the number of students is applied to the actual and simulated schools. In line with the exercise for England, a mean of 13 and a standard deviation of 1 are assumed for primary schools, and a mean of 12 and a standard deviation of 1 for secondary schools. The resulting total number of teaching staff in primary schools using actual schools is 369 508 and in secondary schools it is 318 993. As a benchmark, the number of teaching staff in public schools in France in 2015 was 340 500 in pre-school and primary schools (premier degré), and 304 500 in secondary schools (second degré).10 The approach reproduces the size variation of the actual data for primary and secondary schools (Figure 2.7).11 As in England, average school sizes increase with density.

Table 2.8 shows the comparison of the simulated placement results with the actual school data for primary and secondary schools. While the aim of the simulated placement is not to reproduce actual numbers of schools and students, the information in the table is useful to evaluate whether there are salient geographical differences between the simulated placement (which is benchmarked using data for Portugal) and the actual distribution of schools in France.

The simulation places fewer primary schools in every degree of urbanisation except for mostly uninhabited areas, where it places more. Still, the simulated approach also places the majority of small schools in rural areas (Table 2.8). In contrast, the simulated approach generally places more secondary schools than observed, and proportionally more in towns and suburbs. The simulations place a larger share of small secondary schools in rural areas compared to the actual data. As the data shows, primary education in France is geographically more dispersed than the simulation approach captures, while secondary education in France is more centralised than simulated. There are many potential reasons for the differences between actual and simulated placements, for instance policies that prefer to reduce travel distances for primary school students even at the possible penalty of reduced cost efficiency, or simply a preference for relatively small primary and secondary schools, as can be seen by the fairly equal distribution of small secondary schools across France's degrees of urbanisation.

This chapter described a method to estimate primary and secondary education cost differences across human settlements. The method involves two steps. The first step simulates school placements using a spatial access optimisation algorithm that relies on road networks and population grids. The second step estimates costs based solely on student counts by using the distributional properties of actual school costs. The method was tested using data for France where there is no school-level data on cost.

The analysis of data for primary schools in England showed that teaching staff represents the bulk of school cost, and that the average school located in a rural area has a similar expenditure structure to one located in a city, town or suburb. The method proposed in this chapter starts by estimating teaching costs based on the number of teachers required for the number of students in each school (as per an assumed student-to-teacher ratio), and subsequently adds other types of costs, including non-teaching staff cost (which depends on the number of teachers) and remaining cost including premises, learning material, catering and other costs (which depends on the number of students).

The comparison by degree of urbanisation using actual and simulated placements to actual total costs shows that the proposed approach captures well the levels and geographical variation of cost per student for primary schools. Although it is based on data for England, the method outlined in this chapter does not rely on England-specific parameters but rather on EU averages. In this sense, the application of the model for all EU countries undertaken in the next chapter is not expected to be biased by the use of English data to guide the methodological design. It is important to stress here, however, that the exercise tries to capture differences in school costs solely driven by geographical differences and not by national factors such as the efficiency in the use of education resources, payment levels, etc.

This annex describes several extensions to the simulated placement model described in Kompil et al. (2019[2]).

Kompil et al. (2019[2]) allocate service locations based on Euclidean distances to potential service users. This approach has been refined by deriving distances as shortest-path distances from a proprietary, finely grained road network obtained from TomTom (2018[3]), predominantly known as a provider of in-car navigation equipment. Those shortest-path distances have been loaded into sizeable matrices indicating the distances between all grid cells in a country that meet the threshold for maximum catchment area, plus one. These matrices are stored in memory and used throughout the school placement simulation procedure.
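How shortest-path distances limited to a maximum catchment can be derived is sketched below with Dijkstra's algorithm and a distance cutoff. This is a generic illustration, not the routing implementation used for the report: the graph, its edge lengths and the function name are hypothetical, and the real road network is proprietary.

```python
import heapq


def distances_within(graph, source, max_km):
    """Shortest-path distances from source to every node within max_km.
    graph maps a node to a list of (neighbour, edge_length_km) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            # Keep only paths inside the catchment that improve on the best so far.
            if nd <= max_km and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical road graph with edge lengths in km.
roads = {
    "a": [("b", 4.0), ("c", 9.0)],
    "b": [("a", 4.0), ("c", 3.0), ("d", 8.0)],
    "c": [("a", 9.0), ("b", 3.0)],
    "d": [("b", 8.0)],
}
reachable = distances_within(roads, "a", max_km=10.0)
```

In the example, node c is reached through b at 7 km rather than directly at 9 km, and node d (12 km away) falls outside the 10 km cutoff, so it never enters the distance matrix.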

The locational utility of each node is measured as potential accessibility to unsatisfied demand. However, the developed mechanism includes functionality to weigh people with relatively poor access to service disproportionally. This is included to mimic top-down equity considerations in facility location. Thus, in any allocation iteration in iter, we first define access to facilities as (A.1):

(A.1)

In which ${d}_{ij}$ indicates travel distance between origin node i and destination node j; $\gamma$ indicates the threshold maximum catchment size (1); and 100 metres is kept as the minimum distance between population and facilities relevant for the special case that j = i. ${D}_{j}^{iter}$ is a vector of dichotomous values that indicate whether facilities have been allocated in prior iterations in the destination nodes in j. Subsequently, through iteration-specific weighting values W, population is weighted by their access to services in A, relative to the average of the collection of nodes in a region in I, so that (A.2):

(A.2)

In which ${P}_{i}^{\left(iter-1\right)}$ contains all population, passed on from the previous iteration, that is not yet attributed to an already allocated facility. The function rescales the relative facility accessibility between the lowest value in 0.1 and the highest value in w. For schools, w is set to 2. Subsequently, locational utility of a node is computed as (A.3):

(A.3)

To compute the cost incurred by having a facility, users have to be attributed to facilities. The most straightforward approach is by attributing users to whichever facility is nearest. However, such an approach is unattractive because, on the one hand, it does not take into account the free choice users experience in contexts with many relevant options; and on the other hand, it does not take into account that facilities may have maximum capacities. To optimise user distribution, given inherent facility capacities, a user balancing mechanism has therefore been put in place. That mechanism is essentially based on an origin-constrained spatial interaction model, although with modifications that require a two-stage approach.

In stage 1 users are allocated to facilities based on distance decayed travel distance in C, so that (A.4):

$C_{ij}=\left[\acute{d}_{ij}\right]^{-\alpha},$

with

(A.4)

and distance decay parameter $\alpha =2$ and $\alpha =1.25$ for primary and secondary schools, respectively. Here ${d}_{ij}$ contains travel times from every origin grid to the five closest facilities. Thus the size of the matrix here is limited to 5 times the number of origin points.

Using $\acute{d}_{ij}$ rather than the actual travel distances in ${d}_{ij}$ ensures that the distance-decayed travel distances retain high sensitivity to farther destinations even if the closest facility is relatively far away. As the distance decay computation may be unstable at small changes in travel distances smaller than 1 minute, the system uses 1 km as the minimum travel distance.

Flows in F are computed through (A.5):

${F1}_{ij}={O}_{i}D{1}_{j}{{A1}_{i}}^{-1}{C}_{ij},$   (A.5)

in which ${O}_{i}$ contains the relevant population at origin i and $D{1}_{j}$ contains weights per facility. In the first step, $D{1}_{j}$ has the value 1 for all facilities, so that initially all facilities are equally attractive. Total flow production is limited to the relevant population O through accessibility measure A1, which is defined as (A.6):

$A{1}_{i}=\sum _{j=1}^{n=5}D{1}_{j}{C}_{ij}.$   (A.6)

The calculation of F1 yields a pattern of attendance of students to schools in ATT, so that (A.7):

$AT{T1}_{j}=\sum _{i}{F1}_{ij}$   (A.7)
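As an illustration of stage 1, the sketch below computes (A.4) to (A.6) with NumPy and sums the resulting flows per facility to obtain attendance. It is a minimal sketch, not the implementation used for this report: the function and variable names are hypothetical, the adjusted distances used in (A.4) are taken as given inputs because their definition is not reproduced here, and the toy example uses two candidate facilities per origin instead of five, with invented numbers.

```python
import numpy as np

def attribute_students(nearest, d_adj, origins, alpha, weights=None):
    """Origin-constrained attribution of students to candidate facilities.

    nearest : (n_origins, k) int array, ids of the k closest facilities per origin
    d_adj   : (n_origins, k) adjusted travel distances to those facilities
    origins : (n_origins,) students living at each origin (O_i)
    alpha   : distance decay exponent
    weights : (n_facilities,) attractiveness D_j; defaults to 1 for every facility
    """
    nearest = np.asarray(nearest)
    d_adj = np.asarray(d_adj, dtype=float)
    origins = np.asarray(origins, dtype=float)
    if weights is None:
        # Stage 1 assumption from the text: all facilities equally attractive.
        weights = np.ones(int(nearest.max()) + 1)
    weights = np.asarray(weights, dtype=float)

    C = d_adj ** (-alpha)                    # (A.4) distance-decayed impedance
    DC = weights[nearest] * C                # D_j * C_ij for each candidate
    A = DC.sum(axis=1)                       # (A.6) accessibility per origin
    F = origins[:, None] * DC / A[:, None]   # (A.5) flows, rows sum to O_i
    # Attendance per facility: sum of incoming flows over all origins.
    att = np.bincount(nearest.ravel(), weights=F.ravel(), minlength=len(weights))
    return F, att

# Toy data (invented): 3 origins, 2 facilities, alpha = 2 as for primary schools.
nearest = [[0, 1], [0, 1], [1, 0]]
d_adj = [[1.0, 2.0], [1.0, 1.0], [1.0, 4.0]]
origins = [30, 20, 50]
F1, att1 = attribute_students(nearest, d_adj, origins, alpha=2.0)
```

Because the model is origin-constrained, each row of `F1` sums to the number of students at that origin, so total attendance equals total population. Passing a different `weights` vector reuses the same function for the rebalanced second stage.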

From this, a crude estimate of the facilities FAC needed at location j can be obtained. To obtain realistic school size distributions, the likely number of schools in a location is estimated with a function that explains the number of schools in 1 km nodes by the number of students observed in those nodes. This procedure allows for larger-scale schools in contexts with many users, and relatively small schools in contexts with few students. For primary and secondary schools, this function has been estimated based on the aggregate number of students in a grid cell in S. It takes the form (A.8):

${FAC}_{j}={e}^{\left({\beta }_{0}+{\beta }_{1}\ln {S}_{j}\right)}$   (A.8)

The function is estimated separately for France, Portugal and England on all 1 km nodes that contain at least one facility. The results of this estimation exercise are given in Annex Table 2.A.1.

This function is subsequently used to establish the likely number of schools in a grid cell (A.9):

$\acute{FAC}_{j}^{cont}=e^{\left(\beta_{0}+\beta_{1}\ln ATT1_{j}\right)}$

$\acute{FAC}_{j}^{round}=\mathrm{round}\left(\acute{FAC}_{j}^{cont}\right)\ge 1$   (A.9)

Thus, the number of schools is rounded, and any selected location gets at least one school. This leads to stage 2 of the student attribution procedure, in which users are redistributed so that school sizes further converge towards realistic school sizes. To do so, facility attractiveness in D is rebalanced by the unrounded estimate of the number of facilities, so that (A.10):

(A.10)

implying that allocated schools that, in the first stage, are smaller than the expected largest school in range increase in attractiveness, while facilities that are bigger than the expected largest school in range decrease in attractiveness. Subsequently, the model computes $A2_{i}=\sum_{j=1}^{n=5}D2_{j}C_{ij}$ and $F2_{ij}=O_{i}D2_{j}{A2_{i}}^{-1}C_{ij}$, the latter yielding a rebalanced distribution of attendance. Note that, due to the distance decay function enforced through C, the rebalancing in D may be expected to have a limited effect on total travel costs. Finally, this yields a final estimate of school sizes s, so that (A.11):

(A.11)
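The facility-count step in (A.9) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the coefficient values are placeholders chosen only to make the example runnable, not the estimates reported in Annex Table 2.A.1.

```python
import numpy as np

def expected_facilities(att1, beta0, beta1):
    """Continuous and rounded number of schools per location, following (A.9).

    att1  : stage-1 attendance per location (must be strictly positive)
    beta0 : intercept of the log-log size function (placeholder here)
    beta1 : slope of the log-log size function (placeholder here)
    """
    att1 = np.asarray(att1, dtype=float)
    fac_cont = np.exp(beta0 + beta1 * np.log(att1))
    # Round to whole schools, then enforce at least one school per location.
    # Note: np.rint rounds exact halves to the nearest even integer.
    fac_round = np.maximum(1, np.rint(fac_cont)).astype(int)
    return fac_cont, fac_round

# Placeholder coefficients and attendance figures (invented for illustration).
fac_cont, fac_round = expected_facilities([50, 300, 900], beta0=-5.0, beta1=1.0)
```

With these placeholder values, the location with 50 students has a continuous estimate below 0.5, which rounds to zero and is then lifted to the minimum of one school.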

This annex describes the procedure to calibrate the thresholds used in the placement simulation.

The imputed threshold values were evaluated through a grid search that aimed to reproduce observed school distributions as accurately as possible. The grid search was performed for primary and secondary schools in Portugal based on data from the Directorate General of Education and Science Statistics of Portugal (2021[15]) and additional detail provided by the Ministry of Education of Portugal. Because of its relatively small size, and the implications of country size for computational burden, Portugal was a better fit for this exercise than the other countries for which observed school distributions were readily available (France and England). The results computed with the threshold values calibrated on Portugal indicate that English school distributions and costs can be reproduced accurately.

The adopted location-allocation approach is meant to reproduce observed school placement patterns accurately, under the assumption that the real-world placement patterns yield a societally acceptable balance between school cost (as a function of the size) and travel costs.

The grid search was performed by varying the values for maximum distance, minimum size, optimal size and accessibility weighting. A composite objective function measured model accuracy for each set of imputed values. The function combined three criteria: the percentage difference between the modelled and observed nationwide number of facilities; the difference between the modelled and observed ratios of urban to rural facilities; and the mean squared error, in percentage points, of the shares of schools per level 2 degree of urbanisation (see Box 1.2 in Chapter 1), thus discerning school provision in cities, towns, suburbs, villages, dispersed rural areas and mostly uninhabited areas.
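The three criteria can be illustrated with a short sketch. Several elements are assumptions rather than details given in the text: the function name, the grouping of cities, towns and suburbs as urban and the remaining classes as rural, the reading of the second criterion as a difference in urban-to-rural ratios, and the equal-weight sum used to combine the criteria. The school counts are invented.

```python
URBAN = ("cities", "towns", "suburbs")                         # assumed grouping
RURAL = ("villages", "dispersed_rural", "mostly_uninhabited")  # assumed grouping

def composite_objective(modelled, observed, weights=(1.0, 1.0, 1.0)):
    """Return (score, parts) for school counts per degree-of-urbanisation class.

    parts = (percentage difference in total number of facilities,
             absolute difference in urban-to-rural ratio,
             mean squared error of class shares in percentage points)
    The weighted sum is an assumption; the source does not state how the
    three criteria are combined.
    """
    classes = URBAN + RURAL
    tot_m = sum(modelled[c] for c in classes)
    tot_o = sum(observed[c] for c in classes)
    pct_diff_total = abs(tot_m - tot_o) / tot_o * 100.0

    def urban_rural_ratio(counts):
        return sum(counts[c] for c in URBAN) / sum(counts[c] for c in RURAL)

    ratio_diff = abs(urban_rural_ratio(modelled) - urban_rural_ratio(observed))

    squared = [(100.0 * modelled[c] / tot_m - 100.0 * observed[c] / tot_o) ** 2
               for c in classes]
    mse_shares = sum(squared) / len(squared)

    parts = (pct_diff_total, ratio_diff, mse_shares)
    return sum(w * p for w, p in zip(weights, parts)), parts

# Invented counts, for illustration only.
observed = {"cities": 40, "towns": 20, "suburbs": 10,
            "villages": 20, "dispersed_rural": 8, "mostly_uninhabited": 2}
modelled = {"cities": 44, "towns": 22, "suburbs": 11,
            "villages": 22, "dispersed_rural": 9, "mostly_uninhabited": 2}
score, parts = composite_objective(modelled, observed)
```

In this toy case the modelled placement has 10% too many schools but the same urban-to-rural ratio as observed, so the score is driven almost entirely by the first criterion.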

Annex Table 2.B.1 shows the thresholds that yield the most accurate results in Portugal. The imputed optimum school sizes are lower than what is considered optimal for US primary and high schools (Zimmer, DeBoer and Hirth, 2009[16]; Andrews, Duncombe and Yinger, 2002[17]), reflecting a preference for relatively small schools in European countries compared to the United States.

The calibration exercise also showed that some parameters have a much more substantial impact on allocation outcomes than others. In particular, the maximum catchment area distance and the school’s optimal size, which both come into play in the school placement stage of the modelling procedure, have a considerable impact on facility distribution.

A grid search of Portuguese primary school allocation showed that the allocation procedure performs best with a maximum distance of 15 km and an optimal school size of 280. For secondary schools, the same exercise yielded higher threshold values, with a maximum catchment area distance of 35 km and an optimal size of 450. These threshold values have therefore been selected as baseline values for allocation of primary and secondary schools throughout Europe (Annex Figure 2.B.1).
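An exhaustive grid search of this kind can be sketched in a few lines. The sketch is generic and makes several assumptions: the function and parameter names are hypothetical, the candidate values are invented, and the scoring function is a toy stand-in whose minimum is placed arbitrarily at the reported primary-school values. In practice, each evaluation would run the placement simulation and compare its output with observed schools.

```python
from itertools import product

def grid_search(score, grid):
    """Evaluate every parameter combination and keep the lowest score.

    score : callable taking a dict of parameters, returning a float
            (lower means a closer match to observed schools)
    grid  : dict mapping parameter name -> list of candidate values
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score

def toy_score(params):
    # Stand-in for "run the placement simulation, then compute the objective".
    # Not a real model: a simple bowl used only to make the example runnable.
    return ((params["max_distance_km"] - 15) ** 2
            + ((params["optimal_size"] - 280) / 10) ** 2)

# Candidate values invented for illustration.
grid = {"max_distance_km": [10, 15, 20], "optimal_size": [200, 280, 360]}
best_params, best_score = grid_search(toy_score, grid)
```

The number of simulation runs grows with the product of the candidate list lengths, which is one reason a smaller country keeps the computational burden manageable.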

The main data source for benchmarking school costs is publicly available data on school workforce composition for England provided by the UK Department of Education (UK Department of Education, 2021[12]).12 This database includes maintained primary, secondary and special schools that were open for the period April 2018 to March 2019. Maintained schools make up the vast majority of schools in England. The data contain the precise location (geographical coordinates) and the number of students of each school. Cost data disaggregated by type (e.g. staff, maintenance) for the year 2018-19 can be matched to this data for a representative subset of schools.

The dataset contains information for 14 963 (90%) primary schools and 2 854 (83%) secondary schools, accounting for 4 200 779 primary and 2 882 185 secondary school students. The data with financial information for each school is more limited in scope, covering 60% of primary schools (2 727 656 students) and 20% of secondary schools (686 163 students). Schools recorded with fewer than one student are removed from the analysis.

In England, primary education covers key stages 1 (5-7-year-olds) and 2 (8-11-year-olds) and the phase of education offered by each school is specified in the data. Although in England primary schools can also provide early years foundation stage (kindergarten) education, the aggregate number of students in the subset of primary schools corresponds to the national figures.13

The data for primary schools includes schools with a statutory age range from 0 to 7 years. To get the number of students per grade, schools with statutory low ages above 7 (812 of 12 809 schools) are dropped. Although some schools offer levels 2-4, the percentage of students in nursery state-funded schools is small (43 785 versus 4 689 660 students in primary schools). Furthermore, not all schools offer all grades. For instance, some schools may offer all grades for 2 to 11-year-olds, while others may only offer 2 to 7. Consequently, student-to-teacher ratios are computed at the school level.

As shown in Annex Figure 2.C.1, the sample of primary schools with financial information has a size distribution similar to that of the universe of schools, suggesting that the sample is representative of the universe. The simulated placement is less skewed to the left than the universe, suggesting that the simulated placement produces fewer small schools than are observed in the universe. Unlike the case of primary schools, the sample of secondary schools with financial information does not have a size distribution similar to that of the universe of schools.

For the purpose of the descriptive analysis, the cost data is grouped into five categories: teaching staff, non-teaching staff, school premises (including utilities), teaching resources (including ICT), and catering. For the cost estimation, cost is grouped into three categories: teaching staff, non-teaching staff and other costs (including school premises, teaching resources, and catering).

## References

[17] Andrews, M., W. Duncombe and J. Yinger (2002), “Revisiting economies of size in American education: are we any closer to a consensus?”, Economics of Education Review, Vol. 21/3, pp. 245-262, http://dx.doi.org/10.1016/s0272-7757(01)00006-1.

[15] Directorate General of Education and Science Statistics of Portugal (2021), Statistics on pre-primary, primary and secondary schools, https://www.dgeec.mec.pt/np4/408/ (accessed on March 2020).

[9] EC (2015), “The 2015 Ageing Report: Economic and Budgetary Projections for the 28 EU Member States (2013 - 2060).”, Luxembourg: Publications Office of the European Union.

[13] Eurostat (2021), Education administrative data from 2013 onwards, https://ec.europa.eu/eurostat/cache/metadata/en/educ_uoe_enr_esms.htm (accessed on February 2020).

[14] French Ministry of National Education and the Youth (2021), Annuaire de l’éducation nationale, https://www.education.gouv.fr/annuaire (accessed on February 2020).

[8] GEOSTAT (2011), GEOSTAT 1 km2 population grid, https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat (accessed on 21 December 2020).

[4] Goujon, A. et al. (eds.) (2021), The demographic landscape of EU territories: challenges and opportunities in diversely ageing regions, EUR 30498 EN, Publications Office of the European Union, Luxembourg, http://dx.doi.org/10.2760/658945.

[19] Hilferink, M. and P. Rietveld (1999), “LAND USE SCANNER: An integrated GIS based model for long term projections of land use in urban and rural areas”, Journal of Geographical Systems, Vol. 1/2, pp. 155-177, http://dx.doi.org/10.1007/s101090050010.

[6] Jacobs-Crisioni, C. et al. (2020), “Ageing in Regions and Cities: High Resolution Projections for Europe in 2030”, https://doi.org/10.2760/716609.

[7] Jacobs-Crisioni, C. et al. (n.d.), Development of the LUISA Reference Scenario 2020 and Production of Fine-Resolution Population Projections by 5 Year Age Group.

[5] JRC (2021), LUISA modelling platform, https://ec.europa.eu/jrc/en/luisa/data-sources (accessed on 2021).

[2] Kompil, M. et al. (2019), “Mapping accessibility to generic services in Europe: A market-potential based approach”, Sustainable Cities and Society, Vol. 47, p. 101372, http://dx.doi.org/10.1016/j.scs.2018.11.047.

[11] OECD (2020), Education at a Glance 2020: OECD Indicators, OECD Publishing, Paris, https://doi.org/10.1787/69096873-en.

[18] Pacheco, J. and S. Casado (2005), “Solving two location models with few facilities by using a hybrid heuristic: a real health resources case”, Computers & Operations Research, Vol. 32/12, pp. 3075-3091, http://dx.doi.org/10.1016/j.cor.2004.04.009.

[10] Santiago, P. et al. (2016), OECD Reviews of School Resources: Estonia 2016, OECD Reviews of School Resources, OECD Publishing, Paris, https://dx.doi.org/10.1787/9789264251731-en.

[1] Tillväxtanalys (2011), Merkostnader på grund av gles bebyggelsestruktur i kommuner och landsting, Working Paper 2011:08, Tillväxtanalys.

[3] TomTom (2018), “Road Networks Including Link Distances, Impedances and Connectivity in Europe.”.

[12] UK Department of Education (2021), Find and compare schools in England, https://www.compare-school-performance.service.gov.uk/download-data (accessed on February 2020).

[20] Xu, Y. et al. (2020), “Deconstructing laws of accessibility and facility distribution in cities”, Science Advances, Vol. 6/37, http://dx.doi.org/10.1126/sciadv.abb4112.

[16] Zimmer, T., L. DeBoer and M. Hirth (2009), “Examining Economies of Scale in School Consolidation: Assessment of Indiana School Districts”, Journal of Education Finance, Vol. 35/2, pp. 103–27, http://www.jstor.org/stable/40704380.

## Notes

1. School provision costs are expressed in monetary values. Transport costs are expressed in distances travelled because their monetary value remains unknown; transport costs are therefore assumed to increase linearly with distance travelled. The monetary value of transport distance remains unknown because the means of travel to school, as well as the organisation of school transport, likely differ substantially between contexts and countries in Europe. In addition, establishing the value of transport opportunity costs is beyond the scope of this study.

2. This is in contrast to other approaches (Xu et al., 2020[20]; Pacheco and Casado, 2005[18]) where there is no central optimisation process and the number of locations is not defined a priori.

3. The imposed regional boundaries allow parallel placement of schools across regions, which is useful to speed up the modelling process, and has a negligible influence on simulation results.

4. The boundaries of independent placement zones are drawn based on TL3 regions for primary schools and TL1 regions for secondary schools.

5. This is analogous to the bid-rent assumptions in other land-use modelling applications (Hilferink and Rietveld, 1999[19]).

8. Conceivably, students from different degrees of urbanisation attend the same school, and the chosen aggregation method therefore does not accurately describe cost differences between the places where students live. Through the spatial interaction model used for student attribution, school costs for the simulated school placements can in fact be linked to the origins of students; however, for the observed costs, such data are unavailable.

9. For secondary schools, the threshold of 100 is based on the values in the national funding formula: 69.2 for middle schools and 120 for secondary schools. See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/844007/2020-21_NFF_schools_block_technical_note.pdf


© OECD/European Union 2021
