Annex A. Measures of segregation

A variety of measures of residential or school segregation have been proposed in the literature (for a review see, for instance, Frankel and Volij, 2011[1]). These measures may vary by the population groups and geographical areas they consider, as well as by the type of question they try to answer. They are usually highly correlated, but they may differ in some cases, as they do not convey exactly the same meaning. A measure taken in isolation provides insight into only one part of the overall picture, and the choice of indicator depends on the particular aspect of segregation one wants to focus on.

The first decision to make is which groups should be considered, answering the question: “Who is segregated from whom?” Historically, a large body of academic research has focused on racial segregation in the United States, but these notions can be extended to a wide range of dimensions. One may consider the segregation of socio-economically disadvantaged or advantaged students, of low-achieving students, or of students with an immigrant background, for instance.

The second decision depends on the aspect of segregation that is the most pertinent to the analysis, answering the question: “Why does segregation matter?” in a specific context. The various measures commonly used by scholars can be classified by the dimension they focus on, such as whether the different groups have any opportunities for social interactions, or whether some groups are concentrated in some schools or are evenly distributed across schools (Massey and Denton, 1988[2]).

Measuring group interactions

A first family of measures focuses on the interactions between groups of students. This is important if one believes that the interactions of peers of different backgrounds may be beneficial for academic performance or social cohesion. The exposure indicator measures the probability that an average student from a group will be in contact at school with members of another group. Formally, it can be written as:

$E_{ab} = \sum_j \frac{a_j}{N_a}\,\frac{b_j}{n_j}$, where $a_j$ (respectively $N_a$) stands for the number of students of type a (for instance, those with an immigrant background) in school j (respectively, in the country), $b_j$ the number of students of the other type b, and $n_j$ the total number of students in school j. The value of the exposure indicator decreases as segregation between the two groups increases, and ranges from 0 (full segregation, when the two groups have no contact at all) to $p_b$ (no segregation), with $p_b = N_b/N$ the proportion of group b in the population and N the total number of students in the country. Conversely, one may compute an indicator that measures the probability that the average student from a group is in contact with students from the same group, $I_a = \sum_j \frac{a_j}{N_a}\,\frac{a_j}{n_j}$. This indicator increases with segregation and ranges from the proportion of group a in the population, $p_a$ (no segregation), to 1 (full segregation). These two indicators usually rely on a binary decomposition along a single dimension (for instance ethnic, social or academic groups); in that case $a_j + b_j = n_j$ and the two indicators always sum to 1.
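As a minimal sketch, assuming each school is described only by its counts of type a and type b students (a binary partition, so that $n_j = a_j + b_j$), the two interaction indicators can be computed directly from their definitions. The function names `exposure` and `same_group` and the three-school counts are hypothetical, chosen only for illustration.

```python
def exposure(a, b):
    """Exposure of group a to group b: sum over schools of (a_j / N_a) * (b_j / n_j)."""
    n_a = sum(a)
    # Schools without type a students contribute nothing, so they are skipped.
    return sum((aj / n_a) * (bj / (aj + bj)) for aj, bj in zip(a, b) if aj > 0)


def same_group(a, b):
    """Same-group contact of group a: sum over schools of (a_j / N_a) * (a_j / n_j)."""
    n_a = sum(a)
    return sum((aj / n_a) * (aj / (aj + bj)) for aj, bj in zip(a, b) if aj > 0)


# Hypothetical counts of type a and type b students in three schools.
a = [10, 5, 0]
b = [10, 45, 30]
p_b = sum(b) / (sum(a) + sum(b))

print(round(exposure(a, b), 4), p_b)                # 0.6333 0.85
print(round(exposure(a, b) + same_group(a, b), 4))  # 1.0
```

The exposure value (0.6333) sits below $p_b = 0.85$, its no-segregation value, because type a students are over-represented in the first school.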

The ranges of these two interaction indices depend on the proportion of the group studied, which can make comparisons between countries difficult. For instance, consider two hypothetical countries where the same-group index for immigrant students takes the same value, 0.20, but where immigrant students make up 20% (country A) and 2% (country B) of all students. In country A, this value corresponds to no segregation: the typical immigrant student's schoolmates include the same share of immigrants as the country's student population as a whole. By contrast, in country B, an immigrant student would in theory be enrolled with other immigrants much less often, as there are far fewer of them in the population. The same value of the index therefore means, in this case, that immigrant students are concentrated in the same schools.

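In order to compare segregation across countries, one should therefore use the normalised version of the exposure indicator, which is called the isolation indicator in this document (the second expression holds when groups a and b form a binary partition of the population):

$$NE = 1 - \frac{E_{ab}}{p_b} = \frac{I_a - p_a}{1 - p_a}$$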

This index ranges from 0 (no segregation) to 1 (full segregation).
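A minimal sketch of the normalised isolation indicator follows, assuming the same binary school counts as in the exposure definition. The hypothetical helper `normalised_from_same_group` applies the second form of the formula to the two hypothetical countries discussed above, which share a same-group index of 0.20 but differ in their immigrant share.

```python
def normalised_isolation(a, b):
    """NE = 1 - E_ab / p_b: 0 when every school mirrors the country, 1 when groups never meet."""
    n_a, n_b = sum(a), sum(b)
    p_b = n_b / (n_a + n_b)
    exposure = sum((aj / n_a) * (bj / (aj + bj)) for aj, bj in zip(a, b) if aj > 0)
    return 1 - exposure / p_b


def normalised_from_same_group(same_group_index, p_a):
    """Equivalent form (I_a - p_a) / (1 - p_a) for a binary partition of the population."""
    return (same_group_index - p_a) / (1 - p_a)


# The two hypothetical countries: same-group index of 0.20 in both.
print(round(normalised_from_same_group(0.20, 0.20), 3))  # 0.0   (country A: no segregation)
print(round(normalised_from_same_group(0.20, 0.02), 3))  # 0.184 (country B: concentration)
```

Once normalised, the two countries no longer look alike: country A scores 0 while country B shows a clear departure from an even mix.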

One may also derive a version of this indicator when the two groups do not form a partition of the population, for example when measuring the exposure of disadvantaged students to high achievers in the country. In this case, the two groups taken together may not constitute the entire population and, as in this example, may partially overlap.1 The lowest value (0) is observed when the two subgroups are clustered in the same schools; the highest value (1) is observed when both are clustered in schools, but in different ones. Intermediate values are observed when the two populations are randomly mixed within schools.

Measuring departure from unevenness

Another way of analysing segregation is to look at whether the student body in a country’s schools resembles the population of the country, in other words whether the distribution of students across schools deviates from what would be observed if they were distributed randomly across schools (unevenness). The dissimilarity index is commonly used for this purpose. It is related to the proportions of students of the two groups who would have to be moved in order to obtain an identical distribution across all schools. Formally, the dissimilarity index may be written as:

$D = \frac{1}{2}\sum_j \left| \frac{a_j}{N_a} - \frac{b_j}{N_b} \right|$, where $b_j$ (respectively $N_b$) stands for the number of students of type b in school j (respectively, in the country). This index thus measures how far the distribution of type a students across schools departs from the distribution of type b students across schools. It may be interpreted as the proportion of either group that would have to be moved to achieve evenness (assuming that school size may be adjusted), or as the average proportion of students of groups a and b that would have to be reallocated to achieve evenness while keeping school sizes unchanged.
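The dissimilarity index translates directly into a few lines of code. This is a sketch under the assumption that schools are given as parallel lists of type a and type b counts; the function name and the toy counts are hypothetical.

```python
def dissimilarity(a, b):
    """D = 1/2 * sum over schools of |a_j / N_a - b_j / N_b|."""
    n_a, n_b = sum(a), sum(b)
    return 0.5 * sum(abs(aj / n_a - bj / n_b) for aj, bj in zip(a, b))


print(dissimilarity([2, 2, 2], [8, 8, 8]))            # 0.0: every school has the same mix
print(dissimilarity([6, 0, 0], [0, 15, 15]))          # 1.0: the groups never share a school
print(round(dissimilarity([4, 2, 0], [6, 8, 10]), 3))  # 0.417
```

In the last case, about 42% of type a students would have to change school (with school sizes allowed to adjust) for each school to reproduce the national mix.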

Other indicators have been derived from the dissimilarity index. For instance, the indicator proposed by Gorard and Taylor (2002):

$G = \frac{1}{2}\sum_j \left| \frac{a_j}{N_a} - \frac{n_j}{N} \right|$, where $n_j$ (respectively, N) stands for the number of students in school j (respectively, in the country). It corresponds to the proportion of students of the considered group that would have to be moved to obtain evenness, keeping school sizes stable. This indicator ranges from 0 (no segregation) to $1 - p_a$ (full segregation). The current concentration (CC) indicator proposed in PISA 2015 is also derived from the dissimilarity indicator.

The dissimilarity, Gorard and current concentration indices are closely linked. It can easily be shown that $G = (1 - p_a)\,D$. However, while the dissimilarity index D ranges from 0 (no segregation) to 1 (full segregation), the maximum values of the two other indicators depend on the proportion $p_a$ of the considered group in the population: the maximum is $1 - p_a$ for the G indicator, and it likewise depends on $p_a$ for the CC indicator. One should therefore be cautious when comparing the values of these indices across countries where the share of the group of interest varies widely. For the sake of illustration, consider two “countries”, A and B, with 20% and 40% of students, respectively, from the minority group. In country A, the G indicator ranges from 0 to 0.8. In country B, however, even in the case of full segregation (all students in the category are in schools without any other students), the G indicator cannot exceed 0.6 by construction, a value that would correspond to an “intermediate” level of segregation in country A.
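The link between the two indices can be checked numerically. The sketch below assumes two-school countries of 100 students each (a simplification chosen for illustration) and reproduces the full-segregation ceilings of 0.8 and 0.6 mentioned above, together with the identity $G = (1 - p_a)\,D$. Function and variable names are hypothetical.

```python
def dissimilarity(a, b):
    """D = 1/2 * sum over schools of |a_j / N_a - b_j / N_b|."""
    n_a, n_b = sum(a), sum(b)
    return 0.5 * sum(abs(aj / n_a - bj / n_b) for aj, bj in zip(a, b))


def gorard(a, b):
    """G = 1/2 * sum over schools of |a_j / N_a - n_j / N|, with n_j = a_j + b_j."""
    n_a = sum(a)
    n = n_a + sum(b)
    return 0.5 * sum(abs(aj / n_a - (aj + bj) / n) for aj, bj in zip(a, b))


# Full segregation in the two hypothetical countries (minority in one school only).
country_a = ([20, 0], [0, 80])  # 20% minority
country_b = ([40, 0], [0, 60])  # 40% minority
for a, b in (country_a, country_b):
    p_a = sum(a) / (sum(a) + sum(b))
    print(dissimilarity(a, b), round(gorard(a, b), 3), round((1 - p_a) * dissimilarity(a, b), 3))
# 1.0 0.8 0.8
# 1.0 0.6 0.6
```

D reaches its ceiling of 1 in both countries, whereas G tops out at $1 - p_a$, which is why raw G values are hard to compare across countries with different minority shares.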

Comparability may not be an issue when the analysis focuses on a minority group whose proportions are similar or even identical across countries. This could be the case when considering categories defined by the quartiles of the national distribution of a continuous variable, such as socio-economic status or academic performance. Still, using a common range for different indices makes interpretation easier.

Analysing diversity

The indices described above single out one group of students (for instance, disadvantaged students) and compare it with all other students (for instance, all non-disadvantaged students, including advantaged students and those of average socio-economic status). However, a two-group analysis may be inadequate for describing more complex patterns of social segregation. In PISA, for instance, students’ social background is measured by a continuous variable, and reducing it to a binary variable loses a lot of information. If one focuses on disadvantaged students, a binary variable contrasts disadvantaged students (defined as students whose socio-economic status is below the first quartile of the national distribution) with all other students (those in the second to fourth quartiles). Assume that in some countries the most disadvantaged students (those in the bottom quartile) are never enrolled in the same schools as the most advantaged students (those in the top quartile), but are often enrolled with slightly more advantaged students. Contrasting disadvantaged students with all other students will then provide only a partial view of segregation across social categories, and of how the social diversity observed in the population is reflected in schools. One could use several distinct indices, but another option is to use multi-group indices. Several have been proposed for categorical data in order to obtain a more complete description of segregation (Reardon and Firebaugh, 2002[3]; Frankel and Volij, 2011[1]).
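To make the limitation of the binary view concrete, here is a small invented example: four schools of eight students, with eight students in each socio-economic quartile. In both configurations, bottom-quartile students make up half of the enrolment of two schools, so the binary dissimilarity index (bottom quartile vs all others) is identical. Yet in the "stratified" configuration they never meet top-quartile students, while in the "mixed" one they sometimes do. The counts and function names are hypothetical.

```python
def dissimilarity(a, b):
    """Binary dissimilarity index D = 1/2 * sum over schools of |a_j / N_a - b_j / N_b|."""
    n_a, n_b = sum(a), sum(b)
    return 0.5 * sum(abs(aj / n_a - bj / n_b) for aj, bj in zip(a, b))


def exposure(a, b, sizes):
    """Probability that an average student of group a meets group b: sum of (a_j / N_a) * (b_j / n_j)."""
    n_a = sum(a)
    return sum((aj / n_a) * (bj / nj) for aj, bj, nj in zip(a, b, sizes) if aj > 0)


# Hypothetical counts per school, as [Q1, Q2, Q3, Q4] (socio-economic quartiles).
stratified = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
mixed = [[4, 2, 2, 0], [4, 2, 0, 2], [0, 2, 3, 3], [0, 2, 3, 3]]

for name, schools in (("stratified", stratified), ("mixed", mixed)):
    q1 = [s[0] for s in schools]
    others = [sum(s[1:]) for s in schools]
    q4 = [s[3] for s in schools]
    sizes = [sum(s) for s in schools]
    print(name, round(dissimilarity(q1, others), 3), round(exposure(q1, q4, sizes), 3))
# stratified 0.667 0.0
# mixed 0.667 0.125
```

The binary index cannot tell the two situations apart; only an additional index, or a multi-group one, reveals the difference.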

One may also consider the no-diversity index, which has the advantage of being decomposable. This index is based on a measure of entropy, that is, of the diversity within a group, and for this reason is often referred to as the entropy index or the mutual information index. It is based on the Theil index commonly used in inequality analysis. When the population is distributed across four categories in proportions $p_1, \dots, p_4$, the diversity of the population may be related to the measure:

$$\mathcal{E} = -\sum_{k=1}^{4} p_k \log p_k$$

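The no-diversity index compares this measure to the average obtained at the school level:

$$H = \sum_j \frac{n_j\,(\mathcal{E} - \mathcal{E}_j)}{N\,\mathcal{E}} = 1 - \sum_j \frac{n_j\,\mathcal{E}_j}{N\,\mathcal{E}}, \qquad \mathcal{E}_j = -\sum_{k=1}^{4} p_{jk} \log p_{jk}$$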

In the equation above, $p_{jk}$ is the proportion of students in category k amongst the $n_j$ students in school j, and N is the total number of students.
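Below is a minimal sketch of the no-diversity index, assuming schools are given as lists of counts per category. It uses the natural logarithm; because H is a ratio of entropies, the base of the logarithm cancels out. It is applied to two invented configurations of four schools of eight students, with eight students per socio-economic quartile. In the "stratified" case each school mixes only two adjacent quartiles; in the "mixed" case bottom-quartile students share schools with three other quartiles. The function names are hypothetical.

```python
from math import log


def entropy(counts):
    """Entropy -sum_k p_k log p_k of a vector of category counts (empty categories skipped)."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def no_diversity(schools):
    """H = 1 - sum_j n_j E_j / (N E), where E is the entropy of the whole population."""
    totals = [sum(column) for column in zip(*schools)]
    n, e = sum(totals), entropy(totals)
    return 1 - sum(sum(s) * entropy(s) for s in schools) / (n * e)


# Hypothetical counts per school, as [Q1, Q2, Q3, Q4].
stratified = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 4, 4], [0, 0, 4, 4]]
mixed = [[4, 2, 2, 0], [4, 2, 0, 2], [0, 2, 3, 3], [0, 2, 3, 3]]

print(round(no_diversity(stratified), 3))  # 0.5
print(round(no_diversity(mixed), 3))       # 0.235
```

The four-category index registers that the second configuration is considerably more socially diverse within schools.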

The no-diversity index ranges from 0 (no segregation) to 1 (full segregation). One of its advantages is that it is additively decomposable.2 If schools are aggregated at a higher level, typically comparing private and public schools, the no-diversity index can be decomposed into three components. The first corresponds to social segregation within private schools, the second to segregation within public schools, and the third to the additional segregation that arises because the social composition of the public sector may differ from that of the private sector.

Formally, this can be written as:

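$$H = \frac{N_{\text{pub}}\,\mathcal{E}_{\text{pub}}}{N\,\mathcal{E}}\,H_{\text{pub}} + \frac{N_{\text{priv}}\,\mathcal{E}_{\text{priv}}}{N\,\mathcal{E}}\,H_{\text{priv}} + H_{\text{sector}}$$

where $N_{\text{pub}}$ and $\mathcal{E}_{\text{pub}}$ (respectively $N_{\text{priv}}$ and $\mathcal{E}_{\text{priv}}$) are the number of students and the entropy of the public (respectively private) sector, $H_{\text{pub}}$ and $H_{\text{priv}}$ are the no-diversity indices computed within each sector, and $H_{\text{sector}}$, computed by treating each sector as a single unit, is interpreted as the segregation due specifically to the coexistence of private and public sectors.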
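The decomposition can be verified numerically. The sketch below assumes a hypothetical country with three public and two private schools (counts per socio-economic quartile are invented). It weights each within-sector index by $N_s \mathcal{E}_s / (N \mathcal{E})$, adds the index computed with each sector treated as a single school, and checks that the total matches the index computed over all schools. The function names are hypothetical.

```python
from math import log


def entropy(counts):
    """Entropy -sum_k p_k log p_k of a vector of category counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def totals(schools):
    """Category counts summed over a list of schools."""
    return [sum(column) for column in zip(*schools)]


def no_diversity(schools):
    """H = 1 - sum_j n_j E_j / (N E)."""
    t = totals(schools)
    return 1 - sum(sum(s) * entropy(s) for s in schools) / (sum(t) * entropy(t))


def decompose(sectors):
    """Return (within-sector terms, between-sector term); together they add up to H."""
    overall = totals([s for schools in sectors.values() for s in schools])
    n_e = sum(overall) * entropy(overall)
    within = {}
    for name, schools in sectors.items():
        t = totals(schools)
        # Weight N_s E_s / (N E) times the index computed inside the sector.
        within[name] = sum(t) * entropy(t) / n_e * no_diversity(schools)
    # Between-sector term: the index computed as if each sector were a single school.
    between = no_diversity([totals(schools) for schools in sectors.values()])
    return within, between


# Hypothetical quartile counts [Q1, Q2, Q3, Q4] per school.
sectors = {
    "public": [[6, 4, 3, 2], [5, 5, 4, 3], [4, 4, 4, 4]],
    "private": [[1, 2, 4, 6], [0, 1, 3, 7]],
}
within, between = decompose(sectors)
all_schools = sectors["public"] + sectors["private"]
print(round(no_diversity(all_schools), 4))
print(round(sum(within.values()) + between, 4))  # same value as the line above
```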

The indicator is based on the same idea as the social inclusion index commonly used in PISA publications, which compares the within-school and between-school variances (of the continuous socio-economic index or of performance). The social inclusion index relies on a multilevel (or hierarchical) model that decomposes the variance (modelled by a normal distribution) into one component corresponding to schools and another corresponding to students. In multilevel models, the estimator of the between-school variance is corrected to take into account the part of this variability that is due to students. However, this estimator cannot be directly decomposed in an additive way.

Why do these indicators differ?

Consider the hypothetical situation of a population of students who may be of type A (one-sixth of the population) or type B (five-sixths). The students are distributed across six schools, each with a capacity of six students. Full segregation is observed when all type A students are in one and only one school. No segregation corresponds to a situation where every school has one type A student and five type B students. In both cases, the dissimilarity index, the isolation index and the no-diversity index coincide (Figure A.1). However, these indicators may differ in intermediate situations. The left panel of Figure A.2 shows a slight departure from complete segregation: one type A student is mixed with five type B students, while all the other type A students are in the same school (with only one type B student). Both the dissimilarity and the isolation indices are very high (D = 0.80 and NE = 0.67, respectively). For the former, many type A students would have to be moved to achieve evenness. For the latter, most type A students are concentrated in a single school, so the probability that an average type A student interacts with a type B student is low (one type A student has a very high probability of interacting with type B students, but for the remaining five type A students this probability is much lower). Now consider the case illustrated in the right panel of Figure A.2. The dissimilarity index has the same value as in the previous case: the same proportion of students has to be moved to achieve evenness. However, the isolation index is much lower (NE = 0.40), as the average type A student is much more likely to interact with type B students, who make up a much larger share of the enrolment in any type A student’s school. In both cases, the no-diversity index lies between these two values.
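The values quoted for the left panel can be reproduced from the layout described in the text (one school with one type A and five type B students, one school with five type A and one type B, four schools with only type B). The text gives only the index values for the right panel, so its layout below is an assumption: two schools with three type A and three type B students each is one configuration consistent with D = 0.80 and NE = 0.40. Function names are hypothetical.

```python
def dissimilarity(a, b):
    """D = 1/2 * sum over schools of |a_j / N_a - b_j / N_b|."""
    n_a, n_b = sum(a), sum(b)
    return 0.5 * sum(abs(aj / n_a - bj / n_b) for aj, bj in zip(a, b))


def normalised_isolation(a, b):
    """NE = 1 - E_ab / p_b, with E_ab the exposure of group a to group b."""
    n_a, n_b = sum(a), sum(b)
    p_b = n_b / (n_a + n_b)
    exposure = sum((aj / n_a) * (bj / (aj + bj)) for aj, bj in zip(a, b) if aj > 0)
    return 1 - exposure / p_b


# Six schools of six students: (type A counts, type B counts) per school.
left = ([1, 5, 0, 0, 0, 0], [5, 1, 6, 6, 6, 6])
# Assumed layout for the right panel, chosen to match the reported index values.
right = ([3, 3, 0, 0, 0, 0], [3, 3, 6, 6, 6, 6])

for a, b in (left, right):
    print(round(dissimilarity(a, b), 2), round(normalised_isolation(a, b), 2))
# 0.8 0.67
# 0.8 0.4
```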

Figure A.1. Complete vs no segregation cases (illustrative example)

Figure A.2. High dissimilarity, high vs medium isolation (illustrative example)

Dissimilarity, no-diversity and isolation indices convey different meanings, and this may matter when analysing real data. For example, Figure A.3 shows the values of these three indices of segregation for students with an immigrant background across schools. Countries and economies are sorted according to the proportion of immigrant students within the 15-year-old student population (represented by diamonds in the figure). In countries where the proportion of immigrant students is very small, the dissimilarity index tends to be very high: if a country hosts only a few students with an immigrant background, they cannot be expected to be present in every school. Because the dissimilarity index relates to the proportion of immigrant students who would have to be moved, this proportion can quickly become very large when the group is small, since the denominator (the number of immigrant students) is itself very small.3

Figure A.3. Dissimilarity index, no-diversity index and isolation of students with an immigrant background


One may imagine an extreme case where a country hosts fewer immigrant students than it has schools. In this case, even if these students are spread as widely as possible, many schools will have no immigrant students at all. On the other hand, the isolation index is much lower, as the average immigrant student is also more likely to interact with native students at school.
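The extreme case can be sketched with invented numbers: a hypothetical country with 10 schools of 20 students and only 5 immigrant students, spread as thinly as possible (at most one per school). The function names are hypothetical; the formulas are those of the dissimilarity and isolation indices defined earlier.

```python
def dissimilarity(a, b):
    """D = 1/2 * sum over schools of |a_j / N_a - b_j / N_b|."""
    n_a, n_b = sum(a), sum(b)
    return 0.5 * sum(abs(aj / n_a - bj / n_b) for aj, bj in zip(a, b))


def normalised_isolation(a, b):
    """NE = 1 - E_ab / p_b, with E_ab the exposure of group a to group b."""
    n_a, n_b = sum(a), sum(b)
    p_b = n_b / (n_a + n_b)
    exposure = sum((aj / n_a) * (bj / (aj + bj)) for aj, bj in zip(a, b) if aj > 0)
    return 1 - exposure / p_b


# 5 immigrant students in 10 schools of 20: one in each of the first five schools.
immigrant = [1] * 5 + [0] * 5
native = [19] * 5 + [20] * 5
print(round(dissimilarity(immigrant, native), 3))         # 0.513
print(round(normalised_isolation(immigrant, native), 3))  # 0.026
```

Although the immigrant students could hardly be more dispersed, the dissimilarity index exceeds 0.5, while the isolation index stays close to 0.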

Figure A.3 also illustrates that both isolation and dissimilarity indices may vary widely across countries with similar proportions of immigrant students. For instance, the concentration of these students in some schools is much lower in New Zealand than in Australia and the United States, although the proportions of non-native students amongst 15-year-old students are similar in these three countries. Similarly, isolation of immigrant students is much higher in Denmark and the Netherlands than in Estonia and Greece, even though around 10% of students are non-native in most European countries.


[5] Carrington, W. and K. Troske (1997), “On Measuring Segregation in Samples with Small Units”, Journal of Business & Economic Statistics, Vol. 15/4, p. 402.

[4] D’Haultfœuille, X. and R. Rathelot (2017), “Measuring segregation on small units: A partial identification analysis”, Quantitative Economics, Vol. 8/1, pp. 39-73.

[1] Frankel, D. and O. Volij (2011), “Measuring school segregation”, Journal of Economic Theory.

Gorard, S. and C. Taylor (2002), “What is Segregation?: A Comparison of Measures in Terms of ‘Strong’ and ‘Weak’ Compositional Invariance”, Sociology, Vol. 36/4.

[2] Massey, D. and N. Denton (1988), “The Dimensions of Residential Segregation”, Social Forces, Vol. 67/2, p. 281.

[6] OECD (2017), PISA 2015 Results (Volume IV): Students’ Financial Literacy, PISA, OECD Publishing, Paris.

[3] Reardon, S. and G. Firebaugh (2002), “2. Measures of Multigroup Segregation”, Sociological Methodology, Vol. 32/1, pp. 33-67.


1. The index does not make sense when the two populations completely overlap ($N_b = N_a$), or when $N_b + N_a > N$.

2. This property is shared by the square root index proposed by Hutchens for binary variables; see PISA 2015 Results (Volume IV) (OECD, 2017[6]).

3. This is related to the issue of measuring segregation when units are small (D’Haultfœuille and Rathelot, 2017[4]; Carrington and Troske, 1997[5]).

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2019

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at
