# Annex B. Technical notes on analyses in this report

The statistics presented in this report were derived from data obtained through samples of schools, school principals and teachers. The sample was collected following a stratified two-stage probability sampling design. This means that teachers (second-stage units or secondary sampling units) were randomly selected from the list of in-scope teachers for each of the randomly selected schools (first-stage or primary sampling units). For these statistics to be meaningful for a country, they needed to reflect the whole population from which they were drawn and not merely the sample used to collect them. Thus, survey weights must be used in order to obtain design-unbiased estimates of population or model parameters.

Final weights allow the production of country-level estimates from the observed sample data. The estimation weight indicates how many population units are represented by a sampled unit. The final weight is the combination of many factors reflecting the probabilities of selection at the various stages of sampling and the response obtained at each stage. Other factors may also come into play as dictated by special conditions to maintain the unbiasedness of the estimates (e.g. adjustment for teachers working in more than one school).
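As a rough illustration of how such a weight could be assembled and used, the sketch below builds a final weight as the inverse of the overall selection probability, inflated by response-rate adjustments, and then computes a weighted mean. The class, field names and numbers are hypothetical, and the actual TALIS weights involve further adjustment factors that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class SampledTeacher:
    school_prob: float       # probability that the teacher's school was selected
    teacher_prob: float      # probability of selecting the teacher within the school
    school_response: float   # hypothetical school-level response adjustment (<= 1)
    teacher_response: float  # hypothetical teacher-level response adjustment (<= 1)
    value: float             # survey response, e.g. 1.0 if a practice is reported


def final_weight(t: SampledTeacher) -> float:
    """Inverse of the overall selection probability, inflated for non-response."""
    base_weight = 1.0 / (t.school_prob * t.teacher_prob)
    return base_weight / (t.school_response * t.teacher_response)


def weighted_mean(sample: list[SampledTeacher]) -> float:
    """Weighted estimate of the population mean of `value`."""
    weights = [final_weight(t) for t in sample]
    weighted_sum = sum(w * t.value for w, t in zip(weights, sample))
    return weighted_sum / sum(weights)


# Two hypothetical sampled teachers with different selection probabilities.
sample = [
    SampledTeacher(0.10, 0.50, 1.0, 1.0, 1.0),
    SampledTeacher(0.05, 0.25, 1.0, 0.8, 0.0),
]
estimate = weighted_mean(sample)
```

In this toy sample the second teacher represents five times as many population units as the first, so the weighted mean lies much closer to the second teacher's response than the unweighted mean would.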

Statistics presented in this report that are based on the responses of school principals and that contribute to estimates related to school principals were estimated using school weights (SCHWGT). Results based only on responses of teachers or on responses of teachers and principals (i.e. responses from school principals were merged with teachers’ responses) were weighted by teacher weights (TCHWGT).

In this report, several scale indices are used in regression analyses. Descriptions of the construction and validation of these scales can be found in Chapter 11 of the TALIS 2018 Technical Report (OECD, 2019[1]).

The OECD and TALIS averages, which were calculated for most indicators presented in this report, correspond to the arithmetic mean of the respective country estimates. When the statistics are based on responses of teachers, the OECD and TALIS averages cover 31 and 48 countries and territories, respectively (Table A B.1). In those cases where the analysis is based on principals’ responses, the OECD and TALIS averages cover 30 and 47 countries and territories, respectively.

The EU total treats the 23 European Union member states that participated in TALIS 2018 as a single entity, to which each member state contributes in proportion to its number of teachers or principals, depending on the basis of the analysis. Therefore, the EU total is calculated as a weighted arithmetic mean based on the sum of final teacher (TCHWGT) or principal (SCHWGT) weights by country, depending on the target population.
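The difference between the two kinds of aggregate can be sketched as follows. The country labels, estimates and weight sums are invented for illustration only; `weight_sums` stands in for the per-country sum of final weights.

```python
def simple_average(estimates: dict[str, float]) -> float:
    """Arithmetic mean of country estimates (as for the OECD and TALIS averages)."""
    return sum(estimates.values()) / len(estimates)


def weighted_total(estimates: dict[str, float], weight_sums: dict[str, float]) -> float:
    """Mean of country estimates weighted by each country's sum of final weights
    (as for the EU total)."""
    total_weight = sum(weight_sums[c] for c in estimates)
    return sum(estimates[c] * weight_sums[c] for c in estimates) / total_weight


# Hypothetical country-level estimates and sums of final teacher weights.
estimates = {"A": 0.40, "B": 0.60, "C": 0.80}
weight_sums = {"A": 100_000.0, "B": 300_000.0, "C": 50_000.0}

average = simple_average(estimates)            # every country counts equally
total = weighted_total(estimates, weight_sums) # large systems count more
```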

In this publication, the OECD average is generally used when the focus is on providing a global tendency for an indicator and comparing its values across education systems. In the case of some countries and territories, data may not be available for specific indicators, or specific categories may not apply. Therefore, readers should keep in mind that the term “OECD average” refers to the OECD countries and territories included in the respective comparisons. In cases where data are not available or do not apply to all sub-categories of a given population or indicator, the “OECD average” may be consistent within each column of a table but not necessarily across all columns of a table.

Differences between sub-groups among school characteristics (e.g. between schools with a high concentration of students from socio-economically disadvantaged homes and schools with a low concentration of students from socio-economically disadvantaged homes) were tested for statistical significance. All differences marked in bold in the data tables of this report are statistically significantly different from 0 at the 95% confidence level.

In the case of differences between sub-groups, the standard error is calculated by taking into account that the two sub-samples are not independent. As a result, the expected value of the covariance might differ from 0, leading to smaller estimates of standard error as compared to estimates of standard error calculated for the difference between independent sub-samples.
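The role of the covariance can be made explicit with the general formula for the standard error of a difference between two estimates. The symbols ${\hat{\theta }}_{1}$ and ${\hat{\theta }}_{2}$ for the two sub-group estimates are notation assumed for this sketch:

$SE\left({\hat{\theta }}_{1}-{\hat{\theta }}_{2}\right)=\sqrt{SE{\left({\hat{\theta }}_{1}\right)}^{2}+SE{\left({\hat{\theta }}_{2}\right)}^{2}-2\,Cov\left({\hat{\theta }}_{1},{\hat{\theta }}_{2}\right)}$

For independent sub-samples the covariance term is 0. When the covariance is positive, the standard error of the difference is smaller than it would be under independence.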

Tables presenting the proportion of teachers and principals, by the school breakdown variables (Tables A B.2 and A B.3) can be found in Annex C.

The dissimilarity index, which is commonly used as a measure of segregation, captures to what extent the distribution of teachers in schools deviates from what would have been observed if they were distributed randomly across schools. It is related to the proportions of teachers of two mutually exclusive groups (e.g. teachers with a master’s degree and teachers without a master’s degree) who have to be reallocated in order to obtain an identical distribution across all schools. Thus, in the context of this report, the dissimilarity index measures whether teachers with certain traits are clustered in a limited number of schools. Clustering arises whenever similar individuals (in this case, teachers with similar characteristics) end up together (in this case, working in the same school). Formally, the dissimilarity index may be written as:

$D=\frac{1}{2}\sum _{j=1}^{J}\left|\frac{{n}_{j}^{a}}{{N}^{a}}-\frac{{n}_{j}^{b}}{{N}^{b}}\right|$, where ${n}_{j}^{a}$ and ${n}_{j}^{b}$ stand for the number of Type $a$ and Type $b$ teachers in school $j$, ${N}^{a}$ and ${N}^{b}$ stand for the number of Type $a$ and Type $b$ teachers in the country, and $J$ is the number of schools.

Thus, this index measures the dissimilarity between the distribution of Type $a$ teachers across schools and the distribution of Type $b$ teachers across schools (OECD, 2019[2]). It may be interpreted as the proportion of one or the other group that has to be displaced in order to achieve evenness (assuming that school size may be adjusted), or as the average proportions of teachers of both group $a$ and group $b$ that have to be reallocated in order to achieve evenness, maintaining equal school size.

The dissimilarity index ranges from 0 (i.e. the allocation of teachers in schools perfectly resembles the teacher population of the country) to 1 (i.e. teachers with a certain characteristic are concentrated in a single school). A high dissimilarity index means that the distribution of teachers with a certain characteristic is very different from what would be observed if they were distributed randomly across schools. Hence, it is an indication of teachers with a certain characteristic being highly concentrated in certain schools. Figure A B.1 shows an example in which teachers may be Type A or Type B. They are distributed across six schools, each with a capacity of six teachers. Complete clustering is observed when all the Type A teachers are in one and only one school. No clustering corresponds to a situation where all schools are equally composed of one Type A teacher and five Type B teachers.
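The two extreme situations in this example can be checked numerically. The sketch below applies the formula for $D$ to six schools of six teachers each; the function name and the data layout (one pair of counts per school) are choices made for this illustration.

```python
def dissimilarity_index(schools: list[tuple[int, int]]) -> float:
    """Dissimilarity index D for a list of (type_a_count, type_b_count) per school."""
    total_a = sum(a for a, _ in schools)
    total_b = sum(b for _, b in schools)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in schools)


# Complete clustering: all six Type A teachers work in a single school.
complete_clustering = [(6, 0)] + [(0, 6)] * 5

# No clustering: every school has one Type A and five Type B teachers.
no_clustering = [(1, 5)] * 6

d_complete = dissimilarity_index(complete_clustering)
d_none = dissimilarity_index(no_clustering)
```

Under complete clustering the index reaches its maximum of 1, and under no clustering it equals 0, matching the range described above.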

By design, the value of the dissimilarity index increases as the overall shares of the two groups in the teacher population become more unbalanced, based on the specific teacher characteristic being analysed. Where the share of teachers with a certain characteristic in the overall teacher population is either very small or very large, the value of the dissimilarity index tends to be high. In the extreme case, when there are more schools than teachers with a certain characteristic in a country, the value of the dissimilarity index is larger than zero, even if these teachers are randomly allocated across schools (OECD, 2019[2]). Thus, comparisons of the dissimilarity index across countries warrant caution, especially when the size of the group of teachers with the characteristic analysed varies considerably across countries.

In addition, the value of the dissimilarity index is also affected by the size of the units (i.e. schools) across which the distribution of individuals is analysed. Notably, if the units’ sizes are small, then the dissimilarity index tends to overestimate the level of deviation from randomness (also known as small-unit bias) (Carrington and Troske, 1997[3]; D’Haultfœuille, Girard and Rathelot, 2021[4]; D’Haultfœuille and Rathelot, 2017[5]). For example, the smaller the schools in terms of the number of teachers teaching in the school, the more likely it is to observe a deviation from the random allocation of teachers with certain characteristics.
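A small simulation can illustrate this small-unit bias. The sketch below allocates the same 1 000 hypothetical teachers (10% of them Type A) at random, once to 200 schools of 5 teachers and once to 10 schools of 100 teachers, and compares the resulting index values. The school sizes, the share and the seed are arbitrary assumptions made for illustration.

```python
import random


def dissimilarity_index(schools: list[tuple[int, int]]) -> float:
    """Dissimilarity index D for a list of (type_a_count, type_b_count) per school."""
    total_a = sum(a for a, _ in schools)
    total_b = sum(b for _, b in schools)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in schools)


def random_allocation(n_schools: int, school_size: int, share_a: float,
                      seed: int) -> list[tuple[int, int]]:
    """Shuffle teachers at random and cut the list into equally sized schools."""
    rng = random.Random(seed)
    n_teachers = n_schools * school_size
    n_a = round(n_teachers * share_a)
    labels = [1] * n_a + [0] * (n_teachers - n_a)  # 1 = Type A, 0 = Type B
    rng.shuffle(labels)
    schools = []
    for j in range(n_schools):
        chunk = labels[j * school_size:(j + 1) * school_size]
        type_a = sum(chunk)
        schools.append((type_a, school_size - type_a))
    return schools


# Same population, purely random allocation, different school sizes.
d_small = dissimilarity_index(random_allocation(200, 5, 0.10, seed=1))
d_large = dissimilarity_index(random_allocation(10, 100, 0.10, seed=1))
```

Although teachers are allocated at random in both cases, the index computed over small schools is expected to be markedly higher than the one computed over large schools.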

Regression analyses were carried out for each country separately. Similarly to other statistics presented in this report, the OECD and TALIS averages refer to the arithmetic mean of country-level estimates, while the EU total is calculated as a weighted arithmetic mean based on the sum of final teacher (TCHWGT) or principal (SCHWGT) weights by country, depending on the target population.

In order to ensure the robustness of the regression models, independent variables were introduced into the models in steps. This approach also required that the models at each step be based on the same sample. The restricted sample used for the different versions of the same model corresponded to the sample of the most extended (i.e. with the maximum number of independent variables) version of the model. Thus, the restricted sample of each regression model excluded those observations where all independent variables had missing values.

Statistics based on multilevel models include variance components (between- and within-school variance), the intra-class correlation derived from these components, and regression coefficients (where this has been indicated). Multilevel models in this report are specified as two-level regression models (the teacher and school levels) and estimated with maximum likelihood estimation.
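As an illustration, a two-level model without explanatory variables could be written as follows. The notation (${y}_{ij}$ for the outcome of teacher $i$ in school $j$, ${u}_{j}$ for the school effect and ${\epsilon }_{ij}$ for the teacher-level residual) is assumed for this sketch and is not taken from the report:

${y}_{ij}={\beta }_{0}+{u}_{j}+{\epsilon }_{ij}$, with ${u}_{j}\sim N\left(0,{\sigma }_{B}^{2}\right)$ and ${\epsilon }_{ij}\sim N\left(0,{\sigma }_{W}^{2}\right)$

In this form, ${\sigma }_{B}^{2}$ is the between-school variance component and ${\sigma }_{W}^{2}$ is the within-school variance component.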

Weights are used at both the teacher and school levels. The purpose of these weights is to account for differences in the probabilities of teachers being selected in the sample. Final teacher weights (TCHWGT) were used as teacher-level sampling weights. Teachers’ within-school weights correspond to final teacher weights, rescaled to amount to the sample size within each school. Final school weights (SCHWGT) were used as school-level sampling weights.
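The rescaling of within-school weights can be sketched as below: each teacher's final weight is multiplied by the number of sampled teachers in the school and divided by the sum of final weights in that school. The function name, record layout and values are hypothetical.

```python
from collections import defaultdict


def rescale_within_school(records: list[tuple[str, float]]) -> list[float]:
    """Rescale teacher weights so that they sum to the sample size in each school.

    `records` holds one (school_id, final_teacher_weight) pair per sampled teacher.
    """
    weight_sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for school_id, weight in records:
        weight_sums[school_id] += weight
        counts[school_id] += 1
    return [
        weight * counts[school_id] / weight_sums[school_id]
        for school_id, weight in records
    ]


# Hypothetical final teacher weights in two schools.
records = [("S1", 10.0), ("S1", 30.0), ("S2", 5.0), ("S2", 5.0), ("S2", 5.0)]
rescaled = rescale_within_school(records)
```

After rescaling, the weights in school S1 sum to 2 and those in school S2 sum to 3, that is, to the number of sampled teachers in each school, while relative differences between teachers within a school are preserved.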

Estimates based on multilevel models depend on how schools are defined and organised within countries and territories and how they are chosen for sampling purposes. Schools may have been defined differently in the TALIS sample, depending on the country/territory. Namely, they can be defined as: administrative units (even if they spanned several geographically separate institutions); as those parts of larger educational institutions that serve students at the ISCED level concerned; as physical school buildings; or, rather, from a management perspective (e.g. establishments having a principal). Annex E of the TALIS 2018 Technical Report includes information on how countries and territories defined schools in their respective systems (OECD, 2019[1]). In particular, the between-school variance estimates can be affected if the variables used for stratification, a process aimed at reducing variation within strata, are associated with between-school differences.

Multilevel logistic models can be viewed as latent-response models (Gelman and Hill, 2007[6]; Goldstein, Browne and Rasbash, 2002[7]; Rabe-Hesketh and Skrondal, 2012[8]). In this report, the observed dichotomous response ${y}_{i}$ (i.e. whether teachers use information and communication technology [ICT] for instruction on a regular basis or not) is assumed to arise from an unobserved or latent continuous response ${y}_{i}^{*}$ that represents the propensity to use ICT for instruction. If this latent response is greater than 0, then the observed response is 1; otherwise, the observed response is 0:

${y}_{i}=\begin{cases}1 & \text{if } {y}_{i}^{*}>0\\ 0 & \text{otherwise}\end{cases}$

Multilevel linear models were estimated using the Stata (version 17.0) “mixed” module, while the multilevel logistic models were estimated with the “melogit” module.

The index of intra-class correlation represents the share of the total variance that lies between clusters (in this case, schools). It is defined and estimated as:

$\rho =\frac{{\sigma }_{B}^{2}}{{\sigma }_{B}^{2}+{\sigma }_{W}^{2}}$

where ${\sigma }_{B}^{2}$ and ${\sigma }_{W}^{2}$, respectively, represent the between- and within-school variance estimates. In the case of multilevel logistic models, the assumed within-school variance component (${\sigma }_{W}^{2}$) is the variance of the standard logistic distribution, that is, ${\pi }^{2}/3\approx 3.29$. Therefore, the index of intra-class correlation is estimated as:

$\rho =\frac{{\sigma }_{B}^{2}}{{\sigma }_{B}^{2}+3.29}$
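The two versions of the intra-class correlation described above can be computed as in the following sketch. The function names are chosen for illustration, and the variance components would in practice come from the fitted multilevel model.

```python
import math

# Variance of the standard logistic distribution, pi^2 / 3 (about 3.29).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc_linear(var_between: float, var_within: float) -> float:
    """Share of the total variance that lies between schools (linear model)."""
    return var_between / (var_between + var_within)


def icc_logistic(var_between: float) -> float:
    """Intra-class correlation for a multilevel logistic model, where the
    within-school variance is fixed at pi^2 / 3."""
    return var_between / (var_between + LOGISTIC_VARIANCE)


# Hypothetical variance components.
rho_linear = icc_linear(var_between=1.0, var_within=3.0)
rho_logistic = icc_logistic(var_between=LOGISTIC_VARIANCE)
```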

For statistics based on multilevel models, such as the estimates of variance components and regression coefficients from two-level regression models, the standard errors are not estimated with the usual replication method, which accounts for stratification and sampling rates from finite populations. Instead, standard errors are “model-based”: their computation assumes that schools, and teachers within schools, are sampled at random (with sampling probabilities reflected in school and teacher weights) from a theoretical, infinite population of schools and teachers, which complies with the model’s parametric assumptions. The standard error for the estimated intra-class correlation is calculated by deriving an approximate distribution for it from the (model-based) standard errors for the variance components, using the delta method.
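One way to apply the delta method to the intra-class correlation is sketched below, working directly with the variance components and their (model-based) standard errors. This is a first-order approximation written for illustration; statistical software may apply the delta method on a transformed scale, so results need not match exactly. The function name and the optional covariance argument are assumptions.

```python
import math


def icc_standard_error(var_between: float, var_within: float,
                       se_between: float, se_within: float,
                       cov_between_within: float = 0.0) -> float:
    """First-order delta-method standard error of
    rho = var_between / (var_between + var_within)."""
    total = var_between + var_within
    # Partial derivatives of rho with respect to each variance component.
    d_between = var_within / total ** 2
    d_within = -var_between / total ** 2
    variance = (
        d_between ** 2 * se_between ** 2
        + d_within ** 2 * se_within ** 2
        + 2.0 * d_between * d_within * cov_between_within
    )
    return math.sqrt(variance)


# Hypothetical variance components and standard errors.
se_rho = icc_standard_error(1.0, 3.0, se_between=0.2, se_within=0.1)
```

For a multilevel logistic model, where the within-school variance is fixed rather than estimated, the same function could be called with `se_within=0.0`.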

Binary logistic regression analysis enables the estimation of the relationship between one or more independent (or explanatory) variables and the dependent (or outcome) variable with two categories. The regression coefficient ($\beta$) of a logistic regression is the estimated increase in the log odds of the outcome per unit increase in the value of the predictor variable.

More formally, let $Y$ be the binary outcome variable indicating no/yes with 0/1, and $p$ be the probability that $Y$ equals 1, so that $p=prob\left(Y=1\right)$. Let ${x}_{1},\dots ,{x}_{k}$ be a set of explanatory variables. Then, the logistic regression of $Y$ on ${x}_{1},\dots ,{x}_{k}$ estimates parameter values for ${\beta }_{0},{\beta }_{1},\dots ,{\beta }_{k}$ via the maximum likelihood method of the following equation:

$Logit\left(p\right)=\mathrm{log}\left(p/\left(1-p\right)\right)={\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{k}{x}_{k}$

Additionally, the exponential function of the regression coefficient (${e}^{\beta }$) is obtained, which is the odds ratio ($OR$) associated with a one-unit increase in the explanatory variable. Then, in terms of probabilities, the equation above is translated into the following:

$p=\frac{{e}^{\left({\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{k}{x}_{k}\right)}}{1+{e}^{\left({\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{k}{x}_{k}\right)}}$
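The link between the coefficient, the predicted probability and the odds ratio can be checked numerically. In the sketch below, the coefficient values are hypothetical; the code evaluates the probability equation above for two values of a single explanatory variable and confirms that the ratio of the resulting odds equals ${e}^{\beta }$.

```python
import math


def predicted_probability(intercept: float, coefs: list[float],
                          xs: list[float]) -> float:
    """p = e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^(-z))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients for a model with one explanatory variable.
beta0, beta1 = -1.0, 0.7

p_at_0 = predicted_probability(beta0, [beta1], [0.0])
p_at_1 = predicted_probability(beta0, [beta1], [1.0])

# Ratio of odds for a one-unit increase in the explanatory variable.
odds_ratio = odds(p_at_1) / odds(p_at_0)
```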

The transformation of log odds ($\beta$) into odds ratios (${e}^{\beta }$; $OR$) makes the data more interpretable in terms of probability. The odds ratio ($OR$) is a measure of the relative likelihood of a particular outcome across two groups. The odds ratio for observing the outcome when an antecedent is present is:

$OR=\frac{{p}_{11}/{p}_{12}}{{p}_{21}/{p}_{22}}$

where ${p}_{11}/{p}_{12}$ represents the “odds” of observing the outcome when the antecedent is present, and ${p}_{21}/{p}_{22}$ represents the “odds” of observing the outcome when the antecedent is not present. Thus, an odds ratio indicates the degree to which an explanatory variable is associated with a categorical outcome variable with two categories (e.g. yes/no) or more than two categories. An odds ratio below one denotes a negative association; an odds ratio above one indicates a positive association; and an odds ratio of one means that there is no association. For instance, if the association between being a female teacher and having chosen teaching as first choice as a career is being analysed, the following odds ratios would be interpreted as:

• 0.2: Female teachers are five times less likely to have chosen teaching as a first choice as a career than male teachers.

• 0.5: Female teachers are half as likely as male teachers to have chosen teaching as a first choice as a career.

• 0.9: Female teachers are 10% less likely to have chosen teaching as a first choice as a career than male teachers.

• 1: Female and male teachers are equally likely to have chosen teaching as a first choice as a career.

• 1.1: Female teachers are 10% more likely to have chosen teaching as a first choice as a career than male teachers.

• 2: Female teachers are twice as likely as male teachers to have chosen teaching as a first choice as a career.

• 5: Female teachers are five times more likely to have chosen teaching as a first choice as a career than male teachers.

The odds ratios in bold indicate that the relative risk/odds ratio is statistically significantly different from 1 at the 95% confidence level. To compute statistical significance around the value of 1 (the null hypothesis), the relative-risk/odds-ratio statistic is assumed to follow a log-normal distribution, rather than a normal distribution, under the null hypothesis.
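Under the log-normal assumption, the test can be carried out on the scale of the coefficient and then transformed back. The sketch below takes a coefficient and its standard error as given (however they were estimated), builds a 95% confidence interval for the odds ratio and checks whether it excludes 1. The function names and the input values are hypothetical.

```python
import math


def odds_ratio_interval(beta: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval for the odds ratio e^beta, built on the log scale."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


def differs_from_one(beta: float, se: float, z: float = 1.96) -> bool:
    """True if the confidence interval for the odds ratio excludes 1."""
    lower, upper = odds_ratio_interval(beta, se, z)
    return not (lower <= 1.0 <= upper)


# Hypothetical coefficients (log odds) with the same standard error.
strong = differs_from_one(beta=math.log(2.0), se=0.2)   # odds ratio of 2
weak = differs_from_one(beta=math.log(1.1), se=0.2)     # odds ratio of 1.1
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why the normal approximation is applied to the coefficient rather than to the odds ratio.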

Binary logistic regressions cannot provide a goodness-of-fit measure that would be equivalent to the R-squared (R²), which represents the proportion of the observed variation in the dependent (or outcome) variable that can be explained by the independent (or explanatory) variables. Unlike linear regressions with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximise the likelihood function of logistic regressions; thus, an iterative process must be used instead. Yet, the goodness-of-fit of binary logistic models can be evaluated by the pseudo-R² (see Note 1). Similarly to the R², the pseudo-R² also ranges from 0 to 1, with higher values indicating better model fit. Nevertheless, pseudo-R² cannot be interpreted as one would interpret the R².
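McFadden’s pseudo-R², the variant applied in this report, compares the log-likelihood of the fitted model with that of a model containing only an intercept. The sketch below computes it for a handful of hypothetical observations and fitted probabilities; it is unweighted and is meant only to show the arithmetic, not the survey-weighted estimation used for the report.

```python
import math


def log_likelihood(y: list[int], p: list[float]) -> float:
    """Bernoulli log-likelihood of outcomes y given fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_pseudo_r2(y: list[int], fitted: list[float]) -> float:
    """1 - lnL(model) / lnL(null), where the null model predicts the sample mean."""
    null_probability = sum(y) / len(y)
    ll_null = log_likelihood(y, [null_probability] * len(y))
    ll_model = log_likelihood(y, fitted)
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities from some logistic model.
y = [1, 1, 0, 0]
fitted = [0.8, 0.7, 0.3, 0.2]

pseudo_r2 = mcfadden_pseudo_r2(y, fitted)
no_information = mcfadden_pseudo_r2(y, [0.5, 0.5, 0.5, 0.5])
```

A model whose fitted probabilities are no better than the sample mean yields a value of 0, while values closer to 1 indicate that the fitted probabilities track the observed outcomes more closely.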

A correlation coefficient measures the strength and direction of the statistical association between two variables. Correlation coefficients vary between -1 and 1; values around 0 indicate a weak association, while the extreme values indicate the strongest possible negative or positive association. The Pearson correlation coefficient (indicated by the letter r) measures the strength and direction of the linear relationship between two variables.

In this report, Pearson correlation coefficients are used to quantify relationships between country/territory-level statistics. With only two variables (x and y), the R-squared measure (indicated by R²) of the linear regression of y on x (or, equivalently, of x on y) is the square of the Pearson correlation coefficient between the two variables.
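The equality between R² and the squared Pearson correlation coefficient can be verified with a short sketch. The data points below are invented, and the functions are written from first principles rather than taken from any particular statistical package.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: list[float], y: list[float]) -> float:
    """R-squared of the ordinary least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_residual / ss_total


# Hypothetical country-level statistics.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r2_y_on_x = r_squared(x, y)
r2_x_on_y = r_squared(y, x)
```

Both regressions give the same R², which equals the square of `r`, up to floating-point rounding.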

## References

[3] Carrington, W. and K. Troske (1997), “On measuring segregation in samples with small units”, Journal of Business & Economic Statistics, Vol. 15/4, p. 402, https://doi.org/10.2307/1392486.

[4] D’Haultfœuille, X., L. Girard and R. Rathelot (2021), “segregsmall: A command to estimate segregation in the presence of small units”, The Stata Journal, Vol. 21/1, pp. 152-179, https://doi.org/10.1177/1536867X211000018.

[5] D’Haultfœuille, X. and R. Rathelot (2017), “Measuring segregation on small units: A partial identification analysis”, Quantitative Economics: Journal of the Econometric Society, Vol. 8/1, pp. 39-73, https://doi.org/10.3982/QE501.

[6] Gelman, A. and J. Hill (2007), Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, Cambridge, https://doi.org/10.1017/CBO9780511790942.

[7] Goldstein, H., W. Browne and J. Rasbash (2002), “Partitioning variation in multilevel models”, Understanding Statistics, Vol. 1/4, pp. 223-231, https://doi.org/10.1207/S15328031US0104_02.

[2] OECD (2019), Balancing School Choice and Equity: An International Perspective Based on PISA, PISA, OECD Publishing, Paris, https://dx.doi.org/10.1787/2592c974-en.

[1] OECD (2019), TALIS 2018 Technical Report, OECD, Paris, http://www.oecd.org/education/talis/TALIS_2018_Technical_Report.pdf.

[8] Rabe-Hesketh, S. and A. Skrondal (2012), Multilevel and Longitudinal Modeling Using Stata, Third Edition, Volume II: Categorical Responses, Counts, and Survival, Stata Press, Stockholm.

## Note

1. Among the various types of pseudo-R², this report applies McFadden’s pseudo-R².