# Annex B. Technical notes on analyses in this report

The statistics presented in this report were derived from data obtained through samples of schools, school principals and teachers. The samples were collected following a stratified two-stage probability sampling design. This means that teachers (second-stage or secondary sampling units) were randomly selected from the list of in-scope teachers in each of the randomly selected schools (first-stage or primary sampling units). For these statistics to be meaningful for a country/economy, they need to reflect the whole population from which they were drawn and not merely the sample used to collect them. Thus, survey weights must be used to obtain design-unbiased estimates of population or model parameters.

Final weights allow the production of country/economy-level estimates from the observed sample data. The estimation weight indicates how many population units are represented by a sampled unit. The final weight is the combination of many factors reflecting the probabilities of selection at the various stages of sampling and the response obtained at each stage. Other factors may also come into play as dictated by special conditions to maintain the unbiasedness of the estimates (e.g. adjustment for teachers working in more than one school).

Statistics presented in this report that are based on the responses of school principals and that contribute to estimates related to school principals were estimated using school weights (SCHWGT). Results based only on responses of teachers or on responses of teachers and principals (i.e. responses from school principals were merged with teachers’ responses) were weighted by teacher weights (TCHWGT).
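As a minimal sketch of how final weights enter an estimate, the Python snippet below computes a weighted share and an estimated population size. The records and weight values are hypothetical; only the weight names TCHWGT and SCHWGT come from the text above, and the helper names are illustrative.

```python
# Minimal sketch: design-weighted estimates from final weights.
# All records and weights are hypothetical. In TALIS data, teacher-level
# estimates would use TCHWGT and principal/school-level estimates SCHWGT.


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def estimated_population_size(weights):
    """Each final weight is the number of population units a respondent represents."""
    return sum(weights)


# Hypothetical teacher records: (answer coded 1 = yes / 0 = no, final weight)
teachers = [(1, 120.0), (0, 80.0), (1, 40.0), (0, 160.0)]
answers = [y for y, _ in teachers]
tchwgt = [w for _, w in teachers]

share_yes = weighted_mean(answers, tchwgt)       # (120 + 40) / 400 = 0.4
n_teachers = estimated_population_size(tchwgt)   # 400.0 teachers represented
unweighted_share = sum(answers) / len(answers)   # 0.5, ignores the sampling design
```

The gap between the weighted (0.4) and unweighted (0.5) shares illustrates why unweighted sample statistics can misrepresent the population.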

In this report, several scale indices are used in regression analyses. Descriptions of the construction and validation of these scales can be found in Chapter 11 of the TALIS 2018 Technical Report (OECD, 2019[1]).

The Teaching and Learning International Survey (TALIS) averages, which were calculated for most indicators presented in this report, correspond to the arithmetic mean of the respective country/economy estimates at each International Standard Classification of Education (ISCED) level. When the statistics are based on responses of teachers and principals in primary education, the TALIS average covers 13 countries and economies (Table A B.1). Although 15 countries and economies took part in the study at this level, the data for Australia and the Netherlands were not adjudicated, so these two countries were not included in the TALIS average. For analyses based on responses of teachers and school leaders in upper secondary education, the TALIS average covers 11 countries and economies. Finally, a TALIS average for lower secondary education was estimated for the countries and economies participating in primary education and for those participating in upper secondary education.
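The TALIS average is a simple, unweighted mean of system-level estimates, so each country/economy counts equally regardless of the size of its teacher population. A small sketch, with hypothetical country codes and values, showing how systems whose data were not adjudicated are left out:

```python
# Minimal sketch of a TALIS-style average: the unweighted arithmetic mean of
# country/economy estimates. Codes and values are hypothetical placeholders.

country_estimates = {"AAA": 62.0, "BBB": 55.0, "CCC": 71.0, "DDD": 48.0}
not_adjudicated = {"DDD"}  # took part, but data not adjudicated (hypothetical)


def talis_average(estimates, excluded):
    """Mean of the estimates of all systems not listed in `excluded`."""
    included = [value for code, value in estimates.items() if code not in excluded]
    return sum(included) / len(included)


average = talis_average(country_estimates, not_adjudicated)  # (62 + 55 + 71) / 3
```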

The statistics in this report represent estimates based on samples of teachers and principals, rather than values that could be calculated if every teacher and principal in every country/economy had answered every question. Consequently, it is important to measure the degree of uncertainty of the estimates. In TALIS, each estimate has an associated degree of uncertainty that is expressed through a standard error. Confidence intervals provide a way to make inferences about population means and proportions in a manner that reflects the uncertainty associated with the sample estimates. Assuming a normal distribution, a 95% confidence interval is obtained as the observed sample statistic plus or minus 1.96 standard errors; an interval constructed in this way would contain the corresponding population value in 95 out of 100 replications of the measurement on different samples drawn from the same population. The reported standard errors were computed with a balanced repeated replication (BRR) methodology.
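The sketch below illustrates how a BRR standard error and a normal-approximation 95% confidence interval can be computed once the statistic has been re-estimated with each set of replicate weights. It assumes Fay's variant of BRR with a Fay factor of 0.5; the number of replicates, the Fay factor and all values are illustrative assumptions, and the TALIS 2018 Technical Report should be consulted for the actual replication settings.

```python
import math


def brr_standard_error(full_estimate, replicate_estimates, fay_k=0.5):
    """Standard error from balanced repeated replication with Fay's adjustment.

    Var = 1 / (G * (1 - k)^2) * sum_g (theta_g - theta)^2, where G is the number
    of replicates and k the Fay factor (k = 0 gives classic BRR). Each
    theta_g is the same statistic recomputed with one set of replicate weights.
    """
    g = len(replicate_estimates)
    squared = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(squared / (g * (1.0 - fay_k) ** 2))


def confidence_interval_95(estimate, se):
    """Normal-approximation 95% interval: estimate +/- 1.96 * SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical full-sample estimate and four replicate estimates
theta = 50.0
replicates = [51.0, 49.0, 50.5, 49.5]
se = brr_standard_error(theta, replicates)   # sqrt(2.5 / (4 * 0.25))
low, high = confidence_interval_95(theta, se)
```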

Differences between sub-groups along teacher characteristics (e.g. female teachers and male teachers), school characteristics (e.g. schools with a high concentration of students from socio-economically disadvantaged homes and schools with a low concentration of students from socio-economically disadvantaged homes) and across educational levels (e.g. teachers in primary education and teachers in lower secondary education) were tested for statistical significance. All differences marked in bold in the data tables of this report are statistically significantly different from 0 at the 95% confidence level.

In the case of differences between sub-groups at the same level, the standard error is calculated by taking into account that the two subsamples are not independent. As a result, the covariance between the two estimates might differ from 0; when it is positive, the estimated standard error of the difference is smaller than the one calculated for the difference between independent subsamples. In the case of differences between educational levels, the standard error is calculated by considering that the two subsamples are independent.
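A minimal sketch of the two cases, assuming Fay's BRR with a Fay factor of 0.5 and four replicates, with invented values. For sub-groups drawn from the same sample, the difference itself is recomputed in every replicate, which automatically accounts for the covariance; for independent samples, the two standard errors are combined as if the covariance were 0. Function names are hypothetical.

```python
import math


def brr_se(full, reps, fay_k=0.5):
    """BRR standard error with Fay's adjustment (see the formula in the docstring below)."""
    g = len(reps)
    return math.sqrt(sum((r - full) ** 2 for r in reps) / (g * (1 - fay_k) ** 2))


def se_difference_same_sample(full_a, reps_a, full_b, reps_b, fay_k=0.5):
    """Sub-groups of the same sample: replicate the difference itself,
    so the covariance between the two estimates is taken into account."""
    diff_reps = [a - b for a, b in zip(reps_a, reps_b)]
    return brr_se(full_a - full_b, diff_reps, fay_k)


def se_difference_independent(se_a, se_b):
    """Independent samples (e.g. two ISCED levels): covariance assumed to be 0."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical estimates for female (a) and male (b) teachers in one country;
# their replicate estimates move together, i.e. the covariance is positive
a, reps_a = 60.0, [61.0, 59.0, 60.5, 59.5]
b, reps_b = 55.0, [55.8, 54.2, 55.4, 54.6]
se_a, se_b = brr_se(a, reps_a), brr_se(b, reps_b)
dependent = se_difference_same_sample(a, reps_a, b, reps_b)   # about 0.32
independent = se_difference_independent(se_a, se_b)           # about 2.02
```

Treating these dependent sub-groups as independent would overstate the standard error of the difference roughly sixfold in this invented example.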

Regression analysis was conducted to explore the relationships between different variables. Multiple linear regression was used in those cases where the dependent (or outcome) variable was considered continuous. Binary logistic regression was employed when the dependent (or outcome) variable was a binary categorical variable. Regression analyses were carried out for each country/economy separately. As with other statistics presented in this report, the TALIS averages refer to the arithmetic mean of country/economy-level estimates.

Control variables included in a regression model are selected based on theoretical reasoning and, preferably, limited to the most objective measures or those that do not change over time. Controls for teacher characteristics include: teacher’s gender, age, employment status (i.e. full-time/part-time) and years of teaching experience. Controls for class characteristics include: variables of classroom composition (i.e. share of students whose first language is different from the language of instruction, low academic achievers, students with special needs, students with behavioural problems, students from socio-economically disadvantaged homes, academically gifted students, immigrant students or students with an immigrant background, refugee students) and class size.

In the case of regression models based on multiple linear regression, the explanatory power of the regression models is also highlighted by reporting the R-squared (R²), which represents the proportion of the observed variation in the dependent (or outcome) variable that can be explained by the independent (or explanatory) variables.

Multiple linear regression analysis provides insights into how the value of the continuous dependent (or outcome) variable changes when any one of the independent (or explanatory) variables varies while all other independent variables are held constant. In general, and with everything else held constant, a one-unit increase in the independent variable ($x_i$) is associated with an average change in the dependent variable ($Y$) equal to the regression coefficient ($\beta_i$), expressed in the units of $Y$:

$Y={\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{i}{x}_{i}+\epsilon$
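To make the model above concrete, here is a dependency-free sketch of weighted least squares that returns $\beta_0, \dots, \beta_k$ together with the weighted R². It is an illustration only: the function names and data are hypothetical, the weights stand in for final survey weights such as TCHWGT, and standard errors (which in TALIS come from BRR) are not computed.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_linear_regression(rows, y, w):
    """Weighted least squares for Y = b0 + b1*x1 + ... + bk*xk + e.

    Returns the coefficients [b0, ..., bk] and the weighted R-squared.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w)) for j in range(p)]
    beta = solve(xtwx, xtwy)  # normal equations: (X'WX) b = X'Wy
    fitted = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
    y_bar = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, fitted, w))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for yi, wi in zip(y, w))
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical data: two predictors per teacher and an exactly linear outcome
rows = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5)]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]
weights = [1.0, 2.0, 1.0, 3.0, 1.0]
beta, r_squared = weighted_linear_regression(rows, y, weights)
```

Because the example outcome is an exact linear function of the two predictors, the sketch recovers the coefficients 2, 3 and -1 and an R² of 1 (up to rounding); with real survey responses, R² will be below 1.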

When interpreting multiple regression coefficients, it is important to keep in mind that each coefficient is influenced by the other independent variables in the regression model. The extent of this influence depends on how strongly the independent variables are correlated with one another. Therefore, a regression coefficient does not capture the total effect of an independent variable on the dependent variable. Rather, each coefficient represents the additional effect of adding that variable to the model, given that the effects of all other variables in the model are already accounted for. It is also important to note that, because cross-sectional survey data are used in these analyses, no causal conclusions can be drawn.

Regression coefficients in bold in the data tables presenting the results of regression analysis are statistically significantly different from 0 at the 95% confidence level.

Binary logistic regression analysis enables the estimation of the relationship between one or more independent (or explanatory) variables and a dependent (or outcome) variable with two categories. The regression coefficient ($\beta$) of a logistic regression is the estimated change in the log odds of the outcome per one-unit increase in the value of the predictor variable.

More formally, let $Y$ be the binary outcome variable indicating no/yes with 0/1, and let $p$ be the probability of $Y$ being 1, so that $p=prob\left(Y=1\right)$. Let ${x}_{1},\dots ,{x}_{k}$ be a set of explanatory variables. Then, the logistic regression of $Y$ on ${x}_{1},\dots ,{x}_{k}$ estimates the parameter values ${\beta }_{0},\dots ,{\beta }_{k}$ of the following equation via the maximum likelihood method:

$Logit\left(p\right)=\mathrm{log}\left(p/\left(1-p\right)\right)={\beta }_{0}+{\beta }_{1}{x}_{1}+\dots +{\beta }_{k}{x}_{k}$

Additionally, the exponential function of the regression coefficient (${e}^{\beta }$) is obtained, which is the odds ratio ($OR$) associated with a one-unit increase in the explanatory variable. Then, in terms of probabilities, the equation above is translated into the following:

$p=\frac{e^{\beta_0+\beta_1x_1+\dots+\beta_kx_k}}{1+e^{\beta_0+\beta_1x_1+\dots+\beta_kx_k}}$
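The maximum likelihood estimation described above can be sketched with a few Newton-Raphson iterations. This dependency-free version is an illustration under stated assumptions (hypothetical data and function names, optional weights standing in for survey weights, no standard errors); it is not the software used for the report. With a single binary predictor, the estimated $e^{\beta_1}$ equals the odds ratio of the corresponding 2×2 table, which links the model to the odds ratio discussed below.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def logistic(z):
    """Inverse of the logit: p = e^z / (1 + e^z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, y, w=None, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates of [b0, ..., bk] by Newton-Raphson."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    if w is None:
        w = [1.0] * len(y)
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        probs = [logistic(sum(b * x for b, x in zip(beta, xi))) for xi in X]
        # Score vector X'W(y - p) and information matrix X'W diag(p(1 - p)) X
        score = [sum(wi * (yi - pi) * xi[j] for xi, yi, pi, wi in zip(X, y, probs, w))
                 for j in range(p)]
        info = [[sum(wi * pi * (1 - pi) * xi[j] * xi[k] for xi, pi, wi in zip(X, probs, w))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: x = 1 for female teachers, y = 1 if teaching was the first career choice
rows = [(0,)] * 5 + [(1,)] * 6
y = [1, 1, 0, 0, 0] + [1, 1, 1, 1, 0, 0]
beta = fit_logistic(rows, y)
odds_ratio = math.exp(beta[1])          # (4/2) / (2/3) = 3, up to rounding
p_female = logistic(beta[0] + beta[1])  # 4/6, the observed share among x = 1
```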

The transformation of log odds ($\beta$) into odds ratios (${e}^{\beta }$; $OR$) makes the results easier to interpret. The odds ratio ($OR$) is a measure of the relative likelihood of a particular outcome across two groups, expressed as the ratio of the odds of the outcome in each group. The odds ratio for observing the outcome when an antecedent is present is:

$OR=\frac{{p}_{11}/{p}_{12}}{{p}_{21}/{p}_{22}}$

where ${p}_{11}$ and ${p}_{12}$ are the probabilities of observing and not observing the outcome when the antecedent is present, so that ${p}_{11}/{p}_{12}$ represents the “odds” of observing the outcome when the antecedent is present, and ${p}_{21}/{p}_{22}$ represents the corresponding “odds” when the antecedent is not present. Thus, an odds ratio indicates the degree to which an explanatory variable is associated with a categorical outcome variable with two categories (e.g. yes/no) or more than two categories. An odds ratio below one denotes a negative association; an odds ratio above one indicates a positive association; and an odds ratio of one means that there is no association. For instance, if the association between being a female teacher and having chosen teaching as the first choice of career is being analysed, the following odds ratios would be interpreted as:

• 0.2: Female teachers’ odds of having chosen teaching as their first choice of career are one-fifth of those of male teachers.

• 0.5: Female teachers’ odds of having chosen teaching as their first choice of career are half those of male teachers.

• 0.9: Female teachers’ odds of having chosen teaching as their first choice of career are 10% lower than those of male teachers.

• 1: Female and male teachers have equal odds of having chosen teaching as their first choice of career.

• 1.1: Female teachers’ odds of having chosen teaching as their first choice of career are 10% higher than those of male teachers.

• 2: Female teachers’ odds of having chosen teaching as their first choice of career are twice those of male teachers.

• 5: Female teachers’ odds of having chosen teaching as their first choice of career are five times those of male teachers.
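The odds ratio formula can be checked on a hypothetical 2×2 table. The counts below are invented for illustration. They also show why odds ratios should be read in terms of odds: when the outcome is common, the odds ratio can be far from the ratio of the underlying probabilities.

```python
def odds_ratio(p11, p12, p21, p22):
    """OR = (p11 / p12) / (p21 / p22).

    Row 1 (p11, p12): antecedent present, outcome observed / not observed.
    Row 2 (p21, p22): antecedent absent, outcome observed / not observed.
    Counts or proportions both work, because the common scale cancels out.
    """
    return (p11 / p12) / (p21 / p22)


# Hypothetical counts: chose teaching as first career choice (yes / no)
female_yes, female_no = 600, 200  # odds 3.0, probability 0.75
male_yes, male_no = 300, 300      # odds 1.0, probability 0.50

or_female = odds_ratio(female_yes, female_no, male_yes, male_no)
prob_ratio = (female_yes / (female_yes + female_no)) / (male_yes / (male_yes + male_no))
```

Here female teachers’ odds (3.0) are three times those of male teachers (1.0), while their probability (0.75 versus 0.50) is only 1.5 times as high. The two measures are close mainly when the outcome is rare in both groups.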

The odds ratios in bold indicate that the relative risk/odds ratio is statistically significantly different from 1 at the 95% confidence level. To compute statistical significance around the value of 1 (the null hypothesis), the relative-risk/odds-ratio statistic is assumed to follow a log-normal distribution, rather than a normal distribution, under the null hypothesis.
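A sketch of the significance test described above, assuming the logistic coefficient $\beta=\log(OR)$ and its standard error (e.g. from BRR) are already available. The test is carried out on the log scale, where a normal approximation is reasonable, and the confidence limits are then exponentiated, which makes the interval asymmetric around the odds ratio. The function name and values are hypothetical.

```python
import math


def odds_ratio_test(log_or, se_log_or, z_crit=1.96):
    """Test H0: OR = 1 (i.e. log OR = 0), treating log(OR) as approximately normal.

    Returns the odds ratio, its back-transformed 95% confidence interval and
    whether the interval excludes 1 (equivalently, |z| > 1.96).
    """
    odds_ratio = math.exp(log_or)
    low = math.exp(log_or - z_crit * se_log_or)
    high = math.exp(log_or + z_crit * se_log_or)
    significant = abs(log_or / se_log_or) > z_crit
    return odds_ratio, (low, high), significant


# Hypothetical coefficient beta = log(1.5) with a standard error of 0.15
or_value, ci, sig = odds_ratio_test(log_or=math.log(1.5), se_log_or=0.15)
```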

The main goal of this report was to identify the variables displaying significant differences across educational levels and to attempt to understand which factors could explain these differences. In particular, differences in teacher and school characteristics across educational levels are used to understand the differences in certain outcomes, such as teachers’ perceptions of feeling valued in society, the amount of time devoted to instruction, the level of need for professional development and teachers’ overall well-being (see Boxes 2.1, 3.2, 4.5 and 6.3).

The method chosen to conduct these explorations was the Blinder-Oaxaca decomposition. It was originally developed to study differences in labour market outcomes across groups, such as those defined by gender or race, but it can be used to investigate any group differences in outcomes. Starting with a set of relevant characteristics that differ across groups, the methodology can be applied to divide the group differences in outcomes into a portion explained by group characteristics and a residual or unexplained component (OECD, 2018, p. 181[2]). In general, Blinder-Oaxaca decompositions analyse the difference in the average of a teacher outcome ($Y$) between two groups (P [primary education] and LS [lower secondary education]) to determine the share that is due to differences in a set of observable predictors ($S$) and the share left unexplained.

Based on linear models estimated separately for each group ($Y=\alpha+\beta S+\varepsilon$, where $\alpha$ is the intercept and $\beta$ the vector of coefficients on the predictors $S$), the difference in the mean outcome can be expressed as:

$\bar{Y}_{P}-\bar{Y}_{LS}=\left(\alpha_{P}+\beta_{P}\bar{S}_{P}\right)-\left(\alpha_{LS}+\beta_{LS}\bar{S}_{LS}\right)$

This difference can be divided into two components using a “twofold” decomposition. The first component is:

$\beta_{P}\left(\bar{S}_{P}-\bar{S}_{LS}\right)$

The equation above represents the “explained effects”, which reflect the extent to which the differences in the outcomes are explained by the mean differences in a series of teacher and school characteristics between primary and lower secondary education. The second component is:

$\left(\alpha_{P}-\alpha_{LS}\right)+\left(\beta_{P}-\beta_{LS}\right)\bar{S}_{LS}$

The equation above represents the “unexplained effects”, which could capture differences in the returns to the teacher and school characteristics included in the model (i.e. differences in the coefficients), along with differences in unobservable components. Results for this residual term are presented in the graphs for the sake of clarity, but they are not commented on because their interpretation is cumbersome.
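The twofold decomposition can be sketched as follows: fit a weighted linear model separately for primary (P) and lower secondary (LS) teachers, then combine the coefficients with the group means of the predictors. This is a hypothetical, dependency-free illustration; the data, weights, predictor choice and function names are invented, and the report’s own implementation (including how standard errors were obtained) may differ.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(rows, y, w):
    """Weighted least squares with intercept; returns (alpha, [beta_1, ..., beta_k])."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w)) for j in range(p)]
    coef = solve(xtwx, xtwy)
    return coef[0], coef[1:]


def weighted_means(rows, w):
    """Weighted mean of each predictor column (S-bar)."""
    return [sum(wi * r[j] for r, wi in zip(rows, w)) / sum(w) for j in range(len(rows[0]))]


def oaxaca_twofold(rows_p, y_p, w_p, rows_ls, y_ls, w_ls):
    """Explained = beta_P (S_P - S_LS); unexplained = (a_P - a_LS) + (beta_P - beta_LS) S_LS."""
    a_p, b_p = wls(rows_p, y_p, w_p)
    a_ls, b_ls = wls(rows_ls, y_ls, w_ls)
    s_p, s_ls = weighted_means(rows_p, w_p), weighted_means(rows_ls, w_ls)
    explained = sum(b * (sp - sl) for b, sp, sl in zip(b_p, s_p, s_ls))
    unexplained = (a_p - a_ls) + sum((bp - bl) * sl for bp, bl, sl in zip(b_p, b_ls, s_ls))
    return explained, unexplained


# Hypothetical predictors per teacher: (years of experience, class size)
rows_p = [(5, 20), (10, 22), (15, 18), (20, 25), (8, 19), (12, 24)]
y_p = [3.1, 3.4, 3.0, 3.6, 2.9, 3.5]
w_p = [1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
rows_ls = [(6, 28), (14, 30), (22, 26), (9, 32), (18, 27), (25, 29)]
y_ls = [2.8, 3.0, 2.7, 3.1, 2.6, 2.9]
w_ls = [1.0] * 6

explained, unexplained = oaxaca_twofold(rows_p, y_p, w_p, rows_ls, y_ls, w_ls)
```

Because each model includes an intercept, the weighted mean of the residuals is zero within each group, so the explained and unexplained components add up exactly (up to rounding) to the difference in weighted mean outcomes.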

A correlation coefficient measures the strength and direction of the statistical association between two variables. Correlation coefficients range from -1 to 1; values close to 0 indicate a weak association, while the extreme values indicate the strongest possible negative or positive association. The Pearson correlation coefficient (denoted by the letter r) measures the strength and direction of the linear relationship between two variables.

In this report, Pearson correlation coefficients are used to quantify relationships between country/economy-level statistics.
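A minimal sketch of the Pearson coefficient as used here, with one value per country/economy and each system counted equally. The two series are hypothetical placeholders, not TALIS results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical country/economy-level statistics (one value per system)
share_valued = [20.0, 35.0, 28.0, 45.0, 31.0]
satisfaction = [85.0, 91.0, 88.0, 95.0, 89.0]
r = pearson_r(share_valued, satisfaction)
```

On Python 3.10 or later, `statistics.correlation(x, y)` from the standard library returns the same coefficient.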

The classification of levels of education is based on the International Standard Classification of Education (ISCED). ISCED is an instrument for compiling internationally comparable statistics on education. ISCED-97 was revised, and the new classification (ISCED 2011), formally adopted in November 2011, is now the basis of the levels presented in this publication (UNESCO-UIS, 2012[3]; OECD, 2018[4]). It distinguishes between nine levels of education.

Although the ISCED classification seeks to provide a common framework to make data internationally comparable, education systems are multi-layered and complex, and some nuances may still escape analysis. In other words, educational programmes, even at the same educational level, can vary greatly across countries/economies, and any analysis of the data should take this into consideration. To provide additional country/economy-level information that helps contextualise the results and adds nuance to national comparisons, national-level information was collected from TALIS 2018 participants in primary and upper secondary education. The data come for the most part from administrative sources taken from other OECD publications, such as Education at a Glance and publications related to the Programme for International Student Assessment (PISA). All data are displayed in Tables A B.3, A B.4 and A B.5.

## References

[1] OECD (2019), TALIS 2018 Technical Report, OECD, Paris, http://www.oecd.org/education/talis/TALIS_2018_Technical_Report.pdf.

[4] OECD (2018), OECD Handbook for Internationally Comparative Education Statistics 2018: Concepts, Standards, Definitions and Classifications, OECD Publishing, Paris, https://dx.doi.org/10.1787/9789264304444-en.

[2] OECD (2018), The Resilience of Students with an Immigrant Background: Factors that Shape Well-being, OECD Reviews of Migrant Education, OECD Publishing, Paris, https://dx.doi.org/10.1787/9789264292093-en.

[3] UNESCO-UIS (2012), International Standard Classification of Education: ISCED 2011, UNESCO Institute for Statistics, Montreal, http://uis.unesco.org/sites/default/files/documents/international-standard-classification-of-education-isced-2011-en.pdf.