2. Understanding and measuring mathematics teaching practice

V. Darleen Opfer
Courtney A. Bell
Eckhard Klieme
Daniel F. McCaffrey
Jonathan Schweig
Brian M. Stecher

As described in Chapter 1, understanding teaching quality is a complex undertaking. Teaching is heavily influenced by context due to differences in curriculum content, school, student background, preferences for pedagogical methods and instructional goals. Previous single-method, large-scale international studies have been limited in their ability to understand teaching and how it varies within and between countries and economies. In particular, it has been difficult to understand how these variations are related to differences in student outcomes.

Researchers have long recognised the potential of video observation, especially when paired with other measures, to deepen understanding of what teaching looks like and how the practices and interactions being used are associated with student outcomes. The Study uses video observation, along with multiple other measures, to provide policymakers and practitioners with information about the practices employed in their countries and economies. It uses unique methods that build on previous video and large-scale studies.

The remaining sections of this chapter describe the methods used to understand teaching in participating countries/economies. They then explain how the samples of teachers and classrooms were obtained and how the Study was fielded. The chapter concludes with a discussion of the limitations of the Study that should be taken into account when considering the implications of the results.

The Study has unique aspects that set it apart from other international studies of teaching. As set out in Chapter 1, these design features enhance its potential to make a significant contribution to the field. Notable methodological features include:

  • Common Measurement Tools: The Study developed and fielded common sets of measures for analysing video-recorded teaching practices and classroom teaching materials. Unlike many studies of teaching and learning, the protocols developed for the Study draw on multiple perspectives from participating countries and available research literature regarding which constructs are worth measuring and what quality looks like on those constructs.

  • Common Focal Topic: The Study focused on the teaching and learning of a single common secondary mathematics topic across all countries and economies. The targeted focus on a common topic supports the understanding of classroom teaching and learning practices across countries/economies by removing differences resulting from variations in the content matter. For example, if the Study did not narrow to a focal topic and two countries had different relationships between questioning practices and students' interest in mathematics, it would be impossible to rule out the hypothesis that different subject matter (not questioning) shaped students' different levels of interest. After mapping mathematics curricula, quadratic equations was chosen as the focal topic because it is taught in all the participating countries/economies at a similar age and grade level, has a well-defined starting point that introduces a core mathematical concept, and allows for rich mathematical activities (e.g. application, modelling and transfer), for deep mathematical thinking (e.g. argumentation, proof) and for working with different representations (see the Global Teaching InSights Technical Report, hereafter referred to as the Technical Report).

  • Pre/Post Design: The Study captured student outcome measures before and after students' learning of the focal content, which allowed analyses to control for students' prior knowledge and dispositions. 

  • Standardised Procedures: The Study used standardised and replicable procedures for data collection, training and certifying video observers and teaching materials raters, and for coding videos and teaching materials in every country/economy. These standard procedures are important because, in studies without such standardisation, it can be challenging to determine whether differences across countries are real or merely the result of variation in implementation.

The aim was to recruit 85 teachers and their students in each participating country or economy. This number was determined through power analyses informed by existing research to be able to detect a relationship between teaching practices and student outcomes. All data related to that teacher and his or her teaching were collected from one class or section of students. The following are the types of data that were collected:

  • Video-Recorded Lessons: Two lessons focused on quadratic equations were video-recorded for each teacher. Following data collection, these video-recorded lessons were scored using the Study’s observation codes (see below) by two observers. Video-recorded lessons were scored by observers in the same country/economy as the teacher.

  • Teaching Materials: Classroom artefacts or instructional materials (hereafter referred to as teaching materials) from the unit on quadratic equations were collected from each lesson that was video-recorded as well as the next day's lesson on quadratic equations. The teaching materials collected included lesson plans, visual aids, handouts, textbook pages and student assignments from each of these lessons, as well as the subsequent local examination that included quadratic equations. The teaching materials collected for a lesson were considered a set. Each set was coded by two raters.

  • Student Achievement Pre- and Post-Test: Achievement tests were constructed using items submitted by countries and economies. Using these items, common tests were developed and administered in all participating countries/economies. Detailed information on test construction and item function can be found in the Technical Report. The tests were administered to all students in the selected class or section for each participating teacher. The pre-test was to be conducted within two weeks before the start of the quadratic equation unit. The post-test was to be conducted within two weeks of the conclusion of the quadratic equation unit. The pre-test covered general mathematical topics and the post-test was specific to the domain of quadratic equations to provide a more precise estimate of the knowledge and understanding of the subject.

  • Student Pre- and Post-Questionnaire: Students in the targeted classroom or section completed two questionnaires: one before the quadratic equation unit and one after. These questionnaires covered information on student background, lesson context, teaching and learning processes, and student dispositions (sometimes labelled non-cognitive outcomes of student learning). Measures included family wealth, learning time both within and out of school, students' perception of and participation in classroom activities, and their self-efficacy beliefs related to mathematics. The questionnaires were developed primarily using items from TALIS and PISA, with input from participating countries and economies. Detailed information about questionnaire development is available in the Technical Report.

  • Teacher Pre- and Post-Questionnaire: The teacher questionnaires covered teacher background and education, teachers' beliefs, teachers' motivation, teachers' perception of the school environment, the selected class and the selected unit, including lesson goals, teaching practices used, teachers' judgement of the effectiveness of the unit, and whether the video-recorded lessons were representative of typical instruction. Items for the teacher questionnaires were primarily drawn from TALIS with input from the Technical Advisory Group and countries/economies. Information about the development of the teacher questionnaires can be found in the Technical Report.

  • Teaching Log: Attached to the teacher's pre-questionnaire was a teaching log. Teachers were instructed to keep the log and complete it each time they taught quadratic equations throughout the unit. The teaching log was used to document the number and length of lessons in the quadratic equation unit, and the mathematics content covered during each lesson in the unit.

In preparation for the development of the Study measurement instruments, a conceptualisation of quality mathematics teaching was developed, resulting from the integration of three bodies of knowledge: country and economy conceptualisations, TALIS and PISA frameworks, and a review of international literature on teaching. At a broad level, these multiple views of quality teaching were similar; however, at the detailed level, they regularly emphasised different practices and defined those practices somewhat differently. In general, these differences were resolved through iterative discussions with participating countries/economies and technical experts until a set of six domains that captured teaching in ways that aligned with all three bodies of knowledge was agreed upon.

Findings presented in this report are organised by three broad analytic domains of teaching: classroom management, social-emotional support and instruction. Instruction is comprised of four subdomains: discourse, quality of subject matter, student cognitive engagement, and assessment of and responses to student understanding (see the Technical Report for additional details about the creation of the analytic domains). While survey items, mostly adapted from PISA and TALIS, could be used to measure aspects of each domain in the student and teacher questionnaires, new coding rubrics needed to be developed and implemented for observations and teaching materials.

Each analytic domain is further operationalised in the observation codes into components and indicators depending on whether and how the valued teaching practices could be seen and evaluated by an observer (Table 2.1). The codes captured behaviours that were observable during lessons and about which observers could make inferences without significant additional information from other sources (e.g. an interview with the teacher or the entire quadratic equations unit plan). The domains, indicators and components are described in greater detail in the Technical Report.

Unlike the video observations of classroom lessons, which offer insight into all six domains, classroom teaching materials offer a more limited profile of teaching. A pilot sample of nearly 1 000 teaching materials provided by the participating countries/economies was reviewed to determine which of the domains could potentially be measured in teaching materials. The review yielded information about the qualities and characteristics of the teaching materials likely to be available during the Main Study. The review also reinforced the general conception that teaching materials, which are static, are not well suited to providing information about real-time interactions among teachers and students during a lesson. As a consequence of the review, teaching material rating was narrowed to a subset of four domains of teaching. The logic of this decision is summarised in Table 2.2.

Specific high-level design principles guided iterative rounds of evidence-centred code development to ensure the Study observation and teaching material codes could be meaningfully applied across participating countries and economies. Observation and teaching material scores were developed to support the following claim: in the lesson, a specific teaching practice was present, or was of a given quality or nature. For example, one such claim might be "In lesson X, there was a moderate level of questioning that requires students to explain their thinking." It is important to note that the foundational claim was not that certain practices were better than other practices, but rather that scores conveyed the quality, nature or presence of a specific practice.

This very basic claim – that codes specified the quality, nature or presence of a specific practice – required four design principles. Codes were collaboratively developed to reflect global conceptions of teaching quality and to capture variation in teaching within the eight participating countries/economies. The understanding and application of the codes also had to be scalable so that they could be used in a train-the-trainer model of implementation. This meant the codes had to be defined and operationalised in standardised ways so that bilingual observers could apply them. And finally, to support standardised application across countries/economies, codes had to focus on behaviours or aspects of teaching that are observable on video or present in teaching materials. Four cycles of iterative development with country/economy experts over roughly two years of development activities resulted in the final codes for the Study. In each iterative development cycle, codes were drafted and tested on pilot Study classroom videos and teaching materials; then refined and shared with external experts and countries/economies, and again revised, retested and refined further. Two codes (i.e. components related to risk-taking and clarity) did not work well in all countries and are not included in this report. All codes tested are detailed in the Technical Report.

Once the codes were finalised, training materials for both the observation and teaching material codes were developed in English. Four major principles guided the development of the training materials: i) coherent structure to ensure the scoring scales were consistent within and across countries/economies; ii) equitable country/economy representation so that the observers developed a broad understanding of how teaching and learning looked globally; iii) robustness to educate bilingual observers who had a mathematics background in quadratic equations but were not mathematics experts; and iv) explicitness to be effective in a face-to-face train-the-trainer approach. Master observers from each country/economy were trained together during two weeks of face-to-face sessions and the master observers then trained the country-level observers face-to-face in their own countries/economies. Quality control rating processes of certification, calibration, validation and double-rating were put in place to ensure reliability and validity. These processes are described in the Technical Report.

Calculation of observation scores: Because observers are prone to many types of errors when rating human performance, the Study divided observations into shorter segments for rating; this decreased the cognitive demand on observers and provided information on how teaching varies within lessons. Observers paused the lesson video and rated it every 16 minutes (components) or 8 minutes (indicators). These procedures produced multiple sets of component or indicator ratings commensurate with the length of the lesson. In addition, each video and each set of teaching materials was rated independently by two observers.

The multiple sets of observation ratings were then averaged to get an observation component score for a particular classroom by:

  1. averaging the two observers' segment-level ratings for each segment to obtain segment averages

  2. averaging over the segment averages for all segments associated with a lesson to obtain a lesson average (if a teacher only had one lesson, this score was used as the classroom-level score)

  3. averaging the two lesson averages for a teacher to obtain the classroom-level score.
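The three averaging steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the Study's own scoring code: the function name `classroom_component_score`, the nested-list data layout and the example ratings are all assumptions made for illustration.

```python
from statistics import mean


def classroom_component_score(lessons):
    """Aggregate double-rated segment ratings into one classroom-level score.

    `lessons` is a hypothetical layout: a list of lessons, each lesson a list
    of segments, each segment a (observer_1, observer_2) pair of ratings.
    """
    if not lessons or any(not segments for segments in lessons):
        raise ValueError("each teacher needs at least one lesson with one segment")

    lesson_averages = []
    for segments in lessons:
        # Step 1: average the two observers' ratings within each segment.
        segment_averages = [mean(pair) for pair in segments]
        # Step 2: average the segment averages to obtain the lesson average.
        lesson_averages.append(mean(segment_averages))

    # Step 3: average the lesson averages. With a single lesson this simply
    # returns that lesson's average, as described in step 2 of the text.
    return mean(lesson_averages)


# Made-up ratings for one teacher: lesson 1 has three segments, lesson 2 has two.
example = [
    [(2, 3), (3, 3), (1, 2)],
    [(4, 3), (3, 3)],
]
print(round(classroom_component_score(example), 2))  # 2.79
```

Because lessons are averaged before being combined, each lesson carries equal weight in the classroom-level score regardless of how many segments it contains. In the made-up example, pooling all five segment averages would instead give 2.70.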

Indicators were rated on various scales, some of which were less amenable to the approach described above for components. The Technical Report (Chapter 17) provides a complete description of indicator aggregation methods.

Calculation of teaching materials scores: Because many individual teaching materials provided scant information about students' learning experiences and increased the burden on raters, the Study used the lesson as the unit of rating. All teaching materials associated with a lesson were rated as a set. Each set of teaching materials was scored by two raters. Component scores for teaching materials for a teacher were subsequently created by:

  1. averaging the two raters' scores for the set of teaching materials associated with a single lesson

  2. averaging the component scores across all teaching materials sets for a teacher.

The one exception to this rule is the examination or test that covers quadratic equations. Country/economy experts reported that these local examinations usually occur after the unit is completed and would not be associated with any specific day. Thus, these examinations were rated separately, and the number of components was reduced to those for which there was likely to be evidence in the unit test.
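The two-step aggregation for lesson-level teaching material sets can be sketched as follows. The function name, the dictionary layout and the component labels are hypothetical, and the sketch assumes that a component is averaged over the sets in which it was rated. The unit examination, which was rated separately, would simply be left out of the input.

```python
from collections import defaultdict
from statistics import mean


def materials_component_scores(lesson_sets):
    """Return one score per component for a teacher's teaching material sets.

    `lesson_sets` is a hypothetical layout: one dict per lesson set, mapping a
    component name to the (rater_1, rater_2) pair of scores for that set.
    """
    set_averages = defaultdict(list)
    for rated_set in lesson_sets:
        for component, (rater_1, rater_2) in rated_set.items():
            # Step 1: average the two raters' scores for this lesson set.
            set_averages[component].append(mean((rater_1, rater_2)))

    # Step 2: average each component across all of the teacher's sets.
    # Assumption: only sets in which the component was rated contribute.
    return {component: mean(values) for component, values in set_averages.items()}


# Made-up scores for two lesson sets and two placeholder components.
example_sets = [
    {"component_a": (2, 3), "component_b": (1, 1)},
    {"component_a": (3, 3), "component_b": (2, 1)},
]
print(materials_component_scores(example_sets))
```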

To understand the relationship between teaching practices and student achievement and non-cognitive outcomes, the Study needed to obtain a diverse sample of teaching practices. To achieve that goal, the Study developed a stratified, two-stage probability sampling design. With the exception of Germany* (which did not participate in TALIS 2018; see Note 1), each country and economy was provided with a list of 100 schools (and another 200 backup schools) that were randomly selected from the school rosters used for TALIS 2018. However, final samples were not fully representative in four countries/economies, as detailed below. In most of the countries, schools were ISCED level 2, where the focal topic was taught. Schools in Biobío, Metropolitana and Valparaíso (Chile) (hereafter “B-M-V [Chile]”) were ISCED level 3 because quadratic equations are taught at that level.

Countries and economies approached schools about participation. If a school declined participation, then countries/economies attempted to recruit the first backup school and then the second until all backup schools were exhausted. Once a school agreed to participate, schools provided a list of teachers who taught quadratic equations. From that list, three teachers were randomly selected. Countries were to approach teachers for participation, one at a time, in the order they were selected. Once a teacher initially agreed to participate, consent forms to be video-recorded were sent home with students. The teacher and at least 15 students, or 50% of the class, had to consent for the teacher to remain in the Study. If that threshold was not met, the country/economy went to the next teacher on the list and started the process over again. If the country/economy exhausted the list of three teachers without reaching the consent threshold, they were to move on to the first backup school for the originally selected school. They were to repeat this recruitment process until they obtained 85 teachers in 85 schools.
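The recruitment sequence described above is essentially a nested search: teachers within a school are tried in their selected order, and schools are tried in the order of the originally selected school followed by its backups. The sketch below illustrates that logic under stated assumptions. All names are hypothetical, and the consent rule is read as "the teacher consents and either at least 15 students or at least half of the class consent", which is one interpretation of the threshold as worded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One randomly selected teacher and the consent outcome for their class."""
    name: str
    teacher_consents: bool
    class_size: int
    students_consenting: int


def meets_consent_threshold(c: Candidate) -> bool:
    # Assumed reading: 15 or more students, OR at least 50% of the class.
    enough_by_count = c.students_consenting >= 15
    enough_by_share = c.class_size > 0 and 2 * c.students_consenting >= c.class_size
    return c.teacher_consents and (enough_by_count or enough_by_share)


def recruit_from_school(selected: List[Candidate]) -> Optional[Candidate]:
    """Approach up to three selected teachers one at a time, in order."""
    for candidate in selected[:3]:
        if meets_consent_threshold(candidate):
            return candidate
    return None


def recruit(school_chain: List[List[Candidate]]) -> Optional[Candidate]:
    """Try the originally selected school first, then each backup in order."""
    for selected in school_chain:
        teacher = recruit_from_school(selected)
        if teacher is not None:
            return teacher
    return None


# Made-up example: the original school yields nobody, the first backup does.
original = [
    Candidate("A1", teacher_consents=False, class_size=30, students_consenting=20),
    Candidate("A2", teacher_consents=True, class_size=40, students_consenting=10),
]
backup = [Candidate("B1", teacher_consents=True, class_size=35, students_consenting=16)]
print(recruit([original, backup]).name)  # B1
```

In the Study this process was repeated per sampled school until 85 teachers in 85 schools were obtained; the sketch covers a single slot in that target.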

Recruitment of schools and teachers was challenging for most of the countries and economies. As a result, sampling deviations occurred in order to obtain a sample as large as required for the Study while still trying to achieve a diversity of teaching practice. Table 2.3 summarises the types of sampling deviations by countries and economies in the Study. Further information on sampling deviations, consent rates and characteristics of participating schools, teachers and students is available in the Technical Report.

As Table 2.3 illustrates, B-M-V (Chile), Colombia, Mexico and Shanghai (China) followed the original sampling plan with some fidelity. England (UK), Germany*, Kumagaya, Shizuoka and Toda (Japan) (hereafter “K-S-T [Japan]”), and Madrid (Spain) had sampling deviations of different levels of significance, which reflected the difficulty they faced in recruiting schools and teachers. Notably, Japan sampled within three major cities (Kumagaya, Shizuoka and Toda) and university-affiliated schools only, and Germany* used a convenience sample of volunteering schools, mostly from the high ability track (Gymnasium), from seven different states (Baden-Württemberg, Hesse, Lower Saxony, North Rhine-Westphalia, Rhineland-Palatinate, Saxony-Anhalt and Schleswig-Holstein). As the school year drew to a close, England (UK) started approaching all three teachers in a school at the same time until a teacher was recruited. These difficulties translated into differences in the number of participating teachers and schools, which ranged from 103 teachers and schools in Mexico to 38 schools and 50 teachers in Germany* (see the Technical Report).

The number of sampling deviations raised concerns about whether the samples are similar to the diverse population of teachers and students in each country or economy. To determine whether that is the case, the Study samples of teachers and students were compared to PISA 2018 students and TALIS 2018 teachers. Annex 2.A, Table 2.A.1 shows the comparison between the students in the Study and the students in PISA 2018. As PISA is a representative study of 15-year-old students, and the Study targeted those grade levels where quadratic equations are taught, differences in student age and grade level are mainly design-based. Differences in gender, immigrant status and parental education, however, might indicate some selectivity in sampling.

The Study averaged far fewer students participating per country/economy (2 118 students) than participated, on average, in PISA 2018 (7 595 students). However, Annex 2.A, Table 2.A.1 illustrates that the students who participated in this Study are roughly similar to those that participated in PISA 2018. The Study’s sample of students is one year younger than students in PISA in six out of eight countries/economies. Also, the Study’s sample of students has more variation in age than is found in the PISA sample, but less variation in grade level for most countries/economies, because the PISA sample is age-based, while the Study’s sample was classroom-based. More students with an immigrant background participated in PISA 2018 than in the present Study in England (UK) and Germany*. Finally, the years of parental education differ by over a year between the Study’s sample of students and the PISA 2018 sample in Germany* and Shanghai (China).

The samples of teachers for the Study differed in gender distribution and experience across participating countries and economies. For example, Table 2.4 shows that the sample in Shanghai (China) was mostly female, whereas the samples in Colombia and K-S-T (Japan) were mainly male. The sample in England (UK) had less experienced teachers than most other countries/economies, while the sample in Shanghai (China) had the most experienced teachers. Finally, Table 2.4 shows how successful the countries and economies were at meeting the targeted teacher numbers for the Study.

When the teachers in the Study are compared to the sample of mathematics teachers in lower secondary education that participated in TALIS 2018, differences are also found in some countries. Annex 2.A, Table 2.A.2 shows that differences existed both in Colombia and Mexico between participating teachers in the Study and the TALIS 2018 samples in terms of gender and education. In both countries, there were more female teachers in the TALIS 2018 sample than in this Study’s sample. In Colombia, the TALIS 2018 sample of teachers has more education than teachers in this Study’s sample. For example, there are fewer teachers with only an undergraduate degree and more teachers with a Master's degree in the TALIS 2018 sample. In Mexico, the Study’s sample of teachers has more education than the TALIS 2018 sample, with fewer teachers having an undergraduate degree only. Otherwise, few differences are observed between the teachers participating in this Study and those that participated in TALIS 2018.

In addition to differences in teacher characteristics across participating countries and economies, the characteristics of the schools in which the teachers are employed also varied considerably. Table 2.5 presents the features of the schools where the Study teachers taught. For example, more than a quarter of the schools in the B-M-V (Chile) sample were private, whereas the samples in England (UK) and K-S-T (Japan) included no private schools. Virtually all of the schools in B-M-V (Chile) and Madrid (Spain) were in urban areas, while only 34% of the schools in Germany* and 40% of those in Shanghai (China) were urban. School size also varied across countries and economies, both in the number of teachers in the school and in the size of the student body. Schools in England (UK) were the largest in the Study and schools in K-S-T (Japan) were the smallest (among countries/economies where data were available).

The Study is a landmark study that required complex data collection. The greatest challenges faced by the countries/economies related to conducting video-based research on teaching, coordinating the complex scheduling and data collection, managing privacy requirements associated with video-recordings and negotiating agreements to share information across countries/economies. This is discussed in greater detail in the Technical Report.

The National Project Managers were responsible for tracking consent and ensuring data were collected using required standardised procedures. They were also charged with applying quality control standards to both videos and teaching materials. Once all data were collected, country/economy researchers entered test and questionnaire data. They also recruited, screened, hired and trained observers, rated the videos and teaching materials using the procedures outlined above, and submitted those ratings. Observers/raters were assigned to rate the videos and teaching materials using an assignment tool. No observer was permitted to rate a teacher’s videos more than once. Observers in some countries did rate teaching materials and a video from the same teacher.
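The constraint that no observer rates a teacher's videos more than once, combined with double-rating of every video, means each teacher with two lessons needs four distinct observers. The Study's assignment tool is not described here, so the rotation below is purely a hypothetical way of satisfying that constraint; the function name, identifiers and pool of observers are illustrative assumptions.

```python
def assign_observers(teacher_videos, observers, raters_per_video=2):
    """Assign observers so that nobody rates the same teacher more than once.

    `teacher_videos` maps a teacher id to a list of video ids (hypothetical
    layout). Observers are drawn from a rotating pool so that the workload
    stays roughly balanced across the pool.
    """
    assignment = {}
    cursor = 0
    for teacher, videos in teacher_videos.items():
        needed = len(videos) * raters_per_video
        if needed > len(observers):
            raise ValueError(f"not enough observers to rate {teacher} without repeats")
        # Consecutive picks from the rotating pool are distinct because
        # `needed` never exceeds the pool size.
        picked = [observers[(cursor + i) % len(observers)] for i in range(needed)]
        cursor = (cursor + needed) % len(observers)
        for j, video in enumerate(videos):
            assignment[video] = picked[j * raters_per_video:(j + 1) * raters_per_video]
    return assignment


# Made-up example with two teachers, two lessons each, and five observers.
plan = assign_observers(
    {"T1": ["T1-L1", "T1-L2"], "T2": ["T2-L1", "T2-L2"]},
    ["obs_a", "obs_b", "obs_c", "obs_d", "obs_e"],
)
for video, raters in plan.items():
    print(video, raters)
```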

Due in part to the burden of the data collection and the coordination demands of the Study, there were some deviations from the established fielding procedures. Deviations from fielding procedures tended to be singular events (e.g. a fire alarm during a lesson) and rare. These events are described in the Technical Report and, for the most part, did not substantially impact the quality of the data. The exception was Madrid (Spain), which had significant fielding deviations that may have impacted the quality of the data. In particular, the teaching log only contained information about more than one lesson for 35 out of the 85 participating classrooms. Also, students' pre-tests and pre-questionnaires cannot be reliably linked to their post-tests and post-questionnaires, so comparing them was not feasible (see the Technical Report).

The present Study was an ambitious one designed to capture international variation in teaching and to investigate the relationship between different teaching practices and student learning across a range of contexts and countries. At the same time, the Study sought to pioneer new research methodologies that could advance researchers’ understanding of how to measure something as complex as teaching. Inevitably, there were tensions between trying to understand the complex nature of teaching and the scale that is feasible in international studies. As a result, the methodological decisions made for the Study may create concerns about the reliability and validity of its findings.

One area of concern when using observational measures to research teaching is how many lessons need to be observed to produce a reliable measure of teaching quality that, when paired with other data, can detect a relationship between teaching and student outcomes. The Study observed two lessons of each teacher, in line with what is viewed as best practice amongst researchers. For example, Taut and Rakoczy (2016[1]) found that the number of observers and the number of components per observation dimension, rather than additional lessons, had the most significant impact on reliability. Similarly, Ho and Kane (2013[2]) found that adding a second observer produced twice the gain in reliability of having one observer watch an additional lesson from a teacher, in particular when the second observer rates a different lesson than the first observer. Thus, while the Study was limited to two observations per teacher, its decision to double rate lessons and to assign observers in such a way that no observer saw the same teacher twice considerably increased the reliability of observation scores. The potential for excessive observer cognitive demand was handled by dividing lessons into shorter segments. Indeed, the Study reached a level of quality control in its video ratings comparable to other major national studies, even when working at an international level. For example, average observer-to-observer agreement for component scoring was 50-56% for an exact match and 86-92% for an adjacent match. For indicators, observer-to-observer agreement was even higher, at 88-91% for an exact match and 98-99% for an adjacent match (Technical Report).
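Exact and adjacent agreement rates of the kind reported above can be computed from paired ratings as in the following sketch. It assumes that an "adjacent" match means the two ratings differ by at most one scale point (so exact matches also count as adjacent); the precise definition used in the Study is given in the Technical Report. The function name and the ratings are made up for illustration.

```python
def agreement_rates(ratings_a, ratings_b):
    """Return (exact, adjacent) agreement proportions for two observers.

    `ratings_a` and `ratings_b` hold the two observers' ratings of the same
    segments, in the same order. Adjacent agreement is assumed here to mean
    a difference of at most one scale point.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")

    pairs = list(zip(ratings_a, ratings_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Made-up ratings of four segments by two observers.
exact, adjacent = agreement_rates([1, 2, 3, 4], [1, 3, 3, 2])
print(f"exact: {exact:.0%}, adjacent: {adjacent:.0%}")  # exact: 50%, adjacent: 75%
```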

A further area of concern surrounds whether lessons that are videotaped result in student or teacher behaviour being affected such that typical lessons are not captured. The Study asked students and teachers whether they perceived any difference between videotaped and regular lessons. There were significant differences between countries on the percentage of teachers who prepared more carefully for videotaped lessons. For example, in K-S-T (Japan) only 19% of the participating teachers said they had prepared a bit more or much more carefully, whereas, in Shanghai (China), it was 53% of responding teachers (Figure 2.1).

Students reported that practices such as providing a clear structure, managing the class so no time was wasted and giving social-emotional support happened more often than normal, though differences were not significant in all countries/economies. While there is evidence from both teachers and students that the videotaped lessons may have differed in some ways from typical lessons in the unit, previous research on this issue indicates this may not significantly influence conclusions drawn from videotaped lessons (Praetorius, McIntyre and Klassen, 2017[3]).

A third area of concern for a study like the present one may be the concentration on a focal topic. In the TIMSS Video Study, Stigler and colleagues (1999[4]) argued for trying to sample lessons that covered a broad array of topics, as they deemed identifying a single topic for comparability practically impossible. Meanwhile, a focus on mathematics, and in particular quadratic equations, may mean that the Study is unable to observe certain teaching practices that are more prevalent in other subjects or that manifest in a different way. For example, a teacher may provide explanations differently when teaching a different topic. They may also use a different questioning strategy to engage students.

However, uncovering relationships between teaching and student learning may be facilitated by focusing on a single topic. Since the TIMSS Video Study 1999 was conducted, studies have highlighted the importance of exposure to curricular content and its correlation with student outcomes on international assessments, and in particular with regard to mathematics learning (Kuger et al., 2017[5]; Schmidt et al., 2015[6]). Moreover, video-based studies have also shown that a focus on a single topic can clarify differences in teaching quality in cross-country comparisons. Focusing on a single topic can make observational measures more ‘instructionally sensitive’ (Popham, 2007[7]) and allow for alignment between measures of teaching quality and student outcome measures. Nevertheless, in practice, countries differ in the details of how the content is implemented, as shown in Chapter 6 of this report, which might cause variations in the level of alignment of the topic across countries. Also as shown in Chapter 6, countries/economies varied significantly in the length of time spent teaching quadratic equations with some teaching as few as 6 lessons and others teaching more than 15.

Despite the Study’s carefully considered design, when reading the findings that follow, it is important to be mindful of some of the limitations of the Study. Notably, the Study is not a cross-country/economy ranking of the quality of teaching. Judgements about how the quality of teaching in one country directly compares to the quality in another need to be made with considerable caution. Despite similarities with the PISA and TALIS samples, important sampling deviations occurred in some countries/economies (Table 2.3). Furthermore, whilst the survey and test measures of the Study reach similar levels of comparability as in PISA or TALIS, more direct measures of teaching remain innately difficult to standardise to the same degree. As outlined previously, the Study established extensive quality control measures for video and teaching material rating and reached solid standards of quality control, but comparability across countries cannot be perfectly established and some measures of teaching such as “respect” or “encouragement and warmth” may vary in their meaning between countries.

Despite all these caveats and limitations, the Study provides a unique depiction of teaching at an international scale. It points towards patterns of teaching and many interesting cases of variation within countries/economies, which may vary further or differently with larger samples of teachers and which serve as potential avenues for future research and dialogue.


[2] Ho, A. and T. Kane (2013), The Reliability of Classroom Observations by School Personnel, Measures of Effective Teaching Project, Bill and Melinda Gates Foundation.

[5] Kuger, S. et al. (2017), “Mathematikunterricht und Schülerleistung in der Sekundarstufe: Zur Validität von Schülerbefragungen in Schulleistungsstudien”, Zeitschrift für Erziehungswissenschaft 36, pp. 61-98, https://doi.org/10.1007/s11618-017-0750-6.

[7] Popham, W. (2007), Instructional insensitivity of tests: Accountability’s dire drawback, https://doi.org/10.1177/003172170708900211.

[3] Praetorius, A., N. McIntyre and R. Klassen (2017), “Reactivity effects in video-based classroom research: an investigation using teacher and student questionnaires as well as teacher eye-tracking”, in Videobasierte Unterrichtsforschung, https://doi.org/10.1007/978-3-658-15739-5_3.

[6] Schmidt, W. et al. (2015), “The Role of Schooling in Perpetuating Educational Inequality: An International Perspective”, Educational Researcher, https://doi.org/10.3102/0013189X15603982.

[4] Stigler, J. et al. (1999), The TIMSS Videotape Classroom Study: Methods and Findings from an Exploratory Research Project on Eighth-Grade Mathematics Instruction in Germany, Japan, and the United States., National Center for Education Statistics (NCES), Washington DC.

[1] Taut, S. and K. Rakoczy (2016), “Observing instructional quality in the context of school evaluation”, Learning and Instruction, Vol. 46, pp. 45-60, https://doi.org/10.1016/j.learninstruc.2016.08.003.


1. Germany* refers to a convenience sample of volunteer schools.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2020

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at http://www.oecd.org/termsandconditions.