Chapter 12. Teaching basic experimental design with an intelligent tutor

David Klahr
Stephanie Siler
Department of Psychology, Carnegie Mellon University

Students in middle and elementary school have a poor understanding of basic experimental design – commonly known as the “Control of Variables Strategy” (CVS). The TED Tutor is an intelligent, computer-based tutor that adapts instruction on experimental design to individual students based on its assessments of their knowledge and ability, and provides continuous feedback on students’ actions. We are embedding the TED Tutor in an adaptive computer-based instructional context in which a child selects a topic for a science fair project, and then designs and implements an experiment to explore that topic. The theoretical contribution of the Inquiry Science Project Tutor will be to determine the extent to which presenting CVS instruction in the context of other inquiry activities elicits sceptical scientific mindsets that evoke science goals of identifying causal factors, rather than engineering goals of trying to achieve specific outcomes. The practical contribution will be to increase robust learning as TED instruction guides students in their design of unconfounded experiments.



There is a broad international consensus that children’s understanding of the principles and processes of basic experimental design – commonly known as the “Control of Variables Strategy” (CVS) – is an essential component of science, technology, engineering and mathematics (STEM) education. For example, in the United States, the recently published “Next Generation Science Standards” recommend that, starting at “the earliest grades”, students should learn how to “plan and conduct an investigation collaboratively to produce data to serve as the basis for evidence, using fair tests in which variables are controlled”. The “National Curriculum” for England includes a “statutory requirement” that, during grades 5 and 6 (9-10 year-olds), pupils should be taught to “plan different types of scientific enquiries to answer questions, including recognising and controlling variables where necessary”. The curriculum further states that “working scientifically” will be developed further at key stages 3 and 4 (12-14 year-olds), once pupils have built up sufficient understanding of science to engage meaningfully in more sophisticated discussion of “experimental design and control”. Mastery of the experimental method is included in South Korea’s science standards (Wichmanowski, 2015[1]) as well as Japan’s national science standards (Ministry of Education, 2008[2]). And even though its fundamentals are not explicitly addressed by the German national science standards, CVS’s underlying logic and procedures are characterised as crucial sub-skills of “experimental competence” (Wellnitz et al., 2012[3]). Furthermore, both national and international assessments (e.g. TIMSS, PISA) invariably include several items assessing CVS-related skills and understanding.

A solid understanding of the causal reasoning that underlies unconfounded experiments is necessary for both the design and interpretation of their outcomes. This knowledge can also be applied well beyond the science classroom, for example, when citizens attempt to understand and interpret correlational findings, such as those publicised in the media, often presented to support a particular public policy. Having this knowledge may help to protect citizens from uncritically accepting findings from correlational studies, or those containing confounds.

However, instructional research in the United States has repeatedly demonstrated that students – from third to seventh grade – have a surprisingly poor understanding of CVS (Chen and Klahr, 1999[4]; Siler et al., 2010[5]). Moreover, international assessments have consistently found poor performance on items that assess children’s mastery of experimental process skills – including CVS. Consider, for example, two of the open-ended items from the grade 8 test (TIMSS, 2011[6]). For the item in Figure 12.1, only 14% of students (worldwide) provided the correct answer, with scores for individual countries ranging from 2% in Indonesia to 44% in Singapore. For the item in Figure 12.2, the average was 21%, with individual country scores ranging from 3% in Saudi Arabia to 65% in Japan.

Figure 12.1. Example of a TIMSS test item to assess the ability of grade 8 students to design an experiment to answer a specific question
Released constructed-response item (S042297) assessing grade 8 students’ experimental skills

Note: A correct response refers to either 1) planting (seeds from) green and red peppers AND observing the colour of the fruit; or 2) planting (seeds from) green peppers AND observing if the fruit turns red. Example: “I would take one seed from each of the peppers and plant them under the same condition and at the same time. Observe them at the same time after the peppers start to grow. If the red peppers become red and the green peppers did not, this would show that the red and green peppers are a different kind.”

Source: (TIMSS, 2011[6])

Figure 12.2. Example of a TIMSS test item to assess grade 8 students’ understanding of the concept of “all other things being equal” in an unconfounded experiment

Note: This item requires the student to notice – and correctly name – at least one of the following features that are the same in each “Setup”: the temporal interval (2 minutes) and duration (10 minutes); the beakers (same shape, size and materials); the water (same volume and type). Several other factors unmentioned in the diagram or accompanying text are – presumably – also the same in each setup: the thermometer (type and position for taking readings), and the location and surrounding temperature of each setup.

Source: (TIMSS, 2011[6])

Our own studies suggest that within-country variance on CVS skills (as well as much broader knowledge of scientific processes (Normile, 2017[7])) may be even greater than between-country variance. In one study conducted with students from a mid-sized metropolitan area in the mid-Atlantic region of the United States, we found substantial SES-associated discrepancies in children’s initial understanding of CVS, as well as in their post-instructional mastery rates in transferring knowledge of CVS to other domains (13% and 62%, respectively) (Siler et al., 2010[5]). SES-related differences in understanding of science-related skills such as CVS are very common in the United States. For example, Lorch et al. (2010[8]) and Siler and Klahr (2012[9]) conducted a large-scale evaluation of various strategies for teaching CVS with nearly 800 students from 36 different fourth-grade classrooms in the state of Kentucky. Students were taught CVS through interactive classroom discussions. On a post-test requiring them to evaluate experiments, students in schools serving predominately lower-SES populations performed very poorly – only slightly above chance – and significantly worse than students in schools serving predominately higher-SES populations.

In our own research (Siler and Klahr, 2012[9]), analyses of the explanations students gave during remedial tutoring sessions about CVS revealed that lower-SES students were more likely to make characteristic mistakes and harbour robust schema-related misconceptions that interfered with their CVS learning. In particular, when challenged to design an unconfounded experiment that could isolate a causal factor, students would frequently, and incorrectly, interpret the question as asking them to achieve what Schauble, Klopfer and Raghavan (1991[10]) termed an “engineering goal” (producing a desired effect), rather than a “science” goal (identifying causal factors). Students also expressed their beliefs about the effects of the domain-specific variables (e.g. “I think the steep ramp will make the ball go faster”); that is, students focused on the surface features of instruction rather than on the procedural and conceptual aspects of experimental design (Siler and Klahr, 2016[11]). Moreover, we found that students interpreted “fair comparisons” or “fair tests” as those having equivalent conditions (i.e. where the two conditions are set up exactly the same). We have no reason to believe that these types of deep misconceptions and misinterpretations – which can be quite robust – are unique to students in the United States. To the contrary, we suspect that they may be quite general challenges to effective CVS instruction, particularly among students who have little experience with science inquiry.

In contrast, what is predictive of students’ ability to transfer their understanding of CVS to new domains is whether they are able to articulate the rationale for controlling variables (i.e. so that only the one variable under investigation can impact the results) (Siler et al., 2010[12]; San Pedro, Gobert and Sebuwufu, 2011[13]). In another study (Siler et al., 2011[14]), we had an experimental condition in which students’ understanding of the rationale for controlling was supported by prompting them to indicate which non-focal variables could have caused a hypothetical difference in outcomes between conditions (i.e. we explicitly asked them to identify any potential confounds, or other variables that could have caused the outcome). These students showed better transfer performance than those students who did not receive these additional prompts. Thus, supporting students’ explicit understanding of the rationale for controlling variables appears to be at least one way to produce a robust understanding of CVS.

Science inquiry support

Although CVS is fundamental to the scientific enterprise, it tends to be presented in science textbooks in a shallow manner. The short shrift given to CVS, per se, is exemplified in one widely used fourth-grade science text that allocates only 8 of its nearly 600 pages to lessons about “the experimental method”. Typically, only the procedures for designing experiments are explicitly taught in textbooks, while the conceptual basis for why those procedures are necessary and sufficient for causal inference is seldom addressed. For example, in the middle school textbook “Science” (Foresman, 2000[15]; Foresman, 2003[16]), experimental design is explained as: “Change one factor that may affect the outcome of an event while holding other factors constant.” Nothing else is mentioned. Similarly, in the FAST curriculum textbook, “The Local Environment” (Pottenger and Young, 1992[17]), experimental design is explained in the context of an experiment as: “This second group will be used for the control. What happens to the control is the basis for comparing effects.” Again, nothing further is mentioned. Such brief statements about experimental design procedures, without subsequent instruction on the rationale for such designs, appear to be the norm, at least in the US textbooks that we have examined.

Similarly, websites aimed at providing support to teachers who engage their students in experiment-based science inquiry (e.g. Science Buddies, Discovery Education) tend to briefly address procedures for conducting controlled experiments without further discussing the underlying rationale. For example, the popular Science Buddies website asserts: “It is important for your experiment to be a fair test. You conduct a fair test by making sure that you change only one factor at a time while keeping all other conditions the same” (note the implicit use of “conditions” and “factors” as synonymous!). Similarly, the only explicit CVS instruction presented in a 21-minute video on the scientific method on the Discovery Education website is simply: “Identify a single test variable and control other variables, so only one condition is being tested.”

Further, web-based materials for experiment-based science inquiry generally fail to provide direct support for active student learning. That is, they do not include scaffolding for students as they set up their own experiments, nor do they provide feedback, even though such instructional actions have been shown to promote student learning (Bloom, 1984[18]; VanLehn, 2011[19]). Rather, most websites that address experiment-based science inquiry (e.g. Science Buddies, The Lawrence Hall of Science) offer pre-existing science projects that students can choose, accompanied by step-by-step instructions for doing a particular experiment. Thus, students are not given the opportunity to design their own experiments and receive feedback on the quality of their experiments.

Some websites do include features that allow students to design experiments; however, they often do not provide feedback on the quality of the experimental design. PhET is a popular website that provides various simulations of physical processes (e.g. alpha decay, pressure, electrical circuits) that students can manipulate to see how variables are interrelated; however, direct feedback on experimental design is not given. Going further, Inq-ITS allows students to form hypotheses, design experiments, virtually run experiments in given domains and draw conclusions. Although it does provide immediate feedback on some student actions, it does not provide automatic feedback on the experiments they design. Among the few publicly available websites that provide scaffolding and feedback to students as they engage in inquiry processes, WISE (Linn, Clark and Slotta, 2003[20]; Slotta and Linn, 2009[21]) does support students’ exploration of various science topics. However, its instructional emphasis is on conceptual learning and knowledge integration rather than on domain-general experiment-based inquiry skills. In sum, we found no freely available online programme that both engages middle school-aged students actively in designing their own experiments and provides student-specific support, including feedback and scaffolding.

TED Tutor: Overview

Given 1) the centrality of CVS mastery to a large part of any curriculum that includes the experimental aspects of science; 2) the poor understanding of these concepts often found among middle school students, including the array of alternative conceptions, misconceptions and misunderstandings of CVS instruction; and 3) the dearth of publicly available online programmes or websites that support active engagement with experimental design, we developed the TED Tutor, which is publicly available online. The TED Tutor adapts instruction to individual students based on its assessments of their knowledge (including misconceptions) and ability, and provides students with continuous feedback on their actions. These pathways are shown in Figure 12.3.

Figure 12.3. Overall flow of TED Tutor, showing branching instructional event paths based on student responses to various assessment events.

Because the rationale for controlling variables is a relatively complex concept for middle school students, we hypothesised that students who are better able to integrate information (i.e. are able to make deductive inferences) would be better able to understand and articulate this rationale. In an analysis of log-file data generated by students using TED, we found that students’ deductive reasoning achievement scores were highly predictive of whether students explicitly expressed an understanding of the rationale for controlling variables, which (as previously discussed) was in turn significantly related to students’ CVS transfer performance (Siler et al., 2010[12]). Thus, in the TED Tutor, students’ deductive reasoning is initially assessed (#1 of Figure 12.3) and can be used to determine the type of instruction they receive from TED.

After they complete the deductive reasoning test, students’ initial understanding of CVS is assessed (#2-#3 of Figure 12.3). In the “Ramp pre-test”, in which students design an experiment for each of the four ramp variables, they are asked to indicate why they set up each experiment as they did by selecting responses from a series of drop-down menus, starting with their goal in setting up the experiment. This is intended to prompt explicit metacognitive reflection on what might otherwise be implicit goals.

Pathway 1 (higher-ability). Students who show a basic understanding of CVS (i.e. who at least contrasted the variable under investigation on the Ramp pre-test) and/or better reasoning skills can be taken to the “baseline” instruction of TED (#7-#9 in Figure 12.3). In this baseline instruction, which is based on the instruction given in Chen and Klahr (1999[4]), students evaluate three given experiments and receive feedback on their responses. To promote their understanding of the rationale for controlling variables, they are asked if they could “tell for sure” that the focal variable caused a hypothetical difference in outcomes across conditions. To further reinforce their understanding of this rationale, they are asked whether each of the other (non-focal) variables could have caused the hypothetical outcome, and then they receive feedback on their responses.

Pathway 2 (lower-ability). Students who perform poorly on the CVS pre-tests and/or deductive reasoning test (#1) – and who therefore may be less able to follow the explanations given in the baseline instruction – are given a simplified, “step-by-step” version of the initial instruction. In this instruction, which is given before the baseline instruction, students are scaffolded in applying the three basic rules of CVS:

  • R1: Identify the variable under investigation

  • R2: Contrast that variable

  • R3: Control/make same all other variables
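The three rules above amount to a simple check on a two-condition design. The sketch below illustrates that check; the dictionary representation of a ramp experiment and the function name are hypothetical conveniences for this illustration, not TED’s actual data model.

```python
# Hypothetical sketch of checking a two-condition experiment design against
# the three CVS rules from TED's step-by-step instruction. The dict-based
# representation and variable names are illustrative only.

def check_cvs(design_a, design_b, focal):
    """Return a list of CVS rule violations for a two-condition design.

    design_a, design_b: dicts mapping variable names to settings.
    focal: the variable under investigation (Rule R1 identifies it).
    """
    issues = []
    # R2: the focal variable must be contrasted across the two conditions.
    if design_a[focal] == design_b[focal]:
        issues.append(f"R2 violated: '{focal}' is not contrasted")
    # R3: every non-focal variable must be held the same (controlled).
    for var in design_a:
        if var != focal and design_a[var] != design_b[var]:
            issues.append(f"R3 violated: '{var}' is a potential confound")
    return issues

# A confounded ramp experiment: steepness is contrasted, but so is surface.
a = {"steepness": "steep", "surface": "smooth", "length": "long"}
b = {"steepness": "low", "surface": "rough", "length": "long"}
print(check_cvs(a, b, focal="steepness"))
# → ["R3 violated: 'surface' is a potential confound"]
```

An unconfounded design returns an empty list, which is exactly the condition under which a difference in outcomes can be attributed to the focal variable alone.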

Students’ responses inform a Bayesian Knowledge Tracing engine, which determines how many rounds of questioning to give an individual student (#4-#6 and #10 of Figure 12.3). We have found that this simplified instruction helps students understand the goal of the task: learning how to set up experiments that allow them to find out whether or not a variable affects an outcome. Students then progress to the baseline instruction, which addresses the rationale for controlling variables, and afterwards set up experiments in the instructional domain (“Ramp post-test”) and in other domains (“Story post-test”).
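The knowledge-tracing step mentioned above follows a well-known pattern. The sketch below shows the standard Bayesian Knowledge Tracing update that such an engine can use to decide when a student has had enough rounds of questioning; the slip, guess and learning parameters and the mastery threshold are illustrative assumptions, not TED’s actual settings.

```python
# Minimal sketch of the standard Bayesian Knowledge Tracing (BKT) update.
# Parameter values (slip, guess, learn, mastery threshold) are illustrative
# assumptions, not the TED Tutor's actual configuration.

def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.3):
    """Update the estimated probability that the student knows the skill."""
    if correct:
        # Bayes' rule: a correct answer comes from knowing (and not slipping)
        # or from not knowing but guessing.
        posterior = (p_know * (1 - slip)) / (
            p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        # A wrong answer comes from a slip or from not knowing and not guessing.
        posterior = (p_know * slip) / (
            p_know * slip + (1 - p_know) * (1 - guess))
    # Account for learning from the feedback given on this item.
    return posterior + (1 - posterior) * learn

# Keep questioning until estimated mastery (e.g. p > 0.95) or items run out.
p = 0.3  # prior probability the student already knows CVS
for correct in [True, True, False, True, True]:
    p = bkt_update(p, correct)
    if p > 0.95:
        break  # mastery reached; stop this round of questioning
print(round(p, 3))
# → 0.954 (mastery reached after the second correct response)
```

The number of questioning rounds thus falls out of the model naturally: confident students cross the mastery threshold quickly, while students who answer inconsistently receive more practice.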

Effect of adaptive instruction for lower-reasoning students. We compared the effect of adding Pathway 2 to TED for lower-reasoning US sixth- and seventh-grade students (i.e. students who scored low on the deductive reasoning pre-test). As expected, the low-reasoning students performed significantly better on the transfer post-test when they were assigned to the more incremental Pathway 2 than to the higher-level Pathway 1, whereas higher-reasoning students performed similarly in the two pathways. In summary, adapting instruction to individual students’ deductive reasoning skills led to better outcomes: in particular, the addition of the lower-ability pathway improved transfer outcomes among lower-reasoning students.
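The pathway assignment described in this section can be summarised as a small routing rule. The sketch below is an illustration only: the score scale, the cutoff and the exact combination of criteria are hypothetical simplifications, not TED’s actual assignment logic.

```python
# Illustrative sketch of TED's pathway routing. The 0-1 score scale, the
# cutoff value and the combination rule are hypothetical assumptions made
# for this example, not the tutor's actual implementation.

def assign_pathway(reasoning_score, contrasted_focal_variable,
                   reasoning_cutoff=0.5):
    """Route a student to baseline (Pathway 1) or step-by-step (Pathway 2).

    reasoning_score: deductive reasoning pre-test score, scaled 0-1.
    contrasted_focal_variable: whether the student at least contrasted the
        variable under investigation on the Ramp pre-test.
    """
    if reasoning_score >= reasoning_cutoff and contrasted_focal_variable:
        return "Pathway 1: baseline instruction"
    # Lower-reasoning students, or those showing no basic CVS understanding,
    # get the simplified step-by-step instruction before baseline.
    return "Pathway 2: step-by-step, then baseline instruction"

print(assign_pathway(0.8, True))   # higher-ability route
print(assign_pathway(0.3, False))  # lower-ability route
```

Under this simplification, the experimental comparison reported above amounts to asking whether routing low scorers through the second branch improves their transfer post-test performance, which is what was found.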

Policy Implications

As noted at the beginning of this chapter, understanding how to design and interpret experiments is an essential component of STEM literacy, and every K-12 science curriculum includes many opportunities for children to engage in the experimental process. Nevertheless, there is consistent evidence from national and international assessments that a solid grasp of the process and rationale underlying the creation, execution and interpretation of informative experiments is exhibited by a scant proportion of the world’s population. Thus, it is important to develop instructional procedures that will increase the likelihood that children will master both the procedural and conceptual aspects of CVS.

However, an oft-repeated critique of the K-12 science curriculum in the United States is that the number of substantive topic areas crammed into it makes it “a mile wide and an inch deep” (Schmidt, 1997[22]). As a result, teachers attempting to convey the substantive knowledge base of their disciplines rarely have the luxury of devoting a full class (or two) to teaching the domain-general aspects of CVS procedures and concepts. Instead, at the beginning of a class devoted to lab work on a particular topic, teachers typically give a brief overview (if any at all) of experimental procedures and concepts, almost always in the domain-specific context of the particular topic being taught.

However, even though, as we noted earlier, instruction on CVS is rarely given adequate time, students’ understanding of it is invariably assessed on high-stakes tests. We created TED to address this problem. Our vision is that – prior to a lesson involving, for example, an experiment in electricity, or simple machines – teachers could direct students to TED’s online, user-friendly, adaptive and individualised instruction that would bring them “up to speed” with respect to the rudimentary conceptual and procedural aspects of a “good experiment”. Having completed this kind of domain-general, albeit limited, instruction, students would be in a much better position to really understand the steps and the reasoning that will enable them to obtain information and further develop domain-specific knowledge from their subsequent experiments about various topics.

As the next step in our research, we are embedding the TED Tutor in a context in which children will be engaged in selecting a topic for, and then designing and implementing, an experiment to create a science fair project. This Inquiry Science Project Tutor (ISP Tutor) has both theoretical and applied aspects. The theoretical contribution will be to determine the extent to which presenting CVS instruction in the context of other inquiry activities elicits the type of sceptical scientific mindset that evokes a science goal (i.e. a goal of identifying causal factors) rather than an engineering goal (trying to achieve a specific outcome). From our earlier work, we expect the elicitation of science goals to lead to increased learning and transfer. The practical aspects will be to increase learning and transfer outcomes when TED instruction guides students in their design of unconfounded, albeit highly motivated, experiments. The ISP Tutor will provide support to students as they conduct inquiry activities about topics largely of their own choosing. We believe that this project is timely, given the accumulated findings on the importance of engaging students in such activities (Minner, Levy and Century, 2010[23]), as reflected in recent guidelines such as the US K-12 Framework (2012) and NGSS (2013); England’s National Curriculum (2014); Japan’s Courses of Study (2008) (Ministry of Education, 2008[2]); and Singapore’s Science Curriculum Framework (Ministry of Education, 2008[24]).


[18] Bloom, B. (1984), “The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring”, Educational Researcher, Vol. 13/6, p. 4.

[4] Chen, Z. and D. Klahr (1999), “All other things being equal: Acquisition and transfer of the control of variables strategy”, Child Development, Vol. 70/5, pp. 1098-1120.

[16] Foresman, S. (2003), Science, Pearson Education, Upper Saddle River, NJ.

[15] Foresman, S. (2000), Science, Pearson Education, Upper Saddle River, NJ.

[20] Linn, M., D. Clark and J. Slotta (2003), “WISE design for knowledge integration”, Science Education, Vol. 87/4, pp. 517-538.

[8] Lorch, R. et al. (2010), “Learning the control of variables strategy in higher and lower achieving classrooms: Contributions of explicit instruction and experimentation”, Journal of Educational Psychology, Vol. 102/1, pp. 90-101.

[24] Ministry of Education (2008), Science Syllabus Primary 2008, Ministry of Education, Singapore.

[2] Ministry of Education (2008), Shougakkou Gakusyuu Shidou Youryou (Course of Study for Elementary Schools), Ministry of Education, Japan.

[23] Minner, D., A. Levy and J. Century (2010), “Inquiry-based science instruction – what is it and does it matter? Results from a research synthesis years 1984 to 2002”, Journal of Research in Science Teaching, Vol. 47/4, pp. 474-496.

[7] Normile, D. (2017), “One in three Chinese children faces an education apocalypse. An ambitious experiment hopes to save them”, Science.

[17] Pottenger, F. and D. Young (1992), FAST 1: The Local Environment, Curriculum Research and Development Group, University of Hawaii, Honolulu.

[13] San Pedro, M., J. Gobert and P. Sebuwufu (2011), “The effects of quality self-explanations on robust understanding of the control of variables strategy”, paper presented at the Annual Meeting of the American Educational Research Association (accessed on 7 November 2018).

[10] Schauble, L., L. Klopfer and K. Raghavan (1991), “Students’ transition from an engineering model to a science model of experimentation”, Journal of Research in Science Teaching, Vol. 28/9, pp. 859-882.

[22] Schmidt, W. (1997), “A splintered vision: An investigation of U.S. science and mathematics education. Executive summary”, Wisconsin Teacher of Mathematics, Vol. 48/2, pp. 4-9.

[11] Siler, S. and D. Klahr (2016), “Effects of terminological concreteness on middle-school students’ learning of experimental design”, Journal of Educational Psychology, Vol. 108/4, pp. 547-562.

[9] Siler, S. and D. Klahr (2012), “Detecting, classifying, and remediating”, in Psychology of Science, Oxford University Press.

[12] Siler, S. et al. (2010), “Predictors of transfer of experimental design skills in elementary and middle school children”, in Proceedings of the 10th ITS 2010 Conference, Lecture Notes in Computer Science, Vol. 6095.

[14] Siler, S. et al. (2011), The effect of scaffolded causal identification in the transfer of experimental design skills.

[5] Siler, S. et al. (2010), “Training in experimental design: Developing scalable and adaptive computer-based science instruction”, poster presented at the 2008 IES Research Conference.

[21] Slotta, J. and M. Linn (2009), WISE Science: Web-Based Inquiry in the Classroom, Teachers College Press.

[6] TIMSS (2011), TIMSS 2011 Assessment, International Association for the Evaluation of Educational Achievement (IEA), TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

[19] VanLehn, K. (2011), “The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems”, Educational Psychologist, Vol. 46/4, pp. 197-221.

[3] Wellnitz, N. et al. (2012), “Evaluation der Bildungsstandards – eine fächerübergreifende Testkonzeption für den Kompetenzbereich Erkenntnisgewinnung (Evaluation of national educational standards – an interdisciplinary test design for the competence area of knowledge acquisition)”, Zeitschrift für Didaktik der Naturwissenschaften (ZfDN), Vol. 18, pp. 261-291.

[1] Wichmanowski, S. (2015), Science Education in South Korea: Towards a Holistic Understanding of Science and Society, Fulbright Distinguished Awards in Teaching Program.
