4. Assessing complex problem-solving skills through the lens of decision making

Carl Wieman
Stanford University
Argenta Price
Stanford University

This chapter elaborates on one of the 21st Century competencies introduced in Chapter 1 of this report. We provide an example of applying the assessment triangle (see the Introduction chapter) to define and develop assessments for the construct of complex problem solving in science and engineering (S&E). Complex problem solving, particularly in S&E fields, is a core competency of the modern world. It is the instantiation of adaptive expertise in those fields – expert scientists and engineers are not experts because they are good at following a specific procedure or technique; rather, they are experts because they are good at applying their knowledge and technical skills to solve complex problems in their work. Thus, problem solving is what scientists and engineers do and it is the primary goal of their education and training. Many newer science standards include problem solving, or aspects of problem solving, at the core of their standards – for example, the Next Generation Science Standards (NGSS) practice of “asking questions (science) and defining problems (engineering)” (National Research Council, 2013[1]). In the past decade, the Programme for International Student Assessment (PISA), NGSS and others have worked to improve science, technology, engineering and mathematics (STEM) education by better defining the desired core competencies and measuring how well educational systems teach such competencies. Problem solving and its associated skills underlie most of the cognitive and process competencies that have been identified. Here we provide a more detailed characterisation of the problem-solving process in S&E and we argue why a better characterisation of problem solving (the cognition vertex of the assessment triangle) will allow more successful assessment and teaching (interpretation vertex) of these competencies. Furthermore, we introduce assessment designs that allow observation of the constituent components of problem solving (observation vertex).

A construct to be assessed needs to be carefully defined in operational terms (i.e. the cognition vertex of the assessment triangle) for assessments to be designed to collect evidence about students’ performance on elements of that construct (i.e. the observation vertex). Extensive research on problem solving and expertise has been conducted over many decades (Frensch and Funke, 1995[2]; Csapó and Funke, 2017[3]; Dörner and Funke, 2017[4]; Ericsson et al., 2018[5]). Much research has focused on looking for cognitive differences between experts and novices using limited and targeted tasks (Chi, Glaser and Rees, 1981[6]; Hegarty, 1991[7]; Larkin et al., 1980[8]; McCloskey, 1983[9]; Card, Moran and Newell, 2018[10]; Kay, 1991[11]) and has revealed important novice-expert differences in ability to identify important features and use well-organised knowledge to reduce demands on working memory. Studies that challenged experts with unfamiliar problems also found that, relative to novices, they had more deliberate and reflective strategies and could better define the problem by applying their more extensive and organised knowledge base (Schoenfeld, 1985[12]; Wineburg, 1998[13]; Singh, 2002[14]). The tasks used to measure these expert-novice differences are not very authentic, so a criticism of this body of work is that it is not clear whether what has been measured is necessary or captures what makes someone an expert performer while doing their real-life jobs (Sternberg, 1995[15]).

Prior assessments of problem solving have been developed both as research instruments and for standardised assessment of students. An extensive thread in this work has been to measure “domain-general” problem-solving practices (see Frensch and Funke (1995[2]) for review), but these measures focus on generalised practices while neglecting the necessary role that disciplinary knowledge plays in the process of solving authentic problems. Recent work on the standardised assessment of problem-solving skills has recognised the importance of the “acquisition and application of knowledge” and led to the development of more innovative assessments that do not provide all the necessary information up front, in order to assess whether test takers recognise what information they need and how to obtain it. This work has also recognised the uncertainty involved in real problem-solving tasks (Csapó and Funke, 2017[3]). These research and assessment efforts have contributed a richer characterisation of complex problem solving, culminating in the OECD formulation of a framework of problem solving reflecting these ideas and the development of a PISA exam to assess it (Ramalingam, Philpot and McCrae, 2017[16]; Csapó and Funke, 2017[3]). Despite these advances, the process by which scientists and engineers specifically solve problems in their discipline has not been well studied and assessing the complex construct of problem solving in its various contexts has remained a challenge.

We have recently completed a study to identify the elements of problem solving in science and engineering, which we use to define the cognition vertex for assessment development. We developed an empirically-grounded framework that decomposes the problem-solving process into discrete cognitive components, or a set of specific decisions that need to be made during the solving process (Price et al., 2021[17]). We chose to focus on decisions because they are identifiable, measurable and important for students to practice. We identified a set of 29 decisions-to-be-made by examining the detailed problem-solving process used by experts from different areas of science and engineering. We observed that nearly all these decisions were made by every expert and that these decisions determine almost every action in the solution process. In making each of these decisions, experts invoke discipline-specific knowledge relevant to the problem’s context. While the specific decisions were identified by examining the processes of experts in solving highly complex problems, most decisions apply to a large range of contexts covering nearly all educational levels. Examples of such decisions include: 1) what information is needed; 2) what concepts are relevant; 3) what is a good plan; 4) what conclusions are justified by the evidence; 5) whether the solution method works; and so on. How well these decisions are made, in a relevant context, is amenable to accurate assessment. Such assessment provides a detailed characterisation of a learner’s problem-solving strengths and weaknesses including how well they can apply their relevant knowledge where needed.

When thinking about the decisions involved in problem solving it is important to consider the type of problem. There is a difference between “school problems” and “authentic problems”, as illustrated in Figure 4.1. Solving the types of problems typically found on school exams and in textbooks requires recognising and following a single, well-established procedure. These problems can be complicated, in that they require multiple steps, but very few decisions are required and hence they provide very limited assessment of problem-solving skills. They do not call upon the cognitive processes required for making many of the decisions essential for solving authentic problems. Authentic problems, like the rocket example in Figure 4.1, have much greater complexity and many unknowns. Unlike the school problem, this – like all real problems – has a mixture of relevant and irrelevant information. Some of the most challenging aspects are recognising what information is needed and how to seek out that information, evaluate its reliability and apply it. In complex authentic problems, all those decisions involve incomplete information or uncertainty and so require judgement to make good decisions. These problems are sometimes referred to as “ill-structured problems” because they cannot be solved by deterministically following a set of instructions (Simon, 1973[19]). A problem can still be authentic, requiring solvers to make decisions instead of following a prescribed procedure, but be constrained to require the knowledge expected of students at a particular level (as in the tree house example in Figure 4.1). This implies that an authentic assessment of problem solving must involve problems that call upon the student to make many decisions including those that require judgement beyond just choosing to follow a memorised procedure.

Typical school problems are thus inadequate for assessing meaningful problem-solving skills and they are also poor at teaching them. In the process of solving such school problems, the student does not practice the problem-solving decisions that are necessary to thrive as a member of the STEM workforce. Research on using more complex problems in teaching, such as the Mars Mission Challenge and Jasper Woodbury problems in secondary school (Hickey et al., 1994[20]; Cognition and Technology Group at Vanderbilt, 1992[21]), and on research experiences at the undergraduate level (National Academies of Sciences, Engineering and Medicine, 2017[22]) has demonstrated the benefits of having students practice complex problem solving. Assessments signal what is important for teaching, so to shift teaching toward providing practice of these more meaningful skills, assessments also need to require students to make problem-solving decisions. An ideal problem could be used interchangeably as a learning activity or formative assessment in which students work through the problem in groups with timely feedback from instructors, or as summative assessment in which students solve the problem individually.

We studied the problem-solving process by systematically characterising how experts across a wide range of science (including medicine) and engineering fields solved problems in their work (Price et al., 2021[17]). In effect, we carried out a detailed domain analysis focusing on the processes involved in contextualised problem solving in S&E. From the point of view of assessment design, this domain analysis constitutes the basis of the cognition vertex of the assessment triangle. We did this through detailed interviews with over 50 experts, where they described the process of solving a specific authentic problem from their work. We then coded the interviews in terms of the decisions represented, either as explicitly stated or implied by a choice of action between alternatives. Coding these interviews required a high level of expertise in the respective discipline to recognise where choices were being made and why. We focused our analysis on what decisions needed to be made, not on the experts’ rationale for making that decision: in other words, noting that a choice happened, not how they selected and chose among different alternatives for action. This is because, while the decisions-to-be-made were the same across disciplines, how the experts made those decisions varied greatly by discipline and by individual. The process of making the decisions relied on specialised disciplinary knowledge and experience. For example, planning in physics or biology may involve an extensive construction and data collection effort, while in medicine it may be running a simple blood test.

The coding identified a set of 29 decisions that experts made when solving their problems (Table 4.1). There was an unexpectedly large degree of overlap across the different fields, with all experts mentioning essentially the same set of decisions. On average, each interview revealed 85% of the 29 decisions and many decisions were mentioned multiple times in an interview. The set of decisions represents a remarkably consistent, yet flexible structure underlying S&E problem solving. Our interviews only show this set of decisions being made across S&E fields, but most of them are likely to apply in the social sciences and humanities as well. Research on expert thinking in history supports this view, as the thinking processes employed by expert historians align quite well with the decisions we have identified – for example, deciding what information is needed, deciding if information is believable, etc. (Wineburg, 1998[13]; Ercikan and Seixas, 2015[23]).
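The coverage figure above can be read as a simple per-interview statistic. The following is a minimal sketch of how such a figure could be computed from coded interviews; the data layout, the function names and the example interviews are hypothetical and are not taken from the study.

```python
from statistics import mean

N_DECISIONS = 29  # size of the decision framework reported in the text


def coverage(coded_decisions: set) -> float:
    """Fraction of the 29 framework decisions seen at least once in one interview."""
    valid = {d for d in coded_decisions if 1 <= d <= N_DECISIONS}
    return len(valid) / N_DECISIONS


def mean_coverage(interviews: dict) -> float:
    """Average per-interview coverage; repeated mentions of a decision count once."""
    return mean(coverage(set(codes)) for codes in interviews.values())


# Hypothetical coded interviews: each list holds the decision numbers a coder tagged.
example_interviews = {
    "expert_a": [1, 2, 3, 3, 4, 13, 15, 21],  # decision 3 mentioned twice
    "expert_b": list(range(1, 30)),           # all 29 decisions mentioned
}
```

Counting each decision once per interview keeps the coverage measure separate from how often a decision is revisited, which the text treats as a distinct observation.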

For the purposes of presentation, we categorised the decisions roughly based on the purposes they achieve (Table 4.1). We provide examples for each decision taken from an interview with an ecology professor studying animal migration. There are corresponding decisions from every field, but this example was relatively straightforward for a non-expert to understand. The ecologist heard about a study of fish migration that piqued her interest, so she decided to conduct a meta-analysis of published work to investigate whether migration patterns are changing across animal species. In the process, she had to make all 29 of the decisions identified.

The actual process is far less orderly and sequential than implied by Table 4.1 or in fact any characterisation of an orderly “scientific method.” Even how problems were initiated varied widely: some experts discussed importance and goals (decisions 1-3 – importance, fit, goals), but others mentioned a curious observation (decision 20 – anomalies), important features of their system that led them to questions (decisions 4, 6 – features, narrow) or other starting points. We also saw that there were overlaps between decisions where two or more needed to be made together. Many decisions were also mentioned repeatedly, often about different sub-problems within the larger problem or as re-addressing the same decision in response to reflection as new information and insights were developed. The sequence and number of iterations described varied dramatically by interview, making it clear that there was no coherent “scientific process” beyond that nearly all these decisions are made at some point.

A particular feature of all the problem-solving decisions is that there is not sufficient information to know what to do with certainty. If there were, then it would be just a procedure with a clearly defined set of steps to follow and thus could be carried out by a computer. For the problem-solving decisions we have identified there is not complete information, but there is sufficient information to allow a “well-educated guess” as to what would be the best choice between options. Having expertise in the discipline then means being able to use past research and experience in the discipline to make choices that are most likely to provide a desired outcome. This definition of expertise provides a standard by which to measure the quality of an assessment question that asks the student to “make and justify” any one of these decisions.

The decisions-to-be-made were consistent across all S&E disciplines we studied and most likely apply in other fields as well. However, we do not believe these decisions can be measured in a domain-general context, because how the decisions were made (and the decision outcome) was completely intertwined with the discipline and the problem. Making any of the decisions requires the application of relevant disciplinary knowledge, including recognising when one does not have sufficient knowledge and/or information and so needs to seek it out.

One aspect of knowledge was common across all interviews: experts had their disciplinary knowledge organised in a manner optimised for making problem-solving decisions. Studies of expertise have previously observed highly interconnected knowledge structures (Egan and Greeno, 1974[24]; Klein, 2008[25]; Mosier et al., 2018[26]). We found that in this context of problem solving, these structures served as explicit tools that guided most decisions. These knowledge structures were composed of mental models of the key features of the problem, the relationships between these features and an underlying level of mechanism that established those relationships and enabled making predictions. The models always involved some degree of simplification and approximation such that they were optimised for applying to the problem-solving decisions. The models provided a structure of knowledge and facilitated the application of that knowledge to the problem at hand, allowing experts to repeatedly run “mental simulations” to make predictions for dependencies and observables and to interpret new information. While the use of such predictive models was universal, the individual models explicitly reflected the relevant specialised knowledge, structure and standards of the discipline, which arguably largely define expertise in the discipline (Wieman, 2019[27]).

Examples of such models are: 1) in ecology, organisms with abundant food sources will continue to increase in number until limited by using up the food available or by increased predation; or 2) in physics, electric currents are electrons flowing through materials pushed along by an applied voltage, with the amount of current determined by the resistance of the material through which they are flowing and the size of the voltage.

We define the construct of “science problem solving” as the set of problem-solving decisions required for solving authentic problems in S&E. These decisions define much of the set of cognitive skills a student needs to practice and master “thinking scientifically” in any context. They are also relevant across a wide range of educational levels and contexts. Although these decisions were identified by studying the problem solving of high-level experts, we argue that they provide a broadly applicable framework for characterising, analysing and teaching S&E problem solving across all levels and contexts (except for decisions 1, 2 and 27 that are only relevant at high levels). The difference between educational levels is the relevant knowledge and predictive models needed to make these decisions wisely. Having insufficient knowledge does not negate the need to make the decisions (indeed, recognising when more knowledge is needed is one of the decisions), but the types of problems one could be expected to successfully solve would depend on the level of knowledge required. To assess the level of skill in a grade-level appropriate manner, the assessment problem scenario needs to be appropriate, and the knowledge required for making the decisions needs to match the educational level targeted.

The general applicability of the decisions is supported by other studies of student problem solving. In a study of first year university students solving introductory physics problems, the degree to which students followed the set of decisions in completing their solutions was well correlated with the correctness of their solutions (Burkholder et al., 2020[28]). We have also studied how secondary and post-secondary school students use a computer simulation to investigate and identify a hidden circuit (Salehi, 2018[29]; Salehi et al., 2020[30]; Wang et al., 2021[31]). We observed, through think-aloud interviews, a wide variation in students’ solving abilities, matching the extent to which they correctly made decisions 3-26.

Our framework of decisions is consistent with previous work on “scientific practices” and expertise, but it is more complete, specific, empirically based and generalisable across S&E disciplines. To support this claim, in Table 4.2 we compare our decision framework with the PISA 2015 Scientific Literacy competency framework (OECD, 2015) that is aligned with our vision of contextualised complex problem solving because PISA aims to measure the real-world application of scientific knowledge rather than just knowledge recall. After each PISA competency, we gave the decision number from our decision framework (refer to Table 4.1) that captured how the competency is used in problem solving.

All of the decisions in our decision framework, except decisions 1, 2, 28 and 29, occur in one or more of the PISA scientific literacy competencies and all the PISA competencies are covered by our set of decisions. However, our decision list provides greater specificity in the cognitive skills required and elements that assessment tasks should contain to collect evidence about those skills, so helps to specify what the observation vertex of the assessment triangle should include.

This can be illustrated by looking at examples of the 2015 PISA science test, such as the “Running in Hot Weather” example (Figure 4.2). This unit explores an authentic problem concerning the conditions in which it is dangerous to run and why. To solve such a problem in real life, most of the decisions on our list would be required. However, as written in the PISA example, the unit focuses primarily on two decisions: decision 15 (plan – in the narrow sense of whether they use a control-of-variables strategy) and decision 21 (appropriate conclusions – whether students correctly interpret the data they collect). The problem structure artificially narrows down and decomposes the problem and sets priorities for solving it by limiting students to consider only particular variables. The question in Figure 4.2 decides for the student that they should collect information about the effect of air temperature on body temperature specifically and even specifies how many pieces of data to collect. Essentially, the unit has made decisions 3, 4, 5, 6, 11, 13 and 14 for the students already and provides strong direction about decision 15, making it a very narrow and correspondingly incomplete measure of their scientific problem-solving ability.

Having identified these limitations in the PISA test unit, it is straightforward to set out how to modify it to obtain a more complete assessment of problem-solving decisions. This involves a restructuring of the problem with a few additional questions to probe the student’s ability to make these “missing” decisions. For example, instead of presenting a series of questions about the decomposed relationships within the simulation, students could be asked: in which conditions is it dangerous to run and why? The student could then be prompted to make specific decisions in the problem-solving process such as asking them: 1) “which variables could affect whether it is safe to run?” (decision 4 – features); 2) “what information do you need to collect to figure out whether it would be dangerous to go for a run on a particular day?” (decision 13 – information needed); and 3) “what assumptions are reasonable to make?” (decision 10 – assumptions). The current assessment also fixes how the data is organised – a simple alternative would be to give the student the choice of several different ways for laying out the information, thus assessing decision 17 (how to represent and organise information); then the student would be asked specific questions about interpretation of data. The simulation would have some realistic scatter in the data, as all authentic data has, and in addition to the current data interpretation questions the student would be asked about the believability of the information (decision 18). This demonstrates how the decisions framework provides a complete and more specific guide for assessing scientific competencies. This also illustrates how our set of decisions, with the three exceptions noted, applies to problem solving for almost every grade level given a suitable choice of problem context and questions.

The designers of the 2015 PISA test made choices to constrain the problem and the tested variables to make a test that was practical to score, could be completed in a few minutes and met standards of fairness in terms of specifically telling students exactly what information they needed to provide. This balance of practicality with open-endedness (to allow assessment of the full range of decisions in a problem-solving process) needs to be carefully considered in the design of any assessment. In assessments of complex thinking there is a fundamental tension between the validity of what is being measured and the ease and practicality of administering and scoring a test. The most accurate assessment of expertise would involve giving the test taker a variety of authentic problems to solve which are very broad in scope, such as “design a new cell phone” and then compare their solutions with how experts would solve the same problems – but this is obviously impractical in most circumstances. At the other extreme, an assessment question can be very easy to administer but so limited in context and scope that it involves essentially no meaningful decisions, just simple memorisation, as seen in many poor assessments. Most typical S&E assessments tend toward the “easy” end of the “validity-easy to use” spectrum. They test knowledge of information but seldom test whether the subject can correctly choose and apply knowledge in novel contexts to make good decisions to solve problems.

The optimum balance between authenticity and practicality is different for students at different levels and for different purposes of assessment (e.g. large-scale standardised tests vs. assessments within a course). Finding the appropriate balance involves choosing a problem context and questions that constrain the problem solver – but not too much – as appropriate for the assessment context. Too much constraint means the important resources and decision processes will not be probed, while too little constraint results in responses that can vary so much that it is impossible to evaluate and compare the detailed strengths and weaknesses of the test takers. Some strategies we have used for constraining the solution space, but not too much, include scaffolding the problem to probe individual decisions, including rescue points where important features are specified or information given in case students did not decide to consider those independently, and providing a flawed solution or data collection plan to be improved upon. These are discussed more in the assessment task design section below.

Figure 4.3 shows a template for designing tasks (i.e. the observation vertex of the assessment triangle) to assess a large subset of problem-solving decisions described in the decision framework (i.e. the cognition vertex) (Price et al., 2022[33]). We have developed assessment tasks following this general design for undergraduate students in mechanical and chemical engineering, biochemistry, medical diagnoses and earth sciences. Problems were designed to take 30-60 minutes to complete, so are most appropriate for use in classroom or department contexts. The template can be used for assessment at lower grade levels by choosing a grade-appropriate scenario and knowledge requirements.

This assessment task starts with a realistic problem scenario that contains irrelevant and incomplete information and that to solve requires knowledge that learners are expected to know. The assessment should then call upon learners to make decisions and provide their reasoning and information used to arrive at and justify their choices. Questions on the assessment probe different decisions and the specific sequence of questions should follow the approximate order an experienced solver would use for that problem. An important part of problem solving is recognising what information is needed and correctly applying relevant information. The decisions involved in this process can be assessed by asking students to decide what information is needed, providing them with new information and then asking them to interpret and reflect upon new information.

Authentic problem scenarios are, by definition, complex, with a wide range of possible solution paths. This demands care in the design of assessment tasks to ensure the questions are sufficiently constrained that student responses can be interpreted and readily scored, but not so constrained that the decisions are removed (as in the PISA 2015 “Running in Hot Weather” example, Figure 4.2). While each question asked of the test taker probes a different decision, certain questions will end up being interdependent by building on decisions made in previous questions. We have developed a few design strategies to build in the appropriate level of constraint and to include “rescue points” to mitigate the issue of interdependence:

  • Constrain the solution space by providing a solution proposed by a “peer” and have test takers evaluate it and improve on its flaws. The provided “peer” work could be an answer to the problem or could be a plan for a particular sub-goal (e.g. collecting information through an experimental set-up for a biology problem). The process of troubleshooting a flawed solution or design overlaps with much of an expert problem-solving process because both involve evaluating and modifying potential solutions, including mentally testing them with new information. Thus, incorporating this kind of scenario still allows for the assessment of problem-solving decisions while significantly constraining the solution space.

  • Start with broad questions followed by questions about particularly important features. For example, the assessment could start with questions such as “what are the important criteria for deciding when it is safe to run in hot weather?” or “what criteria will you use in evaluating the proposed design?” to avoid leading students, then follow up with rescue points in the form of questions that ask about particularly important features or that provide necessary information that students might not have collected on their own. Examples include “is sweating important for running in hot weather and if so, why?” or “how suitable was the choice of material for the design?”. Learners at intermediate levels may not spontaneously recognise the significance of a factor on the initial broad question but will if prompted, allowing for a better assessment of their skills.

  • Consider the question format. Some decisions, such as reflection decisions, need to be probed explicitly through open-ended questions. Others can be probed with complex multiple-choice questions such as “which tests would you like to order to diagnose your patient?” followed by a list of (relevant and non-relevant) tests the student has to select or that ask for multiple-choice justifications (Walsh et al., 2019[34]). In an assessment that includes a simulation for collecting information, the assessment could also collect information about planning decisions through the actions the student takes (i.e. process data) rather than by asking the student directly. See Wang et al. (2021[31]) and Wang, Nair and Wieman (2021[35]) for work in this area.
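The complex multiple-choice format described in the last strategy can be tallied mechanically. The sketch below is one possible way to do so, not the scoring used by the authors; the function name, the three reported counts and the example test names are all assumptions made for illustration.

```python
def score_selection(selected, relevant, options):
    """Tally a 'select all that apply' response against an expert-defined key.

    selected: items the student chose
    relevant: items experts consider relevant (hypothetical answer key)
    options:  every item offered, relevant and non-relevant alike
    """
    chosen = set(selected) & set(options)  # ignore anything not actually offered
    key = set(relevant) & set(options)
    return {
        "relevant_chosen": len(chosen & key),
        "relevant_missed": len(key - chosen),
        "irrelevant_chosen": len(chosen - key),
    }


# Hypothetical diagnostic item; the test names are illustrative only.
options = ["blood count", "chest x-ray", "knee MRI", "allergy panel"]
relevant = ["blood count", "chest x-ray"]
result = score_selection(["blood count", "knee MRI"], relevant, options)
```

Reporting the three counts separately, rather than a single total, keeps visible whether a student missed needed information or requested unnecessary information.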

The interpretation of data collected from an assessment (i.e. the interpretation vertex) occurs through scoring. For scoring decision-based assessments, the goal is to determine the extent to which the student: 1) decides before being explicitly prompted through built-in scaffolding; and 2) applies the correct reasoning and relevant information when making those decisions (whether prompted or not). For our assessments that probe high-level expertise, such as that expected of a student who has completed a university programme, we need to consult Ph.D. level experts to be sure of the correct information and reasoning. For problem-solving assessments at lower levels, such as the PISA example, problems are less complex and require less extensive knowledge, such that it is much easier to establish an “expert” decision. For most assessments, an instructor who is knowledgeable in the subject will produce an answer that is the same as someone very expert in the subject. We believe that any question on which expert responses are not consistent is unsuitable.
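The two scoring goals, and the requirement that experts agree on a question, can be recorded in a small data structure. This is a minimal sketch under stated assumptions: the record fields, the decision to report the two criteria as separate fractions and the string-matching agreement check are hypothetical simplifications, since in practice both judgements are made by human raters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredDecision:
    decision: int          # decision number in the framework
    unprompted: bool       # made before the scaffolded prompt appeared
    sound_reasoning: bool  # rater judged reasoning consistent with expert consensus


def summarise(scored: List[ScoredDecision]) -> Dict[str, float]:
    """Report the two criteria separately, as fractions of the decisions probed."""
    n = len(scored)
    if n == 0:
        return {"unprompted": 0.0, "sound_reasoning": 0.0}
    return {
        "unprompted": sum(s.unprompted for s in scored) / n,
        "sound_reasoning": sum(s.sound_reasoning for s in scored) / n,
    }


def experts_agree(expert_answers: List[str]) -> bool:
    """Hypothetical suitability check: keep a question only if experts give one answer."""
    return len({a.strip().lower() for a in expert_answers}) == 1


# Hypothetical ratings for four probed decisions.
ratings = [
    ScoredDecision(4, True, True),
    ScoredDecision(13, False, True),
    ScoredDecision(10, True, False),
    ScoredDecision(18, False, True),
]
```

The two fractions are kept apart rather than combined into one score because the text does not specify any weighting between them.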

The decision-based assessments discussed above are designed for administration in a computer-based survey format, where students fill in open-ended responses or complete complex multiple-choice questions. As test takers progress through the assessment, they are given more information and asked more questions, and are not able to go back to change earlier responses. In some cases, there is simple branching – for example, the student may only receive the additional information they request.
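The delivery rules described here (forward-only progress and information released only on request) can be sketched as follows. The class, its field names and the example content are hypothetical; this is not the survey software used for the assessments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LinearAssessment:
    questions: List[str]
    info_bank: Dict[str, str]  # hypothetical: information items a student may request
    responses: List[str] = field(default_factory=list)
    revealed: Dict[str, str] = field(default_factory=dict)

    @property
    def current_question(self) -> Optional[str]:
        i = len(self.responses)
        return self.questions[i] if i < len(self.questions) else None

    def submit(self, response: str, requests: Tuple[str, ...] = ()) -> Dict[str, str]:
        """Record an answer, move forward, and release only the requested information."""
        if self.current_question is None:
            raise RuntimeError("assessment already complete")
        # Responses are append-only: no method exists to revise an earlier answer.
        self.responses.append(response)
        granted = {k: self.info_bank[k] for k in requests if k in self.info_bank}
        self.revealed.update(granted)
        return granted


# Hypothetical two-question assessment.
session = LinearAssessment(
    questions=["What information do you need?", "Is it safe to run today?"],
    info_bank={"humidity": "60%", "air temperature": "35 C"},
)
granted = session.submit("I need humidity and wind speed", requests=("humidity", "wind speed"))
```

In this sketch a request for an item that is not in the bank is silently ignored; a real instrument might instead log such requests, since they are themselves evidence about the information-needed decision.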

Simulations are an alternative format that our group has investigated. Two simulations that we have studied extensively are the PhET Interactive Simulations called “circuit construction kit black box” and the “mystery weight” (see Figure 4.4).

The degree of constraint and complexity is built into the simulation, thereby determining the range of decisions and problem-solving skills involved. Simulations such as these are more open-ended than the survey format, in that they allow a less constrained context in which to investigate how students decide on strategies for collecting data and how they analyse it. It is also possible to record and analyse their keystrokes (“back-end data”) to get some measure of their thinking processes. The PISA 2015 simulation example is more constrained than these PhET examples, which allows for an easier interpretation of back-end data in PISA but at the expense of limiting the decisions involved.

This is work in progress, but what we have found so far is that the data collected from simulations are informative for only a limited set of decisions. While students must make many other decisions to solve the problem, it is not practical to determine these from their keystrokes (Wang et al., 2021[31]; Wang, Nair and Wieman, 2021[35]). Analysing problem solving in think-aloud interviews with these simulations, we see many of the decisions invoked in the survey format. Two unique capabilities that we have seen simulations provide are: 1) the measurement of pause time; and 2) the observation of how student strategies evolve. Students who pause after receiving new information from the simulation, presumably to reflect on the significance of that information, perform better than students who quickly try something else with no pause (Wang, Nair and Wieman, 2021[35]; Perez et al., 2017[36]). With the simulation, we can also see that some students start with ineffective strategies but later realise this and adjust their strategies to be more effective, for example by running tests that provide more useful data (Salehi, 2018[29]; Salehi et al., 2020[30]).
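One way to picture the pause-time measure is as the gap between the simulation presenting new information and the student's next logged action. The sketch below assumes a simple log of (timestamp, event type) pairs; the event name `result_shown` and the five-second cut-off are illustrative assumptions, not values taken from the cited studies.

```python
def pauses_after_feedback(events, feedback_type="result_shown"):
    """Return the time gaps (in seconds) between each feedback event and the
    event that follows it. `events` is a time-sorted list of
    (timestamp_seconds, event_type) tuples; a trailing feedback event with no
    following action is ignored."""
    pauses = []
    for (time, kind), (next_time, _next_kind) in zip(events, events[1:]):
        if kind == feedback_type:
            pauses.append(next_time - time)
    return pauses


def fraction_reflective(pauses, min_pause=5.0):
    """Share of pauses at least `min_pause` seconds long (threshold is an assumption)."""
    if not pauses:
        return 0.0
    return sum(pause >= min_pause for pause in pauses) / len(pauses)


# Hypothetical log: the student pauses 8 s after the first result, 1 s after the second.
log = [(0.0, "action"), (1.0, "result_shown"), (9.0, "action"),
       (10.0, "result_shown"), (11.0, "action")]
print(pauses_after_feedback(log))
```

A measure of this kind captures only the length of the pause; whether the student was actually reflecting during it would still need to be checked against think-aloud or interview data.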

In conclusion, both survey and simulation format problems can sequentially provide additional information as needed. As of now, we find the survey format to be superior, but further work is clearly needed in this area. With suitable affordances (e.g. what can be controlled, what can be observed, whether and how data can be collected, etc.) there are likely aspects of problem-solving decisions that simulations will be better able to assess than surveys. Roll, Conati and others (Conati et al., 2015[37]; Perez et al., 2017[36]; 2018[38]) have seen how the use of different simulation designs, question prompts and analysis methods can provide other information. However, which skills and decisions each test format is best suited to measure, and how consistent their results are, remain open research questions.

We have presented a novel framework for the assessment of complex problem solving in STEM. It is based on a set of 29 decisions that need to be made with limited information in the process of solving any authentic problem in science, whether in making choices in an individual’s personal life or carrying out scientific research. The decisions were identified through a careful domain analysis by examining how scientists and engineers solved problems in their work, but we have seen how nearly all these decisions apply far more widely (across educational levels and across other disciplines). The decision framework bridges the cognition and observation vertices of the assessment triangle to provide a template for assessing the full range of knowledge and skills that it is valuable for an S&E student to learn.

An overarching implication of defining problem solving as this set of decisions is that by the time students become skilled practitioners in their fields, they will be able to make such decisions when faced with novel complex problems. This framework suggests a need for a fundamental re-evaluation of how assessments and educational experiences need to be structured to provide students with opportunities to practise making these decisions and to measure their progress toward mastery. We proposed a template for designing problems to allow practice and assessment of these decisions. A virtue of this template design is that it can be used in a very similar way for instruction as for assessment: learners would work through a problem, practising making the various decisions (Burkholder et al., 2020[28]; Wang et al., 2022[39]). The main difference between instruction and assessment is that during instruction students would get feedback and guidance on their decisions to help them improve. This follows Ericsson’s deliberate practice for the development of expertise (Ericsson, 2006[40]). We believe there is great benefit in having good assessment and good instruction transparently connected in this fashion.


[28] Burkholder, E. et al. (2020), “Template for teaching and assessment of problem solving in introductory physics”, Physical Review Physics Education Research, Vol. 16/1, https://doi.org/10.1103/physrevphyseducres.16.010123.

[10] Card, S. (ed.) (2018), The Psychology of Human-Computer Interaction, CRC Press, Boca Raton, https://doi.org/10.1201/9780203736166.

[6] Chi, M., R. Glaser and E. Rees (1981), Expertise in Problem Solving, Learning Research and Development Center, University of Pittsburgh.

[21] Cognition and Technology Group at Vanderbilt (1992), “The Jasper Series as an example of anchored instruction: Theory, program description, and assessment data”, Educational Psychologist, Vol. 27/3, pp. 291-315, https://doi.org/10.1207/s15326985ep2703_3.

[37] Conati, C. et al. (2015), “Comparing representations for learner models in interactive simulations”, in Conati, C. et al. (eds.), Artificial Intelligence in Education. AIED 2015. Lecture Notes in Computer Science, Springer International Publishing, Cham, https://doi.org/10.1007/978-3-319-19773-9_8.

[3] Csapó, B. and J. Funke (eds.) (2017), The Nature of Problem Solving: Using Research to Inspire 21st Century Learning, Educational Research and Innovation, OECD Publishing, Paris, https://doi.org/10.1787/9789264273955-en.

[4] Dörner, D. and J. Funke (2017), “Complex problem solving: What it is and what it is not”, Frontiers in Psychology, Vol. 8/1153, pp. 1-11, https://doi.org/10.3389/fpsyg.2017.01153.

[24] Egan, D. and J. Greeno (1974), “Theory of rule induction: Knowledge acquired in concept learning, serial pattern learning, and problem solving”, in Gregg, L. (ed.), Knowledge and Cognition, Erlbaum, Hillsdale.

[23] Ercikan, K. and P. Seixas (2015), “Issues in designing assessments of historical thinking”, Theory Into Practice, Vol. 54/3, pp. 255-262, https://doi.org/10.1080/00405841.2015.1044375.

[40] Ericsson, K. (2006), “The influence of experience and deliberate practice on the development of superior expert performance”, in Ericsson, K. et al. (eds.), The Cambridge Handbook of Expertise and Expert Performance, Cambridge University Press, Cambridge, https://doi.org/10.1017/cbo9780511816796.038.

[5] Ericsson, K. et al. (eds.) (2018), The Cambridge Handbook of Expertise and Expert Performance, Cambridge University Press, Cambridge, https://doi.org/10.1017/9781316480748.

[2] Frensch, P. and J. Funke (1995), Complex Problem Solving: The European Perspective, Lawrence Erlbaum, Hillsdale.

[7] Hegarty, M. (1991), “Knowledge and processes in mechanical problem solving”, in Sternberg, R. and P. Frensch (eds.), Complex Problem Solving: Principles and Mechanisms, Psychology Press, New York and London.

[20] Hickey, D. et al. (1994), “The Mars Mission Challenge: A generative, problem-solving school science environment”, in Vosniadou, S., E. de Corte and H. Mandl (eds.), Technology-Based Learning Environments, Springer, Berlin and Heidelberg, https://doi.org/10.1007/978-3-642-79149-9_13.

[11] Kay, D. (1991), “Computer interaction: Debugging the problems”, in Sternberg, R. and P. Frensch (eds.), Complex Problem Solving: Principles and Mechanisms, Psychology Press, New York and London.

[25] Klein, J. (2008), “Some directions for research in knowledge sharing”, Knowledge Management Research & Practice, Vol. 6/1, pp. 41-46, https://doi.org/10.1057/palgrave.kmrp.8500159.

[8] Larkin, J. et al. (1980), “Expert and novice performance in solving physics problems”, Science, Vol. 208/4450, pp. 1335-1342, https://doi.org/10.1126/science.208.4450.1335.

[9] McCloskey, M. (1983), “Naive theories of motion”, in Gentner, D. and A. Stevens (eds.), Mental Models, Lawrence Erlbaum, Hillsdale.

[17] Momsen, J. (ed.) (2021), “A detailed characterization of the expert problem-solving process in science and engineering: Guidance for teaching and assessment”, CBE—Life Sciences Education, Vol. 20/3, pp. 1-15, https://doi.org/10.1187/cbe.20-12-0276.

[26] Mosier, K. et al. (2018), “Expert professional judgments and ‘naturalistic decision making’”, in Ericsson, K. et al. (eds.), The Cambridge Handbook of Expertise and Expert Performance, Cambridge University Press, Cambridge, https://doi.org/10.1017/9781316480748.025.

[22] National Academies of Sciences, Engineering and Medicine (2017), Undergraduate Research Experiences for STEM Students, National Academies Press, Washington, D.C., https://doi.org/10.17226/24622.

[1] National Research Council (2013), Next Generation Science Standards, National Academies Press, Washington, D.C., https://doi.org/10.17226/18290.

[32] OECD (2015), PISA 2015 Released Field Trial Cognitive Items, https://www.oecd.org/pisa/test/PISA2015-Released-FT-Cognitive-Items.pdf (accessed on 23 March 2023).

[36] Perez, S. et al. (2017), “Identifying productive inquiry in virtual labs using sequence mining”, in André, E. et al. (eds.), Artificial Intelligence in Education. AIED 2017. Lecture Notes in Computer Science, Springer International Publishing, Cham, https://doi.org/10.1007/978-3-319-61425-0_24.

[38] Perez, S. et al. (2018), “Control of variables strategy across phases of inquiry in virtual labs”, in Rosé, C. et al. (eds.), Artificial Intelligence in Education. AIED 2018. Lecture Notes in Computer Science, Springer International Publishing, Cham, https://doi.org/10.1007/978-3-319-93846-2_50.

[33] Price, A. et al. (2022), “An accurate and practical method for assessing science and engineering problem-solving expertise”, International Journal of Science Education, Vol. 44, pp. 2061-2084, https://doi.org/10.1080/09500693.2022.2111668.

[18] Price, A. and C. Wieman (forthcoming), “Improved teaching of science and engineering using deliberate practice of problem-solving decisions”, Innovative Teaching and Learning.

[16] Ramalingam, D., R. Philpot and B. McCrae (2017), “The PISA 2012 assessment of problem solving”, in Csapó, B. and J. Funke (eds.), The Nature of Problem Solving: Using Research to Inspire 21st Century Learning, OECD Publishing, Paris, https://doi.org/10.1787/9789264273955-7-en.

[29] Salehi, S. (2018), Improving Problem-Solving Through Reflection, Stanford University, https://stacks.stanford.edu/file/druid:gc847wj5876/ShimaSalehi-Dissertation-augmented.pdf.

[30] Salehi, S. et al. (2020), “Can majoring in computer science improve general problem-solving skills?”, Proceedings of the 51st ACM Technical Symposium on Computer Science Education, https://doi.org/10.1145/3328778.3366808.

[12] Schoenfeld, A. (1985), Mathematical Problem Solving, Academic Press, Orlando.

[19] Simon, H. (1973), “The structure of ill structured problems”, Artificial Intelligence, Vol. 4/3-4, pp. 181-201, https://doi.org/10.1016/0004-3702(73)90011-8.

[14] Singh, C. (2002), “When physical intuition fails”, American Journal of Physics, Vol. 70/11, pp. 1103-1109, https://doi.org/10.1119/1.1512659.

[15] Sternberg, R. (1995), “Expertise in complex problem solving: A comparison of alternative conceptions”, in Frensch, P. and J. Funke (eds.), Complex Problem Solving: The European Perspective, Psychology Press, New York and London.

[34] Walsh, C. et al. (2019), “Quantifying critical thinking: Development and validation of the physics lab inventory of critical thinking”, Physical Review Physics Education Research, Vol. 15/1, https://doi.org/10.1103/physrevphyseducres.15.010135.

[39] Wang, K. et al. (2022), “Measuring and teaching problem-solving practices using an interactive task in PhET simulation”, in AERA 2022 Annual Meeting, American Educational Research Association, San Diego.

[35] Wang, K., K. Nair and C. Wieman (2021), “Examining the links between log data and reflective problem-solving practices in an interactive task”, LAK21: 11th International Learning Analytics and Knowledge Conference, https://doi.org/10.1145/3448139.3448193.

[31] Wang, K. et al. (2021), “Automating the assessment of problem-solving practices using log data and data mining techniques”, Proceedings of the Eighth ACM Conference on Learning @ Scale, https://doi.org/10.1145/3430895.3460127.

[27] Wieman, C. (2019), “Expertise in university teaching & the implications for teaching effectiveness, evaluation & training”, Daedalus, Vol. 148/4, pp. 47-78, https://doi.org/10.1162/daed_a_01760.

[13] Wineburg, S. (1998), “Reading Abraham Lincoln: An expert/expert study in the interpretation of historical texts”, Cognitive Science, Vol. 22/3, pp. 319-346, https://doi.org/10.1207/s15516709cog2203_3.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2023

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at https://www.oecd.org/termsandconditions.