It is difficult to teach computer programming to students with either little prior knowledge of programming or weak problem-solving skills. Often, teachers will introduce a new programming construct, illustrate its use in abstract situations, and then expect students to demonstrate mastery given a "real-world" problem. This article addresses several techniques for reducing cognitive load on novice programming students, including the use of worked examples, pair programming, and live-coded modeling. It builds a case for using these pedagogical approaches in the K-12 classroom based on prior research on reducing cognitive load for introductory programming students.

Teaching an introductory programming course can be a daunting task for high school educators with limited background in computing, or for university instructors and teaching assistants who have limited experience with pedagogical practice. The students who enter introductory classrooms often have different levels of background knowledge and greatly varying problem-solving abilities. Learning a programming language requires both an understanding of basic constructs and syntax and the ability to synthesize and apply these constructs in novel ways. These two components are often addressed without distinction. To draw a parallel to another field, imagine receiving basic instruction in the use of a hammer while also being asked to use the hammer adeptly to build a birdhouse. For many students, the need to acquire and apply knowledge in rapid succession presents a challenge unlike the ones found in less applied disciplines. For students exposed to computing ideas before entering an introductory computer science classroom, the initial cognitive load involved in these processes can be mitigated [5]. Students without prior experience must digest new programming language syntax and develop algorithms for unfamiliar problems simultaneously. The challenge facing the computer science instructor is to provide enough scaffolding for students who are inexperienced in one or both domains (programming construct knowledge and problem solving) in order to reduce the student's cognitive load.

In learning programming, a significant source of cognitive load is the adoption of new programming constructs. Introductory languages tend to have similar sets of useful constructs to learn, including (but not limited to) variables and arithmetic, assignment, selection, repetition, functions, and data structures. Each of these involves learning both the syntax and the usage. Unfortunately, the syntax presents intricacies that take time for students to master. As an example, consider the difficulty in implementing variables. In several common introductory programming languages (C, C++, Java), a student must include a keyword representing the data type of the variable before the name of the variable during declaration. In later uses the data type must be omitted: repeating it is no longer a use of the existing variable but an attempt to declare a new one, which, depending on the language and scope, is either rejected by the compiler or hides the original variable. This is a pattern change that a student must comprehend before being able to write code freely. In addition to syntactical rules, the student must understand the functionality, constraints, and affordances of each new construct. Individually, syntax and usage each contribute to a student's cognitive load. In combination, the potential to overwhelm a student is elevated.

Syntax and functionality are elements that students encounter in other contexts, such as mathematical expression or English composition, but the novice programmer must take on an additional cognitive burden. Learners frequently use new programming constructs with novel problems shortly after having been introduced to them. If the student is unable to address programming syntax and functionality while concurrently engaging in the problem-solving process, the task may be overwhelming [17]. Students with prior experience in programming and problem-solving can sometimes obscure the impact of cognitive load on measures of student performance in the classroom because their experience helps them work past these roadblocks. Novice students are not always capable of overcoming these early challenges.

Working Memory and Cognitive Load

Students rely on working memory to encode new information received from sensory stimuli. In working memory, the information is either transferred to long-term memory or discarded, depending on the student's ability to rehearse it. If the information is adequately rehearsed, it can be stored in long-term memory where it is organized with the existing schema. A schema is an organizational construct that defines how information will be interpreted by the learner. Working memory can only manage a small number of elements simultaneously, but the functionality of working memory improves with schema acquisition [17].

According to Sweller's Cognitive Load Theory (Figure 1), working memory is impacted by three different kinds of cognitive load emanating from elements of the learning situation [16]. Intrinsic load is the load associated with elements of the learning task and their interactivity—in the case of programming this includes constructs, syntax, and aspects of applying these items to problems. Extraneous cognitive load is the load that comes from elements of the learning task not directly tied to content mastery. For programming tasks extraneous load could include the need to search for similar problems, or components of the problem description students need to decipher before implementing a solution [10]. A third component, germane load, refers to the impact of schema-related information on the learner's efforts [16]. If teachers are utilizing strategies that minimize extraneous load and maximize germane load, Cognitive Load Theory posits that students will have an easier time encoding new information.

Research has shown there are several ways an instructor can reduce cognitive load. One approach is the development of scaffolds—level-appropriate supports—that reduce the load on the student as they engage with a learning task [20]. In combination with well-defined scaffolds, an educator can also order the learning tasks from most simple to most complex, with each task providing an opportunity for further encoding in bite-sized chunks [20]. Another strategy is to provide greater opportunities for collaboration, which spreads the cognitive load among students and thus reduces individual levels of load [7].

Considering the sources of cognitive load for introductory programmers, instructors need to find ways to scaffold learning, sub-divide tasks and provide opportunities for collaboration among peers. While there are many strategies addressing cognitive load in some form, the following section contains three concrete strategies that take these specific components into consideration while remaining easy for teachers to implement. Support for these strategies comes from previous research on the impact of cognitive load for novice programmers. I present them here collectively as an argument for improved pedagogy, with descriptions of their implementation and an example of how they might be used in the classroom.

Worked Examples

Educators use worked examples to show a canonical solution to a problem with guided instruction [1]. Students can use the supplementary text or questions as needed to understand the decisions made in the programming process. The examples themselves serve as a scaffold, as users can rely on the demonstration as much as is needed while interacting with a new concept. Implementing worked examples can improve a learner's ability to complete problem-solving search on similar problems while removing some of the cognitive load associated with schema acquisition [18]. Implementing worked examples must be done in such a way as to provide clarity despite the addition of instructional information [1].

Additionally, worked examples can improve a student's ability to note intermediate tasks within the programming problem when given labels to address these subgoals in situ [3,11]. For example, a worked example can contain text describing a problem such as finding the median given a set of unordered values. A subgoal label could be added to the worked example to highlight the task of sorting the data first, if a student has previously been exposed to sorting algorithms.
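
As a rough illustration of the median example, subgoal labels could be written as comments above each intermediate task. This is a minimal sketch in Python; the function name `median` and the wording of the labels are assumptions made for illustration and are not taken from the cited studies.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    # Subgoal 1: put the data in order (relies on sorting, covered earlier)
    ordered = sorted(values)

    # Subgoal 2: locate the middle position
    n = len(ordered)
    middle = n // 2

    # Subgoal 3: choose the median depending on an odd or even count
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))
print(median([4, 8, 2, 6]))
```

Each label names what the following lines accomplish rather than how, so a student who already knows sorting can treat the first subgoal as a single familiar step.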

Consider the example in Figure 2, which shows a worked example used to address the introduction of branching logic in a first programming course using Python. The program appears on the left, and questions used to provoke reflection appear on the right.

This worked example models both the data and the solution for the student while focusing student reflection using the question prompts on the right. After students have considered a series of related worked examples such as the one above, a new problem can be introduced with similar parameters, so a student can apply the problem-solving approach with fewer scaffolds [1,14]. The use of the instructional scaffolds should be designed to increase germane load while a student works through the example, relying on a student's existing understanding of how the programming construct is implemented syntactically and functionally.
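
For readers without access to the figure, the following is a hypothetical sketch of what a worked example on branching logic with reflection prompts might look like in Python. It is not a reproduction of Figure 2: the function `ticket_price`, its age thresholds, and the prompts are all invented for illustration, and the prompts appear as comments rather than in a side column.

```python
def ticket_price(age):
    """Return a ticket price in dollars based on the visitor's age."""
    if age < 5:
        # Prompt: Which condition is checked first? Why might order matter?
        return 0
    elif age < 18:
        # Prompt: Which ages reach this line? Why is age 3 not among them?
        return 8
    else:
        # Prompt: When does this branch run? Is a condition needed here?
        return 12


# Sample data modeled for the student
for age in [3, 12, 40]:
    print(age, ticket_price(age))
```

The prompts direct attention to one idea at a time (ordering of conditions, the ranges each branch covers, the role of `else`), which is the kind of focused reflection the question prompts are meant to encourage.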

The application of worked examples should maximize the effect of the scaffolding by applying a modeling, coaching, and fading approach [4]. During the modeling phase, several worked examples of problems should be presented to students similar in form and content. The goal of modeling is to show students how a program can be written and why the important design decisions were made in the way that they appear in the model solution. The coaching phase involves reflection on the worked examples via supporting text, and opportunity for questioning of peers and the instructor. To implement fading, the instructor can slowly remove portions of the worked example, so the student can attempt to provide the correct code to complete the algorithm. After carefully scaffolding the learning to allow for students to develop appropriate schema in long-term memory, revisiting similar tasks should not place the same demands on working memory. Future instruction in this area can then be done without these scaffolds [18].

Recalling the first worked example, a similar problem is shown in Figure 3 to demonstrate the fading technique. Fading should be repeated over multiple examples as needed and can be used to draw attention to the details of one specific programming idea in each worked example.

In this example, parts of the selection statement and the return values have been removed. The students are then asked to write the missing code segments on their own. An instructor can choose to fade as much or as little as is deemed appropriate for his or her specific learners.
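
A faded example of this kind might be sketched as follows. This is a hypothetical illustration, not the content of Figure 3: the function `letter_grade` and its cutoffs (90 for an A, 80 for a B) are assumed to come from an accompanying problem statement. The faded version is shown in comments, followed by one possible completion.

```python
# Faded version handed to students (blanks marked with ____):
#
#   def letter_grade(score):
#       if score >= 90:
#           return "A"
#       elif ____________:
#           return ____
#       else:
#           return "C"


# One possible completion a student might write
def letter_grade(score):
    """Return a letter grade for a numeric score."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"


print(letter_grade(85))
```

Leaving the first branch intact gives students a pattern to imitate, while the blanks isolate the two decisions (the condition and the return value) that the instructor wants them to practice.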

It should be noted that worked examples are effective for novices but can be detrimental for students bringing some pre-existing schema to the problem-solving process [14]. It is also known that complex worked examples with multiple forms of instruction can cause a split-attention effect in which a student is overwhelmed by the supporting information and unclear on what information source is most germane to the task at hand [1]. Lastly, care should be taken so as not to suggest that there is one correct solution for any programming problem. Ideally, the worked example will allow the student to see a clear consistent example solution, and the introduction of alternative approaches can be provided as the student gains confidence and experience.

Pair Programming

Students engaged in problem-solving benefit from cooperative learning specifically due to the way peer interactions mediate their cognition [2]. The dialogue generated from social interactions in the classroom facilitates knowledge construction between students and knowledgeable others. In this way, prior knowledge from multiple sources can be combined to produce a complete understanding of a programming construct. When used in this manner, collaborative work serves as a scaffold for individual learners, and is productive assuming that learners are aware of the context-specific strengths and weaknesses of their partner [8]. In addition, the existing cognitive load of the programming task is divided between two members, leading to lower amounts of extraneous load per partner [7].

In pair programming, two programmers work together to solve problems using a specific set of roles that alternate between programmers as they work through the task [12]. The driver role casts a programmer as the partner responsible for operating the computer keyboard and mouse. While in this role, the driver writes all the code and engages verbally in the conversation surrounding the solving of the problem. The programmer in the navigator role offers suggestions to the driver for next steps in the algorithm development, provides fixes for syntactical and logical errors as they appear on the screen, and reviews the problem statement for information important to writing a problem solution. Programmers switch roles periodically in order to promote active engagement in the programming task. Since pair programming separates tasks with low-level cognitive demands (typing, computer management and navigation) from tasks with higher cognitive demands (syntax analysis, algorithm development, problem search), the amount of extraneous load impacting each individual member of the pairing is reduced.

Collaborative learning is most effective given a task that has high complexity [7]. Consider an activity in which students are asked to apply iteration to a string of characters in order to determine whether the word or phrase is a palindrome. This is a complex problem for novice programmers, as the mechanics of iterating through the characters in the string from both ends involve thinking about the position of the characters in relation to the string length. Distributing this complexity between learners significantly reduces extraneous load, with only a minimal increase in load due to the re-combination of information between individuals [7]. This latter process, called convergence, involves three phases: creating a shared understanding; eliminating redundant or similar contributions; and identifying relationships between contributions [9]. Working with students on the process of pair programming with these elements of convergence in mind is an avenue for further reducing cognitive load.

When students are given the palindrome task, they should have already received instruction on the mechanics and syntax of a for-loop. In addition, the instructor should have provided students with opportunities for practice reading code segments with for-loops and writing simple for-loops with definite numbers of repetitions. This basic instruction will help the students to form the schema they will rely on jointly during the paired programming task. The instructor should also discuss the essential problem-solving approach for the palindrome problem, in which students must evaluate the strings character-by-character, starting from the endpoints and working inwards. Discussing the algorithm presents an opportunity to reduce the extraneous cognitive load on the learners. Working together, the students can now attempt to convert the algorithm as it has been verbally or symbolically represented into a series of statements in code.
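
The character-by-character approach described above can be sketched in Python as follows. This is one possible solution a pair might converge on, under the assumption that the comparison is exact: the function name `is_palindrome` is illustrative, and the sketch does not ignore case, spaces, or punctuation, which a phrase-level check would need to handle first.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    length = len(text)
    # Only the first half needs checking: each step pairs a character
    # with its mirror position counted from the end of the string.
    for i in range(length // 2):
        if text[i] != text[length - 1 - i]:
            return False
    return True


print(is_palindrome("racecar"))
print(is_palindrome("python"))
```

The index expression `length - 1 - i` is the part most likely to need discussion between driver and navigator, since it ties the loop counter to the string length.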

The implementation of pair programming as a pedagogical practice requires that computer science instructors take the time to train students to engage with their partner effectively. With practice, students have been shown to perform more effectively and have higher completion rates when paired versus working independently [6,19]. This effect depends highly on the intentionality of how the students are paired, generally considering student ability level. Pairs with disparities in both ability and perceived ability do not perform as well as pairs with greater similarity in these areas [4,21]. In professional software engineering practice, studies of pairing have focused on personality types, with better performance on programming tasks being connected to heterogeneous personalities [13]. In addition to determining optimal pairs, the instructor should also consider the potential for increased cognitive load from communication and interaction deficits [8]. Giving students the opportunity to practice pairing, and using collaboration scripts to reduce the load that stems from the instructional design of pairing, are recommended approaches to support the use of pair programming.

Live-Coded Modeling

To engage in live-coded modeling, a computer science instructor presents an example problem to the class and then models the problem-solving process in front of the students "live," so they can witness how a program is developed from start to finish. By collaboratively designing and implementing the solution, the extraneous cognitive load associated with the problem-solving search process can be reduced. Additionally, intrinsic load from instruction can be managed by controlling the amount of information that is provided at any individual point in the process.

Linn and Clancy advocated for a more deliberate presentation of problem-solving for students, speaking to the many benefits of modeling how one works through novel problems [10]. The advantage of presenting problems in this manner is not just the solution that the group creates, but also that the solution is crafted through an iterative process involving failed attempts, exploration of edge case behavior, consideration of logical arguments, and many other problem-solving challenges. The context of each new programming problem creates a need for discussion and debate over which choices can be made and why some may have greater merit than others.

As a component of this demonstration, students can engage in the process of generating pseudocode for the programming task. Several computing education researchers have suggested students express their problem-solving approach via pseudocode in order to reduce the complexity deriving from programming syntax [10,15]. Explicit modeling can help students associate specific programming constructs with familiar patterns that can be abstracted from the problem description. As a final step, the translation of the program solution to code can be modeled, a step often left for learners to uncover on their own in programming classrooms.

To aid instructors in their implementation of the techniques that have been addressed, a brief narrative demonstrating the use of modeling, student pairing, and faded examples is presented below.


Ms. Jackson is a computer science teacher in a suburban school district. She teaches two introductory sections of computer programming per semester, using Python as the introductory language. Ms. Jackson has students at different ability levels in her classroom and considers Sarah as a prototypical example of a student who would benefit from live-coded modeling. Sarah is a senior who took computer science after her favorite math teacher recommended that she give it a try. She did not bring prior programming knowledge to the course but does have strong math problem-solving abilities. Sarah is one of only six girls in the class, and despite being a relatively popular student, Ms. Jackson senses that Sarah would rather not volunteer and risk looking foolish in front of her peers. Sarah could turn the corner with the proper instructional supports.

The class has been introduced to several programming concepts, including variables, assignment, arithmetic, conditionals, loops, and functions. Ms. Jackson will next introduce the Python list data structure. After she has given students some independent practice using abstract examples, Ms. Jackson wants to present a programming assignment focused on lists. This assignment will ask the students to write a program that makes the best five-card poker hand from a set of two held and five shared cards. As described, the problem may be too complex for students to dissect on their own, and so Ms. Jackson chooses to focus on one specific part of the program: the implementation of a standard algorithm for finding a maximum value in a set of data.

After students have been presented with the problem of finding the maximum, Ms. Jackson takes a moment to place students into small groups based on her assessment of their current ability and confidence levels. The groups will be used to reduce cognitive load on individual students through shared knowledge construction at various points throughout the exercise.

Students review information they know about how many values a computer can compare simultaneously. Ms. Jackson then begins to record the student-derived steps for the maximum value algorithm on the board in the front of the class. She asks the students what information they need to keep track of during the execution of the algorithm. The class does not respond to this question, and sensing that she may have placed too great a cognitive burden on the students, Ms. Jackson adjusts her approach. She asks students to consider what their goal is in writing the algorithm. Ms. Jackson is pleased when Sarah mentions the maximum value and suggests that they need a variable to keep track of it. By changing the question, Ms. Jackson helped point the students towards the element of the task that was most significant, again reducing extraneous load from deciphering her meaning. Now she asks them to assume that they have a maximum value, and to consider how to proceed given that information.

Ms. Jackson asks them to reflect on the current problem and suggest some next steps in their small groups. She circulates around the room while they are discussing the problem. Ms. Jackson listens to the small group discussions, offering suggestions as she feels it helps advance their discussions and thus providing a scaffold to their learning. With a struggling group, for example, she asks them to break the task into smaller steps and labels the goal on their paper to help remind them of a similar approach they've used in the past.

After finishing the algorithm, Ms. Jackson reminds the class of Sarah's suggestion to create a variable to hold the current maximum. "What should we use as a starting value for our 'max' variable?" she asks. The questions she asks are designed to change the tasks students engage with into manageable pieces. She then asks the students how they might write code to look at every item in the list. Ms. Jackson is careful to associate the idea that is suggested—a loop—with the process of iterating over a data set. Two different students offer solutions, and the group is given the opportunity to debate the merits of each one. Ms. Jackson uses this point to discuss how several approaches could work and explains that they will need to gain some experience to know if one is more effective than the other. She asks the class to choose one solution to use currently and suggests that they can revisit the task later to try other possibilities.

Now the students must write the selection statement, essentially asking whether the current value in the set is the maximum. Ms. Jackson turns to the missing piece in the algorithm on the board where the selection statement should be inserted. "Can someone make an attempt to explain what should happen here in plain English?" Sarah provides a suggestion to compare the current value in the "max" variable to the next element in the list. Ms. Jackson types an empty if-block. Students are then asked to fill in the missing pieces. In this way, Ms. Jackson reduces the cognitive load of writing a complete statement and allows students to focus on the comparison statements much in the way that she would do by fading in a worked example.
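
The narrative does not show the code the class produced. As a point of reference, a minimal Python sketch of a standard maximum-value algorithm is given below. The names `find_max` and `current_max` are illustrative, and starting from the first list item is an assumption here (the class may have chosen a different starting value).

```python
def find_max(values):
    """Return the largest item in a non-empty list."""
    # Start with the first item so the answer always comes from the list
    current_max = values[0]

    # Look at every remaining item in the list
    for value in values[1:]:
        # Selection statement: is this item larger than the best so far?
        if value > current_max:
            current_max = value

    return current_max


print(find_max([3, 9, 4]))
```

In a faded presentation, the instructor might type the `for` line and an empty `if` block and leave the comparison and the assignment for students to supply.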

After running the program and viewing the output, the students are asked how they know if they have a correct solution. Ms. Jackson asks if the students think that the program will work for all kinds of data. This is the kind of question that she wants the students to explore on their own and allows her to break down the larger programming task into a piece that students can manage. By reducing the extraneous load and providing instruction that is germane to a single task, she is helping the students to build their understanding of program correctness and testing as a distinct schema. In the small groups, students are asked to try and "break" the code. When students uncover a few errors, Ms. Jackson makes note of them. At the next opportunity for a live-coding session, she will take advantage of these troublesome spots—issues with the initial value, and with the endpoints of the loop—in order to show the students how to debug and improve upon their solution.
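
To make the "break the code" activity concrete, the sketch below shows a hypothetical first draft containing the two kinds of problems mentioned, a poorly chosen initial value and a loop that stops one item early, along with test data that exposes them. The draft is invented for illustration and is not the class's actual program.

```python
def find_max_draft(values):
    """Hypothetical first draft with two flaws for students to discover."""
    current_max = 0                      # flaw 1: wrong if every value is negative
    for i in range(len(values) - 1):     # flaw 2: never looks at the last item
        if values[i] > current_max:
            current_max = values[i]
    return current_max


# Test data chosen to try to "break" the draft
test_lists = [[3, 9, 4], [-5, -2, -8], [1, 2, 10]]
for data in test_lists:
    print(data, "draft:", find_max_draft(data), "expected:", max(data))
```

The first list happens to give the right answer, which is a useful talking point: a single passing test does not show that the program works for all kinds of data.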

As a final part of this interactive discussion, Ms. Jackson gives the students a short assignment to either annotate the final solution with comments, or to explain the solution to a family member and write a short reflection about this process. This last piece helps the students to encode the information from the lesson, which in turn will benefit them when they perform a similar process in the future.


Introductory programming courses present significant challenges for educators. Students enter the introductory classroom with a wide range of ability levels, particularly in their prior knowledge of programming and their problem-solving ability. To address these student differences, it is important that computer science educators understand how cognitive load can be impacted by instructional design and cognitively dense tasks and implement strategies to reduce the extraneous load on learners (see Figure 4). In this paper, I have described how working memory and cognitive load impact programming students and offered several practical strategies that can be used to address these impacts in the introductory programming classroom. By using worked examples, pair programming, or live-coded modeling, a programming instructor can help novice students and bridge the gap between those with and without prior experience in computer programming and problem-solving.


1. Atkinson, R., Derry, S., Renkl, A., and Wortham, D. Learning from examples: Instructional principles from the worked examples research. Review of Educational Research, 70, 2 (2000), 181–214.

2. Beck, L., and Chizhik, A. Cooperative learning instructional methods for CS1: Design, implementation, and evaluation. ACM Transactions on Computing Education, 13, 3 (2013), 1–21.

3. Catrambone, R. The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998), 355–376.

4. Collins, A. Cognitive apprenticeship. In Cambridge Handbook of the Learning Sciences. Edited by Sawyer, R (Cambridge, Cambridge University Press, 2006), 47–60.

5. Grover, S., Pea, R. and Cooper, S. Factors influencing computer science learning in middle school. In Proceedings of the 47th ACM technical symposium on computing science education. (Memphis, TN, ACM, 2016), 552–557.

6. Hanks, B., McDowell, C., Draper, D., and Krnjajic, M. Program quality with pair programming in CS1. In Proceedings of the ninth annual SIGCSE conference on Innovation and technology in computer science education, (Leeds, United Kingdom, ACM, 2004), 176–180.

7. Kirschner, F., Paas, F., and Kirschner, P. A cognitive load approach to collaborative learning: United brains for complex tasks. Educational Psychology Review, 21, 1 (2009), 31–42.

8. Kirschner, P., Sweller, J., Kirschner, F., and Zambrano, J. Cognitive Load Theory to Collaborative Cognitive Load Theory. International Journal of Computer-Supported Collaborative Learning, 13, 2 (2018), 213–233.

9. Kolfschoten, G., and Brazier, F. Cognitive load in collaboration: convergence. Group Decision and Negotiation, 22, 5 (2013), 975–996.

10. Linn, M.C., and Clancy, M.J. The case for case studies of programming problems. Communications of the ACM, 35, 3 (1992), 121–132.

11. Margulieux, L.E., Catrambone, R., and Guzdial, M. Employing subgoals in computer programming education. Computer Science Education, 26, 1 (2016), 44–67.

12. Plonka, L., Segal, J., Sharp, H., and van der Linden, J. Collaboration in pair programming: driving and switching. In XP 2011: 12th International Conference on Agile Software Development, (Madrid, Spain, Springer, 2011), 43–59.

13. Sfetsos, P., Stamelos, I., Angelis, L., and Deligiannis, I. An experimental investigation of personality types impact on pair effectiveness in pair programming. Empirical Software Engineering, 14, 2 (2009), 187–226.

14. Skudder, B., and Luxton-Reilly, A. Worked examples in computer science. In Proceedings of the Sixteenth Australasian Computing Education Conference, (Auckland, New Zealand, Australian Computer Society, Inc., 2014), 59–64.

15. Soloway, E. Learning to program = learning to construct mechanisms and explanations. Communications of the ACM, 29, 9 (1986), 850–858.

16. Sweller, J. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 2 (1988), 257–285.

17. Sweller, J. Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4, 4 (1994), 295–312.

18. Sweller, J., and Cooper, G.A. The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2, 1 (1985), 59–89.

19. Tsai, C.Y., Yang, Y.F., and Chang, C.K. Cognitive Load Comparison of Traditional and Distributed Pair Programming on Visual Programming Language. In International Conference of Educational Innovation through Technology, (Wuhan, China, IEEE, 2015), 143–146.

20. Van Merriënboer, J., Kirschner, P., and Kester, L. Taking the load off a learner's mind: Instructional design for complex learning. Educational Psychologist, 38, 1 (2003), 5–13.

21. Williams, L., Layman, L., Osborne, J., and Katira, N. Examining the compatibility of student pair programmers. In AGILE conference 2006, (Minneapolis, MN, USA, IEEE, 2006), 411–420.


Philip Sands
Counseling, Educational Psychology and Special Education
Michigan State University
Erickson Hall, 620 Farm Lane, East Lansing, MI 48824, United States.
[email protected]


Figure 1. The elements of Sweller's Cognitive Load Theory

Figure 2. A worked example with coaching prompts

Figure 3. A worked example with fading

Figure 4. Strategies for addressing cognitive load for novice programmers

©2019 ACM  2153-2184/19/03  $15.00

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.
