In essence, test-enhanced learning is the idea that the process of remembering concepts or facts—retrieving them from memory—increases long-term retention of those concepts or facts. This idea, also known as the testing effect, rests on myriad studies examining the ability of various types of “tests”—prompts to promote retrieval—to promote learning when compared to studying. It is one of the most consistent findings in cognitive psychology (Roediger and Butler 2011; Roediger and Pyc 2012).
In some ways, the terms “test-enhanced learning” and the “testing effect” are misnomers, in that the use of the word “tests” calls up notions of high-stakes summative assessments. In fact, most or all studies elucidating the testing effect examine the impact of low-stakes retrieval practice on a delayed summative assessment. The “testing” that actually enhances learning is the low-stakes retrieval practice that accompanies study in these experiments.
With that caveat in mind, the testing effect can be a powerful tool to add to instructors’ teaching tool kits—and students’ learning tool kits.
In this teaching guide, we provide six observations about the effects of testing from the cognitive psychology literature, summarizing one or two key studies that led to each of these conclusions. We have chosen studies performed with undergraduates learning educationally relevant materials (e.g., text passages as opposed to word pairs). We also suggest ways to implement test-enhanced learning in your class as well as important caveats to keep in mind.
The idea that active retrieval of information from memory improves memory is not a new one: William James proposed this idea in 1890, and Edwina Abbott and Arthur Gates provided support for this idea in the early part of the 20th century (James, 1890; Abbott, 1909; Gates, 1917). During the last decade, however, evidence of the benefits of testing has mounted.
In one influential study, Roediger and Karpicke investigated the effects of single versus multiple testing events on long-term retention using educationally relevant conditions (Roediger and Karpicke, 2006a). Their goal was to determine if any connection existed between the number of times students were tested and the size of the testing effect. The investigators worked with undergraduates in a laboratory environment,
asking them to read passages about 250 words long. The authors compared three conditions
(see Figure 1): students who studied the passages four times for five minutes each (SSSS group);
students who studied the passages three times and completed one recall test in which they were
given a blank sheet of paper and asked to recall as much of the passage as they could (SSST
group); students who studied the passages one time and then performed the recall practice
three times (STTT group). Student retention was then tested either five minutes or one week
later using the same type of recall test used for retrieval practice.
Interestingly, results differed significantly depending on when the final test was performed.
Students who took their final test very soon after their study period (i.e., 5 minutes) benefited
from repeated studying, with the SSSS group performing best, the SSST group performing
second-best, and the STTT group performing least well. This result suggests that studying is
more effective when the information being learned is only needed for a short time. However,
when long-term retention is the goal, testing is more effective. The researchers found that
when the final test was delayed by a week the results were reversed, with the STTT group
performing about 5% higher than the SSST group and about 21% higher than the SSSS group.
Testing had a greater impact on long-term retention than did repeated study, and the participants
who were repeatedly tested had increased retention over those who were only tested once.
The study described here is one of many making up a rich literature on
the testing effect; several recent review articles provide a thorough overview of the work in this area
(Roediger and Butler, 2011; Roediger and Karpicke, 2006b; Roediger, Putnam, and Smith, 2011).
Smith and Karpicke examined whether different types of questions were equally effective at inducing the
testing effect (2014). The researchers performed a series of experiments with undergraduate students in
a laboratory environment, examining the effects of short answer (SA), multiple choice (MC), and hybrid
SA/MC formats for promoting students’ ability to remember information from a text. In one experiment,
five groups of students were compared (see Figure 3). Students read four texts, each approximately 500
words long. After each, four groups of students then participated in different types of retrieval practice,
while the fifth group was the no-retrieval control. One week later, the students returned to the lab for a
short-answer test on each of the reading passages.
Confirming other studies, students who had participated in some type of retrieval practice performed much
better on the final assessment, getting approximately twice as many questions correct as those who did not
have any retrieval practice. This was true both for questions taken directly from information in the
texts and for questions that required inference from the text (see Figure 4). Interestingly, there was no
significant difference in the benefits conferred by the different types of retrieval practice; multiple-choice,
short-answer, and hybrid questions following the reading were equally effective at enhancing the students’
learning. Other experiments in the series essentially replicated these results, although one experiment did
find a slight advantage for hybrid retrieval practice (short-answer + multiple-choice) in preparing students
for short-answer tests consisting of verbatim questions on short reading passages. These results suggest
that the benefits of testing are not tied to a specific type of retrieval practice, but rather retrieval practice in general.
This and other studies suggest that multiple question formats can provide the benefit associated with
testing. It appears that the context may determine which question type provides the greatest benefit, with
free recall questions, multiple-choice, hybrid free recall/multiple-choice, and cued-recall questions all
providing significant benefit over study alone. The most influential studies in the field suggest that free
recall provides greater benefit than other question types (see Pyc et al., in press), but the results described here reveal an incompletely answered question.
Considerable work has been done to examine the role of feedback on the testing effect. Butler and Roediger
designed an experiment in which undergraduates studied 12 historical passages and then took multiple-choice
tests in a lab setting (Butler and Roediger, 2008). The students received either no feedback, immediate
feedback (i.e., following each question), or delayed feedback (i.e., following completion of the 42-item test). One
week later, the students returned for a comprehensive cued-recall test. While simply completing multiple-choice
questions after reading the passages did improve performance on the final test, corresponding to other reports
on the testing effect, feedback provided an additional benefit (see Figure 5). Interestingly, delayed feedback
resulted in better final performance than did immediate feedback, although both conditions showed benefit
over no feedback.
One concern that instructors may have with regard to using testing as a teaching and learning strategy is that it may promote rote memory. While most instructors recognize that memory plays a role in allowing students to perform well within their academic domain, they want their students to do more than simply remember and understand facts; they want them to achieve higher cognitive outcomes (Bloom, 1956). Some studies address this concern and report results suggesting that testing provides benefits beyond improving simple recall. For example, the study by Smith and Karpicke (2014) described above determined the effects of testing on students’ recall of specific facts from reading passages as well as their ability to answer questions that required inference. In these studies, the authors defined inference as drawing conclusions that were not directly stated within the passages but that could be drawn by synthesizing multiple facts within the passage. The investigators observed that testing following reading improved students’ ability to answer both types of questions on a delayed test, thereby providing evidence that the benefits of testing are not limited to answers that require only rote memory.
In a 2011 study, Karpicke and Blunt sought to directly address the question of whether retrieval
practice can promote students’ performance on higher order cognitive activities. They
investigated the impact of retrieval practice on students’ learning of undergraduate-level science
concepts, comparing the effects of retrieval practice to those of an elaborative study technique, concept
mapping (Karpicke and Blunt, 2011). In one experiment, students studied a science text and
were then assigned to one of four conditions: a study-once condition, in which they did not
interact further with the concepts in the text; a repeated study condition, in which they studied
the text four additional times; an elaborative study condition, in which they studied the text one
additional time, were trained on concept mapping, and produced a concept map of the concepts
in the text; a retrieval practice condition, in which they completed a free recall test, followed by an additional study period and recall test (see Figure 6). All students were asked to complete a self-assessment predicting their recall within one week; students in the repeated study group predicted better recall than students in any of the other groups. Students then returned a week later for a short-answer test consisting of questions that could be answered verbatim from the text and questions that required inferences from the text.
Students in the retrieval practice condition performed significantly better on both the verbatim questions
and the inference questions than students in any other group. The authors then asked whether the
advantage of retrieval practice would persist if the final test consisted of a concept mapping
exercise (see Figure 7). The authors observed that retrieval practice produced better performance
than did elaborative study using concept mapping on both types of final tests (short-answer and
concept mapping). When they examined the effects on individual learners, they found that 84% (101/120)
of students performed better on the final tests when they used retrieval practice as a study strategy rather
than concept mapping.
Wissman, Rawson, and Pyc have reported work that suggests that retrieval practice over one set of
material may facilitate learning of later material, which may be related or unrelated (Wissman, Rawson,
and Pyc, 2011). Specifically, they investigated the use of “interim tests.” Undergraduate students were
asked to read three sections of a text. In the “interim test” group, they were tested after reading each
of the first two sections, specifically by typing everything they could remember about the text. After
completing the interim test, they were advanced to the next section of material. The “no interim test” group
read all three sections with no tests in between. Both groups were tested on Section 3 after reading it.
Interestingly, the group that had completed interim tests on Sections 1 and 2 recalled about twice as many
“idea units” from Section 3 as the students who did not take interim tests. This result was observed both
when Sections 1, 2, and 3 were about different topics and when they were about related topics. Thus,
testing may have benefits that extend beyond the target material.
All of the reports described above focused on experiments performed in a laboratory setting. In addition, there are several studies that suggest the benefits of testing may also extend to the classroom.
In 2002, Leeming used an “exam-a-day” approach to teaching an introductory psychology course (Leeming, 2002). He found that students who completed an exam every day rather than exams that covered large blocks of material scored significantly higher on a retention test administered at the end of the semester.
Larsen, Butler, and Roediger asked whether a testing effect was observed for medical residents’ learning about status epilepticus and myasthenia gravis, two neurological disorders, at a didactic conference (Larsen et al., 2009). Specifically, residents participated in an interactive teaching session on the two topics and then were randomly divided into two groups. One group studied a review sheet on myasthenia gravis and took a test on status epilepticus, while the other group took a test on myasthenia gravis and studied a review sheet on status epilepticus. Six months later, the residents completed a test on both topics. The authors observed that the testing condition produced final test scores that averaged 13% higher than the study condition.
Lyle and Crawford examined the effects of retrieval practice on student learning in an undergraduate statistics class (Lyle and Crawford, 2011). In one section of the course, students were instructed to spend the final 5 to 10 minutes of each class period answering two to four questions that required them to retrieve information about the day’s lecture from memory. The students in this section of the course performed about 8% higher on exams over the course of the semester than students in sections that did not use the retrieval practice method, a statistically significant difference.
Other classroom studies have been published by McDaniel, Wildman, and Anderson (2012), Orr and Foster (2013), and Stanger-Hall and colleagues (2011).
Several hypotheses have been proposed to explain the effects of testing. The retrieval effort hypothesis suggests that the effort involved in retrieval provides testing benefits (Gardiner, Craik, and Bleasdale, 1973). This hypothesis predicts that tests that require production of an answer, rather than recognition of an answer, would provide greater benefit, a result that has been observed in some studies (Butler and Roediger, 2007; Pyc and Rawson, 2009) but not others (Little and Bjork, 2012; some experiments in Smith and Karpicke, 2014; some experiments in Kang, McDermott, and Roediger 2007).
Bjork and Bjork’s new theory of disuse provides an alternative hypothesis to explain the benefits of testing (Bjork and Bjork, 1992). This theory posits that memory has two components: storage strength and retrieval strength. Retrieval events improve storage strength, enhancing overall memory, and the effects are most pronounced at the point of forgetting; that is, retrieval at the point of forgetting has a greater impact on memory than repeated retrieval when retrieval strength is high. This theory aligns with experiments demonstrating that study is as effective as or more effective than testing when the delay before a final test is very short (see, for example, Roediger and Karpicke, 2006a), because the very short delay between study and the final test means that retrieval strength is very high, something many students can verify from their own experience of cramming. At a greater delay, however, experiences that build retrieval strength (e.g., testing) confer greater benefit than studying.
There are many ways to take advantage of the testing effect, some during class time and some outside of class time. The following are a few suggestions.
This list is a starting point. Instructors should use the principles that underlie test-enhanced learning—frequent low-stakes opportunities for students to practice recall—to develop approaches that are well-adapted for their class and context.
Keep it low-stakes. The term “testing” evokes a certain response from most of us: the
person being tested is being evaluated on his or her knowledge or understanding of a
particular area, and will be judged right or wrong, adequate or inadequate based on the
performance given. This implicit definition does not reflect the settings in which the
benefits of “test-enhanced learning” have been established. In the experiments done in
cognitive science laboratories, the “testing” was simply a learning activity for the students; in
the language of the classroom, it could be considered a “no-stakes” formative assessment
where students could evaluate their memory of a particular subject. In most of the studies
from classrooms, the “testing” was either no-stakes recall practice (Larsen et al., 2009; Lyle
and Crawford, 2011; Stanger-Hall et al., 2011) or low-stakes quizzes (McDaniel et al., 2012;
Orr and Foster, 2013). Thus, the term retrieval practice may be a more accurate description
of the activity that promoted students’ learning. Implementing approaches to
test-enhanced learning in a class should therefore involve no-stakes or low-stakes scenarios in
which students are engaged in a recall activity to promote their learning rather than being
repeatedly subjected to high-stakes testing situations.
Share your learning objectives so that students understand their targets. It’s important to note that incorporating testing (or recall practice) as a learning tool in a class should be done in conjunction with other evidence-based teaching practices, such as sharing learning objectives with students, carefully aligning learning objectives with assessments and learning activities, and offering opportunities to practice important skills. If you want students to be able to apply their knowledge, analyze complex situations, and synthesize different points of view, be sure to let them know that retrieval practice will help them learn the basic information they need for these skills, but that retrieval alone is not sufficient.
Abbott EE (1909). On the analysis of the factors of recall in the learning process. Psychological Monographs, 11, 159-177.
Bjork RA (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R.L. Solso (Ed.), Information processing and cognition (pp. 123-144). New York, NY: Wiley.
Bjork RA and Bjork EL (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, and R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35-67). Hillsdale, NJ: Erlbaum.
Bloom BS (1956). Taxonomy of Educational Objectives: Handbook I: The Cognitive Domain. New York: David McKay Co Inc.
Butler AC (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition 36, 1118-1133.
Butler AC, Karpicke JD, and Roediger HL III (2008). Correcting a metacognitive error: Feedback increases retention of low-confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition 34, 918-928.
Butler AC and Roediger HL III (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology 19, 514-527.
Butler AC and Roediger HL III (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory and Cognition 36, 604-616.
Cantor AD, Eslick AN, Marsh EJ, Bjork RA, and Bjork EL (2014). Multiple-choice tests stabilize access to marginal knowledge. Memory and Cognition DOI 10.3758/s13421-014-0462-6.
Cohen GL, Garcia J, Apfel N, and Master A (2006). Reducing the racial achievement gap: A social-psychological intervention. Science 313, 1307-1310.
Gardiner JM, Craik FIM, and Bleasdale FA (1973). Retrieval difficulty and subsequent recall. Memory and Cognition 1, 213-216.
Gates AI (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Hays MJ, Kornell N, and Bjork RA (2013). When and Why a Failed Test Potentiates the Effectiveness of Subsequent Study. Journal of Experimental Psychology: Learning, Memory, and Cognition 39, 290-296.
James W (1890). The principles of psychology. New York: Holt.
Kang SHK, McDermott KB, and Roediger HL III. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology 19, 528-558.
Karpicke JD and Blunt JR (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science 331, 772-775.
Klionsky DJ (2008). The quiz factor. CBE—Life Sciences Education 7, 265-266.
Larsen DP, Butler AC, and Roediger HL III (2009). Repeated testing improves long-term retention relative to repeated study: a randomized controlled trial. Medical Education 43, 1174-1181.
Leeming FC (2002). The exam-a-day procedure improves performance in psychology classes. Teaching of Psychology 29, 210-212.
Leight H, Saunders, Calkins R, and Withers M (2012). Collaborative testing improves performance but not content retention in a large-enrollment introductory biology class. CBE—Life Sciences Education 11, 392-401.
Little JL and Bjork EL (2011). Pretesting with multiple-choice questions facilitates learning. Presentation at Cognitive Science Society. Retrieved from http://www.researchgate.net/publication/265883438_Pretesting_with_Multiple-choice_Questions_Facilitates_Learning, November 15, 2014.
Little JL and Bjork EL (2012). The persisting benefits of using multiple-choice tests as learning events. Presentation at Cognitive Science Society. Retrieved from http://mindmodeling.org/cogsci2012/papers/0128/paper0128.pdf , November 11, 2014.
Lyle KB and Crawford NA (2011). Retrieving essential material at the end of lectures improves performance on statistics exams. Teaching of Psychology 38, 94-97.
McDaniel MA and Masson MEJ (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition 11, 371-385.
McDaniel MA, Wildman KM, and Anderson JL (2012). Using quizzes to enhance summative-assessment performance in a web-based class: An experimental study. Journal of Applied Research in Memory and Cognition 1, 18-26.
Miyake A, Kost-Smith LE, Finkelstein ND, Pollock SJ, Cohen GL, Ito TA (2010). Reducing the gender achievement gap in college science: A classroom study of values affirmation. Science 330, 1234-1237.
Orr R and Foster S (2013). Increasing student success using online quizzing in introductory (majors) biology. CBE—Life Sciences Education 12, 509-514.
Pulfrey C, Buchs C, and Butera F (2011). Why grades engender performance-avoidance goals: The mediating role of autonomous motivation. Journal of Educational Psychology 103, 683-700.
Pyc MA, Agarwal PK, and Roediger HL III (in press). Test-enhanced learning. In V. Benassi, C. Overson, & C. Hakala (Eds.), Applying the science of learning in education: Infusing psychological science into the curriculum. Society for the Teaching of Psychology. Retrieved from http://psych.wustl.edu/memory/Roddy%20article%20PDF’s/Roediger%20&%20Pyc%20(2012)a_MemCog.pdf on November 14, 2014.
Pyc MA and Rawson KA (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language 60, 437-447.
Roediger HL III, Putnam AL, and Smith MA. (2011). Ten benefits of testing and their applications to educational practice. Psychology of Learning and Motivation, Volume 55: 1-36.
Roediger HL III and Butler AC (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences 15, 20-27.
Roediger HL III and Karpicke JD (2006a). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science 17, 249-255.
Roediger HL III and Karpicke JD (2006b). The power of testing memory: basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181-210.
Roediger HL III and Pyc MA (2012). Inexpensive techniques to improve education: Applying cognitive psychology to enhance educational practice. Journal of Applied Research in Memory and Cognition 1, 242-248.
Schwartz DL and Bransford JD (1998). A time for telling. Cognition and Instruction 16, 475-522.
Smith MA and Karpicke JD (2014). Retrieval practice with short-answer, multiple-choice, and hybrid tests. Memory 22, 784-802.
Smith MK, Wood WB, Krauter K, and Knight JK (2011). Combining peer discussion with instructor explanation increases student learning from in-class concept questions. CBE—Life Sciences Education 10, 55-63.
Stanger-Hall KF, Shockley FW, and Wilson RE (2011). Teaching students how to study: A workshop on information processing and self-testing helps students learn. CBE—Life Sciences Education 10, 187-198.
Steele, CM (2010). Whistling Vivaldi: How stereotypes affect us and what we can do. New York: W.W. Norton & Company.
Tanner, KD (2012). Promoting student metacognition. CBE—Life Sciences Education 11, 113-120.
Wissman KT, Rawson KA, and Pyc MA (2011). The interim test effect: Testing prior material can facilitate the learning of new material. Psychonomic Bulletin & Review 18, 1140-1147.
First published on the Vanderbilt Center for Teaching website.