The subject examinations prepared by the National Board of Medical Examiners (NBME) are designed to assess educational achievement in specific, defined subject areas. Those in clinical areas such as psychiatry or medicine are intended as end of clerkship examinations. They are written in the style and format of Step 2 of the United States Medical Licensing Examination (USMLE) and rely more on application and integration of knowledge than on rote memorization. They offer a number of advantages over departments’ internally written exams: test items are extensively reviewed and pretested, reliability and validity are assured, and information is continually updated. They are much easier to use and much less labor intensive than writing and updating exam questions every year. In addition, the NBME makes available to clerkship directors, after each administration of the subject exam, an item analysis indicating content areas in which students from a given school did better or worse than national norms (1).
The psychiatry subject examination is widely used throughout the United States and Canada. In a recent survey by Levine et al. (2), 69% of clerkship director respondents reported using the exam at the end of the psychiatry clerkship. Of those, 95% indicated they used the exam for the purpose of grading students. The average weight given to the exam was 31% of the total clerkship grade. Seventy-nine percent said they used the subject exam to determine passing for the clerkship (2).
The NBME cautions, however, that test scores reflect not only learning for the specific clerkship but also “educational development resulting from the overall medical school experience” (1). Previous reports (3–7) have documented that students do better on the subject exams in internal medicine, pediatrics, obstetrics-gynecology, family medicine, and surgery if they have already taken other clerkships. Recognition of this timing bias has prompted proposals to eliminate it or reduce its impact. For example, Belmont and Cho (8) suggest grading students only in comparison to other students who took the clerkship at the same point in the overall clerkship sequence.
A timing bias has not previously been demonstrated for psychiatry. A 1997 study by Case, Ripkey, and Swanson (9) looking at psychiatry subject test scores from 45 medical schools found no differences among students who took psychiatry first, last, or in the “middle.” They speculated that the absence of a test bias based on clerkship timing was because the content and method of psychiatry were different enough from other disciplines that there was no particular advantage in having taken one or more clerkships before psychiatry.
We believed the question deserved a second look. Based on our experience in teaching psychiatry clerkship students for many years, in reading their written case reports, and in reviewing the subject exam item analyses, we felt that psychiatry was sufficiently grounded in general medicine that previous clerkship experience would in fact be reflected in higher exam scores. We expected that students who had taken medicine or neurology before psychiatry would do better on the psychiatry test than students who had not taken those clerkships. We did not expect to see this advantage among students who had taken surgery or obstetrics before psychiatry. In addition, we anticipated that although scores would generally rise as students came to psychiatry with increasing clinical experience, scores for those who took it as their final required rotation (in some cases after residency match results were announced) would dip. This expectation was also based on our experience in working with such students over the years and our sense that they often seem disengaged and difficult to motivate.
Our data are compiled from students taking the NBME psychiatry subject examination at the end of the psychiatry clerkship at the New York University School of Medicine between 2001 and 2004. During the period of this study, the school required eight clinical rotations during the third and fourth years of medical school: medicine, surgery, psychiatry, obstetrics-gynecology, pediatrics, neurology, ambulatory care and advanced medicine (a subinternship). A ninth required rotation, in intensive care medicine, was added in 2004 and is not included in this study. NBME subject examinations were required at the end of all core rotations except for ambulatory care and advanced medicine. Clerkship sequence was assigned by lottery, although considerable latitude was allowed in subsequently altering the sequence to accommodate research, “audition electives” for competitive residencies at other schools, and personal reasons. (The school has subsequently overhauled the lottery system, and changes are now more difficult.) As a practical matter, virtually all students took medicine and surgery during their third year. Advanced medicine was always taken during the fourth year. Both third and fourth year students took the remaining clerkships including psychiatry. (Students taking psychiatry first or second were always third-year students; those taking it last were always fourth-year students.) Although the stated expectation was that core rotations (other than advanced medicine) would be completed no later than November of the fourth year, special exemptions resulted in some students taking clerkships late in their fourth year of medical school. As a result, any given rotation in psychiatry included students of variable clinical experience. It was possible for students taking psychiatry as a first clerkship at the start of their third year to be working alongside and taking the same subject examination as students for whom psychiatry was the last of their required clerkships. 
Consequently, a review of test scores across the academic calendar would not give meaningful information about the influence of prior clerkship experience.
We obtained from the registrar of the medical school the clerkship sequence for each student taking psychiatry during the 4 years of our study. We then calculated the mean exam scores for all students who took psychiatry first (regardless of what month or year it was taken), for all students who took it after having taken one other clerkship, and so on. All students were required to pass the subject examination in order to receive credit for the psychiatry clerkship. We set a passing grade at a score of 60 or higher, in keeping with recommendations by a small group of clerkship directors who were polled by the NBME (10). Students who did not pass the exam were required to take it again. In the rare circumstances in which students failed a second time, additional clinical work on the wards was required before they were allowed to sit for the exam a third time. The exam grade contributed 20% to the overall clerkship grade. In calculating mean test scores by rotational sequence we used only the initial scores, not the make-up scores for students who had to repeat the exam.
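The grouping step described above can be illustrated with a minimal sketch. The record layout, field order, and all scores below are hypothetical and are not the study data; the sketch only shows initial scores being averaged within each sequence position while make-up attempts are set aside.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student_id, sequence_position, attempt_number, score).
# sequence_position is 1 when psychiatry was the first clerkship, 2 when one
# other clerkship preceded it, and so on. All values are invented.
records = [
    ("s01", 1, 1, 58),
    ("s01", 1, 2, 66),  # make-up attempt, excluded from the means
    ("s02", 1, 1, 70),
    ("s03", 2, 1, 72),
    ("s04", 2, 1, 76),
    ("s05", 8, 1, 78),
]


def mean_first_attempt_by_position(rows):
    """Average initial exam scores within each sequence position."""
    groups = defaultdict(list)
    for _student, position, attempt, score in rows:
        if attempt == 1:  # keep only the initial score
            groups[position].append(score)
    return {pos: mean(scores) for pos, scores in sorted(groups.items())}


means = mean_first_attempt_by_position(records)
for position, value in means.items():
    print(f"position {position}: mean {value:.1f}")
```

Grouping by sequence position rather than by calendar month is what separates students with different amounts of prior clerkship experience who happened to sit the same administration of the exam.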
A total of 635 students completed the eight required clerkships during the study period. A one-way analysis of variance (ANOVA) was conducted to test the hypothesis that mean psychiatry exam scores would vary with the position of psychiatry in the clerkship sequence. The results of this ANOVA revealed statistically significant differences across clerkship sequence. Pairwise comparisons (t tests) were also performed to determine whether students who had completed particular clerkships before psychiatry scored higher on the subject exam than students who completed those clerkships after psychiatry.
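For readers less familiar with these tests, the following sketch computes a one-way ANOVA F statistic and a two-sample t statistic from first principles. It rests on assumptions: the text does not say which t test variant was used, so the pooled-variance (Student) form is shown only as one possibility, and the score lists are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Invented scores for three sequence positions.
groups = [[60, 62, 64], [66, 68, 70], [72, 74, 76]]
f_stat, df_b, df_w = one_way_anova_f(groups)
t_stat = pooled_t(groups[0], groups[2])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, t = {t_stat:.2f}")
```

Converting these statistics to p values requires the F and t distributions. In practice a statistics package would normally be used, for example `scipy.stats.f_oneway` and `scipy.stats.ttest_ind` in SciPy.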
Figure 1 shows the 4-year aggregate mean psychiatry subject examination scores and SE values grouped by rotational sequence among the eight required clinical courses. For example, scores for point 1 are for students who took psychiatry first, although they did not necessarily take it at the same time. Scores trend clearly upward for students who take psychiatry later, and the difference reaches statistical significance between point 1 and points 4, 5, 7, and 8. Although these differences are statistically significant, they are not numerically large: the difference in mean scores between students taking psychiatry first and those taking it last is less than six points. Descriptive statistics for exam scores across clerkship sequence are presented in Table 1.
Table 2 presents the mean scores for students who had taken a specific clerkship prior to psychiatry compared to students who had not. There are no statistically significant differences between mean examination scores for students taking any one of the other required clerkships before psychiatry compared to students who took it after psychiatry.
Our prediction that psychiatry subject examination scores would increase among students who took psychiatry after having taken other clerkships is supported by our data. This is consistent with several other reports in the literature of a rotational bias in exam scores for other disciplines, although it contradicts the findings of Case et al. (9) 8 years ago. A possible reason for the difference is that the NBME subject examination in psychiatry has changed. In the 1990s, questions were more purely psychiatric and relied heavily on definition of terms. For example, students might be asked to distinguish among different personality disorders. In the past 5 years, the exam has become more integrated with material from other clerkships and more heavily emphasizes clinical reasoning (11).
Contrary to expectation, we did not find evidence that medicine or neurology taken before psychiatry would increase scores more than other clerkships such as obstetrics or surgery. There is no clear advantage in which clerkships are taken before or after psychiatry. Rather, there appears to be an accumulating advantage in having more clerkship experience regardless of discipline.
We also predicted that students who took psychiatry last, always taking it during their fourth year and in some cases after having received residency match results, would have lower mean subject examination scores than other students. This prediction is also not borne out. Whatever disengagement may occur when students take psychiatry late in medical school is not reflected in exam scores.
We have shown that students who are lucky or smart enough to take psychiatry last have a competitive advantage in taking the exam over other students, particularly compared to students who take psychiatry first. Because the NBME subject examination is widely used as an end-of-clerkship exam, and because the exam counts on average for about 30% of the overall clerkship grade (2), our findings must be considered by clerkship directors. Although the numerical differences are small, it is possible that five points on a heavily weighted exam could determine the difference between one clerkship grade and another, in which case students would be rewarded for the timing of their clerkships rather than their work during their clerkships.
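The arithmetic behind this concern can be sketched as follows. The sketch assumes, purely for illustration, that the exam score and the overall clerkship grade sit on the same point scale and that the exam enters the grade as a simple weighted term. The function name is hypothetical, and actual grading schemes differ between schools.

```python
def grade_shift(exam_point_gap, exam_weight):
    """Overall-grade points attributable to a gap in exam score.

    Assumes a linear weighted grade on a common point scale.
    """
    return exam_point_gap * exam_weight


# A five-point exam gap under the survey's average weight (31%)
# and under the 20% weight used during the study period.
shift_at_survey_weight = grade_shift(5, 0.31)
shift_at_study_weight = grade_shift(5, 0.20)
print(round(shift_at_survey_weight, 2), round(shift_at_study_weight, 2))
```

Under these assumptions the shift is between one and two points of the overall grade, which matters only for students already close to a grade boundary.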
It should also be noted, however, that the advantage enjoyed by students who have completed other clerkships extends beyond examination scores. They will be more familiar with all of the clinical skills involved in caring for patients: ward routine, taking histories, presenting on rounds, writing chart notes, working with consultants and families. Accordingly, they may well receive higher clinical evaluations as well as higher exam scores, and efforts to standardize test scores will address only a small part of any timing bias.
The number of variables affecting clerkship performance is vast. The goal of a perfect clerkship grading system, unbiased by any of the varieties of previous clinical exposure, is probably elusive. As a result—and recognizing the considerable advantage of NBME subject examinations over local exams—we do not believe the demonstration of a timing bias justifies abandoning use of the exams. On the basis of the findings we have presented here, we have decided to use the psychiatry subject examination in our program as a clerkship barrier exam. Passing the test will be required for passing the clerkship, but different levels of excellence in a passing grade will not affect the overall clerkship grade. We have chosen to define “passing” as a score of 65 or higher. We believe that this helps guarantee a minimal level of knowledge about psychiatry based on national standards while reducing the effect of clerkship timing bias.
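As a minimal sketch of the barrier rule just described, the function below gates the clerkship on the exam without letting the exam score alter the grade. The threshold of 65 comes from the text; the function name, grade labels, and return values are hypothetical.

```python
PASSING_SCORE = 65  # threshold stated in the text


def clerkship_outcome(exam_score, clinical_grade):
    """Barrier rule: the exam gates passing but does not change the grade.

    clinical_grade is a hypothetical label derived from clinical evaluation.
    """
    if exam_score < PASSING_SCORE:
        return "exam not passed"
    # Any passing score leaves the clinically derived grade untouched.
    return clinical_grade


print(clerkship_outcome(64, "honors"))
print(clerkship_outcome(65, "honors"))
print(clerkship_outcome(90, "honors"))
```

Because every passing score yields the same result, a difference of a few points that arises from clerkship timing no longer separates one clerkship grade from another, except for students near the passing threshold.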
We are also mindful of the limitations of this study. The results are from a single medical school and may not be generalizable to other programs. In addition, we are unable to determine whether increasing exam scores are the result of a growing body of medical knowledge, or greater familiarity and facility with the testing format, or both.
FIGURE 1. Mean Psychiatry Subject Exam Score by Clerkship Sequence
Statistically significant differences were found between point 1 and point 4 (p≤0.05) and between point 1 and points 5, 7, and 8 (p≤0.01).