In recent years, the Royal College of Physicians and Surgeons of Canada has called into question the use of traditional oral examinations for specialty certification and has asked for major modifications to the format of certification examinations in all specialties. Specifically, in 1993 a Royal College Task Force presented the "Report on the Evaluation System for Specialist Certification," which recommended that "In oral examinations several clinical scenarios should be used instead of one…," that "Examiners should work independently (i.e., not examine in teams or pairs)," and that "Although detailed extensive checklists are not essential in every instance, the performances of all candidates must be scored in a standardized fashion on several items for each clinical scenario or case…" (1).
These recommendations were the result of an extensive body of evidence demonstrating the inferior psychometric properties of traditional oral examinations, in which a single long patient interview is assessed by a pair of examiners. This traditional oral format has repeatedly been shown to have insufficient reliability and validity to support decisions about competence in "high-stakes" certification examinations (1). In seeking a format that could incorporate the requirements specified by the task force, one obvious choice is the objective structured clinical examination (OSCE). This assessment format has become the state of the art for performance-based assessment of medical students (2,3) and is increasingly used to assess postgraduate trainees (4–6) and international medical graduates (7), as well as for certification by both the Medical Council of Canada (8) and the College of Family Physicians of Canada (9). In psychiatry, a growing body of literature supports the acceptability, reliability, and validity of OSCE assessment for medical students (10–13). However, unlike in other medical disciplines, almost no research has examined the use of an OSCE to assess postgraduate psychiatry residents. Only one paper (14) reports on the use of this form of assessment for psychiatry residents, and although its author reported that the format was well received, the paper provided very limited information about the residents' attitudes and no psychometric data. The Royal College Task Force Report itself suggested that although OSCEs hold some promise for improving the psychometrics of assessment at the specialist certification level, they have not been studied well enough for this application (1).
The purpose of this study was to examine residents' experiences and attitudes about a psychiatry OSCE with a format used by medical schools across Canada and by the Medical Council of Canada. That format is a series of short stations in which students interact with standardized patients while being observed by physician examiners. Each examiner scores the interaction on a series of standardized measures but does not interact with the students. In almost all OSCEs used in Canada, stations are short, ranging from 5 to 15 minutes (7,9,12). This format is also used for psychiatry stations incorporated into large-scale examinations, such as those used by the Medical Council of Canada (8). Thus, while cogent arguments can be made that it would be appropriate to assess postgraduate residents in psychiatry with an OSCE format that uses longer stations, we were particularly interested in how residents would experience an OSCE as it is most commonly used in practice.
A unique opportunity to study residents' impressions of a psychiatry OSCE arose in 1997, when we received a grant to assess the validity of a psychiatry OSCE for medical students. To establish construct validity, the design required a control group of residents to take the examination. The results, reported in the medical education journal Academic Medicine (13), demonstrated robust validity of the psychiatry OSCE for the assessment of clinical clerks. The involvement of residents in the study also gave us an important opportunity to evaluate their experiences of, and attitudes toward, a psychiatry OSCE, something they would not encounter at any other point in their training. We therefore collected detailed quantitative and qualitative information about their experiences from the 15 residents who completed our survey. We were particularly interested in their opinions about the possibility of using OSCEs for teaching and assessment of psychiatry residents. As we had hoped, their observations and comments have proved invaluable to our efforts to understand and improve OSCE technology for use at the postgraduate level. By sharing our findings, we hope to assist others who are considering, or already using, this increasingly popular assessment instrument.
Residents training in psychiatry at the University of Toronto were invited to participate in this study, which was conducted in 1997 and approved by the Human Subject Review Committee. Of about 100 residents in the program, 18, ranging from postgraduate year (PGY) 1 to PGY 5, responded to an initial flyer, and all agreed to participate. Each was paid $50 for time and participation. After signing a consent form that stated clearly that their performance would in no way affect their in-training evaluations, they were randomly assigned to one of six administrations of the departmental psychiatry OSCE. The psychiatry OSCE consisted of eight 12-minute stations in which candidates interacted with standardized patients trained to portray psychiatric problems. The examination had been carefully designed and studied over 4 years of use and found to have acceptable reliability and validity for the assessment of clinical clerks (10–13). The details of station design have been reported elsewhere (12); the scenarios used in this study are shown in Table 1. Briefly, the scenarios were chosen to cover a wide spectrum of psychiatric pathology and different interviewing challenges, ranging from a demanding patient with borderline personality to a confused patient with delirium. All scenarios were set in the context of a first encounter in an emergency department, an outpatient clinic, or a hospital inpatient ward, and all were created by psychiatrists and based on real cases. Standardized patients were recruited from a large standardized patient program at the university and trained by a team consisting of an experienced standardized patient trainer and an academic psychiatrist. To ensure realism, the standardized patients were trained with videos, relevant readings, and supervised practice, and each case was pilot tested with students and residents.
On examination day, the residents were oriented, together with the medical students, before taking the 2-hour examination.
Before entering each of the eight consecutive scenarios, the residents were shown a sheet of paper with a small amount of clinical information (patient age, gender, and presenting complaint). They were instructed to conduct each interview just as they would in their own clinical practice.
Although the residents' interviewing and management skills were assessed in all stations with the same measures used to assess clinical clerks, the residents were told at the outset that none of the individual results would be released to their faculty supervisors. Both content and process variables were measured in each station with a content checklist of 15–25 items and five 5-point global rating scales. The psychiatrist examiners observing in each station completed the evaluation measures, with input from the standardized patients.
Following the examination, the residents were asked to complete a detailed quantitative and qualitative assessment of their experience. The quantitative survey consisted of 11 statements, each rated for agreement on a 5-point scale (from strongly agree to strongly disagree), covering three key areas: 1) the realism of the scenarios, 2) the appropriateness of the examination for the assessment of clinical clerks, and 3) the appropriateness of the examination for the assessment of residents. After completing the quantitative survey, they were asked, on a blank page, to "Describe what the OSCE was like for you in words." The residents were also invited to comment in narrative form on any of the 11 statements they had rated.
Numerical survey data were entered in a spreadsheet and are reported descriptively. Qualitative data were coded for themes and used to interpret and supplement the quantitative data.
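The study does not describe its spreadsheet procedures, so the following is only a minimal sketch of what a descriptive summary of one survey statement could look like. It assumes a hypothetical coding of 1 (strongly agree) through 5 (strongly disagree), collapsed into agree, neutral, and disagree; the function name `summarize_statement` and the collapsing scheme are illustrative assumptions, not the authors' method.

```python
from collections import Counter

# Hypothetical coding (an assumption, not the study's scheme):
# 1 = strongly agree, 2 = agree, 3 = neutral, 4 = disagree, 5 = strongly disagree.
COLLAPSE = {1: "agree", 2: "agree", 3: "neutral", 4: "disagree", 5: "disagree"}


def summarize_statement(ratings):
    """Return the percentage of respondents in each collapsed category.

    `ratings` is a list of integers 1-5 for a single survey statement;
    missing answers should be removed before calling.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    for r in ratings:
        if r not in COLLAPSE:
            raise ValueError(f"rating out of range: {r}")
    counts = Counter(COLLAPSE[r] for r in ratings)
    n = len(ratings)
    # Percentages rounded to one decimal place, in a fixed category order.
    return {
        category: round(100 * counts.get(category, 0) / n, 1)
        for category in ("agree", "neutral", "disagree")
    }


# Illustrative, made-up ratings (not data from this study):
print(summarize_statement([1, 1, 2, 3, 4]))
# → {'agree': 60.0, 'neutral': 20.0, 'disagree': 20.0}
```

Collapsing the 5-point scale into three categories is one common way to report Likert items descriptively; reporting the full distribution of all five responses would be an equally reasonable choice.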
A total of 18 residents volunteered to participate in this study. Although more residents expressed interest, constraints of the examination design, the availability of examiners, and a fixed budget for honoraria limited the number of subjects who could be accommodated, so no further volunteers were sought from the resident group. Of the 18 residents who took the examination, 15 (83%) completed the postexamination survey; all three who did not complete it cited the need to return to urgent clinical duties. The 15 respondents were fairly evenly distributed across the 5 years of training (2, 4, 4, 2, and 3 residents in PGYs 1 through 5, respectively), and 8 of the 15 (53%) were women. Although the sample was small, both the level of training and the gender balance approximately reflected the program as a whole.
As can be seen in Table 2, residents rated the realism and accuracy of the scenarios highly. The qualitative data supported this assessment: very few comments suggested that the examination or scenarios were unrealistic or artificial. Further, the residents were unanimous that the situations used in the examination reflected those a family physician would have to deal with. One commented, "My family practice friends have encountered many patients like these, and struggled." As well, all but three felt that the scenarios accurately reflected situations psychiatry residents would have to deal with. "It was like any bad night in emerg," said one resident. The dissenters suggested that with real patients they would "act quicker in the emergency scenarios to treat" and that they "would get a more complete history in the office scenarios." Finally, one commented that "Cases that are this straightforward don't seem to make it to psychiatrists."
B. Usefulness in Assessing Clerks
Many residents felt that the examination was a fair assessment of clinical clerks, and most felt that it would detect an incompetent clerk. Interestingly, in the qualitative data many expressed concern that "the test was perhaps a bit difficult for a clinical clerk" and that it was "too demanding and will fail more people than needed." One resident felt that "some of the stations were too difficult (delirious/demented patient)." Another said, "Some of the stations required a high level of not only knowledge but sophistication—knowing the symptom checklist is one thing, being able to deal with complex psychodynamic issues and nightmare patients, i.e., the borderline, is another. I'm not sure this can be learned in a 6-week rotation." Their comments in this regard sharply contrasted with the views of faculty and clinical clerks themselves who have reported a high degree of satisfaction with both the format and the difficulty level of the same examination in previous studies (12).
C. Usefulness in Assessing Residents
Residents were much less positive about the suitability of the OSCE for the evaluation of residents. As reported in Table 2, while half thought it would be a fair assessment of junior residents, fewer thought it would be useful for senior residents, and almost none thought it suitable for board certification. The qualitative data supported this finding: most residents were concerned that the OSCE format they experienced did not tap into "advanced competencies" such as the therapeutic alliance, transference issues, and the synthesis of biopsychosocial issues. As one resident wrote, "While the OSCE screens for weakness overall, there is little time to demonstrate advanced competency (as a senior resident should have). Even in the ER, there is usually 1 hour for assessment." Another said, "I am not sure if an OSCE of 12-minute stations would tell you more than an oral exam. The interpersonal part and the ability to synthesize info in the biopsychosocial model requires more time." A third said that while it "may test basic knowledge…it does not reflect how we learn to assess and manage patients." But not all agreed. One resident called the OSCE "an excellent test that I believe would be very appropriate to training and assessing resident clinical skills," and another said "this really should be how we are examined so that everyone would meet the standards of practice." With regard to the place of an OSCE in specialty certification, although the vast majority rejected the idea of replacing the oral examination, one resident suggested that an OSCE would reduce the chance of failure: "There are more cases and it's not just sink or swim with one patient."
Opinion was quite different with regard to the role of an OSCE for evaluation during residency training. Most of the comments were positive, including "interesting and helpful even at the PGY-3 level," "this was good practice for me; I actually enjoyed it," and "an excellent experience." Some recognized personal weaknesses: "I realized that I don't know the criteria for post-partum blues vs. depression (how embarrassing!)" and "I realized that I didn't know the Folstein (mini-mental status) by heart." Others suggested modifications, such as "perhaps fewer stations/longer for more senior residents with increased depth vs. breadth," and some were keen to help improve individual stations, with comments such as "make the actor of certifiability tangential rather than word salad" and "I think it would be highly relevant to have an alcohol station."
Finally, there were several comments that suggested that the residents were aware that different competencies might be assessed at different levels of training and that their perceptions of what was expected influenced their performances. "It was unclear to me at times if I should be acting like a clerk/resident/staff in terms of how I took a history, and discussed management with the patient." Despite instructions to conduct themselves as they would in their clinical practice, some struggled with whether they should function as they did when they were clinical clerks or in some other way. "I had a difficult time putting myself in the right mind set—I realized how much my approach has changed since clerkship," said one resident. Another commented: "I think it is a fair assessment (the situations are not unrealistic), but I think the expectations of how the situations are handled should be different according to level of training."
The residents also made many general comments about the overall experience, which were coded for themes. About half were positive and half negative (the frequency of each theme is given in parentheses). Positive comments described the examination as fair (1), fun (2), worthwhile (3), or confidence-building (2); negative comments described it as exhausting (2), anxiety-provoking (2), stressful (2), traumatic (2), or confusing (1).
Although a detailed analysis of the residents' examination scores has been reported elsewhere (13), it is worth noting here that scores varied widely: as a group, the residents obtained scores similar to those of clinical clerks for straightforward data gathering but significantly higher scores than clinical clerks for interpersonal process variables and management.
The role of an OSCE in the assessment of residents is unclear, particularly in psychiatry, where there has been a dearth of studies of this assessment format. On the positive side, the results of this study suggest that the residents were fairly enthusiastic about having such an experience as part of their training program. They were, however, much less positive about using such an examination for specialty certification by the Royal College of Physicians and Surgeons of Canada. This finding may not be surprising, given that the competencies required of senior residents differ from those required of clinical clerks, and the format under study was clearly developed to assess the latter group. Similarly, the residents were more likely to endorse this format for evaluating junior residents than senior residents, likely because the skill level of junior residents is closer to that of clinical clerks. Nevertheless, it is important to examine carefully why residents felt negative about certain aspects of the experience. Contrary to a commonly expressed, albeit empirically unsupported, view that standardized patients are "inauthentic," the residents found neither the simulations unrealistic nor the use of standardized patients artificial. On the contrary, they reported a very high degree of satisfaction with the scenarios presented and stated that, for the most part, the scenarios accurately reflected the problems they saw in clinical practice. Rather, their negative comments related to stress and anxiety, the short time interval, and the nature of the task. Indeed, all examinations are stressful, and very high levels of anxiety have been associated with traditional oral examinations (15).
Thus, while evaluators should do everything possible to make examinations tolerable, a useful next step would be a study that directly compares the stress and anxiety generated by OSCE and oral examination formats.
With respect to the duration of stations, the 12-minute format was probably too short. The residents correctly pointed out that although knowledge of diagnostic criteria can be assessed in a short time, advanced competencies such as building a therapeutic alliance and biopsychosocial thinking may be difficult to demonstrate in so short an interval. That does not mean, however, that an hour is necessary. The literature is clear that whatever validity is gained by using a single long patient interview comes at the high cost of very low reliability (1).
What is needed is a modification of the OSCE format that preserves its reliable and objective nature while allowing the assessment of higher competencies in a way that is valid for postgraduate trainees. MacRae et al. are investigating such a modification in surgery (16), and similar studies are needed in psychiatry.
There are, of course, several limitations to this study. First, the residents took an examination designed to assess the skills of clinical clerks, and their comments reflect some confusion about the level of performance expected of them; some of their anxiety undoubtedly resulted from this role confusion. Second, only a small sample of the resident group participated, and the practicalities of the study limited the number who could be accommodated. It is quite possible that better-prepared residents, feeling more confident about their skills, volunteered first. Thus, the experiences and attitudes of this group might not be representative of all residents in this program or in other programs. Nevertheless, we have found the residents' observations very instructive in our efforts to conceptualize a better method of resident assessment.
OSCEs have been a very successful evaluation tool at the undergraduate level and hold promise for teaching and assessment in postgraduate programs. Despite the use of standardized patients, the residents found the scenarios very realistic. Although OSCEs show some promise as an evaluation instrument with better psychometric properties than traditional oral examinations, the format of short, focused stations currently used to assess undergraduate students will require modification before OSCEs can be used to evaluate postgraduate trainees in psychiatry. An important body of literature documenting the cognitive differences between novice and expert professionals already exists and should become the basis for such modifications (17). Given the recent directives of the Royal College of Physicians and Surgeons of Canada and the growing use of OSCEs to assess residents, we anticipate interesting and innovative modifications of the "traditional" OSCE to meet the needs of postgraduate trainees.
This study was supported by a peer-reviewed grant from the Medical Council of Canada Research and Development Fund.