Medical schools have traditionally used two means of assigning grades in psychiatry clerkships: written exams and clinical evaluations from faculty and residents. Written exams provide an objective measure, since all students are presented with the same questions and similar answers can reasonably be expected to result in similar grades. Clinical evaluations may offer a more comprehensive means of evaluating skills and knowledge than a written exam. Grades from clinical evaluations, however, may be more variable, as they may be dependent on both the particular evaluator and the clinical situations with which a student is confronted (1,2).
An alternative that aims to provide a compromise between the two forms of testing is an objective structured clinical examination (OSCE). An OSCE often uses standardized patients (SPs) to evaluate a student's ability to perform a clinical interview. The SPs are trained to record and evaluate the students' actions in a simulated clinical setting. The interview is also observed, either directly or on videotape, by an experienced clinician or other evaluator. A grade is assigned based on performance in this situation. A written exam in which the student provides a diagnosis and proposes treatment can also be used to increase the reliability of the exam (3). Ideally, the OSCE should combine objectivity with the opportunity to evaluate a wide range of skills.
Using an OSCE in a psychiatry clerkship presents challenges not seen in some other forms of such exams. Many reports about OSCEs describe the use of multiple standardized medical cases, each of which a student can be expected to complete in just a few minutes. In some instances this format has been adapted to use standardized psychiatric patients (4,5). These interviews were similarly brief and were intended to simulate the time a family practitioner or other primary care provider might have with a psychiatric patient. A full, in-depth psychiatric interview would be expected to take much longer, and we found no reports in the literature of an OSCE designed to simulate such an encounter.
To test the value of such an OSCE within the third-year psychiatry clerkship at Tulane University, we designed an exam based on a single simulated case. After completing the exam, students were asked for their opinions of the OSCE and for suggestions to improve it. The OSCE was also evaluated for reliability, by examining the consistency of grading, and for validity, by comparing OSCE scores with previously established means of evaluating students during the clerkship.
This project was funded by a grant from Partners in TIME (Teaching and Learning Innovations in Medical Education), a program initiated by Tulane University School of Medicine's Office of Educational Research and Services and funded through Louisiana Board of Regents Support Funds. The total cost was $7,170, half of which covered indirect costs. The remaining funds paid the SPs (for training and for performing in the sessions), the independent reviewers who rated the videotaped sessions, and a statistician who conducted the data analyses. The project required close collaboration between the Medical Student Education Program in Psychiatry and the Program for Teaching and Assessment of Clinical Skills.
SPs simulated a case based on a published account of a young woman with multiple, complex problems (6). The Axis I diagnoses were alcohol abuse and major depressive disorder with atypical features. The Axis II diagnosis was borderline personality disorder. In the simulation, the patient's physical (Axis III) complaints, which involved vaginal discharge, had already been addressed by the referring primary care clinic. Axis IV stressors included a history of sexual abuse, her mother's recent pregnancy, gynecological complaints, and worry that she might have contracted a sexually transmitted disease. The SPs also simulated eventful psychiatric, medical, family, and social histories.
Three SPs were trained over three sessions. In the first session, SPs were introduced to the case: a narrative history of the patient whose role they were assuming, a description of that patient's personality, the questions needed to elicit particular responses, and how to interact during the simulation. In the second session, SPs practiced using the content (response) checklist, watched videotapes of one another's portrayals, and worked to standardize their performances. SPs were trained not to volunteer information unless asked, but to be forthcoming within the scope of any given question. In the final session, psychiatry residents interviewed the SPs in role while being supervised by senior faculty. In the actual interviews with students, SPs used makeup to simulate recent self-inflicted slash marks on both wrists.
All Class of 2001 students assigned to the psychiatry clerkship during the second semester of their third year of medical school participated in the study (N=82). Students were given the simulated patient's chief complaint, a brief medical history, and vital signs, and then had 45 minutes to complete a psychiatric evaluation. After the interview, an additional 15 minutes was allotted for a written exercise in which students listed a differential diagnosis, identified safety risks to the patient, and proposed a treatment plan.
Student interviews were videotaped. OSCE performance was assessed on three components. First, SPs completed a content checklist designed to verify that students covered all the necessary components of the psychiatric interview. The 36-item checklist included the basic elements of a mental status examination along with relevant areas of questioning, such as suicide risk and history of drug and alcohol abuse. Second, SPs completed a patient perception scale, rating friendliness, respect, willingness to listen, proper attention, responsiveness, and clarity on a 5-point scale (1=poor, 5=excellent). Third, a psychiatry faculty member (P.R.) scored the written component described above. OSCE scores were used for self-assessment only and did not contribute to students' final clerkship grades.
Evaluation of the Usefulness of the OSCE
Graded portions of the OSCE were checked for reliability by calculating Cronbach's coefficient alpha and by comparing how different evaluators graded the same sessions. The written exam was rescored by an independent grader. The content checklist and patient perception scale were independently rescored by three fourth-year medical students, who watched the interviews on videotape with the checklist at hand and who had previously received a 30-minute training session on how to verify checklist items.
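Cronbach's alpha summarizes how consistently the items of a scale measure the same underlying performance: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The paper does not say which software was used, so the following is only a minimal Python sketch. The function name and the toy data are hypothetical, and it assumes each checklist item is scored 0 or 1 and that scores are arranged as a students-by-items matrix.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a students-by-items score matrix.

    scores[s][i] is student s's score on item i (hypothetical coding:
    1 = checklist item performed, 0 = not performed).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every student needs a score for every item")

    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each student's total (summed) score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Toy data only (not study data): 4 students x 3 checklist items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))  # 0.75
```

Population and sample variance give the same alpha as long as the same one is used for both the items and the totals, because the n/(n-1) correction cancels in the ratio.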
Validity was evaluated by examining correlations between OSCE scores and previously established measures of student performance: scores on the National Board of Medical Examiners (NBME) subject exam in psychiatry; grades on an essay exam in which students write an evaluation based on two videotaped interviews of actual psychiatric patients (video exam); and grades in the clinical portion of the psychiatry rotation. Clinical grades were assigned by attending physicians who had observed at least one full interview conducted by the student and had supervised the student throughout the clinical rotation. Because no prior assumptions were made about the form of the relationships among these scores, correlation coefficients were calculated with both Pearson's linear correlation and Spearman's nonparametric rank correlation. Statistical significance was set at P<0.05.
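Pearson's r measures the linear association between two sets of scores, while Spearman's rho is Pearson's r computed on the ranks of the scores, so it captures any monotonic relationship, linear or not. The minimal Python sketch below illustrates the difference. It uses only the standard library, its function names are hypothetical, tied scores receive the average of their ranks (the usual convention), and it computes coefficients only. It does not compute the P-values used to judge significance at P<0.05; a statistics package such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`) returns both.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation (linear association)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores only: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3))   # 0.981
print(round(spearman(x, y), 3))  # 1.0
```

On this toy data the relationship is perfectly monotonic but curved, so Spearman's rho is 1.0 while Pearson's r is slightly lower. Computing both coefficients can reveal this kind of difference.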
At the end of the exam, students completed an evaluation form on which they rated the exam on a 7-point scale for usefulness, believability, difficulty, and appropriateness of length. Students were also encouraged to add individual comments about the OSCE, including its length, the complexity of the case, and how it might be improved.
Of the 82 students who completed the OSCE, psychiatry rotation grades were available for 80, and 78 completed the evaluation form giving their opinions of the OSCE. Cronbach's coefficient alpha for the OSCE content checklist, as scored by the SPs, was 0.76. SPs and videotape reviewers agreed on 88% of checklist items, and the correlation between their scores was statistically significant (r=0.76, P<0.001). This is in keeping with findings by others (7) that SPs record 30 checklist items with 76% accuracy. Two independent graders agreed on 91% of the points awarded for the written component, and their scores were also significantly correlated (r=0.81, P<0.001). SP and videotape reviewer ratings of interpersonal performance on the patient perception scale were not significantly correlated (r=0.018, P>0.1), nor were the videotape reviewers' ratings in this area significantly correlated with any other measure.
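Percent agreement, as reported for the checklist (88%) and the written component (91%), is the share of paired judgments on which two raters gave the same mark. The paper does not specify exactly how agreement was tallied, so the minimal Python sketch below assumes agreement was counted over every student-by-item checklist judgment. The function name is hypothetical, and the marks are invented toy values rather than study data.

```python
def percent_agreement(rater_a: list[list[int]], rater_b: list[list[int]]) -> float:
    """Percentage of item-level judgments on which two raters agree.

    rater_a[s][i] and rater_b[s][i] are the two raters' marks for student s,
    item i (hypothetical coding: 1 = item performed, 0 = not performed).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must score the same students")
    agree = total = 0
    for row_a, row_b in zip(rater_a, rater_b):
        if len(row_a) != len(row_b):
            raise ValueError("raters must score the same items")
        for mark_a, mark_b in zip(row_a, row_b):
            agree += mark_a == mark_b  # True counts as 1
            total += 1
    if total == 0:
        raise ValueError("no judgments to compare")
    return 100 * agree / total


# Toy marks only: an SP and a videotape reviewer scoring 2 students on 4 items.
sp_marks = [[1, 1, 0, 1], [1, 0, 0, 1]]
video_marks = [[1, 1, 1, 1], [1, 0, 0, 0]]
print(percent_agreement(sp_marks, video_marks))  # 75.0
```

Raw percent agreement does not correct for agreement expected by chance, which can be substantial when most students perform most items, so it is best read alongside the correlation coefficients between raters.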
Figure 1 shows how students rated the OSCE for usefulness, believability, and difficulty, along with their opinions of whether the time allotted for the exam (45 minutes for the interview and 15 minutes for the written portion) was appropriate. Mean ratings on the 7-point scale are shown with standard deviations (n=78).
Table 1 shows correlation coefficients among scores on the NBME shelf exam in psychiatry (NBME Exam); the grade assigned by the attending physician for the student's clinical rotation in psychiatry (Ward Grade); the grade on an exam in which students evaluate two videotaped patient interviews (Video Exam); and the OSCE components, namely the thoroughness of the interview as measured by the SPs' content checklist (OSCE Checklist), the students' written differential diagnosis and treatment plan (OSCE Written), and the SPs' perception of the students' professionalism and empathy, rated on a 5-point scale (OSCE Personal). Parametric (Pearson) correlation coefficients are shown above nonparametric (Spearman) coefficients.
Correlations within OSCE components (Checklist, Written, Personal) and between OSCE components and psychiatry clerkship grade components (NBME Exam, Ward Grade, Video Exam) were all positive but small to moderate in strength (Table 1). Among OSCE components, Checklist correlated significantly with Written and Personal, but Written and Personal were not significantly correlated. OSCE Checklist did not correlate significantly with any of the three psychiatry clerkship grade components, but OSCE Written correlated significantly with NBME Exam and Ward Grade, and OSCE Personal correlated significantly with Video Exam.
This study showed that it is both feasible and practical to conduct an OSCE simulating a full-length interview with a complicated psychiatric patient. Costs were similar to those reported for other OSCEs (8), averaging about $50 per student excluding indirect expenses. Using a single case rather than multiple stations may also reduce training costs and make it easier to recruit and train the required number of SPs. If the OSCE were used for actual grading, however, the case might need to be varied for test security.
Students were specifically asked about the time allotted and the complexity of the case, and they seemed to find both appropriate. We did receive isolated comments from both ends of the spectrum: that the case was too simple ("really textbook") or too complicated ("extremely difficult to assess"), and that the exam was too short ("I wish we had more time") or too long ("35 minutes would be sufficient"). However, most students (77%) rated the time allotted as about right, and on average students rated the case as moderately difficult (Figure 1). Student comments also indicated that this type of interview was an appropriate test of their abilities: "All practitioners should know how to diagnose, assess, and begin treatment of such cases."
Other student feedback on the OSCE was similarly positive. Ratings for believability and usefulness were high (Figure 1), and individual comments were generally favorable. Students commented that the "patient was very believable," that "it was very well done," and that "having a live patient was better than a [written] exam could ever be." Students also liked that the exam was "like the ER, ICU, triage, or any place you go to see patients."
The negative comments focused mostly on two points. First, a few students complained about the added pressure of the OSCE, feeling that their time would have been better spent studying for other exams. Comments such as "Don't put it the day before our [written] exam" and "Have it at a more convenient time" typified this sentiment.
Overtesting is a common stress in medical school, so such complaints were not surprising. Scheduling the OSCE earlier in the psychiatry rotation, away from other exams, could also offer other benefits, since standardized patients are useful not just for testing but also for teaching (9). As one student put it, "An [SP] interview earlier in the rotation becomes the basis for additional learning."
The second category of complaints concerned the clarity of objectives and of how performance would be assessed, e.g., "Please give us more information about the format prior to taking the OSCE" and "I would not want this graded unless there is a bit more structure." This complaint had some merit: grading an OSCE based on a single case is unlikely to achieve the same degree of reliability or objectivity as a written exam (10).
In rating interpersonal skills, we found no significant correlation between the ratings given by the SPs and those of independent reviewers who rated videotapes of the interviews. This discrepancy might be explained in part by the difference between direct, in-room observation and observation on videotape. Both video review (11) and grading by SPs (12) have been used successfully in the past as means of evaluating clerks. However, independent reviewers may have been less invested in the process: because they only observed, they could not get the same feel for the interaction as someone participating in it. The SPs, at least, appeared to be more attuned to how well the interviewer was obtaining relevant information. SP ratings of interpersonal performance correlated with the OSCE checklist and with performance on the clerkship Video Exam (Table 1), whereas the reviewers' ratings did not (data not shown). The correlation between the SPs' personal ratings and Video Exam performance is particularly interesting, since it could imply that the SPs sensed not just how well students were gathering information in this interview, but also how attuned the students might be in other situations.
The correlations between OSCE components and measures of psychiatry clerkship performance were only moderate in strength. This result may reflect a real difference in skills; even for the most well-established measures, performance in one area does not always correlate well with performance in another (12). Alternatively, the variation may have been due to the measures used. In particular, future studies could benefit from more objective measures of communication skills (13,14). A third possibility is that OSCE results varied because the test itself varied. Responses to individual areas of questioning can be standardized, but aspects such as affect are difficult to standardize, and both performers and performances could vary according to when and with whom a student was tested. Unfortunately, the initial data set did not record which students worked with which SP, so we cannot say whether one SP communicated the sense of the case better than another. Future studies would benefit from tracking such variations in the testing environment as they relate to exam performance.
Although there was grading variation in the OSCE, the test met the main goal of this assessment method: offering a compromise between the objectivity of multiple-choice exams and the broader scope of clinical evaluations. Scores on the written portion of the OSCE were significantly correlated with NBME exam results and with ward grades assigned by experienced clinicians who had observed the students during their psychiatry rotations.
Reliability of both the written and checklist portions of the OSCE was quite good for this type of exam (7,10). Independent graders agreed about 90% of the time on individual items and showed significant overall agreement in final scores. The content checklist showed a high degree of internal-consistency reliability (Cronbach's alpha=0.76), and checklist results correlated significantly with performance on the written and interpersonal-performance portions of the OSCE. In other words, students who were more thorough in their interviews were more likely to assess the patient correctly and to leave a favorable impression on the SPs.
The validity of the OSCE is supported by comparison with non-OSCE measures. Students who did better on the OSCE written exam tended to be the same students who scored well on ward clinical evaluations and the NBME exam. In addition, SP ratings of interpersonal skills correlated significantly with the Video Exam component of the clerkship. There is likely a real and important difference between students who scored well on the OSCE and those who scored poorly. Thus, the OSCE could be a useful means of identifying students who are particularly deficient in knowledge, clinical evaluation skills, or bedside manner.
Previous studies have demonstrated the usefulness of OSCEs using multiple short interviews to test students' ability to assess a psychiatric patient (4,5). Our experience shows that the OSCE need not be limited to this format: performance on a single, full-length psychiatric interview reflected both student knowledge on written exams and performance on the wards. Short interviews are appropriate to a family practice or similar setting, in which a general practitioner might need to deal with a simple psychiatric issue. A longer exam offers the opportunity to assess a student's ability to draw out the subtle information needed to formulate as well as diagnose a patient with multiple problems, the type of patient likely to be referred to a psychiatrist. The test was not perfect, and even an ideal OSCE is likely to involve some subjectivity in grading. Future OSCEs would nonetheless benefit from refinements that improve objectivity and standardization. Our experience indicates that an OSCE simulating a complete interview with a complicated patient can be an effective tool for evaluating medical students in their psychiatry rotations.