Academic Psychiatry
Academic Psychiatry 26:187-192, September 2002
© 2002 Academic Psychiatry


Commentary

Should We Use Standardized Patients Instead of Real Patients for High-Stakes Exams in Psychiatry?

Rachel Yudkowsky, M.D., MHPE

Dr. Yudkowsky is Director of the Clinical Performance Center and Associate Director of Faculty Development, Department of Medical Education, University of Illinois at Chicago College of Medicine. Address correspondence to Dr. Yudkowsky, Department of Medical Education M/C 591, University of Illinois at Chicago College of Medicine, 808 S. Wood Street, Chicago, IL 60612. E-mail: rachely{at}uic.edu

Key Words: Standardized Patients • Objective Structured Clinical Examination (OSCE)

Performance assessment of residents and clinicians is a high-stakes activity for both examinees and society. Assessment during residency may affect residents' ability to continue their training; assessment leading to board certification can determine clinicians' ability to participate in managed care panels and other professional activities. Patients, hospital credential committees, and healthcare organizations rely on residency programs and the specialty boards to certify that their clinicians are capable and safe practitioners. Medical educators are constantly striving to develop valid, reliable, and fair assessment methods to provide better accountability for both examinee and societal stakeholders.

Standardized patients (SPs) and objective structured clinical examinations (OSCEs) are increasingly popular as assessment methods in medical education. Standardized patients provide simulations of real patients that can be enacted repeatedly in a consistent, reliable manner. The OSCE is a performance assessment format in which examinees rotate through several examination stations and face a different performance challenge at each station (1). OSCEs typically consist of several SP cases, with or without additional non-SP stations. More than 90% of medical schools accredited by the Liaison Committee on Medical Education (LCME) use SPs and/or OSCEs for teaching and/or assessment of medical students (2). The Accreditation Council for Graduate Medical Education (ACGME) recently recommended that residency programs develop SP and OSCE assessments, identifying them as the "best method" for evaluating competencies such as interviewing, patient education and counseling, physical examination, interpersonal and communication skills, and professionalism (3). In the same vein, the American Board of Psychiatry and Neurology (ABPN) is considering the use of SPs in the Part II examination as a way of improving the fairness and psychometric properties of the exam (4).

I will suggest in this paper that there is much to be gained by moving to the use of simulated patients in place of real patients for high-stakes performance assessment of residents and clinicians in psychiatry. After reviewing some problems with the use of real patients and the evidence for the validity and reliability of standardized patients (for assessment in general and for psychiatry in particular), I will discuss the advantages and disadvantages of using SPs for high-stakes exams and reflect on some of the challenges that remain in designing OSCEs for the assessment of residents and clinicians in psychiatry. Although I will briefly describe two recent projects of the ABPN Subcommittee on the Utilization of Standardized Patients, of which I am a member, the opinions stated herein are mine alone and do not necessarily reflect the position or opinion of the ABPN.

WHY NOT CONTINUE TO USE REAL PATIENTS?

An assessment based on an interaction with an actual patient has tremendous appeal. The uniqueness and variability of real patients afford the opportunity to observe diverse interpersonal and communication skills and the ability to deal with real affects and challenges of the sort examinees might well encounter in practice. The interview provides rich material for a post-encounter discussion that can yield insight into an examinee's knowledge, clinical reasoning, and management philosophy. Unfortunately, this same uniqueness and variability make real patients poorly suited for high-stakes summative evaluations of clinical skills. High-stakes examinations require a high standard of validity, reliability, and fairness, all difficult to achieve with real patients. I will address these in order.

Validity refers to whether an assessment actually measures what it purports to measure; in our case, that is clinical competence, the extent to which an individual can handle the various situations that arise in a domain of practice such as psychiatry (5). An assessment based on an interaction with a real patient has high "authenticity": the task performed in the assessment, interacting with a real patient, closely resembles the task in actual practice. The validity problem arises from the small number of interactions that can be observed in an examination with real patients and from the nonrepresentativeness of the patients who are able to participate in examinations.

Content validity asks whether the sample of cases in the examination is a representative sample of the universe of cases that examinees should be able to handle. A representative sample is needed because competency is case specific (6,7). The ability to handle a challenge of one type, for example a depressed patient with psychomotor retardation, tells us little about that same clinician's ability to handle a different challenge, such as a patient undergoing alcohol withdrawal. An exam composed of only one or two cases cannot provide an adequate sampling of the extensive and varied domain of psychiatric practice.

The content validity of real-patient assessments is also limited by the need for patients to be able to understand and consent to their role in the examination process. Thus patients who are actively psychotic, delirious, demented, or belligerent will be systematically excluded from the examination. The sample of patients available for an examination often includes only those outpatients who are stable enough to arrive reliably at the examination site and cooperate with the examination process—a very unrepresentative sample indeed.

Reliability refers to the reproducibility, consistency, or stability of measurements across different trials. Reliability is a challenge in part because of the substantial variability in ratings by different examiners. In a classic study by Noel et al. (8), 203 experienced internal medicine faculty rated videotapes of two residents in a mock clinical practice exam. The same resident's performance was rated unsatisfactory or marginal by 48% of raters and satisfactory or superior by 52% of raters! Faculty identified only 30% of the residents' strengths and weaknesses, although this rose to 60% or more among raters using a structured form. Similarly, Kalet et al. (9) found that experienced, trained faculty viewing a videotaped interaction correctly identified only 43% of students who had been rated as inadequate by a panel of expert raters. Significantly, Kalet and colleagues found that the ratings most salient for psychiatry, those of interpersonal and communication skills, were consistently less reliable than ratings of the information obtained during the interview. Expert examiners (connoisseurs of the behavior to be observed) can provide a more reliable global rating of competence and somewhat decrease this variability. Panels of 3 to 5 expert raters are the current gold standard for evaluating performance.

Variability between examiner ratings is certainly a concern. An even greater source of variability is the inconsistency of examinee performance across cases—as noted above, performance on one case is a very poor predictor of performance on other cases. This content specificity is found across examination platforms; it is true of computer simulations, vignettes, and multiple-choice questions. Since a given patient is in effect a single scorable "item" on the examination, many patients must be seen in order to reach a reliable and defensible decision about an individual's competency.
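The arithmetic behind "many patients must be seen" can be sketched with the Spearman-Brown prophecy formula, a standard psychometric projection of how reliability grows as parallel items (here, cases) are added. This is a minimal sketch under stated assumptions: the per-case reliability of 0.25 is a hypothetical value chosen for illustration, not a figure from this article, and the formula treats cases as interchangeable, which simplifies the content specificity described above.

```python
def spearman_brown(single_case_reliability: float, n_cases: int) -> float:
    """Projected reliability of a score averaged over n_cases parallel cases."""
    r = single_case_reliability
    return n_cases * r / (1 + (n_cases - 1) * r)


def cases_needed(single_case_reliability: float, target: float, max_cases: int = 100) -> int:
    """Smallest number of cases whose projected reliability reaches the target."""
    for n in range(1, max_cases + 1):
        # A small tolerance guards against floating-point round-off at the boundary.
        if spearman_brown(single_case_reliability, n) >= target - 1e-9:
            return n
    raise ValueError("target not reached within max_cases")


if __name__ == "__main__":
    PER_CASE_RELIABILITY = 0.25  # hypothetical value, for illustration only
    for n in (1, 2, 4, 8, 12, 16):
        print(f"{n:2d} cases -> reliability {spearman_brown(PER_CASE_RELIABILITY, n):.2f}")
    print("cases needed for 0.80:", cases_needed(PER_CASE_RELIABILITY, 0.80))
```

Under the assumed per-case reliability, the projection first reaches 0.80 at 12 cases; a lower per-case reliability pushes the required number of cases up, and a higher one brings it down.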

Fairness is another major challenge for live-patient assessments. Since each examinee encounters a unique patient/examiner dyad, there is no a priori way to standardize the task, the examiner, or the scoring rubric. To the extent that these are unstandardized, an oral exam can be a liability rather than an advantage (10).

To summarize, the difficult logistics associated with real patients typically result in nonstandardized assessments consisting of one or two cases that are unsystematically selected from a nonrepresentative pool of patients. These conditions unsurprisingly result in assessments that are characterized by relatively low validity and reliability and that can be perceived as unfair.

Until recently, the use of real patients, despite all its drawbacks, was the only way to authentically assess the interpersonal and communication skills so crucial to psychiatric practice. With the advent of standardized patients, that is no longer the case.

STANDARDIZED PATIENTS

A standardized patient case is a simulation of a patient by an actor or other lay person who is rigorously trained to present the specified history and physical findings in a particular manner. Unlike role-play, a simulation is scripted and standardized so that the portrayal is highly consistent. A capable SP responds accurately, flexibly, and in character to different clinicians who may have very different interpersonal and interviewing styles. The SP may also complete a checklist or rating scale that records and assesses the behavior of the examinee during the interaction.

The psychometrics of standardized patients were extensively researched in the years following Howard Barrows's introduction of SPs in the 1960s. Several excellent reviews of SP methodology were published in the 1990s (5,7,11,12) and are highly recommended for anyone planning to implement an SP teaching or assessment program. Key findings from these reviews, which concern SPs portraying medical patients, include the following:

  • SPs can portray scripted cases accurately and consistently. Experienced physicians cannot differentiate real patients from SPs when the latter are sent unannounced into a physician's office.
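  • A pass-fail decision based on only one SP case is not a reliable indicator of the examinee's clinical skills. About 10 to 14 cases are needed to reach a reliability of 0.80 for the overall score. A reliable pass-fail decision can be reached with fewer cases, because the classification of examinees whose scores lie well away from the cut score is stable even when the scores themselves are imprecise.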
  • Additional examiners should be used to increase the number of cases assessed, rather than to increase the number of examiners per case.
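The advice to spend additional examiners on more cases, rather than on more examiners per case, follows from a decision (D) study in generalizability theory. The sketch below assumes a design in which every examinee sees every case and each case has its own examiner or examiners (raters nested within cases). The variance components are hypothetical, chosen so that the examinee-by-case term (case specificity) is large, as the reviews describe; a real exam would estimate them from its own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    person: float       # true-score differences between examinees
    person_case: float  # examinee-by-case interaction (case specificity)
    residual: float     # raters within cases, their interaction with examinees, and error


def g_coefficient(vc: VarianceComponents, n_cases: int, raters_per_case: int) -> float:
    """Relative G coefficient for a persons x (raters nested in cases) design."""
    relative_error = (
        vc.person_case / n_cases
        + vc.residual / (n_cases * raters_per_case)
    )
    return vc.person / (vc.person + relative_error)


if __name__ == "__main__":
    vc = VarianceComponents(person=0.20, person_case=0.50, residual=0.30)  # hypothetical
    # The same budget of 10 examiner-case slots, allocated two different ways.
    print("10 cases x 1 examiner: ", round(g_coefficient(vc, 10, 1), 3))
    print(" 5 cases x 2 examiners:", round(g_coefficient(vc, 5, 2), 3))
```

Because the residual term is divided by the total number of examiner-case pairings, it is identical under both allocations; only the case-specificity term differs, and it shrinks faster when examiners are spread across more cases.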

CAN SPs CONVINCINGLY PORTRAY PSYCHIATRIC DISORDERS?

The ability of SPs to portray medical patients does not necessarily mean that they can accurately render the complex affects and behaviors inherent in many psychiatric disorders. While there are only a handful of published papers describing the use of standardized patients in psychiatry, their results have generally been very positive.

In 1991, Famuyiwa et al. (13) were the first to report on the use of simulated patients in psychiatry. Although they concluded that "there is some evidence for the justification of using the OSCE as a major form of assessment in psychiatry," their simulations were not standardized and so would not have provided the improved reliability and fairness obtainable with SPs.

Loschen (14) used SPs in an OSCE to assess the clinical skills of 2nd-year and 4th-year psychiatry residents. Three classes of residents (N=15) provided feedback about the experience. On a scale where 1=positive and 0=negative, all three classes rated the overall examination 1.0. The second and third classes rated the quality of the SPs 0.8 and 0.5, respectively; the first class rated only the overall experience.

Hodges et al. (15) developed a 6-station psychiatry OSCE to evaluate 3rd-year students. They reported that 70% of 120 students and 94% of 80 faculty examiners felt that the portrayals were "very realistic." In another study by the same group, 15 residents took the clerkship OSCE for validation purposes; 80% agreed that the simulations were realistic and reflected situations that a psychiatry resident would have to deal with (16).

Coyle et al. (17) described using simulated patients in a resident psychotherapy class and reported that this was "well received." They felt that using experienced mental health counselors as the SPs enhanced the realism of the situation.

The American Board of Psychiatry and Neurology conducted two studies to establish whether standardized patients could portray psychiatric disorders well enough for a Boards-type assessment. In the first project, SPs were trained to simulate a patient with schizophrenia and a patient with major depression. The ABPN concluded that SPs could be used to assess the interpersonal skills and the history-taking skills of residency-trained psychiatrists, and established the Subcommittee on the Utilization of Standardized Patients.

In the second project, 23 ABPN directors and senior examiners served as examiners and 16 Chicago-area psychiatry residents served as the examinees in a pilot OSCE consisting of 3 SP cases and a vignette. Feedback about the SPs' portrayal of patients with psychiatric diagnoses and about their ability to evoke a realistic emotional response in the participants was positive. As a result, the ABPN decided to continue exploring the usefulness of this testing methodology for specialty certification and conducted a more extensive field trial in June 2002 in collaboration with the Educational Commission for Foreign Medical Graduates (ECFMG) (4).

Krahn et al. (18) challenged the verisimilitude of psychiatric SPs on the basis of a pilot study comparing the responses of students and faculty to actual and standardized patients. Students could usually identify the SPs, felt less attentive and less emotionally engaged with them, and had more difficulty feeling empathy for them. Faculty were always able to detect the SPs; they felt that the SP cases were too straightforward and unrealistic, and they, too, found it harder to empathize with the SPs. The authors speculate that students and faculty may have been distracted and biased by their efforts to identify which patients were "imposters." Some of the differences between the Krahn pilot and the other, more positive studies may also reflect differences in how the SPs were trained.

Psychiatric disorders are indeed difficult to simulate. Achieving high-quality simulation and standardization requires hours of training, which may include watching videotapes of real patients with the disorders or even visiting a psychiatric unit. However, the majority of studies indicate that SPs can be trained to simulate psychiatric disorders to a degree sufficient for high-stakes testing. As with Resusci-Annie models and flight simulators, a simulation need only be "good enough" to reliably evoke the behavior to be assessed, and SPs seem well able to meet this criterion.

ADVANTAGES OF USING SPs AND OSCEs

Granted that SPs can provide high-fidelity simulations of psychiatric disorders, what would be the advantages for high-stakes testing?

Logistics. SPs are available when and where needed. SPs can be trained to the specific case required, and can portray that case repeatedly and consistently. There are no issues of confidentiality or consent, or ethical concerns that undergoing the examination process might not be in a patient's best interest.

Validity. Examinations can include multiple long and short cases and thus can provide a more systematic and representative sampling of the domain of psychiatric practice. Exam blueprints can be constructed to specify the types of cases and patients to be seen. Critical tasks such as dealing with a bolting patient or treating a drug overdose can be assessed in a safe and controlled setting. Assessments can include ratings of interpersonal skills by SPs in appropriate cases, providing added depth to the evaluation and affording a voice to patient stakeholders.
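An exam blueprint can be represented as a simple table of required case categories against which a draft station list is checked. In this minimal sketch the category names are drawn from examples mentioned in this article, while the minimum counts and the draft exam are hypothetical.

```python
from collections import Counter

# Hypothetical blueprint: minimum number of stations required per case category.
BLUEPRINT = {
    "depression": 1,
    "schizophrenia": 1,
    "alcohol withdrawal": 1,
    "drug overdose": 1,
    "bolting patient": 1,
}


def coverage_gaps(blueprint: dict[str, int], stations: list[str]) -> dict[str, int]:
    """Map each under-sampled category to the number of stations it still needs."""
    counts = Counter(stations)
    return {
        category: required - counts[category]
        for category, required in blueprint.items()
        if counts[category] < required
    }


if __name__ == "__main__":
    # Hypothetical draft exam: depression is over-sampled, two categories are missing.
    draft_exam = ["depression", "schizophrenia", "depression", "bolting patient"]
    print(coverage_gaps(BLUEPRINT, draft_exam))
```

A fuller blueprint might cross case categories with the competencies to be assessed; the one-dimensional version here keeps the idea of systematic sampling visible.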

Interrater reliability. Key tasks or critical actions can be determined for each case. Examiners can be trained to attend to these key features of each task, and can be provided with exemplars of acceptable and unacceptable performance, as a way of decreasing variance across raters. Because an examiner may rate many examinees on the same case, examiners can become "expert raters," or connoisseurs of the variety of behaviors possible in the context of that case.

Generalizability. The larger number of cases possible in SP-based exams provides a much broader base from which to generalize to performance in practice.

Fairness. The examination is more equitable to the extent that both the patients and the examiners are standardized and consistent across examinees.

DISADVANTAGES OF USING SPs

No matter how realistic the portrayal, SPs are not the real patients a clinician will deal with in practice. Nonetheless, SPs can provide simulations of sufficiently high fidelity that the advantages far outweigh the slight loss of authenticity.

Cost is a major consideration in SP-based examinations. Standardized patients must be extensively trained, usually by an experienced full-time trainer. Additional training is needed if the SPs are to function as raters as well. There is considerable evidence that trained SPs using a checklist or rating scale can record and assess the key questions and maneuvers expected of an examinee as effectively as faculty examiners. Accordingly, some institutions take advantage of SP raters to decrease the amount (and cost) of physician time committed to an exam.

CHALLENGES AND OUTSTANDING ISSUES

Further research is needed in several areas. For example:

Scoring instruments for expert versus novice clinicians. Expert clinicians tend to reach diagnoses by rapid pattern-matching of salient features rather than by the methodical inquiry typical of novices. Experts and novices also differ in the questions they ask, their representation of the problem, and their discourse about a case. These and other differences may have an effect on the types of scoring instruments most appropriate for different levels of examinees.

SP exams of medical students are generally scored using dichotomous checklists: each item either receives credit or does not. Rating scales, on the other hand, allow the rater to indicate how well an item was performed, for example on a five-point scale. There is mounting evidence that checklists, which favor the stepwise approach taught to medical students early in their training, are inappropriate tools for assessing more expert clinicians. For example, Hodges et al. (19) found that attending psychiatrists scored lower than students on the checklist score of a psychiatry OSCE but higher on a global rating scale.
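The contrast between the two instrument types can be sketched in a few lines. The checklist items, ratings, and examinee profiles below are hypothetical; they are constructed only to show how an examinee who skips routine questions can score lower on a checklist yet higher on a global rating, the pattern Hodges et al. describe, and they are not data from that study.

```python
def checklist_score(items: dict[str, bool]) -> float:
    """Dichotomous checklist: each item earns full credit or none; returns the fraction credited."""
    return sum(items.values()) / len(items)


def global_rating_score(rating: int, scale_max: int = 5) -> float:
    """Rescale a 1..scale_max global rating to 0..1 for side-by-side comparison."""
    if not 1 <= rating <= scale_max:
        raise ValueError(f"rating must be between 1 and {scale_max}")
    return (rating - 1) / (scale_max - 1)


# Hypothetical depression-case checklist.
ITEMS = ["sleep", "appetite", "energy", "concentration", "suicidal ideation", "prior episodes"]

# A hypothetical novice asks every question in sequence; a hypothetical expert targets the salient ones.
novice = {item: True for item in ITEMS}
expert = {item: item in {"sleep", "suicidal ideation", "prior episodes"} for item in ITEMS}

if __name__ == "__main__":
    # Global ratings of 3 and 5 are hypothetical examiner judgments.
    print("novice:", checklist_score(novice), global_rating_score(3))
    print("expert:", checklist_score(expert), global_rating_score(5))
```

The checklist rewards thoroughness item by item, whereas the global rating lets the examiner credit the efficiency and focus of the expert's interview.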

Other questions about scoring include whether SPs or clinicians (or both) should rate the interaction and whether other methods such as post-encounter probes (written or oral questions about the case) add value to the assessment. How scores for different cases should be combined and how standards should be set are questions still at issue for SP exams in general, and psychiatry will need to grapple with these as well.
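Two common ways of combining case scores into a pass-fail decision are a compensatory rule, in which strong cases can offset weak ones, and a conjunctive rule, in which a minimum number of individual cases must each be passed. The minimal sketch below contrasts them; the cut score, the required number of passed cases, and the examinee's scores are hypothetical, and the article does not endorse either rule.

```python
def compensatory_pass(case_scores: list[float], cut: float) -> bool:
    """Pass if the mean score across cases meets the cut; strong cases offset weak ones."""
    return sum(case_scores) / len(case_scores) >= cut


def conjunctive_pass(case_scores: list[float], cut: float, min_cases_passed: int) -> bool:
    """Pass only if at least min_cases_passed individual cases each meet the cut."""
    return sum(score >= cut for score in case_scores) >= min_cases_passed


if __name__ == "__main__":
    scores = [0.90, 0.85, 0.40, 0.80]  # hypothetical per-case scores on a 0..1 scale
    CUT = 0.60                          # hypothetical cut score
    print("compensatory:", compensatory_pass(scores, CUT))
    print("conjunctive (all cases):", conjunctive_pass(scores, CUT, len(scores)))
```

The same examinee passes under one rule and fails under the other, which is why the choice of combination rule is itself part of the standard-setting problem.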

Examiner training. SPs who rate high-stakes exams undergo rigorous training and quality assurance to ensure that their ratings are consistent and fair. What kind of training would be practical, effective, and palatable for clinicians serving as examiners? What kind of training—and what kind of rating method—would enable examiners to utilize their expert judgment while minimizing idiosyncratic ratings?

Station length. Decreasing the duration of an OSCE station, within reason, allows more stations to be included in the exam and results in improved validity and reliability. The appropriate length for a given OSCE station depends on the specific task to be accomplished or behavior to be assessed. Little if any research has been done to determine the minimum time needed to assess various tasks.

Impact on the curriculum. Medical schools that have established SP-based assessments find that students make more effort to see a variety of patients and to obtain feedback on their performance from residents and faculty. Given that "assessment drives the curriculum," what are the effects of conducting high-stakes SP-based exams on psychiatry residency programs and on resident and clinician learning?

CONCLUSIONS

Standardized patient–based assessment of performance offers the opportunity to significantly upgrade the validity, reliability, and fairness of high-stakes examinations of psychiatry residents and clinicians. However, SP assessments are costly and resource-intensive, and they should not be viewed as a panacea. They should be reserved for skills that can be assessed only through a doctor–patient interaction, not used for skills such as clinical reasoning that written or computer-based exams can assess as well or better (7). The ACGME website (www.acgme.org) provides a useful table suggesting the preferred assessment methods for its mandated competencies.

Hodges and colleagues' OSCE handbook in this issue provides an excellent resource for educators to use in designing and implementing SP-based exams and OSCEs. Collaborations among residency programs, across task forces from the Association for Academic Psychiatry and the American Association of Directors of Psychiatric Residency Training, and between residency programs and the ABPN can facilitate the creative resolution of the remaining research and development challenges.

SPs are already in use in high-stakes examinations. The examinations of the Educational Commission for Foreign Medical Graduates and the licensing examinations of the Medical Council of Canada include SP-based OSCEs, and the National Board of Medical Examiners is planning to institute an SP-based clinical practice exam for graduating medical students within the next few years. Psychiatry, too, can benefit from authentic, valid, and reliable assessments using standardized patients.

REFERENCES

  1. Harden RM, Gleeson FA: Assessment of clinical competence using an observed structured clinical examination. Med Educ 1979; 13:41-47
  2. LCME annual medical school questionnaire, 1999-2000. Data available by request from the Association of American Medical Colleges, www.aamc.org
  3. ACGME/ABMS Joint Initiative: ACGME competencies: suggested best methods for evaluation. Version 1.1, September 2000. Available from the Accreditation Council for Graduate Medical Education: www.acgme.org/Outcome/assess/ToolTable.pdf. Accessed 10/29/01
  4. Davis G, Schowalter J: Report of the Subcommittee on the Utilization of Standardized Patients. American Board of Psychiatry and Neurology internal document, January 2001
  5. Kane MT: The assessment of professional competence. Evaluation and the Health Professions 1992; 15:163-182
  6. Elstein AS, Shulman LS, Sprafka SA: Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA, Harvard University Press, 1978
  7. van der Vleuten CPM, Swanson DB: Assessment of clinical skills with standardized patients: state of the art. Teaching and Learning in Medicine 1990; 2:58-76
  8. Noel GL, Herbers JE, Caplow MP, et al: How well do internal medicine faculty members evaluate the clinical skills of residents? Ann Intern Med 1992; 117:757-765
  9. Kalet A, Earp JA, Kowlowitz V: How well do faculty evaluate the interviewing skills of medical students? J Gen Intern Med 1992; 7:499-505
  10. Guerin RO: Disadvantages to using the oral examination, in Assessing Clinical Reasoning: The Oral Examination and Alternative Methods. Edited by Bashook PG, Mancall EL. Evanston, IL, American Board of Medical Specialties, 1995, pp 41-48
  11. Colliver JA, Williams RG: Technical issues: test application. Acad Med 1993; 68:454-460
  12. Barrows HS: An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med 1993; 68:443-451
  13. Famuyiwa OO, Zachariah MP, Ilechukwu STC: The objective structured clinical exam in psychiatry. Med Educ 1991; 25:45-50
  14. Loschen EL: Using the objective structured clinical examination in a psychiatry residency. Academic Psychiatry 1993; 17:95-104
  15. Hodges B, Regehr G, Hanson M, et al: An objective structured clinical examination for evaluating psychiatry clerks. Acad Med 1997; 72:715-721
  16. Hodges B, Hanson M, McNaughton N, et al: What do psychiatry residents think of an objective structured clinical examination? Academic Psychiatry 1999; 23:198-204
  17. Coyle B, Miller M, McGowan KR: Using standardized patients to teach and learn psychotherapy. Acad Med 1998; 73:591-592
  18. Krahn LE, Sutor B, Bostwick JM: Conveying emotional realism: a challenge to using standardized patients. Acad Med 2001; 76:216-217
  19. Hodges B, Regehr G, Hanson M, et al: Validation of an objective structured clinical examination in psychiatry. Acad Med 1998; 73:910-912





American Psychiatric Publishing, Inc. American Association of Chairs of Departments of Psychiatry American Association of Directors of Psychiatric Residency Training Association of Directors of Medical Student Education in Psychiatry Association for Academic Psychiatry
1000 Wilson Boulevard, Suite 1825, Arlington, VA 22209-3901 * 800-368-5777 * appi at psych.org