In this issue, Whelan et al. (1) address the question of who (standardized patients or physicians) should grade the performance of students taking Objective Structured Clinical Examinations (OSCEs). This is an interesting issue that has educational as well as socioeconomic dimensions.
The last three decades of the 20th century were characterized by a significant shift in the way physician competence in Anglo-Saxon countries was defined and assessed. The adoption of performance-based frameworks, such as the influential “Miller’s Pyramid” (2), placed more emphasis on what physicians could do than on what they knew. During the same period, the influence of medical educators with training in psychometrics led to a much greater emphasis on standardization, reliability, and validity in assessment. Together, the adoption of performance and psychometric discourses created a fertile ground for new assessment technologies such as the OSCE. No longer a novelty at the end of the first decade of the 21st century, OSCEs have been widely implemented by health professions around the world, including psychiatry (3).
Mental health professionals need no convincing that one of the core competencies tested in an OSCE, or in any performance-based examination for that matter, is communication skills. However, whether communication skills are a unified construct is less clear. Our group (4) has reported that the appropriateness of specific communication skills (e.g., open-ended versus directed questioning) varies greatly according to the clinical problem encountered. For example, the often-taught communication style that gives priority to open-ended questions and listening is appropriate for a passive and withdrawn patient but entirely inadequate for an agitated manic patient. And although the first item of most communication scales is “makes eye contact,” we know that in some cultures direct eye contact is considered intrusive and uncomfortable. Therefore, to some degree, what is “appropriate” competence in a performance-based examination is a matter of perspective. So who is best positioned to assess the adequacy of student competence in a performance-based examination? Whelan et al. (1) follow the tradition of addressing this question from a psychometric perspective of “accuracy” (5–7); that is, the rater who is best able to reliably and consistently (psychometrically) score performances is considered the most appropriate examiner.
This raises the interesting issue of what it actually means to be an examiner. At one extreme, we have interviewed individuals who argued, “There are no evaluators in the room, there are merely observers” (8). The implication is that the markers are “merely identifying behaviors that individuals perform and [that] it is the responsibility of the test administrators to compile those records into evaluations and numbers.” This idea arises from the often-made, but seldom-explicated, distinction between “assessment” and “evaluation” (9).
Assessment is a process by which information is obtained relative to some known objective or goal. Assessment of skill attainment is rather straightforward. Either the skill exists at some acceptable level or it does not. Skills are readily demonstrable.
Inherent in the idea of evaluation is “value.” When we evaluate, what we are doing is engaging in some process that is designed to provide information that will help us make a judgment about a given situation. When we evaluate, we are saying that the process will yield information regarding the worthiness, appropriateness, goodness, validity, legality, etc., of something for which a reliable measurement or assessment has been made.
From this perspective, the “veracity” of the recording of a dispassionate and neutral observer is all that matters for “reliable assessment.” This view is congruent with a positivist conception that there is a reality/truth that can be captured so long as it is not colored by “measurement error” or “rater bias.” By contrast, we hold a constructivist view that whatever “reality” manifests in examinations (competence, empathy, etc.) is unstable across time and place and is created differently in minute-to-minute human interactions, varying not only according to who the doctor and the (standardized) patient are, but also who the raters are, what the setting is, what the expectations are, and in what culture the whole exercise is located (10). Thus the person grading is not engaged in a neutral act of dispassionate observation and simple scoring; rather, he or she is undertaking a complicated, culturally bound act of interpretation.
During OSCEs in which examiners are asked to complete global ratings or overall judgments of competence, this interpretive, evaluative function is explicitly embraced. However, during OSCEs in which the examiner is constrained by a checklist, there is a greater illusion of dispassionate objectivity. Yet it has been shown to be difficult and even invalid to use binary checklists to assess complex phenomena such as empathy, rapport, and problem-solving, which are subtle and creative, involve pattern recognition, and exist on a continuum rather than in an all-or-nothing state (11). These considerations render the issue of who is the best examiner much more complex. The response is not simply “the most accurate” but, rather, “those who can best evaluate the complex phenomena one wishes to capture.”
Who is best able to assess the complex elements of competence that manifest in the doctor-patient relationship? It is first necessary to think a little about what a standardized patient (SP) is actually doing. A large part of what SPs are doing in OSCEs arises as they reflect on what they experience during role portrayals. Sometimes called the “third eye,” this involves a complex emotional and cognitive process. Standardized patients are not simply “objective observers” but, rather, are interested parties. How they are made to feel as the patients in the clinical encounters is of central importance to their experiences of the interviewers. The articulation of these feelings is one of the most valuable aspects of working with SPs in a teaching context, particularly for highly affective or psychologically complex cases.
What is the interplay of this SP-as-patient versus SP-as-examiner perspective in the context of an examination? In our research, SPs report that they engage in three levels of reflection while portraying psychiatry roles. On the most superficial level, they reflect on the content of their characters’ history. Did the students ask the right questions in order to elicit their patients’ stories? On a deeper level, they report reflecting on their patients’ comprehension of what the interviewers were communicating and how this affects whether they give information to the students. For example, in one instance, an SP withheld information based on the perception that even though she, as the SP, understood the question, she did not think that the patient would have picked up on it. Finally, at the deepest level, SPs take a “protective” stance toward the patients they are portraying, deciding whether they will or will not reveal information to the students, depending on how they “feel” as the patients. In one instance, an SP reported that she withheld information because she did not “feel” that the student understood how difficult the requested information was for her, as the patient, to reveal. These levels of reflection are salient in understanding SPs’ perspectives on whether a student is competent, and could be expected to influence their evaluative judgments (12).
Clearly, the process of scoring performance in an OSCE is not entirely dispassionate or cognitive. Beyond the issue of perspective, the emotions elicited during the encounter also have an impact on the SP. Although SPs are engaged in observing the interviewer from inside the specific patient role as they interact (as described above), the clinical content recalled by the SPs is also affected by the emotional and psychological state generated by the patient roles they are playing. In psychiatry simulations, the information being portrayed is both affective and linguistic (13). In order to portray roles that are “authentic,” SPs are required to be genuine in their emotional engagement so as to evoke realistic responses in the interviewer. Emotional portrayals require them to reflect and call on personal information in such a way that both the patient and the SP are present emotionally. The ethical implications of such emotional work on the SPs themselves are becoming an important consideration, and in some roles when emotion is exquisitely felt (e.g., borderline personality disorder), one might expect that the in-the-moment patient emotional experience (e.g., perceived abandonment) could influence judgments the SPs make about the competence of the interviewer. Finally, while emotions arise from the role scripted for the case, emotions specific to the particular interview also arise in the room. Any examiner who has had to sit watching an unempathic or difficult patient interview will recognize that strong emotions can also be elicited in any observer.
Finally, at a sociocultural level, there are local, regional, and national constraints on what an OSCE actually is that are relevant to the question of who should serve as raters. This is most evident if one examines carefully the roles made possible within the OSCE “cultures” that have become strikingly different from one country to another. Thus general conclusions about who should examine in OSCEs must also be examined through a cultural filter. For example, economic priorities of efficiency and cost savings mean that in the United States, standardized patients are framed as being more readily available and less expensive than physician examiners. Thus SP examiners are the foundation of the whole United States Medical Licensing Examination system (Clinical Skills Assessment). By contrast, in Canada and the United Kingdom, there has been an effort to maintain the role of physician examiners in OSCEs. This framing rests on arguments about “professional judgment” of competence by peers, a tradition of “volunteerism” in medical education, and less priority given to psychometrics than to “authenticity” (8).
In summary, the article by Whelan et al. (1) focuses on the correlation of standardized patients’ scores with those of physicians and, in doing so, raises important questions relevant to the fairness of examinations and to the development of the competence of future physicians. As we have argued, however, it is important to look a little beyond the psychometric measurement issues in fully exploring who and what an examiner is or should be. We believe that this latter research should be informed at least as much by the educational and sociopolitical contexts in which examiners find themselves as by the psychometric measurement characteristics of the tools with which examiners are assigned to work.