Validity studies of the objective structured clinical examination (OSCE) in psychiatry are rare in the literature, despite the speculation by Hodges et al. that the validity of psychiatry OSCEs for evaluating the clinical skills of medical students may be inferior to that of OSCEs in other specialties, primarily because of limitations of measurement (i.e., content-specific checklists may not capture the nuances of a psychiatric interview), time (i.e., the time allowed to complete an OSCE station is often much shorter than the time required to conduct a psychiatric interview in the clinic), and complexity (i.e., complicated psychiatric presentations are difficult to simulate) (1). With respect to the OSCE method in general, there is the additional question of differential validity as a function of the type of measurement employed. Reznick et al. have suggested that the checklist approach, relative to a global process evaluation approach, may favor comprehensiveness over clinical judgment and restrict the domain of clinical complexity that is simulated in OSCE stations (2). Across various medical specialties, including psychiatry, the data have generally indicated that global ratings of clinical process (e.g., rapport, interpersonal sensitivity, respect, listening skills) are more reliable, generalizable, and valid (construct and concurrent validity) indicators of student performance than content-specific binary checklists (1, 3–5). However, with respect to concurrent validity, Reznick et al. did not find sufficient evidence in an OSCE study of medical licensure to recommend the process approach over the checklist approach (2). The work of Hodges et al. has also questioned the ability of OSCE checklists (relative to global process evaluations) to discriminate between more advanced clinicians (e.g., residents) and novice clinicians (e.g., medical students) (1, 6, 7).
As the literature demonstrates, questions regarding the validity of the OSCE and its methods of measurement remain relevant. With respect to construct validity, no study has evaluated checklist scores or global OSCE scores in psychiatry using an independent evaluation of clinical skills (although Regehr et al. [4] did compare performance across OSCEs in clinical medicine, psychiatry, and community/family medicine). Instead, construct validity has been evaluated by correlating checklist ratings or process ratings with a summary score (i.e., for the clerkship as a whole) or with an overall rating of performance made at the time of the OSCE (2, 5, 8). The recent development of the clinical skills examination (CSE) offers a unique opportunity to contribute to the literature on psychiatry OSCE validity using an independently developed and administered measure of general clinical skills. The CSE, developed by the National Board of Medical Examiners (NBME) and the Educational Commission for Foreign Medical Graduates (ECFMG) and administered through the United States Medical Licensing Examination (USMLE) program, is a required component of the USMLE Step 2 examination for all medical students. It has been pilot tested extensively in medical schools throughout the United States and is designed to evaluate whether medical students can gather information from patients, perform a physical examination, and communicate their findings to patients and colleagues while acting in an interpersonally skillful manner. As a participant in the pilot program, Saint Louis University, the site of the present research, had access to CSE data for one class of medical students.
Consistent with Anastasi's concepts regarding construct validity (9), we examined convergent and discriminant validity for a nine-station psychiatry OSCE in two successive classes of third-year medical students. The psychiatry OSCE contained both content-specific checklist scores (i.e., interview mechanics, which primarily reflected history taking; differential diagnosis; and observation skills) and global process ratings (i.e., clinical interpersonal skills) for evaluating performance. Using multiple regression analysis, we predicted the psychiatry OSCE scores from other indicators of clinical skills and psychiatry knowledge. Clinical skills examination scores were available for one class of students and included history taking, physical examination, interpersonal skills, physical examination method, and communication. Additionally, building on the cross-domain work of Regehr et al. (4), an OB/GYN OSCE score was available for both classes. To evaluate the correlation between clinical skills and general psychiatry knowledge, we also included NBME psychiatry subject examination scores for both classes. We expected the pattern of prediction to support the construct validity of the psychiatry OSCE for checklist scores as well as global process ratings.
The Saint Louis University Institutional Review Board provided expedited approval of this archival study.
Performance data from clinical rotations were aggregated for all third-year students in 1999–2000 (N=142) and 2000–2001 (N=144). The 1999–2000 class was 45.8% (N=65) women and 54.2% (N=77) men. The 2000–2001 class was 46.5% (N=67) women and 53.5% (N=77) men.
Psychiatry Objective Structured Clinical Examination
The psychiatry OSCE comprised nine stations. Five of the stations involved a 15-minute interview with a standardized patient. An external observer (i.e., a standardized patient other than the standardized patient being interviewed) scored interview mechanics, using binary checklists ranging from nine to 15 items (each item scored 0/1). The mechanics checklists were content-oriented and included items related primarily to history taking (e.g., substance abuse, depression, medication usage and side effects, medical history, family history, social history, abuse history, treatment history) and mental status examination skills, but also included items for clinical communication (i.e., data supplied by the student to the patient and student responses to patient questions about medication compliance/side effects, treatment course/options, patient financial resources) and prescription writing. Following the interview, the standardized patient completed a six-item patient perception questionnaire (10). The patient perception questionnaire is process-oriented and measures subjective standardized patient perception of the student's clinical interpersonal skills with regard to greeting, listening, interest, respect, language use, and allowance for questions. Each component was evaluated on a global 5-point Likert-type rating scale (1=poor; 2=fair; 3=good; 4=very good; 5=excellent). Two of the stations were writing stations where students were required to report clinically relevant observations about a videotaped patient (eight and 17 content-oriented checklist items, respectively, scored 0/1) and make a differential diagnosis (12 and 14 content-oriented checklist items, respectively, scored 0/1).
For one station, students viewed a videotaped patient and then presented a summary of the patient's medical history, psychiatric history, social history, and mental status to an attending physician, who completed checklists for presentation content and technique (the latter reflecting the organization, specificity, etc., of the information presented); 22 items and five items, respectively, were included in the mechanics score. The final station was a differential diagnosis writing station (13 checklist items). All checklist data and patient perception questionnaire data were aggregated and scaled to T scores (mean=50, SD=10). Students received separate scores for mechanics, patient perception questionnaire, differential diagnosis, and observation, in addition to a total OSCE score. Internal consistency reliability (Cronbach's coefficient alpha) was 0.71 for mechanics, 0.85 for patient perception questionnaire, 0.73 for differential diagnosis, 0.67 for observation, and 0.88 for the total OSCE. No changes were made to the psychiatry OSCE between 1999–2000 and 2000–2001.
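As a concrete illustration of the two computations named above, the sketch below shows how raw score totals can be rescaled to T scores (mean=50, SD=10) and how Cronbach's coefficient alpha is obtained from an examinee-by-item score matrix. This is a minimal NumPy sketch of the standard formulas under our own naming; it is not the scoring code used in the study.

```python
import numpy as np

def t_scale(raw):
    """Rescale raw scores to T scores: z-score the values, then
    re-express them on a scale with mean 50 and SD 10."""
    raw = np.asarray(raw, dtype=float)
    z = (raw - raw.mean()) / raw.std(ddof=0)
    return 50 + 10 * z

def cronbach_alpha(items):
    """Cronbach's coefficient alpha for an (examinees x items) matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

When every item ranks examinees identically, alpha reaches its maximum of 1.0; uncorrelated items drive it toward zero, which is why the checklist and questionnaire alphas reported above index internal consistency.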
Other Performance Measures
The National Board of Medical Examiners psychiatry subject examination (NBME PSE) was administered at the end of each rotation in 1999–2000 and 2000–2001. The student's national percentile rank was used as the measure of performance; the reliability and validity of the NBME subject tests have been established (11).
The OB/GYN OSCE comprised five stations, four of which used standardized patients. Two of the standardized patient stations involved taking a history and performing a focused physical examination. The two remaining standardized patient stations were history-only stations. The final station required students to derive differential diagnoses and an initial diagnostic plan for one of the history-only standardized patients. A communication challenge was incorporated into three of the four standardized patient stations. Students were expected to recognize the standardized patient's concern and give an adequate response. Interview mechanics was scored using checklists (range of eight to 14 items). The patient perception questionnaire and response to the communication challenge (two to four checklist items) were used to evaluate clinical process skills. The differential diagnosis and diagnostic plan station required four acceptable diagnoses and four initial diagnostic tests. Scores from the five individual stations were averaged to arrive at a total OSCE score. Internal consistency reliability of the total OB/GYN OSCE was 0.54. No changes were made to the OB/GYN OSCE between 1999–2000 and 2000–2001.
The CSE was administered to all students in the 1999–2000 rotations. In that same year, Saint Louis University participated in a pilot phase of the CSE development program. Five cases examined history taking and communication skills, five cases examined history taking and physical examination skills, and all cases used the patient perception questionnaire to evaluate clinical interpersonal skills. The USMLE reports internal consistency reliabilities for the CSE ranging from 0.77 to 0.82 (12). Equated, standardized scores were generated for history taking (i.e., questions asked by the student about the chief complaint, history of the present illness, past medical history, family history, review of systems); physical examination (i.e., examination of the patient that includes inspection, auscultation, percussion, palpation, and other specific maneuvers); communication (i.e., information relayed to the patient, spontaneously or in response to patient questions); clinical interpersonal skills, as measured by the patient perception questionnaire; and physical examination method (i.e., hand washing, draping, explaining the physical exam, considerations of patient comfort, performing the exam on the patient's skin).
In both 1999 and 2000, one-half of the students completed the clinical rotation in psychiatry immediately prior to the OB/GYN rotation, while the other half completed the OB/GYN rotation immediately prior to the psychiatry rotation. Each rotation was scheduled for 6 weeks. In psychiatry, performance on the OSCE accounted for 30% of the student's final grade, while in OB/GYN it accounted for 20% of the final grade. The CSE was completed at the end of the academic year when all students had completed both rotations and both OSCEs.
Separate multiple regression analyses were conducted to predict the psychiatry OSCE scores (total, mechanics, patient perception questionnaire, differential diagnosis, observation) from the other performance measures in both the 1999–2000 and 2000–2001 classes. For the 1999–2000 class, predictors included the OB/GYN OSCE, the NBME PSE, and the five CSE scores. For the 2000–2001 class, the predictors were the OB/GYN OSCE and the NBME PSE. A standard regression approach was used: all predictors were entered into the equation simultaneously and then evaluated for significance. This approach requires an individual predictor to explain variance in the criterion (any of the five psychiatry OSCE scores) that is not explained by the other variables in the model; in other words, each predictor is evaluated on its ability to explain a significant proportion of unique variance in the criterion. A t statistic was used to determine whether the standardized regression coefficient (the beta weight, symbolized as β, which is the raw regression coefficient [B] expressed in z-score form) was significantly different from zero. A significant beta weight indicates the amount of change in the criterion (expressed in standard deviation, or z-score, units) associated with a one standard deviation change in the predictor. The significance of the regression equation as a whole was evaluated by the total proportion of variance in the criterion explained by the full set of predictors, expressed as R2. Additionally, the normality, homoscedasticity (equal variance), and linearity of the residuals for each regression equation were evaluated.
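The simultaneous-entry logic described above can be sketched numerically: when both the criterion and the predictors are expressed in z-score form, the fitted ordinary least squares coefficients are the standardized beta weights, and R2 is the proportion of criterion variance explained by the full predictor set. The following is a minimal NumPy sketch of that relationship (the function name is ours; this is not the analysis code used in the study, which would also include the t tests and residual diagnostics).

```python
import numpy as np

def standardized_betas(X, y):
    """Simultaneous-entry OLS on z-scored variables.

    X: (n examinees x p predictors) matrix; y: criterion vector.
    Because all variables are standardized first, the least squares
    coefficients are the beta weights, and R^2 is the proportion of
    criterion variance explained by the full predictor set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    yhat = Xz @ betas
    r2 = 1 - ((yz - yhat) ** 2).sum() / (yz ** 2).sum()
    return betas, r2
```

Because each beta weight is estimated with the other predictors in the model, it reflects only the unique contribution of that predictor, which is the sense of "unique variance" used in the text.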
As shown in Table 1, the OB/GYN OSCE, NBME PSE, CSE history taking, and CSE physical examination scores all predicted unique variance in the total psychiatry OSCE for the 1999–2000 class. For example, a one standard deviation change in OB/GYN OSCE performance was associated with a 0.36 (or about one-third) standard deviation change in psychiatry OSCE performance, with all other variables in the model controlled. Together, the predictors explained 35% of the variance in the total psychiatry OSCE score. The mechanics score was predicted by CSE history taking, the OB/GYN OSCE, and the NBME PSE. The psychiatry patient perception questionnaire score was predicted by the OB/GYN OSCE and the CSE patient perception questionnaire. Differential diagnosis was predicted by the OB/GYN OSCE and the NBME PSE. Observation was predicted by the CSE physical examination score. There was no evidence of non-normality, heteroscedasticity (unequal variance), or nonlinearity in the residuals for these equations.
As shown in Table 2, for the 2000–2001 class, both the OB/GYN OSCE and NBME PSE scores predicted unique variance in the total psychiatry OSCE, mechanics, and differential diagnosis scores. The psychiatry patient perception questionnaire and observation scores were predicted by the OB/GYN OSCE. Again, there was no evidence of non-normality, heteroscedasticity, or nonlinearity in the residuals for the regression equations.
The psychiatry OSCE examined in this study is a combination of binary checklists for evaluating clinical skills and numeric rating scales for assessing standardized patient perception of interpersonal skills. The pattern of relationships with three other indicators of performance showed evidence of adequate construct validity, similar to that found recently for a checklist-scored OSCE in surgery (13).
To demonstrate construct validity, a given measure must be shown to correlate with other variables with which it should theoretically correlate while at the same time shown not to correlate with variables from which it should theoretically differ (9). If at all possible, these relationships should be demonstrated across different methods of assessing the construct in question. In addition, the magnitude of the positive relationships should be moderate (not too high, which would indicate redundancy, and not too low, which would indicate lack of validity and presence of error variance) (9).
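The convergent/discriminant logic just described can be illustrated with a small simulation: two measures that tap the same underlying skill should correlate moderately, while a measure of a theoretically distinct construct should correlate near zero. All variable names below are invented for the sketch; this is purely illustrative and uses no data from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# A latent clinical skill shared by two measures of the same construct.
skill = rng.normal(size=n)
osce_a = skill + rng.normal(scale=0.8, size=n)   # first measure of the construct
osce_b = skill + rng.normal(scale=0.8, size=n)   # second measure, same construct
knowledge = rng.normal(size=n)                   # theoretically distinct construct

# Convergent correlation: moderate (shared latent skill plus measurement error).
r_convergent = np.corrcoef(osce_a, osce_b)[0, 1]
# Discriminant correlation: near zero (no shared construct).
r_discriminant = np.corrcoef(osce_a, knowledge)[0, 1]
```

The moderate (rather than near-perfect) convergent correlation mirrors the point above: measurement error keeps validity coefficients below 1.0, and a correlation that is too high would suggest the two measures are redundant.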
In support of the construct validity of the psychiatry OSCE evaluated here, the mechanics score (i.e., history taking and mental status examination skills) was predicted by the CSE history taking score; the OB/GYN OSCE score, which primarily represented history taking, interpersonal, and differential diagnosis skills; and, to a lesser extent, general psychiatry knowledge. The correlation with the OB/GYN OSCE was obtained despite the relatively low reliability of the OB/GYN OSCE and the fact that the content domain was entirely different between the psychiatry and OB/GYN OSCEs. The latter finding indicates that, to some degree, the psychiatry mechanics score assessed general history taking skills and not content-specific knowledge. Equally important, the mechanics score was not predicted by CSE scores that measure other domains, such as physical examination skills, interpersonal skills, or communication.
With respect to interpersonal skills, the psychiatry OSCE patient perception questionnaire score was significantly predicted by the OB/GYN OSCE score, which included an interpersonal skills component, and by the CSE interpersonal skills score, but not by psychiatry knowledge or other specific CSE indicators. The lack of correlation of the CSE communication score with the psychiatry patient perception questionnaire score requires explanation. The CSE communication score is concerned with the communication of relevant clinical information to the patient, including thorough responses to patients' questions concerning matters such as side effects and treatment course. Although the psychiatry OSCE contained a subset of items related to this type of communication, they were subsumed in the mechanics score, which was primarily a measure of history taking and mental status evaluation. As a result, there was no psychiatry OSCE score that paralleled the CSE communication score.
Differential diagnosis in psychiatry was predicted by the OB/GYN OSCE and general psychiatry knowledge. The OB/GYN OSCE comprised five stations, one of which was devoted to differential diagnosis (i.e., about one-fifth of the OB/GYN content was related to this skill). The correlation between the psychiatry OSCE differential diagnosis score and the total score for the OB/GYN OSCE is therefore indicative of construct validity. Since differential diagnosis on the psychiatry OSCE was confined to writing stations that required knowledge of specific diagnostic criteria, the relationship with the NBME PSE was expected. The fact that the differential diagnosis score was not predicted by CSE indicators of history taking, interpersonal skills, communication, and physical examination is further evidence that this score measured a unique construct within the psychiatry OSCE.
The observation score of the psychiatry OSCE was predicted primarily by the CSE physical examination score. The interpretation of this relationship is not straightforward. We speculate, however, that both scores may reflect an underlying construct such as "clinical judgment/discernment skills." The psychiatry OSCE observation score reflected the student's ability to identify critical, clinically relevant information from a nonspecific, multifaceted patient presentation. In other words, it reflected astute clinical judgment regarding what was important for interpreting the patient's presentation. The physical examination score of the CSE reflected a student's ability to discern critical information from the patient's physical presentation of nonspecific data, including appearance, heart sounds, lung sounds, and muscle rigidity. In light of these similarities, the correlation between the two scores becomes more interpretable.
The current study supports the construct validity of both content-specific checklists and global process ratings. Consistent with much previous research, the reliability of the global process score (patient perception questionnaire) for the psychiatry OSCE was higher than that of the checklists. In turn, the strongest evidence for construct validity was found for the process ratings (the amount of variance explained by the predictors was highest for the psychiatry patient perception questionnaire, at 28%). Nevertheless, the reliability of checklist scores was consistent with other studies (1–5, 8) and probably adequate for the purposes of clerkship evaluation (3). More relevant to this study, the pattern and magnitude of the relationships between the other indicators of performance and the psychiatry OSCE checklist scores demonstrated an acceptable level of construct validity that was comparable to that found for the global process rating.
In summary, the current study supports the construct validity of a psychiatry OSCE that incorporates both checklist scores and global ratings in the evaluation of psychiatry clerks. Its primary strength lies in its use of multiple measures, including an independently derived and administered test of clinical skills that is a requirement for all medical students. Future studies that examine other aspects of psychiatry OSCE validity, including criterion-related predictive validity studies and research to improve the ability of OSCEs to discriminate levels of clinical competence, would be useful in the continued evaluation of this mode of assessment.