Each year thousands of U.S. third-year medical students enter and complete a psychiatry clerkship. Students are currently assessed through a combination of methods (e.g., a final written examination, supervised interviews, oral examinations, case presentations, and written clinical assessments). Accurate assessments are critically important in validating that medical students are sufficiently trained and in identifying marginal students and arranging their remediation. Medical student evaluations also affect graduation and influence residency program admissions. Strong evaluation tools allow students to receive meaningful feedback on their development as clinicians and guide clerkship directors in maintaining an effective core psychiatry training experience. Nationwide, clinical evaluations of third-year psychiatry clerks are the most frequently used and most heavily weighted component in the overall assessment process of students (1). The results of the clinical evaluation are summarized in the clinical grade form. The design of this vital document varies considerably among schools. Despite clerkship directors’ decades of experience with the clinical assessment tool used in psychiatry rotations, few data on this topic are available in the medical education literature. Relevant contributions to the literature are presented below.
The purpose of this survey was to acquire a snapshot of the clinical assessment tool as it is used today by psychiatry clerkship directors. Consequently, we identify formats, virtues, deficiencies, utilization and areas needing development in currently used tools. Such a review allows individual programs to further optimize their clinical evaluation tool and the larger assessment process to which it contributes.
We constructed a 26-item questionnaire de novo for psychiatry clerkship directors. Questions were designed to address the issues suspected to be most relevant to clinical grading in psychiatry clerkships, based on the authors’ collective experience. The Institutional Review Board (IRB) determined that this study met all criteria for an educational instructional exemption from full IRB review. The survey was deployed at the Association of Directors of Medical Student Education in Psychiatry (ADMSEP) national conference in the summer of 2003. From this first dissemination we received 25 completed surveys. To avoid redundant responses from any single clerkship, respondents at the conference were identified by their institution. The 104 U.S. medical schools lacking a completed survey (from the conference) were then identified, and the psychiatry clerkship director at each respective school was mailed a survey. From the mailings, we received an additional 60 completed surveys. The overall return rate was 66% (85 responders out of 129 U.S. and U.S. territory medical schools). Data from the two survey distributions were pooled and analyzed; no incentive was offered for completing the survey.
In this study, the term “clinical grade” is a subcomponent grade and is not to be confused with the term “overall grade” for the rotation, which includes several grade subcomponents. Our study addresses the clinical evaluation only and does not consider other clerkship grade subcomponents such as final examinations, oral examinations or other types of assessment.
Select survey question results are displayed in the tables below. All results are expressed as percentages of the sample.
The survey respondents were largely clerkship directors (87.2%). The remainder (12.8%) were ADMSEP members familiar with the psychiatry clerkship grading practice in their institution (only one response completed per institution). The respondents were a diverse mixture of new and more experienced clerkship directors: just over one-half (51.1%) had served in their position for 5 years or less, and 48.8% for 6 years or more. Our data are less specific but parallel the findings of Sierles and Magrane, wherein the average clerkship director had been in the position for 5.9 years (2).
Most respondents (89.3%) used a form combining “Preset criteria/Checkbox” elements and “narrative” elements. A purely narrative assessment form was not found anywhere in the respondent pool, perhaps because this type of form yields results that vary in focus, are less objective, require more time to complete, and are not easily assimilated into final grade calculations. Only 10.7% of respondents had a purely “Preset criteria/Checkbox” based form. It therefore seems clear that most clerkship directors value narrative comments enough to include them on the grade form. This issue is explored in more depth later.
Input into a student’s clinical grade was provided by the attending 100% of the time, a reassuring finding, since it is hard to envision a sound clinical student assessment that did not include attending input. It was surprising and encouraging to note that the clerkship director had adequate clinical experience with students and the necessary administrative time to provide input into the clinical assessment in 72.1% of the institutions surveyed. Residents in psychiatry were involved in this grade 81% of the time. Since residents spend an extraordinary amount of “face time” with third-year medical students, we see this as a very positive finding. Resident input enriches the feedback that students receive. It was rare (2.3%) that students rated each other on the clinical assessment form.
Regarding the types of grading systems used on the clinical evaluation, nearly half (47.0%) stated they had an honors/pass/fail system. It would be interesting to know how this finding compares to other clerkship disciplines. The other clinical grading systems used were as follows: 18% said they had a numeric scale, and the remaining 35% of the sample reported, in descending order: percentage score (12.5%), other (8.4%), letter grade (8.4%), pass/fail (4.8%), and comments only (1.2%). It would be informative to know what the “other” category actually describes. That is, are new and/or innovative grading systems being tested, and do they provide additional value over the more commonly used existing systems?
Most clinical evaluation forms had been recently created or revised, with 57.8% in use only 1–4 years. This suggests an ongoing process of refining the clerkship clinical assessment. It appears that recent local work has resulted in the creation of new or modified assessment forms in many clerkships. Hopefully this represents positive evolution of assessment tools in pursuit of greater accuracy, consistency, and validity. In some cases new forms may represent responses to sweeping institutional mandates. In this context, the resulting generic clerkship grading forms may be ill-suited to the needs of a psychiatry rotation. Satisfaction levels of faculty who use school-mandated grading forms remain unknown.
Nearly two-thirds (65.2%) of respondents assign a weight of 50%–70% of the overall grade to their clinical assessment. This figure is entirely consistent with the findings of Carlson et al. (1), wherein the clinical evaluations of third-year psychiatry clerks were the most frequently used and heavily weighted component in overall student assessment. It stands to reason that the clinical evaluation is so heavily weighted because it involves direct observation of student skills. The validity and importance of direct observation in medical student training are well documented (3–6). Interestingly, in a previous Association of American Medical Colleges review, clerkships across all disciplines were found to have an identical weighting (50%–70%) of the clinical grade (7).
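As a concrete illustration of the weighting scheme described above, the sketch below computes a hypothetical overall clerkship grade in which the clinical evaluation carries 60% of the weight (within the reported 50%–70% range). The subcomponent names, scores, and weights are invented for illustration only and are not drawn from any surveyed school.

```python
# Hypothetical sketch: combining clerkship grade subcomponents into an
# overall grade via fractional weights. All names and values are invented.

def overall_grade(components: dict[str, float], weights: dict[str, float]) -> float:
    """Combine subcomponent scores (0-100) using fractional weights that sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(components[name] * weights[name] for name in weights)

# Clinical evaluation weighted at 60%, consistent with the 50%-70% range
# reported by most respondents; remaining weight split across other parts.
scores = {"clinical": 88.0, "written_exam": 75.0, "oral_exam": 80.0}
weights = {"clinical": 0.60, "written_exam": 0.25, "oral_exam": 0.15}

grade = overall_grade(scores, weights)  # 88*0.60 + 75*0.25 + 80*0.15 = 83.55
```

Because the clinical component dominates the weighting, even a modest shift in the clinical score moves the overall grade more than a large shift in any other subcomponent, which is why the accuracy of this single evaluation matters so much.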
A majority of respondents (88.1%, combining those who agree and strongly agree) indicated that grade inflation was a problem in the clinical assessment. The largest group of respondents (59.5%) stated that 20%–30% of students received the highest possible clinical grade in their clerkships. Many respondents thus appear to feel that 20%–30% of students receiving clinical honors represents grade inflation. Yet more than one-third (33.7%) indicated that 40% or more of their students receive clinical honors. It seems likely that this latter subset of respondents would be especially likely to indicate that grade inflation was a significant problem with their clinical assessment.
When narrative responses were included on the form, most respondents (64%) felt they were moderately or highly useful for student feedback. Clerkship directors diverged significantly on whether narrative comments were useful in determining clinical grades: 50% stated they were minimally useful (or not at all useful), and 50% stated they were moderately (or highly) useful. In addition, 36.5% of respondents felt that narrative comments were “highly useful” for the Dean’s letter, an important document that tends to allocate “more space for clinical evaluations than basic science evaluations” (8) and is heavily utilized by residency training directors and others.
The data from the clinical evaluation were felt to be “very useful” by respondents in measuring the following abstract constructs, in descending order: “attitude” (34.6%), “professional behavior” (32.9%), “interpersonal skills” (32.5%), “communication skills” (26%), “clinical skills” (25.6%), and “clinical knowledge” (24.7%). However, for every listed parameter the peak response fell in the “moderately useful” range (48.1%–60.3%). The question arises as to why respondents did not judge comments concerning these indispensable attributes as “very useful” more often. Part of the reason may be that the survey did not clearly define these terms, which may have created some inconsistency in respondents’ understanding of the questions. In any event, clerkship directors must avoid this type of error by explicitly defining these abstract constructs for students and faculty. Each such construct should be defined by constituent competencies and examples of specific desirable and undesirable behaviors. For example, communication skills include many constituent skills, such as listening well, presenting clinical data in an organized manner, and emphasizing critical points. A desirable behavior would be checking to make sure a patient understands communicated instructions; an undesirable behavior would be maintaining minimal eye contact while presenting information. Giving specific meaning to these constructs would improve both teaching and learning, as both parties would have a sharper focus on the important goals of the rotation.
Regarding the question “Do most clinical faculty members discriminate appropriately regarding the students they evaluate?” one-half of the respondents (50%) felt that they “frequently” did. However, a surprising number (41.7%) felt that they did so only “occasionally.” This low confidence in discrimination suggests that a superb student might be overlooked, an adequate student might receive honors or its equivalent (grade inflation), and, worst of all, a marginally competent student might receive a satisfactory grade and miss remediation. Other consequences of poor performance measurement include inaccurate student feedback, lowered student morale due to inconsistencies in assessment, transcript grades that lack validity, and compromise of the resident selection process.
The problem of poor discrimination is likely multifactorial in its causation. For example, the clinical grading forms may be poorly designed or supervising attendings may be less than fully candid when completing the grading form. Faculty may have difficulty giving candid, constructive feedback or may wish to avoid any confrontations with students disappointed with their grades. These factors are suggested by the previously mentioned grade inflation, found to be a problem by 88.1% of survey respondents (51.2% agree, 36.9% strongly agree). Exploring all of the reasons for grade inflation exceeds the scope of this article. However, this problem is not unique to psychiatry clerkships as grade inflation has also been found problematic in other clerkship specialties (9, 10).
Another identifiable source of poor discrimination is the use of a single generic, institution-designed form mandated for all third-year medical school clerkships (reported by 53.0% of respondents). If the grade form does not measure what psychiatry supervisors think should be measured, then obvious misapplication of the tool may occur. It is a potentially challenging but important task to change the school-mandated form in cases where it does not suitably “fit” the field of psychiatry.
If a respondent answered question 12 (regarding faculty discriminating appropriately between students) as either “not at all” or “occasionally,” they were asked to specify “What could change for the better?” Respondents noted that “faculty training” in use of the clinical assessment tool would be the most helpful (44.7%) in correcting this shortfall. Perhaps not surprisingly, only 7.9% believed that “a new form” would solve the problem. Only 13.1% believed a “different scale” would help, and the more pessimistic respondents believed “nothing will help” (18.4%). Faculty evaluating students clearly need further education by local clerkship directors and national medical education leaders on the importance and skills of giving accurate feedback and interpreting evaluation tools. Consistency, clarity, and specificity in performance descriptors and constructs would facilitate training in the use of evaluation tools and would be a worthy undertaking for national clerkship directors’ meetings.
Clerkship directors and faculty (by proxy, i.e., the clerkship directors’ supposition of the faculty’s impressions) were asked what they like most about their grading form. Admittedly, asking clerkship directors to answer a question on behalf of their attendings (without directly asking the attendings) introduces potential for error. Nonetheless, both sets of responses included “easy to understand” as the favorite feature of a grade form (34.0% and 41.5%, respectively). Clerkship directors may wish to keep this in mind when revising their grade forms. “Provides useful data” was the second most frequent response (19.9%) for clerkship directors, while clerkship directors assumed that faculty prefer short evaluation forms (20.8%, the second most frequently picked response for that set).
When asked for a vision of a perfect clinical assessment tool (i.e., “What three things would you most want to evaluate with a clinical evaluation form?” [regardless of how well these are accomplished with your present form]), the respondents placed “clinical skills” (29.7%) at the top of the list, followed closely by “professional behavior” (24.5%) and “clinical knowledge” (17.3%). Less frequently mentioned items included “interpersonal skills” (10.8%), “communication skills” (9.2%), “attitude” (8.0%), and “other” (0.4%). Yet the actual in-use forms were felt to be “very useful” in measuring “interpersonal skills” (32.5%) and “communication skills” (26.0%), both more highly rated than “clinical knowledge” (24.7%) or “clinical skills” (25.6%). It appears, then, that the performance parameters valued most highly are the ones current tools measure least effectively.
Our survey attempts to portray a cross-sectional view of the contemporary psychiatry clinical assessment tool. Our work supports earlier findings that the clinical assessment is the most heavily weighted component in the overall assessment of students, as is the case in nonpsychiatric third-year clerkships. We find that most psychiatry clerkships use relatively recently developed clinical grade forms that combine preset criteria and narrative components; no responding clerkship has a purely narrative form. Narrative comments are most frequently seen as moderately useful, many respondents find them useful in measuring abstract constructs such as attitude, professional behavior, and interpersonal skills, and many consider them “highly useful” for inclusion in the Dean’s letter. We find that attending input was included in all clerkship assessments and that resident input was quite high. We note that the majority of clerkships have an honors/pass/fail grading structure.
The respondents indicate that grade inflation is considered a significant problem with the clinical assessment, as has previously been reported in nonpsychiatric clerkships. Similarly, attendings are seen as suboptimal in discriminating between students’ performances. The reasons for this are manifold, but the most commonly cited solution is faculty education regarding the clerkship director’s expectations. We note that the favorite feature of grade forms is that they are “easy to understand.” The most desired measurement capabilities of the clinical assessment are clinical skills, professional behavior, and clinical knowledge. Given the importance of the clinical assessment in psychiatry and the low confidence in discrimination that we found, it is imperative that faculty be better educated regarding the value and technique of providing accurate feedback to trainees.
Consensus regarding important clinical performance constructs and descriptors would also be helpful. National leadership on these issues is appropriate given the salience of clinical evaluations in the training of competent and effective physicians.