Pharmacotherapy is a central, if not primary, competency for the psychiatrist. National trends suggest that office-based psychiatrists are providing less psychotherapy and more medication management (1). Suboptimal performance in pharmacotherapy can result in decreased clinical response and even substantial harm to patients (2). To our knowledge, no tool that assesses performance in psychopharmacotherapy has been formally developed and validated.
Direct observation has the advantage of assessing performance in vivo (i.e., what trainees do in their professional practice as opposed to controlled representations of professional practice). As such, it has emerged as a primary method for assessing clinical competency. Direct observation tools share the common properties of structured observation, specific and timely feedback based on observation, and documentation of the learners' performance immediately after observation (3). Many incorporate checklists, while others employ global rating scales. Checklists offer better reliability than traditional, summative global assessment tools and can assess a broad range of clinical competencies (4–7). Similarly, assessment based on direct observation produces a more valid measure of clinical competence than assessment based solely on the case presentation (8).
We developed a novel direct observation instrument to assess trainee performance of a medication management session and to enhance feedback. Goals for the instrument included evidence of content validity, feasibility to implement and sustain in a training clinic, acceptability to users, and utility to trainees and educators (9). In this article, we describe the development and initial implementation of the tool, named the Psychopharmacotherapy-Structured Clinical Observation (P-SCO) tool, and report data on its feasibility and utility.
Development of the Psychopharmacotherapy-Structured Clinical Observation Instrument
Checklists are useful to assess any competency that can be broken down into specific behaviors or actions; they can also improve the detection of errors (10, 11). We developed a checklist for pharmacotherapy by defining the essential tasks of the typical medication management appointment. We developed an initial list of tasks by consulting various expert sources, including two standard psychopharmacology textbooks (12, 13), pharmacotherapy-related core competencies identified by the American Board of Psychiatry and Neurology (14) and the ACGME's Psychiatry Residency Review Committee (15), and medication management protocols from two major research studies (16–18). Finally, we identified and solicited the opinions of local experts in pharmacotherapy. Feedback was incorporated, and some items were modified, added, or deleted.
Several design features were added to enhance the formative feedback generated by the P-SCO and to capture aspects of clinical care not covered by the checklist. We initially adopted a done/not done checklist scale, but the scale was expanded to a 5-option scale after feedback during the faculty training event indicated the need to assess the quality of what was done and to differentiate between "not done" (when it should have been) and "not applicable" (when it was not necessary). In addition, a section was designated to document reinforcing and corrective feedback. The implementation required that faculty provide the trainee with feedback both in writing (the completed P-SCO itself) and verbally in person immediately following the observation.
We designed the instrument to be used for either formative or summative feedback (i.e., to promote trainee skill acquisition via effective feedback and/or to determine and document competency in pharmacotherapy). In addition, the instrument was created for either "brief" observation, during which only a portion of the patient encounter is observed, or "long" observation, in which the entire patient encounter is observed. See Figure 1 for the P-SCO.
Implementation of the P-SCO
The pilot testing of the P-SCO occurred in four outpatient medication management clinics of a university-based hospital with trainees who were third-year residents. Trainees each spend 12 months in one of the clinics. Each clinic lasts 3.5 hours and includes a 30-minute case conference at the beginning and end, surrounding five 30-minute appointment slots. The four clinics serve a total of 500 patients with a broad range of diagnoses and source of payment. Prior to this intervention, resident performance was assessed at the mid-point and end-of-rotation by conventional, online global assessments. Similar to the P-SCO, these assessments have a section for reinforcing and corrective feedback.
Faculty development consisted of a 30-minute session during which faculty used the P-SCO to assess a video-recorded resident-patient encounter followed by a discussion facilitated by the first author. All seven attendings in the four medication management clinics agreed to participate. Five faculty members received the full training while two received an abbreviated version that did not include simulation with the video-recorded encounter.
Educational Intervention
During the weekly preclinic case conference, faculty decided which trainees would be observed. After the observation, faculty were instructed to complete the P-SCO form, recording at least one specific reinforcing and one specific corrective comment; provide the feedback verbally to the resident as soon after the visit as possible; and return the completed P-SCO to the clinic director who made a copy for administration purposes and then gave the original form to the trainee.
Because clinical performance is specific to the clinical content of the case, trainee performance with one patient is not highly correlated with performance with another and, therefore, inadequate sampling compromises accuracy (19, 20). Thus, the reliability and validity of direct observation assessments improve as the number of observations increases, with most studies suggesting that four to seven observations are necessary to obtain sufficient reliability for formative purposes and eight to 14 for high-stakes determinations (21–23). We set a target for each resident to be observed eight times over the course of the academic year. This required each resident to receive 0.67 P-SCO observations per month and each faculty member to perform 1.3 observations per month. This frequency was thought to be high enough to provide sufficient reliability, potent enough to affect learning, and practical to sustain. These rates became the benchmark for the feasibility of the intervention.
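The arithmetic behind these target rates can be sketched as follows. This is an illustrative calculation only: the resident rate follows directly from eight observations over a 12-month year, whereas the faculty rate depends on the number of residents, which is not stated in the text. The value of 14 residents below is a hypothetical assumption, as are the function names.

```python
def monthly_target(observations_per_resident: int, months: int) -> float:
    """Observations each resident must receive per month to reach the yearly target."""
    return observations_per_resident / months


def faculty_monthly_load(observations_per_resident: int, n_residents: int,
                         n_faculty: int, months: int) -> float:
    """Observations each faculty member must perform per month,
    assuming the workload is shared evenly across faculty."""
    total_observations = observations_per_resident * n_residents
    return total_observations / (n_faculty * months)


# Eight observations per resident over a 12-month academic year (from the text).
resident_rate = monthly_target(8, 12)

# Seven faculty members are reported in the text; n_residents=14 is a
# hypothetical value used only for illustration.
faculty_rate = faculty_monthly_load(8, n_residents=14, n_faculty=7, months=12)

print(round(resident_rate, 2), round(faculty_rate, 2))
```

Under a different resident count, the faculty rate would change proportionally while the resident rate would stay fixed.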
The study had three principal outcomes: the development of the P-SCO instrument itself, its feasibility, and its utility. We defined feasibility as the capacity to implement the P-SCO as intended. In particular, we measured the extent to which the completion rate of the P-SCOs met or exceeded the target of 1.3 per faculty per month and 0.67 per resident per month. We also measured the degree of completion of each form. We defined utility as providing useful feedback on pharmacotherapy competence to the resident. In particular, we compared the qualitative comments produced by the P-SCO and the conventional global assessment, including the number per completed form per resident, the specificity of the comments, and the proportion of comments that were reinforcing or corrective. We also compared both instruments in terms of the spread of ratings generated by each instrument.
A log recorded the date and resident-faculty dyad for each P-SCO performed. A copy of the written feedback given to the resident was retained. Copies of the conventional midpoint and end of rotation global assessments completed by the same faculty on the same trainees were obtained. Each completed form was de-identified. At least two of the authors (of JQY, SL, LT) independently analyzed each completed P-SCO and global assessment by employing a qualitative theme analysis method. The unit of analysis for coding was a discrete comment. We classified each comment as reinforcing, corrective, or unknown/other (24). In addition, we coded the content of each comment. Comments were designated specific or nonspecific (general) depending on whether they were linked to a specific behavior or attribute of the trainee. For comments deemed specific, the items of the P-SCO were used as the initial coding scheme to further characterize the content. When none of the P-SCO items described the content, the raters created a new category. Comments deemed specific were further coded as to whether or not the focus was related to patient care competencies. Results were then compared and differences resolved through discussion. The dataset was reanalyzed using the final coding scheme. The data from the coded assessments were transferred to a spreadsheet. In addition, the numerical grades from each P-SCO and global assessment were entered into a spreadsheet. Institutional review board approval was obtained.
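The coding scheme described above can be represented as a simple data structure. The sketch below is not the authors' actual spreadsheet layout; the class name, field names, and example comments are hypothetical and serve only to show how each discrete comment could carry its three codes (type, specificity, and patient care focus) and how a form could be tallied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedComment:
    """One discrete comment, the unit of analysis for coding."""
    kind: str            # "reinforcing", "corrective", or "other"
    specific: bool       # linked to a specific behavior or attribute of the trainee
    patient_care: bool   # only meaningful when specific is True


def summarize(comments):
    """Tally the comments on one completed form by type and specificity."""
    counts = Counter()
    for comment in comments:
        counts["total"] += 1
        counts[comment.kind] += 1
        if comment.specific:
            counts["specific"] += 1
            if comment.patient_care:
                counts["patient_care_specific"] += 1
        else:
            counts["general"] += 1
    return counts


# Hypothetical coded comments from a single completed form.
form = [
    CodedComment("reinforcing", specific=True, patient_care=True),
    CodedComment("corrective", specific=True, patient_care=True),
    CodedComment("reinforcing", specific=False, patient_care=False),
]
summary = summarize(form)
print(dict(summary))
```

Per-form tallies of this kind could then be averaged within each resident, which is the unit of analysis used in the comparisons.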
For each faculty member and resident, we divided the total number of P-SCOs completed by the number of months participating in the intervention in order to calculate the observation frequency. Data for this calculation came from the observation log. A t test for single samples was performed using the target as the hypothesized mean. In addition, the percentage of faculty and residents meeting or exceeding the target was calculated. Finally, the number of P-SCOs without written comments and the number of checklist items not marked were tabulated, as was the frequency with which options other than "meets expectations" were chosen.
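A minimal sketch of this feasibility calculation is given below, using only the Python standard library. The log entries are hypothetical, not study data, and the function names are illustrative. The code computes each person's observation rate, the single-sample t statistic against the target, and the share meeting the target; it does not compute a p value, which requires the t distribution.

```python
import math
import statistics


def observation_rate(n_completed: int, months_participating: float) -> float:
    """P-SCOs completed per month of participation."""
    return n_completed / months_participating


def one_sample_t(sample, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a single-sample t test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    t = (mean - hypothesized_mean) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical log: (P-SCOs completed, months participating) per faculty member.
faculty_log = [(20, 10), (30, 10), (25, 10), (15, 10)]
target = 1.3  # faculty target from the text, observations per month

rates = [observation_rate(done, months) for done, months in faculty_log]
t_stat, df = one_sample_t(rates, hypothesized_mean=target)
share_meeting_target = sum(rate >= target for rate in rates) / len(rates)

print(rates, round(t_stat, 2), df, share_meeting_target)
```

Where SciPy is available, `scipy.stats.ttest_1samp(rates, target)` returns the same statistic together with a two-sided p value.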
Data from the qualitative thematic analysis were extracted in order to compare the quantity, specificity, and content of the comments generated by the P-SCO and the conventional global assessment, respectively. In addition, the spread of the final ratings was compared. Both scales were anchored around "meets expectations" with one option for exceptional performance: "done extraordinarily well" on the P-SCO and "always exceeds expectations" on the global assessment. The P-SCO had two options for performance that was below "meets expectations" ("not done" and "improvement suggested"), while the global assessment had only one option, entitled "below expectations." For purposes of this analysis, the mean number of "done extraordinarily well" ratings was compared with the mean number of "exceeds expectations" ratings, as was the mean number of "not done" or "improvement suggested" ratings with the mean number of "below expectations" ratings. Because we are most interested in the impact that the P-SCO has on the educational experience of the resident, the unit of analysis was the individual resident. Two-tailed t tests for dependent samples were performed using SPSS (Version 16.0, 2008).
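The dependent-samples comparison can be illustrated with a short sketch. The study itself used SPSS; the Python code below is only an equivalent illustration of the test statistic, with hypothetical per-resident values and an illustrative function name. Each resident contributes one pair: a mean count under the P-SCO and a mean count under the global assessment.

```python
import math
import statistics


def paired_t(first, second):
    """Return the t statistic and degrees of freedom for a dependent-samples t test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # SD of the within-resident differences
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical mean number of specific comments per assessment, one entry per resident.
psco_means = [5.0, 6.0, 4.0, 5.0]
global_means = [2.0, 1.0, 2.0, 3.0]

t_stat, df = paired_t(psco_means, global_means)
print(round(t_stat, 2), df)
```

Pairing by resident removes between-resident variability from the comparison, which is why the resident, rather than the individual form, serves as the unit of analysis. With SciPy installed, `scipy.stats.ttest_rel(psco_means, global_means)` also reports the two-tailed p value.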
The iterative methods described above produced the P-SCO. See Figure 1 for the 27-item checklist, the rating scale, and the prompt for both reinforcing and corrective written comments.
Faculty completed 91 observations with the P-SCO. Completion rates for faculty and residents were 2.6/month and 1.1/month, respectively. All faculty and residents met or exceeded the targets established for feasibility, and the differences between the actual and target rates were statistically significant (Table 1). These rates translate into 13 completed P-SCOs per resident per academic year. Only two of the completed forms (2.2%) had no written feedback. Of the 27 items on the checklist, an average of 1.9 items (7.0%) per P-SCO were not rated. This includes five forms completed in the context of observing only a part of an interview, resulting in a large number of items not being marked. Finally, raters did use all rating categories on the P-SCO. Each completed P-SCO averaged 1.1 (SD=1.5) "done with suggestions for improvement," 0.8 (SD=1.0) "not done," and 3.1 (SD=1.5) "NA."
Faculty performed 91 observations with the P-SCO and completed 60 global assessments, resulting in a total of 513 comments (5.3 per resident per assessment) and 247 comments (4.0 per resident per assessment), respectively. The ratio of reinforcing to corrective comments was approximately 3:2 for the P-SCO and 4:1 for the global assessments. Three percent of the P-SCO comments were general compared with 43% for the global assessments. Typical general comments were "great job," "good performance," or "truly exceptional." Ninety-five percent of the comments (5.0 per resident per assessment) were specific using the P-SCO compared with 55% (2.2 per resident per assessment) using the global assessment. About 30% of the specific comments on the global assessment related to non-patient care competencies such as contributing to the learning of others in case conference or evidence-based medicine skills. Therefore, 95% of the comments (5.0 per resident per assessment) were patient care specific with the P-SCO while only 40% (1.5 per resident per assessment) were with the global assessment. For patient care-specific comments, the P-SCO delivered over 3.3 times more total, 2.6 times more reinforcing, and 5.3 times more corrective comments compared with the global assessment. The t tests comparing the performance of the two instruments in the above dimensions were all statistically significant and favored the P-SCO (Table 2).
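The step from specific comments to patient care-specific comments can be retraced from the figures reported above. The sketch below uses only those reported values; the variable names are illustrative. Because the text gives the non-patient care share as "about 30%," the derived proportion comes out near 38.5%, which is consistent with the reported 40% only after rounding.

```python
# Figures reported in the text (per resident per assessment).
psco_pc_specific_rate = 5.0       # P-SCO patient care-specific comments
global_specific_share = 0.55      # share of global-assessment comments that were specific
global_specific_rate = 2.2        # global-assessment specific comments
non_patient_care_share = 0.30     # "about 30%" of global specific comments

# Remove the non-patient care portion from the global assessment's specific comments.
global_pc_share = global_specific_share * (1 - non_patient_care_share)
global_pc_rate = global_specific_rate * (1 - non_patient_care_share)

# Ratio of patient care-specific comments, using the rate rounded as in the text.
ratio = psco_pc_specific_rate / round(global_pc_rate, 1)

print(round(global_pc_share, 3), round(global_pc_rate, 2), round(ratio, 2))
```

The ratio of 5.0 to 1.5 is the source of the "over 3.3 times more total" comparison.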
Significant differences also existed in the final ratings that each tool generated. Each P-SCO averaged 4.2 "exceeds expectations" and 1.7 "below expectations" compared with 2.6 and 0 for each global assessment, respectively (p=0.011 and p<0.0001, respectively).
Table 3 presents several comments transcribed from completed P-SCOs which illustrate their specificity and substantiveness.
The qualitative analysis of the comments revealed a number of weaknesses in the checklist itself. Several items were too vague or not defined in behavioral terms (e.g., item 4 "maintains frame"). Some items overlapped and could be combined. For example, item 17 ("updates treatment plan…") and item 18 ("modifies treatment plan for less than expected responders") were used interchangeably. Item 19 ("plan for adherence") and item 20 ("plan for adverse effects") could be combined with item 17 as well. Other items were too restrictive, such as limiting ventilation to feelings related to the illness. Several important tasks were found to be missing, most notably engaging the patient in treatment planning, appropriately seeking consultation, and exploring patient beliefs about illness and treatment.
This study describes the development of the P-SCO, a structured clinical observation tool for pharmacotherapy in psychiatry. The content was derived from an interactive and iterative process that consulted numerous sources of expertise. The tool was designed with features supported by research on competency assessment. The initial testing of this tool met or exceeded the targets set for number of observations per month for faculty and residents alike. Very few checklist items were not used by observers and even fewer P-SCOs had no comments. The P-SCO generated a relatively high volume of specific comments. These results support the feasibility of implementing the P-SCO in a busy training clinic.
Moreover, the comparative analysis suggests that the P-SCO has utility. The completed P-SCO provided a greater number of specific corrective and reinforcing comments compared with the traditional global assessment, especially with regard to patient care competencies. The P-SCO had a higher proportion of corrective comments and dramatically fewer general comments compared with the global assessment. The comments are noteworthy for their specificity, breadth, and substance (Table 3). The global assessment did offer specific comments on competencies not involving direct patient care, such as practice-based learning and participation in case conferences.
Effective feedback is central to learning (10, 11, 25). These initial results suggest that the P-SCO facilitates the three essential aspects of effective formative feedback: direct observation by a faculty member; evaluation of the observed performance relative to a reference standard, in this case the specific essential tasks delineated in the tool; and communication of the perceived performance gaps and strengths to the learner with specificity, timeliness, and focus on modifiable behaviors (8, 25–28). Further, the tool provides written qualitative feedback, which has been shown to be helpful (11, 29, 30).
It is not surprising that a structured direct observation tool resulted in more patient care-specific feedback than the midpoint and end of rotation global assessments that are based on recall of a resident's performance over a 6- and 12-month period. But these findings are significant in light of the fact that many training programs rely on the conventional global assessment as a primary method to assess patient care competencies. Direct observation is valued for its validity when it comes to competency assessment. Indeed, many of the comments recorded on the P-SCO require direct observation.
The tool itself has several limitations. While the P-SCO has items related to diagnosis and treatment, it does not probe the clinical reasoning of the trainee in sufficient depth to assess the quality of the clinical decision making. It also does not assess the medical knowledge of the trainee and other competencies such as practice-based learning. As such, the P-SCO is not sufficient to assess the overall competence of a trainee. Additional tools are necessary for these assessment tasks.
This study has several limitations. First, the raters (JQY, LT, SL) were not blind to the study hypotheses and this may have biased the coding toward a positive finding. Second, while we developed the tool through an iterative process that consulted multiple sources of expertise, the tool requires further content validation. Subsequent efforts should survey a wider body of experts and employ quantitative measures of content validity. In addition, other important psychometric properties such as interrater reliability and construct and convergent validity deserve more systematic study in multiple settings. Finally, the impact of this tool on learning and skill acquisition as well as the impact on feedback itself warrants further investigation. While observational aids do improve feedback (31), the feedback actually given by faculty is often less than ideal (32). Training faculty in giving feedback has been shown to be essential and would need to accompany implementation of this tool.
The primary contribution of this study is to describe a process by which a competency assessment tool for pharmacotherapy can be developed and tested in order to demonstrate its feasibility and utility. We plan additional, more rigorous validation of the tool and more systematic study of its impact on learning.
At the time of submission, the authors reported no competing interests.