All psychiatry residencies are required to provide training in psychodynamic psychotherapy (1). A relatively consistent standard approach to this task has evolved over years of educational experience. This approach usually consists of didactic seminars or courses, clinical encounters with actual patients, and supervision (with and without direct observation of the patient-trainee interaction). Supervisors are rarely specifically trained for this crucial task (2), and their reports may be based on unsystematic, uncontrolled observations of trainees. Nevertheless, this kind of supervision of psychotherapy is an integral part of the education of mental health professionals (3,4) and is the primary component and most frequently used method for teaching therapy procedures and processes (5–9). The judgments of these supervisors are used by training programs to monitor the development of clinical skills; evaluate the professional competence of the therapists in training (10–13); and certify that the trainees, at the end of their training programs, are competent enough to apply for licensure and/or begin independent practice.
In spite of the importance and ubiquity of this pedagogical system, there has been little empirical research to support it. It has been simply assumed that 1) residency training programs in psychiatry are successful in making trainees more skillful in psychodynamic psychotherapy; 2) a monotonic relationship exists between the amount of training and level of skill achieved (i.e., the more training, the more skill); and 3) supervisors function both as purveyors of psychotherapy skill and as assessors of change of level of skill.
Although the relationship between higher levels of skill of psychotherapists and enhanced patient outcomes has yet to be definitively proven, there is reasonable evidence that supports this assertion (14–16). For example, Stein and Lambert (17) report a modest relationship between training and clinical improvement. They found that a variety of outcome sources are associated with modest effect sizes favoring more trained therapists. This conclusion is echoed in Beutler and Kendall's (18) article on training, in which they state that professional training enhances clinical efficacy if the type of training, the clinical setting, and the types of patient problems are considered.
Instruments to measure psychotherapeutic skills are necessary to verify whether skill has been achieved. Investigators from training programs in psychology and psychiatry have reported on instruments designed to measure the acquisition of therapist skill from a number of different perspectives. Many of them are reviewed by Alberts and Edelstein (7) and Fuqua et al. (19). A few studies specific to residents in psychiatry have also been reported. Fix and Haffke (14) studied the responses of 17 psychiatry residents to the Carkhuff communication scales (written responses to 16 taped patient expressions of feeling), but did not find superior skills in advanced residents, compared with novices. The researchers concluded that only a longitudinal study could determine whether communicative levels change during a residency program.
Liston et al. (8) developed the Psychotherapy Competence Assessment Schedule (PCAS), an 85-item criterion-referenced instrument, for the evaluation of the performance of individual psychodynamically oriented psychotherapy using videotape recordings of residents conducting therapy. By using the PCAS, experienced supervisors compared the performance of three first-year and three second-year residents in the conduct of individual psychodynamically oriented psychotherapy and found that the performance of the first-year residents was more uniform than that of the second-year residents. The interrater agreement, however, was found to be uniformly low, and only one point in time was measured.
Buckley et al. (9) constructed the 29-item Supervisor Evaluation Scale (SES) and used it along with the Psychotherapy Self-Evaluation Scale (PSES) to investigate the extent to which learning and experience affected the acquisition of psychotherapeutic skills from both the supervisors' and supervisees' points of view. Third-year psychiatry residents were rated by their supervisors using SES 8 months following an initial assessment. The residents concurrently rated themselves on the same 29 skills by using the PSES. Based on these two points of time (pretest-posttest), the authors found that there was a significant positive change over time in their pilot study. The interrater reliability was good and implied a high degree of agreement among supervisors who rated the same residents.
Moline and Winer (20) designed an instrument to assess residents' grasp of elements basic to the conduct of psychoanalytically oriented psychotherapy and later (21) introduced a rating scale that was designed for use by supervisors of residents doing psychotherapy; however, no empirical studies were reported using this instrument. Koenigsberg et al. (22) created an instrument that achieved high levels of interrater reliability, but it only relied on the frequencies with which techniques were used rather than on the acquisition of skill.
While there have been several attempts to measure the acquisition of dynamically oriented psychotherapeutic skill, there have been few attempts to do so longitudinally in actual clinical settings using naturalistic supervisory procedures. The "naturalistic" approach (used in the present study) emphasizes the primacy of data and seeks to provide explanations for observed patterns of phenomena and is different from approaches that emphasize the refutation of theoretical hypotheses using experimental methodology (23).
Given the problems inherent in the assessment of competence in a naturalistic framework, the Supervisor Report (SR) (24) was designed to provide a structured, reliable, and valid means of systematically measuring supervisors' assessments of trainees' dynamic psychotherapy skill. It was used in a naturalistic study of the acquisition of psychotherapeutic skill in psychiatry residents and psychology graduate students and interns and demonstrated both construct and face validity (24). In that study, an experienced group of supervisors was asked to make judgments about how well they thought their supervisees were conducting psychotherapy. The SR was designed to gather this information, and it included the delineation of a number of behaviors and attitudes thought to comprise therapist skill. Factor analysis indicated that these behaviors and attitudes were divided into two major groupings: 1) those elements that were thought to comprise specific "Psychotherapeutic Techniques;" and 2) those elements that were thought to reflect the "Educational Alliance" between supervisor and supervisee. The instrument proved to be valid and internally consistent, both in terms of the identification of what was done well and what was done poorly by the therapist-trainees, and in terms of the correlations with global skill. In that sample, there was a much higher correlation between skill and the "Psychotherapeutic Techniques" score than between skill and the "Educational Alliance" score. A secondary analysis of a group of supervisors who endorsed a self-psychology orientation revealed a striking change in this trend. In this latter group, the correlation between skill and the "Psychotherapeutic Techniques" score was lower, and the correlation between skill and the "Educational Alliance" score was much higher. On the other hand, the quality of the Educational Alliance played no role in the ratings of trainee skill for supervisors who did not endorse a self-psychology orientation. 
The expected "relationship" emphasis of the self-psychology supervisors, contrasted with the "cognitive" emphasis of the non–self-psychology supervisors, demonstrates the external validity of the SR. In another study using the SR, Krasner et al. (25) used a mixed cross-sectional and longitudinal design to study skill acquisition in psychiatry residents, psychology interns, and psychology graduate students and identified a weak trend in the direction of greater global skill for more advanced trainees. For the purpose of that study, the original SR was revised to obtain greater sensitivity in the assessment of psychotherapy skill.
Overall, the SR has some advantages over previously developed instruments. First, it is a comprehensive "noninvasive" measure that can be easily used in clinical settings. Second, it can be used extensively, over time, in a variety of training programs, with minimal administrative effort because it does not require specialized instruction for either supervisors or trainee/therapists. Finally, it identifies the educational alliance (the supervisor-supervisee relationship) as an important factor in the assessment of psychotherapeutic skill. The aim of the present study was to compare advanced psychiatry residents with beginning psychiatry residents in terms of their psychotherapeutic skill.
The setting for this study was a hospital outpatient treatment center, which is part of an academic medical center and the Department of Psychiatry and Behavioral Sciences of Northwestern University Medical School. The patients consisted of mildly to moderately disturbed individuals who were working or going to school and were not better served by the hospital's other programs for the more severely mentally ill.
Trainees were psychiatric residents completing a 2.5-year, half-time rotation at the aforementioned outpatient treatment center. The majority of these residents were Caucasian females. All of the psychiatric residents in the clinic agreed to participate in the study and signed the appropriate informed consent.
Supervisors were the faculty of the medical school's department of psychiatry and behavioral sciences. The majority of the supervisors were Caucasian male psychiatrists with a mean of 15 years' experience in the practice of psychodynamic psychotherapy. Most of the supervisors in the clinic agreed to participate in the study and signed the appropriate informed consent.
Residents carried about 6–10 psychotherapy cases per week for the length of the outpatient rotation. As cases terminated, new cases were assigned, keeping the trainees' caseload relatively constant. Each resident had three supervisors and met with each supervisor for 1 hour per week. Some effort was made by the training director to match residents with supervisors who would complement or add to the residents' existing psychotherapy style. The choice of supervisor for each case was left to the residents' discretion. Case discussions were based on process notes and audiotapes. Each resident was assigned new supervisors once a year.
The SR was revised from its original form (14,27) to expand the number of possible ratings above satisfactory, creating a five-point Likert scale ranging from "poorly" to "outstanding." The 22-item questionnaire was subjected to factor analysis with Varimax rotation, resulting in three orthogonal factors. These factors ("Psychotherapy Skill," "Treatment Alliance," and "Educational Alliance") each had relatively high reliability coefficients (alphas = 0.96, 0.80, and 0.93, respectively). A Global Skillfulness score was obtained by averaging the responses to two questions: "Given this therapist's present level of training, how skillfully do you think that he or she handled the case?" and "Compared to an expert therapist, how skillfully do you think that this therapist handled the case?" Starting at the sixth therapy session and at every tenth session thereafter, supervisors were mailed the SR to rate the resident's psychotherapy skill with each individual patient. The front page identified the resident, supervisor, and patient by name, but this page was discarded prior to data entry, after which the resident, supervisor, and patient were identified only by code number. Mailings continued until the resident and patient terminated or the resident completed the rotation. These ratings were used for research purposes only, and at no time did the SR rating influence the training program's formal assessments of the residents. The overall estimated response rate for the study was about 33%.
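The two scoring computations described above can be sketched in a few lines. This is a minimal illustration only: it assumes the standard formula for Cronbach's alpha, the function names are hypothetical, and the item scores are invented for the example rather than taken from the study.

```python
from statistics import variance

def cronbach_alpha(reports):
    """Cronbach's alpha for one factor.

    reports: one list of item scores per completed SR, all the same length.
    """
    k = len(reports[0])                      # number of items in the factor
    items = list(zip(*reports))              # regroup the scores item by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(report) for report in reports])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

def global_skillfulness(relative_to_training, relative_to_expert):
    """Average of the two global questions, each on the five-point scale."""
    return (relative_to_training + relative_to_expert) / 2

# Hypothetical item scores (1-5) for a three-item factor; NOT study data.
example_reports = [
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
    [5, 4, 5],
]
print(f"alpha = {cronbach_alpha(example_reports):.2f}")
print(f"global = {global_skillfulness(4, 3):.1f}")
```

Alpha rises as the items within a factor vary together across reports, which is the sense in which coefficients such as 0.96 and 0.93 indicate internally consistent factors.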
Two consecutive ratings from the same supervisor, rating the same resident, treating the same patient, comprised the data for assessing the test-retest reliability. These consecutive ratings occurred within a 10-week timeframe at any point during the course of therapy. Correlations were computed (n=35) for the three factors and Global Skillfulness separately. The test-retest correlations were as follows: "Psychotherapy Skill" 0.65, "Treatment Alliance" 0.42, "Educational Alliance" 0.62, and Global Skillfulness 0.72. Thus, supervisor ratings showed moderate stability over time.
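As an illustration of how a test-retest coefficient of this kind is obtained, the sketch below computes a Pearson correlation between first and second ratings. It assumes that a Pearson coefficient was used (the text says only "correlations"), and the paired ratings are hypothetical values, not study data.

```python
from math import sqrt

def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariation = sum((a - mean_first) * (b - mean_second)
                      for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    return covariation / sqrt(spread_first * spread_second)

# Hypothetical consecutive ratings by the same supervisor of the same
# resident-patient pair (five-point scale); NOT study data.
first_rating = [3.0, 3.5, 4.0, 2.5, 4.5, 3.0]
second_rating = [3.5, 3.0, 4.0, 3.0, 4.5, 3.5]

print(f"test-retest r = {pearson_r(first_rating, second_rating):.2f}")
```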
Two different supervisors' ratings of the same resident based on treatment with different patients within a 2-month interval comprised the data for assessing interrater reliability. An analysis of variance (ANOVA) was performed on each of the three factors and Global Skillfulness separately. This test resulted in the following estimates of interrater consistency (alpha) of a single supervisor report (n=51): "Psychotherapy Skill" 0.21, "Treatment Alliance" 0.39, "Educational Alliance" 0.20, and Global Skillfulness 0.31. These results indicated that a single SR was not sufficient to yield an accurate assessment of skill. The Spearman-Brown correction indicated that the average of two SRs would result in the following estimates of interrater consistency: "Psychotherapy Skill" 0.34, "Treatment Alliance" 0.56, "Educational Alliance" 0.33, and Global Skillfulness 0.47.
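The Spearman-Brown step can be reproduced from the single-report estimates quoted above. The sketch below uses the standard prophecy formula, r_n = n r / (1 + (n - 1) r); the function name is illustrative.

```python
def spearman_brown(single_report_reliability, n_reports):
    """Projected reliability of the mean of n_reports parallel ratings."""
    r = single_report_reliability
    return n_reports * r / (1 + (n_reports - 1) * r)

# Single-report interrater estimates as quoted in the text.
single_report = {
    "Psychotherapy Skill": 0.21,
    "Treatment Alliance": 0.39,
    "Educational Alliance": 0.20,
    "Global Skillfulness": 0.31,
}

two_report = {name: spearman_brown(r, 2) for name, r in single_report.items()}
for name, value in two_report.items():
    print(f"{name}: {value:.2f}")
```

Applied to the rounded inputs, the formula gives approximately 0.35, 0.56, 0.33, and 0.47. The small difference from the reported 0.34 for "Psychotherapy Skill" is presumably due to rounding of the single-report estimate before publication.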
To evaluate longitudinal trends, residents who had SRs 12 months apart were identified. Two SRs rated within 2 months of each other were averaged to create the observation at Time 1 and a separate observation, using the same criteria, at Time 2. SRs from different supervisors were chosen whenever possible. Paired t-tests (n=15) were performed to investigate changes in skill from Time 1 to Time 2 (Table 1). The scores for all three factors and for Global Skillfulness improved from Time 1 to Time 2; however, there was not a significant difference between the scores at the two times. More than half of the 15 residents had improved scores after 12 months of training, but the magnitude of this improvement was not large enough to yield a statistically significant difference.
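The longitudinal comparison can be sketched as follows: each observation is the mean of two SRs, and the paired t statistic is computed on the Time 2 minus Time 1 differences. The function names and all scores are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev

def average_pair(sr_a, sr_b):
    """One observation: the mean of two SR scores rated close together."""
    return (sr_a + sr_b) / 2

def paired_t(time1, time2):
    """Return (t statistic, degrees of freedom) for paired observations."""
    differences = [after - before for before, after in zip(time1, time2)]
    n = len(differences)
    t_statistic = mean(differences) / (stdev(differences) / sqrt(n))
    return t_statistic, n - 1

# Hypothetical factor scores for five residents; NOT study data.
time1 = [average_pair(a, b) for a, b in
         [(3.0, 3.4), (2.8, 3.0), (3.6, 3.2), (3.1, 3.3), (2.9, 3.5)]]
time2 = [average_pair(a, b) for a, b in
         [(3.4, 3.6), (3.0, 3.0), (3.2, 3.4), (3.6, 3.8), (3.0, 3.2)]]

t_statistic, df = paired_t(time1, time2)
print(f"t = {t_statistic:.2f}, df = {df}")
```

With n=15, as in the study, the test has 14 degrees of freedom, so the absolute t value would need to exceed roughly 2.145 for a two-tailed difference at the 0.05 level.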
In another evaluation of changes in skill, residents in the first half of training were compared with residents in the second half of training. Some residents were included in both groups, whereas others were included in only one group, making this a mixed cross-sectional and longitudinal design. Each observation, in either the first or second half of training, consisted of two SRs. This analysis represents the difference between less and more advanced residents. The independent t-tests resulted in significant differences for "Psychotherapy Skill," "Treatment Alliance," and Global Skillfulness (Table 2). Differences in "Educational Alliance" were not significant, but the score changes were in the desired direction.
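For the between-group comparison, a minimal sketch of an independent-samples t statistic is given below. It assumes the pooled-variance (Student) form, which the text does not specify, and the scores for the two groups are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(first_half, second_half):
    """Pooled-variance t statistic (second half minus first half) and df."""
    n1, n2 = len(first_half), len(second_half)
    pooled_variance = ((n1 - 1) * variance(first_half)
                       + (n2 - 1) * variance(second_half)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t_statistic = (mean(second_half) - mean(first_half)) / standard_error
    return t_statistic, n1 + n2 - 2

# Hypothetical observations (each the mean of two SRs); NOT study data.
first_half = [3.0, 3.2, 2.9, 3.4, 3.1]
second_half = [3.5, 3.3, 3.8, 3.6, 3.2, 3.7]

t_statistic, df = independent_t(first_half, second_half)
print(f"t = {t_statistic:.2f}, df = {df}")
```

A positive t value here corresponds to higher scores in the second half of training.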
In our sample, some residents showed marked improvement in their psychotherapy skill and others clearly did not. Table 3 shows the magnitude of the change after 12 months of training for two residents, Resident No. 1, who improved greatly, and Resident No. 2, who worsened in skill. The following descriptions of each resident suggest a relationship between the magnitude and direction of improvement in psychotherapy skill based on SR scores and a number of other educational measures, interests, and aptitudes.
As can be seen in Table 3, Resident No. 1 showed marked improvement in skill over the course of his training. Resident No. 1 was younger than most of the other residents. At the beginning of his training, he was considered by his supervisors to be naïve and not self-aware, though he demonstrated eagerness to learn and was open to new ideas. In the second year of the residency, he requested a referral for treatment from the training director and subsequently began psychotherapy with an experienced training analyst. As his training proceeded, he began to develop an interest in psychotherapy along with a variety of other areas in psychiatry. He was appointed chief resident in his 4th year of training. On the Psychiatry Resident In-Training Examination (PRITE), a multiple-choice, standardized, knowledge-base test given to every U.S. resident in psychiatry, he scored in the 89th percentile in his postgraduate year (PGY)-2, in the 82nd percentile in his PGY-3, and in the 96th percentile in his PGY-4. His senior paper focused on a psychoanalytic approach to severe character pathology. After graduation, Resident No. 1 entered private practice.
As can be seen in Table 3, Resident No. 2 declined in skill over the course of her training. Resident No. 2 was older than the majority of the other residents. She had been a practicing optometrist for 8 years prior to entering medical school. From the beginning of residency training, she demonstrated interests in neuropsychiatry and psychopharmacology. In the last years of the residency, she became known for her diligence and comprehensive, medically oriented approach to patient care, but she was observed by her supervisors to be relatively uninterested in psychodynamic psychotherapy. On the PRITE, she scored in the 99th percentile in her PGY-2, in the 93rd percentile in her PGY-3, and in the 86th percentile in her PGY-4. Her senior paper focused on the neuropsychiatric aspects of a neurological disorder. After graduation, Resident No. 2 took a position in an academic department of psychiatry.
These case examples illustrate the utility of the SR in detecting real-world changes in psychotherapy skill. Resident No. 1 was observed to increase his interest in and focus on psychotherapy, and this was reflected in his improved SR scores. Resident No. 2 showed little interest in psychotherapy and increased involvement in neuropsychiatry, and this, too, was reflected in her scores.
Skill, or the consistent ability of an individual to perform a standardized set of procedures to accomplish a specific task, is clearly important to measure. Because psychotherapy skill correlates positively and significantly with patient outcome (4,16,17,26,27), the assessment of skill has particular relevance for application in a wide variety of settings (e.g., completion of training, certification, and practice). The judgment of skill in the conduct of psychodynamic psychotherapy, like the majority of performance measurements in applied settings, relies on subjective assessments. As Linacre et al. (28) point out, "One of the major problems in assessment and evaluation is that different people rate the same performance with varying degrees of severity." These subjective judgments tend to introduce distortion into the measurement process that can be addressed by means of rating scale development and/or rater training (29). In our naturalistic sample in which rater training was highly impractical, the SR was developed to minimize these problems with assessment and quantify the psychiatric residents' changes in psychotherapy skill.
In our study, comparisons between SR scores in early and later periods of residency training revealed consistent, but not statistically significant, improvements in the SR's three factors and in Global Skillfulness. Further, when a group of residents in the first half of psychotherapy training was compared with a group of residents in the second half of training, statistically significant increases in skill were found for "Psychotherapy Skill," "Treatment Alliance," and Global Skillfulness. The third factor, "Educational Alliance," also showed improvement, but not to a significant degree. Finally, in comparing two individual residents in terms of competence and interests in allied educational areas, corroboration between observational data and score changes in the SR was identified.
Several important problems must be kept in mind in considering the results of this study. These problems were poor interrater reliability, moderate test-retest reliability, and the uncontrolled factors influencing the supervisors' ratings. The minimal instructions and lack of training for the supervisors may have created differences in the way the SR was completed. Different patients were rated and the extent or kind of information presented on each patient varied, increasing the difficulty of obtaining convergent ratings. The study design limited test-retest data to 10-week intervals, an interval large enough that real changes in skill might have occurred. The changes in ratings over this interval may be accurate assessments of real changes rather than poor agreement over time. Ratings also were influenced by uncontrolled factors. The orientation of the supervisors varied, which may have affected how certain items were rated or interpreted. Variability among patients, including the type of problem and difficulty of the case, also may have affected the ratings, so that for the advanced residents who treated more difficult cases, improvements in SR ratings may have been limited.
While these problems were noted, there remain several important questions that must be considered. Were these changes robust enough to indicate increases in psychotherapy skill? And was the SR sufficiently sensitive to detect these changes accurately? Since residents come into the program already selected for basic empathic and listening skills, perhaps great leaps in skill are improbable. Although the absolute magnitude of change was small, it does compare favorably with the changes documented in sequential PRITE administrations, a widely accepted measure of psychiatric knowledge base. Therefore, in spite of interrater and test-retest reliabilities that are far from perfect, the SR's detection of significant results cannot be dismissed and suggests its utility in assessing psychotherapy skill in psychiatry residents.
Research in assessment of psychotherapy skill acquisition must continue. We would like to determine whether these findings are replicable in different residency programs and under more controlled conditions (e.g., direct and/or videotaped observations of residents conducting psychotherapy). We also believe that future research should attempt to establish the relationship between the acquisition of psychotherapy skill and increases in treatment outcomes to provide a solid real-world connection to our attempts to measure skill.