Every residency program invests time and resources in choosing the best applicants. The methods used vary and include unstructured faculty interviews, point-score systems, and standardized tests.
Many programs have been receiving more applications than ever before. In 1986, there were more than 113,000 applications for 18,770 residency positions (1). In 2007, the National Residency Matching Program (NRMP) reported more than 200,000 applications for 21,845 positions offered (1). Specifically, the number of international medical graduates (IMGs) participating in the NRMP increased 9%, from 13,929 in 2006 to 14,935 in 2007. However, although the number of applicants has increased in several specialties (including psychiatry), the number of positions has remained essentially unchanged.
There is no agreement about which factors predict resident performance. In psychiatry, the personal interview has traditionally been a very important component of resident selection. In a survey of psychiatry program directors (2), psychological stability, sensitivity, and general intelligence were considered the most important attributes.
In programs with a large number of IMG applicants, selection may be even more complicated, given the greater variability of the candidates’ characteristics. Considering that psychiatry ranked second among all medical specialties in the number of IMGs (31%), surpassed only by internal medicine (37%) (3), examining possible predictors of subsequent resident performance would assist program directors in improving the selection of IMG applicants.
We conducted a retrospective review of the application files and residency evaluations of 50 IMG residents who completed 4-year psychiatric residency training in an urban, university-affiliated program between July 1994 and June 2004. All IMGs in this cohort began and finished their psychiatric training in this program without interruption. Residents who transferred to a different program (16.9% of the initial cohort) were excluded because their performance evaluations were unavailable. American medical graduates, representing a small minority of the initial cohort (6.2%), were excluded as well.
Our sample consisted of 50 IMG residents (32 women), with an average age of 34.5 years (SD=5.7). Ninety percent (n=45) had previous clinical experience in the United States at the time of application. Forty-six percent (n=23) were born in Eastern European countries; 34% (n=17) were from Latin America. None identified English as their primary language. The average interval between graduation from medical school and the beginning of residency was 9.2 years (SD=5.9).
Variables at the time of application included medical school grades (transformed into a percentage for standardized comparison), the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 scores (2-digit), and a personal interview score given by the faculty, rated on a scale of 1 to 5 (5 being the best). The number of USMLE attempts was not available for all candidates, so this variable was not included in the data analysis. The interview criteria, established by consensus of the faculty members and consistently applied throughout the 1994–2004 recruitment period, included language proficiency, behavior and appearance, interest in psychiatry, interpersonal responsiveness to the interviewer, and self-insight. Interviewers were selected on the basis of availability and of features of particular interest in the candidate’s profile (e.g., inclusion of a psychoanalyst when a candidate’s CV showed a psychodynamic background). Three personal interviews were usually granted to each candidate during the application process. Interviewers scored different features of the candidate’s performance, and a final overall score was assigned after discussion and agreement. If there was discrepancy among the interviewers, an additional interview helped to assign a final score.
For this study, residency performance was primarily established by a program director’s ranking. A systematic review of the residents’ evaluations was not conducted, although the program director, who served in that role for all graduates in this cohort, had the option of looking back at the evaluation forms to “refresh” his memory and compare each graduate with his or her classmates. In composing the ranking, the program director was asked to consider primarily the resident’s academic accomplishments and clinical performance, although other aspects could be included if they helped provide a more accurate description of the resident’s performance. To verify that the program director’s ranking was a reliable measure, we calculated Cohen’s kappa for interrater reliability among the rankings of the program director, the chairman, and the associate program director and found substantial agreement (κ=0.77, p<0.001).
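As a rough illustration of the interrater check described above, Cohen’s kappa for a pair of raters can be computed from observed and chance-expected agreement. The rankings below are fabricated for the example and are not the study’s data.

```python
# Sketch of pairwise Cohen's kappa for two raters' binomial (high/average)
# rankings; the data are hypothetical, not the study's.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical rankings of 10 residents ("H" = high, "A" = average)
director = ["H", "H", "A", "A", "H", "A", "A", "H", "A", "A"]
chairman = ["H", "H", "A", "A", "H", "A", "H", "H", "A", "A"]
print(round(cohens_kappa(director, chairman), 2))  # 9/10 observed agreement
```

With three raters, as in the study, agreement is commonly summarized by averaging the pairwise kappas or by using a multirater statistic such as Fleiss’ kappa.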
The program director’s ranking was grouped into binomial (high/average performance) and trinomial (high/middle/low) categories. We also examined secondary aspects of residents’ performance, including percentile scores on the Psychiatry Residency In-Training Examination (PRITE); five-point score evaluations in seminars and psychotherapy supervision; outcome (pass/fail-conditional) on the Mock Psychiatric Board Examination in the last year of training; and outcome (pass/fail) on the American Board of Psychiatry and Neurology (ABPN) Oral Examination 2 years after graduation.
Descriptive statistics were computed for comparison of demographic characteristics using the unpaired Student’s t test and chi-square tests. Next, we examined outcome predictors of residency performance using univariate analyses (Student’s t test and analysis of variance). Cohen’s kappa was calculated for interrater reliability among the evaluations of the program director, the chairman, and the associate program director. The alpha level was set at 0.05 (two-tailed).
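A minimal sketch of these univariate analyses, assuming SciPy is available and using made-up scores rather than any of the study’s data:

```python
# Hedged sketch of the univariate analyses: unpaired t test for the binomial
# grouping and one-way ANOVA for the trinomial grouping. All numbers are
# hypothetical.
from scipy import stats

# Hypothetical USMLE Step 1 scores split by the binomial performance ranking
high_perf = [88, 92, 85, 90, 87]
avg_perf = [80, 84, 79, 83, 81]

t_stat, p_val = stats.ttest_ind(high_perf, avg_perf)  # unpaired Student's t

# Hypothetical three-group comparison (trinomial ranking) via one-way ANOVA
low_perf = [76, 78, 75, 79, 77]
f_stat, p_anova = stats.f_oneway(high_perf, avg_perf, low_perf)

print(p_val < 0.05, p_anova < 0.05)
```

Both tests compare group means; the t test handles the two-category ranking, and ANOVA generalizes it to three categories.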
The USMLE Step 1 score and the interview score at the time of application were significantly associated with the program director’s ranking grouped into binomial categories (high/average performance) (Table 1). No individual interview criterion was significantly related to resident performance.
Among secondary outcomes, USMLE Step 1 and Step 2 scores correlated significantly with PRITE scores (Step 1, r=0.37; Step 2, r=0.40; p<0.003), and the interview score correlated with psychotherapy supervision evaluations (r=0.38, p<0.003) (Table 2). Gender was associated with pass rates on the ABPN examination (women=23/32 versus men=8/18, p<0.0125).
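The secondary analyses can be sketched as follows. The paired scores are fabricated for illustration; only the 2×2 ABPN pass/fail-by-gender counts come from the text above.

```python
# Illustrative sketch of the secondary analyses: Pearson correlation for
# continuous scores (fabricated data) and a chi-square test on the 2x2
# pass/fail-by-gender table reported in the text.
from scipy import stats

# Hypothetical paired USMLE Step 1 scores and PRITE percentiles
usmle1 = [78, 85, 82, 90, 76, 88, 80, 84]
prite = [40, 62, 55, 70, 45, 68, 50, 60]
r, p_r = stats.pearsonr(usmle1, prite)

# 2x2 contingency table: rows = women/men, columns = (pass, fail) on the
# ABPN examination, using the counts reported above (women 23/32, men 8/18)
table = [[23, 9], [8, 10]]
chi2, p_chi, dof, _ = stats.chi2_contingency(table)
print(r > 0, dof)
```

Note that `chi2_contingency` applies Yates’ continuity correction to 2×2 tables by default, so its p value may differ from one computed with an uncorrected chi-square or an exact test.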
Our results confirm previous findings concerning the selection process for IMGs (4, 5): scores on standardized examinations and performance in the personal interview appear to predict future performance. Although significant, the small differences in USMLE Step 1 scores and interview performance between high- and average-performing residents make it difficult to apply these measures routinely in a selection process. Hence, the main value of proposing predictors may rest in highlighting those most likely related to future resident performance, an aim somewhat limited in this study by the small sample size (e.g., USMLE Step 2 showed only a trend).
In any case, although standardized examinations may measure an applicant’s acquisition of knowledge and test-taking skills, they do not evaluate the basic clinical skills necessary to succeed in a residency program. For instance, a hard-working, responsible, diligent, team-oriented candidate could perform relatively well during residency training even if his or her scores on the standardized examinations were average. In addition, these types of data are susceptible to bias: knowledge of a candidate’s score creates a “halo effect” for the interviewer, leading to a premature conclusion even before the interview (6). Hidden beneath the scores are variables, such as curriculum differences among schools of origin, test-taking experience (“practice effect”), emotional control, mental readiness, and “test wiseness,” that are not accessible to program directors.
The personal interview seems to be a valuable tool for assessing the interpersonal and communication skills of candidates (6). Although no specific interview criterion was predictive of residents’ performance in our study, Rao (5) found that command of English was more valuable among IMGs than among American medical graduates. Unfortunately, we lack data on TOEFL scores or ECFMG English tests, which could have provided a more sensitive measure of candidates’ English proficiency. Interestingly, the reliability of the personal interview in predicting resident performance has been criticized (7) on the grounds that there is little evidence that asking hidden-value questions, usually recognized as such by the applicant, improves the selection process.
Although we found no difference in residents’ performance by previous experience in the United States (a crude proxy for familiarity with the cultural environment), it is undeniable that such familiarity has a positive effect. Questions that target the level of adjustment to U.S. culture may not only uncover specific needs of future psychiatric residents but also help them transition smoothly into residency.
The USMLE, which measures “technical knowledge,” correlated only moderately with residents’ performance on tests such as the PRITE, even though both examinations have a similar format (time-limited, multiple-choice questions) and attempt to measure similar areas. Likewise, the personal interview and the psychotherapy supervision evaluation may assess overlapping attributes, such as psychological mindedness, ability to communicate, and interpersonal skills, which may explain their moderate association.
Standardized admission tests and undergraduate grades are valid predictors of graduate students’ performance (8), probably because of similar evaluation criteria. In our study, medical school grades failed as a significant predictor of residents’ performance; the residency, with its own demands and expectations, is not a mere extension of medical school work. Moreover, foreign medical school grades are particularly difficult to assess due to differences in the nature, quality, and methodology of didactics and evaluations (5).
Given the great variability of predictors of resident performance, alternative candidate selection methods have been proposed. One example, albeit limited in scope relative to the whole process, is the Defining Issues Test (9). Based on Kohlberg’s theory of moral development, the Defining Issues Test was designed to measure moral reasoning in situations of conflicting values. Sheehan (9) found a high correlation between moral development and clinical performance among both American medical graduates and IMGs across several residency specialties, although psychiatry was not included. While one can genuinely argue that moral perspective is influenced by cultural background, further research with tests of broader applicability and longitudinal follow-up, including psychiatric residents, is needed before the assessment of moral reasoning can validly be incorporated into the resident selection process, as some authors (10) advocate.
Finding reliable predictors of clinical performance is a difficult task, especially if we consider residents a highly preselected group (by interest or self-assessment) that is nonetheless diverse in other respects. Future research that overcomes the limitations of our study, by including American medical graduates, larger samples, a more refined description of applicants’ profiles, and more precise measurement of residents’ performance, would help identify the specific factors that should guide the selection of IMG candidates.
At the time of submission, the authors reported no competing interests.