Xu G, Veloski JJ, Hojat M: Board certification: associations with physicians' demographics and performances during medical school and residency. Academic Medicine 1998; 73:1283–1289
Xu and his colleagues investigated the relationship between several variables and the attainment of board certification for a cohort of graduates from Jefferson Medical College. The sample consisted of 1,186 physicians who graduated between 1976 and 1985 and entered the specialties of family practice, internal medicine, or surgery.
Demographic variables included gender, age at graduation (26 or younger versus older than 26), and race–ethnicity. Academic performance during medical school included grade point averages in basic science courses, clerkship examination scores, and National Board of Medical Examiners (NBME) Part I and Part II scores. Three measures of postgraduate performance were derived from ratings usually provided by the residency training director: data-gathering and processing skills, interpersonal and attitudinal skills, and socioeconomic aspects of patient care.
Practice specialty was ascertained from the American Medical Association's Physician Masterfile, which contains self-report data. Board-certification status was obtained from the American Board of Medical Specialties and had to be in the area of practice. Ninety percent of the sample was board-certified—82% of the surgeons, 92% of the family practitioners, and 94% of the internists. The certification rate was essentially equal for men (90%) and women (89%).
The younger graduates were more likely to be certified (92%) than were the older graduates (84%). Underrepresented minorities were much less likely to be certified (57%) than white or Asian physicians (92%).
In general, those who were board-certified had better academic performance in medical school and residency than those who were not. Regression analyses were run for each of the three specialty groups, separated by gender, age, and race–ethnicity; the R² values were all relatively small (<0.10), especially for the older and minority groups. The one notable predictor of board certification for internists and surgeons was the NBME Part II score. For family practitioners, the most significant predictor was the rating of the residents' interpersonal and attitudinal skills.
The authors conclude that "our data add support to previous evidence of the validity of board certification as a measure of clinical competence."
The researchers suggest that the lower rates of certification for older graduates and underrepresented minorities deserve further attention by the specialty boards and policymakers.
Norcini J, Grosso L: The generalizability of ratings of item relevance. Applied Measurement in Education 1998; 11:304–309
In this study, internists were asked to rate the relevance of multiple-choice items from a recertification examination to the practice of their specialty. Five sets of single best-answer items of 25–28 items each and three groups of multiple true–false items consisting of 38–40 sets each were distributed to eight rater groups of 6 to 9 raters each.
The relevance ratings for the 132 single best-answer items ranged from 2.67 to 5.00, with a mean of 4.21. The average correlation between relevance rating and item difficulty across the five groups of raters was 0.31 (P<0.001), and with item discrimination it was −0.04. The relevance ratings for the 117 sets of multiple true–false items ranged from 3.00 to 5.00, with a mean of 4.26. The correlation between relevance rating and set difficulty was 0.34 (P<0.001), and with set discrimination it was 0.31 (P<0.001).
Generalizability analyses indicated that ten raters were needed to obtain fairly stable estimates of item, stem, and total test relevance for both item formats. Norcini and Grosso note that the modest correlations between item statistics and relevance ratings suggest that while the ratings to some extent reflect item quality, they also provide information that is not captured by difficulty and discrimination indices. The researchers argue that "the use of relevance ratings can enhance the validity of inferences based on test scores and provide an external source of data that can be used to defend credentialling decisions."
Cizek GJ, Robinson KL, O'Day DM: Nonfunctioning options: a closer look. Educational and Psychological Measurement 1998; 58:605–611
Multiple-choice item writers are usually instructed to generate five options—the correct answer and four distractors—to minimize the chance of examinees getting items right by guessing. However, it is often difficult to create choices that are attractive to test takers, and in this article Cizek and his colleagues discuss the effect of deleting nonfunctioning distractors on item performance.
Thirty-two items from a medical specialty examination were administered in a 5-option format to 719 examinees and in a 4-option format to 726 examinees. Options were eliminated based on review by content experts using item difficulty and discrimination indices. The average difficulty of the items in the 5-option format was 0.62, and the average discrimination was 0.38. In the 4-option format, the means were 0.71 and 0.43, respectively.
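For readers less familiar with the two item statistics reported here, the following is a minimal sketch of how they are commonly computed. It assumes that difficulty is the proportion of examinees answering correctly and that discrimination is the point-biserial correlation between the item score and the total test score; the summary does not state which discrimination index the authors used, and the function names and sample data are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0 or 1)."""
    return mean(item_scores)


def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between one item (0/1) and total test scores.

    Assumption: this is one common discrimination index, not necessarily
    the one used in the study summarized above.
    """
    mi, mt = mean(item_scores), mean(total_scores)
    si, st = pstdev(item_scores), pstdev(total_scores)
    if si == 0 or st == 0:
        # An item everyone gets right (or wrong) cannot discriminate.
        return 0.0
    cov = mean([(i - mi) * (t - mt) for i, t in zip(item_scores, total_scores)])
    return cov / (si * st)


# Hypothetical data: five examinees, one item, and their total test scores.
item = [1, 1, 0, 1, 0]
totals = [9, 8, 3, 7, 4]
print("difficulty:", item_difficulty(item))
print("discrimination:", round(item_discrimination(item, totals), 2))
```

Under this convention a higher difficulty value means an easier item, which is why the rise from 0.62 to 0.71 indicates that the 4-option items were easier on average.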
In the 4-option format, 9 of the 32 items were significantly easier and 2 were more difficult. With regard to discrimination, 9 items were more discriminating and 3 were less discriminating than in the 5-option format. Factor analyses demonstrated that in both formats, the items loaded on a single primary factor.
The authors conclude that there may be advantages to deleting nonfunctioning options from multiple-choice items, including increased discrimination, increased reliability, and decreased test-taking time because there are fewer options to read. This approach would allow for the administration of more items in the same time frame, which could improve test validity.
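The possible effect of administering more items in the same time can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that the summary itself does not mention. The sketch below assumes that any added items are parallel to the existing ones; the starting reliability of 0.80 and the 25% increase in test length are hypothetical values chosen only for illustration, not figures from the study.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added items are parallel to the original ones
    (same difficulty, same discrimination, same construct).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical inputs: neither value comes from the study summarized above.
current_reliability = 0.80
length_factor = 1.25  # e.g., 25% more items fit in the same testing time

predicted = spearman_brown(current_reliability, length_factor)
print(f"predicted reliability: {predicted:.3f}")
```

The gain shrinks as the starting reliability rises, so the practical benefit of extra items depends on how reliable the test already is.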
Jansen JJM, Grol RPTM, Crebolder HFJM, et al: Failure of feedback to enhance self-assessment skills of general practitioners. Teaching and Learning in Medicine 1998; 10:145–151
Jansen and colleagues designed a 3-hour continuing medical education course to increase knowledge of and proficiency in performing four technical clinical skills: injection technique of the shoulder, ophthalmoscopic control in diabetes, Pap smear, and laboratory examination of fluor vaginalis. Sixty subjects were randomly assigned to two groups. Group A took the course 3 months after enrollment, and Group B took it 6 months after enrollment.
At enrollment, both groups completed a 60-item multiple-choice test to measure knowledge and a 22-item self-assessment questionnaire. These instruments were also completed at 3 and at 6 months after enrollment.
Group A also took a skills examination after the course and again at 6 months. Group B took it only after its own course (at 6 months). All participants received personal feedback on their test scores and detailed written information about each procedure. At enrollment, there were no significant differences between the two groups in knowledge scores or in self-assessments. At 3 months, Group A (which had taken the course) scored significantly better on the multiple-choice test, and its self-assessment ratings were also significantly higher. At 6 months, by which time both groups had completed the course, there were no significant differences on any of the three measures.
The correlation between knowledge scores and self-assessment scores was 0.19 at enrollment, 0.46 (P<0.001) at 3 months, and 0.21 at 6 months. The correlation between technical skills and self-assessment was 0.06 for Group A at 3 months and 0.46 (P<0.001) for both groups at 6 months.
The authors conclude that self-assessment was a poor indicator of competence and that receiving extensive individualized feedback did not improve the situation. The researchers argue that self-assessment is more closely related to generalized self-attribution than to external feedback and suggest that "self-assessment scores on their own are an invalid source of information concerning competence of practicing physicians."