Assessment of clinical skill is a complex and controversial field. Objective Structured Clinical Examinations (OSCEs) were developed to improve the objectivity and reliability of clinical assessment. The OSCE was first introduced into medical education by Harden and Gleeson (1).
The OSCE’s widespread adoption may be due to skepticism regarding traditional oral examinations, which test an examinee with only one patient and only one rater (2). In a study in which 10,000 medical students took the United States Medical Licensing Examination over a 3-year period, interrater agreement for each student was less than 0.25 (3).
Leichner et al. (4) showed that the type of patient and the rater influenced examination results, particularly for students on the borderline of failing. They also showed that because the reliability of the oral examination is difficult to increase merely by improving the raters' diligence, assessing several skills rather than one and using several raters can increase its validity and reliability (5). Leichner et al. (4) also suggested that OSCE-type examinations, rather than single-examiner oral examinations, should be developed in psychiatry.
The OSCEs consist of multiple stations testing a spectrum of clinical and practical skills. At each station, an examiner who does not interact directly with the examinee rates his or her clinical skills using preplanned checklists (1), which are straightforward and clearly laid out to assess objective and practical goals. OSCEs usually use standardized or simulated patients who perform a scenario uniformly for all examinees (6, 7).
The OSCEs have been used in psychiatry for many years, and various authors have reported on acceptable reliability and validity (2, 8–13).
In Iran, OSCEs are held for several specialties, such as community medicine, pediatrics, and radiology, but not psychiatry. The Ministry of Health, which authorizes specialty examinations in Iran, has mandated an oral exit examination such as an OSCE for all medical specialties. Every year after the national board examination in psychiatry, many complaints are made about the unfairness of the exam; using OSCEs might eliminate doubts about chance and unfairness (14). Therefore, the current study was performed to determine the reliability and validity of the psychiatry OSCE in Iran.
Designing the stations, preparing checklists, and other executive functions entailed 3 months of study, 9 months of executive work, and seven workshops for psychiatry lecturers from medical universities all over Iran.
First, objectives were defined over three sessions by 19 members of the Board for Examination on Psychiatry, who had a mean of 13 years of experience in education. Participants were from the medical universities of Tehran, Shahid Beheshti, Mazandaran, and Isfahan. From the general guidelines of psychiatry education, as devised by the Secretariat of the Council for Medical Specialty Programs of the Ministry of Health and Medical Education, objectives that seemed practically feasible for assessment in the clinical setting were selected. The list of objectives was then reviewed by more lecturers, and eventually 15 were agreed upon. In an 8-hour workshop, expert professors of psychiatry from all over Iran incorporated the 15 objectives into eight stations. In another workshop, the lecturers devised the final format of the questions and the content of the checklist, which were then reviewed and revised by each involved university.
The location for the OSCE was prepared, and the examiners checked that the stations as implemented matched the set objectives. The scripted scenarios from each group of examiners were then reviewed and commented on by the other groups.
Because this was the first experience with a psychiatry OSCE in Iran, no pretrained simulated patients were available. Eight simulated patients—four staff of Tehran Psychiatry Institute and four psychology students—were trained in four sessions (about 8 hours). They studied their roles and scripts and then performed in the presence of a psychiatrist familiar with the project.
One day prior to the OSCE, to check the quality of the physical space of the stations, the examiners inspected the site and in some instances changed the position of objects in the rooms. Then a mock examination without examinees was done by other examiners who had not directly participated in making the stations, and the arrangement of the objects in the rooms and the performances of the simulated patients were evaluated.
All third-year psychiatric residents who had passed the graduation (preboard) examination were invited to participate voluntarily in the OSCE on August 13, 2004, at the Center for Clinical Skills at Firoozgar Hospital in Tehran.
Each of the nine stations took 12 minutes and tested a particular skill (Table 1). A single ring one minute before the end of each station reminded the resident that time was running out; upon hearing a continuous ring, the resident moved to the next station. On the door of the station room and on the examinee's seat was an instruction sheet of the expected tasks, consisting of three parts: the goal of the station (e.g., history taking), the intended task, and the method of assessment (e.g., checklists). After going through four stations, the examinees took a short rest.
Each station included a simulated patient and two examiners who rated the examinees independently according to checklists. There was no communication between examiners and examinees at the stations. The raters were selected from lecturers who were not involved in the design and implementation of the station. However, because of the limited number of available lecturers, at some stations the rater was the designer of the checklist or scenario of that station.
Several days before the exam, the examiners were acquainted with the OSCE components (e.g., instructions, blueprints, and the scoring method). One day before the examination, the rating methods were discussed and consensus was reached in a session with all of the examiners.
Each question on the checklists was scored from 0 (poor) to 3 (excellent) according to the examinee's performance. The checklists covered the intended skills of each station, for example, "Was the patient's family history obtained?" and "Was the patient's suicide risk assessed?" At the end of each checklist (except for Station 9) were two questions, one about the examiner's general impression of the examinee and one about the simulated patient's impression of the examinee. The examinee's overall score was then calculated, excluding the score from the last question. The maximum score was 20 for all stations except Station 1, which had a maximum of 30; Station 4, which had a maximum of 10; and Stations 8 and 9, which together had a maximum of 20. One week after the OSCE, each examinee received a result card giving his or her overall score, the best overall score, and the mean and maximum scores at each station.
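The scoring procedure described above can be sketched in a few lines. This is a hypothetical illustration with made-up checklist values, not the examination's actual scoring software; the function and variable names are our own.

```python
from statistics import mean

def station_score(checklist_items):
    """One examinee's score at a station: the sum of checklist items
    (each rated 0-3), excluding the final item, which records the
    simulated patient's impression and is left out of the total."""
    return sum(checklist_items[:-1])

# Result-card figures for one station, given every examinee's checklist
# (the data and structure here are illustrative, not from the paper).
checklists = [[3, 2, 1, 2, 0], [2, 2, 2, 3, 1], [1, 0, 2, 1, 3]]
scores = [station_score(c) for c in checklists]
print(max(scores), round(mean(scores), 2))  # station maximum and mean
```

A real implementation would also scale raw sums to each station's stated maximum (20, 30, or 10); the paper does not specify that mapping, so it is omitted here.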
To assess the reliability and validity of the OSCE, Cronbach's alpha, the kappa coefficient, and Pearson's correlation coefficient were computed.
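For readers unfamiliar with these statistics, Cronbach's alpha can be computed from a matrix of station scores as below. This is a generic sketch with an illustrative data layout (examinees in rows, stations in columns), not the analysis code used in the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix of shape (examinees, stations):
    alpha = k/(k-1) * (1 - sum of per-station variances / variance of totals),
    where k is the number of stations."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    station_vars = scores.var(axis=0, ddof=1).sum()  # sample variance per station
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of total scores
    return k / (k - 1) * (1 - station_vars / total_var)
```

Values around 0.8 or higher are conventionally taken to indicate good internal consistency for a high-stakes examination.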
Twenty-two residents from seven medical universities of Iran, 15 men (68.2%) and seven women (31.8%), with a mean age of 33.3 years (SD=1.32), participated in the OSCE. Table 1 shows the mean and standard deviation of the scores at each station.
To assess construct validity, academic psychiatrists at the examinees' universities were asked to categorize the residents by their performance ability. These rankings were compared with the traditional clinical skill assessments made by the individual departments and with the OSCE scores at each station; there were no differences among them.
Correlation coefficients between the psychiatry OSCE scores and the scores of written (multiple-choice questions) and oral board examinations (individual patient assessments) on psychiatry that were held 3 weeks later were 0.36 (p>0.05) and 0.63 (p<0.05), respectively.
Scores were dichotomized at the median into higher and lower groups, and the kappa coefficient of agreement between the two raters' scores was computed for each station, ranging from 0.27 for Station 7 to 0.81 for Station 6; all were significant except those for Stations 7 and 8 (Table 1).
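The median-split agreement computation can be sketched as follows. The data are made up, the function name is our own, and the degenerate case where chance agreement equals 1 is deliberately not handled.

```python
import numpy as np

def median_split_kappa(rater_a, rater_b):
    """Cohen's kappa between two raters after dichotomizing each rater's
    scores at that rater's own median (above the median vs. not above).
    kappa = (p_obs - p_exp) / (1 - p_exp); assumes p_exp < 1."""
    a = np.asarray(rater_a) > np.median(rater_a)
    b = np.asarray(rater_b) > np.median(rater_b)
    p_obs = np.mean(a == b)  # observed agreement on the high/low split
    # chance agreement: both say "high" or both say "low" by coincidence
    p_exp = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_obs - p_exp) / (1 - p_exp)
```

With perfect agreement on the split this yields 1.0, and with complete disagreement -1.0; the moderate values seen at some stations indicate only partial agreement between the paired raters.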
The exam showed high internal consistency; the Cronbach alpha coefficient for the OSCE as a whole was 0.82. Table 1 shows the internal consistency of each station.
This was the first psychiatry OSCE in Iran, and it showed good reliability and validity. The interrater reliability at each station was significant, although some of the kappas were only moderate. The internal consistency of all the stations was acceptable. The lowest belonged to Station 4 (antisocial personality imitating psychotic symptoms); there was considerable controversy over the design of this station's scenario and checklist, and the lack of consensus among the designers seems to explain its low internal consistency. The low alpha at this station reflected the diversity of objectives for the case. After the exam, a decision was made to give this station a smaller weight (half that of the other stations).
Similar to the study by Walters et al. (12), our study had good face and content validity. Throughout the preparation phase for the OSCE and the related workshops, as well as during the OSCE itself, we aimed for reasonably high content validity.
The plausibility of the stations and simulated patients resulted from considering the opinions of expert psychiatry examiners, teamwork, long hours of discussion, and the incorporation of feasible objectives of psychiatry training. Our study replicated that of Park et al. (11), which showed evidence of the construct validity of the psychiatry OSCE. Another study done in Iran (15) showed that simulated patients can convincingly portray psychiatric disorders and act out complex scenarios as requested in an OSCE.
In centers worldwide, the usual time for a traditional individual patient assessment examination is 1 hour, whereas in Iran it is 45 minutes (16, 17). Because the time allotted to an OSCE station should approximate the real time of an interview, 15 minutes per station seems appropriate for Iranian psychiatric residents.
For concurrent validity between the OSCE and the other examinations, Pearson's test was used. As expected, the OSCE score was not associated with the written board examination results but correlated significantly with the oral board examination results, which measure clinical skills. Promotion examination scores and written board examination scores were correlated, which is explained by the content of these examinations, both of which assess residents' knowledge.
Departments' internal scores correlated only with the oral board examination scores. Internal scores reflect residents' global skills as well as their knowledge and the examiners' intuitive impressions of them. Internal scores were obtained from direct observation after nearly 3 months on a ward, whereas promotion examinations are administered at the end of each year as multiple-choice tests.
This examination was designed as a summative assessment for board certification, and the current study is a suitable stepping stone for future studies. Using academic psychiatrists as raters played a considerable role in compensating for the shortcomings of OSCEs, a strategy that can be used until suitable alternatives are developed. In Hodges' study (10), eight residents did not regard the OSCE as an ultimate method of assessment. However, it is safer not to rely on those residents' opinions, since they were passive observers. Moreover, Hodges' study was designed for students, not residents. Meanwhile, an OSCE is now included in part 1 of the clinical examination given by the Royal College in the United Kingdom, replacing the individual patient assessment (13).
The sample in this study was fairly small, which precludes generalization of our results. Only 22 of 75 graduating residents and seven of Iran's 13 medical schools participated in the OSCE. The fact that this was the first OSCE conducted and that participation was voluntary may partially explain the low participation rate.
Despite our efforts to keep script writers and examiners separate, this was not achieved at some stations.
The authors would like to thank Dr. Brian Hodges for his comments and feedback regarding the analysis and writing of the manuscript.
At the time of submission, the authors reported no competing interests.