The reliability of psychiatric faculty's evaluative judgments of physician-trainee interviewing skills was studied using three methods: global rating scales, data checklists, and a time-allotment form. Data were obtained during a training workshop for psychiatric instructors in the U.S. The authors found low interrater reliability with all three methods, and the findings were replicated at a second workshop with Canadian faculty. The authors outline recommended modifications of observational systems that may help improve both the accuracy and the reliability of ratings of trainee interviewing skills. The use of more accurate quantitative techniques is briefly reviewed.