Within an ongoing drug surveillance project (AMUP) in psychiatric hospitals, a comparative study was carried out to evaluate two methods commonly used in the assessment of adverse drug reactions (ADRs). Two raters, who had cooperated with the project since its inception, evaluated 80 randomly selected ADRs twice: first by an empirical (implicit) approach, and then, 4 weeks later, by the algorithm proposed by Kramer et al. (1979). Agreement on the implicated medication and the related probability ratings was obtained in 81% of the 80 cases with the empirical method (weighted kappa = 0.41) and in 69% with the algorithmic method (weighted kappa = 0.62), indicating that agreement exceeded chance for both methods. Comparison with assessments made in previous case conferences of the project showed that empirical ratings were reliable over time, owing to the homogeneous use of criteria by the project raters. In contrast to previous reports on the subject, inter-rater agreement appeared to be superior with the empirical method as compared with the algorithmic assessment. Analysis of disagreements suggested that probability ratings based on the empirical method were nonspecific, owing to the conventional criteria applied in the project. Inter-rater agreement was reduced by polypharmacy, especially in the case of algorithmic assessments. Consistency of assessment was further lowered by the fact that the two methods assigned different weights to particular assessment criteria.
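The weighted kappa statistic reported above corrects raw percentage agreement for agreement expected by chance, with partial credit for near-misses on an ordered probability scale. The following is a minimal sketch of Cohen's linearly weighted kappa; the four-point causality scale and the two raters' ratings are hypothetical illustrations, not data from the study.

```python
from collections import Counter

def weighted_kappa(ratings1, ratings2, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)
    over an ordered category scale. Returns 1 - (weighted observed
    disagreement) / (weighted chance disagreement)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(ratings1)
    observed = Counter(zip(ratings1, ratings2))   # joint rating counts
    marg1 = Counter(ratings1)                     # rater 1 marginals
    marg2 = Counter(ratings2)                     # rater 2 marginals
    obs_dis = 0.0    # weighted observed disagreement
    chance_dis = 0.0 # weighted disagreement expected by chance
    for a in categories:
        for b in categories:
            w = abs(idx[a] - idx[b]) / (k - 1)
            obs_dis += w * observed[(a, b)] / n
            chance_dis += w * (marg1[a] / n) * (marg2[b] / n)
    return 1.0 - obs_dis / chance_dis

# Hypothetical ordered ADR-probability scale and ratings (illustrative only)
cats = ["unlikely", "possible", "probable", "definite"]
rater1 = ["probable", "possible", "definite", "unlikely", "probable", "possible"]
rater2 = ["probable", "probable", "definite", "possible", "possible", "possible"]
print(round(weighted_kappa(rater1, rater2, cats), 2))
```

Because disagreements one category apart are down-weighted, two raters can show high raw agreement yet modest kappa when chance agreement is high, which is why the abstract reports both figures for each method.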