Measuring interobserver variation in a pathology EQA scheme using weighted kappa for multiple readers
Abstract
Background: A Urological Pathology External Quality Assurance (EQA) Scheme in the UK has reported observer variation in the diagnosis and grading of adenocarcinoma in prostatic biopsies using basic kappa statistics, which rate all disagreements equally.
Aim: To use customised weighting schemes to report kappa statistics that reflect the closeness of interobserver agreement in the prostate EQA scheme.
Methods: A total of 83, 114 and 116 pathologists took part, respectively, in three web-based circulations and were classified as either expert or other readers. For analyses of diagnosis, there were 10, 8 and 8 cases in the three circulations, respectively. For analyses of Gleason Sum Score, only invasive cases were included, leaving 5, 5 and 6 cases, respectively. Analyses were conducted using customised weighting schemes with ‘pairwise-weighted’ kappa for multiple readers.
Results: Analysis of diagnosis for all circulations and all readers gave a composite kappa value of 0.86 and a pairwise-weighted kappa (kappa(p-w)) value of 0.91, both regarded as ‘almost perfect’ agreement. The higher weighted value was due to the high proportion of responses that showed partial agreement. Analysis of Gleason Sum Score gave kappa = 0.38 and kappa(p-w) = 0.58 over all circulations and all readers, indicating that discrepancies occur at the boundary between adjacent grades and may not be as clinically significant as suggested by composite kappa.
Conclusion: Weighted kappa statistics show higher levels of agreement than previously reported because they apply weighting that reflects the relative importance of different types of discordance in diagnosis or grading. Agreement on grading remained low.
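The abstract does not give the formula for kappa(p-w) or the customised weights used. As a rough illustration of the general idea only, the sketch below computes Cohen's weighted kappa for each pair of readers and averages over all pairs. The function names, the linear weighting and the example scores are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def linear_weight(categories):
    """Agreement weight: 1 for a match, falling linearly with category distance.
    Assumed scheme for illustration; the study used its own customised weights."""
    span = len(categories) - 1
    pos = {c: i for i, c in enumerate(categories)}
    return lambda x, y: 1.0 - abs(pos[x] - pos[y]) / span


def identity_weight(x, y):
    """Unweighted kappa: every disagreement counts the same."""
    return 1.0 if x == y else 0.0


def weighted_kappa(a, b, categories, weight):
    """Cohen's weighted kappa for two readers scoring the same cases."""
    n = len(a)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Joint proportion of cases scored (i by reader a, j by reader b).
    joint = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        joint[idx[x]][idx[y]] += 1.0 / n
    row = [sum(r) for r in joint]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed and chance-expected weighted agreement.
    po = sum(weight(categories[i], categories[j]) * joint[i][j] for i, j in cells)
    pe = sum(weight(categories[i], categories[j]) * row[i] * col[j] for i, j in cells)
    if pe == 1.0:
        return float("nan")  # undefined when chance agreement is already total
    return (po - pe) / (1.0 - pe)


def mean_pairwise_kappa(ratings, categories, weight):
    """Average the two-reader kappa over every pair of readers (an assumption,
    not necessarily the study's kappa(p-w) definition)."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, categories, weight) for a, b in pairs) / len(pairs)


# Hypothetical Gleason Sum Scores from three readers on four cases.
categories = [6, 7, 8]
readers = [[6, 7, 8, 7], [6, 7, 7, 7], [6, 7, 8, 7]]

unweighted = mean_pairwise_kappa(readers, categories, identity_weight)
weighted = mean_pairwise_kappa(readers, categories, linear_weight(categories))
print(f"unweighted={unweighted:.2f} weighted={weighted:.2f}")
```

In this toy data the only disagreement is between adjacent scores, so the weighted value comes out higher than the unweighted one, which is the same direction of effect the abstract reports for Gleason Sum Score.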
Collections
Research team
Cancer Screening Evaluation Unit (DoH)
Language
eng
License start date
2011-12
Citation
JOURNAL OF CLINICAL PATHOLOGY, 2011, 64 pp. 1128 - 1131
Publisher
B M J PUBLISHING GROUP