A practical application of analysing weighted kappa for panels of experts and EQA schemes in pathology
Background Kappa statistics are frequently used to analyse observer agreement for panels of experts and External Quality Assurance (EQA) schemes and generally treat all disagreements as total disagreement. However, the differences between ordered categories may not be of equal importance (eg, the difference between grades 1 vs 2 compared with 1 vs 3). Weighted kappa can be used to adjust for this when comparing a small number of readers, but this has not as yet been applied to the large number of readers typical of a national EQA scheme.

Aim To develop and validate a method for applying weighted kappa to a large number of readers within the context of a real dataset: the UK National Urological Pathology EQA Scheme for prostatic biopsies.

Methods Data on Gleason grade recorded by 19 expert readers were extracted from the fixed text responses of 20 cancer cases from four circulations of the EQA scheme. Composite kappa, currently used to compute an unweighted kappa for large numbers of readers, was compared with the mean kappa for all pairwise combinations of readers. Weighted kappa generalised for multiple readers was compared with the newly developed ‘pairwise-weighted’ kappa.

Results For unweighted analyses, the median increase from composite to pairwise kappa was 0.006 (range -0.005 to +0.052). The difference between the pairwise-weighted kappa and generalised weighted kappa for multiple readers never exceeded ±0.01.

Conclusion Pairwise-weighted kappa is a suitable and highly accurate approximation to weighted kappa for multiple readers.
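The pairwise approach described in the Methods can be illustrated with a minimal sketch: compute Cohen's weighted kappa for every pair of readers and take the mean. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which weighting scheme was used, so linear disagreement weights are assumed by default here, and the function names `weighted_kappa` and `pairwise_kappa` are hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, categories, weight="linear"):
    """Cohen's kappa for two readers, using disagreement weights.

    `categories` lists the ordered grades (at least two). `weight` is
    "unweighted", "linear" or "quadratic"; the choice of linear as the
    default is an assumption, not taken from the paper.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (reader A grade, reader B grade).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each reader.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, up to 1 at the extremes.
        if weight == "unweighted":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        # Both readers used one identical category throughout: kappa is undefined.
        return float("nan")
    return 1.0 - observed / expected


def pairwise_kappa(ratings, categories, weight="linear"):
    """Mean kappa over all pairwise combinations of readers.

    `ratings` holds one list of grades per reader, all in the same case order.
    """
    values = [
        weighted_kappa(r1, r2, categories, weight)
        for r1, r2 in combinations(ratings, 2)
    ]
    return sum(values) / len(values)
```

With 19 readers this averages 171 reader pairs. Passing `weight="unweighted"` gives the mean pairwise kappa that the Methods compare against composite kappa; the generalised multi-reader weighted kappa used as the comparator is not reproduced here.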
Version of record
Cancer Screening Evaluation Unit (DoH)
Journal of Clinical Pathology, 2011, 64, pp. 257-260
BMJ Publishing Group