Bayesian disclosure risk assessment: predicting small frequencies in contingency tables
We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set.
Version of record
categorical data; identification; model uncertainty; prediction GRAPHICAL MODELS; MICRODATA
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2007, 56 (5), pp. 551 - 570