Investigation of the ability of haplotype association and logistic regression to identify associated susceptibility loci
While finely spaced markers are increasingly being used in case-control association studies in attempts to identify susceptibility loci, not enough is yet known about the optimal spacing of such markers, their likely power to detect association, the relative merits of single marker versus multimarker analysis, or which methods of analysis may be optimal. Some investigations of these issues have used markers simulated under different theoretical models of population evolution. However, the HapMap project and other sources provide real datasets which can be used to obtain a more realistic view of the performance of these approaches. SNPs around APOE and from two HapMap regions were used to obtain information regarding linkage disequilibrium (LD) relationships between polymorphisms, and these real patterns of LD were used to simulate datasets such as would be obtained in case-control studies were these SNPs to influence susceptibility to disease. The datasets obtained were analysed using tests for heterogeneity of estimated haplotype frequencies and using logistic regression analyses in which only main effects from each marker were considered. All markers surrounding the putative susceptibility locus were analysed, using sets of 1, 2, 3 or 4 markers at a time. Some markers within 150 kb of the susceptibility locus were able to detect association. At distances less than 100 kb there was no correlation between the distance from the susceptibility locus and the strength of evidence for association. When the average inter-locus spacing is 25 kb many loci would not be detected, while when the spacing is as low as 2 kb one can be fairly confident that at least one marker will be in strong enough LD with the susceptibility locus to enable association to be detected, provided that the susceptibility locus has a strong enough effect relative to the sample size.
With an inter-locus spacing of 4 kb some susceptibility loci did not have a marker locus in strong LD, potentially undermining the ability to detect association. There was little difference in the performance of haplotype-based analysis compared with logistic regression treating the effect of each marker separately. Multimarker analysis on occasion produced results which were much more highly significant than single marker analysis, but only very rarely. Our results support the view that if markers are randomly selected then a spacing as low as 2 kb is desirable. Multimarker analysis can sometimes be more powerful than single marker analysis, so both should be performed. However, because it is rare for multimarker analysis to be much more highly significant than single marker analysis, one should strongly suspect that when such results occur they may be due to genotyping errors or some other artefact. Haplotype analysis may be more prone to such problems than logistic regression, suggesting that the latter method might be preferred.
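As a rough illustration of why the strength of LD between a marker and the susceptibility locus matters for detection, the sketch below computes r² from two-locus haplotype frequencies and then simulates case and control chromosomes to obtain a single marker allelic chi-square statistic. It is not the simulation procedure used in the study. The haplotype frequencies, risk ratio and sample size are hypothetical, a multiplicative per-allele risk model is assumed, controls are drawn from population frequencies (a rare-disease approximation), and a simple 2x2 allelic test stands in for the haplotype heterogeneity and logistic regression analyses described above.

```python
import random


def ld_r2(hap_freqs):
    """r^2 between two biallelic loci, given haplotype frequencies.

    hap_freqs maps (allele_at_locus1, allele_at_locus2) -> frequency,
    with alleles coded 0/1. Missing haplotypes count as frequency 0.
    """
    p11 = hap_freqs.get((1, 1), 0.0)
    p1 = p11 + hap_freqs.get((1, 0), 0.0)  # allele 1 frequency at locus 1
    q1 = p11 + hap_freqs.get((0, 1), 0.0)  # allele 1 frequency at locus 2
    d = p11 - p1 * q1                      # LD coefficient D
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    return d * d / denom if denom > 0 else 0.0


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return num / den if den else 0.0


def simulate_marker_counts(hap_freqs, risk_ratio, n_per_group, rng):
    """Simulate marker allele counts in cases and controls.

    Locus 1 is the (hypothetical) susceptibility locus, locus 2 the marker.
    Assumption: multiplicative risk, so case chromosomes are drawn with
    haplotypes carrying the risk allele up-weighted by risk_ratio.
    """
    haps = list(hap_freqs)
    ctrl_w = [hap_freqs[h] for h in haps]
    case_w = [hap_freqs[h] * (risk_ratio if h[0] == 1 else 1.0) for h in haps]
    n_chrom = 2 * n_per_group  # two chromosomes per subject
    cases = rng.choices(haps, weights=case_w, k=n_chrom)
    ctrls = rng.choices(haps, weights=ctrl_w, k=n_chrom)
    case_alt = sum(h[1] for h in cases)
    ctrl_alt = sum(h[1] for h in ctrls)
    return case_alt, n_chrom - case_alt, ctrl_alt, n_chrom - ctrl_alt


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical haplotype frequencies: one marker in strong LD, one in weak LD.
    strong = {(1, 1): 0.28, (1, 0): 0.02, (0, 1): 0.02, (0, 0): 0.68}
    weak = {(1, 1): 0.12, (1, 0): 0.18, (0, 1): 0.18, (0, 0): 0.52}
    for label, freqs in (("strong LD", strong), ("weak LD", weak)):
        counts = simulate_marker_counts(freqs, risk_ratio=1.5,
                                        n_per_group=500, rng=rng)
        print(label, "r2 =", round(ld_r2(freqs), 3),
              "chi2 =", round(allelic_chi2(*counts), 2))
```

Under these assumptions the expected test statistic at the marker scales roughly with r², which is one way to read the finding that a marker must be in strong enough LD with the susceptibility locus, relative to effect size and sample size, for association to be detected.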
Version of record
Annals of Human Genetics, 2006, 70, pp. 893-906
WILEY-BLACKWELL PUBLISHING, INC