A Novel Statistical Method to Diagnose, Quantify and Correct Batch Effects in Genomic Studies.
Authors
Nyamundanda, G
Poudel, P
Patil, Y
Sadanandam, A
Document Type
Journal Article
Date
2017-09-07
Date Accepted
2017-08-18
Abstract
Genome projects now generate large-scale data, often produced at various time points by different laboratories using multiple platforms. This increases the potential for batch effects. Existing batch evaluation methods, such as principal component analysis (PCA; mostly based on visual inspection), sometimes fail to reveal all of the underlying batch effects. These methods also risk unintentionally correcting biologically interesting factors that have been attributed to batch effects. Here we propose a novel statistical method, finding batch effect (findBATCH), to evaluate batch effects based on probabilistic principal component and covariates analysis (PPCCA). The same framework also provides a new approach to batch correction, correcting batch effect (correctBATCH), which we show to be a better approach than traditional PCA-based correction. We demonstrate the utility of these methods using two examples (breast and colorectal cancers), merging gene expression data from different studies after diagnosing and correcting for batch effects while retaining the biological effects. These methods, along with conventional visual inspection-based PCA, are available as part of an R package, exploring batch effect (exploBATCH; https://github.com/syspremed/exploBATCH).
Citation
Scientific Reports, 2017, 7 (1), article 10849
Source Title
Scientific Reports
Publisher
NATURE PORTFOLIO
ISSN
2045-2322
eISSN
2045-2322
Collections
Research Team
Systems and Precision Cancer Medicine
