Flexible Data Analysis Pipeline for High-Confidence Proteogenomics.

Weisser, H; Wright, JC; Mudge, JM; Gutenbrunner, P; Choudhary, JS

Date

2016-12-02

ICR Author

Wright, James

Choudhary, Jyoti

Type

Journal Article

Abstract

Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are "novel" peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such "novel" peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses and that add to the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
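
The core idea of flagging "novel" peptides can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not use OpenMS or Percolator. The names `PeptideHit` and `find_novel_peptides`, the 0.01 q-value cutoff, and the treatment of isoleucine and leucine as equivalent are all assumptions made for illustration. The sketch keeps peptide hits that pass a confidence cutoff and then reports those whose sequence occurs in no already-annotated protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    """One peptide identification (hypothetical structure for this sketch)."""
    sequence: str
    q_value: float  # confidence estimate, e.g. from a post-processing step


def _normalize(sequence: str) -> str:
    # Assumption of this sketch: I and L have the same mass and are not
    # distinguished, so both are mapped to "L" before comparison.
    return sequence.upper().replace("I", "L")


def find_novel_peptides(hits, annotated_proteins, max_q_value=0.01):
    """Return sorted unique peptide sequences that pass the q-value cutoff
    and are not a substring of any annotated protein sequence."""
    proteins = [_normalize(p) for p in annotated_proteins]
    novel = set()
    for hit in hits:
        if hit.q_value > max_q_value:
            continue  # not confident enough to consider
        peptide = _normalize(hit.sequence)
        if not any(peptide in protein for protein in proteins):
            novel.add(hit.sequence)
    return sorted(novel)


# Toy data, invented for illustration only.
example_hits = [
    PeptideHit("PEPTIDEK", 0.001),   # matches the annotated protein (I/L equivalent)
    PeptideHit("NOVELSEQK", 0.005),  # confident and unannotated
    PeptideHit("WEAKHITK", 0.2),     # fails the q-value cutoff
]
example_proteins = ["MKPEPTLDEKAAA"]
novel_peptides = find_novel_peptides(example_hits, example_proteins)
```

A real analysis would apply considerably stricter criteria than this substring check, as the abstract's reference to expert-developed, high-stringency filtering suggests.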

Subject

Testis

Humans

Proteomics

Genome, Human

Software

Male

Data Mining

Search Engine

Molecular Sequence Annotation

Machine Learning

Proteogenomics

Research team

Functional Proteomics Group

Language

eng

Date accepted

2016-10-27

License start date

2016-12

Citation

Journal of Proteome Research, 2016, 15 (12), pp. 4686-4695

Publisher

American Chemical Society

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0

Publications Repository
