dc.contributor.author: Mudge, JM
dc.contributor.author: Jungreis, I
dc.contributor.author: Hunt, T
dc.contributor.author: Gonzalez, JM
dc.contributor.author: Wright, JC
dc.contributor.author: Kay, M
dc.contributor.author: Davidson, C
dc.contributor.author: Fitzgerald, S
dc.contributor.author: Seal, R
dc.contributor.author: Tweedie, S
dc.contributor.author: He, L
dc.contributor.author: Waterhouse, RM
dc.contributor.author: Li, Y
dc.contributor.author: Bruford, E
dc.contributor.author: Choudhary, JS
dc.contributor.author: Frankish, A
dc.contributor.author: Kellis, M
dc.date.accessioned: 2019-11-18T14:14:14Z
dc.date.issued: 2019-12-01
dc.identifier.citation: Genome research, 2019, 29 (12), pp. 2073 - 2087
dc.identifier.issn: 1088-9051
dc.identifier.uri: https://repository.icr.ac.uk/handle/internal/3423
dc.identifier.eissn: 1549-5469
dc.identifier.doi: 10.1101/gr.246462.118
dc.description.abstract: The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.
dc.format: Print-Electronic
dc.format.extent: 2073 - 2087
dc.language: eng
dc.language.iso: eng
dc.publisher: COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.subject: Animals
dc.subject: Humans
dc.subject: Sequence Analysis, DNA
dc.subject: Open Reading Frames
dc.subject: Pseudogenes
dc.subject: Genome, Human
dc.subject: Exons
dc.subject: Genome-Wide Association Study
dc.subject: High-Throughput Nucleotide Sequencing
dc.title: Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.
dc.type: Journal Article
dcterms.dateAccepted: 2019-09-09
rioxxterms.versionofrecord: 10.1101/gr.246462.118
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0
rioxxterms.licenseref.startdate: 2019-12
rioxxterms.type: Journal Article/Review
dc.relation.isPartOf: Genome research
pubs.issue: 12
pubs.notes: Not known
pubs.organisational-group: /ICR
pubs.organisational-group: /ICR/Primary Group
pubs.organisational-group: /ICR/Primary Group/ICR Divisions
pubs.organisational-group: /ICR/Primary Group/ICR Divisions/Cancer Biology
pubs.organisational-group: /ICR/Primary Group/ICR Divisions/Cancer Biology/Functional Proteomics Group
pubs.publication-status: Published
pubs.volume: 29
pubs.embargo.terms: Not known
icr.researchteam: Functional Proteomics Group
dc.contributor.icrauthor: Wright, James
dc.contributor.icrauthor: Choudhary, Jyoti


Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by/4.0