Publications Repository

The application of Hadoop in structural bioinformatics.

Accepted version (533.7Kb)
Publication Date
2018-11-20
ICR Author
Alnasir, Jamie
Author
Alnasir, JJ
Shanahan, HP
Type
Journal Article
Abstract
This paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework for analysing large fractions of the Protein Data Bank, which is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, together with their scalability. We find that these deployments, for the most part, call known executables from MapReduce rather than rewriting the algorithms. Scalability is variable in comparison with other batch schedulers, particularly as direct comparisons on the same platform are generally not available. Although direct comparisons of Hadoop with batch schedulers are absent from the literature, we note there is some evidence that Message Passing Interface implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of the interface and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, as usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases and as standardised approaches such as workflow languages (i.e. Workflow Definition Language, Common Workflow Language and Nextflow) are taken up.
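
The pattern noted in the abstract, calling a known executable from MapReduce rather than rewriting the algorithm, might look roughly like the sketch below. It follows the Hadoop Streaming convention that a mapper reads lines from standard input and writes tab-separated key/value records to standard output. The function name `run_mapper`, the `command` parameter and the idea that each input line holds one structure identifier are illustrative assumptions; they are not taken from the paper or from any tool it reviews.

```python
import subprocess
import sys
from typing import Iterable, Iterator, Sequence


def run_mapper(lines: Iterable[str], command: Sequence[str]) -> Iterator[str]:
    """Yield one 'key<TAB>value' record per non-empty input line.

    Assumption: each line holds a single structure identifier, which is
    appended as the last argument of the external command (hypothetical).
    """
    for line in lines:
        key = line.strip()
        if not key:
            continue  # skip blank lines
        # Call the existing executable instead of reimplementing it.
        result = subprocess.run(
            [*command, key], capture_output=True, text=True, check=False
        )
        if result.returncode != 0:
            # Report the failure on stderr and emit no record for this key.
            print(f"failed: {key}", file=sys.stderr)
            continue
        # Flatten the tool's output so each record stays on one line.
        value = result.stdout.strip().replace("\n", " ")
        yield f"{key}\t{value}"
```

Used as a streaming mapper, the script would iterate over `run_mapper(sys.stdin, command)` and print each record. For a local trial, any command-line program that accepts one argument can stand in for the real analysis executable.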
URL
https://repository.icr.ac.uk/handle/internal/3371
Collections
  • Other ICR Research
Licenseref URL
http://www.rioxx.net/licenses/under-embargo-all-rights-reserved
Version of record
10.1093/bib/bby106
Language
eng
Date accepted
2018-10-05
License start date
2018-11-20
Citation
Briefings in Bioinformatics, 2018
