dc.contributor.author: Hunter, B
dc.contributor.author: Reis, S
dc.contributor.author: Campbell, D
dc.contributor.author: Matharu, S
dc.contributor.author: Ratnakumar, P
dc.contributor.author: Mercuri, L
dc.contributor.author: Hindocha, S
dc.contributor.author: Kalsi, H
dc.contributor.author: Mayer, E
dc.contributor.author: Glampson, B
dc.contributor.author: Robinson, EJ
dc.contributor.author: Al-Lazikani, B
dc.contributor.author: Scerri, L
dc.contributor.author: Bloch, S
dc.contributor.author: Lee, R
dc.coverage.spatial: Switzerland
dc.date.accessioned: 2022-09-02T09:59:49Z
dc.date.available: 2022-09-02T09:59:49Z
dc.date.issued: 2021-11-04
dc.identifier: ARTN 748168
dc.identifier.citation: Frontiers in Medicine, 2021, 8 pp. 748168 -
dc.identifier.issn: 2296-858X
dc.identifier.uri: https://repository.icr.ac.uk/handle/internal/5383
dc.identifier.eissn: 2296-858X
dc.identifier.doi: 10.3389/fmed.2021.748168
dc.description.abstract: Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
dc.format: Electronic-eCollection
dc.format.extent: 748168 -
dc.language: eng
dc.language.iso: eng
dc.publisher: FRONTIERS MEDIA SA
dc.relation.ispartof: Frontiers in Medicine
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: informatics
dc.subject: lung nodule
dc.subject: machine learning
dc.subject: natural language processing (NLP)
dc.subject: structured query language (SQL)
dc.title: Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre.
dc.type: Journal Article
dcterms.dateAccepted: 2021-10-07
dc.date.updated: 2022-09-02T09:59:27Z
rioxxterms.version: VoR
rioxxterms.versionofrecord: 10.3389/fmed.2021.748168
rioxxterms.licenseref.startdate: 2021-11-04
rioxxterms.type: Journal Article/Review
pubs.author-url: https://www.ncbi.nlm.nih.gov/pubmed/34805217
pubs.organisational-group: /ICR
pubs.organisational-group: /ICR/Primary Group
pubs.organisational-group: /ICR/Primary Group/ICR Divisions
pubs.organisational-group: /ICR/Primary Group/ICR Divisions/Cancer Therapeutics
pubs.organisational-group: /ICR/Primary Group/ICR Divisions/Cancer Therapeutics/Computational Biology and Chemogenomics
pubs.organisational-group: /ICR/ImmNet
pubs.publication-status: Published online
pubs.publisher-url: http://dx.doi.org/10.3389/fmed.2021.748168
pubs.volume: 8
icr.researchteam: Computational Biology
dc.contributor.icrauthor: Al-Lazikani, Bissan
icr.provenance: Deposited by Mr Arek Surman on 2022-09-02. Deposit type is initial. No. of files: 1. Files: Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Ce.pdf


Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/