Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7389
DC Field: Value [Language]
dc.contributor.author: Li, Hongjian [en_US]
dc.contributor.author: Lu, Gang [en_US]
dc.contributor.author: Sze, Kam-Heung [en_US]
dc.contributor.author: Su, Xianwei [en_US]
dc.contributor.author: Chan, Wai-Yee [en_US]
dc.contributor.author: Leung, Kwong Sak [en_US]
dc.date.accessioned: 2023-02-20T10:26:02Z
dc.date.available: 2023-02-20T10:26:02Z
dc.date.issued: 2021
dc.identifier.citation: Briefings in Bioinformatics, November 2021, Vol. 22 (6), bbab225 [en_US]
dc.identifier.issn: 1477-4054
dc.identifier.uri: http://hdl.handle.net/20.500.11861/7389
dc.description.abstract: The superior performance of machine-learning scoring functions for docking has sparked debate over whether it stems from learning knowledge from training data that are in some sense similar to the test data. Using a systematically revised methodology and a blind benchmark that realistically mimics the prospective prediction of binding affinity, we evaluated three widely used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting on both solo and hybrid features. We show for the first time that machine-learning scoring functions trained exclusively on complexes dissimilar to the test set, amounting to as little as 8% of the available complexes, already outperform classical scoring functions; this percentage is far lower than those recently reported on all three CASF benchmarks. The performance of machine-learning scoring functions is underestimated by artificially created training sets that, by excluding similar samples, discard the full spectrum of complexes to be found in a prospective environment. Given that some degree of similarity is inevitable in any large dataset, the criterion for scoring-function selection is which one makes the best use of all available material. Software code and data are provided at https://github.com/cusdulab/MLSF so that interested readers can rapidly rebuild the scoring functions, reproduce our results, and even run extended analyses on their own benchmarks. [en_US]
dc.language.iso: en [en_US]
dc.relation.ispartof: Briefings in Bioinformatics [en_US]
dc.title: Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark [en_US]
dc.type: Peer Reviewed Journal Article [en_US]
dc.identifier.doi: 10.1093/bib/bbab225
item.fulltext: No Fulltext
crisitem.author.dept: Department of Applied Data Science
Appears in Collections: Publication
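As a companion to the abstract above, here is a minimal, hypothetical sketch of the kind of experiment it describes: calibrating random-forest and extreme-gradient-boosting scoring functions only on training complexes dissimilar to the test set, then comparing their binding-affinity predictions. The feature matrix, the similarity measure and the 0.5 threshold are illustrative placeholders, not the authors' pipeline; the real code and data are in the MLSF repository linked in the abstract.

# Hypothetical sketch, assuming scikit-learn, XGBoost, SciPy and NumPy.
# It mimics the study's setup with random placeholder data, not the
# authors' actual features, similarity measure or datasets.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(0)

# Placeholder data: rows are protein-ligand complexes, columns are
# precomputed descriptors ("solo" or "hybrid" features); y is the
# measured binding affinity (e.g. pKd/pKi).
X_train, y_train = rng.random((3000, 36)), rng.random(3000) * 10
X_test, y_test = rng.random((285, 36)), rng.random(285) * 10

# Hypothetical per-complex similarity of each training complex to the
# test set (e.g. maximum protein-sequence or ligand-fingerprint
# similarity); values here are random stand-ins.
similarity_to_test = rng.random(3000)

# Keep only the training complexes dissimilar to the test set; in the
# paper this dissimilar subset can be as little as 8% of the data.
dissimilar = similarity_to_test < 0.5
X_fit, y_fit = X_train[dissimilar], y_train[dissimilar]

# Calibrate both machine-learning scoring functions on the dissimilar
# subset and score them by Pearson correlation on the blind test set.
for model in (RandomForestRegressor(n_estimators=500, random_state=0),
              XGBRegressor(n_estimators=500, learning_rate=0.05)):
    model.fit(X_fit, y_fit)
    r, _ = pearsonr(y_test, model.predict(X_test))
    print(type(model).__name__, f"Pearson r = {r:.3f}")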

SCOPUS™ Citations: 11 (checked on Jan 3, 2024)
Page view(s): 41 (checked on Jan 3, 2024)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.