Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.11861/7389
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Hongjian | en_US |
dc.contributor.author | Lu, Gang | en_US |
dc.contributor.author | Sze, Kam-Heung | en_US |
dc.contributor.author | Su, Xianwei | en_US |
dc.contributor.author | Chan, Wai-Yee | en_US |
dc.contributor.author | Leung, Kwong-Sak | en_US |
dc.date.accessioned | 2023-02-20T10:26:02Z | - |
dc.date.available | 2023-02-20T10:26:02Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Briefings in Bioinformatics, November 2021, Vol. 22 (6), bbab225 | en_US |
dc.identifier.issn | 1477-4054 | - |
dc.identifier.uri | http://hdl.handle.net/20.500.11861/7389 | - |
dc.description.abstract | The superior performance of machine-learning scoring functions for docking has sparked debate over whether it stems from learning training data that are similar, in some sense, to the test data. With a systematically revised methodology and a blind benchmark that realistically mimics prospective prediction of binding affinity, we evaluated three widely used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting on both solo and hybrid features. We show for the first time that machine-learning scoring functions trained exclusively on complexes dissimilar to the test set, a subset as small as 8% of the training data, already outperform classical scoring functions; this percentage is far lower than what has recently been reported on all three CASF benchmarks. The performance of machine-learning scoring functions is underestimated by artificially created training sets that lack similar samples and discard the full spectrum of complexes found in a prospective environment. Given that some degree of similarity is inevitable in any large dataset, scoring function selection should favour whichever function makes the best use of all available material. Software code and data are provided at https://github.com/cusdulab/MLSF so that interested readers can rapidly rebuild the scoring functions, reproduce our results, and extend the analyses to their own benchmarks. | en_US |
dc.language.iso | en | en_US |
dc.relation.ispartof | Briefings in Bioinformatics | en_US |
dc.title | Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark | en_US |
dc.type | Peer Reviewed Journal Article | en_US |
dc.identifier.doi | 10.1093/bib/bbab225 | - |
item.fulltext | No Fulltext | - |
crisitem.author.dept | Department of Applied Data Science | - |
Appears in Collections: | Applied Data Science - Publication |
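The abstract describes calibrating random-forest (and extreme-gradient-boosting) scoring functions on only those training complexes that are dissimilar to the test set, then comparing their scoring power against classical scoring functions. The sketch below illustrates that protocol in outline only: the synthetic data, feature dimensions, similarity measure, and cutoff are all hypothetical stand-ins, not the authors' pipeline, which is available at https://github.com/cusdulab/MLSF.

```python
# Minimal sketch of the evaluation protocol described in the abstract,
# assuming synthetic data and an illustrative similarity cutoff.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical inputs: per-complex feature vectors, measured binding
# affinities (pK units), and a precomputed maximum similarity of each
# training complex to any test complex (e.g. protein sequence identity).
n_train, n_test, n_feat = 2000, 285, 36
X_train = rng.normal(size=(n_train, n_feat))
y_train = rng.normal(loc=6.0, scale=2.0, size=n_train)
sim_to_test = rng.uniform(size=n_train)
X_test = rng.normal(size=(n_test, n_feat))
y_test = rng.normal(loc=6.0, scale=2.0, size=n_test)

# Keep only training complexes dissimilar to the test set; the paper
# varies this filter so that as little as 8% of the training data remains.
CUTOFF = 0.9  # assumed similarity threshold, for illustration only
mask = sim_to_test < CUTOFF
print(f"training on {mask.mean():.0%} of the complexes")

# Calibrate the machine-learning scoring function on the filtered subset.
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train[mask], y_train[mask])

# Scoring power is judged by the Pearson correlation between predicted
# and measured affinities, the metric used on the CASF benchmarks.
rp, _ = pearsonr(rf.predict(X_test), y_test)
print(f"Pearson r on the blind test set: {rp:.3f}")
```

Swapping `RandomForestRegressor` for an extreme-gradient-boosting regressor, or varying `CUTOFF`, reproduces the kind of dissimilarity sweep the abstract refers to; with real complex features rather than this random data, the correlation becomes meaningful.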