Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7548
DC Field | Value | Language
dc.contributor.author | Chan, Tak-Ming | en_US
dc.contributor.author | Li, Gang | en_US
dc.contributor.author | Prof. LEUNG Kwong Sak | en_US
dc.contributor.author | Lee, Kin-Hong | en_US
dc.date.accessioned | 2023-03-23T04:34:11Z | -
dc.date.available | 2023-03-23T04:34:11Z | -
dc.date.issued | 2009 | -
dc.identifier.citation | BMC Bioinformatics, 2009, vol. 10, Article number 321 | en_US
dc.identifier.issn | 14712105 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11861/7548 | -
dc.description.abstract | Background: Identification of transcription factor binding sites (TFBSs) is a central problem in Bioinformatics on gene regulation. de novo motif discovery serves as a promising way to predict and better understand TFBSs for biological verifications. Real TFBSs of a motif may vary in their widths and their conservation degrees within a certain range. Deciding a single motif width by existing models may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desired and necessary for biological verification in practice. However, current techniques either prohibit overlapping TFBSs or lack explicit control of different motifs. Results: We propose a new generalized model to tackle the motif widths by considering and evaluating a width range of interest simultaneously, which should better address the width uncertainty. Moreover, a meta-convergence framework for genetic algorithms (GAs), is proposed to provide multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify the difference amongst expected motif kinds via similarity test. Incorporating Genetic Algorithm with Local Filtering (GALF) for searching, the new GALF-G (G for generalized) algorithm is proposed based on the generalized model and meta-convergence framework. Conclusion: GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and is usually better than the state-of-the-art methods. The range model shows an increase in sensitivity compared with the single-width ones, while providing competitive precisions on the E. coli benchmark. Effectiveness can be maintained even using a very small population, exhibiting very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif which has probably not been annotated in the dataset. http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/. © 2009 Chan et al; licensee BioMed Central Ltd. | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | BMC Bioinformatics | en_US
dc.title | Discovering multiple realistic TFBS motifs based on a generalized model | en_US
dc.type | Peer Reviewed Journal Article | en_US
dc.identifier.doi | 10.1186/1471-2105-10-321 | -
item.fulltext | No Fulltext | -
crisitem.author.dept | Department of Applied Data Science | -
Appears in Collections: Applied Data Science - Publication

Scopus citations: 18 (checked on Nov 17, 2024)

Page view(s): 31 (last week: 0), checked on Nov 21, 2024

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.