Efficient algorithm for mining correlated protein-DNA binding cores

Wong, Po-Yuen; Chan, Tak-Ming; Wong, Man-Hon; Prof. LEUNG Kwong Sak

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7526

DC Field	Value	Language
dc.contributor.author	Wong, Po-Yuen	en_US
dc.contributor.author	Chan, Tak-Ming	en_US
dc.contributor.author	Wong, Man-Hon	en_US
dc.contributor.author	Prof. LEUNG Kwong Sak	en_US
dc.date.accessioned	2023-03-22T06:53:26Z	-
dc.date.available	2023-03-22T06:53:26Z	-
dc.date.issued	2012	-
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, Volume 7238 LNCS, Issue PART 1, Pages 470 - 481; 2012 17th International Conference on Database Systems for Advanced Applications	en_US
dc.identifier.isbn	978-364229037-4	-
dc.identifier.issn	16113349	-
dc.identifier.uri	http://hdl.handle.net/20.500.11861/7526	-
dc.description.abstract	Correlated protein-DNA interactions (binding cores) between transcription factors (TFs) and transcription factor binding sites (TFBSs) are usually identified by costly 3D structural experiments. To avoid numerous unsuccessful trials, we are motivated to develop a cheap and efficient sequence-based computational method that provides testable novel binding cores with high confidence to accelerate the experiments. Although there are abundant sequence-based motif discovery algorithms, few directly address the association of TF and TFBS core motifs that are both verifiable on 3D structures. In this paper, we formally define the problem of discovering correlated TF-TFBS binding cores and apply association rule mining techniques to existing real sequence data (TRANSFAC). The proposed algorithm first builds two frequent sequence tree (FS-Tree) structures that store condensed information for association rule mining. Association rules are then generated by depth-first traversal of the structures. FS-Trees have several advantages that support further applications, including efficient calculation of support and confidence, simple generation of candidate rules, and applicability of effective pruning techniques. As a result, the FS-Trees serve as a useful basis for more general extensions related to biological binding core identification. We tested our algorithm on real sequence data from the biological database TRANSFAC and focused on efficiency comparisons with recent work employing association rule mining. The rules discovered reveal real TF-TFBS binding cores in independent 3D verifications against the Protein Data Bank (PDB). © 2012 Springer-Verlag.	en_US
dc.language.iso	en	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.title	Efficient algorithm for mining correlated protein-DNA binding cores	en_US
dc.type	Conference Paper	en_US
dc.identifier.doi	10.1007/978-3-642-29038-1_34	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Department of Applied Data Science	-
Appears in Collections:	Applied Data Science - Publication
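
The abstract refers to the support and confidence of association rules linking TF and TFBS core motifs. The sketch below illustrates those two measures with plain brute-force counting over k-mers. It is not the FS-Tree algorithm of the paper, whose details are not given in this record. The function names, the rule direction (TF k-mer => TFBS k-mer), the thresholds, and the toy sequences are all assumptions made for illustration; no TRANSFAC data is used.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mine_rules(records, k_tf, k_tfbs, min_support, min_confidence):
    """Brute-force mining of rules 'TF k-mer => TFBS k-mer' (illustrative only).

    records: list of (tf_protein_sequence, tfbs_dna_sequence) pairs.
    support(a, b): number of records whose TF contains a and whose TFBS contains b.
    confidence(a => b): support(a, b) / number of records whose TF contains a.
    Returns a sorted list of (tf_kmer, tfbs_kmer, support, confidence).
    """
    tf_count = defaultdict(int)    # records containing each TF k-mer
    pair_count = defaultdict(int)  # records containing each (TF, TFBS) k-mer pair
    for protein, dna in records:
        tf_set = kmers(protein, k_tf)
        dna_set = kmers(dna, k_tfbs)
        for a in tf_set:
            tf_count[a] += 1
            for b in dna_set:
                pair_count[(a, b)] += 1

    rules = []
    for (a, b), support in pair_count.items():
        confidence = support / tf_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Made-up toy sequences, purely hypothetical (not real TF or TFBS data).
records = [
    ("MKRAG", "ACGT"),
    ("AKRAC", "TACG"),
    ("MMMMM", "ACGT"),
]
rules = mine_rules(records, k_tf=3, k_tfbs=3, min_support=2, min_confidence=0.9)
print(rules)  # [('KRA', 'ACG', 2, 1.0)]
```

In this toy input the TF k-mer KRA occurs in two records, and both of their binding sites contain ACG, so the rule KRA => ACG has support 2 and confidence 1.0. Enumerating every k-mer pair this way grows quickly with sequence length, which is the kind of cost that condensed structures and pruning, as described in the abstract, are meant to reduce.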

SCOPUS^TM
Citations

3

checked on Jun 29, 2025

Page view(s)

50

Last Week
0

Last month

checked on Jul 4, 2025
