Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7500
DC Field | Value | Language
dc.contributor.author | Ni, Bing | en_US
dc.contributor.author | Wong, Man-Hon | en_US
dc.contributor.author | Lam, Chi-Fai David | en_US
dc.contributor.author | Prof. LEUNG Kwong Sak | en_US
dc.date.accessioned | 2023-03-16T03:39:31Z | -
dc.date.available | 2023-03-16T03:39:31Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | International Journal of Data Mining and Bioinformatics, 2014, Vol. 9 (4), pp 358-385 | en_US
dc.identifier.issn | 1748-5673 | -
dc.identifier.issn | 1748-5681 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11861/7500 | -
dc.description.abstract | This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear to the database size, and the computations of indexing a substring in the structure are constant. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper-bound can approximate closely the empirical number of characters, which is obtained through enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straight-forward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude. | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | International Journal of Data Mining and Bioinformatics | en_US
dc.title | Applying Agrep to r-NSA to solve multiple sequences approximate matching | en_US
dc.type | Peer Reviewed Journal Article | en_US
dc.identifier.doi | 10.1504/IJDMB.2014.062145 | -
item.fulltext | No Fulltext | -
crisitem.author.dept | Department of Applied Data Science | -
Appears in Collections: Applied Data Science - Publication


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.