N-SAMSAM: A simple and faster algorithm for solving approximate matching in DNA sequences

Ni, Bing; Wong M.H.; Prof. LEUNG Kwong Sak

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7551

DC Field	Value	Language
dc.contributor.author	Ni, Bing	en_US
dc.contributor.author	Wong M.H.	en_US
dc.contributor.author	Prof. LEUNG Kwong Sak	en_US
dc.date.accessioned	2023-03-23T04:47:21Z	-
dc.date.available	2023-03-23T04:47:21Z	-
dc.date.issued	2008	-
dc.identifier.citation	2008 IEEE Congress on Evolutionary Computation, CEC 2008, pp. 2592-2598, 2008, Article number 4631146	en_US
dc.identifier.isbn	978-142441823-7	-
dc.identifier.uri	http://hdl.handle.net/20.500.11861/7551	-
dc.description.abstract	This work proposes a novel algorithm for approximate matching in a database consisting of multiple sequences. We apply the Agrep algorithm to an indexing structure, the r-cut numerical substring array (r-NSA). The structure indexes all the substrings of length r. The advantage of using the r-NSA is two-fold: (1) The space requirement of the r-NSA is much smaller than that of other existing indexing structures, such as the generalized suffix tree. (2) We propose an algorithm to apply Agrep in the r-NSA, in which the substrings are processed sequentially. Since common substrings are processed only once, the cost of our algorithm is smaller than that of a full scanning search by Agrep. Consequently, the matching time of our algorithm is also reduced. We design experiments to validate our algorithm and compare its performance against the full scanning search by Agrep. We define the speed-up of our algorithm as the time required by the full scanning search by Agrep divided by the time required by our algorithm. We use eight sets of real DNA sequences in our experiments, and the results show that our algorithm achieves significant speed-up. We also investigate the speed-up on different data sets and analyze the differences in detail. © 2008 IEEE.	en_US
dc.language.iso	en	en_US
dc.relation.ispartof	2008 IEEE Congress on Evolutionary Computation, CEC 2008	en_US
dc.title	N-SAMSAM: A simple and faster algorithm for solving approximate matching in DNA sequences	en_US
dc.type	Conference Paper	en_US
dc.identifier.doi	10.1109/CEC.2008.4631146	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Department of Applied Data Science	-
Appears in Collections:	Applied Data Science - Publication
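
The idea in the abstract, that a substring shared by several sequences needs to be matched only once, can be illustrated with a minimal sketch. This is not the paper's r-NSA or its algorithm: the full text is not available in this record, so the index below is a plain dictionary of length-r substrings (a hypothetical stand-in for the r-NSA), and the matcher is a simple Sellers-style dynamic program (a stand-in for Agrep's bit-parallel matcher). All function names are assumptions made for illustration. The sketch further assumes that every sequence has at least r characters, that r is at least the pattern length plus k, and that k is smaller than the pattern length.

```python
from collections import defaultdict


def approx_ends(pattern, text, k):
    """Return end positions in text where pattern matches with <= k edits.

    Sellers-style dynamic programming, used here as a simple stand-in
    for Agrep's bit-parallel matcher (an assumption of this sketch).
    """
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            tmp = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost, tmp + 1, col[i - 1] + 1)
            prev_diag = tmp
        if col[m] <= k:
            ends.append(j)
    return ends


def build_index(seqs, r):
    """Map each distinct length-r substring to its (seq_id, start) occurrences.

    A hypothetical stand-in for the r-NSA, not the paper's structure.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(seqs):
        for start in range(len(seq) - r + 1):
            index[seq[start:start + r]].append((seq_id, start))
    return index


def indexed_search(index, pattern, k):
    """Match each distinct substring once, then map hits to all occurrences."""
    hits = set()
    for window, occurrences in index.items():
        ends = approx_ends(pattern, window, k)  # computed once per substring
        for seq_id, start in occurrences:
            for e in ends:
                hits.add((seq_id, start + e))
    return hits


def full_scan(seqs, pattern, k):
    """Baseline: scan every sequence from start to end."""
    return {(seq_id, e)
            for seq_id, seq in enumerate(seqs)
            for e in approx_ends(pattern, seq, k)}
```

Under the stated assumptions, `indexed_search(build_index(seqs, r), pattern, k)` returns the same set of (sequence, end position) pairs as `full_scan(seqs, pattern, k)`. Because overlapping windows are each scanned in full, this sketch is not expected to be faster than the full scan; it only shows how repeated substrings can share one matching pass. The speed-up defined in the abstract would be measured as the full-scan time divided by the indexed-search time.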

SCOPUS citations: 3 (checked on May 18, 2025)
Page view(s): 34 (last week: 0; checked on May 19, 2025)
