Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7535
DC Field | Value | Language
dc.contributor.author | Prof. LEUNG Kwong Sak | en_US
dc.contributor.author | Lee, Kin Hong | en_US
dc.contributor.author | Wang, Jin-Feng | en_US
dc.contributor.author | Ng, Eddie Y. T. | en_US
dc.contributor.author | Chan, Henry L. Y. | en_US
dc.contributor.author | Tsui, Stephen K. W. | en_US
dc.contributor.author | Mok, Tony S. K. | en_US
dc.contributor.author | Tse, Pete Chi-Hang | en_US
dc.contributor.author | Sung, Joseph Jao-Yiu | en_US
dc.date.accessioned | 2023-03-23T02:57:06Z | -
dc.date.available | 2023-03-23T02:57:06Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, Vol. 8 (2), pp. 428-440, Article number 4760134 | en_US
dc.identifier.issn | 15455963 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11861/7535 | -
dc.description.abstract | Extraction of meaningful information from large experimental data sets is a key element in bioinformatics research. One of the challenges is to identify genomic markers in Hepatitis B Virus (HBV) that are associated with HCC (liver cancer) development by comparing the complete genomic sequences of HBV among patients with HCC and those without HCC. In this study, a data mining framework, which includes molecular evolution analysis, clustering, feature selection, classifier learning, and classification, is introduced. Our research group has collected HBV DNA sequences, either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C and a clustering method has been developed to separate the subgroups. In the feature selection process, potential markers are selected based on Information Gain for further classifier learning. Then, meaningful rules are learned by our algorithm called the Rule Learning, which is based on Evolutionary Algorithm. Also, a new classification method by Nonlinear Integral has been developed. Good performance of this method comes from the use of the fuzzy measure and the relevant nonlinear integral. The nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. These two classifiers give explicit information on the importance of the individual mutated sites and their interactions toward the classification (potential causes of liver cancer in our case). A thorough comparison study of these two methods with existing methods is detailed. For genotype B, genotype C subgroups C1, C2, and C3, important mutation markers (sites) have been found, respectively. These two classification methods have been applied to classify never-seen-before examples for validation. The results show that the classification methods have more than 70 percent accuracy and 80 percent sensitivity for most data sets, which are considered high as an initial scanning method for liver cancer diagnosis. © 2011 IEEE. | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | IEEE/ACM Transactions on Computational Biology and Bioinformatics | en_US
dc.title | Data mining on DNA sequences of hepatitis B virus | en_US
dc.type | Peer Reviewed Journal Article | en_US
dc.identifier.doi | 10.1109/TCBB.2009.6 | -
item.fulltext | No Fulltext | -
crisitem.author.dept | Department of Applied Data Science | -
Appears in Collections: Publication
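
The abstract states that potential markers are selected based on Information Gain before classifier learning. As a rough illustration of that general idea only, the sketch below scores each alignment position by the standard information gain of its nucleotide with respect to the class label and ranks the positions. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_sites`), the site names, and the four-patient toy data are all hypothetical, and the record gives no details of the paper's actual preprocessing or thresholds.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(site_values, labels):
    """Label entropy minus the weighted label entropy after splitting on one site."""
    total = len(labels)
    groups = {}
    for value, label in zip(site_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_sites(sites, labels):
    """Return (site name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in sites.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one nucleotide per patient at each candidate site.
labels = ["HCC", "HCC", "control", "control"]
sites = {
    "site_a": ["T", "T", "C", "C"],  # separates the two classes completely
    "site_b": ["A", "G", "A", "G"],  # carries no information about the class
}

ranking = rank_sites(sites, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this toy setting `site_a` would be kept as a candidate marker and `site_b` discarded; how many sites to keep, and at what gain cutoff, is a choice the record does not specify.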

Scopus citations: 47 (checked on Jan 3, 2024)
Page views: 16 (checked on Jan 3, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.