Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7623
DC Field | Value | Language
dc.contributor.author | Jin, Huidong | en_US
dc.contributor.author | Wong, Man-Leung | en_US
dc.contributor.author | Prof. LEUNG Kwong Sak | en_US
dc.date.accessioned | 2023-03-28T03:46:56Z | -
dc.date.available | 2023-03-28T03:46:56Z | -
dc.date.issued | 2003 | -
dc.identifier.citation | Proceedings - IEEE International Conference on Data Mining, 2003, ICDM, pp. 91 - 98 | en_US
dc.identifier.isbn | 0769519784 | -
dc.identifier.isbn | 978-076951978-4 | -
dc.identifier.issn | 15504786 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11861/7623 | -
dc.description.abstract | The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources. In this paper, we present a two-phase scalable model-based clustering framework: first, a large data set is summarized into sub-clusters; then, clusters are generated directly from the summary statistics of the sub-clusters by a specifically designed Expectation-Maximization (EM) algorithm. Taking Gaussian mixture models as an example, we establish a provably convergent EM algorithm, EMADS, which explicitly embodies the cardinality, mean, and covariance information of each sub-cluster. Combined with different data summarization procedures, EMADS is used to construct two clustering systems: gEMADS and bEMADS. The experimental results demonstrate that they run several orders of magnitude faster than the classic EM algorithm with little loss of accuracy. They generate significantly better results than other model-based clustering systems using similar computational resources. © 2003 IEEE. | en_US
dc.language.iso | en | en_US
dc.relation.ispartof | Proceedings - IEEE International Conference on Data Mining, ICDM | en_US
dc.title | Scalable model-based clustering by working on data summaries | en_US
dc.type | Conference Proceedings | en_US
item.fulltext | No Fulltext | -
crisitem.author.dept | Department of Applied Data Science | -
Appears in Collections:Publication
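
The two-phase idea in the abstract (summarize the data into sub-clusters, then fit a Gaussian mixture from the summary statistics alone) can be illustrated with a rough sketch. This is not the paper's EMADS algorithm, whose exact update rules are not given in this record. It is a simplified stand-in that assumes each sub-cluster is represented by its count, mean, and maximum-likelihood covariance, and that the E-step scores only the sub-cluster mean. The function names `summarize` and `em_on_summaries`, the regularization term `reg`, and the initialization scheme are all hypothetical choices made for illustration.

```python
import numpy as np

def summarize(X, labels):
    """Phase 1 (illustrative): reduce raw points to per-sub-cluster
    cardinality, mean, and maximum-likelihood covariance."""
    ids = np.unique(labels)
    n = np.array([np.sum(labels == i) for i in ids], dtype=float)
    mu = np.array([X[labels == i].mean(axis=0) for i in ids])
    # bias=True divides by the count, so a single-point sub-cluster gives zeros
    S = np.array([np.atleast_2d(np.cov(X[labels == i].T, bias=True)) for i in ids])
    return n, mu, S

def log_gauss(x, mean, cov):
    """Log density of a multivariate normal at each row of x."""
    d = x.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def em_on_summaries(n, mu, S, k, iters=50, reg=1e-6, seed=0):
    """Phase 2 (simplified assumption, not EMADS itself): EM for a Gaussian
    mixture in which every sub-cluster acts as n[m] points with mean mu[m]
    and within-sub-cluster scatter S[m]."""
    rng = np.random.default_rng(seed)
    M, d = mu.shape
    means = mu[rng.choice(M, size=k, replace=False)].copy()
    covs = np.array([np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sub-cluster mean
        logp = np.stack(
            [np.log(weights[j]) + log_gauss(mu, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weight every sub-cluster by its cardinality
        w = r * n[:, None]
        Nk = np.maximum(w.sum(axis=0), 1e-12)  # guard against empty components
        weights = Nk / Nk.sum()
        means = (w.T @ mu) / Nk[:, None]
        for j in range(k):
            diff = mu - means[j]
            # within-sub-cluster covariance plus between-mean scatter
            scatter = S + diff[:, :, None] * diff[:, None, :]
            covs[j] = np.einsum("m,mij->ij", w[:, j], scatter) / Nk[j] + reg * np.eye(d)
    return weights, means, covs
```

With a single component (`k=1`), the M-step reduces to the law of total covariance, so the fitted mean and covariance match those of the raw data up to `reg`. That makes it a convenient sanity check that no information needed for the Gaussian fit is lost by working only on the summaries.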
