Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.11861/7597
Title: | Scalable model-based clustering for large databases based on data summarization |
Authors: | Jin, Huidong; Wong, Man-Leung; Prof. LEUNG Kwong Sak |
Issue Date: | 2005 |
Source: | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, vol. 27 (11), pp. 1710 - 1719 |
Journal: | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Abstract: | The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources such as memory and computation time. In this paper, two scalable clustering algorithms, bEMADS and gEMADS, are presented based on the Gaussian mixture model. Both summarize data into subclusters and then generate Gaussian mixtures from their data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results substantiate that both algorithms can run several orders of magnitude faster than expectation-maximization with little loss of accuracy. © 2005 IEEE. |
Type: | Peer Reviewed Journal Article |
URI: | http://hdl.handle.net/20.500.11861/7597 |
ISSN: | 0162-8828 |
DOI: | 10.1109/TPAMI.2005.226 |
Appears in Collections: | Applied Data Science - Publication |
Scopus citations: 35 (checked on Nov 17, 2024)
Page views: 37 (last week: 0; last month: 0; checked on Nov 21, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.