A hierarchical gamma mixture model-based method for estimating the number of clusters in complex data

Dr. AZHAR Muhammad; Huang, Joshua Zhexue; Masud, Md Abdul; Li, Mark Junjie; Cui, Laizhong

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9004

DC Field	Value	Language
dc.contributor.author	Dr. AZHAR Muhammad	en_US
dc.contributor.author	Huang, Joshua Zhexue	en_US
dc.contributor.author	Masud, Md Abdul	en_US
dc.contributor.author	Li, Mark Junjie	en_US
dc.contributor.author	Cui, Laizhong	en_US
dc.date.accessioned	2024-03-13T03:13:33Z	-
dc.date.available	2024-03-13T03:13:33Z	-
dc.date.issued	2020	-
dc.identifier.citation	Applied Soft Computing, 2020, vol. 87, article no. 105891.	en_US
dc.identifier.issn	1568-4946	-
dc.identifier.issn	1872-9681	-
dc.identifier.uri	http://hdl.handle.net/20.500.11861/9004	-
dc.description.abstract	This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space, and the clusters are observed through the distributions of the distances between the observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by recursively partitioning the dataset. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that ensemble multiple GMM trees to handle high-dimensional data with many clusters. The GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperformed the popular existing methods Silhouette, Elbow, and Gap Statistic, as well as the recent I-nice method, in estimating the true number of clusters from high-dimensional complex data.	en_US
dc.language.iso	en	en_US
dc.relation.ispartof	Applied Soft Computing	en_US
dc.title	A hierarchical gamma mixture model-based method for estimating the number of clusters in complex data	en_US
dc.type	Peer Reviewed Journal Article	en_US
dc.identifier.doi	https://doi.org/10.1016/j.asoc.2019.105891	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Department of Applied Data Science	-
Appears in Collections:	Applied Data Science - Publication
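
The partitioning step described in the abstract (distances from an observation point, modeled by a gamma mixture, used to split the dataset) can be illustrated with a minimal sketch. This is not the authors' GMM-Tree implementation: the function names (`fit_gamma_mixture`, `partition_by_distance`), the fixed number of components `k`, the single observation point, and the moment-matching M-step are all assumptions made for illustration. It shows one level of splitting only, not the recursive tree or the forest ensembles.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def moments_to_gamma(mean, var):
    """Method-of-moments estimate: shape = mean^2 / var, scale = var / mean."""
    mean = max(mean, 1e-12)
    var = max(var, 1e-9)
    return mean * mean / var, var / mean


def fit_gamma_mixture(distances, k, iters=50):
    """Fit a k-component gamma mixture with EM-style updates.

    Assumption: the M-step uses weighted moment matching instead of an
    exact maximum-likelihood update of the shape parameter.
    Returns (params, resp): params[j] = (weight, shape, scale) and
    resp[i][j] is the responsibility of component j for distance i.
    """
    xs = [max(d, 1e-12) for d in distances]
    n = len(xs)
    ordered = sorted(xs)
    params = []
    for j in range(k):  # initialise from k equal-size slices of sorted data
        chunk = ordered[j * n // k:(j + 1) * n // k]
        mean = sum(chunk) / len(chunk)
        var = sum((x - mean) ** 2 for x in chunk) / len(chunk)
        params.append((1.0 / k,) + moments_to_gamma(mean, var))
    resp = []
    for _ in range(iters):
        resp = []
        for x in xs:  # E-step: normalised component posteriors
            logs = [math.log(w) + gamma_logpdf(x, a, s) for w, a, s in params]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        params = []
        for j in range(k):  # M-step: weighted mean and variance per component
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nj
            params.append((max(nj / n, 1e-12),) + moments_to_gamma(mean, var))
    return params, resp


def partition_by_distance(points, observer, k):
    """Label each point by its most responsible distance component."""
    distances = [math.dist(p, observer) for p in points]
    _, resp = fit_gamma_mixture(distances, k)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Hypothetical toy data: two well-separated 2-D blobs seen from the origin.
rng = random.Random(0)
points = ([(rng.gauss(5, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
          + [(rng.gauss(20, 0.5), rng.gauss(0, 0.5)) for _ in range(100)])
labels = partition_by_distance(points, observer=(0.0, 0.0), k=2)
print([labels.count(j) for j in range(2)])
```

In this toy setting the two blobs lie at clearly different distances from the observation point, so the distance distribution has two separated modes and the mixture splits the data accordingly. Clusters at similar distances from a single observation point would not be separated this way, which is presumably why the abstract describes recursive partitioning and multiple trees.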
