Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9004
Title: A hierarchical gamma mixture model-based method for estimating the number of clusters in complex data
Authors: Dr. AZHAR Muhammad 
Huang, Joshua Zhexue 
Masud, Md Abdul 
Li, Mark Junjie 
Cui, Laizhong 
Issue Date: 2020
Source: Applied Soft Computing, 2020, vol. 87, article no. 105891.
Journal: Applied Soft Computing 
Abstract: This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space, and the clusters are observed through the distributions of the distances between these observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is built from a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by recursively partitioning the dataset. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that ensemble multiple GMM trees to handle high-dimensional data with many clusters. The GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the performance of the new algorithms. The results show that the proposed algorithms outperformed popular existing methods (Silhouette, Elbow, and Gap Statistic) as well as the recent I-nice method in estimating the true number of clusters from high-dimensional complex data.
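The core step described in the abstract (measure distances from an observation point to every object, fit a gamma mixture to those distances, and split the dataset by component) can be illustrated with a minimal sketch. This record does not give the paper's actual procedure, so everything below is an assumption made for illustration: the function names are hypothetical, the observation point and the number of components are chosen by hand, and the M-step uses weighted moment matching as a simple stand-in for an exact maximum-likelihood update. Selecting the number of components and the recursive tree construction are omitted.

```python
import math
import random

def distances_to_observer(points, observer):
    # Euclidean distance from one observation point to every object.
    # Zero distances are clamped because the gamma density needs x > 0.
    return [max(math.dist(p, observer), 1e-12) for p in points]

def gamma_logpdf(x, shape, scale):
    # Log-density of a gamma distribution with the given shape and scale.
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def moments_to_gamma(mean, var):
    # Moment matching: mean = shape * scale, var = shape * scale**2.
    mean, var = max(mean, 1e-12), max(var, 1e-9)
    return mean * mean / var, var / mean  # (shape, scale)

def fit_gamma_mixture(xs, k, iters=50):
    # EM-style fit of a k-component gamma mixture (assumed simplification:
    # the M-step matches weighted moments instead of maximising likelihood).
    n = len(xs)
    ordered = sorted(xs)
    params, weights = [], []
    # Initialise each component from one of k equal-sized chunks of sorted data.
    for j in range(k):
        chunk = ordered[j * n // k:(j + 1) * n // k]
        m = sum(chunk) / len(chunk)
        v = sum((x - m) ** 2 for x in chunk) / len(chunk)
        params.append(moments_to_gamma(m, v))
        weights.append(1.0 / k)
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in xs:
            logs = [math.log(w) + gamma_logpdf(x, a, s)
                    for w, (a, s) in zip(weights, params)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: weighted mean and variance per component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            m = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            v = sum(r[j] * (x - m) ** 2 for r, x in zip(resp, xs)) / nj
            params[j] = moments_to_gamma(m, v)
            weights[j] = nj / n
    return weights, params, resp

# Toy data: two well-separated 2-D clusters and one hand-picked observer.
rng = random.Random(0)
cluster_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
cluster_b = [(rng.gauss(10, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
dists = distances_to_observer(cluster_a + cluster_b, observer=(-5.0, 0.0))
weights, params, resp = fit_gamma_mixture(dists, k=2)
# Hard assignment: each object goes to its most responsible component.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

In this toy setup the two clusters lie at clearly different distances from the observer, so the one-dimensional distance distribution is bimodal and each mixture component picks up one cluster. Clusters at similar distances from a single observer would overlap in the distance distribution, which is presumably why the abstract speaks of several observation points and recursive partitioning.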
Type: Peer Reviewed Journal Article
URI: http://hdl.handle.net/20.500.11861/9004
ISSN: 1568-4946
1872-9681
DOI: https://doi.org/10.1016/j.asoc.2019.105891
Appears in Collections: Publication

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.