A hierarchical gamma mixture model-based method for classification of high-dimensional data

Dr. AZHAR Muhammad; Li, Mark Junjie; Huang, Joshua Zhexue

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9005

DC Field	Value	Language
dc.contributor.author	Dr. AZHAR Muhammad	en_US
dc.contributor.author	Li, Mark Junjie	en_US
dc.contributor.author	Huang, Joshua Zhexue	en_US
dc.date.accessioned	2024-03-13T03:21:36Z	-
dc.date.available	2024-03-13T03:21:36Z	-
dc.date.issued	2019	-
dc.identifier.citation	Entropy, 2019, vol. 21(9).	en_US
dc.identifier.issn	1099-4300	-
dc.identifier.uri	http://hdl.handle.net/20.500.11861/9005	-
dc.description.abstract	Data classification is an important research topic in data mining. With the rapid growth of social media sites and IoT devices, data have grown tremendously in volume and complexity, producing many large, complex, high-dimensional data sets. Classifying such high-dimensional data with a large number of classes remains a great challenge for current state-of-the-art methods. This paper presents a novel hierarchical gamma mixture model-based unsupervised method for classifying high-dimensional data with a large number of classes. In this method, we first partition the features of the data set into feature strata by using k-means. Then, a set of subspace data sets is generated from the feature strata by using the stratified subspace sampling method. After that, the GMM Tree algorithm is used to identify the number of clusters and the initial clusters in each subspace data set, and these initial cluster centers are passed to k-means to generate base subspace clustering results. The subspace clustering results are then integrated into an object cluster association (OCA) matrix by using the link-based method. The ensemble clustering result is generated from the OCA matrix by the k-means algorithm, with the number of clusters identified by the GMM Tree algorithm. After the ensemble clustering result is produced, the purity of each cluster is computed and the dominant class label is assigned to it. A new object is classified by computing the distance between it and the center of each cluster in the classifier; the class label of the cluster at the shortest distance is assigned to the new object. A series of experiments was conducted on twelve synthetic and eight real-world data sets with different numbers of classes, features, and objects. The experimental results show that the new method outperforms other state-of-the-art techniques in classifying data on most of the data sets.	en_US
dc.language.iso	en	en_US
dc.relation.ispartof	Entropy	en_US
dc.title	A hierarchical gamma mixture model-based method for classification of high-dimensional data	en_US
dc.type	Peer Reviewed Journal Article	en_US
dc.identifier.doi	10.3390/e21090906	-
item.fulltext	No Fulltext	-
crisitem.author.dept	Department of Applied Data Science	-
Appears in Collections:	Applied Data Science - Publication
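
The last two steps described in the abstract (assigning each cluster its dominant class label via purity, then classifying a new object by its nearest cluster center) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `label_clusters` and `classify`, the toy data, and the use of Euclidean distance are assumptions made for illustration only, and the cluster centers are taken as given rather than produced by the GMM Tree and ensemble steps.

```python
from collections import Counter
from math import dist  # Euclidean distance (an assumption; the abstract only says "distance")


def label_clusters(assignments, labels):
    """Map each cluster id to (dominant class label, purity of that cluster)."""
    by_cluster = {}
    for cluster_id, y in zip(assignments, labels):
        by_cluster.setdefault(cluster_id, []).append(y)
    result = {}
    for cluster_id, ys in by_cluster.items():
        dominant, count = Counter(ys).most_common(1)[0]
        # Purity here is the share of the cluster's objects carrying the dominant label.
        result[cluster_id] = (dominant, count / len(ys))
    return result


def classify(x, centers, cluster_labels):
    """Return the class label of the cluster whose center is closest to x."""
    nearest = min(centers, key=lambda cluster_id: dist(x, centers[cluster_id]))
    return cluster_labels[nearest][0]


# Hypothetical toy data: two clusters with known centers and five labeled objects.
centers = {0: (0.0, 0.0), 1: (5.0, 5.0)}
assignments = [0, 0, 0, 1, 1]          # cluster id of each training object
labels = ["a", "a", "b", "b", "b"]     # true class of each training object

cluster_labels = label_clusters(assignments, labels)
prediction = classify((4.0, 4.5), centers, cluster_labels)
print(cluster_labels, prediction)
```

In this toy case cluster 0 is labeled "a" with purity 2/3, cluster 1 is labeled "b" with purity 1, and the new object at (4.0, 4.5) receives the label of cluster 1 because that center is closer.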

SCOPUS citations: 3 (checked on Jun 29, 2025)

Page views: 35 (last week: 0; checked on Jun 29, 2025)
