Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification

Liang, Yong; Liu, Cheng; Luan, Xin-Ze; Prof. LEUNG Kwong Sak; Chan, Tak-Ming; Xu, Zong-Ben; Zhang, Hai

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/7511

Title:	Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification
Authors:	Liang, Yong; Liu, Cheng; Luan, Xin-Ze; Prof. LEUNG Kwong Sak; Chan, Tak-Ming; Xu, Zong-Ben; Zhang, Hai
Issue Date:	2013
Source:	BMC Bioinformatics, 2013, 14, 198
Journal:	BMC Bioinformatics
Abstract:	Background Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve prediction accuracy. Regularization is an effective approach to gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. They are generally divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. A popular regularization technique is the Lasso (L1), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a lower value of q leads to sparser solutions. Moreover, the L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations and has been shown to have many attractive properties.
Results In this work, we investigate sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2 penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of the proposed approach compared with other regularization methods. In particular, for four publicly available gene expression datasets, the L1/2 regularization method succeeded using only about 2 to 14 predictors (genes), compared to about 6 to 38 genes for the ordinary L1 and elastic net regularization approaches.
Conclusions Our evaluations show that sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than the ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
Type:	Peer Reviewed Journal Article
URI:	http://hdl.handle.net/20.500.11861/7511
DOI:	10.1186/1471-2105-14-198
Appears in Collections:	Publication
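
The abstract mentions a univariate half thresholding operator used inside coordinate descent. As a rough illustration only, the sketch below implements the closed-form half thresholding function commonly given for L1/2 regularization, applied to the scalar problem of minimizing (b - x)^2 + lam * |b|^(1/2). The function name `half_threshold`, the scaling of `lam`, and the threshold constant are assumptions of this sketch; the paper's own "new" operator may differ in these details, and the full coordinate descent loop for logistic regression is not shown.

```python
import math


def half_threshold(x: float, lam: float) -> float:
    """Half thresholding for the scalar problem min_b (b - x)**2 + lam * sqrt(|b|).

    Illustrative sketch; the threshold constant and the scaling of lam are
    assumptions and may not match the operator proposed in the paper.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    # Inputs at or below this magnitude are set exactly to zero (sparsity).
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(x) <= threshold:
        return 0.0
    # Closed-form nonzero solution of the stationarity condition
    # 2 * (b - x) + lam / (2 * sqrt(b)) = 0, written for x > 0 and
    # extended to x < 0 by odd symmetry.
    phi = math.acos((lam / 8.0) * (abs(x) / 3.0) ** -1.5)
    return (2.0 / 3.0) * x * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


if __name__ == "__main__":
    for value in (0.5, 1.0, 2.0, -2.0):
        print(value, half_threshold(value, lam=1.0))
```

Small inputs are mapped exactly to zero, which is what produces sparse coefficient vectors, while inputs well above the threshold are shrunk only slightly.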

SCOPUS Citations:	129 (checked on Jan 3, 2024)
Page view(s):	16 (checked on Jan 3, 2024)
