A Review Study of Microarray Data Classification with the Application of Dimension Reduction



  • Sharifah Nadia Syed Hasan, College of Computing, Informatics and Mathematics, UiTM Melaka Campus, Jasin Branch, Melaka, Malaysia
  • Noor Wahida Jamil, College of Computing, Informatics and Mathematics, UiTM Melaka Campus, Jasin Branch, Melaka, Malaysia


Keywords: Microarray, Classification, Dimensionality Reduction, Feature Extraction, Feature Selection


Background. The growth of gene expression (microarray) data, particularly in cancer research, has driven the development of feature selection techniques for handling complex, high-dimensional data. Advances in deoxyribonucleic acid (DNA) microarray technology have made it feasible to measure the expression levels of thousands of genes at once, which supports early disease detection. This study reviews and analyses the literature on applying dimensionality reduction approaches to the classification of microarray data. It is aimed at the Data Science and Medical Sciences disciplines, with the goal of supporting future research and broader interdisciplinary collaboration.

Methods. This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and is reported in accordance with the PRISMA statement. A systematic search was conducted in two databases, Scopus and Web of Science (WoS), covering 2018 to 2022, using the keywords "feature extraction," "feature selection," "classification," and "microarray." After applying the inclusion and exclusion criteria, 53 articles remained for review. Specifically, this study reports on the performance of feature selection approaches and on empirical comparisons of the classification techniques applied to microarray datasets.

Results. According to the analysis, most of the included articles propose hybrid or novel approaches for gene selection. Many of these methods were developed to achieve good accuracy and computational efficiency. The hybrid methods are reported to be effective in reducing dimensionality and selecting relevant features. Classical machine learning techniques remain the predominant choice among researchers for classification, despite the emergence of deep learning approaches.
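
To make the idea of a hybrid gene selection method concrete, the following is a minimal sketch of one common pattern: a cheap filter stage that ranks genes, followed by a wrapper stage that searches among the top-ranked genes using a classifier. It is not taken from any of the reviewed articles. The synthetic data, the scoring formula, the nearest-centroid classifier, and all function names and parameters (`make_synthetic`, `filter_scores`, `loo_accuracy`, `hybrid_select`, `top_k`, `max_genes`) are assumptions chosen for illustration.

```python
import random
import statistics

def make_synthetic(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Toy 'microarray': few samples, many genes; only the first genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y

def filter_scores(X, y):
    """Filter stage: absolute difference of class means over the summed class std."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.stdev(a) + statistics.stdev(b) + 1e-12
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            centroid = [statistics.mean(r[g] for r in rows) for g in genes]
            dist[lab] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def hybrid_select(X, y, top_k=20, max_genes=5):
    """Keep the top_k genes by filter score, then run greedy forward (wrapper) selection."""
    scores = filter_scores(X, y)
    candidates = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    selected, best = [], 0.0
    while len(selected) < max_genes:
        trial = {g: loo_accuracy(X, y, selected + [g])
                 for g in candidates if g not in selected}
        gene, acc = max(trial.items(), key=lambda kv: kv[1])
        if acc <= best:
            break  # no improvement: stop adding genes
        selected.append(gene)
        best = acc
    return selected, best

X, y = make_synthetic()
selected, accuracy = hybrid_select(X, y)
print("selected genes:", selected, "LOO accuracy:", accuracy)
```

Because the filter stage here sees all samples before the leave-one-out evaluation, the reported accuracy is optimistic. A fairer estimate would repeat both stages inside each cross-validation fold.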








How to Cite

Syed Hasan, S. N., & Jamil, N. W. (2024). A Review Study of Microarray Data Classification with the Application of Dimension Reduction. Journal of Computing Research and Innovation, 9(1), 235–256. Retrieved from https://jcrinn.com/index.php/jcrinn/article/view/424


