A Review Study of Microarray Data Classification with the Application of Dimension Reduction
Keywords:
Microarray, Classification, Dimensionality Reduction, Feature Extraction, Feature Selection
Abstract
Background. The growth of gene expression (microarray) data, mainly in cancer research, has been a game changer for feature selection techniques that handle complex data. Advances in deoxyribonucleic acid (DNA) microarray technology have made it feasible to measure the expression levels of thousands of genes, which supports early disease detection. This study reviews and analyses the literature on applying dimensionality reduction approaches to the classification of microarray data. It is aimed at the Data Science and Medical Sciences disciplines, with the goal of supporting future research and broader interdisciplinary collaboration.
Methods. This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and is reported in accordance with the PRISMA statement. A systematic search was conducted in two databases, Scopus and Web of Science (WoS), covering 2018 to 2022 and using the keywords "feature extraction," "feature selection," "classification," and "microarray." After the inclusion and exclusion criteria were applied, 53 articles remained for review. Specifically, this study reports on the performance of feature selection approaches and on empirical comparisons of the classification techniques applied to microarray datasets.
Results. According to the analysis, the included articles mostly propose hybrid and novel approaches for gene selection. Many of these methods were developed to achieve good accuracy and computational efficiency. Moreover, the hybrid methods proved effective in reducing dimensionality and selecting relevant features. In addition, machine learning techniques remain the main interest among researchers for classification, despite the emergence of deep learning approaches.
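To make the reviewed workflow concrete, the sketch below shows one generic way a feature selection step can precede classification on data with few samples and many genes. It is an illustration only: the Fisher score filter, the nearest-centroid classifier, the synthetic data, and every function name are assumptions chosen for brevity, not methods taken from any of the reviewed articles.

```python
import random
import statistics


def fisher_score(values, labels):
    """Two-class Fisher score of one gene: (mean0 - mean1)^2 / (var0 + var1)."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.pvariance(g0) + statistics.pvariance(g1)
    return (statistics.fmean(g0) - statistics.fmean(g1)) ** 2 / (spread + 1e-12)


def select_top_genes(X, y, k):
    """Filter step: rank genes by Fisher score and keep the indices of the k best."""
    n_genes = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Classification step: mean expression profile per class over the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in genes]
    return centroids


def predict(row, centroids, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(center):
        return sum((row[j] - c) ** 2 for j, c in zip(genes, center))
    return min(centroids, key=lambda label: dist(centroids[label]))


def make_synthetic(n_samples, n_genes, informative, rng):
    """Hypothetical data: only the `informative` genes carry a class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in informative:
            row[j] += 4.0 * label  # assumed effect size, for illustration only
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
informative = [5, 42, 137]  # hypothetical indices of signal-carrying genes
X_train, y_train = make_synthetic(40, 500, informative, rng)
X_test, y_test = make_synthetic(20, 500, informative, rng)

# Select genes on the training split only, then classify the held-out samples.
selected = select_top_genes(X_train, y_train, k=3)
centroids = fit_centroids(X_train, y_train, selected)
predictions = [predict(row, centroids, selected) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print("selected genes:", sorted(selected))
print("test accuracy:", accuracy)
```

Genes are ranked on the training split alone so that the held-out accuracy is not inflated by selection bias, a common pitfall when the number of genes far exceeds the number of samples.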
License
Copyright (c) 2024 Sharifah Nadia Syed Hasan, Noor Wahida Jamil (Author)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.