Bibliometric Analysis of Research on Firth Penalized Logistic Regression in Addressing Complete Separation

Authors

  • Nurul Husna Jamian, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Perak Branch, Tapah Campus, 35400 Tapah Road, Perak, Malaysia.
  • Ahmad Zia Ul-Saufie, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Malaysia.
  • Mohammad Nasir Abdullah, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Perak Branch, Tapah Campus, 35400 Tapah Road, Perak, Malaysia.

Keywords:

Firth Penalized Logistic Regression, Complete Separation, Logistic Regression, Bibliometric Analysis

Abstract

 

Complete separation in logistic regression leads to infinite parameter estimates, which prevents reliable inference. Firth's penalized likelihood method has emerged as a widely accepted solution that provides finite and more stable estimates. Despite its growing relevance, the global research on this topic is not yet thoroughly understood. This study conducts a bibliometric analysis of research trends on the use of Firth penalized regression to address complete separation in logistic regression. Bibliographic data were retrieved from the Scopus database and analysed using Microsoft Excel and VOSviewer. A structured search conducted on February 22, 2025 identified nine journal articles, published between 2012 and 2024, that met the inclusion criteria. The findings reveal a small but growing body of literature, reflecting the emerging status of this research area. Publications show an upward trend, particularly from 2019 onward, with the United States and Malaysia identified as the most productive countries. Influential articles contributed to methodological development and to applications in health and transportation research. Keyword co-occurrence analysis identified thematic clusters in human studies, statistical modelling, and estimation techniques. These findings provide an overview of publication trends, collaboration networks, and research gaps, which could support future methodological and multidisciplinary integration of Firth penalized regression.
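To make the separation problem concrete, the following is a minimal sketch, not taken from the article, of how Firth's penalty yields finite estimates on a completely separated toy data set. The function name `firth_logistic`, the toy data, and the plain Newton iteration without step-halving are illustrative assumptions; the update uses the standard modified score X'(y − π + h(1/2 − π)), where h holds the diagonal of the weighted hat matrix.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression via Newton steps on the modified score.

    Illustrative sketch only: no step-halving or other safeguards.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        w = pi * (1.0 - pi)                        # logistic variance weights
        info = X.T @ (X * w[:, None])              # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Hat-matrix diagonal: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score adds h_i * (1/2 - pi_i) to each residual
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical toy data: x < 0 always gives y = 0 and x > 0 always gives y = 1,
# so ordinary maximum likelihood would push the slope toward infinity.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])          # intercept plus one predictor

beta_hat = firth_logistic(X, y)
print("Firth estimates (intercept, slope):", beta_hat)
```

On such data the penalized slope stays finite, whereas the unpenalized likelihood has no finite maximizer. Production analyses would normally rely on an established implementation rather than this sketch.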

Published

2025-09-01

How to Cite

Jamian, N. H., Ul-Saufie, A. Z., & Abdullah, M. N. (2025). Bibliometric Analysis of Research on Firth Penalized Logistic Regression in Addressing Complete Separation. Journal of Computing Research and Innovation, 10(2), 266–280. Retrieved from https://jcrinn.com/index.php/jcrinn/article/view/535

Section

General Computing
