Bibliometric Analysis of Research on Firth Penalized Logistic Regression in Addressing Complete Separation
Keywords:
Firth Penalized Logistic Regression, Complete Separation, Logistic Regression, Bibliometric Analysis
Abstract
Complete separation in logistic regression drives the maximum likelihood estimates to infinity, which prevents reliable inference. Firth's penalized likelihood method has emerged as a widely accepted and reliable solution that yields finite and more stable estimates. Despite its growing relevance, the global research landscape on this topic remains poorly understood. This study conducts a bibliometric analysis of research on the use of Firth penalized regression to address complete separation in logistic regression. Bibliographic data were retrieved from the Scopus database through a structured search conducted on February 22, 2025, and analysed using Microsoft Excel and VOSviewer. After the inclusion criteria were applied, nine journal articles published between 2012 and 2024 were retained. The findings reveal a small but growing body of literature, reflecting the emerging status of this research area. Publications show an upward trend, particularly from 2019 onward, with the United States and Malaysia identified as the most productive countries. Influential articles contributed to methodological development and to applications in health and transportation research. Keyword co-occurrence analysis identified thematic clusters in human studies, statistical modelling, and estimation techniques. These findings provide an overview of publication trends, collaboration networks, and research gaps, which could support future methodological development and multidisciplinary integration of Firth penalized regression.
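To make the separation problem described in the abstract concrete, the following is a minimal sketch of Firth-type penalized logistic regression on a small, completely separated data set. It is not code from the study: the function name `firth_logistic`, the toy data, and the plain Fisher-scoring loop (no step-halving or step-size cap) are illustrative assumptions. The update uses the modified score, which adds the term h_i(1/2 - pi_i) to each residual, where h_i is the i-th diagonal element of the weighted hat matrix.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Hypothetical helper: fit a logistic model with Firth's penalty.

    Iterates Fisher scoring on the modified score
    U*(beta) = X' (y - pi + h * (0.5 - pi)),
    where h holds the diagonal of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])  # start from all-zero coefficients
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        w = pi * (1.0 - pi)                     # logistic variance weights
        info = X.T @ (w[:, None] * X)           # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - pi + h * (0.5 - pi))  # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data (assumed for illustration): x <= 3 always gives y = 0 and
# x >= 4 always gives y = 1, so the outcome is completely separated.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor

beta = firth_logistic(X, y)
print("Firth estimates (intercept, slope):", beta)
```

On data like these, ordinary maximum likelihood has no finite solution because the slope can grow without bound, whereas the penalized fit should return finite coefficients. For applied work, an established implementation with step control is preferable to this sketch.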
License
Copyright (c) 2025 Nurul Husna Jamian, Ahmad Zia Ul-Saufie, Mohammad Nasir Abdullah (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.