Logistic regression and learning personalization in higher education

Authors

K. K. Ruiz Mendoza, L. H. Pedroza Zúñiga

DOI:

https://doi.org/10.56183/iberoeds.v5i1.708

Keywords:

logistic regression, extrapolation, personalized learning, higher education

Abstract

This study focuses on the early prediction of academic performance in higher education students to personalize the learning process and enable timely interventions. Logistic Regression, widely used for its interpretability and effectiveness, serves as a starting point to assess its validity and utility in the context of higher education. A dataset of 10,184 students from the Universidad Autónoma de Baja California was analyzed. Three variable configurations (Basic, Complete, and Exam) and three classification algorithms (Logistic Regression, Naive Bayes, and Decision Tree) were compared using five-fold cross-validation and random sampling (90% training, 10% testing). Accuracy, Recall, F1 Score, and AUC-ROC were employed as evaluation metrics. Logistic Regression (Basic configuration) achieved the best metrics, yielding a Recall near 0.88 and an AUC-ROC around 0.72–0.76, outperforming Naive Bayes and Decision Tree. High school GPA emerged as the most influential variable, followed by Writing scores. These findings highlight the potential of Logistic Regression for early risk detection and learning personalization, although further investigation is warranted to address predictive fairness and incorporate socio-emotional factors that ensure a more inclusive and effective educational approach.
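The evaluation protocol described in the abstract (a logistic regression classifier scored by Recall and AUC-ROC under five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the data below are synthetic, the two features (a standardized high school GPA and a Writing score) and their generating coefficients are assumptions made purely for illustration, and the helper names (`fit_logistic`, `five_fold_cv`, and so on) are hypothetical. It uses only the Python standard library so that it runs as-is; a study of this kind would more likely rely on an established machine learning library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def recall(y_true, y_pred):
    # Share of actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def five_fold_cv(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and return (recall, auc) per held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        probs = predict_proba(model, [X[i] for i in test_idx])
        preds = [1 if p >= 0.5 else 0 for p in probs]
        results.append((recall(y_test, preds), auc_roc(y_test, probs)))
    return results


# Synthetic stand-in data (assumption): two standardized features and a binary
# outcome drawn from a logistic model with made-up coefficients.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    gpa, writing = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([gpa, writing])
    y.append(1 if rng.random() < sigmoid(1.5 * gpa + 0.5 * writing) else 0)

scores = five_fold_cv(X, y)
mean_recall = sum(r for r, _ in scores) / len(scores)
mean_auc = sum(a for _, a in scores) / len(scores)
print(f"mean recall = {mean_recall:.2f}, mean AUC-ROC = {mean_auc:.2f}")
```

The numbers printed by this sketch describe the synthetic data only and are not comparable to the Recall and AUC-ROC values reported in the abstract. Averaging the metrics over the five held-out folds mirrors the general cross-validation idea; the study's additional 90%/10% random split and its Naive Bayes and Decision Tree baselines are not reproduced here.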

Published

2025-06-05

How to Cite

Ruiz Mendoza, K. K., & Pedroza Zúñiga, L. H. (2025). Logistic regression and learning personalization in higher education. Ibero-American Journal of Education & Society Research, 5(1), e25004. https://doi.org/10.56183/iberoeds.v5i1.708