abstract
- This study proposes a regression model, based on the random forest algorithm, for early prediction of the global academic performance of undergraduates at a polytechnic higher education institution. Rather than restricting the prediction to a single degree course, as is usual, the study covers the entire student population of an institution composed of five schools. The aim is to provide the institution with a single tool capable of capturing both the heterogeneity of its student population and its educational dynamics. A different approach to feature selection is also proposed: it allows entire categories of predictive variables to be excluded, so the model remains useful in scenarios where not all of the data categories considered are collected. The model can be used at a central level by the decision-makers responsible for designing actions to mitigate academic failure.
- This work was supported by the Portuguese Foundation for Science and Technology (FCT) under Project UID/EEA/04131/2013. The authors would also like to thank the Polytechnic Institute of Bragança for making available the data analysed in this study.