Abstract
Aspect-Based Sentiment Analysis (ABSA) has emerged as a powerful tool for deriving actionable insights from qualitative feedback in education. This study presents a multitask learning framework that analyzes student evaluations of teaching (SET) by extracting and classifying opinions on specific aspects of teaching performance. Leveraging a novel dataset of 6,025 Spanish-language comments, the first such dataset to be open-sourced, the proposed framework integrates opinion segmentation and multi-label classification to capture nuanced feedback on nine predefined aspects, such as "Teaching Quality" and "Classroom Atmosphere." The approach applies beyond SET analysis, offering insights for course improvement, faculty assessment, and institutional decision-making in higher education. The paper compares fine-tuned transformers (BERT and RoBERTa) with large language models (LLMs), including GPT-4o, GPT-4o-mini, and Llama-3.1-8B, using both fine-tuning and few-shot Chain-of-Thought (CoT) prompting. Fine-tuned GPT-4o outperformed all other models, achieving a weighted F1-score of 0.69 for positive aspects and 0.79 for negative aspects, while few-shot CoT approaches demonstrated competitive performance with greater scalability and interpretability. Our findings demonstrate the framework's potential to transform unstructured feedback into structured insights, aiding educators and institutions in enhancing teaching quality and student engagement.
This article is part of the special issue “Transforming Education in the 21st Century: Foresight and Sustainable Development.”
Guest Editors: Asad Abbas (Writing Lab, Institute for the Future of Education, Tecnologico de Monterrey, Monterrey, Mexico), Ahsan Ali (Zhejiang Sci-Tech University, Hangzhou, China), Jose Luis Martin-Nuñez (Instituto de Ciencias de la Educación, Universidad Politécnica de Madrid, Madrid, Spain), Mehul Mahrishi (Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur, India)

This work is licensed under a Creative Commons Attribution 4.0 International License.
