Analyzing the Use of Artificial Intelligence in EFL Listening Assessment
DOI: https://doi.org/10.21009/ishel.v1i1.57910
Abstract
As AI integration in language assessment grows, concerns persist about its ability to accurately evaluate complex listening skills such as pragmatic competence. This research examines the use of artificial intelligence (AI) in assessing English as a Foreign Language (EFL) listening skills. Using AI-generated questions administered to 38 English Education students, it analyzed the technical quality of the questions, their difficulty levels, and the reliability of the test. Findings reveal considerable variability in question difficulty, low reliability (Cronbach's alpha = 0.02), and the need to revise questions with low point-biserial correlations. Questionnaire responses also show mixed perceptions of AI's role in language assessment. While AI holds promise for making assessment more efficient and personalized, the study emphasizes the need for a critical approach to its implementation, including further research with larger and more culturally diverse samples and the development of more advanced algorithms that better capture sociocultural and pragmatic dimensions. Future work should explore hybrid models that combine AI and human evaluation to improve the fairness, reliability, and validity of language assessments.
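The abstract reports three classical item-analysis statistics: item difficulty, point-biserial correlation, and Cronbach's alpha. The following is a minimal sketch of how these are commonly computed for dichotomously scored (0/1) multiple-choice items. It assumes a simple examinee-by-item response matrix and uses the corrected (item-rest) form of the point-biserial; the function names, the 0.20 revision cutoff, and the toy response matrix are hypothetical illustrations, not the authors' procedure or data.

```python
import math

def item_difficulty(responses):
    """Classical difficulty index: proportion of examinees answering each item correctly."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]

def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def point_biserial(responses, item):
    """Corrected point-biserial: 0/1 item score vs. total score on the OTHER items."""
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_totals)

def cronbach_alpha(responses):
    """Cronbach's alpha (equal to KR-20 for 0/1 items):
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    n = len(responses)

    def pvar(values):
        # Population variance; the same denominator is used for items and totals.
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / n

    item_vars = [pvar([row[j] for row in responses]) for j in range(k)]
    total_var = pvar([sum(row) for row in responses])
    if total_var == 0:
        return float("nan")  # alpha is undefined when every examinee has the same total
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def flag_items(responses, min_rpb=0.20):
    """Hypothetical helper: flag items whose corrected point-biserial is below a
    rule-of-thumb cutoff (0.20 is an assumption, not the study's criterion)."""
    flagged = []
    for j in range(len(responses[0])):
        r = point_biserial(responses, j)
        if math.isnan(r) or r < min_rpb:
            flagged.append(j)
    return flagged

# Invented 0/1 response matrix (5 examinees x 4 items) for illustration only;
# these are NOT the study's data.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]

print("difficulty:", item_difficulty(responses))
print("alpha:", round(cronbach_alpha(responses), 3))
print("flag for revision:", flag_items(responses))
```

Because the variance of the total score equals the sum of the item variances plus twice the sum of the inter-item covariances, an alpha close to zero, such as the 0.02 reported here, indicates that the items barely covary, that is, they are not measuring a common construct consistently.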
License
Copyright (c) 2025 Satrio Aji Pramono, Annisa Nurul Ilmi, Ihtiara Fitrianingsih, Amrih Bekti Utami

This work is licensed under a Creative Commons Attribution 4.0 International License.