Analyzing the Use of Artificial Intelligence in EFL Listening Assessment

Satrio Aji Pramono; Annisa Nurul Ilmi; Ihtiara Fitrianingsih; Amrih Bekti Utami

doi:10.21009/ishel.v1i1.57910

Authors

Satrio Aji Pramono, Universitas Negeri Yogyakarta
Annisa Nurul Ilmi, UNY
Ihtiara Fitrianingsih, Universitas Negeri Yogyakarta
Amrih Bekti Utami, Universitas Negeri Yogyakarta


Abstract

As the integration of artificial intelligence (AI) in language assessment rises, concerns persist about its ability to accurately evaluate complex listening skills such as pragmatic competence. This study examines the use of AI in assessing English as a Foreign Language (EFL) listening skills. Using AI-generated questions administered to 38 English Education students, it analyzed the technical quality of the questions, their difficulty levels, and the reliability of the test. Findings reveal significant variability in question difficulty, low reliability (Cronbach's alpha = 0.02), and a need to revise questions because of low point-biserial correlations. Questionnaire responses also reveal mixed perceptions of AI's role in language assessment. While AI holds promise for making assessment more efficient and personalized, the study emphasizes the need for a critical approach to its implementation, including further research with larger, more culturally diverse samples and the development of more advanced algorithms that better capture sociocultural and pragmatic nuances. Future work should explore hybrid models that combine AI and human evaluation to improve the fairness, reliability, and validity of language assessments.
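
The abstract reports three classical item-analysis statistics: item difficulty, point-biserial correlation, and Cronbach's alpha. As a hedged illustration of how these are commonly computed from dichotomously scored (0/1) answers, the sketch below uses a small hypothetical response matrix, not the study's data. It assumes the uncorrected item-total form of the point-biserial (the item is included in the total) and an illustrative 0.20 revision cut-off; the abstract does not state which software or thresholds the authors used.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical 0/1 scores (rows = test takers, columns = items).
# Illustrative numbers only; these are NOT data from the study.
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]


def item_difficulty(responses):
    """Classical difficulty index: proportion of test takers answering each item correctly."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def point_biserial(responses, item):
    """Point-biserial correlation between one 0/1 item and the total score.

    Uses the uncorrected total (the item itself is counted in the total).
    Returns NaN when the item or the total score has zero variance.
    """
    totals = [sum(row) for row in responses]
    scores = [row[item] for row in responses]
    p = mean(scores)
    sd_total = pstdev(totals)
    if p in (0, 1) or sd_total == 0:
        return float("nan")
    mean_right = mean(t for t, s in zip(totals, scores) if s == 1)
    mean_wrong = mean(t for t, s in zip(totals, scores) if s == 0)
    # r_pb = (M1 - M0) / s_total * sqrt(p * q), with population SD
    return (mean_right - mean_wrong) / sd_total * (p * (1 - p)) ** 0.5


def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Flag items whose point-biserial falls below the assumed 0.20 cut-off
    # (NaN values are also flagged, since "not r >= 0.20" is True for NaN).
    for j, p in enumerate(item_difficulty(RESPONSES)):
        r = point_biserial(RESPONSES, j)
        flag = "revise" if not r >= 0.20 else "ok"
        print(f"item {j + 1}: difficulty={p:.2f}  r_pb={r:.2f}  {flag}")
    print(f"Cronbach's alpha = {cronbach_alpha(RESPONSES):.2f}")
```

Because alpha grows with inter-item covariance, a set of items that barely correlate with one another, as low point-biserial values suggest, will tend to yield an alpha near zero, which is consistent with the pattern the abstract describes.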

Author Biographies

Satrio Aji Pramono, Universitas Negeri Yogyakarta

English Education Study Program, Faculty of Languages, Arts, and Culture

Ihtiara Fitrianingsih, Universitas Negeri Yogyakarta

English Education Study Program, Faculty of Languages, Arts, and Culture

Amrih Bekti Utami, Universitas Negeri Yogyakarta

English Education Study Program, Faculty of Languages, Arts, and Culture


Proceeding of International Seminar on Humanity, Education and Language