Correlation Between Automatic Short Answer Scoring and Manual Scoring by Teacher on Indonesian Assessments

Ravika Ayu; Dade Nurjanah

doi:10.21009/jtp.v27i2.48378

Authors

Ravika Ayu, Informatics, School of Computing, Telkom University, Bandung, Indonesia
Dade Nurjanah, Informatics, School of Computing, Telkom University, Bandung, Indonesia

DOI:

https://doi.org/10.21009/jtp.v27i2.48378

Keywords:

Automated Short Answer Scoring, Sentence Embedding, Sentence Transformers

Abstract

Assessment is an evaluation tool used in the teaching and learning process to determine the quality of learning. One assessment method that is fairly complicated to grade is the essay test. Essay tests take more time to correct and have low validity and reliability, because essay grading can be influenced by subjective factors. An application that corrects essay answers automatically is therefore needed to help teachers grade quickly and more objectively; this is known as Automated Short Answer Scoring (ASAS). This research uses a sentence embedding method to measure the semantic similarity between the answer key and the student's answer. Two data sets are used. The first consists of 1200 pairs of answer keys and student answers and is used to train the model. The second, totaling 250 words, is used to evaluate the model. Before sentence embedding, each answer goes through a pre-processing stage: removing stop words, removing empty entries, case folding, deleting numbers, and deleting punctuation. The system is tested by comparing its scores with the scores given conventionally by teachers, using a correlation coefficient. The test yielded a Pearson correlation coefficient of 0.81, which the study interprets as 81% similarity with the human rater's assessment. This study can help teachers assess more efficiently.
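
The steps named in the abstract (pre-processing, similarity between answer-key and student-answer embeddings, and Pearson correlation against teacher scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the function names, and the use of cosine similarity as the comparison measure are all assumptions, and the embedding vectors are expected to come from whatever sentence embedding model is used (no model is loaded here).

```python
import math
import re
import string

# Hypothetical, very small Indonesian stop-word list, for illustration only.
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "adalah", "itu", "ini"}


def preprocess(answer: str) -> str:
    """Case folding, number removal, punctuation removal, stop-word removal."""
    text = answer.lower()                      # case folding
    text = re.sub(r"\d+", " ", text)           # delete numbers
    # Replace every ASCII punctuation character with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def clean_answers(answers):
    """Pre-process all answers and drop those that end up empty."""
    cleaned = (preprocess(a) for a in answers)
    return [c for c in cleaned if c]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between system scores and teacher scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (dev_x * dev_y)
```

In use, each cleaned student answer and its answer key would be embedded, `cosine_similarity` would turn each embedding pair into a system score, and `pearson` would compare the list of system scores with the teacher's scores for the same answers.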

License

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
