Correlation Between Automatic Short Answer Scoring and Manual Scoring by Teacher on Indonesian Assessments
DOI: https://doi.org/10.21009/jtp.v27i2.48378
Keywords: Automated Short Answer Scoring, Sentence Embedding, Sentence Transformers
Abstract
Assessment is an evaluation tool in the teaching and learning process used to determine the quality of learning. One assessment method that is fairly complicated to score is the essay test. Essay tests take considerable time to mark and tend to have low validity and reliability, because essay scoring can be influenced by subjective factors. An application that corrects essay answers automatically is therefore needed to help teachers score answers faster and with more objective results; this is known as Automated Short Answer Scoring (ASAS). This research uses a sentence embedding method to measure the semantic similarity between the answer key and the student's answer. Two data sets are used. The first consists of 1200 pairs of answer keys and student answers and is used to train the model. The second, totaling 250 words, is used to evaluate the model. Before sentence embedding, each answer goes through a pre-processing stage: stop-word removal, removal of empty entries, case folding, number removal, and punctuation removal. The system is tested by comparing its scores with conventional scores assigned by teachers, using a correlation coefficient. The test yields a Pearson correlation coefficient of 0.81, from which the study concludes that the system reaches 81% similarity with human rater assessment. This study can help teachers assess more efficiently.
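The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list is a tiny hypothetical sample, the function names `preprocess` and `pearson` are invented for illustration, and the step order (case folding first, so that stop-word matching is case-insensitive) is an assumption. The sentence embedding model itself is left out; the sketch only shows how answers might be cleaned and how system scores could be correlated with teacher scores.

```python
import math
import re
import string

# Hypothetical, tiny Indonesian stop-word sample (illustration only;
# the stop-word list used in the study is not given in the abstract).
STOP_WORDS = {"yang", "dan", "di", "ke", "dari", "adalah", "itu", "ini"}

# Translation table that maps every ASCII punctuation mark to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(answer: str) -> str:
    """Clean one answer: case folding, number, punctuation, stop-word removal."""
    text = answer.lower()                    # case folding
    text = re.sub(r"\d+", " ", text)         # delete numbers
    text = text.translate(_PUNCT_TO_SPACE)   # delete punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)                  # "" signals an empty answer


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

In use, answers whose cleaned form is the empty string would be dropped before embedding, and `pearson(system_scores, teacher_scores)` would be computed over the evaluation set; a value close to 1 indicates that the system ranks answers much as the teacher does.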
License
Copyright (c) 2025 Ravika Ayu, Dade Nurjanah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Jurnal Teknologi Pendidikan is an Open Access Journal. The authors who publish the manuscript in Jurnal Teknologi Pendidikan agree to the following terms.
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.




