Item Analysis of a Situational Judgement Test of Leadership Competency for State-Owned Enterprise (BUMN) Employees Using the Rasch Model


  • Ni Ketut Laksmi Kusuma, Faculty of Psychology, Universitas Sebelas Maret
  • Moh. Abdul Hakim, Faculty of Psychology, Universitas Sebelas Maret



item analysis, situational judgement test, leadership competency, competency assessment, state-owned enterprise employee assessment


Human resource competency assessment plays a crucial role in organizations, particularly at PT X, a state-owned enterprise (BUMN) in Indonesia's transportation sector. PT X developed a leadership competency test in the form of a Situational Judgement Test (SJT), which is expected to measure competencies objectively through realistic work scenarios. This study involved 2,368 PT X employees from the level 1 job group. Item analysis was carried out on 48 items measuring seven leadership competencies at competency level 1, with the aim of improving test quality by revising or removing unsuitable items. Fifteen items were answered correctly by more than 50% of respondents, whereas 33 items were answered correctly by fewer than 50%. Although most items had a good difficulty level (-2 ≤ b ≤ +2), several items, such as DLE 1.3.3, DLE 1.2.2, and DEX 1.2.2, had a less suitable difficulty level. The most precise measurement was found for items such as SOR 1.2.1, SOR 1.3.1, SOR 1.1.1, and SOR 1.1.2, whereas DEX 1.2.2 was measured less precisely. Fit evaluation (Infit and Outfit) showed that all items fell within the acceptable range (0.5 to 1.5) for the competencies they measure, which supports the dependability of the test. The Wright Map shows that the DEX competency items cover the full range of ability; the DLE items capture respondents of average to high ability; the Strategic Orientation (SOR), Developing Organizational Capabilities (DOC), and Leading Change (LCH) items capture average ability; and the Global Business Savvy (GBS) and Managing Diversity (MDI) items capture average and below-average ability. The study concludes that this SJT has good item quality and can be relied on for ongoing competency assessment at PT X.
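To make the screening criteria above concrete, the following is a minimal sketch of how the dichotomous Rasch model and the two criteria reported in the abstract (difficulty within -2 to +2 logits, Infit and Outfit mean-squares within 0.5 to 1.5) could be computed. It is an illustration only, not the authors' procedure: the abstract does not state which software was used, and the function names, the toy responses, and the assumption that ability and difficulty estimates are already available are all hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean-square statistics for one item.

    responses: 0/1 scores, one per respondent
    thetas:    ability estimates in logits, in the same order
    b:         item difficulty in logits
    """
    sq_resid_sum = 0.0  # sum of squared raw residuals
    var_sum = 0.0       # sum of model variances p * (1 - p)
    z2_sum = 0.0        # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        resid = x - p
        sq_resid_sum += resid ** 2
        var_sum += var
        z2_sum += resid ** 2 / var
    infit = sq_resid_sum / var_sum    # information-weighted mean-square
    outfit = z2_sum / len(responses)  # unweighted mean-square
    return infit, outfit


def screen_item(b, infit, outfit, b_range=(-2.0, 2.0), fit_range=(0.5, 1.5)):
    """Flag an item against the ranges quoted in the abstract."""
    return {
        "difficulty_ok": b_range[0] <= b <= b_range[1],
        "fit_ok": all(fit_range[0] <= v <= fit_range[1] for v in (infit, outfit)),
    }


# Toy data (hypothetical): four respondents of average ability, one item at b = 0.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], b=0.0)
flags = screen_item(0.0, infit, outfit)
```

In practice the ability and difficulty estimates would come from a Rasch calibration of the full response matrix; the sketch only shows how the reported thresholds are applied once those estimates exist.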




