Psychometric properties of the formative assessment test for the statistics course

Authors

  • Helli Ihsan Universitas Negeri Jakarta, Universitas Pendidikan Indonesia
  • Wardani Rahayu Universitas Negeri Jakarta
  • Riyan Arthur Universitas Negeri Jakarta

DOI:

https://doi.org/10.64014/jik.v23i2.325

Keywords:

classical test theory, formative assessment, item response theory, IRT, psychological statistics

Abstract

This study is motivated by the importance of formative assessment in enhancing students’ understanding and engagement in Psychological Statistics learning, as well as the limited availability of psychometrically sound instruments. The study aims to develop a structured formative assessment instrument that is valid and reliable, and to analyze item characteristics using Classical Test Theory (CTT) and Item Response Theory (IRT). This research employed an instrument development design involving 191 first-year psychology students. The initial instrument consisted of 40 multiple-choice items constructed based on learning outcomes and validated through expert judgment using Aiken’s V. Data analysis was conducted in stages, including item-total correlation analysis, IRT 1PL for item difficulty, and IRT 2PL for both difficulty and discrimination parameters. The results showed that after the selection process, 29 items met the psychometric criteria, with difficulty levels predominantly in the easy-to-moderate range, good discrimination indices, and high reliability. These findings indicate that the developed instrument accurately and consistently measures students’ understanding and can be effectively used as a formative assessment tool to support statistics learning in higher education.
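The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the rating scale bounds, and the toy data are assumptions made for illustration. The abstract does not say whether the item-total correlation was corrected, so the version here, which excludes the item from the total, is also an assumption. Only the standard formulas are used: Aiken's V = sum(r - lo) / (n(c - 1)), the Pearson correlation, and the logistic item response function, where fixing the discrimination a at 1 gives the 1PL form.

```python
import math


def aikens_v(ratings, lo, hi):
    """Aiken's V for one item: sum(r - lo) / (n * (hi - lo)).

    ratings: expert ratings for the item; lo and hi are the lowest and
    highest points of the rating scale (assumed values, not from the paper).
    """
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


def corrected_item_total(responses, item):
    """Pearson correlation between one item and the total of the other items.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    Returns NaN when either variable has zero variance.
    """
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]  # total without the item
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def p_correct(theta, b, a=1.0):
    """Probability of a correct response under the 2PL model.

    theta: examinee ability, b: item difficulty, a: item discrimination.
    With a fixed at 1.0 this reduces to the 1PL form.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Toy data (hypothetical): three experts rating one item on a 1-5 scale,
# and four examinees answering three dichotomous items.
expert_ratings = [4, 5, 5]
toy_responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]

v = aikens_v(expert_ratings, lo=1, hi=5)
r = corrected_item_total(toy_responses, item=0)
p_1pl = p_correct(theta=0.0, b=0.0)
p_2pl = p_correct(theta=1.0, b=0.0, a=2.0)
```

In practice the difficulty and discrimination parameters are estimated from the response data with dedicated IRT software; the function above only evaluates the model for given parameter values.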

 



Published

2026-05-29

How to Cite

Ihsan, H., Rahayu, W., & Arthur, R. (2026). Psychometric properties of the formative assessment test for the statistics course. Inovasi Kurikulum, 23(2), 409-426. https://doi.org/10.64014/jik.v23i2.325
