Quality of Scholastic Tests in the New Student Admission Selection (SPMB) at Universitas Negeri Surabaya in the 2023 Academic Year

Authors

  • Tri Rijanto, State University of Surabaya, East Java, Indonesia
  • Edy Sulistyo, State University of Surabaya, East Java, Indonesia
  • Joko, State University of Surabaya, East Java, Indonesia
  • Puput Wanarti Rusimamto, State University of Surabaya, East Java, Indonesia

DOI:

https://doi.org/10.58526/jsret.v5i2.984

Keywords:

admission testing, psychometric evaluation, test reliability, item analysis, Classical Test Theory

Abstract

Scholastic tests play a critical role in university admission systems, serving as gatekeeping mechanisms that influence both institutional quality and educational equity. Despite their widespread use, empirical evaluations of institutional admission test quality remain limited, particularly in developing country contexts. This study aimed to comprehensively evaluate the psychometric quality of the Scholastic Potential and Basic Ability Test (SPMB) administered at Universitas Negeri Surabaya during the 2023 academic year, examining reliability, item difficulty, discrimination indices, and distractor effectiveness. A quantitative descriptive research design using ex-post facto analysis was employed. The study analyzed test response data from 270 candidates who completed the 45-item SPMB, consisting of three subtests: Verbal Ability (15 items), Numerical and Reasoning Ability (15 items), and Figural Comprehension Ability (15 items). Data analysis utilized Classical Test Theory frameworks, calculating Kuder-Richardson Formula 20 (KR-20) reliability coefficients, item difficulty indices (p-values), point-biserial discrimination coefficients (rpbis), upper-lower 27% discrimination indices (D), and distractor effectiveness metrics using SPSS 26.0 and ITEMAN 4.3 software. The total test demonstrated good internal consistency reliability (KR-20 = 0.84) with a mean score of 25.84 (SD = 6.78, 57.42% of maximum). Approximately 62% of items exhibited optimal moderate difficulty (0.40 ≤ p < 0.80), and 73% demonstrated good-to-excellent discrimination (rpbis ≥ 0.30). However, three items showed poor discrimination (rpbis < 0.20), 22 distractors were non-functional (16.30%), and six distractors exhibited problematic positive discrimination (4.44%). Subtest reliabilities ranged from 0.70 to 0.75, classified as acceptable. 
The SPMB demonstrated generally satisfactory psychometric quality but requires targeted improvements through systematic item revision, enhanced item writer training, and continuous quality monitoring. Findings provide actionable guidance for evidence-based test refinement and contribute empirical evidence to admission testing literature in Southeast Asian higher education contexts.


References

Ackerman, T., Ma, Y., Ma, M., Pacico, J. C., Wang, Y., Xu, G., Ye, T., Zhang, J., & Zheng, M. (2022). Item Response Theory. In International Encyclopedia of Education: Fourth Edition. https://doi.org/10.1016/B978-0-12-818630-5.10010-7

Brookhart, S. M., & McMillan, J. H. (2019). Classroom Assessment and Educational Measurement. In Classroom Assessment and Educational Measurement. https://doi.org/10.4324/9780429507533

Camara, W. J., & Echternacht, G. (2000). The SAT I and High School Grades: Utility in Predicting Success in College. College Board Research Report.

Păun, C.-N., & Costea, A. (2025). The Impact of Teacher Quality on Student Achievement: A Quantitative Analysis. International Journal of Advances in Engineering and Management. https://doi.org/10.35629/5252-0707368376

DiCerbo, K. (2019). Psychometric Methods: Theory into Practice. Measurement: Interdisciplinary Research and Perspectives. https://doi.org/10.1080/15366367.2018.1521190

Downing, S. M. (2004). Reliability: On the reproducibility of assessment data. Medical Education.

Glewwe, P., & Kremer, M. (2006). Chapter 16 Schools, Teachers, and Education Outcomes in Developing Countries. In Handbook of the Economics of Education. https://doi.org/10.1016/S1574-0692(06)02016-2

Gronlund, N. E. (1965). Measurement and Evaluation in Teaching. Macmillan.

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. In Developing and Validating Test Items. https://doi.org/10.4324/9780203850381

Hambleton, R. K., & Swaminathan, H. (2013). Item Response Theory: Principles and Applications. Springer.

Hingorjo, M. R., & Jaleel, F. (2012). Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. Journal of the Pakistan Medical Association.

Jeffrey, R. (2017). Validity in educational and psychological assessment. Educational Review. https://doi.org/10.1080/00131911.2017.1291210

Kellaghan, T., & Greaney, V. (2019). Public Examinations Examined. In Public Examinations Examined. https://doi.org/10.1596/978-1-4648-1418-1

Kuncel, N. R., & Hezlett, S. A. (2010). Fact and fiction in cognitive ability testing for admissions and hiring decisions. Current Directions in Psychological Science. https://doi.org/10.1177/0963721410389459

Lane, S. (2015). Handbook of Test Development. In Handbook of Test Development. https://doi.org/10.4324/9780203102961

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist. https://doi.org/10.1037/0003-066X.50.9.741

Tavakol, M., & Dennick, R. (2011). Making Sense of Cronbach’s Alpha. International Journal of Medical Education.

Mountford-Zimdars, A. (2018). Who gets in?: strategies for fair and effective college admissions. British Journal of Educational Studies.

Nguyen, T. D., Cannata, M., & Miller, J. (2018). Understanding student behavioral engagement: Importance of student interaction with peers and teachers. Journal of Educational Research. https://doi.org/10.1080/00220671.2016.1220359

Paniagua, M., & Swygert, K. (2016). Constructing Written Test Questions for the Basic and Clinical Sciences. National Board of Medical Examiners.

Pekrun, R., Lichtenfeld, S., Marsh, H. W., Murayama, K., & Goetz, T. (2017). Achievement Emotions and Academic Performance: Longitudinal Models of Reciprocal Effects. Child Development. https://doi.org/10.1111/cdev.12704

Posselt, J. R. (2016). Inside Graduate Admissions. In Inside Graduate Admissions. https://doi.org/10.4159/9780674915640

Rupp, A. A., & Böhme, K. (2008). Handbook of Test Development. International Journal of Testing. https://doi.org/10.1080/15305050701813433

Sackett, P. R., Kuncel, N. R., Arneson, J. J., Cooper, S. R., & Waters, S. D. (2009). Does Socioeconomic Status Explain the Relationship Between Admissions Tests and Post-Secondary Academic Performance? Psychological Bulletin. https://doi.org/10.1037/a0013978

Sackett, P. R., Kuncel, N. R., Beatty, A. S., Rigdon, J. L., Shen, W., & Kiger, T. B. (2012). The Role of Socioeconomic Status in SAT-Grade Relationships and in College Admissions Decisions. Psychological Science. https://doi.org/10.1177/0956797612438732

Schult, J., & Sparfeldt, J. R. (2016). Do non-g factors of cognitive ability tests align with specific academic achievements? A combined bifactor modeling approach. Intelligence. https://doi.org/10.1016/j.intell.2016.08.004

Sireci, S., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema. https://doi.org/10.7334/psicothema2013.256

Sireci, S. G. (2021). Valuing Educational Measurement. Educational Measurement: Issues and Practice.

Tarrant, M., Ware, J., & Mohammed, A. M. (2009). An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Medical Education. https://doi.org/10.1186/1472-6920-9-40

Vahrenhold, J., & Paul, W. (2014). Developing and validating test items for first-year computer science courses. Computer Science Education. https://doi.org/10.1080/08993408.2014.970782

Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement.

Zainuddin, Z., Shujahat, M., Haruna, H., & Chu, S. K. W. (2020). The role of gamified e-quizzes on student learning and engagement: An interactive gamification solution for a formative assessment system. Computers & Education. https://www.sciencedirect.com/science/article/pii/S0360131519302829

Zumbo, B. D. (2007). Three Generations of DIF Analyses: Considering Where It Has Been, Where It Is Now, and Where It Is Going. Language Assessment Quarterly. https://doi.org/10.1080/15434300701375832

Published

2026-04-20

How to Cite

Rijanto, T., Sulistyo, E., Joko, & Rusimamto, P. W. (2026). Quality of Scholastic Tests in the New Student Admission Selection (SPMB) at Universitas Negeri Surabaya in the 2023 Academic Year. Journal of Scientific Research, Education, and Technology (JSRET), 5(2), 1071–1088. https://doi.org/10.58526/jsret.v5i2.984