Similarity of the cut score in test sets with different item amounts using the modified Angoff, modified Ebel, and Hofstee standard-setting methods for the Korean Medical Licensing Examination
Author(s) -
Janghee Park,
Mi Kyoung Yim,
Na Jin Kim,
Duck Sun Ahn,
YoungMin Kim
Publication year - 2020
Publication title -
Journal of Educational Evaluation for Health Professions
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.397
H-Index - 9
ISSN - 1975-5937
DOI - 10.3352/jeehp.2020.17.28
Subject(s) - statistics , similarity , test , reliability , significant difference , item analysis , psychology , psychometrics
Purpose - The Korea Medical Licensing Examination (KMLE) typically contains a large number of items. The purpose of this study was to investigate whether the cut score differs between a standard-setting exercise that evaluates all items of the examination and one that evaluates only a subset of items.

Methods - We divided the item sets from the 3 most recent KMLEs into 4 subsets per year, each containing 25% of the items, stratified by item content category, discrimination index, and difficulty index. The entire panel of 15 members assessed all 360 items (100%) of the 2017 examination. Using the same stratification method, each set in split-half set 1 contained 184 items (51%) from the 2018 examination, and each set in split-half set 2 contained 182 items (51%) from the 2019 examination. We used the modified Angoff, modified Ebel, and Hofstee methods in the standard-setting process.

Results - With the same standard-setting method, the cut scores derived from stratified item subsets containing 25%, 51%, or 100% of the entire set differed by less than 1%. Rater reliability was higher when fewer items were rated.

Conclusion - When the entire item set was divided into equivalent subsets, assessing the examination using a portion of the item set (90 out of 360 items) yielded cut scores similar to those derived from the entire item set, with a higher correlation between panelists' individual assessments and the overall assessments.
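To illustrate the kind of computation behind the modified Angoff method named above, the sketch below averages panelists' item-level probability judgments into a cut score. The ratings are invented for illustration and do not come from the study; this is a minimal sketch of the general Angoff procedure, not the authors' exact protocol.

```python
from statistics import mean

# Hypothetical modified Angoff ratings: each inner list holds one panelist's
# judged probability (0-1) that a minimally competent examinee answers each
# item correctly. Values are illustrative only, not data from the study.
panel_ratings = [
    [0.60, 0.75, 0.55, 0.80],  # panelist 1
    [0.65, 0.70, 0.50, 0.85],  # panelist 2
    [0.55, 0.80, 0.60, 0.75],  # panelist 3
]

# Per-item expected score: average the panelists' probabilities for each item.
item_means = [mean(col) for col in zip(*panel_ratings)]

# Angoff cut score, expressed as a percentage of the maximum score:
# the mean of the per-item expected scores.
cut_score = mean(item_means) * 100
print(f"Angoff cut score: {cut_score:.1f}%")  # prints "Angoff cut score: 67.5%"
```

Comparing cut scores computed this way on a 25% stratified subset versus the full item set is, in essence, the comparison the study reports.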