First Advisor

Hutchinson, Susan

Document Type

Dissertation

Date Created

12-2022

Embargo Date

12-2024

Abstract

The rise of performance-based assessments in high-stakes licensure examinations has brought many challenges to the field of measurement. Although examiners are commonly trained to apply standardized ratings to examinees, the introduction of human subjectivity into the measurement process threatens the reliability and validity of scores from these assessments (Wind & Guo, 2021). Given the high-stakes decisions that rest on performance-based assessments used for licensure, it is essential to provide evidence of the consequences that rater effects have for decision consistency and decision accuracy, both within and across rating designs. The purpose of this dissertation study was to examine the impact of rater effects on decision consistency and decision accuracy in high-stakes performance-based assessments, and the extent to which the effects of rater characteristics on decision consistency and decision accuracy vary within and across rating designs, under the Many-Facet Rasch measurement framework. The dissertation comprised two studies: a simulation study and an empirical study.

In the simulation study, Monte Carlo techniques were used to vary, in a factorial design, the percentage of examiners exhibiting each of four rater effects (leniency, severity, central tendency, and halo) and the rating design (e.g., a fully crossed design and nested designs), and the effects of these factors on decision consistency and decision accuracy were assessed. A series of factorial ANOVAs examined the impact of rater effects, rating designs, and the number of proficiency levels on decision consistency and decision accuracy. The results showed that the impact of the halo effect on decision consistency and decision accuracy, based on classification into two and three proficiency levels, was accentuated when leniency and/or severity effects were also present in the simulated data. In particular, leniency and severity were the most detrimental of the four simulated rater effects, and they had the greatest impact on decision accuracy estimates in the nested rating designs when examinees’ classifications were based on three performance levels.

The findings of the empirical study reinforced that decision consistency estimates are higher when examinees are classified into two proficiency levels than when they are classified into three. The empirical study also highlighted that the assessment with the higher decision consistency estimates, under the Many-Facet Rasch measurement framework, may not be the one with the best model-data fit indices. The observed impact of rater effects on decision consistency and decision accuracy underscores that observed scores should not be taken at face value, especially when high-stakes decisions about examinees are involved; doing so helps ensure the fairness of a performance-based assessment in the event that some examiners exhibit rater effects. Overall, the results of both the simulation and empirical studies provide insights that can be explored in future research.
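The Monte Carlo design described in the abstract can be made concrete with a small sketch. The Python snippet below is illustrative only and is not the author's code: it generates ratings under a rating-scale formulation of the Many-Facet Rasch model in a fully crossed design, injects a severity effect into a subset of simulated raters, and estimates decision accuracy (agreement between true and observed classifications) and decision consistency (agreement between two parallel simulated administrations). All sample sizes, parameter values, and the use of centered mean raw ratings in place of estimated examinee measures are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    N_EXAMINEES, N_RATERS, N_CATEGORIES = 500, 10, 5  # hypothetical sizes
    SEVERE_SHARE = 0.20     # proportion of raters simulated as severe
    SEVERITY_SHIFT = 1.0    # logit shift defining the severity effect
    CUT = 0.0               # one cut score -> two proficiency levels

    theta = rng.normal(0.0, 1.0, N_EXAMINEES)      # true examinee abilities
    severity = np.zeros(N_RATERS)                  # rater severity facet
    severe = rng.choice(N_RATERS, int(SEVERE_SHARE * N_RATERS), replace=False)
    severity[severe] = SEVERITY_SHIFT
    tau = np.linspace(-1.5, 1.5, N_CATEGORIES - 1) # shared RSM thresholds

    def simulate_ratings(theta, severity, tau, rng):
        # One rating per examinee-rater pair (a fully crossed design).
        n = len(theta)
        ratings = np.empty((n, len(severity)), dtype=int)
        for j, beta in enumerate(severity):
            # Adjacent-category logits of the rating-scale MFRM:
            # log[P(x = k) / P(x = k - 1)] = theta - beta - tau_k
            eta = theta[:, None] - beta - tau[None, :]
            num = np.exp(np.cumsum(np.hstack([np.zeros((n, 1)), eta]), axis=1))
            probs = num / num.sum(axis=1, keepdims=True)
            u = rng.random((n, 1))
            ratings[:, j] = (u > probs.cumsum(axis=1)).sum(axis=1)
        return ratings

    # Classify examinees from two parallel simulated administrations. (A
    # full study would estimate examinee measures under the MFRM; centered
    # mean raw ratings stand in for those estimates in this sketch.)
    true_class = (theta >= CUT).astype(int)
    obs_class = []
    for _ in range(2):
        means = simulate_ratings(theta, severity, tau, rng).mean(axis=1)
        obs_class.append((means - means.mean() >= CUT).astype(int))

    accuracy = (obs_class[0] == true_class).mean()       # decision accuracy
    consistency = (obs_class[0] == obs_class[1]).mean()  # decision consistency
    print(f"accuracy ~ {accuracy:.3f}, consistency ~ {consistency:.3f}")

A factorial study like the one described would wrap this procedure in loops over the percentage of affected raters, the type of rater effect, the rating design (crossed vs. nested assignments of raters to examinees), and the number of proficiency levels, then analyze the resulting accuracy and consistency estimates with factorial ANOVAs.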

Extent

289 pages

Local Identifiers

Edi_unco_0161D_11051.pdf

Rights Statement

Copyright is held by the author.
