

Criteria for Two Types of Evaluation of Difficulty and Discrimination

The purpose of a field test (tryout) of items determines the criteria that an item must meet to be eligible for future test forms.

Criterion-Referenced Evaluation

When measuring achievement, some items will have p-values (the proportion of examinees answering correctly) above 0.90. Such easy items will not be discriminating. However, if SMEs determine that an item measures important content, then that item is retained despite its easiness. In other words, psychometric item data do not trump the judgment of SMEs.
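The criterion-referenced rule can be sketched in a few lines of Python. This is only an illustration, not a procedure from the handbook: the function names, the 0/1 response matrix layout, and the choice to set aside an easy item that SMEs did not mark as important are all assumptions made for the example.

```python
def item_p_values(responses):
    """Return the proportion of examinees answering each item correctly.

    `responses` is a list of examinee rows, each a list of 0/1 item scores
    (a hypothetical layout chosen for this sketch).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees
            for j in range(n_items)]


def criterion_referenced_review(p_values, sme_important, easy_cutoff=0.90):
    """Flag very easy items, but let SME judgment override the flag.

    Assumption: an item is kept if SMEs rate its content as important,
    or if it is not above the easiness cutoff.
    """
    decisions = []
    for p, important in zip(p_values, sme_important):
        too_easy = p > easy_cutoff
        decisions.append({
            "p": p,
            "too_easy": too_easy,
            "keep": important or not too_easy,
        })
    return decisions
```

In this sketch an item with a p-value of 0.95 is still kept when the SME flag is true, which mirrors the point that item statistics do not override SME judgment.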

Norm-Referenced Evaluation

When the purpose is to discriminate among examinees on an achievement or ability that is assumed to be normally distributed, items with the highest discrimination are preferred. Higher discrimination increases reliability and reduces random error, so that measurement is more precise. Items with p-values above 0.90 are usually rejected, no matter how important the content is.
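A minimal sketch of the norm-referenced rule follows, again in Python. The dictionary keys (`id`, `p`, `discrimination`), the function name, and the strict treatment of the 0.90 cutoff are assumptions for illustration; the text says such items are usually rejected, not always.

```python
def norm_referenced_select(items, n_needed, easy_cutoff=0.90):
    """Pick the most discriminating items, dropping very easy ones.

    `items` is a list of dicts with hypothetical keys "id", "p", and
    "discrimination". Content importance is deliberately ignored here.
    """
    # Drop items above the easiness cutoff regardless of content.
    eligible = [item for item in items if item["p"] <= easy_cutoff]
    # Prefer the items with the highest discrimination index.
    ranked = sorted(eligible, key=lambda item: item["discrimination"],
                    reverse=True)
    return [item["id"] for item in ranked[:n_needed]]
```

Unlike the criterion-referenced case, nothing in this selection lets a content judgment rescue an item that is too easy.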

Item Discrimination and Dimensionality

As stated previously in this chapter, the definition of content and the work of SMEs are vital for understanding the dimensionality of the ability being measured. Item discrimination is estimated most accurately when item performance is highly correlated with the total test score, that is, when the point-biserial correlation is high. If the ability being measured is multidimensional, an item analysis using the total score as the criterion will accurately estimate difficulty but not discrimination.
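The point-biserial correlation is the Pearson correlation between a dichotomous (0/1) item score and a continuous criterion, here the total test score. The sketch below computes it with the standard library only; the function names and the response matrix layout are assumptions for illustration, and the total score includes the item itself, as in the description above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # No variance (e.g., every examinee answered correctly):
        # the correlation is undefined.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def point_biserials(responses):
    """Point-biserial of each item against the total test score.

    `responses` is a list of examinee rows of 0/1 item scores
    (a hypothetical layout chosen for this sketch).
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]
```

When the test is multidimensional, the total score used as the criterion mixes several abilities, so the values returned by a function like this would be poor estimates of discrimination even though the p-values remain accurate.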
