



  • Staff logs out the test taker when the session is completed or time expires
  • Staff ensures the test session closes after each administration
  • Staff collects all scratch paper and other materials and immediately locks them up or shreds them
  • Staff provides the test taker with any required documents issued by the test system upon completion of the test session
  • Staff does not attempt to interpret results reports issued to the test taker


The interpretability and usefulness of test scores rely in part on the expectation that every test administration is conducted under the same standardized conditions of measurement. Standardization in test administration is vital because it reduces opportunities for construct-irrelevant variance (CIV) to be introduced. Many of the validity threats discussed in this chapter can be minimized by identifying potential sources of CIV and working to prevent them from occurring in the first place.


