Faulty Assumption #2: Principals Should Conduct Classroom Observations of Teachers to Determine Effectiveness
The second faulty assumption forms the foundation of the entire evaluation process: that evaluations of teacher effectiveness should be based on having the principal observe teachers as they teach. It sounds reasonable, but it's probably the leading cause of ineffective teacher evaluations.
Principal observations have a long history in teacher evaluations. As outlined in Chapter 1, this line of thinking has been part of our appraisal systems from the beginning. The practice took a firmer grip with the movement to make principals "instructional leaders" starting in the early 1990s. A simple search in ERIC for "principal observations" starting in 2001 yields 1,605 results. Race to the Top required principals to conduct multiple observations of teachers in a calendar year.
Observing teachers is not ineffective in and of itself. Teachers should be observed, and they should be given feedback. The really good teachers will use that feedback to get even better. The problem is that the whole system rests on observations made by the principal, and those observations are the determining factor in a teacher's rating.
Policymakers seem to place great faith in principals' ability to evaluate a good lesson, stamina to conduct multiple observations on every teacher, and skill to produce reliable evaluations for all the teachers in the building. Perhaps principals appreciate the vote of confidence. However, most administrators are not that good, or simply don't have time. Search for professional books on teacher evaluations, and most of what you will find are self-help programs for principals to manage their schedule and find the time needed to implement evaluations in a school. Other works focus on having critical conversations with teachers explaining how their rating was calculated. They are riddled with protocols for teacher goal setting, focused on narrowly improving a particular skill listed on the rubric, such as "Checking for Understanding."
Some texts and programs focus entirely on managing the documentation component of teacher evaluation systems. In fact, it has become almost impossible to implement one of these teacher evaluation systems without dedicated software and data storage capabilities. The workshops and programs rarely focus on teaching and its improvement: "Evaluation training focuses on the instrument" (Hazi, 2018, p. 200). We're spinning our wheels, not improving teaching or providing growth opportunities.
One person observing a teacher conducting a lesson in isolation is not an effective way to assess teacher effectiveness. It encourages individualistic success in a profession that requires collective efficacy. Teaching is a team sport.
Basing performance evaluation on one person's observation is especially problematic because all performance evaluations are subjective, a conclusion scholars have reached time and time again in the literature on performance evaluation. Measuring teacher skill is difficult. Evaluators risk biasing the results based on their personal feelings toward a teacher. Additionally, an evaluator may emphasize a skill that he or she values while downplaying another that he or she does not. The true quality of a teacher's performance cannot be accurately evaluated by a system that relies on the observations of one person.
With teacher evaluations, we have done a good job of making the subjective objective, or so we think. Inter-rater reliability is nearly impossible to achieve. As we discussed in the previous chapter, inter-rater reliability is very low in multiple fields: when two or more people observe and rate a characteristic, they frequently rate it differently, even when provided with a clear rubric on which to base their rating.
The chance that a principal observing a teacher will provide an evaluation that every skilled observer would agree with is about equivalent to the chance of a basketball player making a full-court shot: it might happen, but it's very unlikely. Yet we've staked our hopes of improving our schools on making that full-court shot every time. If we had taken the time to develop good processes, we could have had a great chance at a layup.
The body of literature on this subject confirms that this practice is not effective. Firestone et al. (2019) found that the effects of principal observations of teachers are weak. Principals' observations have little to no effect on teachers' instructional improvement. Lavigne (2014) found that there is no evidence that high-stakes teacher evaluations can improve teacher effectiveness. There is limited research to support the improvement of teaching through principal observations. The large body of research has taught us two things: most teachers are rated effective, and student achievement is not increasing, even after evaluation reforms have been implemented.
Teachers are begging for a more inclusive evaluation system. Principal observations are dreadful for the principal and even more so for the teacher. "There's something deeply personal about appraisals of our teaching," Minkel (2018) has stated. "It's not just our professional competence that's wrapped up in an observation, but a sense of our worth as human beings." The intended purposes of principal observations have not been well served in practice. Teachers dislike them, and principals have a hard time navigating the human consequences. Minkel, a former Arkansas Teacher of the Year, concluded that educational administrators can improve observations by focusing on student engagement and creativity, rather than on narrow, less impactful actions such as "the obedient inscription of every lesson's objective on the board" (Minkel, 2018).
If you are a principal, reflect on this assumption yourself: how much impact do your observations have on the instructional quality of your school? If you're like most, you'll say observations have little to no impact.
We're using up our precious time on low-impact actions. Teaching by its very nature cannot be standardized—it is fluid and requires creativity and adaptability. A collaborative conversation with peers has more impact on the improvement of teaching practice than principal observations.