Wednesday, May 14, 2014

Teacher Evaluations – The good, the bad and the ridiculous

BY:  James Hamric

 
On May 5, 2014, the Texas Education Agency (TEA) posted some details on the new Teacher Evaluation and Support System (TESS) that has been submitted to the US Department of Education. The submission was required as a condition of the waiver Texas received from many of the more extreme requirements of No Child Left Behind.

This new evaluation system is slated to replace the Professional Development and Appraisal System (PDAS), which has been in use in over 80% of Texas districts since 1997. Commissioner Michael Williams justified the change by calling PDAS ‘outdated’ and of little value in providing real feedback to educators.1 While I have no opinion on the ‘outdated’ nature of PDAS, my concerns, which apply to PDAS and TESS alike, are outlined below.

According to the details posted on the TEA website, 80% of TESS will be a rubric-based evaluation consisting of formal observations, self-assessment and professional development across six domains. These six domains have innocuous titles, and each contains anywhere from three to six sub-categories that, in turn, have multiple bullet points.2 I do not have an issue with having some kind of rubric or instrument…indeed, I do not believe most dedicated teachers would have any concern over having an evaluation instrument. As professionals, we tend to crave constructive feedback and are eager to continually become better at our craft. The concern, in my opinion, lies not in the what, but in the who.

The administrators who are tasked with conducting the evaluations must have adequate training – and not just in how to do the evaluation itself. Ideally, the evaluators should be knowledgeable in the subject they are observing, in addition to having a strong background in pedagogy, preferably in the form of extensive experience in the classroom. In my early years as a teacher, I was observed and evaluated by my principal, who had an advanced degree in science and had spent many years as a classroom teacher. Some of those evaluations were tough to discuss, but they provided valuable feedback and helped me become the teacher I am today. In the last couple of years, I have received ‘exceeds expectations’ on formal observations from two different administrators. That must mean I have markedly improved my teaching since those first couple of years, right? Well, yes, I have…but I don’t think it tells the whole story.

I have had administrators tell me that they had no idea specifically what I was teaching that day, but that it ‘sounded good’. I have had an administrator give me ‘exceeds expectations’ after observing a math lesson when that same administrator had previously asked a colleague how to find a percentage given two numbers. My apologies if that ‘exceeds expectations’ didn’t mean a whole lot to me personally. Don’t get me wrong, I’ll certainly put those evaluations on every job application I ever fill out. But I would sure like to know how an administrator who was highly qualified in math would have evaluated those same lessons.

The remaining 20% of TESS ‘will be reflected in a student growth measure at the individual teacher level that will include a value-add score based on student growth as measured by state assessments.’ These value-added measures (VAMs) will only apply to approximately one quarter of teachers – those who teach testable subjects/grades. For all other teachers, local districts will have flexibility in determining the remaining 20% of the evaluation score.1 While the lack of consistency between districts for the 75% of teachers not under the VAM umbrella is a point of concern, the discussion that follows will focus on core teachers who will have part of their evaluation come from VAMs.
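
To make the ‘value-add score’ idea concrete, here is a minimal sketch in Python of how a generic value-added estimate can be computed: predict each student’s current score from last year’s score, then credit each teacher with the average leftover (the residual) across his or her students. Everything here – the class sizes, the variances, the simple prior-score regression – is my own illustrative assumption, not TEA’s actual model.

    # Toy value-added estimate (illustrative only, NOT TEA's model):
    # predict current score from prior score, then credit each teacher
    # with the average residual of his or her students.
    import numpy as np

    rng = np.random.default_rng(0)
    n_teachers, n_students = 20, 25                  # 25 students per teacher

    teacher_effect = rng.normal(0, 2, n_teachers)    # "true" teacher contribution
    prior = rng.normal(70, 10, (n_teachers, n_students))
    noise = rng.normal(0, 8, (n_teachers, n_students))
    current = 5 + 0.95 * prior + teacher_effect[:, None] + noise

    # Fit the simple regression current ~ prior on the pooled data.
    slope, intercept = np.polyfit(prior.ravel(), current.ravel(), 1)

    residuals = current - (intercept + slope * prior)
    vam_score = residuals.mean(axis=1)               # one score per teacher

    for t in range(5):
        print(f"teacher {t}: true effect {teacher_effect[t]:+.2f}, "
              f"VAM estimate {vam_score[t]:+.2f}")

Even in this best case, where the model exactly matches how the data were generated, the estimates wander around the true effects because of student-level noise – which is precisely the problem the rest of this piece is about.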

In assessing the validity of using VAMs to evaluate teachers, Jane David, with the Association for Supervision and Curriculum Development (ASCD), analyzed several studies.3 Ms. David addresses several concerns regarding the fairness of VAMs, such as whether “test score gains are biased because students are not randomly assigned to teachers”, whether different assessment instruments could lead to dramatically different effectiveness ratings, and whether effectiveness ratings are stable over longer periods. The research she reviews suggests that each of these concerns is valid and should be weighed in any decision on whether to include VAMs in teacher evaluations. Ms. David also notes two studies suggesting that VAMs do not do a substantially better job of evaluating effectiveness than standard, subjective evaluations.3

On April 8, 2014, the American Statistical Association issued a statement strongly cautioning against the use of VAMs for high-stakes decisions in the education realm:

The American Statistical Association (ASA) makes the following recommendations regarding the use of VAMs:
• The ASA endorses wise use of data, statistical models, and designed experiments for improving the quality of education.
• VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results.
• Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.
o VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
o VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
o Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
• VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.4

The ASA statement is careful to note that research has consistently found that growth that is both measurable and attributable to the classroom teacher makes up a small percentage of the total variation in test scores. It is also established that VAMs have large standard errors, making “rankings unstable, even under the best scenarios for modeling,”4 even when multiple years of data are taken into account. Finally, while VAMs may point families, schools and districts toward general areas in need of improvement, they do not provide a way to actually make that improvement a reality.
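
A small simulation illustrates the ASA’s stability concern. Suppose teachers account for roughly 10% of score variance (within the ASA’s 1% to 14% range) and the rest is noise; value-added rankings computed from two equally valid test administrations then agree only loosely. The variance figures and class sizes below are illustrative assumptions of mine, not numbers from the ASA statement.

    # Stability check with made-up numbers: teacher effects explain ~10%
    # of score variance (variance 1 vs. student noise variance 9).
    import numpy as np

    rng = np.random.default_rng(1)
    n_teachers, n_students = 50, 25
    teacher_effect = rng.normal(0, 1.0, n_teachers)       # variance 1

    def vam_ranks():
        """Per-teacher ranking from one noisy test administration."""
        noise = rng.normal(0, 3.0, (n_teachers, n_students))  # variance 9
        est = (teacher_effect[:, None] + noise).mean(axis=1)
        return est.argsort().argsort()                    # rank 0 = lowest

    rank_a, rank_b = vam_ranks(), vam_ranks()

    # Spearman correlation = Pearson correlation of the ranks (no ties).
    print(f"rank correlation between two test forms: "
          f"{np.corrcoef(rank_a, rank_b)[0, 1]:.2f}")

    # How many 'bottom quartile' teachers on form A stay there on form B?
    cutoff = n_teachers // 4
    bottom_a = set(np.where(rank_a < cutoff)[0])
    bottom_b = set(np.where(rank_b < cutoff)[0])
    print(f"bottom-quartile overlap: {len(bottom_a & bottom_b)} of {cutoff}")

Under these assumptions the two rankings correlate well below 1, and the two ‘bottom quartile’ lists only partially overlap – exactly the instability the ASA describes, produced here by nothing more than which random noise each test form happened to draw.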

The underlying mathematics of VAMs would be considered esoteric by the vast majority of people. Heck, I have a degree in Mathematics, consider myself a pretty intelligent guy, and I don’t fully understand the intricacies of these things. I am, however, aware of the substantial error that is inherent in complex models. That error is only exacerbated by adding multiple levels to the model (student to classroom to school). A friend and colleague of mine who is nearing completion of his PhD in Educational Psychology, with a concentration in Measurement, put it this way: “The models are sound, but, in my opinion, only in the natural sciences and perhaps some of the social sciences. We are talking about human beings as it relates to standardized testing, and it just isn’t as easy to predict.”5 Because of that, I would not feel comfortable being put in a position of making high-stakes personnel decisions based, even partially, on VAMs. Imagine a Human Resources Director, with little to no upper-level math or statistics background, making those same decisions to retain or terminate a teacher.
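
The multilevel point can be illustrated the same way. A quick sketch (again with variance numbers I have made up) shows that an unmodeled classroom-level shock – one shared draw per class, such as an unusually disruptive group – inflates the error of a per-teacher average well beyond what student-level noise alone produces, because averaging over students cannot wash it out.

    # Error from an unmodeled classroom-level shock (made-up variances).
    import numpy as np

    rng = np.random.default_rng(2)
    n_sims, n_students = 10_000, 25

    # Student-level noise only: the standard error shrinks like sigma/sqrt(n).
    student_only = rng.normal(0, 3.0, (n_sims, n_students)).mean(axis=1)

    # Add one shared classroom-level draw; averaging students cannot remove it.
    class_shock = rng.normal(0, 1.0, n_sims)
    combined = student_only + class_shock

    print(f"spread of class averages, student noise only: {student_only.std():.2f}")
    print(f"spread with a classroom-level shock added:    {combined.std():.2f}")

With these made-up numbers the spread roughly doubles (from about 0.6 to about 1.2), even though the classroom-level shock is small next to the student-level noise – a taste of how each added level compounds the uncertainty.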

I remember having a conversation with Karen Lewis, President of the Chicago Teachers Union, at the national conference of the Network for Public Education held in Austin, Texas, in March of this year. She agreed that the vast majority of educators want constructive feedback, almost to a fault. As long as the administrator is well trained and qualified, a rubric-based evaluation should be sufficient to assess the effectiveness of a teacher. While the mathematical validity of value-added models is accepted in more concrete realms such as economics, they should not be even a small part of educator evaluations, and certainly not any part of high-stakes decisions about continued employment. It is my hope that, as Texas rolls out TESS in pilot districts in the 2014-2015 school year, serious consideration will be given to removing the VAM component completely.

References:
1 http://www.tea.state.tx.us/index4.aspx?id=25769811000
2 http://txcc.sedl.org/our_work/tx_educator_evaluation/index.php
3 http://www.ascd.org/publications/educational_leadership/may10/vol67/num08/Using_Value-Added_Measures_to_Evaluate_Teachers.aspx
4 http://www.amstat.org/policy/pdfs/ASA_VAM_Statement.pdf
5 Text message from colleague May 12, 2014

This article can be found on James Hamric's blog: http://edreformblog.wordpress.com/2014/05/13/teacher-evaluations-the-good-the-bad-and-the-ridiculous/
