Summative Assessment in Disarray

Well, my updates here have stalled. There's a pretty good reason for this, I think: SARS-CoV-2 has wreaked havoc on my dissertation plans. If you've been following along, my original intent was to explore the intersection of Indigenous Education and technology, from the perspective of a white settler seeking connection and reciprocal understanding. When COVID-19 arrived and on-site, face-to-face research activities were halted in March 2020, those plans evaporated. Indigenist research needs to happen in the context of meaningful and reciprocal relationships, and the barriers to building those relationships increased substantially with the restrictions enacted by universities worldwide. So a pivot of my own became necessary.

One of the things I noticed very early in the pivot to emergency remote teaching in March was a deep and persistent lack of assessment literacy in higher ed. There was, and likely still is, an assumption that summative assessments in courses taught by individual faculty ought to be modelled after large-scale assessments like the SAT, GMAT, MCAT, or NCLEX, with their claimed high validity and reliability. There are clear and obvious pressures to use selected-response items in summative assessments: they are perceived (likely wrongly) to be more objective, and they are certainly easier to score. If these assessments are delivered in a technology-mediated environment, scoring is often instantaneous and requires zero intervention from the assessor. At a time when faculty time is at an absolute premium, this model represents a significant reduction in opportunity cost for faculty who would much rather spend their time on other pursuits.
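
As an aside, the economics here are easy to see in code. Below is a minimal sketch of machine scoring for selected-response items (my own illustration; the answer key and responses are invented):

```python
# A fixed answer key turns grading into a single pass with no human judgement.
ANSWER_KEY = {"Q1": "b", "Q2": "d", "Q3": "a", "Q4": "c"}

def score(responses: dict) -> int:
    """Return the number of responses that match the answer key."""
    return sum(responses.get(q) == a for q, a in ANSWER_KEY.items())

print(score({"Q1": "b", "Q2": "a", "Q3": "a", "Q4": "c"}))  # -> 3
```

None of this, of course, says anything about whether the items measure what they claim to measure.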

This led me to engage in one more course before my candidacy efforts. The course I took this fall was EDPY507 at UAlberta (raise a glass for the Western Deans' Agreement!). EDPY507 is an introductory course in test theory, covering Classical Test Theory and some Item Response Theory models, as well as modern test administration, standard setting, and other topics. I can happily report that, after a bit of a rough start, I completed the course relatively well and learned a tonne about measurement and evaluation in psychology and education.

The primary takeaway for me was confirmation that there is a lack of assessment literacy among higher ed faculty. This is not to dunk on faculty: the gap is a symptom of a long history of prioritizing research over teaching in the professoriate, not to mention the ongoing adjunctification of higher ed. In short, faculty teach in the same manner that they were taught, and they assess their learners in the same manner that they were assessed.

This all leads me to an article I just finished reading, which outlines some of the many problems with summative assessment practices in higher ed.

Knight, P. T. (2002). Summative Assessment in Higher Education: Practices in disarray. Studies in Higher Education, 27(3), 275–286. https://doi.org/10/b25nb2

I highly recommend this article as a starting point if you are thinking about assessment in higher ed.

Knight's main point, as I understand it, is that despite advances in the statistical procedures used to make visible the psychometric qualities of measurement instruments, summative assessments, whether large-scale assessments or faculty-created course-based ones, are fundamentally unreliable.

Of particular note is that summative assessments are asked to do far more work, with far greater precision, than they are capable of doing. Knight argues that this reliance on unreliable measures that are sold as reliable (through the accreditation process) is unethical and leaves institutions open to legal challenge. For example, if a person completes a degree in nursing but their skills are assessed using unreliable measures, then there can be no confidence that they are adequately prepared for the demands of the profession.
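
To make the precision argument concrete, here is a minimal Classical Test Theory sketch (my own illustration, not Knight's; the response matrix is invented) estimating internal-consistency reliability and the standard error of measurement for a short test:

```python
# Estimate Cronbach's alpha for a small, invented item-response matrix,
# then derive the standard error of measurement (SEM) that bounds the
# precision of any single observed score.
import numpy as np

# Rows are examinees; columns are dichotomously scored items (1 = correct).
responses = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])

k = responses.shape[1]                         # number of items
item_vars = responses.var(axis=0, ddof=1)      # per-item sample variance
total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# SEM = SD of total scores * sqrt(1 - reliability)
sem = np.sqrt(total_var) * np.sqrt(1 - alpha)

print(f"Cronbach's alpha: {alpha:.2f}")  # ~0.83 for this toy data
print(f"SEM: {sem:.2f} points")          # ~0.76 on a 5-point test
```

Even with a respectable alpha of about 0.83, a 95% band around an observed score is roughly ±1.96 × SEM, or about ±1.5 points on this 5-point toy test. That is the kind of imprecision Knight is pointing at, and classroom tests rarely receive even this much scrutiny.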

This lack of precision stems from the way summative assessment practice compromises three key characteristics that assessments need if they are to be consistent with the learning that has been completed:

They have to be faithful to the curriculum (charged with developing understandings, skills, self-theories and reflectiveness). They must align with the notion that education is concerned with some degree of abstraction, generalisation or transfer. They should not impede student engagement in communities of practice, but should encourage behaviours associated with good learning. (p. 276)

Knight quotes Entwistle:

The single, strongest influence on learning is surely the assessment procedures ... even the form of an examination question or essay topics set can affect how students study ... It is also important to remember that entrenched attitudes which support traditional methods of teaching and assessment are hard to change. (pp. 111–112)

Entwistle, N. (1996). Recent research on student learning. In J. Tait & P. Knight (Eds.), The Management of Independent Learning (pp. 97–112). Kogan Page.

The importance of assessment practices, both in learning and in providing warrant for inferences based on assessment instruments, combined with the disarray Knight noted in 2002 (which I suspect is magnified today as faculty and higher ed institutions grapple with the effects of COVID-19 and the move to teaching and learning online), leads me to believe that this will be a tremendously fruitful and important line of investigation for my dissertation.

Knight concludes:

Better, I suggest, to explore assessment as complex systems of communication, as practices of sense-making and claim-making. This is about placing psychometrics under erasure while revaluing assessment practices as primarily communicative practices. (p. 286)

I concur.