The Challenge of Subjective Marking

Ever notice how marking writing doesn’t always produce the results teachers expect? Teachers can come to different judgements about similar pieces of work, even with the same task and criteria.

Have you ever wondered why?

Rubrics are structured and appear very precise, but in practice their wording is open to interpretation. They ask teachers to interpret written descriptors and then decide how closely a student’s work matches those descriptions. Phrases like “developing control” or “insightful ideas” can mean different things to different people.

Research has shown that even when rubrics are carefully designed, teachers find it difficult to translate the words into scores.

An alternative is to remove that intermediate step and work directly with student samples. When teachers can see actual samples, they don’t need to interpret descriptions of performances. They can simply compare performances with performances.

A Brightpath scale is built by comparing and scaling hundreds of real student work samples to determine how they sit in relation to one another. This process produces calibrated exemplars — anchor samples at different points on the scale.

When teachers mark with Brightpath, their task is simply to work out where their student’s work fits among this already-ordered set of performances.


Large-scale assessments, like NAPLAN, rely on external marking. External marking is reliable if the markers are well trained, work to a strict rubric, and use exemplars to guide their decisions. This consistency is important for system-level reporting but it has almost no direct impact on a student’s learning. By the time external results come back, the opportunity for feedback and growth has passed.

The person who can make the biggest difference to a student’s progress is the teacher. However, teacher judgement is only effective if it is reliable. Brightpath was designed to address this problem directly — supporting consistent, reliable judgements while keeping assessment embedded in everyday classroom practice.

But reliability on its own is not enough. For teacher judgement to translate into effective teaching, teachers need to understand not just how good a piece of work is, but what the next step looks like.

Brightpath was designed to assist teachers in this endeavour. Because teachers judge a student’s work against exemplars arranged in order from least advanced writing to most advanced, they can immediately see what students slightly further along are doing differently. This makes learning visible. To assist teachers with next steps, Brightpath provides teaching points. Teaching points describe the features evident in the work of students just above the one being assessed — in other words, they identify the skills and understandings that sit within a student’s zone of proximal development.


Figure 2: Brightpath’s Teaching Points Report. Student performances—represented by coloured bubbles—are shown along the ruler. Pressing on a bubble shows the student’s performance. To the left, Teaching Points describe the students’ zones of proximal development. Our calibrated exemplars, as well as the other student performances from your school, provide concrete examples of performances at each level of ability.

Viewing scaled work samples along with teaching points gives actionable insights in a form teachers already recognise from rubrics, but with one crucial difference. Each teaching point is anchored to real student exemplars and the annotations that accompany them. Teachers can see exactly what these skills look like in practice, rather than relying only on descriptions of performances.

The Problem of Inconsistency Across Schools

Even when teachers use the same rubric, the same task, and the same assessment conditions, student work can still be judged differently from one classroom to the next. This isn’t a reflection of the quality of teacher judgement; it reflects the inherent limitations of rubric-based marking.

Because rubrics rely on descriptive language, teachers inevitably bring their own interpretation to each score category. A phrase like “clear explanation” can be read slightly differently in each classroom, so even when student work crosses the corridor to another teacher, the interpretation can shift. Across different schools, the variation becomes even more noticeable.

To learn more about the limitations of rubrics, see: Our research examining rubrics to assess writing.

When this kind of variation matters, schools look for ways to bring judgements back into alignment.

This is one of the reasons many schools invest time in moderation. The purpose of moderation is to ensure that a piece of student work would receive the same mark from another teacher — that is what we mean by a reliable judgement. Reliability is simply consistency: the same teacher scoring the same work the same way on different occasions, or different teachers scoring it similarly.
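To make “reliability is simply consistency” concrete, one common way to summarise agreement between two markers is a correlation between their scores. The sketch below is purely illustrative: the scores are invented, and a Pearson correlation is only one of several possible reliability indices, not how Brightpath measures it.

```python
import math

def pearson(x, y):
    """Correlation between two markers' scores: one simple index of
    how consistently they rate the same pieces of work."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores from two teachers marking the same six scripts
teacher_a = [52, 61, 47, 70, 58, 65]
teacher_b = [50, 63, 45, 72, 55, 66]

# A value close to 1.0 indicates the two markers rank and space the
# scripts very similarly, i.e. their judgements are highly consistent
print(round(pearson(teacher_a, teacher_b), 2))
```

A value near zero would mean the two teachers' marks carry little shared information about the work, which is exactly the situation moderation aims to prevent.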

Brightpath’s Calibrated Exemplars and Scales

On the surface it may seem that teachers can just collect new student work samples themselves. Having samples, however, is only part of the challenge. The more difficult task is ordering those samples meaningfully, especially through the middle of the bell curve, where most student work tends to cluster. At the extremes, differences are typically quite clear. In the middle, where many performances are similar, deciding which work is stronger becomes much harder.

One well-established way of achieving consistent ordering is through paired comparisons (also known as comparative judgements). Rather than asking judges to rank many pieces of work at once, paired comparison reduces the task to a simpler decision: which of these two samples demonstrates stronger performance? When this process is repeated across many judges and many pairs, the resulting judgements are usually highly consistent. The difficulty is that paired comparisons are extremely time-consuming and impractical to carry out for everyday classroom assessment.


Calibration exists to address this problem by pre-ordering writing performances. In practice, calibration involves conducting large numbers of paired comparisons in advance. Teachers make these judgements independently, without knowing how others have responded, and statistical analyses are used to test whether samples can be reliably ordered. This work is done once, rigorously, so that teachers do not need to repeat it themselves.

Once the paired comparisons are complete, the information they generate is used to construct a scale. Scaling does not simply line samples up mechanically. It allows the evidence from thousands of comparisons to be combined into an internally consistent ordering, which is then carefully reviewed. Ordered work samples are vetted to ensure that the progression makes sense in qualitative detail, not just in statistical terms. The result is a set of calibrated exemplars — anchor points along a scale that represent increasing levels of performance.
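To illustrate how win/lose outcomes from many paired comparisons can be combined into a single ordered scale, the sketch below fits a Bradley–Terry model, a standard statistical model for paired-comparison data, to simulated judgements. This is a simplified stand-in for the analyses described above, not Brightpath's actual calibration procedure.

```python
import math
import random

def fit_bradley_terry(comparisons, n_items, iters=200):
    """Estimate a latent quality score for each sample from paired
    comparisons (winner, loser), using the standard MM update."""
    wins = [0] * n_items
    n = [[0] * n_items for _ in range(n_items)]  # comparison counts per pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n[winner][loser] += 1
        n[loser][winner] += 1
    strength = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            denom = sum(n[i][j] / (strength[i] + strength[j])
                        for j in range(n_items) if j != i)
            new.append(wins[i] / denom if denom else strength[i])
        # Re-centre so the scale has a fixed origin (geometric mean = 1)
        gmean = math.exp(sum(math.log(s) for s in new) / n_items)
        strength = [s / gmean for s in new]
    return [math.log(s) for s in strength]

# Toy data: four samples whose true quality increases from 0 to 3 logits
random.seed(1)
true_quality = [0.0, 1.0, 2.0, 3.0]
comparisons = []
for _ in range(600):
    i, j = random.sample(range(4), 2)
    p_i_wins = 1.0 / (1.0 + math.exp(true_quality[j] - true_quality[i]))
    comparisons.append((i, j) if random.random() < p_i_wins else (j, i))

scores = fit_bradley_terry(comparisons, 4)
print([round(s, 2) for s in scores])  # scores should increase, recovering the order
```

Note the pattern the article describes: no single judge ever ranks all four samples at once; each judgement is a simple "which of these two is stronger?", yet the pooled evidence yields a consistent ordering, including through the crowded middle of the distribution.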

This process provides the foundation for reliable judgements. When teachers use a Brightpath scale, they can see exemplars ordered from weaker to stronger performance levels and place their own students’ work at the appropriate point along the scale.

Assisted Marking Tools and Predictive Scoring

Comparing work directly against calibrated exemplars is more reliable than rubric-based marking, but it still requires time and careful attention from teachers. Today, however, Artificial Intelligence tools make it possible to reduce that burden without removing professional judgement from the process. There is no need to do every step of marking by hand when well-designed tools can narrow the field and point teachers toward likely judgements.

For this reason, Brightpath implements an automated assisted marking tool to support teacher decision-making. Teachers see automated predictions of where student performances sit on a scale, but they can confirm or override the prediction based on their professional judgement. Automation handles much of the heavy lifting, leaving teachers to focus on applying expertise to student work. This approach eases workload while valuing informed, professional judgement.
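The confirm-or-override workflow can be pictured schematically as follows. Everything here is hypothetical (the band names, scores, and the nearest-exemplar rule are illustrative assumptions, not Brightpath's implementation): an assisted marker surfaces the calibrated exemplar closest to a model's predicted scale score, and the final decision stays with the teacher.

```python
from dataclasses import dataclass

@dataclass
class Exemplar:
    label: str
    scale_score: int  # calibrated position on the scale

def suggest_placement(predicted_score, exemplars):
    """Suggest the exemplar whose calibrated score is closest to the
    model's prediction; the teacher confirms or overrides it."""
    return min(exemplars, key=lambda e: abs(e.scale_score - predicted_score))

# Hypothetical calibrated exemplars at four points on the scale
exemplars = [Exemplar("Band A", 100), Exemplar("Band B", 200),
             Exemplar("Band C", 300), Exemplar("Band D", 400)]

suggestion = suggest_placement(230, exemplars)
print(suggestion.label)  # the teacher may accept this or choose another band
```

The key design choice is that the prediction only narrows the field: the teacher still inspects the work against neighbouring exemplars before confirming, so professional judgement remains the final authority.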

Conclusion: Making Judgement Visible and Consistent

Reliable judgement matters because it sits at the centre of effective teaching. When teachers can make consistent, defensible judgements about student work, assessment stops being an administrative task and becomes a meaningful part of learning. Students receive clearer feedback, teachers gain better insight into progress, and conversations about performance become more constructive and focused.

The best way to understand how this works in practice is to see it. Exploring sample Brightpath scales and exemplars shows how real student work is ordered, how teaching points emerge from that ordering, and how judgement can be both reliable and instructionally useful.

Teachers and school leaders are invited to explore sample exemplars and see how Brightpath supports consistent judgement, clearer feedback, and next-step teaching grounded in real student work.
