If you've watched a KS3 child use a homework-helping chatbot, you'll have noticed a particular pattern. They paste the question. They get the answer. They write it down. They move to the next question. The whole loop takes about eight seconds and produces an entirely unchanged child.

This is the loop Pi exists to break. And the single most important mechanical lever it pulls is asking, before marking anything as right or wrong:

"Show me your working — what did you do first?"

This article is the long-form version of why, what it catches, and what it looks like in practice.

The mechanical reason: a wrong answer tells you nothing

Here is a Year 8 problem and a wrong answer:

"Solve 3x - 5 = 10." Student writes: "x = 7."

What do we know? Almost nothing. The student might have:

  • Added 5 to one side only (very common): 3x - 5 + 5 = 10 → 3x = 10 → x = 10/3 → confused → wrote 7
  • Subtracted 5 instead of adding (sign error): 3x = 5 → x = 5/3 → confused → wrote 7
  • Got it almost right and slipped on the last step: 3x = 15 → x = 15/3 → wrote 7 because 5 and 7 are visually similar (this genuinely happens, and it's not a maths error)
  • Guessed: tried 7 because the question felt like a "small whole number" question

Without the working, Pi cannot tell which of these is happening, and each case needs a different intervention. The one-sided error needs attention to "balance both sides". The sign error needs attention to inverse operations. The slip needs no intervention at all: say "check your last step" and the student fixes it themselves. The guess needs a confidence conversation.
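The diagnosis above can be sketched for any equation of the form ax - b = c by checking the right-hand side the student wrote on their second line (ax = ?). This is a minimal illustration, not Pi's code: the function names, the labels and the short intervention strings are hypothetical paraphrases of the text.

```python
from fractions import Fraction

# Hypothetical next move for each diagnosis, paraphrasing the text above.
INTERVENTIONS = {
    "correct": "Move on to the next step.",
    "one_sided": "Revisit 'balance both sides'.",
    "sign_error": "Revisit inverse operations.",
    "slip": "Say: check your last step.",
    # A guess usually shows up as no coherent working at all,
    # so it is spotted before any line-by-line check runs.
    "guess": "Have a confidence conversation.",
    "unknown": "Ask the student to talk through this step.",
}


def diagnose_second_line(b: int, c: int, rhs: Fraction) -> str:
    """Classify the student's second line 'ax = rhs' for the equation ax - b = c."""
    if rhs == c + b:
        return "correct"      # added b to both sides
    if rhs == c:
        return "one_sided"    # added b to the left only, right side left unchanged
    if rhs == c - b:
        return "sign_error"   # subtracted b instead of adding it
    return "unknown"


def diagnose_final_line(a: int, rhs: Fraction, x: Fraction) -> str:
    """Check the final line 'x = ...' against the student's own line 'ax = rhs'."""
    return "correct" if x == Fraction(rhs) / a else "slip"


# 3x - 5 = 10 with three different second lines.
for rhs in (15, 10, 5):
    kind = diagnose_second_line(5, 10, Fraction(rhs))
    print(f"3x = {rhs}: {kind} -> {INTERVENTIONS[kind]}")
```

Checking the second and final lines separately is what lets the slip case (3x = 15, then x = 7) be told apart from the two genuine maths errors.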

So Pi's protocol is: never mark wrong without seeing the working. It's not pedantry. It's the only way to give a useful next response.
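As a protocol, this amounts to a gate in front of the verdict. A minimal sketch, assuming working arrives as a list of typed lines and that a separate, hypothetical `diagnose` function locates the broken step; following the opening rule, no verdict is given either way until the working is visible.

```python
from typing import Callable, Optional

ASK_FOR_WORKING = "Show me your working - what did you do first?"


def respond(answer: str, expected: str, working: Optional[list[str]],
            diagnose: Callable[[list[str]], str]) -> str:
    """Return the tutor's next message for one submitted answer."""
    if not working:
        # No working yet: ask for it before saying right or wrong.
        return ASK_FOR_WORKING
    if answer.strip() == expected.strip():
        return "Correct, and your working backs it up."
    # Wrong answer with working: point at the broken step, not the whole topic.
    return diagnose(working)
```

Passing `diagnose` in as a function keeps the gate separate from the maths, so the same rule can sit in front of any topic.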

The empirical reason: model tracing works

The protocol has a name in the literature, model tracing, and it's been studied at scale. Carnegie Learning's MATHia, one of the most widely used intelligent tutoring systems in US schools, made model tracing its core engine and has reached the highest tier of educational evidence (ESSA Tier 1, "strong evidence", which requires at least one well-designed and well-implemented randomised controlled trial showing a statistically significant positive effect).

The headline finding: +8 percentile points on average, and +11 for struggling students. That's a large effect for an educational intervention: it would move a Year 9 student sitting exactly at the median up to the 58th percentile.
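One way to put a percentile gain on a common scale: if test scores are roughly normally distributed (an assumption), an 8-point gain for a median student is a shift of about 0.2 standard deviations. The sketch below does the conversion with Python's standard library; the function name is hypothetical, and the starting percentile matters, which is why struggling students (who start lower) need their own starting point rather than the median.

```python
from statistics import NormalDist


def percentile_gain_to_sd(gain_points: float, start_percentile: float = 50.0) -> float:
    """Convert a gain in percentile points into a shift in standard deviations,
    assuming scores follow a normal distribution."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    before = standard_normal.inv_cdf(start_percentile / 100)
    after = standard_normal.inv_cdf((start_percentile + gain_points) / 100)
    return after - before


# A median student gaining 8 percentile points (50th to 58th).
print(f"{percentile_gain_to_sd(8):.2f} SD")
```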

The mechanism: model tracing turns "I don't know what went wrong" into "I can see exactly which step broke". And targeted intervention at the broken step is dramatically more efficient than generic re-teaching of the whole topic.
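Full model tracing matches each student step against a model of correct and common incorrect steps. A much simpler stand-in, sketched here for linear equations in x only, checks whether each line still has the same solution as the original equation; the first line where the solution changes is the step that broke. The parser handles only simple forms such as `3x - 5 = 10` or `x = 5/3`, and all names are hypothetical; this is not MATHia's engine.

```python
from fractions import Fraction
from typing import Optional


def parse_side(side: str) -> tuple[Fraction, Fraction]:
    """Parse a linear expression like '3x - 5' into (coefficient of x, constant)."""
    coef, const = Fraction(0), Fraction(0)
    # Rewrite every minus as '+-' so splitting on '+' keeps each term's sign.
    for term in side.replace(" ", "").replace("-", "+-").split("+"):
        if not term:
            continue
        if term.endswith("x"):
            k = term[:-1]
            if k == "":
                coef += 1
            elif k == "-":
                coef -= 1
            else:
                coef += Fraction(k)
        else:
            const += Fraction(term)
    return coef, const


def solve_line(line: str) -> Optional[Fraction]:
    """Solve 'left = right' for x; None if there is no unique solution."""
    left, right = line.split("=")
    a1, b1 = parse_side(left)
    a2, b2 = parse_side(right)
    if a1 == a2:
        return None
    return (b2 - b1) / (a1 - a2)


def first_broken_step(lines: list[str]) -> Optional[int]:
    """Return the 1-based number of the first line whose solution differs
    from the original equation's, or None if every step preserves it."""
    target = solve_line(lines[0])
    for number, line in enumerate(lines[1:], start=2):
        if solve_line(line) != target:
            return number
    return None
```

On the working `3x - 5 = 10`, `3x = 5`, `x = 5/3` this returns 2: the last line is a perfectly valid step from a wrong line, which is why the final answer alone points nowhere useful. What the simple check cannot say is *which* wrong rule was applied; that is what matching against known incorrect steps adds.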

The three misconceptions working reveals most often

In the prototype phase, we logged the broken steps across thousands of stuck moments. Three patterns dominated.

1. Sign errors on line two

By a long way the most common. Student writes:

3x - 5 = 10
3x = 5      ← sign error here
x = 5/3

The student knew to "do something to both sides" but did the wrong thing: they treated -5 as something to subtract again, not as something to undo by adding. This is a procedural error, not a conceptual one. Once Pi names it, the student usually fixes it straight away and their mastery of the topic climbs quickly.

How working reveals it: the second line of the working is visibly wrong even when the final answer happens to be visually close.

2. Fraction-as-division confusion

KS3 students often parse 3/4 as "the answer is 0.75" but parse 3 ÷ 4 as a different operation. They're the same thing. The misconception shows up beautifully when the working contains a division and the student has flipped it.

"What is 1/2 of 60?"
Working: 60 ÷ 1 = 60, then × 2 = 120     ← flipped numerator/denominator role
Answer: 120

The student knows "1/2 of" should be a halving operation, but they've applied the procedure "divide by the numerator, multiply by the denominator", a mnemonic with the two roles swapped (it should be divide by the denominator, multiply by the numerator). Only the working reveals this. Without it, Pi would just see "answer is 120, expected 30" and have no idea where to intervene.
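The correct procedure and the swapped one can be put side by side. A minimal sketch with hypothetical function names: a final answer that matches the flipped version only *suggests* this misconception, since other errors could also produce 120; the working is what confirms it.

```python
from fractions import Fraction


def fraction_of(numerator: int, denominator: int, quantity: int) -> Fraction:
    """Correct procedure: divide by the denominator, then multiply by the numerator."""
    return Fraction(quantity, denominator) * numerator


def flipped_fraction_of(numerator: int, denominator: int, quantity: int) -> Fraction:
    """Swapped mnemonic: divide by the numerator, then multiply by the denominator."""
    return Fraction(quantity, numerator) * denominator


def classify(numerator: int, denominator: int, quantity: int, answer: Fraction) -> str:
    if answer == fraction_of(numerator, denominator, quantity):
        return "correct"
    if answer == flipped_fraction_of(numerator, denominator, quantity):
        return "consistent with flipped roles (check the working)"
    return "unknown"


print(classify(1, 2, 60, Fraction(120)))
```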

3. "The variable IS the answer"

This is the deepest one and the one I find most fascinating. Some KS3 students read 3x - 5 = 10 as a kind of word puzzle, where x stands for the question and the answer is whatever they're being asked for. They read it as "3, then this thing, then -5, equals 10" and produce answers like x = 3, backed by reasoning that falls apart as soon as you try to reconstruct it.

Working reveals this immediately. The first line of the working is something like:

3x = 3 times x
But what's x?

— the student hasn't internalised that x is a placeholder to be solved for. They're treating it as a label for the answer. The intervention here is conceptual, not procedural, and it requires backing up to the meaning of a variable — a Year 6 / Year 7 foundation that didn't fully land.
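One common way into that conversation is to treat x as a placeholder and try values in it. This is an illustrative sketch of substitution, not a description of what Pi does: each candidate value is put in for x, and only the one that makes both sides equal is a solution.

```python
def left_side(x: int) -> int:
    """The left-hand side of 3x - 5 = 10 with a value put in for x."""
    return 3 * x - 5


# Try a few candidate values; only one makes the left side equal 10.
for x in range(1, 8):
    value = left_side(x)
    verdict = "solution" if value == 10 else "not a solution"
    print(f"x = {x}: 3 * {x} - 5 = {value} -> {verdict}")
```

An answer like x = 3 gives 4, not 10, which makes the gap between "x is a label" and "x is a value to find" concrete.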

What Pi actually asks

The protocol in practice is a short, friendly question after the student gives an answer:

"Show me how you got that — what did you do first?"

Or, when an answer comes without context:

"What did your first line look like?"

Pi accepts:

  • Typed-out steps (most common)
  • A photo of paper working (if the parent has enabled image upload)
  • Mental working described in words ("I added 5 to both sides, then divided by 3")

It does not accept "I just knew" without follow-up. If a student claims pure intuition, Pi sets the next problem and waits to see whether the intuition reproduces. Usually it doesn't, and the conversation turns into a constructive one about pattern-spotting versus reliable method.
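The acceptance rules above can be sketched as a small classifier over the student's reply. Everything here is an assumption for illustration: the `Reply` record, the phrase list, the digit-and-verb heuristic and the action names are hypothetical, and real replies would need far more robust language handling.

```python
import re
from dataclasses import dataclass

# Hypothetical phrases that signal a claim of pure intuition.
INTUITION_PHRASES = ("i just knew", "i just know", "it's obvious", "it just is")


@dataclass
class Reply:
    text: str = ""
    has_image: bool = False             # photo of paper working
    image_upload_enabled: bool = False  # parent setting


def next_action(reply: Reply) -> str:
    # A photo only counts if the parent has enabled image upload.
    if reply.has_image and reply.image_upload_enabled:
        return "read_photo_working"
    text = reply.text.lower()
    if any(phrase in text for phrase in INTUITION_PHRASES):
        # No verdict: set another problem and see whether the intuition reproduces.
        return "set_next_problem"
    # Typed steps ('3x = 15') or steps described in words ('added 5 to both sides').
    if re.search(r"[=+\-×÷*/\d]", text) or re.search(r"\b(add|subtract|divid|multipl)", text):
        return "trace_working"
    return "ask_what_came_first"
```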

A second, quieter benefit: anti-cheating

A side effect of the protocol that I didn't expect to like as much as I do: it's a soft anti-cheating layer.

A student who has used ChatGPT to generate the answer to their Pi practice problem usually can't produce coherent working when asked to "show your working". The answer is correct but the steps are missing, or they're suspiciously fluent in a way that doesn't match the student's previous sessions, or they're pasted prose that doesn't read like Year 8 maths working.

Pi doesn't accuse — that would be the wrong move — but it does push back with another problem on the same topic, untimed, with working required. The mismatch surfaces in about a minute. It's a much better anti-cheating mechanism than browser monitoring or proctoring, because it's pedagogically aligned: the student who needs to "cheat" their way through a Pi session is the student who needs Pi the most, and the protocol routes them gently to the help they actually need.
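The pushback can be sketched as a policy whose only output is a next task, never an accusation. A minimal sketch, assuming hypothetical `Attempt` and `Task` records; how "coherent working" is judged is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    correct: bool
    coherent_working: bool


@dataclass
class Task:
    topic: str
    timed: bool
    working_required: bool


def follow_up(topic: str, attempt: Attempt) -> Optional[Task]:
    """Correct answer without coherent working: set a same-topic problem,
    untimed, with working required. Otherwise no extra task."""
    if attempt.correct and not attempt.coherent_working:
        return Task(topic=topic, timed=False, working_required=True)
    return None


def mismatch(first: Attempt, second: Attempt) -> bool:
    """Right answer first time, but the same topic with working required
    does not go the same way: route the student to help, not blame."""
    return first.correct and not (second.correct and second.coherent_working)
```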

How working fits with the hint ladder

The two protocols stack. Pi sees the answer, asks for working, sees the broken step, then enters the hint ladder at the right level for that step. So a sign error handled at ladder level L1 ("what's the inverse of subtracting 5?") is a 15-second interaction, not a 5-minute re-teach.

This is the whole architecture, really. Working reveals the gap. The ladder fills it. Confidence rating closes the loop. Each part depends on the others.
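The stacking can be sketched as one turn: start the ladder at the rung chosen for the diagnosed step, escalate only while the student is still stuck, then collect a confidence rating. Only the L1 prompt comes from the text; the other hint strings, the `ENTRY_RUNG` table, the function name and the rating format are hypothetical placeholders.

```python
from typing import Callable

# Hypothetical ladder for the sign-error step; rung 1 is the prompt from the text.
SIGN_ERROR_LADDER = [
    "What's the inverse of subtracting 5?",
    "If you add 5 to the left side, what must happen to the right side?",
    "Add 5 to both sides: 3x - 5 + 5 = 10 + 5. What is 10 + 5?",
]

# Where each diagnosed step enters the ladder (1-based rung).
ENTRY_RUNG = {"sign_error": 1}


def run_turn(ladder: list[str], entry_rung: int,
             is_fixed: Callable[[str], bool],
             ask_confidence: Callable[[], int]) -> tuple[list[str], int]:
    """Show hints from entry_rung upward until the student fixes the step,
    then ask for a confidence rating. Returns (hints shown, rating)."""
    shown = []
    for hint in ladder[entry_rung - 1:]:
        shown.append(hint)
        if is_fixed(hint):  # did the student's next attempt fix the step?
            break
    return shown, ask_confidence()
```

Because the ladder starts at the broken step rather than at the top of the topic, a student who only needed the first nudge sees exactly one hint.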

What I'd tell another parent

If you're a parent watching your child use Pi: when you walk past and see them typing out the steps before pressing enter, that's the protocol working. It's the part where the actual learning lives. The answer is the symptom. The working is the medicine.


Jason runs aitutors.me. He has a Year 8 child and has logged approximately 800 hours of one-to-one KS3 maths sessions in building this system. Updated 21 May 2026.