Here is the easy version of "Socratic AI tutor": refuse to give answers, ask questions instead, hope the student gets there. It sounds great in a pitch deck and it falls apart in week two, because what actually happens is the student gets stuck, the AI keeps asking vague questions, and eventually the child closes the app and uses ChatGPT.
The hint ladder is the engineering that stops that from happening. Four levels, sequenced, each more specific than the last. The student is never trapped without a next move.
This is one of the design decisions I'm most attached to. So let me show you the actual rungs.
The four levels
| Level | Trigger | What Pi does |
|---|---|---|
| L1 | First wrong answer or first "I don't know" | Diagnostic question |
| L2 | Second time stuck on the same step | Concept hint |
| L3 | Third time stuck on the same step | Step nudge |
| L4 | Fourth time stuck on the same step | Worked twin (one number changed) |
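The trigger column can be read as a lookup from "how many times stuck on this step" to a rung. Here is a minimal Python sketch of that reading. The names `Level` and `level_for` are illustrative, not taken from Pi's code, and holding on the top rung after a fifth stuck moment is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four rungs in escalation order (names are illustrative)."""
    DIAGNOSTIC_QUESTION = 1
    CONCEPT_HINT = 2
    STEP_NUDGE = 3
    WORKED_TWIN = 4


def level_for(stuck_count: int) -> Level:
    """Map the number of stuck moments on one step to a rung.

    Assumption: counts above four stay on the top rung.
    """
    if stuck_count < 1:
        raise ValueError("stuck_count starts at 1")
    return Level(min(stuck_count, Level.WORKED_TWIN))


for count in range(1, 6):
    print(count, level_for(count).name)
```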
Let me unpack each, with a real-feeling example threading through. Say the problem is 3x - 5 = 10, and the student says they don't know where to start.
L1: Diagnostic question
Pi's first move is a question — and the question is specifically designed to surface where the student is.
"OK, looking at 3x - 5 = 10, what operation do you see happening to x? What would 'undo' it?"
The L1 question is open enough that the student has to do the thinking. But it's targeted — it points at the specific cognitive move ("recognise the inverse operation") that the problem requires. It's not a vague "what do you think?". It's a diagnostic.
Why this works: the most common reason a Year 8 is stuck on a linear equation is that they haven't parsed it yet — they haven't identified what's being done to the variable. L1 forces that parse to happen.
Roughly 60-70% of stuck moments resolve here. The student says "subtracting 5", Pi says "great, what's the opposite of subtracting?", and they're off.
L2: Concept hint
If L1 didn't unstick them, Pi escalates. L2 names the concept they're missing.
"To get x on its own, we need to undo what's being done to it. There are two things being done: multiply by 3 and subtract 5. We undo them in reverse order. So what should we undo first?"
L2 is more directive than L1. It names the concept (inverse operations, undo in reverse order) but still requires the student to make the move. The thinking work is smaller. The scaffolding is firmer.
Most of the remaining stuck students resolve here. They've heard "inverse operations" before but hadn't pattern-matched the current problem to it.
L3: Step nudge
If L2 didn't work either, Pi gives explicit step-level guidance on the first step only.
"Let's just do the first step. Add 5 to both sides. What do you get?"
This is where most other AI tutors would have started — and where we deliberately delay until two attempts have failed. The reason: a student who reaches L3 has now had two diagnostic prompts that have informed Pi about where exactly the misunderstanding sits. Pi can target the L3 hint at the right step. And the student, having tried twice, is in a different cognitive state — they're primed for the hint to land.
L3 doesn't solve the whole problem. It hands over step 1, and lets the student attempt step 2.
L4: Worked twin
If L3 still doesn't unlock it, Pi reveals — but not the answer to the actual problem.
Pi gives a fully worked solution to a twin problem with one number changed:
"Watch me solve a similar one: 3x - 5 = 13. Step 1: add 5 to both sides → 3x = 18. Step 2: divide both sides by 3 → x = 6. Step 3: check by substituting back → 3(6) - 5 = 13 ✓. Now your turn. Try the original problem the same way."
This is the genius of the design (and yes, it's nicked from how good human tutors actually behave). The student gets a complete, scaffolded example. They get the method explicit and visible. But they still have to apply the method to their own numbers. There's no copy-paste available.
L4 reveal is rare. A typical 25-minute session might see L4 once, often zero times. It's the safety net, not the destination.
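One way to build a twin that changes exactly one number is to move only the right-hand side, by a multiple of the coefficient of x, so a whole-number answer stays whole. The sketch below does that for equations of the form a·x + b = c. The `Linear` type and `worked_twin` function are hypothetical, and the post does not say how Pi actually generates its twins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linear:
    """An equation a*x + b = c with whole-number coefficients (hypothetical type)."""
    a: int
    b: int
    c: int

    def __str__(self) -> str:
        sign = "+" if self.b >= 0 else "-"
        return f"{self.a}x {sign} {abs(self.b)} = {self.c}"


def worked_twin(problem: Linear, shift: int = 1):
    """Return (twin, steps); the twin differs from `problem` only in c."""
    if problem.a == 0 or shift == 0:
        raise ValueError("need a != 0 and a non-zero shift")
    # Moving c by a multiple of a keeps a whole-number answer whole.
    twin = Linear(problem.a, problem.b, problem.c + problem.a * shift)
    rhs = twin.c - twin.b
    if rhs % twin.a != 0:
        raise ValueError("this sketch only handles whole-number answers")
    x = rhs // twin.a
    undo, prep = ("add", "to") if twin.b < 0 else ("subtract", "from")
    sign = "+" if twin.b >= 0 else "-"
    steps = [
        f"{undo} {abs(twin.b)} {prep} both sides: {twin.a}x = {rhs}",
        f"divide both sides by {twin.a}: x = {x}",
        f"check: {twin.a}({x}) {sign} {abs(twin.b)} = {twin.c}",
    ]
    return twin, steps


twin, steps = worked_twin(Linear(3, -5, 10))
print(twin)
for step in steps:
    print(step)
```

For the running problem 3x - 5 = 10 this produces the twin 3x - 5 = 13, which needs the same two moves as the original but gives a different answer.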
What this looks like over a session
In the wild, the ladder doesn't fire in tidy four-step sequences. A single problem might see L1 twice (different misconceptions on different steps), or jump from L1 straight to L3 because the student has already shown they understand the concept and just needs the step guidance.
The structure is:
- Pi attempts the lightest intervention first
- Each escalation is a deliberate choice based on what the student's last response revealed
- The ladder is allowed to skip levels upward if the student's response makes it clear the lower rung isn't useful, but on any one step it never moves downward (going from "step nudge" back to "diagnostic question" would feel demeaning)
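These three rules can be sketched as a small piece of per-step state. This is an illustration under assumptions, not Pi's implementation: the `Ladder` class, the step ids, and the `suggested` argument (the rung the student's last reply seems to call for) are all hypothetical.

```python
from typing import Optional


class Ladder:
    """Tracks the highest rung used on each step of a problem (illustrative)."""

    def __init__(self, top: int = 4) -> None:
        self.top = top
        self._rung = {}  # step id -> highest rung used so far

    def escalate(self, step: str, suggested: Optional[int] = None) -> int:
        """Return the rung to use for the next stuck moment on `step`."""
        current = self._rung.get(step, 0)  # 0 means no hint given yet
        # Climb at least one rung; skip further up if the reply calls for it.
        target = max(current + 1, suggested or 0)
        rung = min(self.top, target)
        self._rung[step] = rung
        return rung


ladder = Ladder()
print(ladder.escalate("step-1"))               # lightest rung first
print(ladder.escalate("step-1", suggested=3))  # skips a rung upward
print(ladder.escalate("step-2"))               # a new step starts again at the bottom
```

Keeping the state per step is what lets a single problem see L1 twice: a new step starts at the bottom, while the rung for any one step only ever goes up.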
Why the worked twin matters more than people think
The L4 worked-twin design is the part of the ladder I had to argue hardest for. The intuitive design is "after four attempts, just give the answer". That would be wrong, and here's the empirical reason.
Carnegie Learning's MATHia, the gold-standard intelligent tutoring system in US schools, found that directly revealing the answer after failure produced no measurable transfer — students could repeat the answer but couldn't generalise it. Revealing a worked example of a similar problem produced significantly better transfer because the student still had to do the application work.
This is consistent with the Vygotskian theory of the zone of proximal development: learning happens just above current ability, with appropriate scaffolding. L4 deliberately keeps the student doing the last bit themselves.
How the ladder interacts with the energy protocol
The Mentor's energy protocol changes how patiently Pi climbs the ladder. In a GREEN + high session, Pi might let a student sit at L1 for longer, give more time, and push harder before escalating. In an AMBER session or a "tired" mood, Pi escalates faster: the student doesn't have the cognitive headroom for a long climb, and the goal of the session is reinforcement rather than struggle. The ladder is the same; the patience parameter changes.
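As a rough sketch, the patience parameter could be a single number: how many attempts a student gets on a rung before Pi climbs. The function name and the values 1, 2 and 3 below are placeholders for illustration, not calibrated settings, and the state labels are simply the ones used in this post.

```python
def attempts_per_rung(energy: str, mood: str) -> int:
    """How many tries the student gets on a rung before escalation.

    All numbers here are hypothetical placeholders.
    """
    if energy == "AMBER" or mood == "tired":
        return 1  # low headroom: escalate quickly
    if energy == "GREEN" and mood == "high":
        return 3  # plenty of headroom: allow more productive struggle
    return 2      # assumed middle ground for any other combination


print(attempts_per_rung("GREEN", "high"))
print(attempts_per_rung("AMBER", "high"))
```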
What this is not
The ladder is not a quiz format. It's not a fixed script. It's not gamification. There are no points for "resisting hints" or streaks for "L1 wins". Those would be the wrong incentives. The ladder is a protocol Pi follows internally, mostly invisible to the student. From the child's perspective, they're having a conversation with a tutor who keeps asking the right next question. They don't see the level numbers. They feel the texture.
A worked session moment
Real-shaped example, lightly cleaned up.
Student: "I don't get question 3."
Pi: "Show me the question and what you've tried so far."
Student: "It's 2(x + 5) = 16. I got x = 8."
Pi: "OK, show me your working. What did you do first?" (asking for working, not yet on the ladder)
Student: "I divided both sides by 2. So x + 5 = 8. Then I did x = 8."
Pi: "Aha, let me look at your last step. You had x + 5 = 8. What does that equation say x must be?" (L1: diagnostic question targeted at the actual break point)
Student: "Oh. x + 5 = 8 means x = 3."
Pi: "Yes! Beautifully spotted. The break was on the last move: you wrote x = 8 instead of x = 3. Try the next problem."
Forty-five seconds, one L1 hint, the student fixed her own work. That's the ladder doing exactly what it should.
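The post doesn't describe how Pi finds the break point. One simple approach, assuming each line of working has already been parsed into the form a·x + b = c, is to solve every line and flag the first one whose solution differs from the original equation's. The helper names below are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Line = Tuple[int, int, int]  # (a, b, c) meaning a*x + b = c


def solution(a: int, b: int, c: int) -> Fraction:
    """Solve a*x + b = c exactly (assumes a != 0)."""
    return Fraction(c - b, a)


def first_break(working: List[Line]) -> Optional[int]:
    """Index of the first line that no longer has the original solution."""
    expected = solution(*working[0])
    for i, line in enumerate(working[1:], start=1):
        if solution(*line) != expected:
            return i
    return None


# 2(x + 5) = 16 expands to 2x + 10 = 16; then x + 5 = 8; then the slip x = 8.
working = [(2, 10, 16), (1, 5, 8), (1, 0, 8)]
print(first_break(working))
```

In this session the first two lines both give x = 3 and the third gives x = 8, so the check points at the last move, which is where the diagnostic question was aimed.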
Related reading
- Meet Professor Pi
- "Show your working" explained
- Why AI tutors don't give answers (and how to live with it)
Jason runs aitutors.me. The hint ladder is calibrated from approximately 6,000 logged stuck-moments across the prototype phase. Updated 21 May 2026.