The Librarian Forgets You Instantly — How We Fake a Memory

Picture yourself in the middle of a long homework session with a chatbot. You have explained that you are studying the Water Cycle, mentioned that your teacher wants diagrams described in words, and asked three follow-up questions — all sensible, all building on what came before. Then you type question four, and the chatbot answers as if it has never heard of the Water Cycle in its life.

If that has happened to you, you have just run into one of the most surprising facts about how AI chatbots work. The Librarian behind the mail slot does not remember anything. Not a word. Not a single earlier note.

This article is about the trick that hides that fact — and the moment the trick stops working.

The Librarian's one honest limitation

We met the Librarian in the first two articles in this series. The quick version: the Librarian has read every book ever written, sits behind a wall with only a mail slot, and can only do one thing — finish whatever is written on your note. One note in, one note back out.

That is the whole job. No phone calls, no peek at yesterday's post, no memory of previous notes whatsoever. The moment your reply slides back out under the slot, the Librarian forgets you completely. Not because they are rude. Because they literally cannot hold anything between notes. Every single note is, for the Librarian, the first note they have ever read.

This is not a quirk that will be patched in the next update. It is how large language models — LLMs, the technical name for the AI inside a chatbot — are built. Each response is generated fresh from the contents of the note alone.

So why does a chatbot seem to remember?

Here is the trick, and once you see it, you cannot unsee it.

When you type your fourth message in a conversation, the app does not just send that message to the Librarian. It takes your entire conversation so far — every message you have written, every reply that came back — staples all of it onto the front of your new message, and slips the whole combined stack through the mail slot as one very long note.

The Librarian reads that long note from the top, sees your earlier context, reads your latest question at the bottom, and replies. It looks exactly like memory. It is not memory. It is a very long note.

That stapled-on background, all the earlier conversation prepended to your new message, is what engineers call context. The saved pile of past messages that the app keeps re-stapling onto each new note is what people usually mean when they say a chatbot has memory.

The chatbot app manages the stapling invisibly, behind the scenes, so you never see it happening. That is the entire trick.
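For readers who like to see the stapling spelled out, here is a minimal sketch in Python. Everything in it is an illustration rather than how any particular chatbot is built: `librarian` is a hypothetical stand-in for the language model, and `ChatApp` is a made-up name for the app that sits in front of it. The point to notice is that the Librarian function keeps nothing between calls; only the app holds the transcript.

```python
def librarian(note: str) -> str:
    """Hypothetical stand-in for a language model.

    It sees only the note it is handed and keeps nothing between calls.
    """
    lines = note.splitlines()
    return f"(reply based on a note of {len(lines)} lines)"


class ChatApp:
    """Hypothetical chat app: it, not the Librarian, holds the transcript."""

    def __init__(self):
        self.transcript = []  # list of (speaker, text) pairs, oldest first

    def build_note(self, new_message: str) -> str:
        # Staple the whole conversation so far onto the front of the new message.
        lines = [f"{speaker}: {text}" for speaker, text in self.transcript]
        lines.append(f"User: {new_message}")
        return "\n".join(lines)

    def send(self, new_message: str) -> str:
        note = self.build_note(new_message)
        reply = librarian(note)  # one note in, one note back out
        # Save both sides so they can be stapled on next time.
        self.transcript.append(("User", new_message))
        self.transcript.append(("Assistant", reply))
        return reply


app = ChatApp()
app.send("I am studying the Water Cycle.")  # note contains 1 line
app.send("What is evaporation?")            # note contains 3 lines
```

The second call hands over three lines, not one: the first message, the first reply, and the new question. That growing note is the whole of the "memory".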

A kitchen-table version

Imagine you are writing notes back and forth with a friend who lives upstairs, using a letterbox in the floor. Your friend has an extraordinary brain — they know everything about science, history, cooking, geography, you name it — but there is one catch: after they reply to any note, they immediately forget the whole exchange. Total blank slate.

You want to discuss a recipe across several notes. So on your second note, before you write your new question, you copy out your first note and their first reply at the top. On your third note, you copy out notes one, two, and both replies. Your friend reads all of it fresh each time, and because they can see the full story, they respond in a way that seems perfectly continuous.

This is exactly what is happening when a chatbot maintains a conversation. You are not talking to something that remembers you. You are talking to something that gets handed the full transcript every single time.

The catch: notes cannot be infinite

The trick works brilliantly — right up until the conversation gets very long.

Here is the problem. There is a limit to how much text the Librarian can read in one sitting. Engineers call this the context window: the maximum length of a single note. Strictly speaking the limit is counted in tokens, chunks of text that are on average a little smaller than a word, but words are a fair approximation. Current AI assistants can handle quite a lot (tens of thousands of words in many cases, and growing all the time), but there is always a ceiling.

When your conversation grows longer than that ceiling, something has to give. The app cannot staple the whole history on anymore; it has to leave some of it off. A common approach is to drop the oldest messages first, on the assumption that the recent ones matter more to your current question. Which means the Librarian may never even see the Water Cycle you mentioned forty messages ago.

That is why a long chat eventually starts to drift. The AI is not getting confused or tired. It simply stopped receiving the early part of the transcript, because the note would have been too long to fit through the slot.
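The trimming described above can be sketched in a few lines of Python. This is one simple strategy under stated assumptions, not a description of what any specific app does: the function name `fit_to_window` is invented for this example, and it counts whole words where a real system would count tokens.

```python
def count_words(text: str) -> int:
    # Assumption: words stand in for tokens to keep the example simple.
    return len(text.split())


def fit_to_window(transcript, new_message, max_words):
    """Keep the newest messages that fit under max_words; drop the oldest first."""
    budget = max_words - count_words(new_message)  # the new message always goes in
    kept = []
    for message in reversed(transcript):  # walk from newest to oldest
        cost = count_words(message)
        if cost > budget:
            break  # this message and everything older is left off the note
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore oldest-to-newest order
    return kept + [new_message]


transcript = [
    "I am studying the Water Cycle",
    "Evaporation turns water into vapour",
    "Clouds form when vapour cools",
]
note = fit_to_window(transcript, "What happens next", max_words=14)
# note == ["Evaporation turns water into vapour",
#          "Clouds form when vapour cools",
#          "What happens next"]
```

With a ceiling of 14 words, the very first message is the one that falls off, so the note no longer mentions the Water Cycle at all.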

Squashing the pile: summarisation

There is a neat workaround for this, and you can use it yourself.

When the conversation history is getting long, you (or the app automatically) can ask the Librarian to summarise the old part of the conversation before it disappears. The summary is much shorter than the full transcript, so it fits onto the next note without crowding out your new question. The Librarian keeps the gist without keeping every word.

Engineers call this compression. It is a good trick. But there is a trade-off: a summary loses detail. If you said something precise and specific ten messages ago, the summary might reduce it to a vague phrase. The Librarian still has the broad shape of the conversation, but the fine grain is gone.

Professional AI tools try to manage this automatically, compressing older messages and keeping recent ones verbatim, so you never notice the join. It works most of the time. When it does not, you notice the drift.
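Here is a rough sketch of that idea: squash the older messages into one short summary line and keep the most recent ones word for word. The names `summarise` and `compress_history` are hypothetical, and the summariser is a deliberately crude placeholder (it keeps the first four words of each message). A real app would instead ask the model itself to write the summary.

```python
def summarise(messages):
    """Crude placeholder summariser, for illustration only.

    Assumption: a real app would send these messages to the model and ask
    for a summary. Here we simply keep the first four words of each one.
    """
    gist = "; ".join(" ".join(m.split()[:4]) for m in messages)
    return f"Summary of earlier chat: {gist}"


def compress_history(transcript, keep_recent=2):
    """Replace older messages with one summary; keep the newest verbatim."""
    if len(transcript) <= keep_recent:
        return list(transcript)  # nothing old enough to squash
    split = len(transcript) - keep_recent
    old, recent = transcript[:split], transcript[split:]
    return [summarise(old)] + recent


transcript = [
    "I am studying the Water Cycle for Year 8",
    "My teacher wants diagrams described in words",
    "What is evaporation",
    "And what is condensation",
]
shorter = compress_history(transcript, keep_recent=2)
# shorter[0] == "Summary of earlier chat: I am studying the; My teacher wants diagrams"
```

Notice what the placeholder summary lost: "Year 8" and "described in words" are gone. That is the trade-off in miniature. The broad shape survives and the fine grain does not.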

What to do when your chatbot forgets

Now that you know the trick, the fix is obvious. If a long conversation starts going off the rails — the AI has forgotten your name, your subject, your constraints — do not panic. Just tell it again, in plain terms, at the top of your next message.

Something like: "Quick reminder: I am writing about the Water Cycle, Year 8 level, no diagrams." That sentence goes into the note directly, so the Librarian sees it freshly even if all the earlier context has been trimmed off. You are manually re-stapling the essential bit.

If you would rather start fresh, close the chat and open a new one. The Librarian's slate is always blank anyway; a new chat simply throws away the leftover, truncated transcript that was causing the confusion.

The words to take away

Two terms, both simple:

Context is the background you include in a note so the Librarian has the information needed to give a useful reply. In a conversation, the context is the previous messages stapled to the front. But context can also include other things the app adds — instructions, facts from a file you uploaded, search results. Anything on the note before your question.

Memory, in AI conversation, means the running transcript that gets re-attached to each new note. It is not memory in the human sense: the Librarian stores nothing, recalls nothing, and experiences nothing. The transcript is kept by the app, and the "remembering" is a copy-paste job, done invisibly, every single time.

Understanding this changes how you use AI tools. If you want the chatbot to know something, put it in the note. If the chatbot seems to have forgotten something, the likeliest explanation is that the note grew too long and that part got left behind.
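To make "put it in the note" concrete, here is a small sketch of how an app might assemble everything the Librarian gets to see. The function name `assemble_context` and the ordering of the parts are assumptions made for illustration; real apps arrange and label these pieces in their own ways.

```python
def assemble_context(question, transcript=(), instructions="", extras=()):
    """Build one note: instructions, extra material, past messages, then the question."""
    parts = []
    if instructions:
        parts.append(instructions)  # e.g. rules the app adds on your behalf
    parts.extend(extras)            # e.g. facts from a file, search results
    parts.extend(transcript)        # the stapled-on conversation so far
    parts.append(question)          # your new question goes last
    return "\n".join(parts)


note = assemble_context(
    "What is evaporation?",
    transcript=["User: I am studying the Water Cycle."],
    instructions="Answer at Year 8 level.",
    extras=["From uploaded file: Evaporation needs heat."],
)
```

Whatever ends up in that one string is all the Librarian knows about you. Anything left out of it does not exist as far as the reply is concerned.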

Where this leads

Faking memory by re-stapling a transcript is a clever workaround, but it has limits. As soon as you need an AI to remember things across separate conversations, or to go and find relevant information rather than having it handed over in a note, you need something more than a longer note.

That is where a whole new character enters the story. In the next article we will meet the Runner: a helper who does the legwork the Librarian cannot, fetching the right information so the notes stay sharp and focused.

Read it here: What is an AI Agent?
