Tutorial Part 2: When the Agent Forgets

Part 2 of the Command-Line LLM Tools tutorial series. Previously: Part 1 — Getting Started with Claude Code & Codex CLI — install, instructions file, the five rules that prevent day-one mistakes.

In Part 1, we set up a working CLI agent with an instructions file, a folder structure, and five behavioral rules. That's enough to get useful work done for a few weeks. Then you'll hit the wall this article is about: the agent stops being sharp.

It gets repetitive. It re-asks questions you already answered. It makes the same correction-worthy mistake three times in a session. By turn forty of a long debugging conversation, the output is noticeably worse than it was at turn three. None of this is the model getting dumber. It's the context getting dumber — too full, too cluttered, too noisy for the model to find the signal in.

This is the heaviest tutorial in the series because memory failures are the most common reason serious users get frustrated and bounce. Almost every "AI got worse" story is actually a "context got too full" story. Once you understand how context behaves under load and what to do about it, the agent stays useful across hours-long sessions and weeks-long projects instead of degrading on you halfway through.

Section 1: How the context window actually behaves

You've heard "the context window fills up." Here's what that actually means in practice.

Every model has a fixed amount of working memory it can hold in a single session, measured in tokens (one token is roughly three-quarters of an English word). Frontier models in 2026 have context windows in the 200,000 to 1,000,000 token range. That sounds like a lot. It fills up faster than you think. A long debugging session on a real codebase can burn 200,000 tokens in an hour or two.
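To make the arithmetic concrete, here is a rough Python sketch that estimates token counts from word counts using the three-quarters-of-a-word rule of thumb. The function names and the 200,000-token default window are illustrative assumptions, not part of any tool. Real tokenizers vary by model and by content, so treat the result as an order-of-magnitude guide only.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate tokens from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fraction_of_window(text: str, window_tokens: int = 200_000) -> float:
    """Share of an assumed context window that this text would occupy."""
    return estimate_tokens(text) / window_tokens


if __name__ == "__main__":
    # Hypothetical noisy output: the same failure line repeated 5,000 times.
    log = "ERROR test_login failed: expected 200 got 500\n" * 5_000
    print(estimate_tokens(log), f"{fraction_of_window(log):.1%}")
```

Even a crude estimate like this makes it easier to see why a few verbose test runs can consume a large share of the window.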

What actually fills it: every file the agent reads, every shell command output, every message you send, every plan it drafts, every retry on a failed attempt, every time it lists files to figure out where it is. None of this is unusual or wasteful — it's the agent doing real work. But it accumulates.

When the context window fills up, three things happen, in order:

The agent gets slower. More text means more tokens to process per response. You feel this as latency — replies take longer to start, longer to finish.
The agent gets less accurate. Relevant detail gets buried in noise. The brain-file rule from session-start gets lost in the middle of a 50,000-token conversation. The agent re-reads the same file three times because it has forgotten what it learned from reading it the first time.
The session hits a hard limit and refuses to continue. This is the visible failure. By the time you hit it, you've already been working in degraded conditions for a while.

The dangerous stage is the middle one. Slow is annoying but obvious. The hard limit is annoying but obvious. Reduced accuracy is the stage where the agent starts confidently producing wrong work, and you don't notice because the conversation looks the same as it did at turn three.

How to tell, in practice: the agent re-asks questions you already answered. It forgets recent decisions. It gets verbose about things it should know are settled. It writes plans that ignore the constraint you specified twenty messages ago. Any of these signals means context is rotting and you need to act.

Section 2: Compaction patterns that actually work

/compact is one tool, not one technique. Here are the patterns that earn their keep.

The natural-pause compact

After completing a discrete task, before starting the next one. This is the easiest habit to build because the trigger is naturally legible: you finished something. Instead of moving directly to the next task with the conversation full of detail from the previous one, run /compact and start clean.

Example: you finished implementing feature X. Before starting feature Y, compact. The agent keeps your instructions and the broad shape of what just happened; it discards the line-by-line debugging trace that's no longer load-bearing.

The directed compact

/compact accepts instructions about what to keep:

/compact remember the database schema and the bug we just fixed, but discard everything before that.

This is the most useful pattern once you're comfortable with it. You're telling the agent which parts of the conversation matter for what comes next. The summary it produces is markdown — readable, editable. Read it. If it compacted away something important, fix it.

The pre-disaster compact

Before a long-running operation that will produce massive output — running a test suite, processing a large file, inspecting a verbose log — compact first. High-volume output will rot context unless it lives somewhere bounded. The pattern: compact, then ask the agent to direct the output to a file rather than letting it pour into the conversation.

The conversation should never be the storage layer for output that's bigger than a few screens.
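One way to keep high-volume output bounded is a small wrapper that sends everything to a log file and hands back only a short tail. The sketch below is an illustration under assumptions: `run_to_file`, the log file name, and the stand-in "noisy" command are all hypothetical, and your own test runner would replace the child process shown here.

```python
import subprocess
import sys
from pathlib import Path


def run_to_file(cmd: list[str], log_path: Path, tail_lines: int = 20) -> str:
    """Run cmd, write all of its output to log_path, return a short summary."""
    with log_path.open("w", encoding="utf-8") as log:
        # stderr is merged into stdout so the log holds the complete picture.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    lines = log_path.read_text(encoding="utf-8").splitlines()
    tail = lines[-tail_lines:] if tail_lines > 0 else []
    header = f"exit code {result.returncode}, {len(lines)} lines in {log_path}"
    return "\n".join([header, *tail])


if __name__ == "__main__":
    # Stand-in for a noisy test run: a child Python process printing 1,000 lines.
    noisy = [sys.executable, "-c", "for i in range(1000): print('line', i)"]
    print(run_to_file(noisy, Path("test-output.log"), tail_lines=3))
```

The conversation then only ever sees the summary line and the tail; the full output stays in the file, where the agent can search it if a specific failure needs a closer look.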

The "I don't trust this anymore" compact

When the agent's recent answers feel off — circular, repetitive, confidently wrong — context has rotted. Compact and restart the line of work. Trying to argue your way back to coherence costs more tokens than starting fresh, and arguing rarely works anyway.

Don't try to debug the compaction. Just run it. If something important was lost, the files (covered next) preserved it.

Section 3: Files as the real memory

Part 1 introduced this principle: conversation context is ephemeral, files persist. Here's the working version.

Three categories of files for the agent

Rules files. What the agent is allowed and expected to do. The instructions file is the canonical example. As the project grows, this may become multiple specialized rule files.
Knowledge files. What the agent should know about your project, domain, codebase. Stable reference material — facts, decisions, conventions, glossaries.
Working memory files. What this engagement needs the agent to remember. Decisions made, conventions discovered, patterns chosen, status notes from the last session.

The discipline: if it matters past this session, write it down. Conversation context is the scratchpad. Files are the memory.

When and how to write to files

After making a non-obvious decision: write a one-line decision note to a session log. After discovering a project convention: add it to the instructions file or a project-specific knowledge file. After establishing a pattern that worked: capture it as a knowledge entry.

The agent can do this for you. "Add this to our knowledge file: when generating monthly reports, always include the prior-month comparison column." The agent appends, you review the diff, you accept. The knowledge grows with the work.
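If you prefer to capture notes yourself, or want the agent to call a fixed helper rather than free-form editing a file, the habit can be sketched as a few lines of Python. The function name, the one-bullet-per-line format, and the example path are assumptions chosen for illustration.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_note(log_path: Path, note: str, when: Optional[date] = None) -> str:
    """Append a dated one-line note to log_path and return the line written."""
    when = when or date.today()
    # Collapse whitespace so every entry stays on a single line.
    line = f"- {when.isoformat()}: {' '.join(note.split())}"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line


if __name__ == "__main__":
    # Hypothetical log location; any file the agent is told to read would work.
    print(append_note(
        Path("notes/session-log.md"),
        "Monthly reports always include the prior-month comparison column.",
    ))
```

Appending rather than rewriting keeps the diff small, which makes the review step quick.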

The file structure that actually scales

A sketch of how to organize project files for the agent, after the first few weeks:

project-root/
├── CLAUDE.md           ← instructions file (rules, conventions)
├── docs/               ← knowledge files (stable reference)
│   ├── architecture.md
│   ├── conventions.md
│   └── glossary.md
├── notes/              ← working memory (in-progress decisions)
│   └── decisions/
│       └── 2026-01-decision-database-choice.md
└── src/                ← actual work

This isn't load-bearing. The exact structure isn't the point. The point is: there are durable categories of file the agent reads, and a working space where files change.
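For anyone who wants a starting point, the layout above can be created with a short script. This is a sketch only: the directory and file names are copied from the tree, the `scaffold` helper is hypothetical, and it creates empty placeholder files without overwriting anything that already exists.

```python
from pathlib import Path

DIRS = ["docs", "notes/decisions", "src"]
FILES = [
    "CLAUDE.md",
    "docs/architecture.md",
    "docs/conventions.md",
    "docs/glossary.md",
]


def scaffold(root: Path) -> list[str]:
    """Create missing directories and empty files; never overwrite existing ones."""
    created = []
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(name)
    return created


if __name__ == "__main__":
    print(scaffold(Path("project-root")))
```

Running it a second time returns an empty list, so it is safe to use on a project that already has some of these files.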

Section 4: The instructions file beyond the first five rules

In Part 1, the instructions file had five rules. As you do real work, it grows. Patterns for growth:

Add rules when you've corrected the agent twice on the same thing

One miss is a fluke. Two is a pattern. Pattern is a rule. Don't pre-write rules speculatively — you'll write the wrong ones, or write rules that paper over a single weird incident that wasn't actually a pattern.

Group rules into sections as the file grows

Once you have fifteen-plus rules, a flat list is hard to navigate. Group by topic: Truth & Verification, Code Conventions, Process Discipline, Style Preferences, Boundaries. The agent reads the whole file. Sections are for you — finding things to edit or remove later.

Watch for dilution

Past roughly 150 to 200 instructions, frontier models follow them less reliably. This is a real ceiling, not a vibe. When you hit it, the answer isn't "compress the rules." It's progressive disclosure: splitting instructions into context-specific files that load only when relevant. That's the topic of Part 5.

For now, when you sense the file is getting too dense to be obeyed, that's the signal you've outgrown a single flat instructions file. It's not a problem to solve in this article.
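A rough way to see how close a file is to that range is to count its rules per section. The sketch below assumes rules are written as markdown bullets or numbered items under `#` headings, which may not match your file. The function names are hypothetical, and the default ceiling of 150 is simply the low end of the range quoted above.

```python
import re
from collections import Counter

HEADING = re.compile(r"^#+\s+(.*\S)\s*$")
RULE_LINE = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def count_rules(markdown: str) -> Counter:
    """Count bullet or numbered lines under each heading."""
    counts: Counter = Counter()
    section = "(no heading)"
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            section = heading.group(1)
        elif RULE_LINE.match(line):
            counts[section] += 1
    return counts


def is_diluted(counts: Counter, ceiling: int = 150) -> bool:
    """True once the total rule count reaches the assumed ceiling."""
    return sum(counts.values()) >= ceiling


if __name__ == "__main__":
    sample = "# Truth & Verification\n- Cite sources\n- Never guess\n\n# Style\n1. Be brief\n"
    counts = count_rules(sample)
    print(dict(counts), is_diluted(counts))
```

The per-section counts also show which topic is growing fastest, which is usually the first candidate for splitting out later.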

Let the file teach you what the agent needs

The rules you find yourself writing repeatedly tell you what's actually load-bearing for your work. The rules that show up but aren't pulling weight (the agent already does the right thing without them) can be removed. The instructions file is a living thing; refactor it occasionally.

Section 5: Task isolation — when to reset

Sometimes compaction isn't enough. You need to start fresh.

Signs that compaction won't help

The work just done was structurally different from the next work — different domain, different file types, different cognitive job.
The conversation contains evidence-heavy material that shouldn't bleed into the next task — financial data, applicant info, legal text, sensitive notes.
You're transitioning across phases — research to writing, ingest to drafting, exploration to commitment.

In these cases, compaction would either lose the evidence (bad) or carry residue forward into work where it doesn't belong (worse).

How to reset cleanly

Before exiting the session: write decision summaries to files. The new session can re-read them. Then end the session and start a new one.

The new session will re-read the instructions file, the knowledge files, and any session log you just wrote. Mental model: the work persisted (in files); the context started fresh (the conversation).

This is what files-as-memory actually buys you. Without files, ending the session is a loss. With files, it's a compaction at the strongest possible level — keeping what matters, discarding the conversation noise entirely.
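As a sketch of the handoff, the helper below lists the files a fresh session could be pointed at, most stable first. It assumes the project layout sketched in Section 3 (`CLAUDE.md`, `docs/`, `notes/`); the `reading_list` function is hypothetical, and how your tool loads these files at startup is something to check in its own documentation.

```python
from pathlib import Path


def reading_list(root: Path) -> list[Path]:
    """Return existing files for a fresh session to read, most stable first."""
    candidates = [root / "CLAUDE.md"]
    # Knowledge files first, then working-memory notes (including subfolders).
    for sub, pattern in (("docs", "*.md"), ("notes", "**/*.md")):
        folder = root / sub
        if folder.is_dir():
            candidates += sorted(folder.glob(pattern))
    return [p for p in candidates if p.is_file()]


if __name__ == "__main__":
    names = [str(p) for p in reading_list(Path("."))]
    print("Read these files before we start: " + ", ".join(names))
```

Pasting the printed line as the first message of the new session is one simple way to make the reset explicit.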

Three-tier transition framework

When moving between tasks or domains, ask which tier this transition lives at:

Continue — same domain, same artifacts, no risky residue. Just unload the previous task and load the next. No compaction needed.
Compact — same or adjacent domain, but bulky material loaded (transcripts, large files, extensive debugging). Compress before next phase.
Reboot — cross-domain, sensitive evidence, or contaminated context. End session, start fresh.

The cost of an unnecessary reboot is a few seconds of re-reading instruction files. The cost of contaminated context is wrong output you might not catch. Err toward the safer tier.
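The three tiers can be written down as a small decision function. This is one reading of the framework, not a definitive rule: the parameter names are hypothetical, and treating an adjacent-domain move with no bulky material as "compact" is an assumption that follows the advice to err toward the safer tier.

```python
def choose_tier(
    domain: str,
    bulky_material: bool = False,
    sensitive_evidence: bool = False,
    contaminated: bool = False,
) -> str:
    """Pick "continue", "compact", or "reboot" for a task transition.

    domain is "same", "adjacent", or "cross", relative to the previous task.
    """
    if domain not in ("same", "adjacent", "cross"):
        raise ValueError(f"unknown domain: {domain!r}")
    # Reboot conditions are checked first so ties resolve toward the safer tier.
    if domain == "cross" or sensitive_evidence or contaminated:
        return "reboot"
    # Assumption: an adjacent domain is compacted even without bulky material.
    if domain == "adjacent" or bulky_material:
        return "compact"
    return "continue"


if __name__ == "__main__":
    print(choose_tier("same"))
    print(choose_tier("same", bulky_material=True))
    print(choose_tier("same", sensitive_evidence=True))
```

The order of the checks carries the design choice: any single reboot condition wins, regardless of how convenient it would be to keep going.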

Section 6: A real walkthrough

Two concrete examples, anonymized.

Example 1: The long debugging session

A bug investigation that took three hours of session time across two days.

Start of session one: clean. Agent reads instructions file. I describe the bug. Agent reads the relevant source files, proposes hypotheses, runs tests. Each test failure adds context. By turn 30, we've narrowed the problem to a specific module but accumulated several thousand lines of stack traces and test output in the conversation.

What I did right: at turn 30, I ran:

/compact remember the bug we're tracking, the failing test case, and the modules we've ruled out. Discard the rest.

The agent compacted the conversation to a one-paragraph summary. We continued for another hour and resolved the bug.

What would have happened if I hadn't compacted: by turn 60, the agent would have started forgetting which modules we'd already ruled out. We'd have re-tested already-failed paths. The session would have hit the context wall, lost everything, and forced a reboot from raw notes.

End of session one: I asked the agent to write a bug-2026-01-debugging-notes.md file summarizing what we learned. Session two: I reopened with that file in context. The result was compact memory carried across sessions, with no starting from scratch.

Example 2: The cross-engagement transition

Working on Project A in the morning, switching to Project B in the afternoon.

Early in the Project A session, I loaded a sensitive financial document, and the agent reasoned against it. By noon, the agent's working memory held specific numbers, ratios, and decisions, none of which I wanted bleeding into the unrelated work I'd be doing on Project B that afternoon.

What I did: ended the Project A session with a session-log file capturing the decisions made (no specific numbers — those stay in the original document, which the agent can re-read if I return to Project A). Started a new session for Project B. The new session never saw the Project A material at all.

Why this matters: even if the residue wouldn't have caused a visible error, residual context shapes how the agent reasons. Loading sensitive financial data and then asking it to draft an unrelated memo means the memo's reasoning is colored by what it saw earlier. Reset cleanly to keep judgment frames separate.

Closing: what's next

Part 3 covers what to do when the agent doesn't just forget but actively goes off the rails — git as time machine, hooks for hard guarantees, approval modes. Different problem. Different solves. Same goal: keeping the agent inside the lines so it's safe to do real work.

The Command-Line LLM Tools Tutorial Series

Why Use Command-Line LLM Tools — the case for moving from chat to terminal
Part 1: Getting Started with Claude Code & Codex CLI — install, instructions file, day-one rules
Part 2: When the Agent Forgets (you are here)
Part 3: When the Agent Goes Off the Rails — git, hooks, approval modes
Part 4: When the Agent Doesn't Know Enough — extending capability with skills and sub-agents
Part 5: When Configuration Becomes Architecture — what happens when you outgrow a single instructions file
