Tutorial Part 4: When the Agent Doesn't Know Enough
Part 4 of the Command-Line LLM Tools tutorial series. Previously: Part 3, When the Agent Goes Off the Rails (git as time machine, approval modes, hooks for hard guarantees).
Parts 1, 2, and 3 made the agent safe and stable. It runs in a contained folder, holds memory across long sessions, and can't easily damage your work. Now the next problem shows up: the agent doesn't know your domain.
It doesn't know the code review conventions specific to your codebase. It doesn't have the procedure you've refined for evaluating purchases or screening admissions. It doesn't have the specialty knowledge for the work you do most. And when you try to handle that with longer instructions or longer prompts, you hit a different wall: the instructions file gets bloated and the agent follows fewer of its rules.
Skills and sub-agents are the two primitives for extending what the agent can do without bloating the instructions file or turning every session into a kitchen sink.
The two primitives
Both Anthropic and OpenAI have shipped versions of these. Naming differs across tools; the cognitive shape is the same.
Skills
Named, invocable instruction modules. Loaded on demand via a trigger phrase or explicit invocation. They encapsulate procedural knowledge — when the user wants X, here's how to do it. They don't pollute the global instructions file. They live as their own files, often in a skills/ directory, and only get loaded when relevant.
Think of a skill as a recipe the agent can pull off the shelf when you ask for the dish.
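To make the on-demand idea concrete, here is a minimal sketch, assuming a hypothetical in-memory registry. The names `SkillRegistry`, `register`, `always_in_context`, and `load` are illustrative and not any tool's real API, and the skill bodies are placeholders. Only the one-line descriptions stay resident; the full procedure enters the context when the skill is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillRegistry:
    """Hypothetical registry: short descriptions stay resident, bodies load lazily."""
    descriptions: Dict[str, str] = field(default_factory=dict)
    loaders: Dict[str, Callable[[], str]] = field(default_factory=dict)
    loaded: List[str] = field(default_factory=list)  # which bodies were actually read

    def register(self, name: str, description: str, loader: Callable[[], str]) -> None:
        self.descriptions[name] = description
        self.loaders[name] = loader

    def always_in_context(self) -> str:
        # Only the one-line descriptions are paid for in every session.
        return "\n".join(f"{n}: {d}" for n, d in sorted(self.descriptions.items()))

    def load(self, name: str) -> str:
        # The full procedure enters the context only when the skill is invoked.
        self.loaded.append(name)
        return self.loaders[name]()


registry = SkillRegistry()
# In a real setup the loader would read a file from a skills/ directory;
# lambdas with placeholder text keep this sketch self-contained.
registry.register("code-review", "Review a diff with our checklist",
                  lambda: "1. Correctness\n2. Safety\n3. Style\n4. Scope")
registry.register("vendor-eval", "Score a vendor proposal against the rubric",
                  lambda: "1. Price\n2. Support\n3. Lock-in risk")

resident = registry.always_in_context()
procedure = registry.load("code-review")
```

Here `resident` holds two short lines no matter how long the procedures grow, which is the property that keeps the instructions file from bloating.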
Sub-agents
Spawn a fresh, scoped agent for a specific task. The sub-agent has its own context window — separate from yours. It does the work, returns a result, and its conversation history doesn't bleed back into the main session. Used when work would otherwise dirty your main session's context, or when the work needs a different judgment frame than what the main agent has loaded.
Think of a sub-agent as hiring a contractor for a specific job. They show up, do the job in their own workspace, hand you the result, and leave.
When to reach for skills
Skills shine when:
- You do the same kind of task repeatedly with non-trivial procedure. Examples: "review this code change," "summarize this transcript with these conventions," "draft a project report following our format," "evaluate a vendor proposal against our rubric."
- The procedure is stable enough to write down. If it changes every time, a skill won't capture it — you're better off in the conversation.
- The procedure is too long or specialized for the instructions file. A 200-line workflow doesn't belong in a global rules file. It belongs in a skill that loads only when invoked.
What goes in a skill
A name and trigger description. The procedure itself, in markdown — a checklist, a series of steps, a decision tree. Optionally: example outputs, edge cases, gotchas. Optionally: scripts the skill can invoke.
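One way to picture those parts is as a small data structure that renders to a markdown file. This is only a sketch: the `Skill` class, its field names, and the section headings are assumptions made for illustration, and real tools define their own skill file formats.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skill:
    """Hypothetical in-memory shape for a skill; real tools define their own format."""
    name: str
    trigger: str                                        # when to reach for this skill
    steps: List[str]                                    # the procedure, in order
    gotchas: List[str] = field(default_factory=list)    # optional edge cases
    scripts: List[str] = field(default_factory=list)    # optional helper scripts

    def to_markdown(self) -> str:
        lines = [f"# {self.name}", "", f"Use when: {self.trigger}", "", "## Procedure"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        # Optional sections are emitted only when they have content.
        if self.gotchas:
            lines += ["", "## Gotchas"] + [f"- {g}" for g in self.gotchas]
        if self.scripts:
            lines += ["", "## Scripts"] + [f"- {s}" for s in self.scripts]
        return "\n".join(lines) + "\n"


report_skill = Skill(
    name="project-report",
    trigger="the user asks for a project report in our format",
    steps=["Collect status from the notes", "Draft the summary", "List open risks"],
    gotchas=["Never include credentials or other secrets"],
)
skill_text = report_skill.to_markdown()
```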
What stays out of skills
- One-off instructions (those go in the conversation, not in a saved skill)
- Project-wide conventions (those stay in the instructions file, applying everywhere)
- Sensitive data (skill files often get committed; secrets should never be)
A worked example
A "code review" skill that:
- Takes a diff or file path as input
- Walks the agent through a structured review: correctness first, safety second, style third, scope fourth
- Produces a markdown review document with sections for each dimension
- Uses the same checklist every time you invoke it
Without the skill, every code review is a fresh negotiation with the agent about what dimensions to check, in what order, in what format. With the skill, you invoke it and get consistent, structured output. The "remind the agent how to do code review" tax disappears.
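The consistency claim can be sketched in a few lines. The function name `render_review`, the file path, and the sample findings below are hypothetical; the dimension order comes from the example above. Whatever the agent finds, the output always has the same four sections in the same order.

```python
from typing import Dict, List

# Order taken from the worked example: correctness, safety, style, scope.
REVIEW_DIMENSIONS = ["Correctness", "Safety", "Style", "Scope"]


def render_review(target: str, findings: Dict[str, List[str]]) -> str:
    """Build the markdown review; every dimension appears, in the same order, every time."""
    lines = [f"# Code review: {target}"]
    for dimension in REVIEW_DIMENSIONS:
        lines.append("")
        lines.append(f"## {dimension}")
        notes = findings.get(dimension, [])
        if notes:
            lines.extend(f"- {note}" for note in notes)
        else:
            lines.append("- No findings.")
    return "\n".join(lines) + "\n"


review = render_review(
    "src/parser.py",  # hypothetical file path
    {"Style": ["Function name shadows a builtin"],
     "Correctness": ["Off-by-one in the loop bound"]},
)
```

The findings are passed in any order, and empty dimensions still get a section, so two reviews of different changes remain directly comparable.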
This generalizes far beyond code. Document review, decision evaluation, applicant screening, recipe planning, trip itinerary drafting — anything you do more than three times with a stable procedure earns a skill.
When to reach for sub-agents
Sub-agents are for isolation, not just delegation.
- The work has high token volume that you don't want polluting your session. Reading 50 files to find one pattern. The sub-agent does the read; returns the answer. Your session never sees the file content. Critical for keeping the main agent's working memory focused.
- The work needs a different judgment frame. Generator-critic patterns: main agent generates, sub-agent critiques. The sub-agent's critique is independent because it doesn't see your reasoning leading up to the generation.
- The work is parallelizable. Three independent tasks can run as three sub-agents at once. You synthesize the results without waiting for sequential completion.
- The work is risky and you want it sandboxed. A sub-agent operating on a feature branch can fail without affecting your main reasoning thread.
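The parallel case from the list above can be sketched with the standard library. Here `run_subagent` is a stand-in for whatever spawning mechanism your tool provides (an assumption, not a real API), and the task strings are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_subagent(task: str) -> str:
    """Stand-in for a real sub-agent call (hypothetical); returns only a short result."""
    return f"done: {task}"


def run_in_parallel(tasks: List[str]) -> List[str]:
    # One worker per task; map() yields results in the order the tasks were given.
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        return list(pool.map(run_subagent, tasks))


results = run_in_parallel([
    "summarize the changelog",
    "list deprecated functions",
    "check the license headers",
])
synthesis = "\n".join(results)  # the main session only ever sees these three lines
```

Threads are a reasonable fit here because waiting on an agent is I/O-bound; the main session's job is reduced to synthesizing the returned results.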
What sub-agents look like in practice
Both Claude Code and Codex CLI have agent-spawning constructs. The general pattern: you describe the task, the platform launches a fresh agent, the sub-agent does the work and returns a result. You see the answer, not the work.
This is structurally different from telling the main agent "go do X" — that work happens in your context window. A sub-agent's work happens in its own context window, and only the result comes back.
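A toy model makes the difference visible. In this sketch a context is just a list of strings, the fifty files are generated filler, and the function names are hypothetical. Nothing here calls a real agent; it only tracks what ends up in which context.

```python
from typing import Dict, List

# Fifty fake files of filler; one of them contains the pattern we are looking for.
FILES: Dict[str, str] = {f"notes/{i}.md": "filler text " * 200 for i in range(50)}
FILES["notes/17.md"] += "TODO: rotate the API key"


def find_inline(context: List[str], needle: str) -> str:
    """The agent reads everything itself: every file lands in the given context."""
    hit = "not found"
    for path, text in FILES.items():
        context.append(text)
        if needle in text:
            hit = path
    return hit


def find_with_subagent(context: List[str], needle: str) -> str:
    """The reading happens in a private context; only the answer reaches the caller's."""
    private_context: List[str] = []
    hit = find_inline(private_context, needle)
    context.append(hit)
    return hit


inline_context: List[str] = []
delegated_context: List[str] = []
inline_hit = find_inline(inline_context, "TODO")
delegated_hit = find_with_subagent(delegated_context, "TODO")
inline_size = sum(len(m) for m in inline_context)
delegated_size = sum(len(m) for m in delegated_context)
```

Both routes find the same file, but the inline context now carries all fifty files while the delegated one carries a single path.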
When not to use a sub-agent
- The task is short and central to your current thinking. Running it inline is faster than spawning a sub-agent.
- The task needs back-and-forth with you. Sub-agents work best with clear, complete prompts. They're not great at iterative collaboration.
- The task is structurally connected to your current reasoning. Splitting it off loses the thread: the sub-agent doesn't have the lead-up that made the question land where it did.
Skills + sub-agents together
The combination unlocks a real composition pattern.
The "specialist sub-agent" pattern
Define a skill for a specific kind of work — say, "research this question." Spawn a sub-agent and load that skill. The sub-agent has both the procedure (the skill) and the isolation (the sub-agent's own context window). It returns a clean result without bloating your session.
This is the same pattern as having two human collaborators with different roles. Different agents, different skills, different results.
The generator-critic example
Main agent drafts a document. You spawn a sub-agent and load a "critical reviewer" skill. The sub-agent reads the draft and the skill — but doesn't see your reasoning that produced the draft. It reviews independently. Returns a critique. Main agent revises with the critique in hand.
The independence is the value. If the same agent that wrote the document tries to critique it, you mostly get the same reasoning back, repackaged. A sub-agent loaded with a critical-review skill produces actual friction.
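The independence can be enforced structurally rather than by good intentions. In this sketch the `critic` function is a stand-in for a sub-agent call (a crude rule-based placeholder, not a real model), and the skill text, draft, and history are invented. The point is that the prompt builder has no way to receive the generator's history.

```python
from typing import List

CRITIC_SKILL = "Flag every claim that has no supporting evidence."  # hypothetical skill text


def build_critic_prompt(draft: str) -> str:
    # Deliberately takes no history argument: the critic sees the skill and the draft only.
    return f"{CRITIC_SKILL}\nDRAFT:\n{draft}"


def critic(prompt: str) -> str:
    """Stand-in for a sub-agent (hypothetical): flags sentences lacking a [source] marker."""
    draft = prompt.split("DRAFT:\n", 1)[1]
    weak = [s.strip() for s in draft.split(".") if s.strip() and "[source]" not in s]
    return "\n".join(f"Unsupported: {s}" for s in weak)


# The main agent's reasoning stays in the main session.
main_history: List[str] = ["user wants a bold tone", "decided to lead with the growth claim"]
draft_text = "Revenue doubled last year [source]. Churn is negligible."

critic_prompt = build_critic_prompt(draft_text)
feedback = critic(critic_prompt)
```

The main agent then revises with `feedback` in hand, while the critic never learned why the draft was written the way it was.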
What this costs
Skills and sub-agents are not free.
Skills require maintenance. A skill is code-adjacent. It has versions, edge cases, places where it stops working as the underlying tool evolves. Skills you don't use rot — and rotted skills are worse than no skills, because they look usable and aren't.
Sub-agents cost tokens. Each spawn is a fresh context, which means re-loading instructions, re-loading any context the sub-agent needs. Heavy use can be expensive on per-token plans.
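A back-of-the-envelope estimate shows how this adds up. The formula is a rough simplification and every number below is a made-up placeholder, not a measurement from any tool.

```python
def spawn_cost_tokens(spawns: int, instruction_tokens: int,
                      context_tokens: int, result_tokens: int) -> int:
    """Rough estimate: each spawn re-reads instructions and context, then returns a result."""
    return spawns * (instruction_tokens + context_tokens + result_tokens)


# All figures are hypothetical placeholders.
estimate = spawn_cost_tokens(
    spawns=20,
    instruction_tokens=2_000,
    context_tokens=5_000,
    result_tokens=500,
)
```

With these placeholder values, twenty spawns come to 150,000 tokens, most of it spent re-loading material the main session already had.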
You have to remember they exist. This is actually the hardest part. The agent doesn't always remember it could spawn a sub-agent or invoke a skill. You have to prompt it. The benefit only shows up when you build the habit of asking yourself "is this a skill thing? Is this a sub-agent thing?"
Routing decisions get harder. Now every task has a "do I just do this, write a skill, or spawn a sub-agent?" question. The decision becomes lighter with practice but never fully disappears.
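The routing question can be written down as a small decision function. This is one possible ordering of the rules of thumb from earlier sections, not an official algorithm; the `Task` fields and the return labels are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    times_done: int = 1                   # how often this kind of task has come up
    stable_procedure: bool = False        # can the steps be written down?
    heavy_reading: bool = False           # would it flood the main context?
    needs_independent_view: bool = False  # e.g. a critique of your own draft
    needs_back_and_forth: bool = False    # iterative collaboration with you


def route(task: Task) -> str:
    """One possible ordering of the rules of thumb; adjust to taste."""
    if task.needs_back_and_forth:
        return "inline"                   # sub-agents want clear, complete prompts
    wants_isolation = task.heavy_reading or task.needs_independent_view
    wants_skill = task.stable_procedure and task.times_done > 3
    if wants_isolation and wants_skill:
        return "sub-agent with skill"
    if wants_isolation:
        return "sub-agent"
    if wants_skill:
        return "skill"
    return "inline"


decision = route(Task(times_done=5, stable_procedure=True, needs_independent_view=True))
```

The example task is a recurring, stable procedure that needs an independent view, so it routes to a sub-agent loaded with a skill. In practice the judgment is fuzzier than a few booleans, which is why the decision gets lighter with practice but never goes away.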
Three patterns that earned their keep
Document review as a skill
The procedure was stable, ran often, took ten minutes of context tax to set up each time. A skill collapsed it to one trigger. Years of cumulative time savings, with consistency that wasn't achievable via instructions alone because the procedure was too long.
Bulk-read research as a sub-agent
A research task needed the agent to read thirty markdown files and find a pattern across them. Loading those thirty files into my main session would have killed the context. The sub-agent read them in isolation and returned the pattern, and my session stayed clean.
Adversarial review as a sub-agent with a skill
For high-stakes drafts — board packages, public essays, structural decisions — main agent writes, sub-agent loads the "adversarial reviewer" skill, finds weak claims and bad arguments. Main agent revises. The final draft is meaningfully better than what either pass alone produced.
The problem that's about to show up
You now have an instructions file with rules, skills for recurring procedures, sub-agents for isolated work, a growing collection of knowledge files, custom hooks, and workflow conventions you've discovered.
This is no longer a chat tool. It's a configuration. And configurations have a property the instructions file alone didn't: the question stops being "how do I do this?" and starts being "where does this belong?"
Should this new behavior be a rule in the instructions file, a skill you can invoke, a knowledge file the agent reads as context, or a hook that enforces it? Should this knowledge be loaded always, or only when relevant? Should this judgment live in the main agent or get spawned as a sub-agent role?
Those are architecture questions. They're what Part 5 is about.
The Command-Line LLM Tools Tutorial Series
- Why Use Command-Line LLM Tools — the case for moving from chat to terminal
- Part 1: Getting Started with Claude Code & Codex CLI — install, instructions file, day-one rules
- Part 2: When the Agent Forgets — managing memory, context, and compaction
- Part 3: When the Agent Goes Off the Rails — git, hooks, approval modes
- Part 4: When the Agent Doesn't Know Enough (you are here)
- Part 5: When Configuration Becomes Architecture — what happens when you outgrow a single instructions file