Tutorial Part 5: When Configuration Becomes Architecture

Part 5 — the final part — of the Command-Line LLM Tools tutorial series. Previously: Part 4 — When the Agent Doesn't Know Enough — extending capability with skills and sub-agents.

Every fix in this tutorial series scales until it doesn't.

The instructions file with five rules grows past two hundred, and the agent silently starts ignoring some of them. Compaction patterns that worked at session scale turn ad hoc and fragile once you have multiple parallel projects. Hooks proliferate without conventions. Skills accumulate with overlapping responsibilities. Sub-agents get spawned without clear roles.

The configuration you've assembled was never architected. It accumulated. And accumulation eventually demands architecture — not because someone says so, but because the accumulated thing stops working.

This is the meta-tutorial. Not new tactics — the structural decisions every previous tactic was secretly setting up. By the end of this article, you'll see why your configuration is no longer a configuration but an operating model, and what to do about it.

Where you are now

By Part 4, you have:

  • An instructions file (probably a hundred to two hundred rules, organized into sections)
  • Patterns for compaction, files-as-memory discipline, task isolation
  • Git habits, approval modes, a few hooks
  • Two or three skills that earned their keep
  • Sub-agents used for parallel and isolated work
  • Knowledge files for stable reference material

It works. It also creaks. The creaking is the subject of this article.

Three symptoms of outgrowing tactics

Three patterns signal "this is no longer a config; it's a system you don't yet understand."

Symptom 1: The agent ignores rules you wrote

You added a rule. The agent followed it for a week, then started ignoring it. Or it follows the rule in some contexts but not others. Or it follows the rule when you say it explicitly but forgets it when you don't.

This isn't a model failure. It's a structural one: the rule was supposed to apply sometimes, and a flat rule list can't express that. Every rule loads with equal weight in every session. In my experience, the model handles a few dozen instructions reliably, gets fuzzier past a hundred, and past two hundred is effectively guessing which rules to attend to.

Adding more rules makes it worse. The fix isn't quantity. It's structure — rules that load when relevant, not rules that load always.

Symptom 2: Skills don't compose

You wrote a "code review" skill. It works when you're reviewing code. You also wrote a "decision review" skill. It works for decisions. But when you want a code review with decision context (say, reviewing a code change that's also an architectural decision), neither skill alone is right and combining them is awkward. You write a third skill. Then a fourth. They start to overlap.

This isn't a skill problem. The skills encapsulate procedure but not types, and when you need to mix types, encapsulation doesn't compose. A code review is a procedure for checking work; a decision review is a procedure for evaluating choices. They share no inputs, stopping criteria, or evaluative dimensions, yet you've been treating them as parallel.

Symptom 3: Sub-agents have no clear identity

You spawn sub-agents for various tasks but you can't articulate why this kind of work belongs in a sub-agent. Sometimes a sub-agent helps. Sometimes it adds overhead. You can't tell in advance which.

This isn't a sub-agent problem. It's that you're using "sub-agent" as a tool when it's actually a role, and a role isn't a runtime decision; it's a structural one. When you spawn a "research sub-agent," you're asking the platform to assume an identity. If the identity isn't clearly defined, the sub-agent's output is noise dressed as specialization.

All three symptoms are the same underlying issue: the configuration is treating different kinds of things as the same kind.

Section 1: Progressive disclosure — the first architectural move

The simplest version of "structure your config." Take the monolithic instructions file and break it into parts that load conditionally.

How it works

The instructions file becomes a router — short, stating universal rules and listing what's available elsewhere. Specific contexts (a project, a domain, a kind of work) get their own files. Files load explicitly when the agent enters that context. The agent isn't holding all rules all the time — it holds the rules relevant to now.

Concrete example

Old: a single CLAUDE.md is 800 lines covering six different kinds of work.

New: CLAUDE.md is 80 lines (universal rules — truth, scope, plan-before-doing) plus six specialized files for specific contexts. When working on context X, the agent loads the universal CLAUDE.md plus context-X.md. Cleaner, smaller working set, fewer instruction collisions.
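The old-versus-new split above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool loads its files: the in-memory `FILES` mapping, the `working_set` function, and the two context names are all hypothetical, and only the `CLAUDE.md` and `context-X.md` naming comes from the example above.

```python
from typing import Dict, List, Optional

# Hypothetical file contents keyed by file name. In a real setup these would
# be files on disk; the context names here are invented for illustration.
FILES: Dict[str, str] = {
    "CLAUDE.md": "Universal: tell the truth. Stay in scope. Plan before doing.",
    "context-admissions.md": "Admissions: do the financial check first.",
    "context-reports.md": "Reports: use the monthly report template.",
}


def working_set(files: Dict[str, str], context: Optional[str] = None) -> List[str]:
    """Return the universal rules plus the rules for at most one context."""
    parts = [files["CLAUDE.md"]]  # the short router file always loads
    if context is not None:
        name = f"context-{context}.md"
        if name not in files:
            # Fail loudly rather than silently running with universal rules only.
            raise KeyError(f"no context file named {name}")
        parts.append(files[name])
    return parts
```

Calling `working_set(FILES, "admissions")` yields two texts, the universal rules and the admissions rules, while the reports rules stay out of the working set entirely. That exclusion is the whole point of the move.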

What this exposes

You start asking questions you didn't have to ask before:

  • Is this rule universal or context-specific? — types of rules
  • When does this context start and end? — phases of work
  • What rules should always be loaded vs only when needed? — load-time decisions

These are architecture questions. They're not optional once your config is big.

Section 2: Where does X live? — the routing question

Once you accept that configuration has structure, every new piece of information has a routing question.

  • "The agent should always cite primary sources." A universal rule. It goes in the instructions file.
  • "When reviewing residential admissions, do the financial check first." A context-specific procedure. It goes in a skill or context file, loaded when admissions work begins.
  • "Don't store applicant SSNs in conversation." A rule that must always hold but only bites in one category of work. It goes in the universal instructions, tagged with the trigger that makes it relevant.
  • "Use this template for monthly reports." Knowledge and procedure for specific recurring work. It goes in a skill.

You start needing categories for these — types of configuration that go in different places, load at different times, and serve different purposes.

The categories I converged on, after more iterations than I want to admit:

| Type | What | Where | When loaded |
|---|---|---|---|
| Behavior | How the agent should think and act | Universal rules in instructions file | Always |
| Procedure | How to do a specific kind of work | Skills | On invocation |
| Criteria | How to evaluate quality of a thing | Reusable across procedures | Loaded with relevant procedures |
| Knowledge | What to know about a domain | Knowledge files | Loaded when working in that domain |
| Policy | What's allowed / required / forbidden | Rules with different scopes | Loaded by context |

Don't worry about getting these exact names right — the names will be different for your work. The point is that there are categories, and putting different kinds of things in different places stops the configuration from collapsing into noise.
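One way to make the routing question mechanical is to write the five categories down as data. The sketch below is an assumption-laden illustration: the `Kind` enum, the `Route` record, and the `route` function are hypothetical names, and the where/when strings simply paraphrase the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Kind(Enum):
    BEHAVIOR = "behavior"
    PROCEDURE = "procedure"
    CRITERIA = "criteria"
    KNOWLEDGE = "knowledge"
    POLICY = "policy"


@dataclass(frozen=True)
class Route:
    where: str        # where this kind of configuration lives
    when_loaded: str  # when it enters the agent's working set


# Paraphrased from the categories table; adjust the names to your own work.
ROUTES: Dict[Kind, Route] = {
    Kind.BEHAVIOR: Route("instructions file", "always"),
    Kind.PROCEDURE: Route("skill", "on invocation"),
    Kind.CRITERIA: Route("shared criteria, reused across procedures",
                         "with relevant procedures"),
    Kind.KNOWLEDGE: Route("knowledge file", "when working in that domain"),
    Kind.POLICY: Route("scoped rules", "by context"),
}


def route(kind: Kind) -> Route:
    """Answer 'where does X live?' once X has been given a kind."""
    return ROUTES[kind]
```

With this in place, "always cite primary sources" becomes `route(Kind.BEHAVIOR)`, and the answer (instructions file, always loaded) is no longer a judgment call made fresh each time. The hard part stays human: deciding which kind a new piece of information is.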

Section 3: Phases and authority — the decisions that sneak in

The trickiest part of this transition is that some decisions you've been making implicitly are actually structural.

Phase decisions

  • "We're in the research phase." — the agent should be loading sources, not drafting.
  • "We're in the drafting phase." — the agent should be writing, not researching.
  • "We're in the review phase." — the agent should be critical, not generative.

You've been making these phase calls verbally session-by-session. They're not session-level — they're engagement-level, and the configuration that's right for one phase is wrong for another.
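A phase call stops being verbal once it is a named value that the configuration can key on. Here is a minimal sketch under that assumption; the `Phase` enum, the `PHASE_STANCE` table, and `phase_preamble` are hypothetical names, and the stance wording is lifted from the three bullets above.

```python
from enum import Enum
from typing import Dict


class Phase(Enum):
    RESEARCH = "research"
    DRAFTING = "drafting"
    REVIEW = "review"


# What the agent should lean into, and away from, in each phase.
PHASE_STANCE: Dict[Phase, Dict[str, str]] = {
    Phase.RESEARCH: {"do": "load sources", "avoid": "drafting"},
    Phase.DRAFTING: {"do": "write", "avoid": "researching"},
    Phase.REVIEW: {"do": "be critical", "avoid": "generating new material"},
}


def phase_preamble(phase: Phase) -> str:
    """Build a one-line statement of the current phase for the agent."""
    stance = PHASE_STANCE[phase]
    return (f"Current phase: {phase.value}. "
            f"Focus: {stance['do']}. Avoid: {stance['avoid']}.")
```

For example, `phase_preamble(Phase.REVIEW)` returns `"Current phase: review. Focus: be critical. Avoid: generating new material."`. Storing the phase with the engagement rather than restating it each session is the design choice this is meant to illustrate.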

Authority decisions

  • The agent might be authorized to edit files in some contexts and not others.
  • It might be allowed to send drafts when working on email triage but not when working on general documents.
  • It might run tests automatically in dev work but require approval in production work.

You've been making these authority calls implicitly through approval modes. They're real structural decisions that deserve to be configuration, not vibes.
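The same authority calls can be written as an explicit table instead of living in approval-mode habits. This is a sketch, not any tool's permission system: the context and action names, the `Decision` enum, and the `authorize` function are hypothetical, and falling back to "ask" for unlisted pairs is my assumption about a sensible default.

```python
from enum import Enum
from typing import Dict, Tuple


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # require human approval first
    DENY = "deny"


# (context, action) -> decision, taken from the examples in the list above.
AUTHORITY: Dict[Tuple[str, str], Decision] = {
    ("email-triage", "send_draft"): Decision.ALLOW,
    ("general-documents", "send_draft"): Decision.DENY,
    ("dev", "run_tests"): Decision.ALLOW,
    ("production", "run_tests"): Decision.ASK,
}


def authorize(context: str, action: str) -> Decision:
    """Look up what the agent may do; unlisted pairs fall back to ASK (assumed default)."""
    return AUTHORITY.get((context, action), Decision.ASK)
```

Here `authorize("dev", "run_tests")` allows the action, `authorize("production", "run_tests")` asks first, and anything the table has never heard of also asks. Writing the table down makes gaps visible, which implicit approval modes never do.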

When you start naming phases and authority explicitly

This is the moment most people realize they're not configuring an assistant anymore. They're operating a system. The config is no longer "instructions for Claude" — it's "the operating model for how I do certain kinds of work with AI assistance."

That's the architecture moment.

Section 4: What changes once you cross this line

When you start treating your AI configuration as architecture rather than tips:

  • Decisions are explicit. Why something is a rule vs a skill vs a context file isn't preference; it's a typed decision with reasoning behind it.
  • Composition becomes possible. Different parts of the configuration combine cleanly because they have clear roles. A code-review skill and a decision-review skill can run together because each one's inputs, outputs, and evaluative dimensions are explicitly typed, so neither has to guess what the other owns.
  • The agent isn't just an assistant; it's a runtime. Your configuration is the operating model. The agent is what executes it.
  • You can hand work off. A clean configuration means another agent (or another version of you, or a collaborator) can pick up the work without retraining the same patterns.

You haven't built a product. You've built a personal system. And once you have a system, you start asking questions that hobbyist-level configuration never has to answer.

Section 5: The question I hit that started everything

I had been running this kind of configuration for nine months before I realized I was building something that needed real design.

The trigger was a specific moment. I was working on a residential governance issue that crossed legal, financial, and operational lines. The agent had everything — the instructions file, all the relevant skills, the domain knowledge, the right hooks. And it produced output that was structurally wrong in a way no individual rule violation could explain.

The output was wrong because the agent didn't know which role it was supposed to play. Was this a legal review? A financial analysis? An operational decision? It had context for all three. It tried to satisfy all three. It satisfied none.

The fix wasn't another rule. It was realizing that the unit I'd been organizing around — domain teams of skills and knowledge — wasn't structural. The structural unit was the engagement — the specific kind of work, with its phases, actors, decisions, and rules. The agent needed to know which engagement it was inside, and the configuration needed to make that explicit.
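To make "engagement as the structural unit" concrete, here is a rough sketch of what such a record could look like. It is purely illustrative and makes no claim about the design introduced in the next article: the `Engagement` class, its field layout, `session_header`, and the sample values are all hypothetical, built only from the four ingredients named above (phases, actors, decisions, rules).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Engagement:
    name: str
    phases: Tuple[str, ...]
    actors: Tuple[str, ...]
    decisions: Tuple[str, ...]
    rules: Tuple[str, ...]


def session_header(engagement: Engagement, phase: str) -> str:
    """State which engagement and phase the agent is inside, or refuse."""
    if phase not in engagement.phases:
        raise ValueError(f"{phase!r} is not a phase of {engagement.name!r}")
    return f"Engagement: {engagement.name}. Phase: {phase}."


# Invented sample values, for illustration only.
LEGAL_REVIEW = Engagement(
    name="legal review",
    phases=("research", "drafting", "review"),
    actors=("agent", "owner"),
    decisions=("is the proposal compliant?",),
    rules=("cite primary sources",),
)
```

The useful property is that a session cannot start without naming exactly one engagement. An issue that crosses legal, financial, and operational lines would then be three engagements run deliberately, not one prompt trying to satisfy all three at once.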

What I built next is the subject of next week's piece.

It's called NeuralPlexus. The public repo opens the same day. The article that introduces the architecture goes live alongside it.

This tutorial series was the practical lead-up. The vision piece is the structural answer.

If the symptoms in this article describe your work, the next article is for you.

