Getting Started with Command Line LLM Tools: Claude Code & Codex CLI

Part 1 of the Command-Line LLM Tools tutorial series. Last reviewed: April 2026.


You're here because you're ready to try Claude Code or Codex CLI but don't know how to start. If you don't know what these are or why you should use them, read the Why Use Command Line LLM Tools piece first.

The major difference between these tools and the web or app versions of Claude and ChatGPT: command line tools can run commands and read/write files on your computer. That's the unlock: they can act as you on your machine. It's also why a little setup is required, and why you should give the tool a contained place to live before pointing it at anything important.

Let's get started.

1. What Operating System Are You Using?

On a Mac: This article assumes you're on a Mac. macOS is Unix-based, and the built-in Terminal app will work for everything in this guide. You'll find Terminal.app in /Applications/Utilities/, or you can open it with Spotlight search.

On Windows: You have a choice. PowerShell is Windows’ built-in terminal for typing commands. The native PowerShell will work for many basic agent commands, but a lot of the open-source tooling assumes a Unix-like environment. The cleaner path is WSL 2 — Windows Subsystem for Linux 2 — which gives you a real Linux environment running alongside Windows. Run wsl --install in PowerShell (admin) and you'll get WSL 2 by default on any modern Windows 10/11 machine. Full instructions in Microsoft's setup guide. Plan to spend an hour on this; it pays back the time within a week.

If you are absolutely allergic to setup, native PowerShell is fine for getting started. You can graduate to WSL 2 later.
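Once you have more than one environment installed, it's easy to lose track of which one a given terminal window is running. The sketch below is one way to check, not an official test: detect_env is a made-up helper name, and looking for "microsoft" in /proc/version is a common convention for spotting WSL rather than a guaranteed interface. It runs in macOS, Linux, and WSL shells, not in native PowerShell.

```shell
# Hypothetical helper: report which kind of environment this shell is running in.
detect_env() {
  case "$(uname -s)" in
    Darwin)
      echo "macos" ;;
    Linux)
      # WSL kernels usually mention "microsoft" in /proc/version (assumption, see above)
      if grep -qi microsoft /proc/version 2>/dev/null; then
        echo "wsl"
      else
        echo "linux"
      fi ;;
    *)
      echo "other" ;;
  esac
}

detect_env
```

If this prints wsl, the Unix-style commands in the rest of this guide should work as written.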

2. The Terminal: Don't Overthink It

If you've never played with a terminal (anyone remember MS-DOS?), it'll feel foreign at first. Just dive right in and you'll be used to it in no time. Open the app and you'll see a "command prompt": usually your username and folder, ending in %, $, or > depending on your system. This is where you'll spend most of your time with these tools.

Check out Anthropic's guide to using a terminal for new users.

When you start to feel comfortable, you may start to wonder about alternatives like Ghostty or iTerm2: fast terminals with GPU acceleration, modern protocols, native tabs, and split-pane support. I use Ghostty. It's a native app with sensible defaults you don't have to configure. You'll probably switch to something like it later, but it's not necessary on day one.

3. Make a Playground Folder

Before installing anything, give the agent a place to live. By default, the agent works within the directory you start it in, so where you start it matters. Don't start it in your home directory or something like Documents/.

Make a dedicated folder; let's call it Playground. Use Finder (Mac) or File Explorer (Windows) — no terminal commands required:

  • macOS: Open Finder, click your username in the sidebar, right-click in the empty space, "New Folder," name it Playground.
  • Windows: Open File Explorer, navigate to C:\Users\YourName\, right-click, "New" → "Folder," name it Playground.

That's your sandbox; we'll launch the tools in that folder. Anything the agent does inside Playground stays in Playground. If something goes wrong, you delete the folder and start over.

Let's test navigating to this folder from inside the terminal. When you opened Terminal in the previous step, it dropped you in your home folder by default. Type ls (short for "list") and press Enter:

ls

You should see Playground in the output along with all the other folders in your home directory. Now move into it. Type cd Playground (short for "change directory") and press Enter:

cd Playground

Your prompt updates to show you're now inside Playground. Run ls again. It should be empty, because you haven't put anything in there yet. Good. That's the state we want before installing the tool.

cd and ls are the only two terminal commands worth memorizing on day one. Everything else, you can ask the agent to do for you.
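If you'd rather do the whole step from the terminal instead of Finder or File Explorer, the sketch below is one way to do it on macOS, Linux, or WSL. It introduces two commands beyond cd and ls (mkdir and pwd); neither is something you need to memorize. It assumes you want the folder directly inside your home directory, which is what $HOME points to.

```shell
# Create the folder if it doesn't exist yet (-p makes this safe to run twice)
mkdir -p "$HOME/Playground"

# Move into it
cd "$HOME/Playground"

# Print the full path of the folder you're in now
pwd

# List what's inside (nothing yet, if the folder is new)
ls
```

The path printed by pwd should end in /Playground.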

4. Choose Your LLM: Don't Overthink It Part 2

There are two leading command line tools as of April 2026:

  • Claude Code (from Anthropic) — pairs with Claude models.
  • Codex CLI (from OpenAI) — pairs with OpenAI's models, signed in through your ChatGPT account.

You can (and will) run both, but start with one. If you're already paying for Claude Pro ($20/month), start with Claude Code. Already paying for ChatGPT Plus ($20/month)? Start with Codex CLI.

You'll be using a lot of tokens — every read of a file, every search, every multi-step plan. If you pay per-token via API, a single day of work can easily run $20 or more. The fixed-price subscriptions are subsidized for exactly this kind of usage. Command line users almost always come out ahead on a subscription.
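To make that comparison concrete, here is the arithmetic as a small sketch. The $20 figures come from the paragraphs above; the 20 working days per month is my assumption, and your actual per-token spend will vary with how you work.

```shell
daily_api_cost=20     # dollars per day on pay-per-token (the rough figure above)
working_days=20       # assumption: working days in a month
subscription=20       # dollars per month for Claude Pro or ChatGPT Plus

# Shell arithmetic: multiply the daily cost by the number of working days
monthly_api_cost=$((daily_api_cost * working_days))

printf 'Pay-per-token estimate: $%d/month\n' "$monthly_api_cost"
printf 'Subscription:           $%d/month\n' "$subscription"
```

Under these assumptions the pay-per-token estimate comes to $400 a month against a $20 subscription.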

If you aren't already paying for a subscription, just flip a coin. If you still can't decide, use Claude Code. You'll want to be using both within a month anyway (if not sooner).

You may be wondering why I didn't mention Gemini CLI. As of April 2026 (Gemini 3.1 Pro), it's not strong enough to be your first or primary tool, but once you've found your rhythm with one (ideally both) of the above, Gemini CLI earns a spot in the rotation as a third tool.

5. Install and Authenticate

Claude Code is a single self-contained program — you don't have to install anything else first. (In developer-speak: a "native binary" with no runtime dependencies.) I'll walk you through the installation, but you can also check out Anthropic's quickstart guide. There are two install paths:

Option 1 — Recommended (curl install script for macOS, Linux, and WSL)

curl -fsSL https://claude.ai/install.sh | bash

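What that does, left to right: curl is a built-in command that fetches things from URLs. The flags tell it how to behave: -f fail on server errors instead of saving an error page, -s silent (don't print progress), -S but do show an error message if something goes wrong, -L follow redirects. The URL is Anthropic's installer script. The | (pipe) sends the downloaded script to bash, which runs it. So: download Anthropic's install script and execute it. Because the script needs bash, this path works in macOS, Linux, and WSL terminals; in native PowerShell, follow the Windows instructions in Anthropic's quickstart guide instead.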
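Piping a download straight into bash means the script runs before you've had a chance to look at it. If you'd prefer to read it first, the sketch below splits the same steps apart. install_with_review is a made-up helper name, not part of Anthropic's tooling, and defining the function doesn't run anything: nothing is downloaded until you type its name.

```shell
# Hypothetical helper: download the installer, show it, then run it.
install_with_review() {
  script="$(mktemp)"    # temporary file to hold the downloaded script

  # Same curl flags as above, plus -o to save to a file instead of piping
  curl -fsSL https://claude.ai/install.sh -o "$script" || { rm -f "$script"; return 1; }

  less "$script"        # read the script; press q to quit the viewer
  bash "$script"        # run it once you've looked it over
  rm -f "$script"       # clean up the temporary file
}
```

To use it, paste the function into your terminal, then type install_with_review and press Enter.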

Option 2 — Homebrew (Mac, if you already use it)

brew install --cask claude-code

brew is the Homebrew command. install --cask tells it to install a prebuilt program published by the vendor (what Homebrew calls a "cask") rather than one of Homebrew's own formula packages. claude-code is the package name.

Once that finishes, close your terminal window and open a new one. (The installer added Claude Code to your shell's PATH — the list of places your terminal looks for commands — but the change only takes effect in newly-opened terminal windows. Without the restart, typing claude will get you "command not found.")
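If you want to confirm the PATH change took effect before going further, the sketch below is one way to check. check_cmd is a made-up helper name; the only real tool it relies on is command -v, which is a standard shell feature that reports where a command lives.

```shell
# Hypothetical helper: say whether a command can be found on your PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH"
  fi
}

check_cmd claude
```

If it reports "not found" in a freshly opened window, the install didn't finish cleanly and is worth re-running. The same function works for any other command name.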

Once you've restarted in a new window, navigate to your Playground folder: run ls to confirm Playground is listed, then cd Playground to change into it. Your prompt updates. Now run Claude Code by typing:

claude

When you hit Enter, Claude Code launches inside your terminal. On first run, it walks you through some quick setup — picking a theme, accepting terms — then prompts you to log in. Choose your account type (Pro, Max, Team, Enterprise, or Console), and it'll open a browser for the actual sign-in. Once you authenticate, the browser hands you back to the terminal. Credentials are stored locally so you won't have to log in again next time.

You're now in an interactive Claude Code session — you'll see a welcome screen with your current folder and a prompt waiting for input. You're connected.

Codex CLI installation is slightly more complicated. OpenAI's overview assumes you understand how npm works (which requires Node.js). Don't worry, I'll walk you through it. There are two install paths:

Option 1 — Homebrew (Mac, if you already use it, no Node.js needed)

brew install --cask codex

Option 2 — Using npm (cross-platform, requires Node.js):

npm is the package manager that ships with Node.js, so you need Node.js installed first. Go to the Node.js download page and copy the bash snippet there into your terminal. It installs nvm (Node Version Manager) first, then uses nvm to install Node.js itself. The snippet includes its own version-check commands, so you'll see Node.js and npm confirm themselves once it's done.

Then install Codex CLI:

npm install -g @openai/codex

npm install installs a package. The -g flag means "global": install it so you can run codex from anywhere, not just from one project folder. @openai/codex is the package name (the @openai/ prefix is npm's way of saying "the package called codex, published under OpenAI's namespace," to avoid name collisions with other packages also called codex).

No matter which way you installed it, close your terminal window and open a new one before continuing. (Same reason as Claude Code: the installer added the codex command to your shell's PATH, and the change only takes effect in newly-opened windows. If you skip this and codex returns "command not found," that's why.)

Once you've restarted in a new window, run ls to confirm Playground is listed, then cd Playground to change into your sandbox folder. Your prompt updates. Now run:

codex login

When you hit Enter, Codex prompts you to choose how you want to authenticate — sign in with a ChatGPT account (most common) or use an API key from the OpenAI platform. Pick the ChatGPT option if you're on Plus or Pro. Codex opens a browser for the actual sign-in. Once you authenticate, the browser hands you back to the terminal. Credentials are stored locally so you won't have to log in again next time.

After login, run codex (without login) to start an interactive session. You'll see Codex's welcome screen and a prompt waiting for input. You're connected.

That's it! The hardest part is done... you now have an LLM agent that can see and modify the Playground folder you just made.

6. The Instructions File: CLAUDE.md or AGENTS.md

Every time you've explained your project to ChatGPT, started a new chat the next day, and explained it again. Every time you've corrected the same writing tic for the fifth time. Every time you've thought "I just told you this last week" — you've been hitting the same wall: the chat starts from zero every time. The agent doesn't remember your rules, your conventions, or the corrections you've already made. You re-explain. It re-fails.

The web versions of these tools have started patching this with persistent-memory features — Custom GPTs, Projects, Gems, ChatGPT memory, Claude Projects. They help, but they're shallow: one bucket of remembered facts per workspace, edited through a UI, with no real way to organize what's inside. The CLI version skips the bucket entirely. You just write a file. The agent reads it at the start of every session, before you've typed a single thing. The most important file you'll write is the instructions file.

Think of it as a README for the agent. A plain markdown file at the root of whatever folder you start the agent in: Claude Code reads CLAUDE.md, Codex reads AGENTS.md. Same idea, different filename.

What is markdown? It's the simplest possible format for structured documents. A plain text file, edited in any text editor (TextEdit, Notepad, VS Code), saved with a .md extension. Structure comes from punctuation — # Heading for headings, **bold** for emphasis, - list item for lists. That's the whole format. The reason it won across AI tools, developer platforms, and note-taking apps: a human reading it sees clean structure, and a machine parsing it sees clean structure too — with no proprietary software in between. The agent reads it the same way it reads any other file.

If a CLAUDE.md (or AGENTS.md) exists in your Playground folder, the agent reads it at the start of every session — your conventions, your preferences, the rules that should always apply. Without it, every session starts from zero, the agent makes its own assumptions about how you want things done, and you spend the first ten minutes correcting choices it shouldn't have had to make.

So what should you put in there? Don't try to write a perfect instructions file on day one. Start with these five rules:

Prioritize Truth Over Agreement

This is the rule that matters most — and the one that fixes the worst thing about AI today.

You've felt it. You ask ChatGPT a factual question and get a confident answer with a confident citation. The citation doesn't exist. You ask it to summarize a document; it invents a paragraph that wasn't there. You float a half-formed idea; it tells you the idea is brilliant and helps you pursue it down a wrong path. You pay for a subscription, and the tool you're paying for confidently lies to you several times a day.

This is hallucination, and it's not a quirk — it's the failure mode that makes AI feel unreliable for serious work. It's why I rank Gemini as a third tool rather than first or primary: as of April 2026, it hallucinates more confidently than the alternatives, and the gap is large enough to matter.

You can't make hallucination go away completely. You can dramatically reduce it by telling the agent up front that truth beats agreement: it should disagree with you when you're wrong, refuse to fabricate when it doesn't know, and resist the helpful-sounding answer when the honest answer is "I don't know."

Type: Prioritize truth over agreement. When my idea is wrong, say so directly. When you don't know something, say so.

This single instruction will save you hours and prevent decisions you'd regret. It's the load-bearing rule of the five.

Show Your Work

The truth rule's close cousin. The agent answers your question. The answer sounds confident, specific, plausible. You don't know if it actually checked anything or just pattern-matched what the answer probably looks like — because the tone is identical either way. You make a decision based on it. Then later you find out it was guessing.

Make the agent tell you which mode it's in. Verified (read a file, ran a command, cited a real source) or making an assumption — its best guess, based on patterns it's seen before. You'll be surprised how often "I'm making an assumption" is the honest answer — and how many bad decisions you'll dodge by knowing.

Type: When you make a factual claim, say whether you verified it (read a file, ran a command, cited a source) or you're making an assumption (e.g., training data, pattern-matching, plausible-sounding guess, something you assumed rather than checked). If you're making an assumption and it matters, check before asserting.

Stay in Scope

You ask the agent to clean up the formatting on one paragraph of a draft. It cleans up the paragraph, then rewrites the headline because it thought yours was weak, restructures the sentence after for "better flow," and helpfully adds a closing line you didn't ask for. The fix you wanted is correct. The other three changes are now your problem to review — at midnight, when you just wanted that one thing done.

Agents are trained to be helpful, which often manifests as scope creep dressed up as initiative. Tell it to stay in its lane.

Type: Stay within the scope of what I asked. When you notice adjacent issues, mention them and ask whether to address them — don't fix them silently.

Read Files Before Editing Them

You ask the agent to tighten the second paragraph of pitch-draft.md. It edits the file. The new version looks fine. Then you actually open it and realize the agent rewrote a paragraph that wasn't yours — it guessed what a "pitch draft" probably says and confidently improved an imaginary version of your document. Your real second paragraph is now gone, replaced by something that vaguely sounds like what you might have written.

This happens often enough that it's worth a dedicated rule.

Type: Read a file's contents before editing it. Use the file as the source of truth for what's in it.

Plan Before Doing

Anyone who's worked on something they cared about has learned this lesson the hard way: rushing in feels productive, then the cleanup costs you the time you thought you were saving. "Measure twice and cut once."

For anything beyond a one-line change, make it draft a plan first. You read it. You approve it. Then it executes.

The cost of this habit is a little reading time. The cost of skipping it is the agent confidently making forty-seven changes that don't quite match what you wanted, and you sorting through which to keep.

Type: For tasks touching multiple files or making structural changes: draft a markdown plan first and wait for my approval before executing.

You have the five rules... but remember, you don't have to create this file yourself. The whole point of these tools is that the agent can read and write files — so the cleanest way to make CLAUDE.md or AGENTS.md is to ask the agent to do it. Inside your running Claude Code or Codex session, paste this prompt:

Type: Create a CLAUDE.md (or AGENTS.md for Codex) file in this folder containing the following five rules. Make each rule a markdown subheading with its instruction underneath:

Prioritize Truth Over Agreement. Prioritize truth over agreement. When my idea is wrong, say so directly. When you don't know something, say so.

Show Your Work. When you make a factual claim, say whether you verified it (read a file, ran a command, cited a source) or you're making an assumption (e.g., training data, pattern-matching, plausible-sounding guess, something you assumed rather than checked). If you're making an assumption and it matters, check before asserting.

Stay in Scope. Stay within the scope of what I asked. When you notice adjacent issues, mention them and ask whether to address them — don't fix them silently.

Read Files Before Editing Them. Read a file's contents before editing it. Use the file as the source of truth for what's in it.

Plan Before Doing. For tasks touching multiple files or making structural changes: draft a markdown plan first and wait for my approval before executing.

Show me the full contents of the file before writing it.

You've just used the agent to write its own instructions file; from now on, every new rule gets added the same way.
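If you're curious what the resulting file looks like, or want to create it by hand, the sketch below writes one from the terminal. The rule text comes from the section above; the top-level "# Instructions" heading and the exact layout are my assumptions, since the agent may format its version differently. The check at the top keeps it from overwriting a file you already have.

```shell
file="CLAUDE.md"   # use AGENTS.md instead if you're running Codex

if [ -e "$file" ]; then
  echo "$file already exists; leaving it alone"
else
  # Everything between the two EOF markers is written to the file as-is
  cat > "$file" <<'EOF'
# Instructions

## Prioritize Truth Over Agreement
Prioritize truth over agreement. When my idea is wrong, say so directly. When you don't know something, say so.

## Show Your Work
When you make a factual claim, say whether you verified it (read a file, ran a command, cited a source) or you're making an assumption. If you're making an assumption and it matters, check before asserting.

## Stay in Scope
Stay within the scope of what I asked. When you notice adjacent issues, mention them and ask whether to address them; don't fix them silently.

## Read Files Before Editing Them
Read a file's contents before editing it. Use the file as the source of truth for what's in it.

## Plan Before Doing
For tasks touching multiple files or making structural changes: draft a markdown plan first and wait for my approval before executing.
EOF
fi

# Count the rule headings (lines that start with "## ")
grep -c '^## ' "$file"
```

Run it from inside Playground so the file lands where the agent will look for it.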

As you work, you'll discover others to add. The trigger for a new rule: the agent makes the same mistake twice. One miss is a fluke. Two is a pattern. A pattern becomes a rule. The instructions file grows with the work.

Bonus Tip: Write Rules In Positive Framing

Re-read the five rules above. Notice they tell the agent what to do, not what not to do: "Prioritize truth," "Show your work," "Stay in scope," "Read files before editing," "Draft a plan first." This isn't accidental.

Try this: Don't think of a white bear.

You just thought of the white bear. To follow the instruction "don't think of X," your brain has to first call up X, and once X is activated, the "don't" is fighting an uphill battle. Psychologists call this ironic rebound. Daniel Wegner and colleagues demonstrated it in the 1987 paper "Paradoxical Effects of Thought Suppression," and Wegner described it in his book White Bears and Other Unwanted Thoughts.

LLMs do something structurally similar. When you tell the model "don't be verbose" or "don't hallucinate," the model has to summon the very concept you're trying to suppress before it can suppress it — and the suppression is unreliable. A 2025 paper on Ironic Negation showed that under cognitive load, the suppression fails while the primed concept stays active in the model's reasoning. Earlier work by Truong et al. (2023) established that even modern LLMs handle negation as a measurable weak spot.

Practical principle: "Respond concisely" outperforms "Don't be verbose." "Use verified sources" outperforms "Don't hallucinate." Tell the agent what to do, not what to avoid. Each negation you bypass is one fewer failure-prone step.
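As your instructions file grows, negative phrasing tends to creep back in. The sketch below is one way to spot it. find_negations is a made-up helper name, and the word list is only a starting point, not a complete catalog. It flags lines for a second look; some will be fine as they are (a line like "When you don't know something, say so" is already telling the agent what to do).

```shell
# Hypothetical helper: list lines that contain common negation words.
find_negations() {
  # -n: show line numbers, -i: ignore case, -w: match whole words only,
  # -E: allow the | "or" between alternatives
  grep -n -i -w -E "don't|do not|never|avoid" "$1"
}

# Demo on a throwaway sample file
sample="$(mktemp)"
printf '%s\n' "Respond concisely." "Don't be verbose." "Never guess." > "$sample"
find_negations "$sample"
```

To check your real file, run find_negations CLAUDE.md (or AGENTS.md) from inside the folder that contains it.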

Most prompt-engineering tutorials skip this. Once you know it, you'll see it everywhere — and your instructions file will be measurably more reliable for it.

7. Go Play

Drop a file in Playground/, then ask the agent to do something with it. Some ideas:

Tone shifting. Drop a draft email into a file. Ask:

Type: Read my draft email and make it sound professional but firm. Show me the revised version.

Explain Like I'm Five summarization. Save a complex legal paragraph or a dense technical paper to a file. Ask:

Type: Read this and explain it to me like I'm five years old.

Document comparison. Save two versions of something — an old contract and a new one, two competing product specs, two vendor proposals, your original draft and an edited version, last year's strategy doc and this year's, two articles giving different advice on the same question. Ask:

Type: Compare these two documents and show me what changed. Group the changes by significance: structural, substantive, cosmetic.

Reading list triage. Save a list of articles or papers (just URLs and titles in a text file). Ask:

Type: Read the titles, infer what each one's about, and group them: must-read, skim, skip. Explain the calls.

Resume-to-job-description fit. Save your resume and a job description as separate files. Ask:

Type: Read both. Tell me where the resume already aligns, what's missing, and where I'm overselling. Don't rewrite anything yet — just diagnose.

These all share two properties: the agent has to read a file before responding, and the output is text you can sanity-check yourself. You're building the muscle of giving the agent material to work from instead of expecting it to know things.

A note on what these tools are for: they were built as coding agents, and the docs and tutorials lean heavily that way. Don't let the "code" framing limit your imagination. They're equally useful for any work where having an agent read your files matters — research, writing, household coordination, planning, document review. Most of what makes them powerful is the file-and-context interface, not the code generation.

While you're playing, one concept worth practicing:

Templates. Pick a document you've written that worked — a meeting recap, a project update, a client email, a decision memo — and save it to your Playground/ folder. Ask:

Type: Read this. Extract the pattern — sections, structure, tone, what kinds of information go where. Save it as a reusable template I can hand back to you next time I need to write one of these.

Now you have a template the agent can fill in for future versions of the same document type, without you having to re-explain the format every time. Pattern in, instances out. The agent is much better at filling structure than inventing it — so giving it the structure once pays off across every future use.

Useful Commands

Both Claude Code and Codex have built-in commands that start with /. The handful worth knowing on day one:

  • /help — shows all available commands. Use this anytime you forget the others.
  • /compact — summarizes the conversation so far and clears out the noise while preserving your instructions. As you chat, the agent's "memory" fills up with messages, responses, files it read, and commands it ran. Eventually this slows the agent down and degrades reasoning. Run /compact when the agent feels sluggish or forgetful — better, run it when you finish one task before starting the next.
  • /status — shows the current session: which model you're using, how much of the context window you've burned, and other state. Useful when something feels off and you want to see what's actually loaded.
  • /model — switch the model mid-session. Both tools default to a sensible model, but you can swap to a faster/lighter or smarter/slower one if the work warrants it. When to switch is a topic for another day; for now, it's enough to know the command exists.

One Safety Rule Before You Turn It Loose

A getting-started guide that doesn't address what to never feed the agent is incomplete.

If you wouldn't email it to a stranger, don't put it in front of Claude or Codex.

That covers most of the cases that matter:

  • Passwords, account numbers, social security numbers, anything you don't want leaked
  • Tax returns, bank statements, brokerage exports
  • Medical records and health information (HIPAA)
  • Client or work data covered by an NDA
  • Production credentials, API keys, signing certificates

Both Anthropic and OpenAI publish policies on how subscription-tier conversations are used. Whether your conversations can be used for training depends on your plan and your privacy settings, so check those settings rather than assuming a default. Either way, the data is still stored on their servers, which means it can theoretically be exposed by a breach, a subpoena, or a policy change. Protect yourself regardless. For the highest-sensitivity work, consider local model alternatives, but that's a topic for a different article.
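Before you point the agent at a folder, a quick scan can catch files that obviously don't belong there. The sketch below is a rough first pass, not a real secret scanner: scan_for_secrets is a made-up helper name, the patterns are my own guesses at common giveaways, and it will miss anything that doesn't use those words (a tax return won't say "password").

```shell
# Hypothetical helper: list files under a folder that mention likely secrets.
scan_for_secrets() {
  # -r: search subfolders, -I: skip binary files, -i: ignore case,
  # -l: print only the names of matching files, -E: allow | between patterns
  grep -rIil -E 'api[_-]?key|secret|password|BEGIN [A-Z ]*PRIVATE KEY' "$1"
}

# Demo on a throwaway folder with one risky file and one harmless file
demo="$(mktemp -d)"
printf 'API_KEY=not-a-real-key\n' > "$demo/notes.txt"
printf 'Grocery list: eggs, milk\n' > "$demo/list.txt"
scan_for_secrets "$demo"
```

In the demo, only the notes.txt path is listed. On your own folder, run scan_for_secrets . from inside Playground and move anything it lists somewhere else before starting a session.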

8. Create Some Structure With Folders

Playground is fine for week one. Past that, you'll want real structure.

A CLI agent treats your filesystem as its external working memory. Mess in, mess out. One giant folder with hundreds of files will be unmanageable. What matters is both how you think about your files and what the agent does with them. Four categories cover it:

| Bucket | What's in it |
| --- | --- |
| Instructions | CLAUDE.md, AGENTS.md, other guidance and conventions, and eventually... skills, scripts, etc. Things that govern the agent's behavior. |
| Projects | Active and ongoing work — drafts, decisions, in-flight materials. This is the workspace where files change rapidly. |
| Knowledge | Saved reference materials and updated knowledge for the agent to consume when appropriate. |
| Archives | Frozen completed work. Mostly ignored; touched only to recover something. |

Instructions/   ← CLAUDE.md, AGENTS.md, skills, scripts
Projects/       ← active and ongoing work
Knowledge/      ← reference material the agent consults
Archives/       ← frozen completed work

You don't need this exact layout on day one. A single CLAUDE.md in your Playground folder is enough to start. The four-bucket structure (with subfolders) is what the file system grows into once you're doing real work across multiple domains.
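When you do reach that point, the four folders take one short loop to create. This is a sketch under one assumption: the parent folder name, Workspace in your home directory, is my placeholder, so substitute whatever location suits you.

```shell
workspace="$HOME/Workspace"   # hypothetical location; change it to taste

# Create each bucket (-p also creates the parent folder and is safe to re-run)
for bucket in Instructions Projects Knowledge Archives; do
  mkdir -p "$workspace/$bucket"
done

# Confirm the result
ls "$workspace"
```

From there you can ask the agent to help sort existing files into the right bucket.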

9. What Comes Next

You now have a working CLI tool for agentic work, an instructions file with five rules that prevent the worst mistakes, a folder structure that isn't chaos, and some practice on real tasks. That's enough to be useful for weeks.

When you're ready to go deeper, the next four articles in this tutorial series each cover a different area to level up:

  • Part 2: When the Agent Forgets — managing memory, context, and compaction across long work.
  • Part 3: When the Agent Goes Off the Rails — git as time machine, hooks as hard guarantees, approval modes.
  • Part 4: When the Agent Doesn't Know Enough — extending capability with skills and sub-agents.
  • Part 5: When Configuration Becomes Architecture — what happens when you outgrow a single instructions file.

For now, you have what you need. Open a terminal. cd Playground. Run claude or codex. Try one of the ideas above. Make a mistake. Recover from it.

That's how the early days are supposed to feel.

