Loop Engineering 101

Published June 26, 2026

Most people use AI like a vending machine. Ask a question, wait, ask again. You're the one holding it together, turn after turn.

Boris Cherny, who built Claude Code at Anthropic, put it plainly: "I don't prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops." Days later, OpenClaw creator Peter Steinberger posted nearly the same thing, and his post quickly hit 6.5 million views.

They're not talking about better prompts. They're talking about a different job entirely. Instead of crafting turn-by-turn instructions, you define a goal and build a system that plans, acts, verifies against real signals (tests, linters, type checkers), remembers what worked, and iterates until the job is actually done. You walk away. The loop keeps going.

Why this matters

If you've ever asked Claude to fix something, gotten a confident answer, and then spent 20 minutes checking whether it actually worked, you already know the problem. Single prompts can't grade their own homework. They stay confident until something external proves them wrong.

Loop engineering takes you out of that seat. You write the stopping condition once. A separate check runs after every turn. The agent keeps working until the condition is true, or until you tell it to stop. That's the shift from prompt engineering to loop engineering.

The two commands that make loops real

1. `/goal` — the loop that works until it's done

This is the one that matches everything above. You type /goal followed by what "done" actually looks like, and Claude keeps working turn after turn on its own.

Here's the part that makes it real: after every single turn, a second AI quietly checks whether you've hit the goal yet. If you haven't, it tells Claude why and Claude keeps going. The moment the goal is truly met, the loop stops on its own. That self-check after every turn is the whole difference between a real loop and a prompt that runs once and hopes.
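Stripped down, that control flow is just a loop with a judge. The sketch below is a minimal Python illustration, not how Claude Code implements /goal. `run_turn` and `check_goal` are hypothetical stand-ins for the working agent and the separate checker, and `max_turns` is an assumed safety cap.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    done: bool        # has the stopping condition been met?
    reason: str = ""  # the checker's explanation when it hasn't


def goal_loop(
    run_turn: Callable[[Optional[str]], str],
    check_goal: Callable[[str], Verdict],
    max_turns: int = 25,
) -> tuple[bool, int]:
    """Run agent turns until the checker reports the goal is met.

    run_turn(feedback) stands in for one agent turn; it receives the
    checker's last reason (None on the first turn). check_goal(output)
    stands in for the separate checker. Returns (goal_met, turns_used).
    """
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        output = run_turn(feedback)
        verdict = check_goal(output)  # a separate judge grades every turn
        if verdict.done:
            return True, turn
        feedback = verdict.reason     # tell the agent why it isn't done yet
    return False, max_turns           # cap reached without meeting the goal


# Toy usage: the "agent" fixes one problem per turn; the checker counts what's left.
state = {"problems": 3}

def toy_turn(feedback: Optional[str]) -> str:
    state["problems"] -= 1
    return f"{state['problems']} problems left"

def toy_check(output: str) -> Verdict:
    left = state["problems"]
    return Verdict(done=left == 0, reason=f"{left} problems remain")

print(goal_loop(toy_turn, toy_check))  # (True, 3)
```

The key design choice is that the checker's reason becomes the agent's next input. That handoff replaces you typing "try again."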

Try this in Claude Code:

/goal every blog post in the /posts folder has a meta description under 160 characters and a title under 60 characters. After each file, print the character counts so you can confirm it, and do not touch the body copy. Stop when every file passes, or after 25 files.

Here's why that works: the goal is something Claude can actually measure (character counts), so the checker can tell the exact moment it passes. Claude edits a file, counts the characters, sees it is still too long, fixes it, counts again, and only moves on once it clears the bar. You never typed "try again" once.
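The check itself is mechanical, and that is why it works as a goal. Here is a sketch of what the check could look like in Python. It assumes each post keeps `title` and `description` as simple `key: value` front matter between `---` lines. That file layout and the function names are assumptions, not something the command requires.

```python
import re

TITLE_LIMIT = 60         # "title under 60 characters"
DESCRIPTION_LIMIT = 160  # "meta description under 160 characters"


def read_front_matter(text: str) -> dict[str, str]:
    """Parse `key: value` lines between the leading '---' fences.

    Deliberately naive: real YAML (quoted or multi-line values) needs a parser.
    """
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_post(text: str) -> list[str]:
    """Return a list of failures; an empty list means the post passes."""
    fields = read_front_matter(text)
    failures: list[str] = []
    for key, limit in (("title", TITLE_LIMIT), ("description", DESCRIPTION_LIMIT)):
        value = fields.get(key, "")
        # Print every count, mirroring "print the character counts" in the goal.
        print(f"{key}: {len(value)} chars (must be under {limit})")
        if not value:
            failures.append(f"missing {key}")
        elif len(value) >= limit:
            failures.append(f"{key} is {len(value)} chars, needs to be under {limit}")
    return failures
```

Run it over every file in the folder. The goal is met when every call returns an empty list.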

Codex has the same primitive, also called /goal. It keeps working across turns until a verifiable stopping condition holds, and you can pause and resume it. Same idea, both tools.

2. `/loop` — the loop that repeats on its own

Reach for this when the work is not "finish a pile" but "keep checking on something." You type /loop followed by an interval and a task, and Claude re-runs it for you, either on a schedule you set or at intervals it picks itself.

This is the loop for things like watching a deploy or triaging the same inbox every morning.

Try this in Claude Code:

/loop 30m check whether my live site is back up by loading the homepage, and the moment it returns a normal page, tell me and stop checking.

The 30m just means every 30 minutes. You can also lead with a plain instruction like "every morning, triage my inbox" and Claude will schedule it for you. Press Esc to stop a loop that is waiting for its next run.
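A cadence loop boils down to "check, wait, check again, stop on success." The sketch below approximates that locally in Python and is not what /loop runs internally. Two assumptions are worth flagging. First, `site_is_up` treats any 2xx response as a normal page. Second, `max_runs` is a cap added here that the command above does not have.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Treat any 2xx response as a normal page; any error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def poll_until(
    check: Callable[[], bool],
    interval_s: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call check() every interval_s seconds and stop at the first success.

    Returns the run number that succeeded, or None once max_runs is used up.
    """
    for run in range(1, max_runs + 1):
        if check():
            return run
        if run < max_runs:
            sleep(interval_s)  # wait for the next scheduled run
    return None
```

Roughly the equivalent of the command above is `poll_until(lambda: site_is_up("https://example.com"), 30 * 60, 48)`. Here example.com is a placeholder for your homepage, and 48 runs covers about a day. The injectable `sleep` makes the loop testable without waiting.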

Key insight: /loop re-runs on a cadence, while /goal keeps going until a condition you wrote is actually true. The /goal checker is a separate small model, so the agent that wrote the code isn't the one grading it.

The building blocks of a real loop

Once you understand /goal and /loop, the rest is about making the loop touch your real world instead of just your filesystem.

Automations — the heartbeat

Automations are what make a loop an actual loop rather than a single run you did once.

In the Codex app, you make one in the Automations tab: pick the project, the prompt it will run, how often, and whether it runs on your local checkout or on a background worktree. Runs that find something go to a Triage inbox. Runs that find nothing archive themselves.
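The routing described above is the part worth copying into any homegrown setup: runs with findings land in Triage, and empty runs archive themselves. This is a hypothetical model of that behavior, not the Codex app's API. The `Automation` and `Inbox` classes and their field names are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Literal


@dataclass(frozen=True)
class Automation:
    # The choices the Automations tab asks for, modeled as plain fields.
    project: str
    prompt: str
    every: timedelta
    runs_on: Literal["local", "worktree"] = "worktree"


@dataclass
class Inbox:
    triage: list[tuple[str, list[str]]] = field(default_factory=list)
    archived: list[str] = field(default_factory=list)

    def file_run(self, automation: Automation, findings: list[str]) -> str:
        """Route one finished run: findings go to triage, empty runs are archived."""
        if findings:
            self.triage.append((automation.prompt, findings))
            return "triage"
        self.archived.append(automation.prompt)
        return "archived"


nightly_ci = Automation(
    project="web-app",  # hypothetical project name
    prompt="Summarize yesterday's CI failures",
    every=timedelta(days=1),
)
inbox = Inbox()
inbox.file_run(nightly_ci, ["3 flaky tests in checkout flow"])  # -> "triage"
inbox.file_run(nightly_ci, [])                                   # -> "archived"
```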

OpenAI uses them internally for boring stuff like daily issue triage, summarizing CI failures, writing commit briefings, and hunting bugs somebody added last week. An automation can call a skill too, so you can fire $skill-name instead of pasting a giant wall of instructions into a schedule nobody will ever update.

Claude Code gets to the same place through scheduling and hooks. You can run a prompt on an interval with /loop, schedule a cron task, fire shell commands at certain points in the agent lifecycle with hooks, or push the whole thing to GitHub Actions if you want it to keep running after you close the laptop. Same idea: define an autonomous task, give it a cadence, and let the findings come to you.

Worktrees — so parallel doesn't turn into chaos

The second you run more than one agent, files start colliding. Two agents writing the same file is the exact same headache as two engineers committing to the same lines without talking first.

A git worktree fixes it. It's a separate working directory on its own branch that shares the same repo history, so each agent edits its own checkout and never touches the other's files.
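If you want to script the setup, here is a sketch that builds one worktree per agent with plain git commands. The `agent/<name>` branch naming, the sibling-directory layout, and the `main` base branch are all assumptions, so adjust them to your repo.

```python
import subprocess
from pathlib import Path


def worktree_path(repo: Path, agent: str) -> Path:
    # Sibling directory, e.g. ~/code/app -> ~/code/app-fixer
    return repo.parent / f"{repo.name}-{agent}"


def worktree_add_cmd(repo: Path, agent: str, base: str = "main") -> list[str]:
    """`git worktree add -b <branch> <path> <base>`: new branch, new directory."""
    branch = f"agent/{agent}"  # hypothetical naming scheme
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch,
            str(worktree_path(repo, agent)), base]


def worktree_remove_cmd(repo: Path, agent: str) -> list[str]:
    """`git worktree remove <path>` deletes the directory but keeps the branch.

    Git refuses if the worktree has uncommitted changes.
    """
    return ["git", "-C", str(repo), "worktree", "remove", str(worktree_path(repo, agent))]


def create_worktrees(repo: Path, agents: list[str], base: str = "main") -> list[Path]:
    """Give each agent its own checkout; raises CalledProcessError if git fails."""
    paths = []
    for agent in agents:
        subprocess.run(worktree_add_cmd(repo, agent, base), check=True)
        paths.append(worktree_path(repo, agent))
    return paths
```

For example, `create_worktrees(Path.home() / "code" / "app", ["explorer", "implementer"])` leaves you with `app-explorer` and `app-implementer` next to the original checkout, each on its own branch.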

Codex builds worktree support right in, so several threads can work in the same repo at once without bumping into each other. Claude Code gives you the same isolation with git worktree, a --worktree flag that opens a session in its own checkout, and an isolation: worktree setting you put on a subagent so each helper gets a fresh checkout that cleans itself up afterward.

Worktrees remove the mechanical collisions, but you are still the ceiling. Your review bandwidth, not the tool, decides how many agents you can actually run.

Skills — so you stop explaining your project every single time

A skill is how you stop re-explaining the same project context every session like a goldfish. Both tools use the same format: a folder with a SKILL.md inside holding instructions and metadata, plus optional scripts, references, and assets.
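To make the format concrete, here is a Python sketch that scaffolds a skill folder with `name` and `description` in the SKILL.md front matter. The naming rule it enforces (lowercase letters, digits, hyphens) is a conservative assumption. Where the folder should live also differs per tool, for example `.claude/skills/` in Claude Code, so check each tool's docs.

```python
import re
from pathlib import Path

# Conservative naming rule assumed here: lowercase letters, digits, hyphens.
SKILL_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def scaffold_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/<name>/SKILL.md plus empty scripts/ and references/ folders."""
    if not SKILL_NAME.fullmatch(name):
        raise ValueError(f"skill name {name!r} should be lowercase-with-hyphens")
    if "\n" in description:
        raise ValueError("keep the description to one line")
    skill_dir = root / name
    for sub in ("scripts", "references"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"  # the agent reads this to decide when to fire
        "---\n\n"
        f"{instructions.strip()}\n",
        encoding="utf-8",
    )
    return skill_md
```

A call like `scaffold_skill(Path(".claude/skills"), "release-notes", "Draft release notes from PRs merged since the last tag.", "1. List merged PRs.\n2. Group them by label.")` gives you a folder the agent can pick up on every run. The `release-notes` skill here is just an example.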

Codex runs a skill when you call it with $ or /skills, or by itself when your task matches the skill description. Claude Code does it the same way. A tight, boring description beats a clever one, because the agent uses the description to decide when to fire.

Without skills, the loop re-derives your whole project from zero every cycle. With skills, it compounds: the conventions, the build steps, and the "we don't do it like this because of that one incident" notes get written down once, in a place the agent reads on every run.

One thing to keep straight: the skill is the authoring format. A plugin is how you ship it. When you want to share a skill across repos or bundle a few together, you package them as a plugin. True in Codex, true in Claude Code.

Plugins and connectors — the loop touches your real tools

A loop that can only see the filesystem is a tiny loop. Connectors, which are built on MCP (Model Context Protocol), let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Codex and Claude Code both speak MCP, so the connector you wrote for one usually just works in the other.

Plugins bundle connectors and skills together so your teammate installs your setup in one go instead of rebuilding the whole thing from memory.

This is the difference between an agent that says "here is the fix" and a loop that opens the PR, links the Linear ticket, and pings the channel on its own once CI is green.

Sub-agents — keep the maker away from the checker

The most useful structural thing in a loop is splitting the one who writes from the one who checks. The model that wrote the code is way too nice grading its own homework. A second agent with different instructions, and sometimes a different model, catches the stuff the first one talked itself into.

Codex spawns subagents when you ask, runs them at the same time, and folds the results back into one answer. You define your own agents as TOML files in .codex/agents/, each with a name, description, instructions, and optional model and reasoning effort.

Claude Code does the same with subagents in .claude/agents/ and agent teams that pass work between them. The usual split in both: one agent explores, one implements, one verifies against the spec.
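The important property of that split is what the verifier does not see. In the sketch below, `explore`, `implement`, and `verify` are hypothetical callables standing in for three subagents. The verifier receives only the spec and the change, never the implementer's reasoning, so it can't be talked into agreeing.

```python
from typing import Callable

# Hypothetical stand-ins for three subagents, each with its own instructions.
Explore = Callable[[str], str]              # task -> notes about the codebase
Implement = Callable[[str, str, str], str]  # task, notes, feedback -> change
Verify = Callable[[str, str], list[str]]    # spec, change -> problems found


def explore_implement_verify(
    task: str,
    spec: str,
    explore: Explore,
    implement: Implement,
    verify: Verify,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return (accepted_change, rounds_used) or raise if the verifier never signs off."""
    notes = explore(task)
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        change = implement(task, notes, feedback)
        # The verifier gets only the spec and the change, not the implementer's
        # notes or reasoning, so it grades the result rather than the story.
        problems = verify(spec, change)
        if not problems:
            return change, round_no
        feedback = "\n".join(problems)  # send the objections back to the maker
    raise RuntimeError(f"verifier still objects after {max_rounds} rounds")
```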

Real example: the self-checking research brief

Here is the simplest way to see why a loop beats a prompt.

Say you ask Claude to write you a one-page brief on a topic. That is the task. The goal, the part that turns it into a loop, is a bar it can measure: every claim has at least three sources, and every link actually opens to a page that backs up what it said.

Without a loop, this is exactly where AI quietly burns you. It hands you a clean-looking brief with sources that sound real but do not exist. A single prompt cannot catch its own invented source, because it stays confident it is right until something actually opens the link.

With a loop, Claude writes the brief, then goes link by link, actually opening each source to confirm it is real and that it supports the claim. It throws out the fake ones, finds real replacements, and keeps checking until every source on the page is something you can actually open.
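The mechanical half of that check can be sketched in Python: does every claim have three distinct sources, and does every link open? Whether a page actually supports the claim still needs a model or a human to read it, so this covers only part of the goal. `Claim`, `audit_brief`, and the 2xx-means-alive rule are assumptions for illustration.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Claim:
    text: str
    sources: list[str] = field(default_factory=list)


def link_opens(url: str, timeout: float = 10.0) -> bool:
    """True if the URL answers with a 2xx page after redirects; any error is 'dead'."""
    try:
        # Building the Request can itself raise ValueError for a malformed URL.
        request = urllib.request.Request(url, headers={"User-Agent": "brief-checker/0.1"})
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        return False


def audit_brief(
    claims: list[Claim],
    opens: Callable[[str], bool] = link_opens,
    min_sources: int = 3,
) -> dict[str, list[str]]:
    """Map each failing claim to its problems; an empty dict means this check passes."""
    report: dict[str, list[str]] = {}
    for claim in claims:
        unique = list(dict.fromkeys(claim.sources))  # drop repeated URLs, keep order
        problems: list[str] = []
        if len(unique) < min_sources:
            problems.append(f"only {len(unique)} distinct sources, need {min_sources}")
        problems += [f"dead link: {url}" for url in unique if not opens(url)]
        if problems:
            report[claim.text] = problems
    return report
```

Counting distinct URLs matters here. Otherwise one real source pasted three times would pass.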

How you'd actually run it:

/goal write a one page brief on [your topic] where every claim has at least three sources and every link actually opens to a page that supports the claim. Open each link to confirm it before you call it done. Replace any source that is dead or does not back up the claim. Stop only when every source on the page checks out.

That is the loop doing the part you used to do by hand.

Four honest caveats

Loops are not free and not for everything.

One-off tasks don't need loops. If the work is a single answer (write this email, summarize this doc), a prompt is faster. Loops earn their setup cost on repeating or multi-item work.

Loops burn more usage than prompts. A loop that checks itself and retries is running multiple agent turns per item. On a Claude plan, that means you'll hit usage limits faster. Start with small batches. The "after 25 files" line in the blog post example exists for exactly this reason.

Verification is the whole game. An unverified loop just makes mistakes faster. If you can't describe how the loop should check its own work, the task isn't ready to be a loop yet. Keep it as a prompt until you can.

Cost control is still the biggest unsolved problem in this space. Building a good loop is secondary to keeping it from billing you $400 overnight. Set caps, start small, and supervise for the first few runs before you walk away.
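Caps hold best when they live in the loop itself instead of in your head. The sketch below is a hypothetical budget guard. `Budget`, its limits, and the two-turns-per-attempt estimate (one working turn plus one check) are all assumptions, not numbers from any plan.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the loop tries to spend past its cap."""


def worst_case_turns(items: int, attempts_per_item: int, turns_per_attempt: int = 2) -> int:
    # Assumes each attempt costs one working turn plus one checker turn.
    return items * attempts_per_item * turns_per_attempt


@dataclass
class Budget:
    max_items: int   # e.g. the "after 25 files" cap
    max_turns: int   # hard ceiling on agent turns for the whole run
    turns_used: int = 0
    items_done: int = 0

    def charge_turn(self) -> None:
        """Call before every agent or checker turn; stops the loop at the cap."""
        if self.turns_used >= self.max_turns:
            raise BudgetExceeded(f"turn cap of {self.max_turns} reached")
        self.turns_used += 1

    def finish_item(self) -> bool:
        """Record a finished item; returns False once the item cap is reached."""
        self.items_done += 1
        return self.items_done < self.max_items
```

For the blog post example, `worst_case_turns(25, 3)` comes to 150 turns if every file happens to need three tries. That is the kind of number worth knowing before you walk away.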
