Insights | Tosea Team | 18 min read

What Is Loop Engineering? A Complete Guide from Prompt to Harness Engineering (2026)

What is loop engineering? A guide to the 2026 shift in AI agents — from prompt to context to harness engineering, to designing the agent loops that run coding agents while you sleep.


In the second week of June 2026, a single idea reorganized how a lot of people talk about working with AI: stop prompting your coding agent, and start designing the loop that prompts it for you. The phrase that stuck was loop engineering. It spread fast — a post that kicked off the discussion reportedly crossed 6.5 million views in days — and it gave a name to something experienced agent users had already started doing without quite calling it anything.

This guide explains what loop engineering actually is, how it became the center of gravity in agentic AI almost overnight, and why it matters. It also traces the lineage that got us here, because loop engineering did not appear from nowhere: it is the latest layer in a four-step progression from prompt engineering to context engineering to harness engineering and now to the loop itself. If you want the foundations first, our guides to Claude Code and OpenAI Codex cover the coding agents that made this shift possible.

What Is Loop Engineering?

Loop engineering is the practice of designing the system that prompts, checks, remembers, and re-runs an AI agent — instead of you typing every next instruction by hand. The unit of work is no longer a single prompt or even a single conversation. It is a loop: a repeating cycle in which the model takes an action, receives feedback from its environment, uses that feedback to decide the next move, and continues until a defined termination condition is met.

Put another way, you stop being the person in the chat box and become the person who builds the machine that runs the chat box. You define a recursive goal — "make the test suite pass," "refactor this module and keep all behavior," "triage every open issue and draft fixes" — and the agent iterates: inspect the codebase, make a change, run validation, read the outcome, and decide what to do next. The skill shifts from writing the perfect sentence to engineering a reliable cycle.

The contrast with the old way is stark. A one-shot prompt treats the model as a code generator: ask once, get an answer, copy it out. A loop treats software work as an iterative system that can run for minutes or hours, correcting itself against real signals — tests, type checkers, linters, runtime errors — rather than against your patience.

How Loop Engineering Got Hot: The June 2026 Moment

The term crystallized around two people in two days.

On June 7, 2026, Peter Steinberger — the developer behind the OpenClaw agent project — argued that the relevant skill was no longer prompting coding agents but designing the loops that prompt them. His post detonated, reaching roughly 6.5 million views and dominating the agent conversation for the following week. (If you're new to his project, we covered it in our OpenClaw agentic-shift overview.)

The next day, Google engineer Addy Osmani published an essay titled "Loop Engineering" that gave the practice its name and, more usefully, an anatomy: automations, worktrees, skills, connectors, sub-agents, and external state. That essay turned a viral take into a vocabulary people could build on.

The sentiment was not limited to enthusiasts. Boris Cherny, who works on Claude Code at Anthropic, reportedly summed up the change in five words: "I don't prompt Claude anymore." When the people building the most-used coding agents say they have stopped prompting by hand, the practice has clearly moved from fringe to mainstream.

Why now? Because the underlying agents finally got good enough. By mid-2026, coding agents could run autonomously long enough — and recover from their own mistakes well enough — that the bottleneck moved. When a single agent run might last an hour and touch dozens of files, the highest-leverage thing you can do is not to write a sharper prompt; it is to design a loop that keeps the agent productive, verified, and on-goal the whole time, including while you sleep.

From Prompt to Context to Harness to Loop: A Short History

Loop engineering is best understood as the fourth step in a steady migration outward — from the words you type, to the information the model sees, to the environment it runs in, to the cycle that drives it. Each layer wraps the previous one without replacing it.

The four-layer evolution of working with AI agents — prompt engineering, context engineering, harness engineering, and loop engineering, each nesting the one before it

Prompt engineering (roughly 2022–2024). The first skill was wording. Give the model a role, break the task into steps, add examples, ask it to think step by step. Prompt engineering optimized expression. Its ceiling was real: a perfectly phrased prompt still cannot supply facts the model never received.

Context engineering (2025). The focus moved from the words to everything the model sees at inference time — conversation history, retrieved documents, tool outputs, agent state, and dynamically assembled knowledge. On June 18, 2025, Shopify's Tobi Lütke offered the definition that stuck: providing all the context needed for the task to be plausibly solvable by the model. Andrej Karpathy endorsed it as "the delicate art and science of filling the context window with just the right information for the next step," and in September 2025 Anthropic formalized it as curating and maintaining the optimal set of tokens during inference. Prompt engineering became a subset of context engineering.

Harness engineering (2026). As agents started doing autonomous, multi-step work in production, a new layer mattered: the harness — the full environment of scaffolding, tools, constraints, and feedback loops around an agent like Codex or Claude Code. Harness engineering is what makes agents reliable rather than merely clever, and it nests the prior layers: harness contains context contains prompt.

Loop engineering (2026). The newest framing zooms in on the part of the harness that actually produces autonomy: the iterative cycle. Where harness engineering asks "what environment does the agent need?", loop engineering asks "what cycle keeps it working toward the goal, and when does it stop?" The intellectual ancestor here is the ReAct pattern (Reason + Act), from research at Princeton and Google, which interleaved reasoning steps with action steps and showed that a model that observes results between actions behaves very differently from one that answers once.

The progression is not a series of fads replacing each other. It is a set of nested concerns. You still write prompts; you still curate context; you still build a harness. Loop engineering is simply the layer where all of it gets put in motion.

The Anatomy of a Well-Engineered Loop

A loop that runs unattended for an hour needs more than a goal and a model. The pieces that separate a reliable loop from a runaway one are fairly consistent across the agents people are building in 2026.

  • A clear goal with a testable termination condition. "Make the tests pass" is a good loop goal because success is checkable; "improve the code" is a bad one because the loop never knows when to stop.
  • A tool set that touches the real environment. File access, a terminal, a test runner, a type checker, version control. The loop's feedback is only as honest as the tools that produce it.
  • Context management. Because each step feeds the next, a long loop will overflow the context window unless it summarizes, prunes, and offloads state. This is where context engineering lives inside loop engineering.
  • Termination and escalation logic. Explicit success and failure exits, plus a way to escalate to a human when the loop is stuck rather than burning tokens forever.
  • Error handling that distinguishes recoverable from fatal. A failing test is feedback to act on; a missing credential is a hard stop.
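The recoverable-versus-fatal distinction in the last item can be made explicit in code. This is a minimal sketch, not any particular agent's API: the exception classes `RecoverableError` and `FatalError` and the helper `run_step` are hypothetical names, and "action" is assumed to be any zero-argument callable standing in for a tool call.

```python
class RecoverableError(Exception):
    """Feedback the loop can act on, e.g. a failing test or a lint error."""


class FatalError(Exception):
    """A hard stop no further iteration can fix, e.g. a missing credential."""


def run_step(action):
    """Execute one tool call and return (ok, feedback) for the next model step."""
    try:
        return True, str(action())
    except RecoverableError as exc:
        # Turn the failure into an observation the model reasons over next.
        return False, f"recoverable: {exc}"
    # FatalError (and anything unexpected) is not caught, so it ends the loop.
```

The design choice is that only errors explicitly marked recoverable become feedback; everything else stops the loop rather than letting it burn tokens retrying something it cannot fix.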

The agent loop cycle — define goal, act, observe feedback, decide, and repeat until a termination condition is met, wrapped by tools, context management, and error handling

Addy Osmani's anatomy adds the structural pieces that make loops composable in practice: automations that trigger runs, worktrees that let parallel agents work without colliding, skills that package reusable capabilities, connectors to external systems, sub-agents that decompose big goals, and external state that persists memory across runs. Most teams adopt these incrementally — a single validation loop first, parallel worktrees later.
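The sub-agent idea can be sketched with the standard library: an orchestrator fans subgoals out in parallel, each worker starts from a fresh context, and only a short conclusion comes back. This is an illustrative assumption rather than Osmani's implementation; `worker` and `orchestrate` are hypothetical, the subgoals are supplied by the caller (a real orchestrator would ask a model to decompose the goal), and real workers would each run their own loop in a separate worktree.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(subgoal):
    """Hypothetical sub-agent: begins with a fresh context holding only its subgoal."""
    context = [f"goal: {subgoal}"]  # no parent transcript leaks in
    # ... a real worker would run its own agent loop here, appending many steps ...
    context.append(f"done: {subgoal}")
    return context[-1]  # hand back only the conclusion, not the transcript


def orchestrate(goal, subgoals, max_workers=4):
    """Run workers in parallel and collect their short conclusions in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        conclusions = list(pool.map(worker, subgoals))  # map preserves input order
    return {"goal": goal, "results": conclusions}
```

Returning only the last context entry is what keeps the orchestrator's own window small, however long each worker ran.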

The Loop in Pseudocode

Stripped to its essentials, an agent loop is a control loop — closer to a thermostat or a REPL than to a chat. The structure is small enough to write in a dozen lines, and every production loop is a hardened version of this skeleton:

def run_loop(goal, model, tools, verifier, budget, max_steps):
    state = init_state(goal)                          # the recursive goal + scratchpad

    for _ in range(max_steps):                        # hard cap: never loop forever
        thought = model.reason(state)                 # ReAct: reason about what to do
        action = model.choose_action(state, thought)  # ...then choose a tool call
        result = tools.execute(action)                # touch the real environment
        state = update(state, thought, action, result)
        state = compact(state)                        # keep context under budget

        if verifier.passes(state):                    # deterministic check = reward signal
            return success(state)
        if no_progress(state) or budget.exhausted():
            return escalate_to_human(state)           # stop circling a dead end

    return escalate_to_human(state)                   # ran out of steps -> hand back

Almost everything interesting in loop engineering is a decision about one of these lines: what counts as verifier.passes, how compact keeps the context window from overflowing, how no_progress is detected, and what tools the agent is actually allowed to call. The model is a fixed black box in the middle; the engineering is the loop around it.
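To see the skeleton actually execute, here is a toy instance in which the "model" is a deterministic stub nudging a number toward a target. Everything here (`toy_model`, `run_loop`'s signature, the dict-based state) is a hypothetical illustration of the control flow, not a real agent; the point is that the cap, the compaction step, and the verifier sit exactly where the skeleton puts them.

```python
def toy_model(state):
    """Stand-in for the LLM: reason about the state, then choose an action."""
    gap = state["target"] - state["value"]
    thought = f"value is off by {gap}"
    action = 1 if gap > 0 else -1  # nudge toward the target
    return thought, action


def run_loop(target, start, max_steps=20):
    state = {"target": target, "value": start, "history": []}
    for step in range(max_steps):                    # hard cap
        thought, action = toy_model(state)           # reason, then act
        state["value"] += action                     # the "tool call" changes the environment
        state["history"].append((thought, action, state["value"]))
        state["history"] = state["history"][-5:]     # crude compaction: keep recent steps
        if state["value"] == state["target"]:        # deterministic verifier
            return "success", step + 1
    return "escalate", max_steps                     # out of steps: hand back


print(run_loop(5, 2))                    # converges in three steps
print(run_loop(100, 0, max_steps=10))    # hits the cap and escalates
```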

A Loop in Practice

A concrete example makes the abstraction click. Suppose the goal is "get the continuous-integration build green on the payments-refactor branch." A hand-prompted approach means babysitting: run the tests, read the failure, paste it into chat, copy the fix back, run again, repeat for an hour.

The loop-engineered version specifies the cycle once and walks away. The goal has a checkable success condition (CI passes). The agent gets a git worktree of its own so it cannot collide with your work, plus a terminal, the test runner, and the type checker. Then it iterates: read the first failing test, locate the cause, apply a patch, re-run the tests, and read the new output. If the suite is still red, it reasons over the fresh failure and tries again; if it goes green, it runs the full suite and the linter, then opens a draft pull request and stops. Crucially, it keeps an external log of what it has already tried, so it does not loop on the same dead end — and it escalates to you after, say, three failed attempts on the same test rather than burning tokens indefinitely.

You wake up to a draft PR and a readable trail of what it changed and why. Nothing here is magic; every piece is a deliberate design decision about goals, tools, memory, and stopping conditions. That deliberate design is the "engineering" in loop engineering.
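The external attempt log and the "three failed attempts" escalation from that example might look like the following. This is a sketch under assumptions: the JSON file layout, the `AttemptLog` class, and the threshold constant are hypothetical, chosen so the log lives on disk and survives context compaction or a restarted run.

```python
import json
from pathlib import Path

MAX_ATTEMPTS_PER_TEST = 3  # the "say, three" threshold from the example


class AttemptLog:
    """External record of tried patches, persisted as JSON outside the context window."""

    def __init__(self, path):
        self.path = Path(path)
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else []

    def record(self, test, patch, passed):
        self.entries.append({"test": test, "patch": patch, "passed": passed})
        self.path.write_text(json.dumps(self.entries, indent=2))

    def already_tried(self, test, patch):
        """Lets the loop skip a dead end it has already explored."""
        return any(e["test"] == test and e["patch"] == patch for e in self.entries)

    def should_escalate(self, test):
        """True once the same test has failed MAX_ATTEMPTS_PER_TEST times."""
        failures = sum(1 for e in self.entries if e["test"] == test and not e["passed"])
        return failures >= MAX_ATTEMPTS_PER_TEST
```

The agent would consult `already_tried` before applying a patch and `should_escalate` after each failure, so the stopping rule does not depend on the model remembering its own history.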

Loop Patterns: From ReAct to Evaluator-Optimizer

Loop engineering did not invent the agent loop; it productized a line of research patterns that have been accumulating since 2022. Knowing the patterns helps you pick the right one instead of reinventing it.

Four loop patterns compared — ReAct, Reflexion, evaluator-optimizer, and orchestrator-workers — and when to reach for each

  • ReAct (Reason + Act). The base pattern, from Yao et al. (2022): interleave a reasoning step with an action step so the model observes the result before its next move. Every modern loop is a descendant of ReAct — the pseudocode above is ReAct with guards.
  • Reflexion. Shinn et al. (2023) added memory and self-critique. A Reflexion agent runs three roles — an Actor that acts, an Evaluator that scores the trajectory, and a Self-Reflection step that writes a verbal lesson ("the patch failed because the import was wrong") into an episodic memory buffer that future attempts read back. It is why a well-built loop can get better within a single session without any model retraining.
  • Plan-and-Execute. Split a planner that decomposes the goal into ordered steps from an executor that runs them. Separating planning from doing reduces drift on long, multi-stage tasks.
  • Evaluator-Optimizer. From Anthropic's Building Effective Agents: one model generates a candidate, a second evaluates it against criteria and returns feedback, and the two cycle until the evaluation passes. It shines when you have clear, articulable acceptance criteria.
  • Orchestrator-Workers. A central orchestrator dynamically breaks a task into subtasks, delegates each to a worker sub-agent with its own fresh context window, and synthesizes the results. It is the pattern underneath Addy Osmani's "sub-agents" and "worktrees", and it is how parallel, overnight agent fleets are built.
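Of these, the evaluator-optimizer cycle is the easiest to show in a few lines. The sketch below assumes the generator and evaluator are passed in as plain callables; `toy_generate`, `toy_evaluate`, and the substring-based criteria are hypothetical stand-ins for two model calls, not Anthropic's code.

```python
def evaluator_optimizer(generate, evaluate, max_rounds=5):
    """Cycle generator and evaluator until the evaluator returns no feedback."""
    feedback = []
    for _ in range(max_rounds):
        candidate = generate(feedback)  # generator revises using the latest critique
        feedback = evaluate(candidate)  # evaluator lists the criteria still unmet
        if not feedback:                # evaluation passed: stop cycling
            return candidate
    return None                         # never converged: hand back to a human


# Toy stand-ins for the two model calls (hypothetical, for illustration only).
CRITERIA = ["cites the source", "states the date"]


def toy_generate(feedback):
    return "Draft summary" + "".join(f"; {item}" for item in feedback)


def toy_evaluate(candidate):
    return [c for c in CRITERIA if c not in candidate]


print(evaluator_optimizer(toy_generate, toy_evaluate))
```

The pattern only works as well as the evaluator's criteria are articulable, which is why the list above recommends it when acceptance criteria are clear.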

The practical advice from the teams running these in production is consistent: prefer the simplest pattern that works, and compose patterns rather than reaching for a heavy framework. A single ReAct loop with a deterministic verifier beats an elaborate multi-agent system you cannot debug.

The Three Hard Parts: Context, Termination, and Verification

If loop engineering has a core curriculum, it is these three problems. Get them right and a loop runs for an hour unattended; get them wrong and it overflows, spins, or lies.

Context management. The context window is the agent's working memory — effectively its RAM — and it has a hard size limit. In a long loop, every step appends thoughts, tool outputs, and errors, so the window fills up and the model starts to suffer "context rot": as the transcript grows, it attends less reliably to what actually matters. The countermeasures are pure context engineering operating inside the loop — compacting old steps into summaries, pruning stale tool output, externalizing state to files or a scratchpad that the agent reads back on demand, and isolating sub-agents so a subtask runs in a clean window and returns only its conclusion.
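Compaction can be sketched as "keep the newest steps verbatim, fold older ones into a summary once over budget." The token estimate below (about four characters per token) is a rough assumption rather than a real tokenizer, and the default `summarize` just truncates; in practice it would be a model call.

```python
def estimate_tokens(text):
    """Rough estimate (~4 characters per token); an assumption, not a tokenizer."""
    return len(text) // 4 + 1


def compact(steps, budget_tokens, keep_recent=3, summarize=None):
    """Keep the newest steps verbatim and fold older ones into one summary entry.

    `summarize` stands in for a model call; the default simply truncates.
    """
    if summarize is None:
        summarize = lambda old: "summary: " + " | ".join(s[:40] for s in old)
    total = sum(estimate_tokens(s) for s in steps)
    if total <= budget_tokens or len(steps) <= keep_recent:
        return steps  # under budget (or nothing old to fold): leave as-is
    split = len(steps) - keep_recent
    old, recent = steps[:split], steps[split:]
    return [summarize(old)] + recent
```

Keeping the most recent steps verbatim matters because they usually hold the error the model is about to act on; older steps matter mostly as a record of what was already tried.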

Termination and no-progress detection. The signature bug of a naive loop is that it never stops. Robust loops carry several independent exits: a verifier that confirms the goal is met, a hard cap on iterations, a token or wall-clock budget, and — the subtle one — no-progress detection. If the last few steps produced the same error or left the state unchanged, the loop should break and escalate rather than burn budget circling a dead end. Termination is not an afterthought; it is half the design.
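The layered exits described here can live in one small object the loop consults after every step. This is a sketch with hypothetical names (`Termination`, `record`, `reason_to_stop`); "no progress" is approximated as the same observation repeating for a whole window of steps, which is one simple heuristic among several.

```python
import time
from collections import deque


class Termination:
    """Independent exits: step cap, token budget, wall-clock budget, no progress."""

    def __init__(self, max_steps, max_tokens, max_seconds, window=3):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.steps = 0
        self.tokens = 0
        self.started = time.monotonic()
        self.recent = deque(maxlen=window)  # last few observations (errors, diffs)

    def record(self, observation, tokens_used):
        self.steps += 1
        self.tokens += tokens_used
        self.recent.append(observation)

    def reason_to_stop(self):
        """Return why the loop should stop, or None to keep going."""
        if self.steps >= self.max_steps:
            return "step cap"
        if self.tokens >= self.max_tokens:
            return "token budget"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time budget"
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            return "no progress"  # identical observation for the whole window
        return None
```

Each exit is checked independently, so a loop that stays under its token budget but keeps hitting the same `ImportError` still stops.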

Verification as the reward signal. A loop is only as good as the feedback it acts on, so the feedback has to be trustworthy. The gold standard is deterministic verification — tests, type checkers, compilers, linters — because they return an objective pass/fail the model cannot argue its way around. LLM-as-judge verification (a second model grades the output) is more flexible and necessary for things that cannot be mechanically checked, but it can be gamed or can collude with the actor. The strongest loops put a deterministic check in the cycle wherever one exists, and reserve model judgment for the genuinely unquantifiable.
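A deterministic verifier can be as simple as running a command and trusting its exit code. The sketch below uses the standard `subprocess.run`; the command itself (for example `["pytest", "-q"]`) is an assumption about your project, and the output is trimmed to its tail so a long log does not flood the context window.

```python
import subprocess


def deterministic_verifier(command, timeout=600):
    """Run a check such as a test suite; pass/fail comes from the exit code alone."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "verifier timed out"
    # Exit code 0 means pass; the model cannot argue its way around this.
    output = proc.stdout + proc.stderr
    return proc.returncode == 0, output[-2000:]  # tail only, to save context
```

The returned output goes back to the model as feedback, but only the exit code decides success, which is what separates this from an LLM-as-judge check.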

Loop Failure Modes

Most loop disasters are one of a small set of recurring failures. Designing against them is most of what "engineering" means here.

  • Context overflow and rot. The window fills and quality silently degrades. Fix: compaction, pruning, sub-agent isolation.
  • No-progress loops. The agent repeats the same failing action forever. Fix: no-progress detection plus a hard step cap.
  • Objective misspecification (reward hacking). The loop optimizes a checkable proxy that is not the real goal — the classic being an agent that deletes the failing test to turn CI green. Fix: termination criteria that capture intent, plus a human gate on risky actions.
  • Hallucinated success. The agent reports "done" without real verification. Fix: trust a deterministic verifier, never the agent's self-report.
  • Compounding errors. Because each step consumes prior outputs, an early mistake snowballs across the trajectory. Fix: verify early and often, not just at the end.
  • Cost blowup. Long loops quietly burn tokens. Fix: budget guards and prompt caching to make repeated context cheap.
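As one concrete guard against the reward-hacking case above (deleting a failing test to turn CI green), a loop can compare the set of test IDs before and after the agent's change. This is a hedged sketch: how you collect test IDs depends on your test runner, and `objective_guard` is a hypothetical helper, not a standard tool.

```python
def objective_guard(tests_before, tests_after, skipped_after=()):
    """Reject a 'green' result that was reached by removing or skipping tests.

    Arguments are collections of test IDs gathered before and after the change;
    tests that now exist but are skipped count as removed.
    """
    effective_after = set(tests_after) - set(skipped_after)
    removed = set(tests_before) - effective_after
    if removed:
        return False, f"blocked: tests removed or skipped {sorted(removed)}"
    return True, "ok"
```

A check like this runs alongside the verifier, so a passing suite only counts if it is still the same suite.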

Why Loop Engineering Matters

The significance is not the buzzword; it is where the leverage moves.

First, the bottleneck shifts from authorship to orchestration. When the model can write the code, the scarce skill is designing the cycle that keeps it correct and pointed at the goal. That is a systems-engineering skill, not a copywriting one — which is exactly why people frame it as engineering.

Second, reliability becomes a design property, not a hope. A one-shot prompt either works or it doesn't. A loop that validates after every action turns a flaky generator into a system that converges. This is the same logic that makes verification-first workflows valuable everywhere AI produces high-stakes output.

Third, work becomes parallel and asynchronous. Once an agent runs in a loop with its own worktree, you can run several at once and review results later — the "agents that run while you sleep" promise. The economics change: throughput is no longer bounded by how fast a human can prompt.

What Loop Engineering Is Not

A balanced view matters, because hype outran reality in the first weeks. Loop engineering does not mean every developer should be building autonomous agent fleets tomorrow. As one widely shared rebuttal put it, most developers do not need agent loops yet — for many tasks, an interactive session with a good agent is faster and safer than engineering a full loop.

Nor does a loop remove the human from the loop. You still own the goal, the definition of "done," and the judgment about whether the output is actually correct. A loop that optimizes a badly specified objective will pursue the wrong thing with great efficiency. And without genuine verification, a fast loop simply produces wrong answers faster. The discipline is to keep a real check — tests, types, a human gate — inside every cycle.

What Loop Engineering Means for AI Slide Generation

Loop engineering is usually discussed in the context of coding agents, but its core idea — define a goal, act, verify against real feedback, repeat until done — is exactly how reliable document-to-deck generation has to work, and it is why naive one-shot slide tools disappoint.

Turning a dense source document into a presentation is not a single generation step. It is a loop: parse the document, draft an outline, render slides, check each slide against the source, catch the claims that drifted or the numbers that don't reconcile, and revise. A tool that "writes slides from a prompt" in one shot is the chat-box era of slide-making; a tool that runs a generate-verify-refine cycle over your actual source is the loop-engineering era. This is precisely the architecture behind zero-hallucination AI slides and the multi-agent approach to professional slide generation, where a verification step is built into the pipeline rather than left to the reader to catch.

Tosea.ai is built as exactly this kind of loop: a document-to-PPT orchestration layer that parses your PDF or report, structures it, generates slides, and checks them back against the source so the deck stays anchored to evidence — the same source-first discipline we describe in hallucination-free document-to-PPT conversion. For the broader picture of how agentic loops are reshaping slide work, see how AI agents are redefining professional slides. The lesson loop engineering teaches coding agents applies just as cleanly to presentation workflows: the value is not in a single clever output, but in a cycle that keeps the output correct.
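As an illustration of what a deterministic "check against the source" step can look like (a generic sketch, not Tosea's implementation), the function below flags numbers on a slide that never appear in the source document. The regex and the helper name `unsupported_numbers` are assumptions; it catches drifted figures, not paraphrased claims, which still need a model or human reviewer.

```python
import re

# Integers, decimals, thousands separators, and an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def unsupported_numbers(slide_text, source_text):
    """Return numbers on a slide that do not appear anywhere in the source."""
    source_numbers = set(NUMBER.findall(source_text))
    return [n for n in NUMBER.findall(slide_text) if n not in source_numbers]


source = "Revenue grew 12% to $4.5M in 2025."
print(unsupported_numbers("Revenue grew 15% to $4.5M", source))  # flags the drifted figure
```

Any non-empty result would send the slide back through the revise step instead of into the finished deck.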

Frequently Asked Questions

What is loop engineering in simple terms?

It is designing the system that runs an AI agent in a repeating cycle — act, observe, decide, repeat — instead of prompting the agent by hand each step. You define a goal and a stopping condition; the loop does the iterating.

Who coined the term loop engineering?

It crystallized in June 2026: Peter Steinberger argued the real skill had shifted from prompting agents to designing their loops, and Google's Addy Osmani named and structured the practice in an essay titled "Loop Engineering" the next day.

How is loop engineering different from prompt and context engineering?

Prompt engineering is about the words you send; context engineering is about all the information the model sees; harness engineering is about the environment the agent runs in; loop engineering is about the iterative cycle that drives the agent toward a goal. Each layer wraps the previous one rather than replacing it.

Do I need loop engineering for my work?

Not always. For one-off tasks, an interactive session with a capable agent is often faster. Loop engineering pays off when work is repetitive, long-running, or benefits from running unattended — and when you can define a goal with a checkable success condition.

Is loop engineering only for coding agents?

No. The pattern — goal, action, verification, repetition — applies to any agentic workflow that can be checked against real feedback, including research, data processing, and document-to-deck generation.

What is the difference between ReAct and Reflexion?

ReAct (2022) is the base loop: reason, then act, then observe the result before the next step. Reflexion (2023) adds memory and self-critique on top — after a failed attempt, the agent writes a verbal lesson into an episodic memory buffer that later attempts read, so it improves across trials within a session without retraining. In short, ReAct is the single-pass loop; Reflexion is a loop that learns from its own failures.

How do you stop an agent loop from running forever?

With layered exits: a verifier that confirms the goal is met, a hard maximum-iteration cap, a token or time budget, and no-progress detection that breaks the loop when recent steps stop changing the state. A loop without explicit termination logic is the single most common — and most expensive — mistake.
