Vibe coding, where you prompt an agent, wait, review the result, and prompt again by hand, is the beginner level. The people getting the most out of AI coding wrap the agent in a system. They teach it their conventions once with a rules file, package anything they do twice into a skill, trigger agents on events with automations, and let agents run in a loop until a goal is met. They run many agents in parallel using cloud environments and git worktrees, and they pick a different model for each step. This guide walks through that setup, and every tool claim here is checked against the official docs, because the space moves fast and a lot of advice is already out of date.
- Rules file: teach the agent your conventions once (AGENTS.md, or CLAUDE.md for Claude Code).
- Skills: turn anything you do more than once into a reusable /command the agent can also invoke on its own.
- Automations: run an agent automatically on a trigger, like a pull request opening or a nightly schedule.
- Loops: let an agent repeat an action until a goal is met, then stop.
- Cloud agents and worktrees: run many agents in parallel without them stepping on each other.
- Multi-model: plan with a strong model, build with a fast one, review with a third.
- The flywheel underneath it all: full tests, current docs, and complete logs.
There's a moment most people hit a few weeks into using an AI coding agent. You realize you're babysitting it. You type a prompt, you wait, you read what it did, you fix it, you prompt again. It's faster than typing every line yourself, but you're still the bottleneck, sitting there feeding the machine.
The people getting real mileage out of these tools stopped doing that. They built a system around the agent so the repetitive parts happen without them. I've been setting this up across the tools I use, and I want to lay out the whole thing in plain terms. I also fact-checked the popular advice against the official docs while writing this, because a surprising amount of it is wrong or already stale.
The levels of AI coding
The jump is from doing the work to designing the system that does the work.
Think of it as a ladder. At the bottom, you prompt and wait. One step up, you write down your preferences so you stop repeating yourself. Above that, you package your common tasks so you stop copy-pasting prompts. Higher still, agents start running on their own, on triggers and in loops, and you move from operator to designer. You are no longer writing the code or even most of the prompts. You are setting up the system that writes the code.
None of this requires a special tool. It is a set of habits supported by features that almost every serious agent now has. Let me go through them in order, because they build on each other.
1. Teach the agent your conventions once
A rules file is the difference between explaining yourself every session and never again.
The first thing to set up is a rules file. This is where you tell the agent how you work: your commit style, your deploy process, your coding preferences, even the tone you want it to reply in. You write it once and the agent reads it every session, so you stop re-explaining yourself.
Most tools have standardized on a file called AGENTS.md in your project root. Both Cursor and OpenAI Codex read it. Codex even builds a chain of these files, from a global one in your home directory down to the folder you are working in, with the closest file winning. There is one real exception worth knowing: Claude Code does not read AGENTS.md. It reads its own CLAUDE.md file, and the documented way to reuse an existing AGENTS.md is to import it from CLAUDE.md or symlink the two together.
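To make the "chain of files, closest one wins" idea concrete, here is a minimal sketch of that lookup order. It is a simplified model of the behavior described above, not Codex's actual implementation. The function name `instruction_chain` and its parameters are hypothetical, and the global file location is passed in rather than assumed.

```python
from pathlib import Path
from typing import List, Optional


def instruction_chain(cwd: Path, project_root: Path,
                      global_file: Optional[Path] = None,
                      filename: str = "AGENTS.md") -> List[Path]:
    """Return rules files ordered from most general to most specific.

    Hypothetical helper: the last entry is the file closest to the
    working folder, which is the one that "wins" on a conflict.
    """
    cwd, project_root = cwd.resolve(), project_root.resolve()
    chain: List[Path] = []
    if global_file is not None and global_file.is_file():
        chain.append(global_file)
    # Walk from the project root down to the working folder.
    folder = project_root
    for part in ("",) + cwd.relative_to(project_root).parts:
        if part:
            folder = folder / part
        candidate = folder / filename
        if candidate.is_file():
            chain.append(candidate)
    return chain
```

Calling it with a working folder a couple of levels below the project root returns the root file first and the nearest file last, so later entries can override earlier ones.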
A common mix-up, corrected
You will hear that Cursor's "Rules" feature is just a way of writing to AGENTS.md. That is not right. Cursor's structured Rules write .mdc files into a .cursor/rules folder, with frontmatter that controls when each rule applies. AGENTS.md is the separate, plain-markdown option for people who do not want structured rules. Both work, but they are two different mechanisms, not the same one.
Start simple. Write down the personality you want, how you like commits and pull requests structured, and the one or two things the agent always gets wrong about your codebase. You can grow it from there.
2. Turn anything you do twice into a skill
If you have done it more than once, it should have been a skill.
A skill is a packaged task. Instead of copy-pasting the same long prompt over and over, you save it once and invoke it with a slash command. In Claude Code a skill is a folder with a SKILL.md file that holds instructions, plus any scripts or reference files it needs. You can call it directly by typing /skill-name, and the agent can also load it on its own when your request matches the skill's description. That second part matters: well-written skills get used without you remembering to ask for them.
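If it helps to see the shape of a skill on disk, here is a small sketch that scaffolds one. It assumes the SKILL.md starts with a frontmatter block carrying a `name` and a `description`, which is my understanding of the format. The helper name `scaffold_skill` and the choice of skills folder are hypothetical, so check the official docs for the exact location and fields your tool expects.

```python
from pathlib import Path


def scaffold_skill(skills_dir: Path, name: str,
                   description: str, instructions: str) -> Path:
    """Create <skills_dir>/<name>/SKILL.md and return its path.

    Assumption: the file opens with name/description frontmatter,
    and the description is what the agent matches requests against.
    """
    skill_dir = skills_dir / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{instructions.strip()}\n",
        encoding="utf-8",
    )
    return skill_file
```

The description is worth more care than the name. It is what lets the agent pick the skill up without being asked, so write it as the situation in which the skill applies.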
The best uses of skills are the boring, repeated ones. A house writing style. The way you like issues filed. How to run your tests, or a specific slice of them. How to call an internal API so the agent does not have to rediscover the endpoints every time. And quality gates, like "run all tests and do not open a pull request unless they pass."
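The quality gate in that last example boils down to a few lines. This is a sketch of the idea, not any tool's built-in feature. The function name `gated` and the `open_pull_request` callback are hypothetical stand-ins for whatever your skill actually runs.

```python
import subprocess
from typing import Callable, Sequence


def gated(test_cmd: Sequence[str],
          open_pull_request: Callable[[], None]) -> bool:
    """Run the test command and open a PR only if it exits with 0."""
    result = subprocess.run(test_cmd)
    if result.returncode != 0:
        return False  # tests failed, so no pull request is opened
    open_pull_request()
    return True
```

Passing the test command as a list, for example `["pytest", "-q"]`, avoids shell quoting problems and makes the gate easy to reuse for a specific slice of the suite.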
You do not have to write them all yourself. There are large public skill libraries you can install in a couple of minutes. The one most people point to is addyosmani/agent-skills, which has around 61,000 stars and covers the full cycle from defining an idea to planning, building, verifying, reviewing, and shipping. Grab the repo URL, tell your agent to install it, and the skills show up as commands.
3. Trigger agents on events
An automation is an agent that starts itself when something happens.
Once your agent knows your conventions and has skills, you can stop kicking it off by hand. An automation pairs a trigger with a prompt. The trigger fires, the agent runs. It's a first-class feature now, not a hack.
Cursor calls it Automations. You pick a trigger, such as a GitHub pull request opening, write the instructions, optionally attach tools and memories, and it runs in the background. Codex has the same idea, including scheduled runs on a cron expression, set up either by describing what you want in plain language or by filling in the fields. Claude Code calls its version Routines, which can run on a schedule, on a webhook, or on GitHub events.
A clean first automation: when a pull request opens, have an agent run the test suite, and if anything fails, fix it and push the change back to the branch. You wake up to green checks instead of a list of broken builds. Automated pull-request review tools fit the same slot, reviewing each new pull request and leaving notes, but you can get a lot of value from automations you build yourself before you pay for anything.
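Stripped of the product UI, "a trigger paired with a prompt" is a very small data structure. The sketch below is only a mental model. The event label, the `Automation` class, and the `run_agent` callback are all hypothetical, and none of the tools named above are configured this way in practice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Automation:
    trigger: str  # event label, e.g. "pull_request.opened" (made up here)
    prompt: str   # instructions handed to the agent when the trigger fires


def dispatch(event: str, automations: List[Automation],
             run_agent: Callable[[str], str]) -> List[str]:
    """Run every automation whose trigger matches the incoming event."""
    return [run_agent(a.prompt) for a in automations if a.trigger == event]
```

The useful part of thinking about it this way is that the prompt has to stand alone. Nobody is there to clarify it when the trigger fires, so write it as if you were handing the task to someone you cannot talk to.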
4. Let an agent run until a goal is met
A loop has a trigger, an action it repeats, and a goal that ends it.
A loop is an agent that keeps going until it hits a target. Three parts: something to start it, an action it repeats, and a stopping condition so it does not run forever. The clearest examples are the ones people actually run every night.
- Overnight docs sweep. Each night, review the day's changes, find gaps in the docs, update them, and open a pull request. Your README and internal docs stay current without anyone remembering to do it.
- Page-load loop. Load every page and view in the app, and for anything slower than your target, optimize the queries until it is fast. Then continue until everything clears the bar.
- Production-error sweep. Each night, read the production logs, find errors, work out the cause, write a fix, and open a pull request. You wake up to fixes for bugs you had not even seen yet.
Notice what makes those work: full logs, real tests, and current docs. That is the flywheel under the whole setup. With complete test coverage, you can task an agent with keeping it complete as the code changes. With complete logging, you can task an agent with fixing any error that appears. The boring infrastructure is what lets the agents run unattended. If you want ready-made loops to start from, the Forward Future Loop Library is a free, growing collection.
One honest warning about loops
A loop that runs for hours is a loop that spends tokens for hours. People run the page-load loop overnight and it does make the app fast, but it is not free. Set a stopping condition you actually mean, watch the first few runs, and keep an eye on the bill. "Run until perfect" can get expensive fast.
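Here is a minimal sketch of a loop with a stopping condition you actually mean: a goal check, an iteration cap, and a token budget. All the names are hypothetical, and it assumes your action can report how many tokens it used, which depends on your tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    goal_met: bool
    iterations: int
    tokens_spent: int
    reason: str


def run_loop(action: Callable[[], int], goal_met: Callable[[], bool],
             max_iterations: int, token_budget: int) -> LoopResult:
    """Repeat `action` until the goal is met or a limit is hit.

    `action` returns the tokens it consumed (an assumption about
    what your agent tooling can report).
    """
    iterations = tokens = 0
    while True:
        if goal_met():
            return LoopResult(True, iterations, tokens, "goal met")
        if iterations >= max_iterations:
            return LoopResult(False, iterations, tokens, "iteration cap reached")
        if tokens >= token_budget:
            return LoopResult(False, iterations, tokens, "token budget exhausted")
        tokens += action()
        iterations += 1
```

The budget is checked before each action, so the loop can overshoot by at most one action's worth of tokens. If that matters, set the budget a little below what you are really willing to spend.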
5. Cloud agents, local agents, and worktrees
The trick to running many agents at once is keeping them out of each other's way.
When you want to run several agents in parallel, two problems show up: your laptop slows to a crawl, and agents writing to the same files get confused and corrupt each other's work. Cloud agents take the load off your machine, and both cloud agents and worktrees give each agent its own copy of the code so they cannot collide.
A cloud agent runs in an isolated environment in a data center instead of on your machine. That means near-unlimited parallelism, access from anywhere including a phone, and clean isolation so parallel agents do not collide. Cursor, Codex, and Claude Code all offer this now, and Claude Code's web version keeps running after you close the browser and can be steered from the mobile app. The trade-offs are real though. Local agents are faster because the environment is already warm, you get a stronger sense of control watching files change on your own machine, and the newest features usually land locally first and reach the cloud later. One setup note people miss: a cloud agent needs the same environment your local one has, so give it your environment variables and secrets, or it will run half-blind.
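A cheap way to catch the "half-blind" problem is to check for the variables your project needs before the agent starts work. This is a sketch. The helper name and the variable names in the usage note are hypothetical, and it treats an empty value as missing.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env(required: Iterable[str],
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Run something like `missing_env(["DATABASE_URL", "API_KEY"])` as the first step of the cloud environment's setup and fail loudly if the list is not empty. That is much easier to debug than an agent quietly working around a missing secret.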
A git worktree is a second working copy of your repo, linked to the same history but checked out in its own folder on its own branch. Spin up one worktree per agent and each one can edit its own copy of the same files freely. You resolve any conflicts later, at merge time, instead of in real time while four agents fight over one file. Cursor and Codex both create and manage worktrees from their interfaces, and Claude Code documents the same pattern. Setting them up takes a little extra time, but for parallel work it is worth it almost every time.
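If you want to create worktrees yourself rather than through a tool's interface, the underlying command is `git worktree add`. The sketch below builds one worktree per agent. The `agent/<name>` branch naming and the sibling-folder layout are my own conventions for illustration, not something git or any of the tools require.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, agent: str, base: str = "main") -> List[str]:
    """Build the git command that gives one agent its own worktree.

    The worktree lands next to the repo as <repo>-<agent>, on a new
    branch agent/<agent> started from `base` (hypothetical layout).
    """
    path = repo.parent / f"{repo.name}-{agent}"
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{agent}", str(path), base]


def create_worktree(repo: Path, agent: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, agent, base), check=True)
```

When an agent is done and its branch is merged, `git worktree remove <path>` cleans up the folder.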
Where the main agents stand
| | Cursor | OpenAI Codex | Claude Code |
|---|---|---|---|
| Instructions file | AGENTS.md (Rules write .cursor/rules) | AGENTS.md | CLAUDE.md |
| Own coding model | Composer 2.5 | GPT-5.5 | Runs on Claude (Opus, Sonnet) |
| Cloud agents | Yes | Yes | Yes (web and mobile) |
| Automations | Yes (event triggers) | Yes (scheduled) | Routines |
| Git worktrees | Yes | Yes | Yes |
Source: Official Cursor, OpenAI Codex, and Claude Code documentation, checked June 2026
6. Use the right model for each step
You don't need the most expensive model for every keystroke.
Tools like Cursor and Factory let you switch between models from different providers. The reason to bother is speed and cost. Not everyone has unlimited tokens, and running the top model for every small task is slow and pricey. So split the work by what each step needs.
Here is the split I use, and you can save it as a skill so the agent picks the model for you. Plan with a strong reasoning model. For that, Claude Opus 4.8 set to max effort is my pick. Planning is the one pass where it is worth spending: it runs once, it has to see around corners and understand the whole codebase, and a better plan saves you a dozen bad edits later. Anthropic's effort control goes up to max, which removes the limits on how hard the model thinks, and the planning step is exactly where that pays off.
Then build with a fast model. Once the plan exists, you do not need a frontier model to type it out. A quick coding model like Cursor's Composer 2.5 or GPT-5.5 is excellent at writing code it has already been told how to write, and it keeps the loop snappy and cheaper. Finally, review with a third model to get an outside opinion on what the first two produced.
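Written down as configuration, the plan, build, and review split looks something like the sketch below. The model strings are placeholders rather than real API identifiers, and the "low" and "medium" effort labels are assumptions of mine. Only "max" for the planning step comes from the split described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StepConfig:
    model: str   # placeholder label, not a real model identifier
    effort: str  # "max" for planning is from the text; others are assumed


PIPELINE: Dict[str, StepConfig] = {
    "plan": StepConfig(model="strong-reasoning-model", effort="max"),
    "build": StepConfig(model="fast-coding-model", effort="low"),
    "review": StepConfig(model="third-model", effort="medium"),
}


def config_for(step: str) -> StepConfig:
    """Look up the model settings for one step of the pipeline."""
    if step not in PIPELINE:
        raise ValueError(f"unknown step {step!r}; expected one of {sorted(PIPELINE)}")
    return PIPELINE[step]
```

Keeping the mapping in one place also makes it a one-line change when a model you depend on becomes unavailable.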
Why model availability is now part of the plan
Older guides tell you to plan with Anthropic's Fable. As of June 12, 2026, Fable 5 and Mythos 5 are offline after a US government export-control order, so that advice no longer holds. It is a good reminder that a model you build your workflow around can vanish on short notice. We wrote up the full story in why the US government suspended Fable 5 and Mythos 5. If you are weighing the broader options, our Claude Code vs Cursor breakdown and our pick for the best open-source coding model are the places to start.
What is still broken
One part of this is not solved, and pretending otherwise helps no one.
If you run a dozen agents in parallel and try to land all of their code around the same time, you hit a wall. One agent merges to main and kicks off the build and deploy. The second wants to merge, sees new changes, has to rebase, rerun its tests, and try again. Add a third and a fourth and they start stepping on each other, locking the merge and deploy process, each one restarting every time another gets through. It is slow and it is genuinely frustrating.
This is a recognized problem, and it is not fully solved. The honest workaround for now is patience, plus batching: let one agent gather everyone's changes, combine them, and merge and deploy once. The bottleneck is real enough that Cursor announced a new product around it on June 16, 2026, called Origin, described as a git forge built for the agentic era. It is built on technology from Graphite, the code-review company Cursor acquired in December 2025, and it is aimed squarely at reviewing and merging agent-generated code at scale. It is on a waitlist and unproven, so treat it as a sign of where things are heading rather than a fix you can use today.
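To see why batching helps, here is a rough sketch. The first two functions are a simplified worst-case model of my own: every agent finishes at the same moment, and each merge forces all the remaining agents to rebase and rerun their tests. The third builds the git commands for gathering branches onto one integration branch. The branch names are hypothetical.

```python
from typing import List, Sequence


def serial_test_runs(agents: int) -> int:
    """Worst case: n runs, then n-1 reruns, and so on down to 1."""
    return agents * (agents + 1) // 2


def batched_test_runs(agents: int) -> int:
    """Each agent tests once, then one run on the combined branch."""
    return agents + 1


def batch_merge_plan(branches: Sequence[str], base: str = "main",
                     integration: str = "integration") -> List[List[str]]:
    """Git commands that collect every agent branch onto one branch."""
    plan = [["git", "switch", "-c", integration, base]]
    plan += [["git", "merge", "--no-ff", "--no-edit", b] for b in branches]
    return plan
```

Under this model, a dozen agents landing one by one cost 78 test runs in the worst case against 13 when batched. Real numbers will be lower because agents rarely finish at the same instant, but the gap still grows with the number of agents.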
Two smaller cautions while I am being honest. Loops cost money the whole time they run, so do not set "run forever" and walk away. And cloud agents tend to trail local ones on the newest features, so if something just shipped, you may only have it locally for a while.
If you are not running 20 agents
You don't need the full machine to get most of the benefit.
Most of the value here shows up early, long before you are orchestrating a fleet. If you do nothing else, do these three things this week.
- Write one rules file. An AGENTS.md (or CLAUDE.md for Claude Code) with your conventions and the mistakes your agent keeps making. This alone removes a lot of repetition.
- Make two or three skills out of the prompts you paste most often. Installing a public skill library is a fine head start.
- Set up one nightly loop. The documentation sweep is the gentlest place to begin, and it keeps paying off quietly.
The shift that matters is the mental one. Stop thinking of yourself as the person writing the prompts and start thinking of yourself as the person designing the system that runs the prompts. Vibe coding gets you in the door. This is what the next few rooms look like.
Sources
The official documentation behind the claims above.
- Cursor: Automations and Cursor: Worktrees and Cursor: Rules
- OpenAI Codex: AGENTS.md and Codex: Automations
- Claude Code: Memory and CLAUDE.md and Claude Code: Routines
- Anthropic: Effort control for Claude Opus 4.8
- Git: git-worktree documentation
FAQ
What is the difference between vibe coding and the expert AI coding setup?
Vibe coding is the entry level. You prompt the agent, wait, review the result, and prompt again, by hand, every time. The expert setup wraps agents in a system so the repetitive parts run on their own. It uses a rules file to teach the agent your conventions once, skills to package anything you do twice, automations to trigger agents on events, loops to run an agent until a goal is met, cloud agents and worktrees for safe parallelism, and a different model for each step.
Do Cursor, Codex, and Claude Code all use AGENTS.md?
Cursor and OpenAI Codex both read AGENTS.md. Claude Code does not read it natively. It reads its own CLAUDE.md, and the documented way to reuse an AGENTS.md is to import it from CLAUDE.md or symlink the two. One more thing to know: in Cursor, the structured Rules feature writes .mdc files to a .cursor/rules folder. AGENTS.md is the separate plain-markdown alternative, not the same thing as Rules.
Which model should I use for planning versus writing code?
Use a strong reasoning model for the plan and a fast model for the build. For planning, Claude Opus 4.8 set to max effort is a good choice, since the planning pass runs once and is where quality matters most. For writing the code from that plan, a fast coding model such as Cursor Composer 2.5 or GPT-5.5 keeps the loop quick and cheaper. A third model can review the result for a fresh perspective.
What is an agentic coding loop?
A loop is an agent process with three parts: a trigger to start it, an action it repeats, and a goal that stops it. The agent takes an action, checks the result, and repeats until the goal is met. Common examples are a nightly documentation sweep, a loop that optimizes every page until it loads under a target time, and a nightly production-error sweep that reads logs, writes a fix, and opens a pull request.