Codex experience — SiteBloom

The Codex app is excellent. It allows parallel work far better than the terminal app. I think the terminal tools are a phase that is now over. Codex allows so much parallel work that keeping a mental model of the code becomes even more of a problem.

As the agents are doing more work, part of our work is giving them the environment to succeed. I spent time setting up the web and worktree environments such that the agent has access to all the libraries it needs. I updated my AGENTS.md to ensure the agent ran quality checks of its work (linting, testing, interacting with the UI). I specified that it should write code such that it can easily test UIs with Playwright selectors.
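The quality checks described above can be sketched as a small script that the agent is told to run before it calls a task done. This is a minimal sketch, not my actual setup: the `run_checks` helper and the commands in `DEFAULT_CHECKS` are hypothetical placeholders, and the lint and test commands assume the project defines matching npm scripts.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


# Hypothetical commands; swap in whatever the project actually uses.
DEFAULT_CHECKS = {
    "lint": ["npm", "run", "lint"],
    "unit tests": ["npm", "test"],
    "ui tests": ["npx", "playwright", "test"],
}


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each command in order and record whether it exited with status 0."""
    results = []
    for name, command in checks.items():
        try:
            proc = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError as exc:
            # The tool is not installed in this environment: count it as a failure.
            results.append(CheckResult(name, False, str(exc)))
            continue
        results.append(
            CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        )
    return results


def main() -> int:
    """Print a pass/fail summary and return a shell-style exit code."""
    results = run_checks(DEFAULT_CHECKS)
    for result in results:
        print(f"{'PASS' if result.passed else 'FAIL'}  {result.name}")
        if not result.passed:
            # Show the output so the agent can see what to fix.
            print(result.output)
    return 0 if all(result.passed for result in results) else 1
```

Calling `sys.exit(main())` at the bottom of the file turns this into a single command that AGENTS.md can point to, and the non-zero exit code gives the agent an unambiguous signal that its work is not finished.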

As others have said, the agent having a verification loop makes a big difference. And once custom containers are released, the quality testing loop will get even better. This setup is measly compared to what can be done. I don’t yet have a software factory with digital twins of every service for testing scenarios. But I don’t have that budget either.

Containers coming soon

I prefer agents running in the cloud, letting them run riot without having to check permissions or risk my machine. People have set up their own VMs with Tailscale to access them. One of the Tailscale founders just launched exe.dev, partly around having a safe place for agents to work. I like the look of it for making quick agent projects easily deployable but I haven’t used it.

I haven’t tried it because, as with the AI orchestration tools, I don’t quite see the value for the time or price, given that $20 a month to OpenAI gets me almost everything I need.

The Codex CLI is open source. They’ve released App Server so that you can build tools on top of Codex. That may result in better product experiences than OpenAI’s own tools. Unfortunately, App Server can only hook into local Codex threads, not cloud ones.

I think agents in cloud containers are the future, but I had some annoyances with Codex in the present:

Setting up local, worktree, and web containers separately is annoying. They’re all slightly different, and I need to read three versions of the documentation. It should be one container that works everywhere.
The containers are slow to spin up. But given that OpenAI pays and I don’t, that’s fair.
Bugs like the web agent not finding skills in the folder the docs tell you to put them in.
Setting up the UI verification was finicky.
Getting the agent to provide images of the UI it has made would be great. This seemed to only happen randomly for me. I didn’t want to have to deploy to a preview URL each time.

exe.dev would solve all of those for just $20 a month. But I’ll wait a little longer before tool hopping, since Codex was only released a few days ago.