OpenAI Codex 不再只写代码:它现在能控制你的 Mac、锁屏后继续干活、还能自主运行好几天

OpenAI Codex Is No Longer Just a Code Writer: It Now Controls Your Mac, Works While Locked, and Runs Autonomously for Days

openaicodexai-agentdesktop-agentagentic-codinggpt-5.5

> 📌 TL;DR
> OpenAI 在 2026 年春天用 6 周时间,把 Codex 从一个云端代码沙盒变成了能控制你 Mac 桌面的全能 Agent——它能操作应用、锁屏后继续工作、在手机上远程监控、甚至连续自主运行数天。400 万开发者每周在用,AI 写代码这件事正式进入「桌面接管」时代。

六周三次跃迁:Codex 的疯狂进化

2025 年 5 月,Codex 以 codex-1(基于 o3 微调)的身份回归,定位是云端软件工程 Agent。一年后的 2026 年春天,OpenAI 在六周内连发三次重大更新,彻底改写了 Codex 的定义:

| 日期 | 更新内容 | 意义 |
|------|---------|------|
| 4 月 16 日 | Computer Use + 图像生成 + 记忆系统 | Codex 第一次「看见」你的屏幕 |
| 4 月 20 日 | 多 Agent 并行 + Appshots | 同时跑多个 Agent,互不干扰 |
| 5 月 14 日 | 移动端预览 + 全计划开放 | 手机变成远程遥控器,免费用户也能用 |
| 5 月 21 日 | Locked Use + Goal 模式正式版 | 锁屏后继续工作,连续运行数天 |

这不是渐进式改进,这是品类重定义。Codex 不再是「帮你写代码的工具」,而是「住在你电脑里的数字员工」。

Computer Use:不截图,读 AX Tree

Codex 的桌面控制和竞品最大的不同在于实现方式。它没有走 Anthropic Claude 那种「截屏 → 识别坐标 → 点击」的路线,而是直接读取 macOS 的无障碍层级(Accessibility Tree / AX Tree)——这跟 VoiceOver 读屏幕用的是同一套系统 API。

MacStories 的 Federico Viticci 评价说:「这是我测试过的所有 LLM 和桌面 Agent 中,最好的 Computer Use 实现。」

实际体验中,AX Tree 方案的优势很明显:
- 精度高:直接定位 UI 元素,不靠像素猜测
- 速度快:不需要反复截图和视觉识别
- 稳定性好:不会因为分辨率、主题、深色模式等视觉变化而出错

多个 Agent 可以在后台并行工作,各自操作不同的应用窗口,完全不影响你自己的鼠标键盘操作。

Locked Use:你锁屏了,它还在干活

5 月 21 日推出的 Locked Use 是最让人「又兴奋又紧张」的功能。

安装 Computer Use 插件并授予屏幕录制和辅助功能权限后,Codex 可以在你合上笔记本盖子或锁定屏幕后继续操作。它通过一个 Apple 授权插件临时解锁 Mac,带有严格的时间和行为约束:

能做的:在你允许的应用中点击、输入、导航菜单、操作剪贴板。
不能做的:操作终端应用、操作 Codex 自身(防止自我提权)、管理员认证、修改安全和隐私权限。

这个功能目前在欧洲经济区、英国和瑞士不可用(【2026-05-21】),大概率是因为 GDPR 和数据保护法规的限制。

Goal 模式:从「回答问题」到「追求目标」

如果说 Computer Use 是 Codex 的「手」,那 Goal 模式就是它的「意志」。

2026 年 4 月底,Codex CLI 0.128.0 引入了 /goal 命令。和普通的「一问一答」不同,Goal 模式让 Codex 进入一个持续循环:规划 → 执行 → 测试 → 审查 → 迭代,直到目标完成或 token 预算耗尽。

有开发者用 Goal 模式跑了一个设备驱动项目,连续运行 14 小时没有停下。另一位开发者用它在一个多小时内从零构建了一个完整的射击游戏。

5 月 21 日,Goal 模式正式脱离实验阶段,在 Codex App、IDE 扩展和 CLI 中全面可用。关键设计细节:

- 持久化目标:目标状态可以跨会话保存,中断后能恢复
- 软停机:token 预算耗尽时不会直接中断,而是注入「收尾引导」让 Agent 优雅结束
- 审计逻辑:模型端有内置的完成度判断,而不是简单的循环计数

OpenAI 联合创始人 Greg Brockman 在 X 上总结:「Codex 现在内置了 Ralph Loop++。」(Ralph Loop 是社区发明的一种让 AI Agent 持续运行的循环模式。)

GPT-5.5:不只是更聪明,还更省

Codex 的桌面 Agent 进化背后,是 GPT-5.5(2026 年 4 月 23 日发布)的底层支撑。

几个关键数据:
- Terminal-Bench 2.0 得分 82.7%——这是衡量 Agent 在真实终端环境中完成复杂任务的基准测试
- 1M token 上下文窗口(API 层面),Codex 应用层面 400K
- 同等任务消耗 token 减少约 40%——更聪明的同时更便宜
- 延迟与 GPT-5.4 持平——没有用速度换智能

值得注意的是,从 GPT-5.4 开始,OpenAI 不再维护独立的 Codex 编码模型。GPT-5.3 是最后一个专门的 Codex 模型(2026 年 2 月发布)。这意味着 OpenAI 认为通用模型的编码能力已经足够强,不需要单独的编码专家了。

API 定价:标准层 $5/百万输入 token,$30/百万输出 token(【2026-04-23】)。

400 万开发者和 Airbnb 的 60%

规模数据说明一切:
- 每周 400 万活跃开发者在使用 Codex(OpenAI 官方数据)
- NVIDIA 内部超过 10,000 名员工已经在用 GPT-5.5 驱动的 Codex
- Airbnb 透露,AI 现在编写了其 60% 的新代码

这不再是早期采用者的玩具。当一家估值超千亿美元的公司(Airbnb)把 60% 的代码交给 AI 写,当全球最大的 GPU 公司(NVIDIA)内部万人级别使用同一个工具——我们讨论的已经不是「AI 能不能写代码」,而是「你还在自己写多少代码」。

安全:Chronicle 的双刃剑

Codex 的记忆系统 Chronicle 让它能捕获你的屏幕内容构建「环境记忆」,理解你在做什么、上下文是什么。这显著提升了 Agent 的决策质量,但也引入了一个 OpenAI 自己承认的风险:

提示注入攻击面扩大。如果你浏览了一个包含恶意指令的网页,这些指令可能被写入 Chronicle 记忆库,之后被 Codex 执行。

这不是理论风险。2026 年 5 月的安全研究已经证明,AI Agent 框架中的提示注入可以升级为远程代码执行(Microsoft 安全博客在 5 月 7 日详细记录了 Semantic Kernel 中的两个相关 CVE)。

对于个人开发者,建议:
1. 定期审查 Chronicle 记忆内容
2. 在处理敏感项目时关闭 Computer Use
3. 对 Goal 模式的长时间运行设置合理的 token 预算上限
4. 不要授予 Codex 超出必要的应用权限

竞争格局:桌面 Agent 的三国演义

Codex 不是唯一的桌面 Agent,但目前它的生态最完整:

| 能力 | OpenAI Codex | Anthropic Claude Code | Google Gemini Code |
|------|-------------|----------------------|-------------------|
| 桌面控制 | ✅ AX Tree(高精度) | ✅ 截图+坐标 | ❌ 暂无 |
| 锁屏工作 | ✅ Locked Use | ❌ | ❌ |
| 移动端监控 | ✅ iOS + Android | ❌ | ❌ |
| Goal 持续运行 | ✅ 正式版 | ✅ 类似功能 | ❌ |
| 免费用户可用 | ✅ | ❌ | ✅ 部分 |

Claude Code 在代码质量和推理深度上有自己的优势(特别是 Opus 4 系列),但在桌面集成方面,Codex 目前领先一个身位。Google 的 Gemini Code Assist 则更聚焦在 IDE 内集成,暂时没有进入桌面 Agent 赛道的迹象。

这意味着什么

六周前,Codex 是一个你打开浏览器、输入指令、等结果的工具。现在它是一个住在你系统里的 Agent——有眼睛(屏幕感知)、有手(桌面控制)、有意志(Goal 模式)、有记忆(Chronicle)、有分身(多 Agent 并行),甚至能在你不在的时候自己干活。

对开发者来说,这改变的不只是效率,而是工作模式。你不再是「写代码的人 + 用 AI 辅助」,而是「定义目标的人 + 监督 Agent 执行」。

如果 2025 年的关键词是「AI 辅助编程」,那 2026 年的关键词已经变成了「AI 自主工程」。

> ✨ 金句
> Codex 用六周时间完成了一个品类跃迁:从「帮你写代码」到「替你操电脑」。当 400 万开发者每周都在把工作交给一个能锁屏后继续干活的 Agent,软件开发的定义本身正在被重写。


> 📌 TL;DR
> In just six weeks this spring, OpenAI transformed Codex from a cloud code sandbox into a full desktop agent that controls your Mac apps, works while your screen is locked, streams to your phone, and runs autonomously for days. With 4 million weekly developers and GPT-5.5 under the hood, AI coding has entered the "desktop takeover" era.

Six Weeks, Three Leaps: Codex's Rapid Evolution

In May 2025, Codex returned as codex-1 (fine-tuned from o3), positioned as a cloud-based software engineering agent. One year later, in spring 2026, OpenAI shipped three major updates in six weeks that completely redefined what Codex is:

| Date | Update | Significance |
|------|--------|-------------|
| April 16 | Computer Use + Image Generation + Memory | Codex "sees" your screen for the first time |
| April 20 | Multi-agent parallel + Appshots | Run multiple agents simultaneously, no interference |
| May 14 | Mobile preview + All plans access | Phone becomes remote control; free users included |
| May 21 | Locked Use + Goal mode GA | Works after screen lock; runs for days |

This isn't incremental improvement — it's category redefinition. Codex is no longer "a tool that helps you write code." It's "a digital worker living inside your computer."

Computer Use: Reading the AX Tree, Not Screenshots

What sets Codex's desktop control apart from competitors is its implementation. Instead of the screenshot-then-click approach used by Anthropic's Claude, Codex reads macOS's Accessibility Tree (AX Tree) directly — the same system API that VoiceOver uses for screen reading.

Federico Viticci of MacStories called it "the best computer use feature I have ever tested in any LLM or desktop agent."

The AX Tree approach has clear advantages in practice:
- High precision: Direct UI element targeting, no pixel guessing
- Fast: No repeated screenshot capture and visual recognition needed
- Stable: Unaffected by resolution, theme, or dark mode changes

Multiple agents can work in the background simultaneously, each operating different app windows without interfering with your own mouse and keyboard input.

Locked Use: It Keeps Working After You Walk Away

The most exciting — and nerve-wracking — feature dropped on May 21: Locked Use.

After installing the Computer Use plugin and granting Screen Recording and Accessibility permissions, Codex can continue operating after you close your laptop lid or lock the screen. It uses an Apple authorization plugin to temporarily unlock the Mac with strict temporal and behavioral safeguards:

Can do: Click, type, navigate menus, and operate the clipboard in apps you explicitly allow.
Cannot do: Operate terminal apps, operate Codex itself (preventing self-escalation), authenticate as admin, or modify security/privacy permissions.

This feature is currently unavailable in the European Economic Area, UK, and Switzerland (as of May 21, 2026), likely due to GDPR and data protection constraints.

Goal Mode: From "Answer Questions" to "Pursue Objectives"

If Computer Use gives Codex hands, Goal mode gives it willpower.

In late April 2026, Codex CLI 0.128.0 introduced the /goal command. Unlike standard prompt-response interactions, Goal mode puts Codex into a continuous loop: plan → execute → test → review → iterate, running until the objective is met or the token budget is exhausted.

One developer ran a device driver project in Goal mode for 14 hours straight without stopping. Another built a complete extraction shooter game from scratch in just over an hour.

On May 21, Goal mode graduated from experimental to generally available across the Codex App, IDE extension, and CLI. Key design details:

- Persistent goals: Goal state survives across sessions and can resume after interruptions
- Soft stops: When token budget runs out, the system injects wrap-up steering for graceful conclusion rather than hard-cutting
- Audit logic: Model-side completion assessment, not simple loop counting

OpenAI co-founder Greg Brockman summarized on X: "Codex now has a built-in Ralph Loop++." (The Ralph Loop is a community-invented pattern for keeping AI agents running persistently.)

GPT-5.5: Smarter and Cheaper

Behind Codex's desktop agent evolution is GPT-5.5 (released April 23, 2026).

Key numbers:
- 82.7% on Terminal-Bench 2.0 — measuring agent performance on complex tasks in real terminal environments
- 1M token context window at the API level (400K in the Codex app)
- ~40% fewer tokens for equivalent tasks — smarter and more cost-efficient
- Latency matching GPT-5.4 — no speed-for-intelligence tradeoff

Notably, starting with GPT-5.4, OpenAI discontinued the standalone Codex coding model. GPT-5.3 (released February 2026) was the last dedicated Codex model. This signals that OpenAI believes general-purpose models are now strong enough at coding without needing a specialist.

API pricing: Standard tier $5/million input tokens, $30/million output tokens (as of April 23, 2026).

4 Million Developers and Airbnb's 60%

The scale numbers speak for themselves:
- 4 million weekly active developers using Codex (OpenAI's official figure)
- Over 10,000 NVIDIA employees already using GPT-5.5-powered Codex internally
- Airbnb revealed that AI now writes 60% of its new code

This is no longer an early-adopter toy. When a company valued at over $100 billion (Airbnb) hands 60% of its code to AI, and the world's largest GPU company (NVIDIA) has 10,000+ employees on the same tool — the question isn't "can AI write code" anymore. It's "how much code are you still writing yourself?"

Security: Chronicle's Double-Edged Sword

Codex's memory system, Chronicle, captures screen content to build "ambient memory" — understanding what you're doing and the context around it. This significantly improves agent decision quality but introduces a risk that OpenAI itself acknowledges:

Expanded prompt injection attack surface. If you browse a webpage containing malicious instructions, those instructions could be written into Chronicle's memory store and later executed by Codex.

This isn't theoretical. Security research in May 2026 has demonstrated that prompt injection in AI agent frameworks can escalate to remote code execution (Microsoft's security blog documented two related CVEs in Semantic Kernel on May 7).

Recommendations for individual developers:
1. Regularly audit Chronicle memory contents
2. Disable Computer Use when working on sensitive projects
3. Set reasonable token budget caps for long Goal mode runs
4. Don't grant Codex more app permissions than necessary

Competitive Landscape: Three-Way Battle for Desktop Agents

Codex isn't the only desktop agent, but its ecosystem is currently the most complete:

| Capability | OpenAI Codex | Anthropic Claude Code | Google Gemini Code |
|-----------|-------------|----------------------|-------------------|
| Desktop control | ✅ AX Tree (high precision) | ✅ Screenshots + coordinates | ❌ Not yet |
| Locked screen work | ✅ Locked Use | ❌ | ❌ |
| Mobile monitoring | ✅ iOS + Android | ❌ | ❌ |
| Persistent Goal runs | ✅ GA | ✅ Similar feature | ❌ |
| Free tier access | ✅ | ❌ | ✅ Partial |

Claude Code has its own strengths in code quality and reasoning depth (especially with the Opus 4 series), but in desktop integration, Codex currently leads by a full generation. Google's Gemini Code Assist remains more focused on in-IDE integration, with no signs of entering the desktop agent space yet.

What This Means

Six weeks ago, Codex was a tool you opened in your browser, typed commands into, and waited for results. Now it's an agent living in your system — with eyes (screen perception), hands (desktop control), willpower (Goal mode), memory (Chronicle), clones (multi-agent parallel), and the ability to work while you're away.

For developers, this changes not just efficiency, but the work model itself. You're no longer "the person writing code + using AI to assist." You're "the person defining objectives + supervising agent execution."

If the keyword of 2025 was "AI-assisted programming," the keyword of 2026 has already become "AI autonomous engineering."

> ✨ Key Takeaway
> Codex completed a category leap in six weeks: from "helping you write code" to "operating your computer for you." When 4 million developers weekly are delegating work to an agent that keeps working after you lock your screen, the very definition of software development is being rewritten.