跪拜 Guibai

The Complete Guide to AI Agents: From Context Engineering to Harness Design

By oil欧哟
Read original on juejin.cn
Why it matters

For Western developers, this guide represents a mature, battle-tested perspective on Agent-driven development from one of the world's most competitive tech ecosystems. It moves past hype to provide a systematic engineering framework—Context Engineering and Harness Design—that is directly applicable to any AI coding tool. The insights on model selection, cost management, and building a reliable Agent environment are critical for anyone moving from casual AI use to production-level, autonomous development.

Summary

This guide, born from over a year of hands-on experience, systematically breaks down how to move beyond simple AI chat and truly leverage Agent products like Codex, Claude Code, and Cursor. It starts by clarifying the fundamental difference between an AI chat product and an Agent—the latter being an autonomous system that can plan, use tools, and execute tasks on your computer.

The guide dives deep into practical engineering: how to choose the right model (GPT-5.5 vs. Claude Opus 4.7 vs. Gemini), manage context to prevent 'context rot,' and structure conversations for maximum efficiency. It covers Codex's specific features like Plan mode, Goal mode, sub-agents, and the /side command for parallel work.

A major focus is on building a robust 'Harness'—the engineering environment that enables an Agent to work reliably over long periods. This includes writing effective AGENTS.md files, creating custom Skills to encode workflows and domain knowledge, and using automation tools like Hooks and scheduled tasks. The guide concludes with practical advice on cost optimization, UI design workflows using AI image generation, and the philosophy of Vibe Coding.

Key takeaways
Agent products (Codex, Claude Code) differ from AI chat products (ChatGPT, Doubao) in that they plan autonomously and call tools, which lets them directly control the file system, terminal, and browser.
Context is a limited attention budget; effective context engineering means prioritizing information density over volume to prevent 'context rot' and maintain execution quality.
Monthly subscriptions (e.g., ChatGPT Pro) are significantly more cost-effective than API pay-as-you-go for heavy daily Agent use.
Not all tasks require the strongest model; using cheaper models (e.g., Gemini for design, GLM for simple fixes) for routine execution saves costs without sacrificing quality.
Skills are standardized, progressively loaded operation guides that encode domain knowledge and workflows, making Agents more reliable and specialized.
A 'Harness' is the systematic engineering environment (AGENTS.md, Hooks, MCP connections) designed to let an Agent work autonomously and reliably on long-running tasks.
Using sub-agents with independent contexts isolates exploration work, keeping the main Agent's context lean and focused.
Tools like RTK (Rust Token Killer) can reduce Token consumption by 60-80% by compressing noisy tool outputs before they enter the Agent's context.
Plan mode allows an Agent to analyze and propose a strategy before executing, preventing costly mistakes on large tasks.
The /goal command lets an Agent pursue a complex objective autonomously, verifying its own progress until the objective is complete or it hits a blocker.
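The takeaway about compressing noisy tool output before it reaches the Agent's context can be illustrated with a small filter. The guide does not describe how RTK works internally, so this is not RTK's algorithm. It is a minimal sketch of the general idea, and the function name `compress_tool_output` and its `head`/`tail` parameters are hypothetical.

```python
def compress_tool_output(text: str, head: int = 3, tail: int = 3) -> str:
    """Shrink tool output: collapse repeated lines, then keep only the ends.

    Hypothetical illustration of output compression, not RTK's real logic.
    """
    # Group consecutive identical lines into [line, count] runs.
    runs: list[list] = []
    for line in text.splitlines():
        if runs and runs[-1][0] == line:
            runs[-1][1] += 1
        else:
            runs.append([line, 1])

    lines = [
        line if count == 1 else f"{line} [repeated {count}x]"
        for line, count in runs
    ]

    # Short outputs pass through; long ones keep only the head and the tail.
    if len(lines) <= head + tail:
        return "\n".join(lines)
    omitted = len(lines) - head - tail
    kept_tail = lines[len(lines) - tail:]
    return "\n".join(lines[:head] + [f"[{omitted} lines omitted]"] + kept_tail)
```

A filter like this would sit between the tool call and the context, so the Agent sees the start of a log, the end of it, and a count of what was dropped.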
Our take

The most significant shift for developers is moving from 'Prompt Engineering' to 'Context Engineering' and finally to 'Harness Engineering'—designing the entire environment for autonomous agent work, not just crafting better inputs.

The real bottleneck in Agent effectiveness is often the developer's own clarity of thought, not the model's capability. A clear, concise goal is more powerful than a long, prescriptive prompt.

The distinction between 'Coding Agents' and 'General Agents' is artificial and temporary; any Agent with file system and command execution access is inherently a general-purpose tool.

Automatic memory features in current Agents are often counterproductive, filling context with irrelevant information. Manual, file-based memory management is more reliable and controllable.

The most valuable Skills are not those that teach an AI what it already knows, but those that encode human know-how—the edge cases, the company-specific conventions, and the hard-won lessons from past failures.

Cost optimization in Agent development is counter-intuitive: using a cheap model that fails repeatedly and burns through Tokens is often more expensive than using a top-tier model that gets it right the first time.
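The arithmetic behind this point can be made concrete. The sketch below assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 divided by the success rate. All prices, token counts, and success rates are invented for illustration and do not come from the guide.

```python
def expected_cost(tokens_per_attempt: int,
                  price_per_million: float,
                  success_rate: float) -> float:
    """Expected spend until the first success, in the same unit as the price.

    Assumes independent attempts with a constant success rate (a simplification).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt / 1_000_000 * price_per_million
    return cost_per_attempt / success_rate


# Hypothetical numbers: a cheap model that rarely succeeds versus
# a pricier model that usually gets it right on the first try.
cheap = expected_cost(200_000, price_per_million=1.0, success_rate=0.1)
strong = expected_cost(200_000, price_per_million=8.0, success_rate=0.9)
print(f"cheap model:  {cheap:.2f}")
print(f"strong model: {strong:.2f}")
```

Under these made-up inputs the cheap model's expected cost comes out higher than the strong model's, even though its per-token price is one eighth. The model also ignores the developer's time spent reviewing failed attempts, which would widen the gap.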

The true power of AI in UI design lies not only in generating code but in a two-step workflow: first generate a high-quality design image, then have the Agent reproduce it in code. This plays to the model's stronger visual sense.

Harness design is an iterative process; every rule or constraint encodes an assumption about the model's current limitations, which must be re-evaluated as models improve.

Describing the observed behavior or desired outcome is far more effective than prescribing the code change, as it allows the Agent to leverage its own problem-solving capabilities.

The /side command is a critical but underutilized tool for maintaining context hygiene, allowing developers to ask temporary questions without polluting the main task's context.

Concepts & terms
ReAct Loop
The core operating cycle of an AI Agent: Think (analyze the goal and decide the next action), Act (call a tool to perform the action), Observe (read the result of the tool call), and repeat. This loop allows the Agent to autonomously work towards a goal.
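The Think, Act, Observe cycle described above can be sketched in a few lines. This is a toy under stated assumptions, not how Codex or Claude Code is implemented: the `decide` callable stands in for the model, and the names `react_loop`, `scripted_policy`, and the `lookup` tool are hypothetical.

```python
from typing import Callable

History = list[tuple[str, str, str]]  # (tool name, argument, observation)


def react_loop(decide: Callable[[str, History], tuple],
               tools: dict[str, Callable[[str], str]],
               goal: str,
               max_steps: int = 10) -> str:
    """Run Think -> Act -> Observe until the policy finishes or steps run out."""
    history: History = []
    for _ in range(max_steps):
        decision = decide(goal, history)                 # Think
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, arg = decision
        observation = tools[tool_name](arg)              # Act
        history.append((tool_name, arg, observation))    # Observe
    raise RuntimeError("step budget exhausted before the goal was reached")


# A scripted stand-in for the model: look something up once, then answer.
def scripted_policy(goal: str, history: History) -> tuple:
    if not history:
        return ("act", "lookup", "version")
    return ("finish", f"version is {history[-1][2]}")


FAKE_DATA = {"version": "1.2.3"}
tools = {"lookup": lambda key: FAKE_DATA[key]}
answer = react_loop(scripted_policy, tools, "find the version")
```

The `max_steps` cap is a design choice in this sketch: without some budget, a policy that never returns `"finish"` would loop forever.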
Context Engineering
The practice of actively managing the information (context) fed to a large language model to optimize its performance. It focuses on information density, relevance, and preventing 'context rot' where too much noise degrades the model's accuracy.
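One way to picture "information density over volume" is a greedy packer that fills a fixed token budget with the snippets offering the most relevance per token. This is a sketch under assumptions: the `relevance` score is hypothetical (in practice it might come from a retriever or a human), and token counts are approximated by whitespace-separated words rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # hypothetical score; higher means more useful


def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption)."""
    return max(1, len(text.split()))


def pack_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily pick snippets by relevance per token until the budget is spent."""
    ranked = sorted(
        snippets,
        key=lambda s: s.relevance / approx_tokens(s.text),
        reverse=True,
    )
    chosen: list[Snippet] = []
    used = 0
    for snippet in ranked:
        cost = approx_tokens(snippet.text)
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen
```

With this rule a long, moderately relevant document can lose to two short, focused notes, which is the trade the definition describes.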
Harness Engineering
The systematic design of the entire engineering environment (project files, hooks, tools, MCP connections) to enable an AI Agent to work autonomously, reliably, and consistently over long periods and across multiple tasks.
Progressive Disclosure (Skills)
A design pattern for Agent Skills where only the Skill's name and description are loaded into the context by default. The full body and resources are only loaded when the Agent determines the Skill is relevant to the current task, saving context space.
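The loading behavior described above can be sketched as a lazy registry: the index that always sits in context contains only names and descriptions, and a Skill's body is fetched the first time it is needed. The class and function names here are hypothetical, and real Agent products store Skills on disk rather than behind a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Skill:
    name: str
    description: str
    load_body: Callable[[], str]          # e.g. reads the Skill file from disk
    _body: Optional[str] = field(default=None, repr=False)

    def body(self) -> str:
        """Load the full instructions on first use, then reuse the cached copy."""
        if self._body is None:
            self._body = self.load_body()
        return self._body


def index_prompt(skills: list[Skill]) -> str:
    """The cheap, always-present listing: names and descriptions only."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)
```

Only `index_prompt` output is paid for on every turn in this sketch. The cost of a body is incurred once, and only for Skills the Agent actually selects.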
Vibe Coding
A development style where the programmer relies entirely on an AI Agent to write code. The developer describes the desired outcome or behavior, and the Agent autonomously plans, writes, tests, and debugs the code without the developer manually editing files.
Context Rot
The phenomenon where a large language model's accuracy and recall ability degrade as the number of tokens in its context window increases, even before the window is full. The model's attention becomes diluted by the sheer volume of information.
MCP (Model Context Protocol)
An open protocol that standardizes how applications provide context and tools to Large Language Models (LLMs). It allows Agents like Codex to connect to external services (e.g., databases, GitHub) to read data and perform actions.
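MCP messages are JSON-RPC 2.0, and a tool invocation is sent with the `tools/call` method. The sketch below only builds the request payload. The tool name `query_database` and its arguments are hypothetical, and a real client would also handle initialization, transport, and the response.

```python
import json
from typing import Any


def tool_call_request(request_id: int, tool_name: str,
                      arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
payload = tool_call_request(1, "query_database", {"sql": "SELECT 1"})
```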
Worktree
A Git feature that allows you to check out multiple branches of the same repository into separate working directories simultaneously. In Agent development, it's used to isolate parallel tasks so they don't interfere with each other.
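As a rough illustration of isolating parallel Agent tasks, the helper below builds a `git worktree add` command that creates a new branch in a sibling directory. The directory layout and the `agent/<task>` branch naming are assumptions made for this sketch, not conventions from the guide.

```python
from pathlib import Path


def worktree_add_command(repo: str, task: str, base: str = "HEAD") -> list[str]:
    """Build the argv for a per-task worktree next to the main checkout."""
    repo_path = Path(repo)
    # Hypothetical layout: /work/app -> /work/app-<task>
    worktree_path = repo_path.parent / f"{repo_path.name}-{task}"
    return [
        "git", "-C", str(repo_path),
        "worktree", "add",
        "-b", f"agent/{task}",   # new branch for this task (assumed naming)
        str(worktree_path),
        base,
    ]


cmd = worktree_add_command("/work/app", "fix-login")
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would create the worktree. Each Agent session can then be pointed at its own directory, so edits for one task never touch the files of another.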
TTFT (Time to First Token)
A performance metric for LLMs that measures the delay between sending a prompt and receiving the first token of the response. A lower TTFT means the model starts responding faster, which is crucial for interactive tasks.
TPS (Tokens Per Second)
A performance metric for LLMs that measures the speed at which the model generates output tokens. A higher TPS means faster code generation and text output, improving the perceived speed of the Agent.
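Both metrics can be computed from the arrival timestamps of a streamed response. The sketch assumes one common convention, in which TPS counts only the tokens after the first one and divides by the time spent generating them. Other tools may include the first token or the TTFT in the denominator, so numbers are not always comparable across sources.

```python
from typing import Optional


def ttft_and_tps(sent_at: float,
                 token_times: list[float]) -> tuple[float, Optional[float]]:
    """Return (TTFT in seconds, tokens per second during generation).

    token_times holds the arrival time of each output token, in seconds.
    TPS is None when fewer than two tokens arrived or no time elapsed.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - sent_at
    generation_time = token_times[-1] - token_times[0]
    if generation_time <= 0:
        return ttft, None
    tps = (len(token_times) - 1) / generation_time
    return ttft, tps


# Prompt sent at t=0.0, five tokens arriving every half second.
ttft, tps = ttft_and_tps(0.0, [0.5, 1.0, 1.5, 2.0, 2.5])
```

In this example the TTFT is 0.5 seconds and the generation rate is 2 tokens per second: four tokens follow the first one over a span of 2.0 seconds.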
Source: juejin.cn