According to Anthropic's research on context engineering, what really matters in 2025 is not "prompt engineering" but "context engineering". The question is no longer "how do I craft the perfect prompt" but "which combination of context produces the desired behavior".
I'll walk you through what current research (Anthropic, OpenAI, Google, Wharton) says about prompting AI agents, and how to apply it concretely in n8n workflows.
What you'll learn:
A pattern I see constantly: someone finds the "perfect" prompt template on Reddit, pastes it into the AI Agent Node, and expects it to work like magic.
Spoiler: it won't.
Why copied templates fail:
Anthropic's prompt engineering guide stresses finding the "right altitude": specific enough to guide, yet leaving room for reasoning. A template is almost always at the wrong altitude for your particular use case.
The second problem: overly complex prompts
The "more is better" mindset creates serious problems:
The solution: generate the prompt together with the model
The real game changer: let the model write the prompt for you.
Instead of polishing a prompt for hours, give the model:
The model generates a prompt optimized for your use case. You test it, iterate with the model, and refine.
Why this works: the model knows its own "preferences" best. It knows which phrasing, structure, and examples work most effectively.
I'll show exactly how later.
The most fundamental and most common mistake in n8n AI Agent prompting: confusing the System Message with the User Prompt.
The AI Agent Node has two distinct prompt areas:
System Message (Options → System Message):
User Prompt (the main input):
Why it matters: token economics and prompt caching
Both are sent with every API call, but separating them correctly is critical for both cost and performance:
Wrong (cramming everything into the User Prompt):
"You are Senior Support Engineer. Tools: search_docs, create_ticket.
Use search_docs first. Max 150 words. Friendly.
User question: {{$json.message}}"
At 1,000 requests per day and 400 tokens each: 400,000 redundant tokens per day = $1.20/day with Claude Sonnet ($3/M) = $36/month of pure redundant context.
Right:
System Message (defined once):
You are Senior Support Engineer.
TOOLS:
- search_docs(query): Search Product Docs
- create_ticket(title, priority): Create Support Ticket
WORKFLOW:
1. FAQ → search_docs
2. Complex Issue → create_ticket
BEHAVIOR:
- Max 150 words
- When uncertain: Create ticket, don't guess
The User Prompt is then just: {{$json.message}}
= 50 tokens per request instead of 400 = 350K tokens saved per day ≈ $31.50/month (with Claude Sonnet)
Prompt caching: why the System Message should stay as static as possible
Anthropic and OpenAI offer prompt caching: the System Message is cached and doesn't need to be reprocessed on every call. This can cut latency by 50–80%, and cached tokens cost as little as 10% of the normal price.
But: as soon as you change the System Message, the cache is invalidated. Therefore:
Example of the caching impact:
Without caching:
Request 1: 500-token System Message = $0.0015
Request 2: 500-token System Message = $0.0015
Request 1000: 500-token System Message = $0.0015
Total: $1.50 for 1,000 requests
With caching (System Message kept stable):
Request 1: 500-token System Message = $0.0015 (cache write)
Request 2: 500 tokens, cache hit = $0.00015 (90% cheaper)
Request 1000: 500 tokens, cache hit = $0.00015
Total: ~$0.15 per 1,000 requests = 90% savings
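The caching arithmetic above can be verified with a short sketch. The $3/M input-token price and the 90% discount on cache hits mirror the illustrative figures in the text, not official pricing:

```python
# Sketch of the caching arithmetic above. The $3/M token price and the
# 90% discount on cache hits are the illustrative figures from the text.
PRICE_PER_TOKEN = 3.0 / 1_000_000   # assumed input price per token
CACHE_DISCOUNT = 0.10               # cached tokens billed at ~10%

def system_message_cost(tokens: int, requests: int, cached: bool) -> float:
    """Cost of re-sending a static system message across `requests` calls."""
    if not cached:
        return tokens * PRICE_PER_TOKEN * requests
    # First request writes the cache at full price; the rest are cache hits.
    first = tokens * PRICE_PER_TOKEN
    rest = tokens * PRICE_PER_TOKEN * CACHE_DISCOUNT * (requests - 1)
    return first + rest

uncached = system_message_cost(500, 1000, cached=False)  # ≈ $1.50
cached = system_message_cost(500, 1000, cached=True)     # ≈ $0.15
```

The cached total lands at roughly one tenth of the uncached total, matching the ~90% savings claimed above.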
Dynamic System Messages: powerful but handle with care
You can make the System Message dynamic with n8n Expressions, but mind the cache:
You are Support Engineer for {{$('Get Config').item.json.company_name}}.
PRODUCT: {{$('Get Config').item.json.product_description}}
TONE: {{$('Get Config').item.json.support_tone}}
When this makes sense: multi-tenant systems, where one workflow serves multiple customer configurations.
Workflow: Webhook (Customer ID) → DB Lookup → AI Agent (dynamic System Message) → Response
The caching trade-off: a dynamic System Message breaks the cache, so use it only when you actually need it.
Research from Anthropic, OpenAI, and Google in 2024–2025 shows that some fundamental techniques work across all models. These five matter most:
Anthropic's prompt engineering guide calls this finding the "right altitude": specific enough to provide guidance while keeping flexibility for reasoning.
The "colleague test": if a smart colleague couldn't follow the instruction, neither will the AI.
Bad example:
Classify emails intelligently and accurately.
What does "intelligently" mean? Which categories exist? What's the output format?
Good example:
Classify emails into: sales, support, billing, general
URGENCY CRITERIA:
- high: contains "urgent", "asap", "immediately", "broken"
- medium: time-related request without extremity
- low: everything else
OUTPUT: JSON
{
"category":"support",
"urgency":"high",
"confidence":0.92
}
Why this works:
Bsharat et al. (2024) found that positive instructions clearly outperform negative ones. Rewriting "don't do X" as "do Y" yielded an average quality improvement of 57%.
Why negative instructions fail:
Negative examples:
Don't be too wordy.
Don't use technical jargon.
Don't make assumptions about customer intent.
Positive rewrite:
Keep responses under 150 words.
Use plain language that a non-technical customer understands.
When customer intent is unclear, ask clarifying questions.
Impact in practice:
In a production email-classification agent, the negative instruction ("don't misclassify urgent requests") produced a 31% miss rate. The positive rewrite ("flag every request containing a time constraint as urgent") cut misses to 8%.
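The positive rule above ("flag every request containing a time constraint as urgent") is concrete enough to express as a deterministic pre-filter. The keyword list and time patterns below are illustrative assumptions, not the production agent's actual rules:

```python
import re

# Illustrative pre-filter for the positive rule from the text. The keyword
# list and time-constraint pattern are assumptions for demonstration.
URGENT_KEYWORDS = ("urgent", "asap", "immediately", "broken", "down")
TIME_PATTERN = re.compile(r"\bwithin \d+ (hours?|days?)\b", re.IGNORECASE)

def is_urgent(message: str) -> bool:
    """Flag messages with urgency keywords or an explicit time constraint."""
    text = message.lower()
    return any(kw in text for kw in URGENT_KEYWORDS) or bool(TIME_PATTERN.search(message))
```

A rule like this can run before the LLM call, so clear-cut urgent cases never depend on the model's judgment at all.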
Few-shot examples are highly effective, but most people use them wrong.
The research consensus:
Bad few-shot (too similar):
EXAMPLES:
1. "How do I reset my password?" → category: support, urgency: low
2. "Where is the password reset option?" → category: support, urgency: low
3. "I can't find password settings." → category: support, urgency: low
All the same kind of question. The model learns nothing about handling boundaries.
Good few-shot (diverse, including edge cases):
Example 1 (Standard):
Input: "How do I reset my password?"
Output: {"category": "support", "urgency": "low", "confidence": 0.95}
Example 2 (Urgent):
Input: "URGENT: System down, can't access customer data!"
Output: {"category": "support", "urgency": "high", "confidence": 0.98}
Example 3 (Mixed Intent):
Input: "I want to upgrade my plan but also report a billing error."
Output: {"category": "billing", "urgency": "medium", "confidence": 0.78, "note": "Multiple intents detected"}
Example 4 (Edge Case - Unclear):
Input: "help"
Output: {"category": "general", "urgency": "low", "confidence": 0.45, "action": "request_clarification"}
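One way to keep a few-shot set diverse is to store examples tagged by case type and assemble the prompt block from them, so every prompt version covers standard, urgent, and unclear inputs. The helper and example data below are illustrative:

```python
import json

# Illustrative helper: keep few-shot examples tagged by case type so the
# assembled prompt always spans standard, urgent, and unclear inputs.
EXAMPLES = [
    ("Standard", "How do I reset my password?",
     {"category": "support", "urgency": "low", "confidence": 0.95}),
    ("Urgent", "URGENT: System down, can't access customer data!",
     {"category": "support", "urgency": "high", "confidence": 0.98}),
    ("Edge Case - Unclear", "help",
     {"category": "general", "urgency": "low", "confidence": 0.45}),
]

def build_few_shot_block(examples) -> str:
    """Render tagged examples in the Input/Output format shown above."""
    lines = []
    for i, (label, text, output) in enumerate(examples, start=1):
        lines.append(f"Example {i} ({label}):")
        lines.append(f'Input: "{text}"')
        lines.append(f"Output: {json.dumps(output)}")
    return "\n".join(lines)

block = build_few_shot_block(EXAMPLES)
```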
Why this works:
One of the big problems with AI agents: hallucination. When they can't find an answer, they make one up.
The solution: explicit constraints that keep the agent grounded.
Bad (no constraints):
Answer customer support questions based on our documentation.
The result: when the information isn't there, the agent invents it.
Good (explicit constraints):
Answer customer support questions using ONLY information from the documentation you can access via search_docs tool.
CONSTRAINTS:
- If information is not in docs: "I don't have that information in our current documentation. I'll create a ticket for our team to help you."
- Never make assumptions about features or functionality
- Never provide workarounds that aren't documented
- If multiple solutions exist: Present all documented options
ESCALATION CRITERIA:
- Customer mentions "urgent", "broken", "down" → create ticket immediately
- Question requires account-specific data → create ticket with details
- Documentation is incomplete/contradictory → create ticket noting the issue
Why this works:
Production impact:
In a support agent handling 2,000+ inquiries per month, adding constraints cut the hallucination rate from 23% to 3%. Escalated tickets also became noticeably more useful, because they now include specifics about the documentation gaps.
Anthropic's research is clear: it's not about more context, it's about the right context.
The principle: the smallest high-signal token set
Bad context (verbose, repetitive):
You are a helpful AI assistant designed to help customers with their questions and concerns. You should always be polite, professional, and courteous in your responses. Make sure to read the customer's question carefully and provide a thorough and complete answer that addresses all of their concerns. If you're not sure about something, it's better to say you don't know than to provide incorrect information...
350 tokens of filler with almost no actionable guidance.
Good context (dense, specific):
You are Support Agent.
RESPONSE REQUIREMENTS:
- Max 150 words
- Plain language (non-technical)
- Structure: Problem acknowledgment → Solution → Next steps
TOOLS:
- search_docs(query) → search product documentation
- create_ticket(title, priority, details) → escalate to human team
WORKFLOW:
1. Search docs for relevant information
2. If found: Provide answer with doc reference
3. If not found OR customer mentions "urgent"/"broken": Create ticket
110 tokens at high signal density. Every line carries actionable information.
The token audit:
For every sentence in your prompt, ask: "if I delete this, does the agent get worse?" If not, delete it.
The core techniques apply everywhere. The advanced patterns below are powerful, but only when matched to the right problem.
A June 2025 Wharton study provides the most comprehensive analysis to date: chain-of-thought (CoT) helps with complex reasoning, but results on simple tasks are mixed.
When to use CoT:
When not to use CoT:
Implementation in n8n:
TASK: Analyze customer request and determine best resolution path.
REASONING PROCESS (think step-by-step):
1. IDENTIFY: What is the core issue? (Quote specific parts of message)
2. CLASSIFY: Which category? (sales/support/billing/general)
3. ASSESS URGENCY: Time-sensitive keywords? Tone indicators?
4. CHECK PREREQUISITES: Can we resolve with available tools?
5. DECIDE: Route to appropriate handler with reasoning
Think through each step explicitly before providing your final answer.
Performance impact:
Bottom line: use CoT only when the accuracy gain outweighs the cost and latency trade-off.
When your agent needs:
...then RAG is a must.
The basic RAG flow in n8n:
Webhook/Trigger
↓
Extract Query (user's question)
↓
Vector Search (retrieve relevant chunks from knowledge base)
↓
AI Agent (answer using retrieved context)
↓
Response
Key RAG takeaways (based on kapa.ai's analysis):
Example RAG prompt:
Answer the customer's question using ONLY the information provided below.
CONTEXT FROM DOCUMENTATION:
{{$json.retrieved_chunks}}
CUSTOMER QUESTION:
{{$json.user_message}}
INSTRUCTIONS:
- Base answer strictly on provided context
- If context doesn't contain the answer: "I don't have that information in our current documentation."
- Include source reference:"According to [doc_title]..."
- If multiple relevant sections: Synthesize information from all
CONFIDENCE ASSESSMENT:
- High confidence: Answer directly stated in context
- Medium confidence: Answer can be inferred from context
- Low confidence: Context is incomplete → escalate
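Assembling the retrieved chunks into a prompt like the one above can be sketched as follows. The chunk structure (`doc_title`, `text`, `score`) is an assumption for illustration; your vector store will have its own field names:

```python
# Sketch: build the RAG prompt from retrieved chunks, highest-relevance
# chunk first. The chunk dict fields are assumptions, not a real API.
def build_rag_prompt(chunks: list, question: str) -> str:
    # Order matters (see the ordering section below): lead with the most
    # relevant chunk so the model anchors on it.
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    context = "\n\n".join(f"[{c['doc_title']}]\n{c['text']}" for c in ranked)
    return (
        "Answer the customer's question using ONLY the information provided below.\n\n"
        f"CONTEXT FROM DOCUMENTATION:\n{context}\n\n"
        f"CUSTOMER QUESTION:\n{question}\n\n"
        "INSTRUCTIONS:\n"
        "- Base answer strictly on provided context\n"
        "- If context doesn't contain the answer, say so and escalate."
    )

prompt = build_rag_prompt(
    [{"doc_title": "Intro", "text": "Welcome.", "score": 0.42},
     {"doc_title": "Billing FAQ", "text": "Refunds take 5 days.", "score": 0.91}],
    "How long do refunds take?",
)
```

In n8n this logic would live in a Code node between the Vector Search and the AI Agent node.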
Wang et al. (2024) found that the order of context matters significantly.
Key findings:
The optimal ordering strategy:
Example (RAG context):
MOST RELEVANT DOCUMENTATION:
[Chunk with highest relevance score]
ADDITIONAL CONTEXT:
[Supporting chunks]
CONSTRAINTS (IMPORTANT):
- Answer only from provided context
- If uncertain: Escalate to human team
OpenAI's Structured Outputs (GPT-4o) and comparable features in other models solve a big problem: getting consistent, parseable output.
The problem with traditional prompting:
Output format: JSON with fields category, urgency, confidence
The model might return:
You end up writing fallbacks for every one of these cases.
The Structured Outputs approach:
Define a JSON schema and let a Structured Output Parser node catch any deviations.
Example schema:
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string",
      "enum": ["sales", "support", "billing", "general"]
    },
    "urgency": {
      "type": "string",
      "enum": ["low", "medium", "high"]
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "reasoning": {
      "type": "string"
    }
  },
  "required": ["category", "urgency", "confidence"]
}
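A minimal check mirroring this schema can be hand-coded with the standard library. A real workflow would use a proper JSON Schema validator (or n8n's Structured Output Parser node); this sketch just shows the constraints the schema enforces:

```python
import json

# Hand-coded check mirroring the schema above. A production workflow would
# use a real JSON Schema validator; this stdlib-only sketch is illustrative.
CATEGORIES = {"sales", "support", "billing", "general"}
URGENCIES = {"low", "medium", "high"}

def validate_classification(raw: str) -> dict:
    """Parse model output and enforce the enum/range constraints."""
    data = json.loads(raw)
    assert data["category"] in CATEGORIES, "invalid category"
    assert data["urgency"] in URGENCIES, "invalid urgency"
    assert 0 <= data["confidence"] <= 1, "confidence out of range"
    return data

result = validate_classification(
    '{"category": "support", "urgency": "high", "confidence": 0.92}'
)
```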
Benefits:
When to use it:
This changed how I build AI agents: stop hand-writing prompts and let the model generate them.
The process:
Example meta-prompt:
I'm building an AI agent for customer support email classification. Help me create an optimal system message prompt.
REQUIREMENTS:
- Classify emails into: sales, support, billing, general
- Assess urgency: low, medium, high
- Output format: JSON with category, urgency, confidence
- Must handle edge cases: unclear intent, multiple topics, spam
TOOLS AVAILABLE:
- search_docs(query): Search documentation
- create_ticket(title, priority, description): Escalate to humans
EXAMPLES OF DESIRED BEHAVIOR:
[Include 3-5 diverse examples with input and expected output]
CONSTRAINTS:
- Never make up information
- When uncertain (confidence < 0.7): Escalate
- Response under 150 words for direct answers
- Include reasoning in output
Generate an optimized system message that will consistently produce these results.
The model will generate a prompt that:
Why this works:
Most "model-specific tricks" don't hold up. But some differences genuinely matter:
Claude (Anthropic):
GPT-4o (OpenAI):
GPT-4o-mini:
Gemini (Google):
Rules of thumb for model selection:
A good prompt is not enough; you need a production-grade workflow.
Test with real edge cases, not just the happy path:
Test cases for email triager:
✓ Standard support request
✓ Angry customer (caps, exclamation marks)
✓ Sales inquiry with technical questions (mixed intent)
✓ Very short message ("help")
✓ Wrong language (if only English supported)
✓ Spam/irrelevant content
AI agents can fail; build in fallbacks:
n8n workflow:
AI Agent Node
→ IF Error OR confidence < 0.7:
  → Fallback: Route to Human
→ ELSE:
  → Continue with automated workflow
The confidence convention in the System Message:
If you're uncertain (confidence < 70%):
Set "needs_human_review": true in output
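The fallback branch above (error, self-flagged review, or confidence below 0.7 → human) can be sketched as a plain routing function. The threshold and field names follow the example; in n8n this would be an IF node rather than code:

```python
# Sketch of the fallback routing above: send the item to a human whenever
# the agent errored, flagged itself, or reported low confidence.
CONFIDENCE_THRESHOLD = 0.7  # threshold from the example above

def route(agent_output) -> str:
    """Return 'human' or 'automated' for one agent result (None = node error)."""
    if agent_output is None:
        return "human"
    if agent_output.get("needs_human_review"):
        return "human"
    if agent_output.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "human"
    return "automated"
```

Defaulting a missing `confidence` to 0.0 means malformed output also falls back to a human, which is usually the safe direction.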
At high volume, every token counts:
Track the key metrics:
In n8n, a lightweight setup is Webhook → Google Sheets logging:
After AI Agent Node:
→ Set Node (Extract Metrics):
  - latency: {{$now - $('AI Agent').json.startTime}}
  - input_tokens: {{$('AI Agent').json.usage.input_tokens}}
  - output_tokens: {{$('AI Agent').json.usage.output_tokens}}
  - confidence: {{$('AI Agent').json.confidence}}
→ Google Sheets (Append Row)
Before going live:
Prompt quality:
Testing:
Performance:
Monitoring:
Iteration:
The five universal core techniques:
Situational advanced patterns:
The meta-conclusion:
Your next step: take one of your existing n8n AI Agent workflows and apply the five core techniques above. Compare token usage before and after. You'll typically see a substantial cost drop while output quality improves rather than degrades.
That's the difference between prompting that barely works and prompting that scales into production.