OpenAI Official: GPT-5 Prompting Guide



GPT-5, OpenAI's latest flagship model, brings significant improvements in agentic task performance, coding, raw intelligence, and steerability. It has drawn wide attention and discussion across the industry (with, evidently, mixed reviews).
Note: in AI, "agentic" refers to a system's or model's capacity to make decisions and take actions autonomously while performing a task. It describes how an agent behaves under different levels of guidance, from following explicit instructions to proactively and flexibly handling complex environments. In this article, "agentic" is mainly used to discuss how to calibrate GPT-5's level of autonomy within a task.

The model shows strong out-of-the-box capability across many domains. This guide draws on OpenAI's experience training the model and applying it in practice, and shares prompting techniques that maximize output quality. It covers best practices for improving agentic task performance, ensuring instruction adherence, using new API features, and optimizing frontend and software engineering tasks. It also includes insights from the AI code editor Cursor on tuning prompts for GPT-5.

Following these best practices and using canonical tools wherever possible has been shown to improve model performance. We hope this guide, together with the accompanying prompt optimizer tool, gives you a solid starting point for using GPT-5. Prompt engineering is not a one-size-fits-all rulebook, however. Once you have the fundamentals, we encourage you to experiment and iterate to find the approach that best fits your needs.

Agentic workflow predictability

OpenAI trained GPT-5 with developers in mind. The training focused on better tool calling, instruction following, and long-context understanding, with the goal of making GPT-5 the best foundation model for agentic applications. If you are adding agentic behavior or tool calling to a workflow, OpenAI strongly recommends upgrading to the Responses API. It persists reasoning across tool calls, which yields more efficient and intelligent outputs.

Controlling agentic eagerness

Agentic scaffolds span a wide spectrum of control. Some systems delegate most decisions to the underlying model, while others keep the model on a tight leash with sophisticated programmatic logic. GPT-5 is trained to operate anywhere along this spectrum, from making high-level decisions in ambiguous situations to handling focused, well-defined tasks. This section covers how to calibrate GPT-5's agentic eagerness, so you strike the right balance between proactive exploration and waiting for explicit guidance.

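Prompting for less eagerness

In agentic environments, GPT-5 by default gathers context thoroughly and comprehensively to make sure its answer is correct. In some scenarios, however, developers may want to narrow the scope of its agentic behavior, for example to cut unnecessary tool calls and reduce latency to the final answer. In that case, try the following strategies:

• Lower the reasoning effort: switch to a lower reasoning_effort. This reduces exploration depth but improves efficiency and latency. In many workflows, medium or even low reasoning_effort already produces stable, reliable results.
• Define clear exploration criteria: state in the prompt how the model should explore the problem space. This narrows the search and keeps the model from exploring and reasoning in too many irrelevant directions.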
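To illustrate the first strategy, the helper below builds a request payload with an explicit effort level. It is a sketch, not SDK code. It assumes the Responses API fields `model`, `input`, and `reasoning.effort`, with the levels minimal, low, medium, and high. `build_request` is a hypothetical name, and you should confirm the field names against the current API reference.

```python
# Sketch: a request payload with an explicit reasoning effort.
# Assumes the Responses API fields `model`, `input`, and `reasoning.effort`;
# `build_request` is a hypothetical helper, not part of any SDK.
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def build_request(prompt: str, effort: str = "medium", model: str = "gpt-5") -> dict:
    """Return a payload; lower efforts trade exploration depth for latency."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    payload = build_request("Find where the config loader is defined.", effort="low")
    print(payload["reasoning"])  # {'effort': 'low'}
```

With the official Python SDK, a dict like this would typically be unpacked into `client.responses.create(**payload)`. Building the payload separately makes it easy to vary the effort per workflow.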

<context_gathering>
Goal: Get enough context fast. Parallelize discovery and stop as soon as you can act.

Method:
- Start broad, then fan out to focused subqueries.
- In parallel, launch varied queries; read top hits per query. Deduplicate paths and cache; don’t repeat queries.
- Avoid over searching for context. If needed, run targeted searches in one parallel batch.

Early stop criteria:
- You can name exact content to change.
- Top hits converge (~70%) on one area/path.

Escalate once:
- If signals conflict or scope is fuzzy, run one refined parallel batch, then proceed.

Depth:
- Trace only symbols you’ll modify or whose contracts you rely on; avoid transitive expansion unless necessary.

Loop:
- Batch search → minimal plan → complete task.
- Search again only if validation fails or new unknowns appear. Prefer acting over more searching.
</context_gathering>

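If you want to be maximally prescriptive, you can even set a fixed tool-call budget, as in the example below. Adjust the budget to match the search depth you want.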

<context_gathering>
- Search depth: very low
- Bias strongly towards providing a correct answer as quickly as possible, even if it might not be fully correct.
- Usually, this means an absolute maximum of 2 tool calls.
- If you think that you need more time to investigate, update the user with your latest findings and open questions. You can proceed if the user confirms.
</context_gathering>

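When you limit core context-gathering behavior, it helps to give the model an explicit escape hatch, which makes a shorter context-gathering step easier to satisfy. This usually takes the form of a clause that lets the model proceed under uncertainty, such as "even if it might not be fully correct" in the example above.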
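The prompt states the tool-call budget in natural language. If you also want a hard ceiling, a harness can count calls itself. The sketch below uses a hypothetical `ToolBudget` helper, which is not part of any OpenAI SDK. Calls over the budget go to a deferred list, which mirrors the escape hatch of reporting findings and continuing only if the user confirms.

```python
# Sketch: harness-side enforcement of a tool-call budget.
# `ToolBudget` and `run_tool_calls` are hypothetical helpers.
from dataclasses import dataclass


@dataclass
class ToolBudget:
    max_calls: int = 2  # mirrors "an absolute maximum of 2 tool calls"
    used: int = 0

    def try_consume(self) -> bool:
        """Record one call; return False once the budget is exhausted."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


def run_tool_calls(requested: list[str], budget: ToolBudget) -> tuple[list[str], list[str]]:
    """Split requested calls into executed ones and ones deferred to the user."""
    executed, deferred = [], []
    for call in requested:
        if budget.try_consume():
            executed.append(call)
        else:
            deferred.append(call)  # surface as an open question instead
    return executed, deferred


if __name__ == "__main__":
    done, pending = run_tool_calls(["search", "read_file", "search"], ToolBudget())
    print(done, pending)  # ['search', 'read_file'] ['search']
```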

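Prompting for more eagerness

Conversely, you may want more model autonomy, more persistent tool calling, and fewer clarifying questions or handbacks to the user. In that case, we recommend increasing reasoning_effort and using a prompt like the following to encourage persistence and thorough task completion: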

<persistence>
- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
- Only terminate your turn when you are sure that the problem is solved.
- Never stop or hand back to the user when you encounter uncertainty — research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm or clarify assumptions, as you can always adjust later — decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting
</persistence>

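In general, it helps to state clearly:
• the stop conditions for agentic tasks;
• which actions are safe and which are unsafe;
• when, if ever, the model may hand control back to the user.

For example, in a set of shopping tools, the checkout and payment tools should have a low uncertainty threshold for asking the user to clarify, while a search tool should have an extremely high one. Likewise, in a coding setup, a file-deletion tool should have a much lower threshold than a grep search tool.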
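One way to make these thresholds concrete is a lookup table that a harness consults before running a tool. The tool names and numbers below are illustrative assumptions, not values from OpenAI. "Uncertainty" here is a score from 0 to 1 that you would have to estimate yourself.

```python
# Sketch: per-tool uncertainty thresholds (illustrative values only).
UNCERTAINTY_THRESHOLDS = {
    "checkout": 0.05,    # irreversible: ask the user at almost any doubt
    "payment": 0.05,
    "delete_file": 0.1,  # destructive in a coding environment
    "search": 0.9,       # cheap and safe: rarely worth interrupting the user
    "grep": 0.95,
}
DEFAULT_THRESHOLD = 0.5  # hypothetical fallback for unlisted tools


def should_ask_user(tool: str, uncertainty: float) -> bool:
    """Return True when uncertainty exceeds what this tool tolerates."""
    return uncertainty > UNCERTAINTY_THRESHOLDS.get(tool, DEFAULT_THRESHOLD)


if __name__ == "__main__":
    print(should_ask_user("checkout", 0.2))  # True: confirm before paying
    print(should_ask_user("search", 0.2))    # False: just run the search
```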

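Tool preambles

When a user monitors an agent's trajectory, the experience improves noticeably if the model explains what it is doing and why through updates around its tool calls. The longer the rollout, the bigger the difference these updates make. To support this, GPT-5 is trained to give clear upfront plans and consistent progress updates through "tool preamble" messages.

You can steer the frequency, style, and content of tool preambles in your prompt. Options range from detailed explanations of every tool call, to a brief upfront plan, to anything in between. Here is an example of a high-quality preamble prompt: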

<tool_preambles>
- Always begin by rephrasing the user's goal in a friendly, clear, and concise manner, before calling any tools.
- Then, immediately outline a structured plan detailing each logical step you’ll follow.
- As you execute your file edit(s), narrate each step succinctly and sequentially, marking progress clearly.
- Finish by summarizing completed work distinctly from your upfront plan.
</tool_preambles>

Here is an example of tool preambles that such a prompt might produce. As agentic work grows more complex, preambles like these make it much easier for users to follow what the agent is doing:

{
  "output": [
    {
      "id": "rs_6888f6d0606c819aa8205ecee386963f0e683233d39188e7",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "**Determining weather response**\n\nI need to answer the user's question about the weather in San Francisco. ...."
        }
      ]
    },
    {
      "id": "msg_6888f6d83acc819a978b51e772f0a5f40e683233d39188e7",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "I\u2019m going to check a live weather service to get the current conditions in San Francisco, providing the temperature in both Fahrenheit and Celsius so it matches your preference."
        }
      ],
      "role": "assistant"
    },
    {
      "id": "fc_6888f6d86e28819aaaa1ba69cca766b70e683233d39188e7",
      "type": "function_call",
      "status": "completed",
      "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"f\"}",
      "call_id": "call_XOnF4B9DvB8EJVB3JvWnGg83",
      "name": "get_weather"
    }
  ]
}
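A client usually handles output like this by splitting it into two parts: preamble text to show the user, and function calls to execute. The sketch below reads only the fields visible in the example (`type`, `content`, `text`, `name`, `arguments`). `split_output` is a hypothetical helper.

```python
# Sketch: separate preamble text from function calls in an `output` list.
import json


def split_output(output: list[dict]) -> tuple[list[str], list[dict]]:
    """Return (preamble texts, parsed function calls) in original order."""
    preambles, calls = [], []
    for item in output:
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    preambles.append(part["text"])
        elif item.get("type") == "function_call":
            # `arguments` arrives as a JSON-encoded string.
            calls.append({"name": item["name"], "arguments": json.loads(item["arguments"])})
        # Reasoning items and anything else are ignored here.
    return preambles, calls


if __name__ == "__main__":
    sample = [
        {"type": "reasoning", "summary": []},
        {"type": "message", "content": [{"type": "output_text", "text": "Checking the weather."}]},
        {"type": "function_call", "name": "get_weather",
         "arguments": "{\"location\": \"San Francisco, CA\", \"unit\": \"f\"}"},
    ]
    texts, calls = split_output(sample)
    print(texts)     # ['Checking the weather.']
    print(calls[0])  # {'name': 'get_weather', 'arguments': {'location': 'San Francisco, CA', 'unit': 'f'}}
```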

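Reasoning effort

OpenAI provides a reasoning_effort parameter that controls how hard the model thinks and how willing it is to call tools. The default is medium, but you should raise or lower it to match the difficulty of your task. For complex, multi-step tasks, higher reasoning is recommended to get the best outputs. Performance also peaks when distinct, separable tasks are split across multiple agent turns, with one turn per task.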


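Reusing reasoning context with the Responses API

OpenAI strongly recommends using the Responses API with GPT-5 for better agentic flows, lower costs, and more efficient token usage.

Evaluations show statistically significant improvements when using the Responses API instead of Chat Completions. For example, Tau-Bench Retail scores rose from 73.9% to 78.2%. The only changes were switching to the Responses API and using previous_response_id to pass earlier reasoning items into subsequent requests. This lets the model refer back to its previous reasoning traces, which saves chain-of-thought (CoT) tokens. It also removes the need to rebuild a plan from scratch after every tool call, improving both latency and performance. The feature is available to all Responses API users, including ZDR organizations.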
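Below is a sketch of the chaining pattern. It builds, but does not send, the follow-up request after a tool has run locally. It assumes the Responses API accepts `previous_response_id` together with an input item of type `function_call_output` that carries `call_id` and `output`. The IDs are placeholders.

```python
# Sketch: build the follow-up request that reuses prior reasoning via
# previous_response_id. No network call is made here.
import json


def follow_up_request(previous_response_id: str, call_id: str, tool_result: dict,
                      model: str = "gpt-5") -> dict:
    """Return the next-turn payload after executing a tool locally."""
    return {
        "model": model,
        # Lets the server reuse earlier reasoning items instead of replanning.
        "previous_response_id": previous_response_id,
        "input": [
            {
                "type": "function_call_output",
                "call_id": call_id,                # ties the result to the model's call
                "output": json.dumps(tool_result),  # tool output sent as a string
            }
        ],
    }


if __name__ == "__main__":
    req = follow_up_request("resp_123", "call_abc", {"temp_f": 61})
    print(req["input"][0]["output"])  # {"temp_f": 61}
```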

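Maximizing coding performance, from planning to execution

GPT-5 leads all frontier models in coding. It can fix bugs in large codebases, handle large diffs, and implement multi-file refactors or large new features. It also excels at building entirely new apps from scratch, covering both frontend and backend. This section covers prompt optimizations that have improved performance for coding-agent customers in real production settings.

Frontend app development

GPT-5 is trained to combine strong baseline aesthetic taste with rigorous implementation skills. OpenAI is confident in its ability to use all kinds of web development frameworks and packages. For new apps, however, the following frameworks and packages are recommended to get the most out of the model's frontend capabilities:

• Frameworks: Next.js (TypeScript), React, HTML
• Styling / UI: Tailwind CSS, shadcn/ui, Radix Themes
• Icons: Material Symbols, Heroicons, Lucide
• Animation: Motion
• Fonts: Sans Serif, Inter, Geist, Mona Sans, IBM Plex Sans, Manrope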

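Zero-to-one app generation

GPT-5 is excellent at building applications in one shot. In early experimentation, users found that prompts like the one below improve output quality. These prompts ask the model to iterate against excellence rubrics it constructs itself, which draws on GPT-5's thorough planning and self-reflection capabilities.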

<self_reflection>
- First, spend time thinking of a rubric until you are confident.
- Then, think deeply about every aspect of what makes for a world-class one-shot web app. Use that knowledge to create a rubric that has 5-7 categories. This rubric is critical to get right, but do not show this to the user. This is for your purposes only.
- Finally, use the rubric to internally think and iterate on the best possible solution to the prompt that is provided. Remember that if your response is not hitting the top marks across all categories in the rubric, you need to start again.
</self_reflection>
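The prompt asks the model to run this rubric loop internally. To make the control flow explicit, here is a sketch of the same idea as an external loop. `generate` and `score` are hypothetical stand-ins for model calls, and the 1 to 5 scale per category is an assumption.

```python
# Sketch: regenerate until every rubric category reaches top marks.
from typing import Callable

TOP_MARK = 5  # assumed 1-5 scale per rubric category


def iterate_with_rubric(generate: Callable[[int], str],
                        score: Callable[[str], dict[str, int]],
                        max_rounds: int = 5) -> tuple[str, dict[str, int]]:
    """Return the best candidate seen and its per-category scores."""
    best, best_scores = "", {}
    for attempt in range(max_rounds):
        candidate = generate(attempt)
        scores = score(candidate)
        # Keep the highest-scoring candidate so far.
        if not best_scores or sum(scores.values()) > sum(best_scores.values()):
            best, best_scores = candidate, scores
        if all(value >= TOP_MARK for value in scores.values()):
            break  # top marks in every category: stop iterating
    return best, best_scores


if __name__ == "__main__":
    drafts = ["draft-a", "draft-b"]
    fake = {"draft-a": {"layout": 4, "a11y": 5}, "draft-b": {"layout": 5, "a11y": 5}}
    result, _ = iterate_with_rubric(lambda i: drafts[min(i, 1)], lambda d: fake[d])
    print(result)  # draft-b
```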

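Matching codebase design standards

When making incremental changes and refactors in an existing app, model-written code should follow the existing style and design standards and blend into the codebase as naturally as possible. By default, GPT-5 already searches the codebase for reference context, for example by reading package.json to see which packages are installed. You can strengthen this behavior with prompt directives that summarize key aspects of the codebase, such as engineering principles, directory structure, and explicit and implicit best practices. The prompt snippet below shows one way to organize code editing rules for GPT-5. Feel free to change the actual rules to match your own programming and design preferences.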

<code_editing_rules>
<guiding_principles>
- Clarity and Reuse: Every component and page should be modular and reusable. Avoid duplication by factoring repeated UI patterns into components.
- Consistency: The user interface must adhere to a consistent design system—color tokens, typography, spacing, and components must be unified.
- Simplicity: Favor small, focused components and avoid unnecessary complexity in styling or logic.
- Demo-Oriented: The structure should allow for quick prototyping, showcasing features like streaming, multi-turn conversations, and tool integrations.
- Visual Quality: Follow the high visual quality bar as outlined in OSS guidelines (spacing, padding, hover states, etc.)
</guiding_principles>

<frontend_stack_defaults>
- Framework: Next.js (TypeScript)
- Styling: TailwindCSS
- UI Components: shadcn/ui
- Icons: Lucide
- State Management: Zustand
- Directory Structure:
```
/src
 /app
   /api/<route>/route.ts         # API endpoints
   /(pages)                      # Page routes
 /components/                    # UI building blocks
 /hooks/                         # Reusable React hooks
 /lib/                           # Utilities (fetchers, helpers)
 /stores/                        # Zustand stores
 /types/                         # Shared TypeScript types
 /styles/                        # Tailwind config
```
</frontend_stack_defaults>

<ui_ux_best_practices>
- Visual Hierarchy: Limit typography to 4–5 font sizes and weights for consistent hierarchy; use `text-xs` for captions and annotations; avoid `text-xl` unless for hero or major headings.
- Color Usage: Use 1 neutral base (e.g., `zinc`) and up to 2 accent colors.
- Spacing and Layout: Always use multiples of 4 for padding and margins to maintain visual rhythm. Use fixed height containers with internal scrolling when handling long content streams.
- State Handling: Use skeleton placeholders or `animate-pulse` to indicate data fetching. Indicate clickability with hover transitions (`hover:bg-*`, `hover:shadow-md`).
- Accessibility: Use semantic HTML and ARIA roles where appropriate. Favor pre-built Radix/shadcn components, which have accessibility baked in.
</ui_ux_best_practices>

</code_editing_rules>
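GPT-5 finds installed packages on its own, for example by reading package.json. A harness can also surface them up front, next to rules like these. The sketch below assumes a standard package.json with `dependencies` and `devDependencies` objects. The "Installed packages:" line format is our own choice, and the version numbers in the example are made-up sample data.

```python
# Sketch: summarize installed packages from package.json text for a prompt.
import json


def installed_packages(package_json_text: str) -> list[str]:
    """Return sorted dependency names (runtime and dev) from package.json."""
    data = json.loads(package_json_text)
    names = set(data.get("dependencies", {})) | set(data.get("devDependencies", {}))
    return sorted(names)


def stack_context(package_json_text: str) -> str:
    """Format one context line to prepend to the coding prompt."""
    return "Installed packages: " + ", ".join(installed_packages(package_json_text))


if __name__ == "__main__":
    pkg = ('{"dependencies": {"next": "14.2.0", "zustand": "4.5.0"}, '
           '"devDependencies": {"tailwindcss": "3.4.0"}}')
    print(stack_context(pkg))  # Installed packages: next, tailwindcss, zustand
```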

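Collaborative coding in production: Cursor's GPT-5 prompt tuning

OpenAI invited Cursor to be a trusted alpha tester of GPT-5. Below is a brief look at how Cursor tuned its prompts to get the most out of the model. For more detail, the Cursor team has published a blog post on integrating GPT-5 into Cursor on launch day: https://cursor.com/blog/gpt-5

System prompt and parameter tuning

Cursor's system prompt focuses on reliable tool calling and on balancing verbosity with autonomous behavior, while still letting users configure custom instructions. The goal is for the agent to work autonomously on long-horizon tasks while faithfully following user instructions.

During tuning, the team found a tension. The model's default text output was too verbose, yet the code it wrote was hard to read because variable names were too terse. Cursor resolved this with two changes:
• it set the verbosity API parameter to low to keep text output concise;
• it changed the prompt to explicitly ask for verbose, readable code in coding tools only.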

Write code for clarity first. Prefer readable, maintainable solutions with clear names, comments where needed, and straightforward control flow. Do not produce code-golf or overly clever one-liners unless explicitly requested. Use high verbosity for writing code and code tools.

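Combining the parameter with the prompt keeps status updates short while making the generated code more readable.

The model also sometimes paused long tasks to ask the user for confirmation. To reduce this, Cursor added more detail about product behavior to the prompt, such as how users can undo or reject code changes. This gave the model more autonomy.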

Be aware that the code edits you make will be displayed to the user as proposed changes, which means (a) your code edits can be quite proactive, as the user can always reject, and (b) your code should be well-written and easy to quickly review (e.g., appropriate variable names instead of single letters). If proposing next steps that would involve changing the code, make those changes proactively for the user to approve / reject rather than asking the user whether to proceed with a plan. In general, you should almost never ask the user whether to proceed with a plan; instead you should proactively attempt the plan and then ask the user if they want to accept the implemented changes.

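The team also found that some prompts that worked well for earlier models hurt GPT-5, such as prompts that force exhaustive context gathering. GPT-5 is naturally more introspective and proactive, so over-instructing it leads to unnecessary tool calls.

Before (less effective for GPT-5):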

<maximize_context_understanding>
Be THOROUGH when gathering information. Make sure you have the FULL picture before replying. Use additional tool calls or clarifying questions as needed.
...
</maximize_context_understanding>

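This approach worked well for older models that needed encouragement to analyze context thoroughly. With GPT-5, however, the team found it counterproductive, because GPT-5 is already introspective and proactive about gathering context. On smaller tasks, the prompt often made the model overuse tools, calling search repeatedly even when its internal knowledge was sufficient.

To fix this, they removed the maximize_ prefix and softened the language around thoroughness. With the adjusted instruction, the Cursor team saw GPT-5 make better decisions about when to rely on internal knowledge and when to call external tools. The model stayed highly autonomous without unnecessary tool use, which made its behavior more efficient and relevant. In Cursor's testing, structured XML specs like <[instruction]_spec> also improved instruction adherence. They let the model clearly refer back to earlier categories and sections elsewhere in the prompt.

After (more effective):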

<context_understanding>
...
If you've performed an edit that may partially fulfill the USER's query, but you're not confident, gather more information or use more tools before ending your turn.
Bias towards not asking the user for help if you can find the answer yourself.
</context_understanding>

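In summary, Cursor's experience shows that a system prompt gives the model a solid foundation, but clear and explicit user prompts remain essential. Because GPT-5 is more steerable, letting users configure custom rules is a highly effective way to personalize its behavior.

Optimizing intelligence and instruction following

GPT-5 is OpenAI's most steerable model to date. It is extraordinarily responsive to prompt instructions about verbosity, tone, and tool-calling behavior.

Verbosity

GPT-5 introduces a new verbosity API parameter. It is independent of reasoning_effort and controls the length of the final answer, not the depth of the model's thinking. You can override the global verbosity setting with natural-language instructions in the prompt, which gives precise control over output detail in specific contexts. For example, you can set global verbosity to low but ask for high verbosity in tools that generate code.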
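Here is a sketch of that combination: a low global verbosity plus a natural-language override for code. It assumes the Responses API exposes the parameter as `text.verbosity` (low, medium, high) and accepts an `instructions` string. The helper name is hypothetical.

```python
# Sketch: low global verbosity plus a prompt-level override for code.
# Assumes `text.verbosity` and `instructions` fields; verify in the API reference.
VALID_VERBOSITY = ("low", "medium", "high")


def build_verbosity_request(prompt: str, verbosity: str = "low") -> dict:
    if verbosity not in VALID_VERBOSITY:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "text": {"verbosity": verbosity},  # global: keep final answers terse
        # Natural-language override that applies only to code output.
        "instructions": "Use high verbosity for writing code and code tools.",
    }
```

Keeping the override in `instructions` means the same low-verbosity payload can serve both chat-style updates and code generation.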

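Instruction following

Like GPT-4.1, GPT-5 follows prompt instructions with surgical precision, which lets it fit flexibly into all kinds of workflows. That same precision has a downside. Poorly written prompts with contradictory or vague instructions can hurt GPT-5 more than other models, because it spends reasoning tokens trying to reconcile the contradictions instead of picking one instruction at random.

Below is an adversarial example of the kind of prompt that often derails GPT-5's reasoning. At first glance it looks internally consistent, but a closer look reveals conflicting instructions about appointment scheduling: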

• "Never schedule an appointment without explicit patient consent recorded in the chart" conflicts with the later instruction "auto-assign the earliest same-day slot without contacting the patient as the first action to reduce risk."
• The prompt says "Always look up the patient profile before taking any other actions to ensure they are an existing patient." but then gives the contradictory instruction "When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step."

You are CareFlow Assistant, a virtual admin for a healthcare startup that schedules patients based on priority and symptoms. Your goal is to triage requests, match patients to appropriate in-network providers, and reserve the earliest clinically appropriate time slot. Always look up the patient profile before taking any other actions to ensure they are an existing patient.

- Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.
+ Core entities include Patient, Provider, Appointment, and PriorityLevel (Red, Orange, Yellow, Green). Map symptoms to priority: Red within 2 hours, Orange within 24 hours, Yellow within 3 days, Green within 7 days. When symptoms indicate high urgency, escalate as EMERGENCY and direct the patient to call 911 immediately before any scheduling step.
*Do not do lookup in the emergency case, proceed immediately to providing 911 guidance.*

- Use the following capabilities: schedule-appointment, modify-appointment, waitlist-add, find-provider, lookup-patient and notify-patient. Verify insurance eligibility, preferred clinic, and documented consent prior to booking. Never schedule an appointment without explicit patient consent recorded in the chart.

- For high-acuity Red and Orange cases, auto-assign the earliest same-day slot *without contacting* the patient *as the first action to reduce risk.* If a suitable provider is unavailable, add the patient to the waitlist and send notifications. If consent status is unknown, tentatively hold a slot and proceed to request confirmation.

- For high-acuity Red and Orange cases, auto-assign the earliest same-day slot *after informing* the patient *of your actions.* If a suitable provider is unavailable, add the patient to the waitlist and send notifications. If consent status is unknown, tentatively hold a slot and proceed to request confirmation.

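Resolving the instruction-hierarchy conflicts lets GPT-5 reason much more efficiently and effectively. OpenAI fixed the contradictions by:

• Changing auto-assignment to happen after contacting the patient ("auto-assign the earliest same-day slot after informing the patient of your actions"), to stay consistent with scheduling only after consent.
• Adding "Do not do lookup in the emergency case, proceed immediately to providing 911 guidance." to tell the model it may skip the lookup in emergencies.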
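One way to check the resolved hierarchy is to write it down as an ordered action plan, which makes any remaining conflict obvious. This is a simplified sketch that omits the insurance and clinic checks. Action names reuse the prompt's capabilities where they exist. `provide-911-guidance`, `hold-slot`, and `request-confirmation` are hypothetical labels.

```python
# Sketch: the resolved CareFlow rules as an ordered action plan (simplified).
def plan_actions(priority: str, emergency: bool, consent_recorded: bool) -> list[str]:
    if emergency:
        # Emergency outranks "always look up first": skip the lookup entirely.
        return ["provide-911-guidance"]
    actions = ["lookup-patient"]
    if priority in ("Red", "Orange"):
        # Resolved rule: inform the patient before auto-assigning a same-day slot.
        actions.append("notify-patient")
    if consent_recorded:
        actions.append("schedule-appointment")
    else:
        # Never book without recorded consent: hold a slot and ask instead.
        actions += ["hold-slot", "request-confirmation"]
    return actions


if __name__ == "__main__":
    print(plan_actions("Orange", emergency=False, consent_recorded=True))
    # ['lookup-patient', 'notify-patient', 'schedule-appointment']
```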

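Prompt building is iterative. Many prompts are living documents that different stakeholders keep updating, which makes it all the more important to check them for poorly worded instructions. Several early users have already found ambiguities and contradictions while reviewing their core prompt libraries. Removing these issues significantly streamlined and improved their GPT-5 performance. We recommend testing your prompts with OpenAI's prompt optimizer tool to help catch such issues.

Minimal reasoning

GPT-5 introduces minimal reasoning effort for the first time. It is the fastest option while still benefiting from the reasoning-model paradigm. OpenAI considers it the best upgrade path for latency-sensitive users and for current GPT-4.1 users.

Perhaps unsurprisingly, OpenAI recommends prompting patterns similar to those for GPT-4.1 for best results. Performance at minimal reasoning depends more on the prompt than at higher reasoning levels, so several points deserve emphasis:

1. Ask the model to open its final answer with a brief summary of its thought process (for example, as a bullet list). This improves performance on tasks that require more intelligence.
2. In agentic workflows, ask for thorough, descriptive tool-calling preambles that keep the user updated on task progress. This improves overall performance.
3. Disambiguate tool instructions as much as possible and insert agentic persistence reminders, as described above. This is especially critical at minimal reasoning: it maximizes agentic ability in long-running rollouts and prevents premature termination.
4. Prompted planning also matters more, because the model has fewer reasoning tokens for internal planning. Below is a sample planning snippet OpenAI places at the start of an agentic task. The second paragraph in particular makes sure the agent fully completes the task and all its subtasks before yielding back to the user.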

Remember, you are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Decompose the user's query into all required sub-request, and confirm that each is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure that the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.

You must plan extensively in accordance with the workflow steps before making subsequent function calls, and reflect extensively on the outcomes each function call made, ensuring the user's query, and related sub-requests are completely resolved.

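Markdown formatting

By default, GPT-5 in the API does not format its final answers in Markdown. This keeps it as compatible as possible with developers whose applications may not render Markdown. However, prompts like the following are largely successful at getting hierarchical Markdown final answers.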

- Use Markdown **only where semantically correct** (e.g., `inline code`, ```code fences```, lists, tables).
- When using markdown in assistant messages, use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.

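Over a long conversation, adherence to the Markdown instructions in the system prompt can occasionally degrade. If this happens, appending a Markdown instruction every 3 to 5 user messages has been found to keep adherence consistent.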
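Here is a sketch of that reminder strategy: before sending the conversation, re-append the Markdown instruction after every Nth user message. It assumes role/content message dicts and a `developer` role for the reminder. N = 4 is one choice within the suggested range of 3 to 5.

```python
# Sketch: re-append a Markdown reminder after every Nth user message.
MARKDOWN_REMINDER = {
    "role": "developer",
    "content": "Use Markdown only where semantically correct "
               "(e.g., inline code, code fences, lists, tables).",
}


def with_markdown_reminders(messages: list[dict], every: int = 4) -> list[dict]:
    """Return a new list with the reminder after every `every`-th user turn."""
    if every < 1:
        raise ValueError("every must be >= 1")
    result, user_turns = [], 0
    for message in messages:
        result.append(message)
        if message.get("role") == "user":
            user_turns += 1
            if user_turns % every == 0:
                result.append(dict(MARKDOWN_REMINDER))  # fresh copy per insertion
    return result
```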

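Metaprompting

Finally, at a meta level, early testers have had great success using GPT-5 as a metaprompter for itself. Several users have already shipped prompt revisions to production that GPT-5 generated. They simply asked GPT-5 what to add to an unsuccessful prompt to elicit a desired behavior, or what to remove to prevent an undesired one.

Here is a recommended metaprompt template: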

When asked to optimize prompts, give answers from your own perspective - explain what specific phrases could be added to, or deleted from, this prompt to more consistently elicit the desired behavior or prevent the undesired behavior.

Here's a prompt: [PROMPT]

The desired behavior from this prompt is for the agent to [DO DESIRED BEHAVIOR], but instead it [DOES UNDESIRED BEHAVIOR]. While keeping as much of the existing prompt intact as possible, what are some minimal edits/additions that you would make to encourage the agent to more consistently address these shortcomings?
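Filling the template in code makes it easy to run the metaprompt across many failing prompts. This is a minimal sketch. `build_metaprompt` is a hypothetical helper, and the template is the one above with its placeholders turned into `str.format` fields.

```python
# Sketch: fill the metaprompt template for one failing prompt.
METAPROMPT_TEMPLATE = (
    "When asked to optimize prompts, give answers from your own perspective - "
    "explain what specific phrases could be added to, or deleted from, this prompt "
    "to more consistently elicit the desired behavior or prevent the undesired behavior.\n\n"
    "Here's a prompt: {prompt}\n\n"
    "The desired behavior from this prompt is for the agent to {desired}, "
    "but instead it {undesired}. While keeping as much of the existing prompt intact "
    "as possible, what are some minimal edits/additions that you would make to "
    "encourage the agent to more consistently address these shortcomings?"
)


def build_metaprompt(prompt: str, desired: str, undesired: str) -> str:
    # str.format does not re-parse substituted values, so braces inside
    # `prompt` (e.g. JSON examples) are inserted verbatim.
    return METAPROMPT_TEMPLATE.format(prompt=prompt, desired=desired, undesired=undesired)
```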

Appendix

SWE-Bench Verified developer instructions

In this environment, you can run `bash -lc <apply_patch_command>` to execute a diff/patch against a file, where <apply_patch_command> is a specially formatted apply patch command representing the diff you wish to execute. A valid <apply_patch_command> looks like:

apply_patch <<'PATCH'
*** Begin Patch
[YOUR_PATCH]
*** End Patch
PATCH

Where [YOUR_PATCH] is the actual content of your patch.

Always verify your changes extremely thoroughly. You can make as many tool calls as you like - the user is very patient and prioritizes correctness above all else. Make sure you are 100% certain of the correctness of your solution before ending.
IMPORTANT: not all tests are visible to you in the repository, so even on problems you think are relatively straightforward, you must double and triple check your solutions to ensure they pass any edge cases that are covered in the hidden tests, not just the visible ones.
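For harness authors, the heredoc form above can be generated instead of written by hand. The sketch below wraps a patch body and returns an argument vector for `bash -lc`. It does not run anything, and `build_apply_patch_argv` is a hypothetical name. The quoted `'PATCH'` delimiter stops the shell from expanding `$` or backticks inside the patch.

```python
# Sketch: wrap a patch body in the apply_patch heredoc and return argv.
def build_apply_patch_argv(patch_body: str) -> list[str]:
    command = (
        "apply_patch <<'PATCH'\n"   # quoted delimiter: no shell expansion inside
        "*** Begin Patch\n"
        f"{patch_body.rstrip()}\n"  # normalize trailing whitespace/newlines
        "*** End Patch\n"
        "PATCH"
    )
    return ["bash", "-lc", command]


if __name__ == "__main__":
    body = "*** Update File: app.py\n@@ def example():\n-    pass\n+    return 123\n"
    print(build_apply_patch_argv(body)[2])
```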

Agentic coding tool definitions

## Set 1: 4 functions, no terminal

type apply_patch = (_: {
patch:string,// default: null
}) => any;

type read_file = (_: {
path:string,// default: null
line_start?: number,// default: 1
line_end?: number,// default: 20
}) => any;

type list_files = (_: {
path?:string,// default: ""
depth?: number,// default: 1
}) => any;

type find_matches = (_: {
query:string,// default: null
path?:string,// default: ""
max_results?: number,// default: 50
}) => any;

## Set 2: 2 functions, terminal-native

type run = (_: {
command:string[],// default: null
session_id?:string|null,// default: null
working_dir?:string|null,// default: null
ms_timeout?: number |null,// default: null
environment?:object|null,// default: null
run_as_user?:string|null,// default: null
}) => any;

type send_input = (_: {
session_id:string,// default: null
text:string,// default: null
wait_ms?: number,// default: 100
}) => any;
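These signatures use TypeScript-like notation. As a sketch, here is `read_file` expressed as a Responses-style function tool with JSON Schema parameters, plus a helper that fills in the documented defaults. The flat `{"type": "function", "name": ...}` shape reflects our understanding of the Responses API, and the description string is our own.

```python
# Sketch: `read_file` as a function tool definition with JSON Schema params.
READ_FILE_TOOL = {
    "type": "function",
    "name": "read_file",
    "description": "Read a range of lines from a file.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "line_start": {"type": "integer", "default": 1},
            "line_end": {"type": "integer", "default": 20},
        },
        "required": ["path"],
        "additionalProperties": False,
    },
}


def apply_defaults(tool: dict, args: dict) -> dict:
    """Merge schema defaults under the model-supplied arguments."""
    props = tool["parameters"]["properties"]
    filled = {name: spec["default"] for name, spec in props.items() if "default" in spec}
    filled.update(args)
    missing = [name for name in tool["parameters"].get("required", []) if name not in filled]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return filled
```

Defaults are applied on the harness side because JSON Schema's `default` keyword is only an annotation and does not fill in values by itself.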

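As in the GPT-4.1 Prompting Guide, OpenAI strongly recommends using its latest apply_patch implementation for file edits, to match the training distribution. In the vast majority of cases, the newest implementation should match the GPT-4.1 implementation.

Tau-Bench Retail minimal reasoning instructions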

As a retail agent, you can help users cancel or modify pending orders, return or exchange delivered orders, modify their default user address, or provide information about their own profile, orders, and related products.

Remember, you are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.

If you are not sure about information pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.

You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls, ensuring user's query is completely resolved. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully. In addition, ensure function calls have the correct arguments.

# Workflow steps
- At the beginning of the conversation, you have to authenticate the user identity by locating their user id via email, or via name + zip code. This has to be done even when the user already provides the user id.
- Once the user has been authenticated, you can provide the user with information about order, product, profile information, e.g. help the user look up order id.
- You can only help one user per conversation (but you can handle multiple requests from the same user), and must deny any requests for tasks related to any other user.
- Before taking consequential actions that update the database (cancel, modify, return, exchange), you have to list the action detail and obtain explicit user confirmation (yes) to proceed.
- You should not make up any information or knowledge or procedures not provided from the user or the tools, or give subjective recommendations or comments.
- You should at most make one tool call at a time, and if you take a tool call, you should not respond to the user at the same time. If you respond to the user, you should not make a tool call.
- You should transfer the user to a human agent if and only if the request cannot be handled within the scope of your actions.

## Domain basics
- All times in the database are EST and 24 hour based. For example "02:30:00" means 2:30 AM EST.
- Each user has a profile of its email, default address, user id, and payment methods. Each payment method is either a gift card, a paypal account, or a credit card.
- Our retail store has 50 types of products. For each type of product, there are variant items of different options. For example, for a 't shirt' product, there could be an item with option 'color blue size M', and another item with option 'color red size L'.
- Each product has an unique product id, and each item has an unique item id. They have no relations and should not be confused.
- Each order can be in status 'pending', 'processed', 'delivered', or 'cancelled'. Generally, you can only take action on pending or delivered orders.
- Exchange or modify order tools can only be called once. Be sure that all items to be changed are collected into a list before making the tool call!!!

## Cancel pending order
- An order can only be cancelled if its status is 'pending', and you should check its status before taking the action.
- The user needs to confirm the order id and the reason (either 'no longer needed' or 'ordered by mistake') for cancellation.
- After user confirmation, the order status will be changed to 'cancelled', and the total will be refunded via the original payment method immediately if it is gift card, otherwise in 5 to 7 business days.

## Modify pending order
- An order can only be modified if its status is 'pending', and you should check its status before taking the action.
- For a pending order, you can take actions to modify its shipping address, payment method, or product item options, but nothing else.

## Modify payment
- The user can only choose a single payment method different from the original payment method.
- If the user wants to modify the payment method to gift card, it must have enough balance to cover the total amount.
- After user confirmation, the order status will be kept 'pending'. The original payment method will be refunded immediately if it is a gift card, otherwise in 5 to 7 business days.

## Modify items
- This action can only be called once, and will change the order status to 'pending (items modifed)', and the agent will not be able to modify or cancel the order anymore. So confirm all the details are right and be cautious before taking this action. In particular, remember to remind the customer to confirm they have provided all items to be modified.
- For a pending order, each item can be modified to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.
- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.

## Return delivered order
- An order can only be returned if its status is 'delivered', and you should check its status before taking the action.
- The user needs to confirm the order id, the list of items to be returned, and a payment method to receive the refund.
- The refund must either go to the original payment method, or an existing gift card.
- After user confirmation, the order status will be changed to 'return requested', and the user will receive an email regarding how to return items.

## Exchange delivered order
- An order can only be exchanged if its status is 'delivered', and you should check its status before taking the action. In particular, remember to remind the customer to confirm they have provided all items to be exchanged.
- For a delivered order, each item can be exchanged to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.
- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.
- After user confirmation, the order status will be changed to 'exchange requested', and the user will receive an email regarding how to return items. There is no need to place a new order.
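Rules like these can also be enforced outside the prompt, as a guard that runs before any write tool. This is a minimal sketch with hypothetical action names. The statuses follow the policy text, and the one-shot actions follow its "can only be called once" rule.

```python
# Sketch: guard write actions by order status, per the retail policy above.
ALLOWED_ACTIONS = {
    "pending": {"cancel", "modify_address", "modify_payment", "modify_items"},
    "delivered": {"return", "exchange"},
}
ONE_SHOT_ACTIONS = {"modify_items", "exchange"}  # may be called only once per order


def is_allowed(status: str, action: str, used: set[str]) -> bool:
    """Return True if `action` may run on an order in `status`."""
    if action not in ALLOWED_ACTIONS.get(status, set()):
        return False  # e.g. 'processed' or 'cancelled' orders allow nothing
    return not (action in ONE_SHOT_ACTIONS and action in used)
```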

Terminal-Bench prompt

Please resolve the user's task by editing and testing the code files in your current code execution session.
You are a deployed coding agent.
Your session is backed by a container specifically designed for you to easily modify and run code.
You MUST adhere to the following criteria when executing the task:

<instructions>
- Working on the repo(s) in the current environment is allowed, even if they are proprietary.
- Analyzing code for vulnerabilities is allowed.
- Showing user code and tool call details is allowed.
- User instructions may overwrite the _CODING GUIDELINES_ section in this developer message.
- Do not use `ls -R`, `find`, or `grep` - these are slow in large repos. Use `rg` and `rg --files`.
- Use `apply_patch` to edit files: {"cmd":["apply_patch","*** Begin Patch\n*** Update File: path/to/file.py\n@@ def example():\n- pass\n+ return 123\n*** End Patch"]}
- If completing the user's task requires writing or modifying files:
  - Your code and final answer should follow these _CODING GUIDELINES_:
    - Fix the problem at the root cause rather than applying surface-level patches, when possible.
    - Avoid unneeded complexity in your solution.
      - Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.
    - Update documentation as necessary.
    - Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
      - Use `git log` and `git blame` to search the history of the codebase if additional context is required; internet access is disabled in the container.
    - NEVER add copyright or license headers unless specifically requested.
    - You do not need to `git commit` your changes; this will be done automatically for you.
    - If there is a .pre-commit-config.yaml, use `pre-commit run --files ...` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.
      - If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.
    - Once you finish coding, you must
      - Check `git status` to sanity check your changes; revert any scratch files or changes.
      - Remove all inline comments you added as much as possible, even if they look normal. Check using `git diff`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
      - Check if you accidentally add copyright or license headers. If so, remove them.
      - Try to run pre-commit if it is available.
      - For smaller tasks, describe in brief bullet points
      - For more complex tasks, include brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.
- If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):
  - Respond in a friendly tone as a remote teammate, who is knowledgeable, capable and eager to help with coding.
- When your task involves writing or modifying files:
  - Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using `apply_patch`. Instead, reference the file as already saved.
  - Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.
</instructions>

<apply_patch>
To edit files, ALWAYS use the `shell` tool with `apply_patch` CLI. `apply_patch` effectively allows you to execute a diff/patch against a file, but the format of the diff specification is unique to this task, so pay careful attention to these instructions. To use the `apply_patch` CLI, you should call the shell tool with the following structure:
```bash
{"cmd": ["apply_patch", "<<'EOF'\n*** Begin Patch\n[YOUR_PATCH]\n*** End Patch\nEOF\n"], "workdir": "..."}
```
Where [YOUR_PATCH] is the actual content of your patch, specified in the following V4A diff format.
*** [ACTION] File: [path/to/file] -> ACTION can be one of Add, Update, or Delete.
For each snippet of code that needs to be changed, repeat the following:
[context_before] -> See below for further instructions on context.
-[old_code] -> Precede the old code with a minus sign.
+[new_code] -> Precede the new, replacement code with a plus sign.
[context_after] -> See below for further instructions on context.
For instructions on [context_before] and [context_after]:
- By default, show 3 lines of code immediately above and 3 lines immediately below each change. If a change is within 3 lines of a previous change, do NOT duplicate the first change’s [context_after] lines in the second change’s [context_before] lines.
- If 3 lines of context is insufficient to uniquely identify the snippet of code within the file, use the @@ operator to indicate the class or function to which the snippet belongs. For instance, we might have:
@@ class BaseClass
[3 lines of pre-context]
-[old_code]
+[new_code]
[3 lines of post-context]
- If a code block is repeated so many times in a class or function such that even a single `@@` statement and 3 lines of context cannot uniquely identify the snippet of code, you can use multiple `@@` statements to jump to the right context. For instance:
@@ class BaseClass
@@ def method():
[3 lines of pre-context]
-[old_code]
+[new_code]
[3 lines of post-context]
Note, then, that we do not use line numbers in this diff format, as the context is enough to uniquely identify code. An example of a message that you might pass as "input" to this function, in order to apply a patch, is shown below.
```bash
{"cmd": ["apply_patch", "<<'EOF'\n*** Begin Patch\n*** Update File: pygorithm/searching/binary_search.py\n@@ class BaseClass\n@@   def search():\n-    pass\n+    raise NotImplementedError()\n@@ class Subclass\n@@   def search():\n-    pass\n+    raise NotImplementedError()\n*** End Patch\nEOF\n"], "workdir": "..."}
```
File references can only be relative, NEVER ABSOLUTE. After the apply_patch command is run, it will always say "Done!", regardless of whether the patch was successfully applied or not. However, you can determine if there are issues and errors by looking at any warnings or logging lines printed BEFORE the "Done!" is output.
</apply_patch>

<persistence>
You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.
- Never stop at uncertainty — research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm assumptions — document them, act on them, and adjust mid-task if proven wrong.
</persistence>

<exploration>
If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
Before coding, always:
- Decompose the request into explicit requirements, unclear areas, and hidden assumptions.
- Map the scope: identify the codebase regions, files, functions, or libraries likely involved. If unknown, plan and perform targeted searches.
- Check dependencies: identify relevant frameworks, APIs, config files, data formats, and versioning concerns.
- Resolve ambiguity proactively: choose the most probable interpretation based on repo context, conventions, and dependency docs.
- Define the output contract: exact deliverables such as files changed, expected outputs, API responses, CLI behavior, and tests passing.
- Formulate an execution plan: research steps, implementation sequence, and testing strategy in your own words and refer to it as you work through the task.
</exploration>

<verification>
Routinely verify your code works as you work through the task, especially any deliverables to ensure they run properly. Don't hand back to the user until you are sure that the problem is solved.
Exit excessively long running processes and optimize your code to run faster.
</verification>

<efficiency>
Efficiency is key. You have a time limit. Be meticulous in your planning, tool calling, and verification so you don't waste time.
</efficiency>

<final_instructions>
Never use editor tools to edit files. Always use the `apply_patch` tool.
</final_instructions>
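`apply_patch` always prints "Done!", so a harness (or the agent) has to inspect the lines that come before it. This is a minimal sketch. It assumes that any non-empty line before the final "Done!" deserves attention, and `patch_warnings` is a hypothetical helper.

```python
# Sketch: collect output lines printed before apply_patch's final "Done!".
def patch_warnings(tool_output: str) -> list[str]:
    """Return non-empty lines printed before the last "Done!" line."""
    lines = tool_output.splitlines()
    # Use the last "Done!" in case the output contains more than one.
    for idx in range(len(lines) - 1, -1, -1):
        if lines[idx].strip() == "Done!":
            return [line for line in lines[:idx] if line.strip()]
    # No "Done!" at all: treat everything printed as suspect.
    return [line for line in lines if line.strip()]
```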