Ollama update adds streaming responses with tool calling


Real-time interaction and immediate responses are central to the experience of an AI application, but blocking tool calls often interrupt the flow of content and leave users waiting while the model talks to external tools. Ollama recently released v0.8, which adds streaming responses with tool calling, so chat applications can now call tools and show the results in real time, just as they stream ordinary text.

Ollama now has upgraded tool support

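With this update, any chat application can call external tools while the model is still generating, and show the whole process to the user as it happens: the model's thinking, the tool-call instructions, and the final text reply. The feature is supported in Ollama's Python and JavaScript libraries and in the REST API via cURL.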

Highlights of this release:

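Immediate tool calls alongside streamed content: applications no longer have to wait for the model's complete response before handling tool calls; generated content and tool-call instructions are streamed together, chunk by chunk.
New incremental parser: Ollama built a new parser that focuses on understanding the structure of a tool call rather than just looking for JSON. This lets Ollama offer:

  Real-time separation: while user-facing content streams out, tokens belonging to a tool call are detected, suppressed, and parsed.
  Broad model compatibility: it works whether or not the model was trained with tool-specific tokens; it can even handle a partial prefix emitted by the model, and it falls back to JSON parsing when necessary.
  Better accuracy: prefix matching and state management make tool calling more reliable and avoid the duplicated or mis-parsed tool calls that could occur before.
Wide model support: Qwen 3, Devstral, the Qwen2.5 series, Llama 3.1, Llama 4, and many other models that support tool calling.
Developer-friendly integration: clear cURL, Python, and JavaScript examples make it quick to get started.
Model Context Protocol (MCP) improvements: developers using MCP can now stream chat content and tool calls as well. Ollama also suggests that a larger context window (for example, 32k) can further improve tool-calling performance and result quality.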
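The prefix matching and state tracking described for the incremental parser can be illustrated with a small Python sketch. This is not Ollama's implementation: the `<tool_call>` and `</tool_call>` tags, the class name `IncrementalToolParser`, and the JSON payload between the tags are all assumptions chosen for illustration. The point is only to show how a stream can be split into visible content and suppressed tool-call text when a tag may arrive split across chunks.

```python
import json


class IncrementalToolParser:
    """Toy streaming parser: passes content through, suppresses tool-call text."""

    def __init__(self, open_tag="<tool_call>", close_tag="</tool_call>"):
        # Both tags are illustrative assumptions, not Ollama's real templates.
        self.open_tag = open_tag
        self.close_tag = close_tag
        self.buffer = ""
        self.in_tool_call = False

    def _held_back(self, text):
        """Length of the longest suffix of text that is a proper prefix of open_tag."""
        for size in range(min(len(text), len(self.open_tag) - 1), 0, -1):
            if self.open_tag.startswith(text[-size:]):
                return size
        return 0

    def feed(self, chunk):
        """Consume one streamed chunk; return (visible_content, tool_calls)."""
        self.buffer += chunk
        content, tool_calls = [], []
        while True:
            if self.in_tool_call:
                end = self.buffer.find(self.close_tag)
                if end == -1:
                    break  # tool call still incomplete: keep suppressing it
                tool_calls.append(json.loads(self.buffer[:end]))
                self.buffer = self.buffer[end + len(self.close_tag):]
                self.in_tool_call = False
            else:
                start = self.buffer.find(self.open_tag)
                if start != -1:
                    # Emit the text before the tag, then switch state.
                    content.append(self.buffer[:start])
                    self.buffer = self.buffer[start + len(self.open_tag):]
                    self.in_tool_call = True
                    continue
                # No full tag yet: hold back a possible partial tag prefix.
                hold = self._held_back(self.buffer)
                cut = len(self.buffer) - hold
                content.append(self.buffer[:cut])
                self.buffer = self.buffer[cut:]
                break
        return "".join(content), tool_calls


parser = IncrementalToolParser()
chunks = [
    'Checking the weather. <tool',
    '_call>{"name": "get_current_weather", ',
    '"arguments": {"location": "Toronto"}}</tool_call>',
    ' Done.',
]
for piece in chunks:
    visible, calls = parser.feed(piece)
    print(repr(visible), calls)
```

Holding back only the longest suffix that could still grow into the opening tag keeps ordinary text flowing without ever leaking half a tag to the user.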

On the implementation side, developers can enable the feature as follows:

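REST API (cURL): set "stream": true in the /api/chat request and define the available tools in the tools array.
Python: when calling ollama.chat(), set stream=True and pass the tool definitions (which can be function objects) to the tools parameter.
JavaScript: when calling ollama.chat(), set stream: true and pass tool schema objects to the tools parameter.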

Here is a Python example (calling a custom math function):

from typing import Iterator

from ollama import ChatResponse, chat


# Define the Python function the model may call
def subtract_two_numbers(a: int, b: int) -> int:
    """
    Subtract two numbers

    Args:
        a (int): The first number
        b (int): The second number

    Returns:
        int: The difference a - b
    """
    return a - b


messages = [{'role': 'user', 'content': 'what is three minus one?'}]

# With stream=True, chat() returns an iterator of ChatResponse chunks
response: Iterator[ChatResponse] = chat(
    model='qwen3',
    messages=messages,
    tools=[subtract_two_numbers],  # Python SDK supports passing tools as functions
    stream=True,
)

for chunk in response:
    # Print model content
    print(chunk.message.content, end='', flush=True)
    # Print the tool call
    if chunk.message.tool_calls:
        print(chunk.message.tool_calls)

Expected output (illustrative; it depends on the model's behavior and on whether the user's question matches a tool):

<think>
Okay, the user is asking ...
</think>

[ToolCall(function=Function(name='subtract_two_numbers', arguments={'a': 3, 'b': 1}))]
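The stream only reports which tool the model wants to call; the application still has to run it. The sketch below shows one way to dispatch a streamed tool call to a local function. To stay runnable without a server it uses `SimpleNamespace` stand-ins that mimic the `chunk.message.content` and `chunk.message.tool_calls` attributes seen in the output above. `AVAILABLE_TOOLS`, `run_tool_calls`, and `fake_chunk` are hypothetical names introduced here, not part of the Ollama SDK.

```python
from types import SimpleNamespace


def subtract_two_numbers(a: int, b: int) -> int:
    """Subtract b from a."""
    return a - b


# Registry mapping tool names to local callables (illustrative).
AVAILABLE_TOOLS = {"subtract_two_numbers": subtract_two_numbers}


def run_tool_calls(chunks, tools):
    """Collect streamed text and execute each tool call as it arrives."""
    text_parts, results = [], []
    for chunk in chunks:
        if chunk.message.content:
            text_parts.append(chunk.message.content)
        for call in chunk.message.tool_calls or []:
            func = tools.get(call.function.name)
            if func is None:
                # Unknown tool name: record it without executing anything.
                results.append((call.function.name, None))
                continue
            results.append((call.function.name, func(**call.function.arguments)))
    return "".join(text_parts), results


def fake_chunk(content="", tool_calls=None):
    """Stand-in for a streamed chunk so the sketch runs without a server."""
    return SimpleNamespace(
        message=SimpleNamespace(content=content, tool_calls=tool_calls)
    )


call = SimpleNamespace(
    function=SimpleNamespace(name="subtract_two_numbers", arguments={"a": 3, "b": 1})
)
stream = [fake_chunk("<think>"), fake_chunk("...</think>"), fake_chunk(tool_calls=[call])]
text, results = run_tool_calls(stream, AVAILABLE_TOOLS)
print(results)
```

In a real application the same loop would iterate over the object returned by `chat(..., stream=True)`, and each tool result would typically be appended to `messages` with role `tool` before asking the model to continue.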

cURL example (weather query):

curl http://localhost:11434/api/chat -d '{
 "model": "qwen3",
 "messages": [
  {
   "role": "user",
   "content": "What is the weather today in Toronto?"
  }
 ],
 "stream": true,
 "tools": [
  {
   "type": "function",
   "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
     "type": "object",
     "properties": {
      "location": {
       "type": "string",
       "description": "The location to get the weather for, e.g. San Francisco, CA"
      },
      "format": {
       "type": "string",
       "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
       "enum": ["celsius", "fahrenheit"]
      }
     },
     "required": ["location", "format"]
    }
   }
  }
 ]
}'

Streamed output:

...
{
"model":"qwen3",
"created_at":"2025-05-27T22:54:57.641643Z",
"message": {
 "role":"assistant",
 "content":"celsius"
 },
"done":false
}
{
"model":"qwen3",
"created_at":"2025-05-27T22:54:57.673559Z",
"message": {
 "role":"assistant",
 "content":"</think>"
 },
"done":false
}
{
"model":"qwen3",
"created_at":"2025-05-27T22:54:58.100509Z",
"message": {
 "role":"assistant",
 "content":"",
 "tool_calls": [
   {
   "function": {
    "name":"get_current_weather",
    "arguments": {
     "format":"celsius",
     "location":"Toronto"
     }
    }
   }
  ]
 },
"done":false
}
...
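A client consuming this stream has to fold the chunks back into a single reply. The following sketch assumes each chunk arrives as one JSON object per line (the objects above are pretty-printed for readability); `collect_stream` is a hypothetical helper, and the sample lines are abbreviated from the output shown above.

```python
import json


def collect_stream(lines):
    """Fold newline-delimited JSON chunks into final text plus tool calls."""
    content, tool_calls = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        message = chunk.get("message", {})
        content.append(message.get("content", ""))
        tool_calls.extend(message.get("tool_calls", []))
        if chunk.get("done"):
            break  # final chunk of the response
    return "".join(content), tool_calls


sample = [
    '{"model":"qwen3","message":{"role":"assistant","content":"</think>"},"done":false}',
    '{"model":"qwen3","message":{"role":"assistant","content":"",'
    '"tool_calls":[{"function":{"name":"get_current_weather",'
    '"arguments":{"format":"celsius","location":"Toronto"}}}]},"done":false}',
]
text, calls = collect_stream(sample)
print(text)
print(calls[0]["function"]["name"], calls[0]["function"]["arguments"])
```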

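Ollama also notes that for scenarios needing high-precision tool calling or complex interactions, you can try enlarging the model's context window through num_ctx in options (for example, 32000), as shown below. Doing so increases memory use.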

# Update the context window via options.num_ctx
curl -X POST "http://localhost:11434/api/chat" -d '{
 "model": "llama3.2",
 "messages": [
  {
   "role": "user",
   "content": "why is the sky blue?"
  }
 ],
 "options": {
  "num_ctx": 32000
 }
}'
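The same request can be assembled from Python using only the standard library. This is a sketch under stated assumptions: `build_chat_request` is a hypothetical helper, the host is the default local address used in the cURL examples, and the request is built but not sent so that the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(model, messages, num_ctx, host="http://localhost:11434"):
    """Build (but do not send) a POST to /api/chat with a larger context window."""
    payload = {
        "model": model,
        "messages": messages,
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "llama3.2",
    [{"role": "user", "content": "why is the sky blue?"}],
    num_ctx=32000,
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```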