本地大模型的最佳实践系列——ollama（附docker-compose命令）

显示全部楼层

引言

本地运用大模型的方式，也有很多工具。多个使用下来，最便捷的方式还是 ollama。其生态很丰富，除了命令行也有web端，与主流的工具框架都有配合。并且在 langchain 、 llama-index 中也都有提供相应模块。

启动和安装

官方github主页给了不同系统安装的方式，包括docker。我建议使用docker进行安装，用docker-compose 启动。我的docker-compose.yml文件如下：包含前端程序

https://github.com/ollama/ollama
https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image

version:'3.8'

services:
ollama:
volumes:
-./ollama-data:/root/.ollama#将本地文件夹挂载到容器中的/root/.ollama目录（模型下载位置）
container_namellama
pull_policy:always
tty:true
restart:unless-stopped
imagellama/ollama:latest
ports:
-11434:11434#OllamaAPI端口

open-webui:
build:
context:.
args:
OLLAMA_BASE_URL:'/ollama'
dockerfileockerfile
image:ghcr.io/open-webui/open-webui:main
container_namepen-webui
volumes:
-./open-webui-data:/app/backend/data#前端页面挂载位置
depends_on:
-ollama
ports:
-${OPEN_WEBUI_PORT-3005}:8080
environment:
-'OLLAMA_BASE_URL=http://ollama:11434'
-'WEBUI_SECRET_KEY='
extra_hosts:
-host.docker.internal:host-gateway
restart:unless-stopped

docker compose up -d 启动之后，可以下载模型和聊天。

下载模型

大语言模型运用下面的命令下载模型，模型的列表可以在前面给的官网页面搜索，搜索完之后。通过下面的命令下载：比如下载 qwen:4b 模型

dockerexec-itollamaollamapullqwen:4b

下载后的模型存储在 ./ollama-data 下。

embedding 模型另外，如果需要下载 embedding模型也可以同样在官网进行搜索，复制名称后下载。比如 bge模型

dockerexec-itollamaollamapullznbang/bge:large-zh-v1.5-f16

本地聊天

浏览器 localhost:3005端口访问页面。注册完之后，就可以选取本地的模型聊天了。比如选择 qwen:4b 模型。

接口调用

其实之所以在本地搭建这么一套环境主要目的还是为了在本地进行学习和实验。为此，接口调用就是重中之重。ollama 可以非常便捷的启动支持openai接口的服务。对接下游应用就非常方便了。

上面的docker compose 文件中已经指定了ollama服务端口 11434，因此相应的服务接口为：

聊天接口通过 stream 参数，控制是否流式返回。

curlhttp://localhost:11434/api/generate-d'{
"model":"qwen:4b",
"prompt":"Whyistheskyblue?",
"stream":false
}'

embedding 接口

curlhttp://localhost:11434/api/embeddings-d'{
"model":"milkey/m3e:large-f16",
"prompt":"Representthissentenceforsearchingrelevantpassages:TheskyisbluebecauseofRayleighscattering"
}'

那么在代码中，可以借助openai 的接口样式。分别替换 base_url、model_name即可，非常方便。

fromlangchain.chat_modelsimportChatOpenAI
fromlangchain_community.embeddingsimportOllamaEmbeddings

base_url='http://localhost:11434/v1/'
model_name="qwen:4b"

#语言模型
llm=ChatOpenAI(openai_api_key="",base_url=base_url,model=model_name)
#embedding
model_embed="milkey/m3e:large-f16"
embedding=OpenAIEmbeddings(model=model_embed,base_url=base_url)

此外，ollama 还支持多模态模型，大家也可以根据自己的需要尝试。

总结

感谢开源社区，提供了这么便捷的工具，极大的方便了本地实验的效率。同时丰富的社区，也为进一步拓展学习提供了材料。

如果你还没有尝试，强烈推荐！

本地大模型的最佳实践系列——ollama（附docker-compose命令）

引言

相关信息

启动和安装

下载模型

本地聊天

接口调用

总结