Building a Privacy-Preserving Local RAG System with Ollama and Weaviate

Introduction

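Building a prototype of an application based on large language models (LLMs) is fun and easy. But once you want to use it in production at your company, you immediately run into challenges such as how to reduce hallucinations and how to protect data privacy. Retrieval-Augmented Generation (RAG) has been shown to reduce hallucinations effectively, and local deployment is one of the best options for protecting privacy.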

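This article shows how to implement a RAG-based chatbot in Python in a local environment with no external dependencies, using only the following local components:

A local LLM and a local embedding model served by Ollama
A local Weaviate vector database instance running in Docker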

How to Set Up a Local Language Model with Ollama

Had you known that setting up an AI prototype with Ollama takes less than five minutes, you might have started building AI applications sooner.

Step 1: Download and install Ollama

Download the Ollama build for your operating system from the official website and follow the installation steps.

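Step 2: Download the models

Open a terminal and download the LLM and the embedding model of your choice. This tutorial uses Meta's llama2 as the LLM and all-minilm as the embedding model.

ollama pull llama2
ollama pull all-minilm

Other available embedding models include mxbai-embed-large (334M parameters) and nomic-embed-text (137M parameters).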

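Step 3: Install the Ollama Python library

Because we will implement the RAG pipeline in Python, you need to install the Ollama Python library. This tutorial uses version 0.1.8.

pip install ollama

Ollama also provides a REST API and a JavaScript library.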
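If you prefer not to install the Python library, the REST API can be called with the standard library alone. The following is a minimal sketch, assuming Ollama is listening on its default local address (http://localhost:11434) and that the non-streaming /api/generate endpoint is used. The helper names build_generate_request and generate are illustrative and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call, once the llama2 model has been pulled:
# print(generate("llama2", "Why is the sky blue?"))
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches the network.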

How to Set Up a Local Vector Database Instance with Docker

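In a local RAG pipeline, you will want to host the vector database locally as well. This section shows how to host the open-source Weaviate vector database locally with Docker.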

Step 1: Download and install Docker

Install the docker (Docker 17.09.0 or later) and docker-compose (Docker Compose V2) CLI tools.

Step 2: Start a Docker container with a Weaviate instance

You can now run the following command in a terminal to start a Weaviate instance from the default Docker image.

docker run -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:1.24.8
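The container can take a few seconds to start accepting requests. The sketch below polls Weaviate's readiness endpoint, assuming the instance is reachable at http://localhost:8080 (the HTTP port mapped above). The helper names weaviate_is_ready and wait_until_ready are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def weaviate_is_ready(base_url: str = "http://localhost:8080") -> bool:
    """Return True if Weaviate's readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `check` until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False


# Example: block for up to about 30 seconds before connecting a client.
# if not wait_until_ready(weaviate_is_ready):
#     raise RuntimeError("Weaviate did not become ready in time")
```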

Step 3: Install the Weaviate Python client

Because we will implement the RAG pipeline in Python, you need to install the Weaviate Python client. This tutorial uses version 4.5.5.

pip install -U weaviate-client
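Since the tutorial was written against specific library versions (ollama 0.1.8 and weaviate-client 4.5.5), it can help to see what is actually installed before running the examples. This is a small sketch using only the standard library. The helpers installed_version and version_report are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions used in this tutorial, as stated in the text.
TUTORIAL_VERSIONS = {"ollama": "0.1.8", "weaviate-client": "4.5.5"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to a (tutorial version, installed version or None) pair."""
    return {name: (wanted, installed_version(name)) for name, wanted in expected.items()}


# Example:
# for name, (wanted, found) in version_report(TUTORIAL_VERSIONS).items():
#     print(f"{name}: tutorial used {wanted}, installed {found}")
```

Newer releases may still work, but a mismatch is a reasonable first thing to check if a call in this tutorial fails.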

How to Build a Local RAG Pipeline

With the setup above complete, you can start implementing the RAG pipeline.

The following example is based on a post on the Ollama blog (https://ollama.com/blog/embedding-models).

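Preparation: Import the data into the vector database

The first step in building a RAG pipeline is to import your data into the vector database. To do so, you need to prepare your data and generate an embedding for each item.

Below are some sample documents used in the Ollama blog post.

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]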

Next, connect to the vector database instance running locally.

import weaviate

client = weaviate.connect_to_local()

At startup, this vector database is empty. To populate it with your data, you first need to define the structure that will store it (in this example, a collection called docs). Because the sample data is just a simple list of strings, you can define a single property named text with the data type DataType.TEXT.

import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

# Create a new data collection
collection = client.collections.create(
    name = "docs", # Name of the data collection
    properties=[
        Property(name="text", data_type=DataType.TEXT), # Name and data type of the property
    ],
)

Now you can load the data into the predefined structure. To do so, iterate over your documents and embed each data object with Ollama's embeddings() method. The text and its embedding are then stored together in the vector database.

import ollama

# Store each document in a vector embedding database
with collection.batch.dynamic() as batch:
    for i, d in enumerate(documents):
        # Generate embeddings
        response = ollama.embeddings(model = "all-minilm", prompt = d)

        # Add data object with text and embedding
        batch.add_object(
            properties = {"text" : d},
            vector = response["embedding"],
        )
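One possible variation on the import loop is to separate embedding from storage, so the embedding step can be exercised without a running model or database. The sketch below is not part of the original example. The helper prepare_objects and its embed parameter are hypothetical names, and the dimension check is an added safeguard against accidentally mixing embedding models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def prepare_objects(
    documents: Sequence[str],
    embed: Callable[[str], Vector],
) -> List[Tuple[Dict[str, str], Vector]]:
    """Pair each document's properties with its embedding, checking dimensions agree."""
    prepared: List[Tuple[Dict[str, str], Vector]] = []
    expected_dim = None
    for doc in documents:
        vector = list(embed(doc))
        if expected_dim is None:
            expected_dim = len(vector)  # First vector sets the expected size
        elif len(vector) != expected_dim:
            raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
        prepared.append(({"text": doc}, vector))
    return prepared


# Wiring it to the components used in this tutorial might look like this:
# embed = lambda d: ollama.embeddings(model="all-minilm", prompt=d)["embedding"]
# with collection.batch.dynamic() as batch:
#     for properties, vector in prepare_objects(documents, embed):
#         batch.add_object(properties=properties, vector=vector)
```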

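Step 1: Retrieve context

At inference time, you will want to retrieve additional context for your question. To do so, run a simple similarity search for the question (for example, "What animals are llamas related to?").

For the similarity search, you first generate a vector embedding for the search query (here, the question) with the embeddings() method, just as in the import stage. You then pass the resulting embedding to Weaviate's near_vector() method and specify that only the closest result should be returned (limit = 1).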

# An example prompt
prompt = "What animals are llamas related to?"

# Generate an embedding for the prompt and retrieve the most relevant doc
response = ollama.embeddings(
    model = "all-minilm",
    prompt = prompt,
)

results = collection.query.near_vector(near_vector = response["embedding"], limit = 1)

data = results.objects[0].properties['text']

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
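To make the similarity search less of a black box, here is a brute-force sketch of what a nearest-vector query computes conceptually, assuming cosine similarity as the measure (to our knowledge Weaviate's default distance metric is cosine). Weaviate itself uses an approximate index instead of scanning every vector, so this illustrates the idea and not its implementation. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    limit: int = 1,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the `limit` most similar vectors."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy example with 2-dimensional "embeddings":
# nearest([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], limit=1)
```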

Step 2: Augment the prompt

Next, fill a prompt template with the original question and the retrieved context:

prompt_template = f"Using this data: {data}. Respond to this prompt: {prompt}"
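If you later retrieve more than one passage (for example, by raising limit), the same template can be filled from a list of contexts. A minimal sketch, where build_prompt is a hypothetical helper and joining passages with a single space is just one simple choice:

```python
from typing import Sequence


def build_prompt(contexts: Sequence[str], question: str) -> str:
    """Join retrieved passages and wrap them in the tutorial's prompt template."""
    # Drop empty passages and surrounding whitespace before joining.
    data = " ".join(c.strip() for c in contexts if c.strip())
    return f"Using this data: {data}. Respond to this prompt: {question}"


# Example:
# build_prompt([o.properties["text"] for o in results.objects], prompt)
```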

Step 3: Generate the answer

Finally, use Ollama's generate() method to generate an answer from the augmented prompt.

# Generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
    model = "llama2",
    prompt = prompt_template,
)

print(output['response'])

Llamas are members of the camelid family, which means they are closely related to other animals in the same family, including:
1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are found in the Andean region and are known for their soft, woolly coats.
2. Camels: Camels are large, even-toed ungulates that are closely related to llamas and vicuñas. They are found in hot, dry climates around the world and are known for their ability to go without water for long periods of time.
3. Guanacos: Guanacos are large, wild animals that are related to llamas and vicuñas. They are found in the Andean region and are known for their distinctive long necks and legs.
4. Llama-like creatures: There are also other animals that are sometimes referred to as "llamas," such as the lama-like creatures found in China, which are actually a different species altogether. These creatures are not closely related to vicuñas or camels, but are sometimes referred to as "llamas" due to their physical similarities.
In summary, llamas are related to vicuñas, camels, guanacos, and other animals that are sometimes referred to as "llamas."
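The three steps can also be tied together in one function. The sketch below passes the embedding, search, and generation steps in as callables, so the control flow can be tried with stand-ins before Ollama and Weaviate are running. The function name answer and its parameters are illustrative and do not come from either library.

```python
from typing import Callable, List, Sequence


def answer(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float]], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run retrieve -> augment -> generate for a single question."""
    query_vector = embed(question)    # Step 1: embed the question...
    contexts = search(query_vector)   # ...and retrieve the closest passages
    data = " ".join(contexts)
    augmented = f"Using this data: {data}. Respond to this prompt: {question}"  # Step 2
    return generate(augmented)        # Step 3


# Wiring it to the components used in this tutorial might look like this:
# embed = lambda q: ollama.embeddings(model="all-minilm", prompt=q)["embedding"]
# search = lambda v: [o.properties["text"]
#                     for o in collection.query.near_vector(near_vector=v, limit=1).objects]
# generate = lambda p: ollama.generate(model="llama2", prompt=p)["response"]
# print(answer("What animals are llamas related to?", embed, search, generate))
```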

Summary

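Using a very simple RAG pipeline as an example, this article showed how to build a privacy-preserving local RAG system from local components: language models served by Ollama and a self-hosted Weaviate vector database running in Docker.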

Resources:

Ollama download: https://ollama.com/
Docker download: https://www.docker.com/
Weaviate vector database: https://weaviate.io/blog/what-is-a-vector-database
Ollama blog, embedding models: https://ollama.com/blog/embedding-models
GitHub code: https://github.com/qianniucity/llm_notebooks/blob/main/rag/Ollama_Weaviate_Local_rag.ipynb