Spring AI + Milvus: Building RAG-Based Intelligent Q&A in Practice

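Introduction

"We have too many documents, and finding information takes forever!" "Customer service answers are always boilerplate and never address the user's actual question!" Behind these pain points lie the limits of traditional keyword search and rule engines. Today, semantic search and Retrieval-Augmented Generation (RAG) are becoming effective tools for these problems. They let an application capture the intent behind a user's question, pull the relevant information out of a large body of material, and even generate a natural, fluent answer.

As a Java developer, how do you build such an application quickly? The good news is that Spring AI brings convenient AI integration to the Java ecosystem. This article walks through building a RAG-based question-answering system with Spring Boot + Spring AI + Milvus (a vector database). We will convert local knowledge-base documents into vectors and store them, retrieve the document fragments most semantically relevant to a user's question, and have a large language model (such as OpenAI GPT) generate an accurate, well-grounded answer.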

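1. Core Concepts

Vector (Embedding): an AI model converts text, images, and other information into a set of numbers (a vector) in a high-dimensional space. Semantically similar items end up with vectors that are close together in that space.
Vector Database: a database designed specifically to store, index, and query high-dimensional vectors efficiently. It can quickly find the vectors most similar to a query vector.

Milvus: an open-source, high-performance, scalable vector database that is well suited to production environments.
Pinecone: a popular cloud-hosted vector database service (the examples in this article use Milvus).
Semantic Search: unlike traditional keyword matching, it interprets the meaning of the query and returns the results that are most relevant in meaning.
Retrieval-Augmented Generation (RAG):

Retrieve: when the user asks a question, convert the question into a vector and use it to search the vector database for the most relevant knowledge fragments.
Augment: add the retrieved fragments to the prompt as context.
Generate: pass the original question together with that context to the large language model (LLM), so that it generates a more accurate and relevant answer grounded in the context. This mitigates the LLM "hallucination" problem (making up facts).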
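
To make "distance between vectors" and the retrieve step concrete, here is a minimal in-memory sketch. It assumes cosine similarity as the metric (Milvus supports several metrics, and this article does not say which one is configured), and the class name ToyVectorSearch, the Entry record, and the two-dimensional vectors are hypothetical. Real embeddings have hundreds or thousands of dimensions, and Milvus uses an index rather than a full scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy in-memory semantic search; a real system delegates this work to the vector database. */
final class ToyVectorSearch {

    /** A stored text fragment with its (hypothetical, hand-made) embedding. */
    record Entry(String text, double[] vector) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the texts of the k entries most similar to the query vector (brute-force scan). */
    static List<String> topK(List<Entry> entries, double[] query, int k) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> cosine(e.vector(), query)).reversed());
        return sorted.stream().limit(k).map(Entry::text).toList();
    }
}
```

The texts returned by topK play the role of the context that is added to the prompt in the augment step.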

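2. Tech Stack and Environment Setup

Java 17+
Spring Boot 3.2+
Spring AI (0.8.1+): the Spring project for AI, providing a unified interface to Embedding models, Chat models, Vector Stores, and more. The code in this article targets the 0.8.x API; later releases renamed several types (for example, the ChatClient interface used here became ChatModel in 1.0), so adjust accordingly if you use a newer version.
Milvus: the vector database. We will use Docker to start a Standalone instance quickly.
OpenAI API (or a compatible alternative such as Ollama): used to generate text embeddings and chat answers. OpenAI requires a valid API key.
Dependency management: Maven or Gradle

3. Hands-On Steps

Step 1: Start the Milvus Vector Database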

docker pull milvusdb/milvus:latest
docker run -d --name milvus-standalone \
  -p 19530:19530 \
  -p 9091:9091 \
  milvusdb/milvus:latest

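Port 19530 is the client port that Spring AI connects to. Port 9091 serves Milvus's health and metrics endpoints, so http://localhost:9091/healthz is a quick way to check that the instance is up; recent Milvus releases also serve a web UI at http://localhost:9091/webui. If the container exits right after starting, follow the standalone Docker instructions in the Milvus documentation for your version, because the image may need an explicit start command and additional settings.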

Step 2: Create the Spring Boot Project and Add Dependencies

Create the project with start.spring.io and add these dependencies:

 org.springframework.boot:spring-boot-starter-web
 org.springframework.ai:spring-ai-openai-spring-boot-starter # for OpenAI
 # or, for a local model (for example via Ollama):
 # org.springframework.ai:spring-ai-ollama-spring-boot-starter
 org.springframework.ai:spring-ai-milvus-store-spring-boot-starter # Milvus integration

Step 3: Configure application.properties

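# Spring AI - OpenAI configuration (replace YOUR_OPENAI_API_KEY with your own key)
spring.ai.openai.api-key=YOUR_OPENAI_API_KEY
# Embedding model
spring.ai.openai.embedding.options.model=text-embedding-ada-002
# Chat model
spring.ai.openai.chat.options.model=gpt-3.5-turbo

# Milvus vector store configuration
# Property names have changed between Spring AI releases; verify every key below
# against the reference documentation for the version you use.
spring.ai.vectorstore.milvus.client.host=localhost
spring.ai.vectorstore.milvus.client.port=19530
# Custom collection name
spring.ai.vectorstore.milvus.collection-name=my_knowledge_base
# Output dimension of text-embedding-ada-002
spring.ai.vectorstore.milvus.embedding-dimension=1536
# Whether to drop and recreate the collection on startup (can be set to true for the first run).
# Confirm that this key exists in your Spring AI version before relying on it.
spring.ai.vectorstore.milvus.drop-collection-on-startup=false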
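
One detail of the .properties format is easy to get wrong: a # starts a comment only at the beginning of a line. Text after a value, including any # note, becomes part of the value. The sketch below demonstrates this with the JDK's java.util.Properties parser. The class name InlineCommentPitfall and the key collection-name are illustrative, and the assumption is that Spring Boot's own loader treats trailing text the same way.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Shows how the .properties format treats a '#' that follows a value. */
final class InlineCommentPitfall {

    /** Parses .properties text and returns the value stored under the given key. */
    static String valueOf(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String inline = "collection-name=my_knowledge_base # custom name\n";
        String ownLine = "# custom name\ncollection-name=my_knowledge_base\n";
        // The first value includes the trailing "# custom name"; the second does not.
        System.out.println("[" + valueOf(inline, "collection-name") + "]");
        System.out.println("[" + valueOf(ownLine, "collection-name") + "]");
    }
}
```

Keeping every comment on its own line avoids surprises such as a collection name or a numeric dimension that silently contains extra text.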

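Step 4: Build the Knowledge Base (Vectorize and Store Documents)

Create a VectorStoreInitializer service that loads local documents (TXT files in this example) into Milvus when the application starts: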

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import jakarta.annotation.PostConstruct;
import java.util.List;

@Service
public class VectorStoreInitializer {

  private final VectorStore vectorStore;

  // Assumes the knowledge-base documents live under resources/docs
  @Value("classpath:/docs/*.txt")
  private Resource[] documentResources;

  @Autowired
  public VectorStoreInitializer(VectorStore vectorStore) {
    this.vectorStore = vectorStore;
  }

  @PostConstruct
  public void init() {
    // Iterate over the document resources
    for (Resource resource : documentResources) {
      try {
        // Read the text file with Spring AI's TextReader
        TextReader textReader = new TextReader(resource);
        List<Document> documents = textReader.get();
        // Optionally split the content into smaller chunks here.
        // Simplified in this example: each file is stored as a single Document.
        // Add the documents to the vector store (this also computes their embeddings)
        vectorStore.add(documents);
        System.out.println("Loaded documents from: " + resource.getFilename());
      } catch (Exception e) {
        System.err.println("Error loading document: " + resource.getFilename() + ", " + e.getMessage());
      }
    }
    System.out.println("Knowledge base initialization complete!");
  }
}
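
The initializer above stores each file as one Document. As a rough illustration of the optional splitting step, here is a minimal sketch of fixed-size chunking with overlap, using only the JDK. The class name SimpleChunker and its parameters are hypothetical, and it cuts purely by character count, so it does not respect sentence boundaries and may split a surrogate pair. A production splitter would cut at semantic boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive character-based chunker (illustrative only, not a Spring AI API). */
final class SimpleChunker {

    /**
     * Splits text into chunks of at most chunkSize characters. Each chunk after the
     * first starts `overlap` characters before the end of the previous chunk, so text
     * near a boundary appears in both neighbours.
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

Each resulting chunk could then be wrapped in its own Document before calling vectorStore.add, so that retrieval returns focused passages instead of whole files.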

Step 5: Implement the RAG Question-Answering Service

Create RagService:

import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

@Service
public class RagService {

  private final ChatClient chatClient;
  private final VectorStore vectorStore;

  // System prompt template (defines the AI's role and answering rules)
  @Value("classpath:/prompts/system-qa.st")
  private Resource systemPromptResource;

  @Autowired
  public RagService(ChatClient chatClient, VectorStore vectorStore) {
    this.chatClient = chatClient;
    this.vectorStore = vectorStore;
  }

  public String answerQuestion(String userQuestion) {
    // 1. Retrieve: find the document fragments most semantically relevant to the question
    List<Document> relevantDocuments = vectorStore.similaritySearch(userQuestion);

    // 2. Build the context: join the content of the relevant documents
    String context = relevantDocuments.stream()
        .map(Document::getContent)
        .collect(Collectors.joining("\n\n"));

    // 3. Build the system prompt: inject the context into the template.
    //    createMessage returns a Message (a system message), not a String.
    SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemPromptResource);
    Message systemMessage = systemPromptTemplate.createMessage(Map.of("context", context));

    // 4. Build the full Prompt: system message + user question
    Prompt prompt = new Prompt(List.of(
        systemMessage,
        new UserMessage(userQuestion)
    ));

    // 5. Generate: call the Chat model to produce the answer
    return chatClient.call(prompt).getResult().getOutput().getContent();
  }
}

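System prompt template (resources/prompts/system-qa.st). Spring AI's prompt templates use single braces for placeholders, so the variable is written {context}:

You are a professional question-answering assistant. Answer the user's question strictly on the basis of the context provided below.
If the context is not sufficient to answer the question, tell the user directly "Based on the knowledge available to me, I cannot answer this question for now", and do not make up an answer.

The context is as follows:
{context}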
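
The augment step amounts to joining the retrieved fragments and substituting them into the template. The sketch below shows that substitution with plain string operations, assuming a single-brace {context} placeholder. The class name PromptAssembly is hypothetical. In the service itself this is what SystemPromptTemplate does, and this stand-in does no escaping or validation.

```java
import java.util.List;

/** Plain-Java stand-in for the template substitution (illustrative only). */
final class PromptAssembly {

    /** Joins the fragments with blank lines and inserts them at the {context} placeholder. */
    static String buildSystemPrompt(String template, List<String> fragments) {
        String context = String.join("\n\n", fragments);
        return template.replace("{context}", context);
    }
}
```

For example, a template ending in "The context is as follows:\n{context}" combined with two fragments yields a system prompt that ends with both fragments separated by a blank line.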

Step 6: Create the REST Controller

Create RagController to expose the question-answering endpoint:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

  private final RagService ragService;

  @Autowired
  public RagController(RagService ragService) {
    this.ragService = ragService;
  }

  @PostMapping("/ask")
  public String askQuestion(@RequestBody String question) {
    return ragService.answerQuestion(question);
  }
}

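4. Running and Testing

Start Milvus (docker run ...).
Put your knowledge-base documents (for example product_manual.txt, company_policy.txt) into src/main/resources/docs/.
Start the Spring Boot application. On startup, VectorStoreInitializer converts the document content into vectors through the OpenAI Embedding API and stores them in the my_knowledge_base collection in Milvus.

Test the question-answering endpoint with curl or Postman: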

POST http://localhost:8080/ask
Content-Type: text/plain

What is the company's annual leave policy this year?
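
The same request can also be built from Java with the JDK's java.net.http API (Java 11+). This is a minimal sketch that assumes the application listens on localhost:8080 as in the request above. The class name AskRequestFactory is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds the POST /ask request shown above (illustrative helper). */
final class AskRequestFactory {

    /** Creates a text/plain POST request carrying the question as a UTF-8 body. */
    static HttpRequest build(String baseUrl, String question) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ask"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(question, StandardCharsets.UTF_8))
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read body() from the response. Declaring the charset explicitly matters when the question contains non-ASCII text.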

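Observe the response. Behind the scenes, the application uses Spring AI to:

Convert the question "What is the company's annual leave policy this year?" into a vector.
Search Milvus for the document fragments that are semantically closest (for example, the annual-leave section of company_policy.txt).
Send those fragments as context, together with the question, to the OpenAI Chat model.
Have the OpenAI model combine the context into a precise answer, for example: "According to the company's latest 2024 policy, employees are entitled to 15 days of paid annual leave after one full year of service...".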

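5. Key Points

What Spring AI brings:

Unified abstraction: the simple VectorStore and ChatClient interfaces hide the underlying details (Milvus/Pinecone/Redis, OpenAI/Azure/Ollama, and so on), which keeps the code concise and portable.
Easy integration: it fits seamlessly with Spring Boot configuration management and dependency injection.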

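Advantages of the RAG architecture:

Grounded answers: generated answers are based on retrieved fragments of real documents, which greatly reduces LLM "hallucination".
Easy knowledge updates: you only need to update the documents in the vector database; there is no need to retrain an expensive large model.
Limited exposure of private data: the private knowledge base is never handed to an external LLM as a whole or used to train it. Note, however, that the fragments retrieved for each question, and the text sent for embedding, still reach the external API unless you run the models locally.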

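Performance considerations:

Embedding model choice: text-embedding-ada-002 offers good value for money; newer, stronger models are also worth considering.
Document chunking: in real applications, large documents need to be split into smaller, semantically complete chunks (for example 500-1000 characters). Spring AI includes text splitters such as TokenTextSplitter; check which ones your version provides. If you implement splitting by hand, pay attention to the meaning at chunk boundaries.
Metadata filtering: Milvus supports filtering search results by metadata (such as document source or date), which improves precision.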
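
As an illustration of the metadata-filtering idea only, the sketch below narrows a list of fragments by one metadata field in memory. The Fragment record, the class name MetadataFilterSketch, and the "source" key are hypothetical. In a real deployment Milvus evaluates the filter itself, together with the vector search, rather than in application code.

```java
import java.util.List;
import java.util.Map;

/** In-memory illustration of filtering candidates by a metadata field. */
final class MetadataFilterSketch {

    /** A text fragment with free-form metadata such as source file or date. */
    record Fragment(String text, Map<String, String> metadata) {}

    /** Keeps only the fragments whose metadata has exactly the given value under the key. */
    static List<Fragment> filterBy(List<Fragment> fragments, String key, String value) {
        return fragments.stream()
                .filter(f -> value.equals(f.metadata().get(key)))
                .toList();
    }
}
```

Restricting the candidates to, say, source = company_policy.txt before ranking by similarity keeps fragments from unrelated documents out of the context.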

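Cost optimization:

Local Embedding models: run a local Embedding model (such as all-MiniLM-L6-v2) with Ollama, Transformers.js, or similar tools to avoid OpenAI API charges.
Open-source/local LLMs: use Ollama (running Llama 2, Mistral, and others), GPT4All, or the API of a locally deployed large model.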
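
Switching embedding models usually changes the vector size: text-embedding-ada-002 produces 1536-dimensional vectors, while all-MiniLM-L6-v2 produces 384-dimensional ones. The configured collection dimension therefore has to change with the model. The sketch below is a hypothetical startup check for that mismatch. The class name EmbeddingDimensionGuard and its lookup table are illustrative and not part of Spring AI.

```java
import java.util.Map;

/** Hypothetical sanity check that a configured dimension matches the chosen model. */
final class EmbeddingDimensionGuard {

    /** Output sizes of the two embedding models named in this article. */
    static final Map<String, Integer> KNOWN_DIMENSIONS = Map.of(
            "text-embedding-ada-002", 1536,
            "all-MiniLM-L6-v2", 384);

    /** Returns true only if the model is known and its output size equals the configured one. */
    static boolean matches(String model, int configuredDimension) {
        Integer expected = KNOWN_DIMENSIONS.get(model);
        return expected != null && expected == configuredDimension;
    }
}
```

A collection created for 1536-dimensional vectors cannot store 384-dimensional ones, so changing the model generally means recreating the collection and re-embedding the documents.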

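Extension scenarios:

Chat history as context: include multi-turn conversation history in the retrieval scope.
Multimodal: Milvus can store image and audio vectors; combined with a multimodal Embedding model, this enables cross-modal search.
Hybrid search: combine traditional keyword search (BM25) with vector search (semantic search) for more comprehensive results.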
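
The article does not say how keyword and vector results should be merged. One common option is Reciprocal Rank Fusion (RRF), which scores each document by summing 1 / (k + rank) over the rankings it appears in. The sketch below assumes that technique. The class name ReciprocalRankFusion and the document ids are hypothetical, and k = 60 is a conventional default rather than a tuned value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Merges several ranked id lists with Reciprocal Rank Fusion (illustrative sketch). */
final class ReciprocalRankFusion {

    /**
     * Each ranking lists document ids from best to worst. A document at 1-based rank r
     * contributes 1 / (k + r) to its score; ids are returned by descending total score,
     * with ties broken alphabetically for determinism.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ids;
    }
}
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and vector distances, which are on different scales.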

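6. Summary

With the combination of Spring Boot + Spring AI + Milvus, you can build a capable RAG-based semantic search and question-answering application with little effort. Spring AI greatly simplifies the integration of AI capabilities, letting Java backends adopt large language models and vector databases quickly.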