链载Ai

Title: Latest | Build the Strongest Enterprise Knowledge Base with Qwen3 Embedding and Milvus

Author: 链载Ai    Time: yesterday 21:11
Preface

In the past few days Alibaba quietly released two new models in the Qwen3 family: Qwen3-Embedding and Qwen3-Reranker (each available in three sizes: a lightweight 0.6B, a balanced 4B, and a high-performance 8B). Both are trained on the Qwen3 base model and inherit its strong multilingual understanding, supporting 119 languages across mainstream natural languages and programming languages.

I took a quick look at the data and reviews on Hugging Face, and a few points are worth sharing.

This means the two models are not merely "pretty good among open-source models" but "on par with, or even ahead of, mainstream commercial APIs". For RAG retrieval, cross-lingual search, code search, and similar systems, especially in Chinese-language settings, they are already strong enough to go straight into production.

So how do you build a RAG system with them? This article gives an in-depth tutorial.

01

RAG tutorial (Qwen3-Embedding-0.6B + Qwen3-Reranker-0.6B)

Tutorial highlights: a hands-on walkthrough of building a RAG pipeline with the newly released Qwen3 embedding and reranker models, using a two-stage retrieval design (recall + rerank) that balances efficiency and precision.

Environment setup

!pip install --upgrade pymilvus openai requests tqdm sentence-transformers transformers

Requires transformers>=4.51.0

Requires sentence-transformers>=2.7.0

In this example we use OpenAI as the large language model for text generation, so you need to provide your API key as the environment variable OPENAI_API_KEY.

import os

os.environ["OPENAI_API_KEY"] = "sk-************"

Data preparation

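We can use the FAQ pages of the Milvus 2.4.x documentation as the private knowledge in our RAG; they are a good data source for building a basic RAG.

Download the zip file and extract the documents into the folder milvus_docs.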

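!wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
!unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs

We load all markdown files from the folder milvus_docs/en/faq. For each document, we simply split the content on "# " (a hash followed by a space), which roughly separates the main sections of each markdown file.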
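To see what this splitting rule produces, here is a minimal sketch on a hand-written markdown string (the sample text is made up for illustration). It assumes the separator is "# ", a hash followed by a space, which is consistent with the trailing "###" visible in the sample outputs later in this tutorial.

```python
# Toy markdown text standing in for one FAQ file (hypothetical content).
sample_md = (
    "# FAQ\n\n"
    "#### Where is data stored?\n\nIn object storage.\n\n"
    "#### How is data flushed?\n\nBy the data node.\n"
)

# Split on "# " (hash followed by a space).
chunks = sample_md.split("# ")

# Show every chunk with its index so the leftover markers are visible.
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The first chunk is empty, and each later chunk keeps the leftover heading markers of the section that follows it. For a basic RAG this rough split is usually good enough.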
from glob import glob

text_lines = []

for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
    with open(file_path, "r") as file:
        file_text = file.read()

    # Split each file on "# " to get one chunk per section
    text_lines += file_text.split("# ")

Prepare the LLM and embedding model

This example uses Qwen3-Embedding-0.6B for text embedding and Qwen3-Reranker-0.6B to rerank the retrieved results.

from openai import OpenAI
from sentence_transformers import SentenceTransformer
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize OpenAI client for LLM generation
openai_client = OpenAI()

# Load Qwen3-Embedding-0.6B model for text embeddings
embedding_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# Load Qwen3-Reranker-0.6B model for reranking
reranker_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
reranker_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()

# Reranker configuration
token_false_id = reranker_tokenizer.convert_tokens_to_ids("no")
token_true_id = reranker_tokenizer.convert_tokens_to_ids("yes")
max_reranker_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = reranker_tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = reranker_tokenizer.encode(suffix, add_special_tokens=False)

Define a function that generates text embeddings with the Qwen3-Embedding-0.6B model. It is used for both document embeddings and query embeddings.
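For context on the is_query flag used in the function that follows: with sentence-transformers, prompt_name="query" prepends a prompt string configured for the model to the input text, while documents are encoded unchanged. The sketch below mimics that idea in plain Python. The helper names are hypothetical, and the instruction template is an assumption based on the Qwen3-Embedding model card, so check the model's own configuration before relying on it.

```python
# Assumed default task description (taken from the Qwen3-Embedding model card).
DEFAULT_TASK = "Given a web search query, retrieve relevant passages that answer the query"


def build_query_text(query: str, task: str = DEFAULT_TASK) -> str:
    """Prefix a query with a task instruction (hypothetical helper)."""
    return f"Instruct: {task}\nQuery:{query}"


def build_document_text(doc: str) -> str:
    """Documents are embedded as-is, without an instruction prefix."""
    return doc


# The query gets the instruction prefix; the document text is left unchanged.
print(build_query_text("How is data stored in milvus?"))
print(build_document_text("Milvus stores inserted data in object storage."))
```

Keeping the instruction on the query side only means the document embeddings can be computed once and reused, whatever task description is later applied to queries.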

def emb_text(text, is_query=False):
    """
    Generate text embeddings using Qwen3-Embedding-0.6B model.

    Args:
        text: Input text to embed
        is_query: Whether this is a query (True) or document (False)

    Returns:
        List of embedding values
    """
    if is_query:
        # For queries, use the "query" prompt for better retrieval performance
        embeddings = embedding_model.encode([text], prompt_name="query")
    else:
        # For documents, use default encoding
        embeddings = embedding_model.encode([text])

    return embeddings[0].tolist()

Next, define the reranking functions that improve retrieval quality. Together they implement a complete reranking pipeline with Qwen3-Reranker, scoring candidate documents by their relevance to the query and reordering them. The role of each function is:
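  1. format_instruction(): formats the query, document, and task instruction into the reranker's standard input format
  2. process_inputs(): tokenizes the formatted text and adds the special prefix and suffix tokens the model needs to make its judgment
  3. compute_logits(): uses the reranker to compute a relevance score (between 0 and 1) for each query-document pair
  4. rerank_documents(): reorders the documents by relevance to the query and returns them sorted by score in descending order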
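Item 3 can be made concrete with a small, model-free sketch. The reranker's final-position logits for the tokens "yes" and "no" are passed through a two-way softmax, and the probability of "yes" is taken as the relevance score. The logit values below are invented for illustration; only the arithmetic mirrors compute_logits().

```python
import math


def yes_probability(logit_yes: float, logit_no: float) -> float:
    """Two-way softmax over the "yes" and "no" logits; returns P("yes")."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(logit_yes, logit_no)
    exp_yes = math.exp(logit_yes - m)
    exp_no = math.exp(logit_no - m)
    return exp_yes / (exp_yes + exp_no)


# Hypothetical (yes, no) logits for three query-document pairs.
pairs = [(4.0, -2.0), (0.0, 0.0), (-1.0, 3.0)]
scores = [yes_probability(y, n) for y, n in pairs]
print(scores)
```

Because only two logits are compared, the score is equivalent to a sigmoid of their difference, which is why it always falls between 0 and 1.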

def format_instruction(instruction, query, doc):
    """Format instruction for reranker input"""
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc
    )
    return output

def process_inputs(pairs):
    """Process inputs for reranker"""
    inputs = reranker_tokenizer(
        pairs, padding=False, truncation='longest_first',
        return_attention_mask=False,
        max_length=max_reranker_length - len(prefix_tokens) - len(suffix_tokens)
    )
    # Wrap every tokenized pair in the chat-template prefix and suffix
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = reranker_tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_reranker_length)
    for key in inputs:
        inputs[key] = inputs[key].to(reranker_model.device)
    return inputs

@torch.no_grad()
def compute_logits(inputs, **kwargs):
    """Compute relevance scores using reranker"""
    batch_scores = reranker_model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

def rerank_documents(query, documents, task_instruction=None):
    """
    Rerank documents based on query relevance using Qwen3-Reranker

    Args:
        query: Search query
        documents: List of documents to rerank
        task_instruction: Task instruction for reranking

    Returns:
        List of (document, score) tuples sorted by relevance score
    """
    if task_instruction is None:
        task_instruction = 'Given a web search query, retrieve relevant passages that answer the query'

    # Format inputs for reranker
    pairs = [format_instruction(task_instruction, query, doc) for doc in documents]

    # Process inputs and compute scores
    inputs = process_inputs(pairs)
    scores = compute_logits(inputs)

    # Combine documents with scores and sort by score (descending)
    doc_scores = list(zip(documents, scores))
    doc_scores.sort(key=lambda x: x[1], reverse=True)

    return doc_scores

Generate a test embedding and print its dimension and first few elements.
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

Sample output:

1024
[-0.009923271834850311, -0.030248118564486504, -0.011494234204292297, -0.05980192497372627, -0.0026795873418450356, 0.016578301787376404, -0.04073038697242737, 0.03180320933461189, -0.024417787790298462, 2.1764861230622046e-05]

Load data into Milvus

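Create the collection

from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"

About the MilvusClient argument: setting uri to a local file such as ./milvus_demo.db uses Milvus Lite, which stores all data in that file.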

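Check whether the collection already exists, and drop it if it does.

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with the specified parameters.

If no field information is specified, Milvus automatically creates a default id field as the primary key and a vector field to store the vector data. A reserved JSON field stores any fields that are not defined in the schema, together with their values.

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)

Insert data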

Iterate over the text chunks, create an embedding for each, and insert the data into Milvus.

The rows below carry a new field, text, that is not defined in the collection schema. Milvus creates a corresponding text field automatically (under the hood it lives in the reserved JSON dynamic field, but you do not need to worry about that).

from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)

Sample output:

Creating embeddings: 100%|██████████| 72/72 [00:08<00:00,  8.68it/s]
{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}

Enhance RAG with reranking

Retrieve data

Let's pose a common question about Milvus.

question = "How is data stored in milvus?"

Search the collection for this question and fetch the 10 candidates with the highest semantic similarity, then use the reranker to select the best 3 matches.

# Step 1: Initial retrieval with a larger candidate set
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question, is_query=True)],  # Use the `emb_text` function with the query prompt to convert the question to an embedding vector
    limit=10,  # Return top 10 candidates for reranking
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)

# Step 2: Extract candidate documents for reranking
candidate_docs = [res["entity"]["text"] for res in search_res[0]]

# Step 3: Rerank documents using Qwen3-Reranker
print("Reranking documents...")
reranked_docs = rerank_documents(question, candidate_docs)

# Step 4: Select top 3 reranked documents
top_reranked_docs = reranked_docs[:3]
print(f"Selected top {len(top_reranked_docs)} documents after reranking")

Let's look at the reranked results for this query.

import json

# Display reranked results with reranker scores
reranked_lines_with_scores = [(doc, score) for doc, score in top_reranked_docs]
print("Reranked results (top 3):")
print(json.dumps(reranked_lines_with_scores, indent=4))

# Also show original embedding-based results for comparison
print("\n" + "=" * 80)
print("Original embedding-based results (top 3):")
original_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0][:3]
]
print(json.dumps(original_lines_with_distances, indent=4))

Sample output:

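The results show that reranking with Qwen3-Reranker has a clear effect: all three selected passages score above 0.998, and the third embedding-based hit (on vector data types) gives way to a passage on incremental and historical data.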


Reranked results (top 3):
[
    [
        " Where does Milvus store data?\n\nMilvus deals with two types of data, inserted data and metadata. \n\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\n\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\n\n###",
        0.9997891783714294
    ],
    [
        "How does Milvus flush data?\n\nMilvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milvus' data node writes the data in the message queue to persistent storage as incremental logs. If `flush()` is called, the data node is forced to write all data in the message queue to persistent storage immediately.\n\n###",
        0.9989748001098633
    ],
    [
        "Does the query perform in memory? What are incremental data and historical data?\n\nYes. When a query request comes, Milvus searches both incremental data and historical data by loading them into memory. Incremental data are in the growing segments, which are buffered in memory before they reach the threshold to be persisted in storage engine, while historical data are from the sealed segments that are stored in the object storage. Incremental data and historical data together constitute the whole dataset to search.\n\n###",
        0.9984032511711121
    ]
]

================================================================================
Original embedding-based results (top 3):
[
    [
        " Where does Milvus store data?\n\nMilvus deals with two types of data, inserted data and metadata. \n\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\n\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\n\n###",
        0.8306853175163269
    ],
    [
        "How does Milvus flush data?\n\nMilvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milvus' data node writes the data in the message queue to persistent storage as incremental logs. If `flush()` is called, the data node is forced to write all data in the message queue to persistent storage immediately.\n\n###",
        0.7302717566490173
    ],
    [
        "How does Milvus handle vector data types and precision?\n\nMilvus supports Binary, Float32, Float16, and BFloat16 vector types.\n\n- Binary vectors: Store binary data as sequences of 0s and 1s, used in image processing and information retrieval.\n- Float32 vectors: Default storage with a precision of about 7 decimal digits. Even Float64 values are stored with Float32 precision, leading to potential precision loss upon retrieval.\n- Float16 and BFloat16 vectors: Offer reduced precision and memory usage. Float16 is suitable for applications with limited bandwidth and storage, while BFloat16 balances range and efficiency, commonly used in deep learning to reduce computational requirements without significantly impacting accuracy.\n\n###",
        0.7003671526908875
    ]
]
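To make the recall + rerank flow above easier to follow, here is a self-contained toy version in plain Python. The 2-d vectors, the documents, and toy_rerank_score() are all made up for illustration; the word-overlap scorer merely stands in for Qwen3-Reranker, and the list scan stands in for a Milvus search.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall(query_vec, corpus, limit):
    """Stage 1: rank all documents by inner product and keep the top `limit`."""
    scored = [(doc, inner_product(query_vec, vec)) for doc, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def toy_rerank_score(query, doc):
    """Stage 2 stand-in: fraction of query words found in the document.

    This is NOT Qwen3-Reranker; it only plays the role of a slower,
    more precise scorer applied to the small candidate set.
    """
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


def two_stage_search(query, query_vec, corpus, recall_k, final_k):
    """Recall `recall_k` candidates, rescore them, and keep the best `final_k`."""
    candidates = [doc for doc, _ in recall(query_vec, corpus, recall_k)]
    reranked = [(doc, toy_rerank_score(query, doc)) for doc in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:final_k]


# Hypothetical 2-d "embeddings" chosen by hand for the example.
corpus = [
    ("milvus stores data in object storage", [0.8, 0.6]),
    ("how milvus flushes data to disk", [0.9, 0.4]),
    ("supported vector data types", [0.7, 0.7]),
    ("installing the python client", [0.1, 0.9]),
]
query = "where is data stored in milvus"
query_vec = [1.0, 0.2]

results = two_stage_search(query, query_vec, corpus, recall_k=3, final_k=2)
for doc, score in results:
    print(round(score, 2), doc)
```

In this toy run, stage 1 ranks the flush passage first by inner product, and stage 2 moves the storage passage to the top. The expensive scorer only ever sees the recall_k candidates, which is what keeps the second stage affordable.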

Use an LLM to build the RAG response

Convert the retrieved documents into a single string.

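# Join the text of the reranked passages (the first element of each tuple)
context = "\n".join(
    [line_with_score[0] for line_with_score in reranked_lines_with_scores]
)

Provide a system prompt and a user prompt for the LLM. The user prompt is assembled from the documents retrieved from Milvus.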
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Use OpenAI's gpt-4o to generate a response based on the prompts.

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)

Output:

In Milvus, data is stored in two main forms: inserted data and metadata. Inserted data, which includes vector data, scalar data, and collection-specific schema, is stored in persistent storage as incremental logs. Milvus supports multiple object storage backends for this purpose, including MinIO, AWS S3, Google Cloud Storage, Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage. Metadata for Milvus is generated by its various modules and stored in etcd.

02

Summary

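The tutorial and its output show that the embedding and reranker models released by the Tongyi Qianwen (Qwen) team in the Qwen3 series perform quite well. Used together, they provide a fairly complete and practical solution for RAG systems.

In terms of design, the embedding model handles queries and documents differently, which reflects a solid understanding of retrieval tasks; the reranker uses a cross-encoder architecture that captures fine-grained query-document interactions; and the two-stage retrieval design in this tutorial (recall + rerank) balances efficiency and precision. In particular, Qwen3-Embedding-0.6B (1024 dimensions) and Qwen3-Reranker-0.6B both have relatively small parameter counts and can be deployed locally, which reduces dependence on external APIs. They lower hardware requirements while preserving performance, making them suitable for small and medium-sized businesses and individual developers.

In fact, the arrival of embedding and reranker models in the Qwen3 series is neither an isolated case nor a coincidence; it reflects an industry consensus.

The reason is simple: these two modules determine whether a large model can be turned into a product.

The biggest problems with generative large models are high uncertainty, difficult evaluation, and heavy cost.

Solving these problems, whether through RAG, LLM memory, or agents, ultimately depends on one prerequisite: whether semantics can be compressed into vector representations that machines can efficiently retrieve and judge.

Embedding and ranking are currently the best path: clear standards, measurable performance, controllable cost, and easy gradual rollout. Embedding decides whether you can "find it"; ranking decides whether you can "pick the right one". This makes them among the first API modules to succeed in model commercialization: high call frequency (every retrieval needs them), high switching cost (they are tied to the index), and high commercial value (they can serve as underlying infrastructure).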





