1. RAG
# Dependencies: pip install langchain langchain-text-splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Sample long text (replace with your own; kept in Chinese because the
# separators below target Chinese punctuation)
text = """自然语言处理(NLP)是人工智能领域的重要分支,涉及文本分析、机器翻译和情感分析等任务。分块技术可将长文本拆分为逻辑连贯的语义单元,便于后续处理。"""

# Initialize the recursive splitter (300-character chunks with a 50-character
# overlap to preserve context across chunk boundaries)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    separators=["\n\n", "\n", "。", "!", "?"]  # prefer paragraph/sentence boundaries
)

# Perform the split
chunks = text_splitter.split_text(text)

# Print the resulting chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n{'-' * 50}")

# Dependencies: pip install sentence-transformers faiss-cpu langchain-community
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Text vectorization with the MiniLM-L6 pretrained model. LangChain's FAISS
#    wrapper expects an Embeddings object that embeds texts on demand, not a
#    precomputed matrix, so the sentence-transformers model is wrapped here.
embedding_model = HuggingFaceEmbeddings(model_name="paraphrase-MiniLM-L6-v2")

# 2. Store the vectors in a FAISS index
vector_db = FAISS.from_texts(
    texts=chunks,
    embedding=embedding_model,
    metadatas=[{"source": "web_data"}] * len(chunks)  # optional per-chunk metadata
)

# Save the index locally
vector_db.save_local("my_vector_db")

# Example query: retrieve the most similar chunks. The store embeds the raw
# query string itself, and each result is a (Document, score) pair.
query = "什么是自然语言处理?"
results = vector_db.similarity_search_with_score(query, k=3)
print("Top 3 similar chunks:")
for doc, score in results:
    print(f"score={score:.4f}  {doc.page_content}")
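To close the loop from retrieval to generation, here is a minimal sketch that reloads the saved index and stuffs the retrieved chunks into a prompt. It assumes the "my_vector_db" index created above; the final LLM call is deliberately left out, since any chat model can consume the printed prompt.

# Minimal retrieval-to-prompt sketch, assuming the "my_vector_db" index built
# above. No specific LLM is assumed; the assembled prompt is just printed.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embedding_model = HuggingFaceEmbeddings(model_name="paraphrase-MiniLM-L6-v2")

# Recent LangChain versions require explicitly allowing pickle deserialization
# when loading a locally saved index.
vector_db = FAISS.load_local(
    "my_vector_db", embedding_model, allow_dangerous_deserialization=True
)

query = "什么是自然语言处理?"
docs = vector_db.similarity_search(query, k=3)

# Concatenate the retrieved chunks into the context section of a RAG prompt
context = "\n\n".join(doc.page_content for doc in docs)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
print(prompt)  # send this string to the LLM of your choice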
2. Knowledge Bases and Knowledge Graphs

The key to building a knowledge graph with RAG lies in coordinating retrieval with generation; the workflow is illustrated by the sketch below.
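As a concrete, deliberately hypothetical illustration of that coordination: relevant chunks are first retrieved from the vector store, then an LLM is prompted to emit (subject, predicate, object) triples that can be loaded into a graph. The call_llm function is a placeholder for any chat-model client, not a real library API.

# Hypothetical retrieval + generation loop for triple extraction. call_llm is a
# stand-in for any chat-model client and must be implemented by the reader.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def extract_triples(vector_db, query: str, k: int = 3) -> list[tuple[str, str, str]]:
    # Retrieve: pull the k chunks most relevant to the query
    docs = vector_db.similarity_search(query, k=k)
    context = "\n".join(doc.page_content for doc in docs)

    # Generate: ask the model for one "subject|predicate|object" triple per line
    answer = call_llm(
        "Extract knowledge-graph triples from the text below, "
        "one per line in the form subject|predicate|object:\n" + context
    )

    # Parse: keep only well-formed three-part lines as graph edges
    triples = []
    for line in answer.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples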