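These components interact as follows:
Let's take a closer look at each component and how they work together. All of this can be defined in just 3 scripts: `chainlit_app.py`, `rag_implementation.py`, and `graph_embedding.py`.
The knowledge we add can be represented by the following relationship graph: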
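Chainlit is an open-source Python library for easily deploying chatbots with a user-friendly interface. To start it locally, you need a file, for example `chainlit_app.py`, which you launch from the command line with `chainlit run chainlit_app.py`. You can also package it into an image so that it runs on an AWS EC2 instance.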
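So, in the proposed application, the Chainlit app is the main entry point. Ideally, adding knowledge to the graph RAG is decoupled from the actual prompting, especially since we store this knowledge in a Chroma database. In this example we simplify things: a single script first adds the knowledge and then handles the prompts. More specifically, I hard-coded some knowledge in `initialize_knowledge_base()`, but it could be read automatically from documents (this is the part that can be decoupled), and then an async function waits for user input.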
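As a rough illustration of the decoupling mentioned above, the hard-coded list could be replaced by a loader that reads the same `id`/`content`/`metadata` items from a file. The sketch below assumes a JSON file; the helper name `load_knowledge_items` and the file layout are hypothetical and not part of the original scripts.

```python
import json
from pathlib import Path


def load_knowledge_items(path):
    """Read knowledge items from a JSON file (hypothetical helper).

    The file is assumed to hold a list of objects with "id" and "content"
    keys and an optional "metadata" object, mirroring the hard-coded items.
    """
    raw_items = json.loads(Path(path).read_text(encoding="utf-8"))
    items = []
    for raw in raw_items:
        # Skip entries that lack one of the two required fields
        if "id" not in raw or "content" not in raw:
            continue
        items.append({
            "id": raw["id"],
            "content": raw["content"],
            "metadata": raw.get("metadata") or {},
        })
    return items
```

Each returned item could then be passed to `rag_system.add_knowledge(item["id"], item["content"], item["metadata"])`, in the same way as the hard-coded list.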

    # chainlit_app.py
    import chainlit as cl
    from rag_implementation import MistralRAGSystem

    # Initialize RAG system
    rag_system = MistralRAGSystem()


    # Pre-populate knowledge graph with some initial data
    def initialize_knowledge_base():
        knowledge_items = [
            {
                "id": "ai_basics",
                "content": "Artificial Intelligence is a broad field of computer science focused on creating intelligent machines that can simulate human-like thinking and learning capabilities.",
                "metadata": {"category": "introduction", "difficulty": "beginner"}
            },
            {
                "id": "ml_fundamentals",
                "content": "Machine Learning is a subset of AI that enables systems to learn and improve from experience without being explicitly programmed, using algorithms that can learn from and make predictions or decisions based on data.",
                "metadata": {"category": "core_concept", "difficulty": "intermediate"}
            }
        ]
        for item in knowledge_items:
            rag_system.add_knowledge(item["id"], item["content"], item["metadata"])


    # Initialize knowledge base
    initialize_knowledge_base()


    @cl.on_chat_start
    async def start():
        await cl.Message(content="RAG System with Mistral is ready! How can I help you today?").send()


    @cl.on_message
    async def main(message: cl.Message):
        # Check if the message is a knowledge addition command
        if message.content.startswith("/add_knowledge"):
            # Split into command, node_id, and the rest of the message as content
            # (maxsplit=2 keeps multi-word content in one piece)
            parts = message.content.split(maxsplit=2)
            if len(parts) < 3:
                await cl.Message(content="Usage: /add_knowledge <node_id> <content>").send()
                return
            node_id, content = parts[1], parts[2]
            rag_system.add_knowledge(node_id, content)
            await cl.Message(content=f"Added knowledge node: {node_id}").send()
            return

        # Regular query processing
        # Augment the query with relevant context
        augmented_query = rag_system.augment_query(message.content)
        # Generate response
        response = rag_system.generate_response(augmented_query)
        # Send the response back to the user
        await cl.Message(content=response).send()

The `MistralRAGSystem` class acts as the coordinator, combining the knowledge graph with the Mistral LLM. In this particular implementation, I use a model accessible on the Hugging Face repository [6]. We therefore need to obtain an API key from Hugging Face and save it in a `.env` file.
In addition, this class, defined in `rag_implementation.py`, delegates some of the RAG functionality to the `KnowledgeGraphRAG` class described later:

    # rag_implementation.py
    import os
    from dotenv import load_dotenv
    import requests
    from graph_embedding import KnowledgeGraphRAG


    class MistralRAGSystem:
        def __init__(self):
            # Load environment variables
            load_dotenv()
            # Get the Hugging Face API key from the MISTRAL_API_KEY environment variable
            self.api_key = os.getenv('MISTRAL_API_KEY')
            if not self.api_key:
                raise ValueError("MISTRAL_API_KEY (your Hugging Face API key) must be set in .env file")
            # Default model (corrected name)
            self.model = "mistralai/Mistral-7B-v0.1"
            # Initialize Knowledge Graph
            self.knowledge_graph = KnowledgeGraphRAG()

The rest of the class is called by the main Chainlit script. It essentially adds knowledge, queries the model, and returns the response, with some cleanup of the output so that the response does not repeat the prompt:

        def augment_query(self, query: str) -> str:
            """
            Augment the query with relevant context from the knowledge graph

            Args:
                query (str): Original user query
            Returns:
                str: Augmented query with additional context
            """
            # Retrieve similar nodes
            similar_nodes = self.knowledge_graph.retrieve_similar_nodes(query)
            # Chroma returns one list of documents per query embedding;
            # each entry is converted to a string and joined into the context
            context = "\n".join([str(doc) for doc in similar_nodes])
            # Create a structured prompt with context
            augmented_prompt = f"""
            #Context Information:
            #{context}
            Based on the provided context and your extensive knowledge,
            please answer the following query comprehensively:
            Query: {query}
            Response:
            """
            return augmented_prompt

        def generate_response(self, augmented_query: str) -> str:
            """
            Generate response using Hugging Face API for Mistral model

            Args:
                augmented_query (str): Augmented query with context
            Returns:
                str: Generated response
            """
            try:
                # Prepare headers with the Hugging Face API key
                headers = {
                    'Authorization': f'Bearer {self.api_key}',
                    'Content-Type': 'application/json'
                }
                # Prepare payload
                payload = {
                    'inputs': augmented_query
                }
                # Hugging Face Inference API endpoint for Mistral model
                url = f'https://api-inference.huggingface.co/models/{self.model}'
                # Make the POST request to generate a response
                response = requests.post(url, json=payload, headers=headers)
                # Check if the request was successful
                if response.status_code == 200:
                    generated_text = response.json()[0]['generated_text']
                    print("Raw response:", response.json())
                    # The model echoes the prompt: keep only the text after the
                    # "Response:" marker, or the whole text if the marker is missing
                    marker = "Response:"
                    marker_index = generated_text.find(marker)
                    if marker_index == -1:
                        return generated_text.strip()
                    return generated_text[marker_index + len(marker):].strip()
                else:
                    return f"Error: {response.status_code} - {response.text}"
            except Exception as e:
                return f"An error occurred: {str(e)}"

        def add_knowledge(self, node_id: str, content: str, metadata: dict = None):
            """
            Add knowledge to the graph

            Args:
                node_id (str): Unique node identifier
                content (str): Node content
                metadata (dict, optional): Additional metadata
            """
            self.knowledge_graph.add_node(node_id, content, metadata)

The core of our system is the `KnowledgeGraphRAG` class in the `graph_embedding.py` script, which manages the graph structure and the embeddings. As mentioned, graph relationships are managed with the NetworkX library, while the embeddings are stored persistently in a Chroma database.
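Chroma uses SQLite under the hood, although earlier versions were based on DuckDB. Note that if you run this code multiple times, it may raise warnings or errors because the database or collection already exists. As I said at the beginning, ideally we should decouple adding knowledge from the prompting system.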
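One way to soften the repeated-run problem, sketched here under assumptions, is to make insertion idempotent by checking whether an ID is already stored before adding it. The helper `add_if_missing` is hypothetical. It relies only on a Chroma collection's `get(ids=...)`, which returns a dictionary containing an `"ids"` list, and on `add(...)`. For the collection itself, Chroma also offers `get_or_create_collection` as an alternative to `create_collection`.

```python
def add_if_missing(collection, node_id, content, embedding, metadata=None):
    """Add a document only if its ID is not stored yet (hypothetical helper).

    `collection` is assumed to behave like a Chroma collection: `get(ids=[...])`
    returns a dict whose "ids" list is empty when nothing matches.
    Returns True if the document was added, False if the ID already existed.
    """
    existing = collection.get(ids=[node_id])
    if existing["ids"]:
        return False
    collection.add(
        ids=[node_id],
        embeddings=[embedding],
        documents=[content],
        # Pass None rather than an empty dict when there is no metadata
        metadatas=[metadata] if metadata else None,
    )
    return True
```

Calling it twice with the same `node_id` leaves the collection unchanged the second time, so re-running the knowledge-loading step does not create duplicates.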
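The script creates a database and exposes calls for adding nodes and relationships. Node contents and embeddings are saved in the database, while the relationships are kept in the NetworkX graph.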

    # graph_embedding.py
    import networkx as nx
    import matplotlib.pyplot as plt
    from sentence_transformers import SentenceTransformer
    import chromadb
    from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings
    from typing import Any, Dict, Optional


    class KnowledgeGraphRAG:
        def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
            # Initialize embedding model
            self.embedding_model = SentenceTransformer(model_name)
            # Initialize graph
            self.graph = nx.DiGraph()
            self.chroma_client = chromadb.PersistentClient(
                path="test",
                settings=Settings(),
                tenant=DEFAULT_TENANT,
                database=DEFAULT_DATABASE,
            )
            # create_collection raises if the collection already exists (e.g. on a second run)
            self.collection = self.chroma_client.create_collection(name="knowledge_base3")

        def add_node(self, node_id: str, content: str, metadata: Optional[Dict[str, Any]] = None):
            """
            Add a node to the knowledge graph and embed its content

            Args:
                node_id (str): Unique identifier for the node
                content (str): Text content of the node
                metadata (dict, optional): Additional metadata for the node
            """
            # Add to networkx graph
            self.graph.add_node(node_id, content=content, metadata=metadata or {})
            # Generate embedding
            embedding = self.embedding_model.encode(content).tolist()
            # Add to ChromaDB
            self.collection.add(
                ids=[node_id],
                embeddings=[embedding],
                documents=[content],
                # Some Chroma versions reject an empty metadata dict, so pass None instead
                metadatas=[metadata] if metadata else None
            )

        def add_edge(self, source: str, target: str, relationship: str = None):
            """
            Add a directed edge between two nodes

            Args:
                source (str): Source node ID
                target (str): Target node ID
                relationship (str, optional): Type of relationship
            """
            self.graph.add_edge(source, target, relationship=relationship)

        def retrieve_similar_nodes(self, query: str, top_k: int = 3):
            """
            Retrieve most similar nodes to a given query.

            Args:
                query (str): Search query
                top_k (int): Number of top similar nodes to retrieve.
            Returns:
                List of most similar nodes.
            """
            # Generate query embedding
            query_embedding = self.embedding_model.encode(query).tolist()
            # Get the total number of nodes in the collection
            total_nodes = self.collection.count()
            # Nothing to retrieve from an empty collection
            if total_nodes == 0:
                return []
            # Adjust top_k if it exceeds the number of available nodes
            top_k = min(top_k, total_nodes)
            # Retrieve from ChromaDB
            results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k
            )
            # Return the documents (one list per query embedding)
            return results.get('documents', [])

        def visualize_graph(self):
            """
            Draw the graph with its relationship labels.
            Minimal sketch: the method is called below, but its body was not
            part of the original listing.
            """
            pos = nx.spring_layout(self.graph)
            nx.draw(self.graph, pos, with_labels=True, node_color="lightblue")
            edge_labels = nx.get_edge_attributes(self.graph, "relationship")
            nx.draw_networkx_edge_labels(self.graph, pos, edge_labels=edge_labels)
            plt.show()


    # Example usage
    def create_sample_knowledge_graph():
        kg = KnowledgeGraphRAG()
        # Add some sample nodes about AI
        kg.add_node("ai_intro", "Artificial Intelligence is a branch of computer science")
        kg.add_node("ml_intro", "Machine Learning is a subset of AI focused on learning from data")
        kg.add_node("dl_intro", "Deep Learning uses neural networks with multiple layers")
        # Add some relationships
        kg.add_edge("ai_intro", "ml_intro", "includes")
        kg.add_edge("ml_intro", "dl_intro", "advanced technique")
        return kg


    # For testing
    if __name__ == "__main__":
        kg = create_sample_knowledge_graph()
        kg.visualize_graph()
        # Example retrieval
        results = kg.retrieve_similar_nodes("neural networks")
        print(results)

In addition, the class uses SentenceTransformers to generate embeddings and ChromaDB for persistent storage. This combination lets us maintain both semantic relationships (through the embeddings) and explicit relationships (through the graph structure) between pieces of information.
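Note that `retrieve_similar_nodes` relies only on the embeddings; the edges are stored but not consulted at query time. As a hedged sketch of how the explicit relationships could complement the semantic search, the function below expands a list of retrieved node IDs (Chroma's query result also contains the matching `ids`) with their direct successors. It uses a plain adjacency dictionary rather than the NetworkX graph to stay self-contained, and `expand_with_neighbors` is a hypothetical name, not part of the original class.

```python
def expand_with_neighbors(seed_ids, edges, max_extra=5):
    """Return the seed node IDs followed by their direct successors.

    `edges` maps a source node ID to a list of (target_id, relationship) pairs.
    At most `max_extra` new node IDs are appended.
    """
    expanded = list(seed_ids)
    seen = set(seed_ids)
    extra = 0
    for node_id in seed_ids:
        for target_id, _relationship in edges.get(node_id, []):
            # Skip nodes that are already included or that exceed the budget
            if target_id in seen or extra >= max_extra:
                continue
            expanded.append(target_id)
            seen.add(target_id)
            extra += 1
    return expanded


# Toy adjacency data for illustration only
sample_edges = {
    "ai_intro": [("ml_intro", "includes")],
    "ml_intro": [("dl_intro", "advanced technique")],
}
print(expand_with_neighbors(["ai_intro"], sample_edges))  # ['ai_intro', 'ml_intro']
```

With the NetworkX graph held by the class, `self.graph.successors(node_id)` would play the role of the `edges` lookup.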
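This implementation shows how to combine modern RAG techniques with persistent storage and a knowledge graph, and it provides a solid foundation for building more complex knowledge-based applications. ChromaDB handles persistence and Chainlit handles the interface, which makes the system both practical and user-friendly. Whether to choose ChromaDB over other persistent vector databases depends on your requirements and available resources. In any case, I hope I have shown you that with just 3 scripts you can have an end-to-end application with a friendly front end that can store new knowledge and run a pre-existing LLM smoothly, without fine-tuning.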