链载Ai

Title: Best Reranker Models for RAG in 2025

Author: 链载Ai    Posted: 3 days ago

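Retrieval-Augmented Generation (RAG) marks an important step forward for natural language processing. It lets a large language model (LLM) consult information outside its training data before it produces a response, which improves output quality. As a result, an LLM can work with company-specific knowledge or newly published information without expensive retraining. Rerankers play a central role in refining the retrieved information so that the model receives the most relevant context. By combining information retrieval with text generation, RAG produces answers that are accurate, relevant, and natural-sounding.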

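Why Initial Retrieval Is Not Enough

The first step in RAG is finding documents related to the user's query. Systems typically use keyword search or vector similarity for this. These methods are a reasonable starting point, but the documents they return are not all equally useful. The embedding model in use may not capture the fine detail needed to pick out the most relevant information.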
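To make the first-stage step concrete, here is a minimal sketch of vector-similarity retrieval using cosine similarity. The three-dimensional vectors, the document IDs, and the function names are invented for illustration; a real system would obtain vectors from an embedding model and store them in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (assumption: not produced by any real model).
doc_vecs = {
    "pricing_faq": [0.9, 0.1, 0.0],
    "refund_policy": [0.7, 0.6, 0.1],
    "office_hours": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.5, 0.0]

top = retrieve_top_k(query_vec, doc_vecs, k=2)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

Each document is scored independently of the query text itself, using only one vector per side. That keeps the step fast, and it is also why fine-grained relevance can be missed.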

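Vector search, which matches on similar meaning, can struggle with short queries or specialized terminology. LLMs also have a limited context window. Feeding in too many documents, even marginally relevant ones, can confuse the model and lower the quality of the final answer. This noisy initial retrieval dilutes the LLM's focus, so the first batch of results needs to be refined.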

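Figure: RAG system architecture

The figure shows the retrieval and generation steps of RAG. A user asks a question, the system searches a vector store and pulls results for that question, and the retrieved content is passed to the LLM together with the question. The LLM then returns a structured output.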

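Enter the Reranker: Refining Search

This is where the reranker becomes essential. Reranking improves the precision of search results. A reranker analyzes the documents returned by the initial retrieval and reorders them according to how well they match the user's specific question and intent.

In RAG, the reranker acts as a quality filter. It inspects the first set of results and prioritizes the documents that offer the best information for the query, with the goal of moving the most relevant passages to the top. Think of a reranker as an expert who double-checks the initial search and uses a deeper understanding of language to find the best match between documents and question.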

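Figure: two-stage search with a reranker

The figure shows a two-stage search process. Reranking is the second stage: the initial result set, produced by semantic or keyword matching, is refined to substantially improve the relevance and ordering of the final results, giving the user more accurate and more useful answers to the query.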

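How Reranking Improves RAG

Rerankers raise the accuracy of the context given to the LLM. They analyze the meaning of the user's question and its relationship to each retrieved document, rather than relying on simple keyword matching. This deeper understanding helps identify the most useful information.

By focusing the LLM's attention on a smaller, higher-quality set of documents, a reranker leads to more precise answers. The LLM receives high-quality context and can form a better-informed, more direct response. The reranker computes a score that indicates how close a document is to the query semantically, and the final ordering is based on that score. It can find relevant information even when no keywords match exactly.

This focus on high-quality context also helps reduce LLM "hallucinations", where the model generates information that is wrong but sounds plausible. Grounding the LLM in documents that a reranker has vetted makes the final output more trustworthy.

A standard RAG pipeline consists of retrieval and generation. An enhanced RAG pipeline adds a reranking step between them.

This two-stage approach lets the initial retrieval cast a wide net (recall) while the reranker concentrates on picking the best items from it (precision). The split improves the overall pipeline and gives the LLM the best possible input.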

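Figure: reranking in a RAG pipeline

A query is used to search the vector database and retrieve the 25 most relevant documents. These documents are then passed to the reranker module, which refines the results and selects the 3 most relevant documents as the final output.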
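The retrieve-then-rerank flow can be sketched in a few lines. This is only an illustration under stated assumptions: both scoring functions are toy word-overlap heuristics invented here, and `toy_rerank_score` is a placeholder for what would normally be a cross-encoder model. The defaults of 25 and 3 mirror the numbers in the description above.

```python
import math

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"how", "do", "the", "a", "is", "are", "in"}


def tokenize(text):
    return text.lower().split()


def first_stage_score(query, doc):
    """Cheap, recall-oriented score: number of distinct shared words."""
    return len(set(tokenize(query)) & set(tokenize(doc)))


def toy_rerank_score(query, doc):
    """Placeholder for a cross-encoder: shared content words, length-normalized."""
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return 0.0
    content_words = set(tokenize(query)) - STOPWORDS
    matches = len(content_words & set(doc_tokens))
    return matches / math.sqrt(len(doc_tokens))


def retrieve_then_rerank(query, corpus, retrieve_k=25, final_k=3):
    # Stage 1: cast a wide net with the cheap scorer.
    candidates = sorted(
        corpus, key=lambda d: first_stage_score(query, d), reverse=True
    )[:retrieve_k]
    # Stage 2: reorder only the candidates with the more careful scorer.
    reranked = sorted(
        candidates, key=lambda d: toy_rerank_score(query, d), reverse=True
    )
    return reranked[:final_k]


corpus = [
    "rerankers score documents",
    "rerankers score documents after the first retrieval stage returns many loosely related documents",
    "documents are stored in a vector index",
    "the weather is sunny today",
    "how do plants grow",
]

top_docs = retrieve_then_rerank(
    "how do rerankers score documents", corpus, retrieve_k=4, final_k=2
)
for doc in top_docs:
    print(doc)
```

In this toy run the first stage lets "how do plants grow" through because it shares two stopwords with the query, and the second stage drops it. The expensive scorer only ever sees `retrieve_k` candidates, which is the point of splitting the work into two stages.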

Best Reranker Models of 2025

Let's look at the most popular reranking models of 2025.

Reranking Models

Several reranking models are popular choices for RAG pipelines:

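Reranker | Model type | Source | Strengths | Weaknesses | Best suited for
Cohere [1] | Cross-encoder (API) | Closed source | High accuracy, multilingual, easy to use, fast (Nimble variant) | Cost (API fees), closed source | General-purpose RAG, enterprise, multilingual, ease of use
bge-reranker [2] | Cross-encoder | Open source | High accuracy, open source, runs on moderate hardware | Requires self-hosting | General-purpose RAG, open-source preference, budget-conscious teams
Voyage [3] | Cross-encoder (API) | Closed source | Top-tier relevance and accuracy | Cost (API fees), potentially higher latency (top-tier model) | Maximum-accuracy needs (finance, legal), relevance-critical applications
Jina [4] | Cross-encoder / ColBERT variant | Mixed | Balanced performance, cost-effective, long documents (Jina-ColBERT) | May not reach peak accuracy | General-purpose RAG, long documents, balanced cost and performance
FlashRank [5] | Lightweight cross-encoder | Open source | Very fast, low resource usage, easy to integrate | Lower accuracy than large models | Speed-critical applications, resource-constrained environments
ColBERT [6] | Multi-vector (late interaction) | Open source | Efficient at scale (large collections), fast retrieval | Indexing is compute- and storage-intensive | Very large document collections, efficiency at scale
MixedBread (mxbai-rerank-v2) [7] | Cross-encoder | Open source | SOTA performance (claimed), fast inference, multilingual, long context, versatile | Requires self-hosting, relatively new | High-performance RAG, multilingual, long documents/code/JSON, open-source preference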

Cohere Rerank

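Cohere Rerank uses a sophisticated neural network, likely based on the Transformer architecture, that acts as a cross-encoder. It processes the query and the document together to judge relevance precisely. It is a proprietary model available through an API.

Example code

First, install the Cohere library.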

%pip install --upgrade --quiet cohere

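Set up Cohere and the ContextualCompressionRetriever.

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

# `retriever` is assumed to be an existing base retriever (for example, a vector store retriever).
llm = Cohere(temperature=0)
# The reranker trims the base retriever's results down to the most relevant documents.
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
# Reuse the LLM created above instead of instantiating a second one.
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)

# Run the chain on a question.
chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})

Output: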

{'query':'What did the president say about Ketanji Brown Jackson',

'result':" The president speaks highly of Ketanji Brown Jackson, stating that she
is one of the nation's top legal minds, and will continue the legacy of excellence
of Justice Breyer. The president also mentions that he worked with her family and
that she comes from a family of public school educators and police officers. Since
her nomination, she has received support from various groups, including the
Fraternal Order of Police and judges from both major political parties. \n\nWould
you like me to extract another sentence from the provided text? "}

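bge-reranker (Base/Large)

These models come from the Beijing Academy of Artificial Intelligence (BAAI) and are open source under a permissive license. They are Transformer-based cross-encoders designed specifically for reranking, and they are available in different sizes, such as Base and Large.

Example code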

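from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder


def pretty_print_docs(docs):
    """Print each document's text, separated by a dashed line."""
    separator = "\n" + "-" * 100 + "\n"
    print(separator.join(f"Document {i + 1}:\n\n{d.page_content}" for i, d in enumerate(docs)))


# `retriever` is assumed to be an existing base retriever.
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
# Keep only the 3 highest-scoring documents after reranking.
compressor = CrossEncoderReranker(model=model, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke("What is the plan for the economy?")
pretty_print_docs(compressed_docs)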

Output:

Document 1:
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.
More jobs where you can earn a good living in America.
And instead of relying on foreign supply chains, let’s make it in America.
Economists call it “increasing the productive capacity of our economy.”
I call it building a better America.
My plan to fight inflation will lower your costs and lower the deficit.

----------------------------------------------------------------------------------------------------

Document 2:

Second – cut energy costs for families an average of $500 a year by combatting
climate change.

Let’s provide investments and tax credits to weatherize your homes and businesses to
be energy efficient and you get a tax credit; double America’s clean energy
production in solar, wind, and so much more; lower the price of electric vehicles,
saving you another $80 a month because you’ll never have to pay at the gas pump
again.

----------------------------------------------------------------------------------------------------

Document 3:

Look at cars.
Last year, there weren’t enough semiconductors to make all the cars that people
wanted to buy.
And guess what, prices of automobiles went up.
So—we have a choice.
One way to fight inflation is to drive down wages and make Americans poorer.
I have a better plan to fight inflation.
Lower your costs, not your wages.
Make more cars and semiconductors in America.
More infrastructure and innovation in America.
More goods moving faster and cheaper in America.

Voyage Rerank

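Voyage AI offers proprietary neural models (rerank-2 and rerank-2-lite) through an API. They are most likely advanced, carefully fine-tuned cross-encoders designed for the highest possible relevance scoring.

Example code

First, install the Voyage libraries.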

%pip install --upgrade --quiet voyageai
%pip install --upgrade --quiet langchain-voyageai

Set up Voyage and the ContextualCompressionRetriever.

import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain_openai import OpenAI
from langchain_voyageai import VoyageAIRerank
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_voyageai import VoyageAIEmbeddings

# Load the source text and split it into overlapping chunks.
documents = TextLoader("../../how_to/state_of_the_union.txt").load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
# First stage: retrieve the 20 nearest chunks from a FAISS index.
retriever = FAISS.from_documents(
    texts, VoyageAIEmbeddings(model="voyage-law-2")
).as_retriever(search_kwargs={"k": 20})

llm = OpenAI(temperature=0)
# Second stage: rerank and keep the top 3 chunks.
compressor = VoyageAIRerank(
    model="rerank-lite-1", voyageai_api_key=os.environ["VOYAGE_API_KEY"], top_k=3
)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# `pretty_print_docs` is assumed to be a small display helper defined elsewhere.
pretty_print_docs(compressed_docs)

Output:

Document 1:

One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.

----------------------------------------------------------------------------------------------------

Document 2:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.

----------------------------------------------------------------------------------------------------

Document 3:

I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.

I’ve worked on these issues a long time.

I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.

So let’s not abandon our streets. Or choose between safety and equal justice.

Jina Reranker

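Jina AI offers reranking solutions that include neural models such as Jina Reranker v2 and Jina-ColBERT. Jina Reranker v2 is most likely a cross-encoder-style model. Jina-ColBERT implements the ColBERT architecture (described in more detail below) on top of Jina's base model.

Example code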

from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the source text and split it into overlapping chunks.
documents = TextLoader(
    "../../how_to/state_of_the_union.txt",
).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# First stage: retrieve the 20 nearest chunks from a FAISS index.
embedding = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={"k": 20})

query = "What did the president say about Ketanji Brown Jackson"
docs = retriever.invoke(query)

Reranking with Jina

from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import JinaRerank

# Second stage: rerank the retrieved chunks with the default Jina reranker.
compressor = JinaRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# `pretty_print_docs` is assumed to be a small display helper defined elsewhere.
pretty_print_docs(compressed_docs)

Output:

Document 1:

So let’s not abandon our streets. Or choose between safety and equal justice.
Let’s come together to protect our communities, restore trust, and hold law
enforcement accountable.
That’s why the Justice Department required body cameras, banned chokeholds, and
restricted no-knock warrants for its officers.

----------------------------------------------------------------------------------------------------

Document 2:

I spoke with their families and told them that we are forever in debt for their
sacrifice, and we will carry on their mission to restore the trust and safety every
community deserves.
I’ve worked on these issues a long time.
I know what works: Investing in crime prevention and community police officers
who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and
safety.
So let’s not abandon our streets. Or choose between safety and equal justice.

ColBERT

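ColBERT (Contextualized Late Interaction over BERT) is a multi-vector model. Instead of representing a document with a single vector, it creates multiple contextualized vectors, typically one per token. It uses a "late interaction" mechanism that compares the query's vectors against the document's many encoded vectors. Because the two sides are encoded independently, document vectors can be precomputed and indexed.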
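The late-interaction comparison is commonly described as a "MaxSim" sum: for each query token vector, take its best match among the document's token vectors, then add those maxima up. The sketch below illustrates that scoring rule only. The two-dimensional vectors are invented, and a real ColBERT model would produce normalized, much higher-dimensional token embeddings.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query token vectors, of the best dot product with any document token vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token embeddings (assumption: toy values, not model output).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.6, 0.0], [0.1, 0.3]]

score_a = maxsim_score(query_vecs, doc_a)
score_b = maxsim_score(query_vecs, doc_b)
print(f"doc_a: {score_a:.2f}, doc_b: {score_b:.2f}")
```

Since `doc_a` and `doc_b` never depend on the query, their token vectors can be computed once at indexing time. Only the short query is encoded at search time.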

Example code

Install the RAGatouille library to use the ColBERT reranker.

pip install -U ragatouille

Now set up the ColBERT reranker.

from ragatouille import RAGPretrainedModel
from langchain.retrievers import ContextualCompressionRetriever

# `retriever` is assumed to be an existing base retriever.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=RAG.as_langchain_document_compressor(), base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What animation studio did Miyazaki found"
)
# Show the top-ranked document together with its relevance score.
print(compressed_docs[0])

Output:

Document(page_content='In June 1985, Miyazaki, Takahata, Tokuma and Suzuki founded
the animation production company Studio Ghibli, with funding from Tokuma Shoten.
Studio Ghibli\'s first film, Laputa: Castle in the Sky (1986), employed the same
production crew of Nausicaä. Miyazaki\'s designs for the film\'s setting were
inspired by Greek architecture and "European urbanistic templates". Some of the
architecture in the film was also inspired by a Welsh mining town; Miyazaki
witnessed the mining strike upon his first', metadata={'relevance_score':
26.5194149017334})

FlashRank

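FlashRank is designed as a very lightweight, fast reranking library. It typically relies on small, optimized Transformer models, often distilled or pruned versions of larger ones. It aims to deliver a meaningful relevance gain over plain similarity search with minimal compute overhead. It works like a cross-encoder but applies techniques that speed up processing, and it is distributed as an open-source Python library.

Example code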

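from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain_openai import ChatOpenAI

# `retriever` and `pretty_print_docs` are assumed to be defined already.
llm = ChatOpenAI(temperature=0)
compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)
# Print the IDs of the reranked documents, then the documents themselves.
print([doc.metadata["id"] for doc in compressed_docs])
pretty_print_docs(compressed_docs)

This snippet uses FlashrankRerank inside a ContextualCompressionRetriever to improve the relevance of the retrieved documents. It reorders the documents fetched by the base retriever (the retriever variable) by their relevance to the query "What did the president say about Ketanji Jackson Brown". Finally, it prints the document IDs followed by the compressed, reranked documents.

Output: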

[0, 5, 3]

Document 1:

One of the most serious constitutional responsibilities a President has is
nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji
Brown Jackson. One of our nation’s top legal minds, who will continue Justice
Breyer’s legacy of excellence.
----------------------------------------------------------------------------------------------------

Document 2:

He met the Ukrainian people.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage,
their determination, inspires the world.
Groups of citizens blocking tanks with their bodies. Everyone from students to
retirees teachers turned soldiers defending their homeland.
In this struggle as President Zelenskyy said in his speech to the European
Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United
States is here tonight.
----------------------------------------------------------------------------------------------------

Document 3:

And tonight, I’m announcing that the Justice Department will name a chief prosecutor
for pandemic fraud.
By the end of this year, the deficit will be down to less than half what it was
before I took office.
The only president ever to cut the deficit by more than one trillion dollars in a
single year.
Lowering your costs also means demanding more competition.
I’m a capitalist, but capitalism without competition isn’t capitalism
It’s exploitation—and it drives up prices.

The output shows the retrieved chunks reordered by relevance.

MixedBread

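This family comes from Mixedbread AI and includes mxbai-rerank-base-v2 (0.5B parameters) and mxbai-rerank-large-v2 (1.5B parameters). Both are open-source (Apache 2.0 license) cross-encoders built on the Qwen-2.5 architecture. The key differentiator is the training process, which adds a three-stage reinforcement learning (RL) approach (GRPO, contrastive learning, preference learning) on top of the initial training.

Example code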

!pip install mxbai_rerank

from mxbai_rerank import MxbaiRerankV2

# Load the model, here we use our base sized model
model = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")

# Example query and documents
query = "Who wrote To Kill a Mockingbird?"

documents = [
    "To Kill a Mockingbird is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel Moby-Dick was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The Harry Potter series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "The Great Gatsby, a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

# Calculate the scores
results = model.rank(query, documents)
print(results)

Output:

[RankResult(index=0, score=9.847987174987793, document='To Kill a Mockingbird is a
novel by Harper Lee published in 1960. It was immediately successful, winning the
Pulitzer Prize, and has become a classic of modern American literature.'),

RankResult(index=2, score=8.258672714233398, document='Harper Lee, an American
novelist widely known for her novel To Kill a Mockingbird, was born in 1926 in
Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.'),

RankResult(index=3, score=3.579845428466797, document='Jane Austen was an English
novelist known primarily for her six major novels, which interpret, critique and
comment upon the British landed gentry at the end of the 18th century.'),

RankResult(index=4, score=2.716982841491699, document='The Harry Potter series,
which consists of seven fantasy novels written by British author J.K. Rowling, is
among the most popular and critically acclaimed books of the modern era.'),

RankResult(index=1, score=2.233165740966797, document='The novel Moby-Dick was
written by Herman Melville and first published in 1851. It is considered a
masterpiece of American literature and deals with complex themes of obsession,
revenge, and the conflict between good and evil.'),

RankResult(index=5, score=1.8150043487548828, document='The Great Gatsby, a novel
written by American author F. Scott Fitzgerald, was published in 1925. The story is
set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit
of Daisy Buchanan.')]

How to Tell Whether a Reranker Is Working

Evaluating a reranker is important, and standard ranking metrics help measure how effective it is.
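As an illustration, two widely used ranking metrics are Mean Reciprocal Rank (MRR) and NDCG@k. They are chosen here as examples and are not named in the text above. The sketch below computes both for a ranking before and after reranking; the document IDs and graded relevance labels are made up for the example.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, relevance, k):
    """Discounted cumulative gain at k, normalized by the ideal ordering."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(i + 2)
        for i, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical labels: higher numbers mean more relevant.
relevance = {"d1": 3, "d2": 2, "d3": 0}
relevant_ids = {"d1", "d2"}

before = ["d3", "d1", "d2"]  # order from the initial retrieval
after = ["d1", "d2", "d3"]   # order after reranking

print("RR before:", reciprocal_rank(before, relevant_ids))
print("RR after: ", reciprocal_rank(after, relevant_ids))
print("NDCG@3 before:", round(ndcg_at_k(before, relevance, 3), 4))
print("NDCG@3 after: ", round(ndcg_at_k(after, relevance, 3), 4))
```

Comparing the same metric on the pre-rerank and post-rerank orderings, over a labeled set of queries, gives a direct measure of what the reranker contributes.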

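Choosing the Right Reranker for Your Needs

Choosing the best reranker means balancing several factors, such as accuracy, speed, scalability, and cost, and there are trade-offs among them.

The best reranker is the one that fits your specific performance, efficiency, and cost requirements.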
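One practical way to ground the speed side of that decision is to time candidate rerankers on your own queries. The helper below is a minimal sketch: `measure_latency` and `toy_reranker` are hypothetical names, and the toy function stands in for whichever reranker call you want to benchmark.

```python
import statistics
import time


def measure_latency(rerank_fn, query, docs, repeats=5):
    """Call rerank_fn several times and summarize wall-clock latency in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        rerank_fn(query, docs)
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "max_s": max(timings)}


def toy_reranker(query, docs):
    """Stand-in reranker (assumption): orders documents by shared-word count."""
    query_words = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )


docs = ["rerankers score documents", "the weather is sunny today"] * 50
stats = measure_latency(toy_reranker, "how do rerankers score documents", docs)
print(stats)
```

Running the same harness against each candidate, with the candidate-set size you expect in production, makes the accuracy-versus-latency comparison concrete.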

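Final Thoughts

Rerankers are essential for getting the most out of a RAG system. They refine the information fed to the LLM, which leads to better and more reliable results. Developers can choose from a wide range of models, from high-accuracy cross-encoders to efficient bi-encoders to specialized models such as ColBERT. Picking the right one requires understanding the trade-offs among accuracy, speed, scalability, and cost. As RAG evolves, particularly in handling diverse data types, rerankers will continue to play a key role in building smarter and more reliable AI applications. Careful evaluation and selection remain the key to success.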

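Frequently Asked Questions

Q1. What is Retrieval-Augmented Generation (RAG)?
A: RAG is a technique for improving large language models (LLMs) that lets the model retrieve external information before generating a response. This makes the model more accurate and more adaptable, and it can take in new knowledge without retraining.

Q2. Why is initial retrieval not enough in a RAG system?
A: Initial retrieval methods such as keyword search or vector similarity can return many documents, but not all of them are highly relevant. The resulting noisy input can degrade the LLM's performance, so the results need to be refined to improve answer quality.

Q3. What role does a reranker play in RAG?
A: A reranker reorders the retrieved documents by their relevance to the query. It acts as a quality filter that ensures the most relevant information is prioritized before it is passed to the LLM to generate an answer.

Q4. Why is Cohere Rerank a good choice?
A: Cohere Rerank offers high accuracy, multilingual support, and API-based integration. Its "Nimble" variant is optimized for faster responses, which makes it well suited to real-time applications.

Q5. Why is bge-reranker popular with open-source users?
A: bge-reranker is open source and can be self-hosted, which lowers cost while maintaining high accuracy. It suits teams that want full control over the model.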
