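As a vector database, Milvus has long focused on embedding-based vector retrieval, providing accurate, high-performance, and scalable semantic search for applications such as RAG. As the era of large models brings exploration of many new kinds of applications, the community has rediscovered the gains from combining traditional exact retrieval based on text matching with hybrid search, and the need is especially pressing in scenarios that depend heavily on keyword matching. To meet this need, Milvus 2.5 introduces full-text search (FTS) and combines it with the sparse vector search and hybrid search capabilities available since version 2.4, so that the three reinforce one another.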
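Hybrid search is a method that fuses the results of multiple search paths. Users can search different fields of their data in different ways and then fuse and re-rank the results into a single combined list. In today's popular RAG scenarios, the typical hybrid search combines semantic search with lexical search: the results recalled by embeddings are fused with the results of the lexical-matching BM25 algorithm using RRF (Reciprocal Rank Fusion), which yields a better ranking.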
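To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not the Milvus API; the function name `rrf_fuse`, the document ids, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical ids standing in for the dense and BM25 result lists.
dense_hits = ["license_header", "simple_test", "fmt_layout"]
bm25_hits = ["logger_init", "long_string", "make_strings"]
fused = rrf_fuse([dense_hits, bm25_hits])
```

With k = 60, a document ranked first in exactly one list scores 1/61 ≈ 0.01639, which appears consistent with the magnitude of the hybrid scores listed in the cases in this article.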
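In this article we use a RAG dataset provided by Anthropic for the demonstration. It is a text-to-code search dataset made up of snippets from 9 code repositories, similar to today's popular AI-assisted programming scenarios. Because code contains many definitions, keywords, and similar information, text-based retrieval can bring larger gains here. At the same time, dense embedding models trained on large amounts of code can capture some higher-level semantics. We want to observe experimentally what happens when the two are combined.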
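To build a more concrete understanding of hybrid search, we sample some specific cases for analysis. Using a state-of-the-art dense embedding model trained on large amounts of code (voyage-2) as the baseline, we picked individual examples (top 5) in which hybrid search beats the dense results and the sparse results, to see which underlying characteristics they reveal.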
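Beyond the case-by-case qualitative analysis, we also ran an overall evaluation to obtain quantitative results, computing the Pass@5 metric over the dataset. For each query, this metric measures the fraction of all relevant results that are successfully retrieved in the top 5. The results show that a state-of-the-art embedding model alone reaches a good baseline, that combining it with full-text search still brings an improvement, and that inspecting the BM25 results and tuning parameters for the specific scenario brings a larger improvement.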
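The metric as described above can be sketched as follows. This is a reading of the definition given in the text, not the evaluation script used for the article; the function names and the sample ids are hypothetical.

```python
def pass_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_pass_at_k(runs, k=5):
    """Average pass_at_k over (retrieved, relevant) pairs, one pair per query."""
    return sum(pass_at_k(r, g, k) for r, g in runs) / len(runs)


# Hypothetical ids: the first query finds one of its two relevant chunks in the top 5.
runs = [
    (["a", "b", "c", "d", "e", "f"], ["a", "f"]),
    (["x", "y"], ["x"]),
]
average = mean_pass_at_k(runs)
```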
01.
Case 1: Hybrid search outperforms semantic search
Question: How is the log file created?
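This question asks how the log file is created, and the correct answer is a piece of Rust code that creates a log file. In the semantic search results we see header includes for logging and C++ code that obtains a logger, but the key to this question is really the variable "logfile". The correct answer appears at #hybrid 0 in the hybrid results. Since hybrid search fuses the semantic and full-text results, and the answer is absent from the semantic results, it must have come from full-text search. Beyond that, #hybrid 2 contains test mock code that looks completely unrelated, in particular the sentence "long string to test how those are handled." repeated over and over. Explaining this requires the principle behind the full-text algorithm BM25: it favors matches on low-frequency words, because high-frequency words are so common that they do little to distinguish one document from another. Statistics over a large body of natural-language text would quickly show that "how" is a very common word, so it contributes little to the relevance score. This corpus, however, is code, where very little text contains "how", so the sentences that do contain it are retrieved in large numbers.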
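A small sketch can illustrate why the same word is weighted differently in the two kinds of corpora. It uses a widely used form of the BM25 inverse document frequency, ln(1 + (N - n + 0.5) / (n + 0.5)); whether this exact variant matches the implementation behind the article's results is an assumption, and both toy corpora are made up for illustration.

```python
import math


def bm25_idf(term, docs):
    """BM25-style inverse document frequency of a term over tokenized docs."""
    n = sum(1 for doc in docs if term in doc)  # documents containing the term
    total = len(docs)
    return math.log(1.0 + (total - n + 0.5) / (n + 0.5))


# Toy natural-language corpus: "how" occurs in most documents.
natural_text = [
    "how are you today".split(),
    "this is how it works".split(),
    "tell me how to cook rice".split(),
    "the weather is nice".split(),
]

# Toy code-like corpus: "how" occurs only in one test string.
code_corpus = [
    "let logfile file create args log_file".split(),
    "logger get logger set level".split(),
    "include log4cxx logger h".split(),
    "long string to test how those are handled".split(),
]

idf_natural = bm25_idf("how", natural_text)
idf_code = bm25_idf("how", code_corpus)
```

In this sketch `idf_code` comes out larger than `idf_natural`, which is the effect described above: a word that is nearly weightless in prose becomes a strong signal in a code corpus.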
GroundTruth
    use {
        crate::args::LogArgs,
        anyhow::{anyhow, Result},
        simplelog::{Config, LevelFilter, WriteLogger},
        std::fs::File,
    };

    pub struct Logger;

    impl Logger {
        pub fn init(args: &impl LogArgs) -> Result<()> {
            let filter: LevelFilter = args.log_level().into();
            if filter != LevelFilter::Off {
                let logfile = File::create(args.log_file())
                    .map_err(|e| anyhow!("Failed to open log file: {e:}"))?;
                WriteLogger::init(filter, Config::default(), logfile)
                    .map_err(|e| anyhow!("Failed to initalize logger: {e:}"))?;
            }
            Ok(())
        }
    }

Semantic search results:
## dense 0 0.7745316028594971

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    #include "logunit.h"
    #include <log4cxx/logger.h>
    #include <log4cxx/simplelayout.h>
    #include <log4cxx/fileappender.h>
    #include <log4cxx/helpers/absolutetimedateformat.h>

## dense 1 0.769859254360199

    void simple()
    {
        LayoutPtr layout = LayoutPtr(new SimpleLayout());
        AppenderPtr appender = FileAppenderPtr(new FileAppender(layout, LOG4CXX_STR("output/simple"), false));
        root->addAppender(appender);
        common();
        LOGUNIT_ASSERT(Compare::compare(LOG4CXX_FILE("output/simple"), LOG4CXX_FILE("witness/simple")));
    }

    std::string createMessage(int i, Pool& pool)
    {
        std::string msg("Message");
        msg.append(pool.itoa(i));
        return msg;
    }

    void common()
    {
        int i = 0;
        // In the lines below, the logger names are chosen as an aid in
        // remembering their level values. In general, the logger names
        // have no bearing to level values.
        LoggerPtr ERRlogger = Logger::getLogger(LOG4CXX_TEST_STR("ERR"));
        ERRlogger->setLevel(Level::getError());

## dense 2 0.7591114044189453

    log4cxx::spi::LoggingEventPtr logEvt = std::make_shared<log4cxx::spi::LoggingEvent>(LOG4CXX_STR("foo"),
        Level::getInfo(),
        LOG4CXX_STR("A Message"),
        log4cxx::spi::LocationInfo::getLocationUnavailable());
    FMTLayout layout(LOG4CXX_STR("{d:%Y-%m-%d %H:%M:%S} {message}"));
    LogString output;
    log4cxx::helpers::Pool pool;
    layout.format(output, logEvt, pool);

## dense 3 0.7562235593795776

    #include "util/compare.h"
    #include "util/transformer.h"
    #include "util/absolutedateandtimefilter.h"
    #include "util/iso8601filter.h"
    #include "util/absolutetimefilter.h"
    #include "util/relativetimefilter.h"
    #include "util/controlfilter.h"
    #include "util/threadfilter.h"
    #include "util/linenumberfilter.h"
    #include "util/filenamefilter.h"
    #include "vectorappender.h"
    #include <log4cxx/fmtlayout.h>
    #include <log4cxx/propertyconfigurator.h>
    #include <log4cxx/helpers/date.h>
    #include <log4cxx/spi/loggingevent.h>
    #include <iostream>
    #include <iomanip>

    #define REGEX_STR(x) x
    #define PAT0 REGEX_STR("\\[[0-9A-FXx]*]\\(DEBUG|INFO|WARN|ERROR|FATAL).*-Message[0-9]\\{1,2\\}")
    #define PAT1 ISO8601_PAT REGEX_STR("") PAT0
    #define PAT2 ABSOLUTE_DATE_AND_TIME_PAT REGEX_STR("") PAT0
    #define PAT3 ABSOLUTE_TIME_PAT REGEX_STR("") PAT0
    #define PAT4 RELATIVE_TIME_PAT REGEX_STR("") PAT0
    #define PAT5 REGEX_STR("\\[[0-9A-FXx]*]\\(DEBUG|INFO|WARN|ERROR|FATAL).*:Message[0-9]\\{1,2\\}")

## dense 4 0.7557586431503296

    std::string msg("Message");
    Pool pool;

    // These should all log.----------------------------
    LOG4CXX_FATAL(ERRlogger, createMessage(i, pool));
    i++; //0
    LOG4CXX_ERROR(ERRlogger, createMessage(i, pool));
    i++;
    LOG4CXX_FATAL(INF, createMessage(i, pool));
    i++; //2
    LOG4CXX_ERROR(INF, createMessage(i, pool));
    i++;
    LOG4CXX_WARN(INF, createMessage(i, pool));
    i++;
    LOG4CXX_INFO(INF, createMessage(i, pool));
    i++;
    LOG4CXX_FATAL(INF_UNDEF, createMessage(i, pool));
    i++; //6
    LOG4CXX_ERROR(INF_UNDEF, createMessage(i, pool));
    i++;
    LOG4CXX_WARN(INF_UNDEF, createMessage(i, pool));
    i++;
    LOG4CXX_INFO(INF_UNDEF, createMessage(i, pool));
    i++;
    LOG4CXX_FATAL(INF_ERR, createMessage(i, pool));
    i++; //10
    LOG4CXX_ERROR(INF_ERR, createMessage(i, pool));
    i++;
    LOG4CXX_FATAL(INF_ERR_UNDEF, createMessage(i, pool));
    i++;
    LOG4CXX_ERROR(INF_ERR_UNDEF, createMessage(i, pool));
    i++;

Hybrid search results:
## hybrid 0 0.016393441706895828

    use {
        crate::args::LogArgs,
        anyhow::{anyhow, Result},
        simplelog::{Config, LevelFilter, WriteLogger},
        std::fs::File,
    };

    pub struct Logger;

    impl Logger {
        pub fn init(args: &impl LogArgs) -> Result<()> {
            let filter: LevelFilter = args.log_level().into();
            if filter != LevelFilter::Off {
                let logfile = File::create(args.log_file())
                    .map_err(|e| anyhow!("Failed to open log file: {e:}"))?;
                WriteLogger::init(filter, Config::default(), logfile)
                    .map_err(|e| anyhow!("Failed to initalize logger: {e:}"))?;
            }
            Ok(())
        }
    }

## hybrid 1 0.016393441706895828

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    #include "logunit.h"
    #include <log4cxx/logger.h>
    #include <log4cxx/simplelayout.h>
    #include <log4cxx/fileappender.h>
    #include <log4cxx/helpers/absolutetimedateformat.h>

## hybrid 2 0.016129031777381897

            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
        };
    }

## hybrid 3 0.016129031777381897

    void simple()
    {
        LayoutPtr layout = LayoutPtr(new SimpleLayout());
        AppenderPtr appender = FileAppenderPtr(new FileAppender(layout, LOG4CXX_STR("output/simple"), false));
        root->addAppender(appender);
        common();
        LOGUNIT_ASSERT(Compare::compare(LOG4CXX_FILE("output/simple"), LOG4CXX_FILE("witness/simple")));
    }

    std::string createMessage(int i, Pool& pool)
    {
        std::string msg("Message");
        msg.append(pool.itoa(i));
        return msg;
    }

    void common()
    {
        int i = 0;
        // In the lines below, the logger names are chosen as an aid in
        // remembering their level values. In general, the logger names
        // have no bearing to level values.
        LoggerPtr ERRlogger = Logger::getLogger(LOG4CXX_TEST_STR("ERR"));
        ERRlogger->setLevel(Level::getError());

## hybrid 4 0.01587301678955555

    std::vector<std::string> MakeStrings() {
        return {
            "a", "ab", "abc", "abcd",
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "

02.
Case 2: Hybrid search outperforms full-text search
Question: How do you initialize the logger
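This question is very similar to the previous one and has the same answer, but here hybrid search finds it (via semantic search) while it is missing from the full-text results. The reason is that the term weights derived from corpus statistics do not match our mental model. The algorithm does not understand that the word "how" is unimportant for this match, and because "logger" appears in the code more often than "how", "how" may even end up weighted more heavily.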
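The effect can be reproduced on a toy scale with a compact BM25 scorer. This is a sketch under stated assumptions rather than the scoring code behind the results listed here: the four tokenized documents are invented stand-ins, k1 = 1.2 and b = 0.75 are common defaults, and the IDF form ln(1 + (N - n + 0.5) / (n + 0.5)) is one widely used variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized doc against the tokenized query with BM25."""
    total = len(docs)
    avgdl = sum(len(doc) for doc in docs) / total
    # Document frequency: in how many docs each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1.0 + (total - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "write logger init filter config logfile".split(),     # stand-in for the wanted answer
    "logger get logger set level".split(),
    "include logger h".split(),
    "long string to test how those are handled".split(),   # unrelated test string
]
query = "how do you initialize the logger".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

In this toy corpus "logger" occurs in three of the four documents and "how" in one, so the unrelated test string (index 3) receives the highest score and the stand-in answer ranks below it.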
GroundTruth
    use {
        crate::args::LogArgs,
        anyhow::{anyhow, Result},
        simplelog::{Config, LevelFilter, WriteLogger},
        std::fs::File,
    };

    pub struct Logger;

    impl Logger {
        pub fn init(args: &impl LogArgs) -> Result<()> {
            let filter: LevelFilter = args.log_level().into();
            if filter != LevelFilter::Off {
                let logfile = File::create(args.log_file())
                    .map_err(|e| anyhow!("Failed to open log file: {e:}"))?;
                WriteLogger::init(filter, Config::default(), logfile)
                    .map_err(|e| anyhow!("Failed to initalize logger: {e:}"))?;
            }
            Ok(())
        }
    }

Full-text search results:
## sparse 0 10.17311954498291

            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
        };
    }

## sparse 1 9.775702476501465

    std::vector<std::string> MakeStrings() {
        return {
            "a", "ab", "abc", "abcd",
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "

## sparse 2 7.638711452484131

    // union ("x|y"), grouping ("(xy)"), brackets ("[xy]"), and
    // repetition count ("x{5,7}"), among others.
    //
    // Below is the syntax that we do support. We chose it to be a
    // subset of both PCRE and POSIX extended regex, so it's easy to
    // learn wherever you come from. In the following: 'A' denotes a
    // literal character, period (.), or a single \\ escape sequence;
    // 'x' and 'y' denote regular expressions; 'm' and 'n' are for

## sparse 3 7.1208391189575195

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    #include "logunit.h"
    #include <log4cxx/logger.h>
    #include <log4cxx/simplelayout.h>
    #include <log4cxx/fileappender.h>
    #include <log4cxx/helpers/absolutetimedateformat.h>

## sparse 4 7.066349029541016

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    #include <log4cxx/filter/denyallfilter.h>
    #include <log4cxx/logger.h>
    #include <log4cxx/spi/filter.h>
    #include <log4cxx/spi/loggingevent.h>
    #include "../logunit.h"

Hybrid search results:
## hybrid 0 0.016393441706895828

    use {
        crate::args::LogArgs,
        anyhow::{anyhow, Result},
        simplelog::{Config, LevelFilter, WriteLogger},
        std::fs::File,
    };

    pub struct Logger;

    impl Logger {
        pub fn init(args: &impl LogArgs) -> Result<()> {
            let filter: LevelFilter = args.log_level().into();
            if filter != LevelFilter::Off {
                let logfile = File::create(args.log_file())
                    .map_err(|e| anyhow!("Failed to open log file: {e:}"))?;
                WriteLogger::init(filter, Config::default(), logfile)
                    .map_err(|e| anyhow!("Failed to initalize logger: {e:}"))?;
            }
            Ok(())
        }
    }

## hybrid 1 0.016393441706895828

            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
        };
    }

## hybrid 2 0.016129031777381897

    std::vector<std::string> MakeStrings() {
        return {
            "a", "ab", "abc", "abcd",
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "
            "long string to test how those are handled. Here goes more text. "

## hybrid 3 0.016129031777381897

    LoggerPtr INF = Logger::getLogger(LOG4CXX_TEST_STR("INF"));
    INF->setLevel(Level::getInfo());

    LoggerPtr INF_ERR = Logger::getLogger(LOG4CXX_TEST_STR("INF.ERR"));
    INF_ERR->setLevel(Level::getError());

    LoggerPtr DEB = Logger::getLogger(LOG4CXX_TEST_STR("DEB"));
    DEB->setLevel(Level::getDebug());

    // Note: categories with undefined level
    LoggerPtr INF_UNDEF = Logger::getLogger(LOG4CXX_TEST_STR("INF.UNDEF"));
    LoggerPtr INF_ERR_UNDEF = Logger::getLogger(LOG4CXX_TEST_STR("INF.ERR.UNDEF"));
    LoggerPtr UNDEF = Logger::getLogger(LOG4CXX_TEST_STR("UNDEF"));

## hybrid 4 0.01587301678955555

    // union ("x|y"), grouping ("(xy)"), brackets ("[xy]"), and
    // repetition count ("x{5,7}"), among others.
    //
    // Below is the syntax that we do support. We chose it to be a
    // subset of both PCRE and POSIX extended regex, so it's easy to
    // learn wherever you come from. In the following: 'A' denotes a
    // literal character, period (.), or a single \\ escape sequence;
    // 'x' and 'y' denote regular expressions; 'm' and 'n' are for

The sparse results contain many low-quality hits caused by matches on low-information words such as "How" and "What". Having observed that matches on "How" and "What" introduce noise in this dataset, we can suppress them by adding these words to the stopword list so that their matches are ignored.
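The idea of stopword filtering can be sketched in a few lines of plain Python. This is not the Milvus analyzer configuration; the `analyze` helper and its simple regular-expression tokenizer are assumptions for illustration, and the stopword set contains only the two words named above.

```python
import re

# The two words named in the text; a real list would be tuned per corpus.
STOPWORDS = frozenset({"how", "what"})


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return [t for t in tokens if t not in stopwords]


# The same analysis is applied to queries and to indexed text.
query_terms = analyze("How do you initialize the logger")
noise_terms = analyze("long string to test how those are handled.")
overlap = set(query_terms) & set(noise_terms)
```

Once "how" is removed on both sides, the query and the repeated test string no longer share any term, so a lexical scorer has nothing left to match between them.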
03.
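Case 3: Hybrid search (with stopwords added) outperforms semantic search
After this step, we again analyze a case in which the tuned hybrid search beats semantic search. This time the match on the word "RegistryClient" in the query surfaces a result that the semantic search model alone did not recall, and we can also see that the hybrid results contain fewer low-quality matches.
Question: How is the RegistryClient instance created in the test methods?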
GroundTruth

    /** Integration tests for {@link BlobPuller}. */
    public class BlobPullerIntegrationTest {

      private final FailoverHttpClient httpClient = new FailoverHttpClient(true, false, ignored -> {});

      @Test
      public void testPull() throws IOException, RegistryException {
        RegistryClient registryClient =
            RegistryClient.factory(EventHandlers.NONE, "gcr.io", "distroless/base", httpClient)
                .newRegistryClient();
        V22ManifestTemplate manifestTemplate =
            registryClient
                .pullManifest(
                    ManifestPullerIntegrationTest.KNOWN_MANIFEST_V22_SHA, V22ManifestTemplate.class)
                .getManifest();
        DescriptorDigest realDigest = manifestTemplate.getLayers().get(0).getDigest();

Semantic search results:
## dense 0 0.7411458492279053

        Mockito.doThrow(mockRegistryUnauthorizedException)
            .when(mockJibContainerBuilder)
            .containerize(mockContainerizer);
        try {
          testJibBuildRunner.runBuild();
          Assert.fail();
        } catch (BuildStepsExecutionException ex) {
          Assert.assertEquals(
              TEST_HELPFUL_SUGGESTIONS.forHttpStatusCodeForbidden("someregistry/somerepository"),
              ex.getMessage());
        }
      }

## dense 1 0.7346029877662659

        verify(mockCredentialRetrieverFactory).known(knownCredential, "credentialSource");
        verify(mockCredentialRetrieverFactory).known(inferredCredential, "inferredCredentialSource");
        verify(mockCredentialRetrieverFactory)
            .dockerCredentialHelper("docker-credential-credentialHelperSuffix");
      }

## dense 2 0.7285804748535156

        when(mockCredentialRetrieverFactory.dockerCredentialHelper(anyString()))
            .thenReturn(mockDockerCredentialHelperCredentialRetriever);
        when(mockCredentialRetrieverFactory.known(knownCredential, "credentialSource"))
            .thenReturn(mockKnownCredentialRetriever);
        when(mockCredentialRetrieverFactory.known(inferredCredential, "inferredCredentialSource"))
            .thenReturn(mockInferredCredentialRetriever);
        when(mockCredentialRetrieverFactory.wellKnownCredentialHelpers())
            .thenReturn(mockWellKnownCredentialHelpersCredentialRetriever);

## dense 3 0.7279614210128784

      @Test
      public void testBuildImage_insecureRegistryException()
          throws InterruptedException, IOException, CacheDirectoryCreationException, RegistryException,
              ExecutionException {
        InsecureRegistryException mockInsecureRegistryException =
            Mockito.mock(InsecureRegistryException.class);
        Mockito.doThrow(mockInsecureRegistryException)
            .when(mockJibContainerBuilder)
            .containerize(mockContainerizer);
        try {
          testJibBuildRunner.runBuild();
          Assert.fail();
        } catch (BuildStepsExecutionException ex) {
          Assert.assertEquals(TEST_HELPFUL_SUGGESTIONS.forInsecureRegistry(), ex.getMessage());
        }
      }

## dense 4 0.724872350692749

      @Test
      public void testBuildImage_registryCredentialsNotSentException()
          throws InterruptedException, IOException, CacheDirectoryCreationException, RegistryException,
              ExecutionException {
        Mockito.doThrow(mockRegistryCredentialsNotSentException)
            .when(mockJibContainerBuilder)
            .containerize(mockContainerizer);
        try {
          testJibBuildRunner.runBuild();
          Assert.fail();
        } catch (BuildStepsExecutionException ex) {
          Assert.assertEquals(TEST_HELPFUL_SUGGESTIONS.forCredentialsNotSent(), ex.getMessage());
        }
      }

Hybrid search results:
## hybrid 0 0.016393441706895828

    /** Integration tests for {@link BlobPuller}. */
    public class BlobPullerIntegrationTest {

      private final FailoverHttpClient httpClient = new FailoverHttpClient(true, false, ignored -> {});

      @Test
      public void testPull() throws IOException, RegistryException {
        RegistryClient registryClient =
            RegistryClient.factory(EventHandlers.NONE, "gcr.io", "distroless/base", httpClient)
                .newRegistryClient();
        V22ManifestTemplate manifestTemplate =
            registryClient
                .pullManifest(
                    ManifestPullerIntegrationTest.KNOWN_MANIFEST_V22_SHA, V22ManifestTemplate.class)
                .getManifest();
        DescriptorDigest realDigest = manifestTemplate.getLayers().get(0).getDigest();

## hybrid 1 0.016393441706895828

        Mockito.doThrow(mockRegistryUnauthorizedException)
            .when(mockJibContainerBuilder)
            .containerize(mockContainerizer);
        try {
          testJibBuildRunner.runBuild();
          Assert.fail();
        } catch (BuildStepsExecutionException ex) {
          Assert.assertEquals(
              TEST_HELPFUL_SUGGESTIONS.forHttpStatusCodeForbidden("someregistry/somerepository"),
              ex.getMessage());
        }
      }

## hybrid 2 0.016129031777381897

        verify(mockCredentialRetrieverFactory).known(knownCredential, "credentialSource");
        verify(mockCredentialRetrieverFactory).known(inferredCredential, "inferredCredentialSource");
        verify(mockCredentialRetrieverFactory)
            .dockerCredentialHelper("docker-credential-credentialHelperSuffix");
      }

## hybrid 3 0.016129031777381897

      @Test
      public void testPull_unknownBlob() throws IOException, DigestException {
        DescriptorDigest nonexistentDigest =
            DescriptorDigest.fromHash(
                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
        RegistryClient registryClient =
            RegistryClient.factory(EventHandlers.NONE, "gcr.io", "distroless/base", httpClient)
                .newRegistryClient();
        try {
          registryClient
              .pullBlob(nonexistentDigest, ignored -> {}, ignored -> {})
              .writeTo(ByteStreams.nullOutputStream());
          Assert.fail("Trying to pull nonexistent blob should have errored");
        } catch (IOException ex) {
          if (!(ex.getCause() instanceof RegistryErrorException)) {
            throw ex;
          }
          MatcherAssert.assertThat(
              ex.getMessage(),
              CoreMatchers.containsString(
                  "pull BLOB for gcr.io/distroless/base with digest " + nonexistentDigest));
        }
      }
    }

## hybrid 4 0.01587301678955555

        when(mockCredentialRetrieverFactory.dockerCredentialHelper(anyString()))
            .thenReturn(mockDockerCredentialHelperCredentialRetriever);
        when(mockCredentialRetrieverFactory.known(knownCredential, "credentialSource"))
            .thenReturn(mockKnownCredentialRetriever);
        when(mockCredentialRetrieverFactory.known(inferredCredential, "inferredCredentialSource"))
            .thenReturn(mockInferredCredentialRetriever);
        when(mockCredentialRetrieverFactory.wellKnownCredentialHelpers())
            .thenReturn(mockWellKnownCredentialHelpersCredentialRetriever);

We can draw some conclusions. A semantic search model gives us good results out of the box, but when the query contains keywords that we want matched, the model has no explicit way to express that requirement. Full-text search does. The trade-off is that insignificant matches can drag down overall quality, so we need to find these negative cases in concrete results and handle them in a targeted way from a business perspective to improve retrieval quality. We hope that the release of full-text search in Milvus 2.5 gives community users more flexibility when building RAG systems, lets them explore combinations of retrieval strategies, and helps them meet the more complex and varied retrieval needs of the GenAI era. To see the specific code for using full-text search in Milvus, read on in "Full-Text Search with Milvus".