Small Models in RAG (Retrieval-Augmented Generation) Systems: A New Path to Efficiency and Scalability - 链载Ai

Although large language models offer clear advantages in performance and capability, small language models also play an important role in RAG systems. Small language models bring several advantages to RAG systems:

III. Applications of Small Language Models in RAG

In a RAG system, small language models can play several roles. Typical application scenarios include the following:

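import json
import logging
import re

import fitz  # PyMuPDF
import lancedb
import numpy as np
import pandas as pd
import pyarrow as pa
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

# The snippets below call log.info / log.warning / log.error,
# so a module-level logger is needed.
log = logging.getLogger(__name__)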

model_name_long = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_long)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
log.info(f"Loading the model {model_name_long}")
bf16 = False
fp16 = True
if torch.cuda.is_available():
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        log.info("Your GPU supports bfloat16: accelerate training with bf16=True")
        bf16 = True
        fp16 = False
# Load the model
device_map = {"": 0}  # Load on GPU 0
torch_dtype = torch.bfloat16 if bf16 else torch.float16
model = AutoModelForCausalLM.from_pretrained(
    model_name_long,
    torch_dtype=torch_dtype,
    device_map=device_map,
)
log.info(f"Model loaded with torch_dtype={torch_dtype}")

file_path = './data/troubleshooting.pdf'
dict_pages = {}
# Open the PDF file
with fitz.open(file_path) as pdf_document:
    for page_number in range(pdf_document.page_count):
        page = pdf_document.load_page(page_number)
        page_text = page.get_text()
        dict_pages[page_number] = page_text
        print(f"Processed PDF page {page_number + 1}")

# Initialize the SentenceTransformer model
sentence_model = SentenceTransformer('all-MiniLM-L6-v2')
# Connect to LanceDB
db = lancedb.connect('./data/my_lancedb')
# Define the schema using PyArrow
schema = pa.schema([
    pa.field("page_number", pa.int64()),
    pa.field("original_content", pa.string()),
    pa.field("summary", pa.string()),
    pa.field("keywords", pa.string()),
    pa.field("vectorS", pa.list_(pa.float32(), 384)),  # Embedding size of 384
    pa.field("vectorK", pa.list_(pa.float32(), 384)),
])
# Create or connect to a table
table = db.create_table('summaries', schema=schema, mode='overwrite')

5. Summarizing and Storing the Data

# Loop through each page in the PDF
for page_number, text in dict_pages.items():
    question = f"""For the given passage, provide a long summary about it, incorporating all the main keywords in the passage.
Format should be in JSON format like below:
{{
    "summary": <text summary>,
    "keywords": <a comma-separated list of main keywords and acronyms that appear in the passage>,
}}
Make sure that JSON fields have double quotes and use the correct closing delimiters.
Passage: {text}"""
    prompt = create_prompt(question)
    response = process_prompt(prompt, model, tokenizer, device)
    # Error handling for JSON decoding
    try:
        summary_json = json.loads(response)
    except json.decoder.JSONDecodeError as e:
        exception_msg = str(e)
        question = f"""Correct the following JSON {response} which has {exception_msg} to proper JSON format. Output only JSON."""
        log.warning(f"{exception_msg} for {response}")
        prompt = create_prompt(question)
        response = process_prompt(prompt, model, tokenizer, device)
        log.warning(f"Corrected '{response}'")
        try:
            summary_json = json.loads(response)
        except Exception as e:
            log.error(f"Failed to parse JSON: '{e}' for '{response}'")
            continue
    keywords = ','.join(summary_json['keywords'])
    # Generate embeddings
    vectorS = sentence_model.encode(summary_json['summary'])
    vectorK = sentence_model.encode(keywords)
    # Store the data in LanceDB
    table.add([{
        "page_number": int(page_number),
        "original_content": text,
        "summary": summary_json['summary'],
        "keywords": keywords,
        "vectorS": vectorS,
        "vectorK": vectorK,
    }])
    print(f"Data for page {page_number} stored successfully.")
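One detail in the loop above is worth guarding against. The prompt asks for a "comma-separated list", so the model may return `keywords` either as a JSON array or as a single string, and calling `','.join(...)` on a string puts a comma between every character. The helper below is a minimal sketch of a normalization step. The name `normalize_keywords` and the de-duplication behavior are assumptions for illustration, not part of the original code.

```python
def normalize_keywords(raw):
    """Return keywords as one comma-separated string.

    Accepts either a list of strings or a single comma-separated string.
    Hypothetical helper, not part of the original pipeline.
    """
    if isinstance(raw, str):
        parts = raw.split(",")
    elif isinstance(raw, (list, tuple)):
        parts = [str(item) for item in raw]
    else:
        parts = []
    # Trim whitespace, drop empties, and de-duplicate (case-insensitively)
    # while keeping the original order.
    seen = set()
    cleaned = []
    for part in parts:
        keyword = part.strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            cleaned.append(keyword)
    return ",".join(cleaned)
```

In the loop, this could stand in for the join, for example `keywords = normalize_keywords(summary_json.get("keywords"))`.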

6. Using the LLM to Correct Its Output

# Use the small LLAMA 3.2 1B model to create the summary
for page_number, text in dict_pages.items():
    question = f"""For the given passage, provide a long summary about it, incorporating all the main keywords in the passage.
Format should be in JSON format like below:
{{
    "summary": <text summary> example "Some Summary text",
    "keywords": <a comma separated list of main keywords and acronyms that appear in the passage> example ["keyword1","keyword2"],
}}
Make sure that JSON fields have double quotes, e.g., instead of 'summary' use "summary", and use the closing and ending delimiters.
Passage: {text}"""
    prompt = create_prompt(question)
    response = process_prompt(prompt, model, tokenizer, device)
    try:
        summary_json = json.loads(response)
    except json.decoder.JSONDecodeError as e:
        exception_msg = str(e)
        # Use the LLM to correct its own output
        question = f"""Correct the following JSON {response} which has {exception_msg} to proper JSON format. Output only the corrected JSON.
Format should be in JSON format like below:
{{
    "summary": <text summary> example "Some Summary text",
    "keywords": <a comma separated list of keywords and acronyms that appear in the passage> example ["keyword1","keyword2"],
}}"""
        log.warning(f"{exception_msg} for {response}")
        prompt = create_prompt(question)
        response = process_prompt(prompt, model, tokenizer, device)
        log.warning(f"Corrected '{response}'")
        # Try parsing the corrected JSON
        try:
            summary_json = json.loads(response)
        except json.decoder.JSONDecodeError as e:
            log.error(f"Failed to parse corrected JSON: '{e}' for '{response}'")
            continue
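The example object in the prompt ends with a comma before the closing brace, which is not valid JSON, and small models tend to copy that or wrap the object in extra text. A cheap local repair can therefore be tried before spending a second model call on correction. The following is a minimal sketch under that assumption. `parse_llm_json` is a hypothetical helper, and its trailing-comma regex is a heuristic that could also alter a `,}` sequence inside a string value.

```python
import json
import re


def parse_llm_json(response):
    """Best-effort parse of a JSON object embedded in model output.

    Returns a dict, or None if no valid object can be recovered.
    """
    # Keep only the span from the first "{" to the last "}".
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = response[start:end + 1]
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Heuristic: drop commas that directly precede "}" or "]".
        repaired = re.sub(r",\s*([}\]])", r"\1", candidate)
        try:
            parsed = json.loads(repaired)
        except json.JSONDecodeError:
            return None
    return parsed if isinstance(parsed, dict) else None
```

When the helper returns `None`, the loop can still fall back to asking the model to correct its own output as shown above.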

V. Retrieval and Generation Process

(1) Processing the User Query
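The table created earlier stores two embeddings per page, `vectorS` for the summary and `vectorK` for the keywords, so a user query can be matched against both. The sketch below assumes the query has already been embedded with the same `all-MiniLM-L6-v2` model. The function name `search_both_columns` and the default `k` are hypothetical. It relies on LanceDB's `table.search(..., vector_column_name=...)`, `.limit()`, and `.to_list()` calls.

```python
def search_both_columns(table, query_vector, k=3):
    """Query the summary and keyword embeddings of a LanceDB table.

    `table` is expected to be a LanceDB table with the `vectorS` and
    `vectorK` columns defined earlier; `query_vector` is the embedded
    user query (384 floats for all-MiniLM-L6-v2).
    """
    results = {}
    for column in ("vectorS", "vectorK"):
        rows = (
            table.search(query_vector, vector_column_name=column)
            .limit(k)
            .to_list()
        )
        results[column] = rows
    return results
```

A call might look like `search_both_columns(table, sentence_model.encode(user_query))`, where `user_query` is the raw question string. Each returned row is a dict with the table's columns.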

(2) Ranking the Retrieved Summaries
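One way the small LLM could rank retrieved summaries is to show it a numbered list of candidates and ask for the numbers in order of relevance. The sketch below covers the parts around the model call: de-duplicating candidates by page, building the prompt, and mapping the reply back to rows. The function names, the prompt wording, and the expected reply format are all assumptions for illustration, not the author's implementation.

```python
import re


def merge_candidates(*result_lists):
    """De-duplicate retrieved rows by page number, keeping first-seen order."""
    merged = {}
    for rows in result_lists:
        for row in rows:
            merged.setdefault(row["page_number"], row)
    return list(merged.values())


def build_ranking_prompt(query, candidates):
    """Number each candidate summary and ask for the best-matching ones."""
    lines = [f"{i}. {row['summary']}" for i, row in enumerate(candidates, start=1)]
    return (
        "Rank the summaries below by how well they answer the question.\n"
        f"Question: {query}\n"
        "Summaries:\n" + "\n".join(lines) + "\n"
        "Reply with the summary numbers only, best first, separated by commas."
    )


def parse_ranking(reply, candidates):
    """Map numbers in the model reply back to rows, ignoring invalid ones."""
    ranked, seen = [], set()
    for token in re.findall(r"\d+", reply):
        index = int(token) - 1
        if 0 <= index < len(candidates) and index not in seen:
            seen.add(index)
            ranked.append(candidates[index])
    return ranked
```

Because `parse_ranking` skips out-of-range and repeated numbers, a malformed reply from a 1B model degrades to a shorter ranking instead of raising an error.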

(3) Extracting the Selected Summaries and Generating the Final Answer
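Once the best pages are selected, their `original_content` can be passed to the model as context for the final answer. The sketch below assembles such a prompt under a character budget so that it stays short. `build_answer_prompt`, the `max_chars` default, and the prompt wording are assumptions, not the author's code.

```python
def build_answer_prompt(query, selected_rows, max_chars=6000):
    """Concatenate page texts (in ranked order) until the budget is reached."""
    context_parts = []
    used = 0
    for row in selected_rows:
        text = row["original_content"].strip()
        remaining = max_chars - used
        if remaining <= 0:
            break
        # Truncate the last page rather than dropping it entirely.
        snippet = text[:remaining]
        # Page numbers are stored 0-based in the table, so add 1 for display.
        context_parts.append(f"[Page {row['page_number'] + 1}]\n{snippet}")
        used += len(snippet)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Keeping the page labels in the context lets the answer point back to the pages it drew from.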

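A small LLM such as LLAMA 3.2 1B Instruct (see "Llama 3.2 1B and 3B: Lightweight yet Powerful New Forces in AI") can efficiently summarize large documents and extract their keywords. These summaries and keywords can be embedded and stored in a database such as LanceDB, giving the RAG system efficient retrieval (see "Astute RAG (Retrieval-Augmented Generation): A New Approach to LLM Information Retrieval and Utilization"). Across the workflow, the small LLM contributes not only to the generation stage but also to retrieval augmentation, including ranking the retrieved summaries. This approach lowers compute cost while still giving users reasonably accurate and relevant answers, which makes the RAG system more practical and economical.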
