ColPali is a multimodal retriever that works directly on document page images, so no OCR step is needed.
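ColPali ranks pages with ColBERT-style late interaction. The query and each page image are turned into many embedding vectors, one per query token and one per image patch. Each query vector is then matched to its most similar page vector, and those maxima are summed ("MaxSim"). The minimal pure-Python sketch below shows only that scoring rule. It assumes tiny, made-up 2-dimensional vectors, while the real model produces far more and far longer vectors. The function names are illustrative.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for every query token embedding,
    find its best-matching page patch embedding, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy data (made up): 2 query-token vectors, 3 page-patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim_score(query, page))  # 0.9 + 0.8, i.e. about 1.7
```

Because each query token picks its own best patch, a page can score well even when the relevant text, table, or figure occupies only a small region of it.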
After indexing the data, we use Qwen2-VL for the generation part of RAG (the code below loads the Qwen2-VL-2B-Instruct checkpoint).
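First, render each PDF page as an image with pdf2image (it requires the poppler utilities to be installed):

    from pdf2image import convert_from_path

    images = convert_from_path("/content/climate_youth_magazine.pdf")
    images[5]  # in a notebook this displays the 6th page (the list is 0-based)

byaldi, an open-source toolkit from answer.ai, makes ColPali easy to use: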
    from byaldi import RAGMultiModalModel

    # Load the ColPali retriever through byaldi
    RAG = RAGMultiModalModel.from_pretrained("vidore/colpali")

Build the index:
    RAG.index(
        input_path="/content/climate_youth_magazine.pdf",
        index_name="image_index",  # index will be saved at index_root/index_name/
        store_collection_with_index=False,  # don't store base64 copies of the page images
        overwrite=True,
    )

Now we can search:
    text_query = "How much did the world temperature change so far?"
    results = RAG.search(text_query, k=1)
    results
    # [{'doc_id': 0, 'page_num': 6, 'score': 17.25, 'metadata': {}, 'base64': None}]
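Conceptually, `RAG.search(text_query, k=1)` scores the indexed pages against the query and keeps the `k` best. The sketch below shows only that top-k step. It works over a hypothetical dict of precomputed page scores (the numbers are made up) and returns dicts shaped like the result printed above. `top_k_pages` is an illustrative helper, not byaldi's actual implementation.

```python
import heapq
from typing import Dict, List


def top_k_pages(page_scores: Dict[int, float], k: int, doc_id: int = 0) -> List[dict]:
    """Return the k best pages, highest score first, in a byaldi-like shape.

    page_scores maps a 1-based page number to its relevance score.
    """
    best = heapq.nlargest(k, page_scores.items(), key=lambda item: item[1])
    return [
        {"doc_id": doc_id, "page_num": page, "score": score, "metadata": {}, "base64": None}
        for page, score in best
    ]


# Made-up scores for a few pages of one document.
scores = {1: 9.5, 2: 11.0, 6: 17.25, 7: 12.75}
for result in top_k_pages(scores, k=2):
    print(result["page_num"], result["score"])
```

`heapq.nlargest` avoids sorting every page when only a handful are needed. If `k` exceeds the number of pages, it simply returns all of them.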
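The answer is indeed on page 6, the PDF page displayed earlier with `images[5]`. Now we can build the RAG pipeline, using Qwen2-VL for generation. The code loads Qwen/Qwen2-VL-2B-Instruct; Qwen/Qwen2-VL-7B-Instruct can be swapped in the same way if you have enough GPU memory.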
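The pipeline passes Qwen2-VL a single retrieved page. That page goes into a chat message whose `content` is a list of image and text entries. If you search with `k > 1`, Qwen2-VL can also take several images in one user message. The sketch below builds that message list under a few assumptions:

- `results` are byaldi-style dicts with a 1-based `page_num`.
- `images` is the 0-based list returned by `convert_from_path`.
- Plain strings stand in for the PIL images so the example runs on its own.

`build_messages` is a hypothetical helper.

```python
from typing import Any, Dict, List


def build_messages(results: List[Dict[str, Any]], images: List[Any], query: str) -> List[dict]:
    """Build a single-turn Qwen2-VL-style chat message: one image entry per
    retrieved page, followed by the text query.

    byaldi's page_num is 1-based while the pdf2image list is 0-based,
    hence the "- 1" when indexing.
    """
    content = [{"type": "image", "image": images[r["page_num"] - 1]} for r in results]
    content.append({"type": "text", "text": query})
    return [{"role": "user", "content": content}]


# Stand-ins for PIL images: "page_1" ... "page_8".
fake_images = [f"page_{i}" for i in range(1, 9)]
fake_results = [{"page_num": 6}, {"page_num": 2}]
messages = build_messages(fake_results, fake_images, "How much did the world temperature change so far?")
print(messages[0]["content"][0]["image"])  # page_6
```

Every extra page adds visual tokens to the prompt, so retrieving more pages costs more GPU memory and generation time.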
    from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
    import torch

    # Load the vision-language model and its processor
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-2B-Instruct",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    ).cuda().eval()
    processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)

    # byaldi's page_num is 1-based; the images list is 0-based
    image_index = results[0]["page_num"] - 1

    # One user turn: the retrieved page image followed by the question
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": images[image_index]},
                {"type": "text", "text": text_query},
            ],
        }
    ]

    # Render the chat template and collect the vision inputs
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    inputs = inputs.to("cuda")

    # Generate, then drop the prompt tokens from each output sequence
    generated_ids = model.generate(**inputs, max_new_tokens=50)
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    print(output_text)
    # ["The Earth's average global temperature has increased by around 1.1°C since the late 19th century, according to the information provided in the image."]
The answer is correct!
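One detail of the generation step is that `model.generate` returns the prompt tokens followed by the newly generated ones. That is why the pipeline slices each row with `out_ids[len(in_ids):]` before decoding. The plain-list sketch below uses made-up token IDs in place of tensors to show what that slice keeps. `trim_prompt` is a hypothetical name.

```python
from typing import List


def trim_prompt(input_ids: List[List[int]], generated_ids: List[List[int]]) -> List[List[int]]:
    """Drop the prompt prefix from each generated sequence, the same slice
    as out_ids[len(in_ids):] in the Qwen2-VL code."""
    return [out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)]


# Made-up token IDs: a 3-token prompt, after which the model appended 2 tokens.
prompt = [[101, 7, 42]]
generated = [[101, 7, 42, 900, 901]]
print(trim_prompt(prompt, generated))  # [[900, 901]]
```

Without this step, `batch_decode` would also return the chat template and the question in front of the answer.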