链载Ai

标题: 基于QwenVL2.5模型实现发票识别 [打印本页]

作者: 链载Ai    时间: 1 小时前
标题: 基于QwenVL2.5模型实现发票识别

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;letter-spacing: normal;orphans: 2;text-align: start;text-indent: 0px;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">提示词

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;"> 准确的真实发票图片数据

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">提示词

"请提取发票图像中的以下信息:\n"+"-发票代码\n"+"-发票号码\n"+"-开票日期\n"+"-校验码\n"+"-机器编号\n"+"-购买方名称\n"+"-购买方纳税人识别号\n"+"-购买方地址、电话\n"+"-购买方开户行及账号\n"+"-销售方名称\n"+"-销售方纳税人识别号\n"+"-销售方地址、电话\n"+"-销售方开户行及账号\n"+"-项目名称\n"+"-规格型号\n"+"-单位\n"+"-数量\n"+"-单价\n"+"-金额\n"+"-税率\n"+"-税额\n"+"-价税合计(大写)\n"+"-价税合计(小写)\n"+"-收款人\n"+"-复核\n"+"-开票人\n"+"要求准确无误地提取上述关键信息,不要遗漏和捏造虚假信息,模糊或者强光遮挡的单个文字可以用英文问号?代替。返回数据格式以json方式输出,格式为:{\n"+"'发票代码':'xxx','发票号码':'xxx','开票日期':'xxx','校验码':'xxx','机器编号':'xxx','购买方名称':'xxx','购买方纳税人识别号':'xxx','购买方地址、电话':'xxx','购买方开户行及账号':'xxx','销售方名称':'xxx','销售方纳税人识别号':'xxx','销售方地址、电话':'xxx','销售方开户行及账号':'xxx','项目名称':'xxx','规格型号':'xxx','单位':'xxx','数量':'xxx','单价':'xxx','金额':'xxx','税率':'xxx','税额':'xxx','价税合计(大写)':'xxx','价税合计(小写)':'xxx','收款人':'xxx','复核':'xxx','开票人':'xxx'}"


ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">

image

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">从性能数据来看,模型大小与响应时间呈正相关关系:3B参数模型最快(7372ms),而32B参数模型最慢(23942ms)

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">• 值得注意的是,7B模型(10181ms)比OCR专用模型(12434ms)更快,尽管OCR模型参数未明确说明

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">• 72B模型(15573ms)比32B模型(23942ms)更快,这可能是由于模型架构优化或测试环境差异导致

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">• 对于实时性要求高的应用,建议优先考虑3B或7B版本;需要更高精度的场景可权衡使用72B版本

ingFang SC", "Microsoft YaHei", "Helvetica Neue", Helvetica, Arial, sans-serif;font-size: 14px;font-style: normal;font-variant-ligatures: normal;font-variant-caps: normal;font-weight: 400;letter-spacing: normal;orphans: 2;text-align: start;text-transform: none;widows: 2;word-spacing: 0px;-webkit-text-stroke-width: 0px;white-space: normal;background-color: rgb(255, 255, 255);text-decoration-thickness: initial;text-decoration-style: initial;text-decoration-color: initial;">• 32B版本在本次测试中表现异常,可能需要再检查模型配置或测试环境是否存在问题


qwen2.5-vl-72b-instruct模型

接口调用时间:15573 毫秒

{"发票代码":"053002100113","发票号码":"00195236","开票日期":"2022年12月05日","校验码":"02954951630544033053","机器编号":"497003211467","购买方名称":"云南艾拓信息技术有限公司","购买方纳税人识别号":"91530100792866047D","购买方地址、电话":"云南省昆明市高新区经典双城C1幢1单元15层1503室0871-65386218","购买方开户行及账号":"富滇银行昆明市五华支行242021010000495122","销售方名称":"云南亚彩科技有限公司","销售方纳税人识别号":"91530103MA6Q798R69","销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610","销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644","项目名称":"*计算机网络设备*光纤箱\n*计算机网络设备*光纤箱\n*计算机网络设备*光纤适配器","规格型号":"DS-ZGHK-24/E\nDS-ZGHK-12/E\nDS-ZMFGHLS-2/E","单位":"个\n个\n个","数量":"3\n8\n168","单价":"143.362831858407\n106.194690265487\n4.42477876106195","金额":"430.09\n849.56\n743.36\n¥2023.01","税率":"13%\n13%\n13%","税额":"55.91\n110.44\n96.64\n¥262.99","价税合计(大写)":"贰仟贰佰捌拾陆圆整","价税合计(小写)":"¥2286.00","收款人":"蒋云芳","复核":"赵建柏","开票人":"董飞霞"}


qwen2.5-vl-32b-instruct模型

接口调用时间:23942 毫秒


{ "发票代码":"053002100113", "发票号码":"00195236", "开票日期":"2022年12月05日", "校验码":"02954951630544033053", "机器编号":"497003211467", "购买方名称":"云南艾拓信息技术有限公司", "购买方纳税人识别号":"91530100792866047D", "购买方地址、电话":"云南省昆明市高新区经典双城C1幢1单元15层1503室 0871-65386218", "购买方开户行及账号":"富滇银行昆明市五华支行 242021010000495122", "销售方名称":"云南亚彩科技有限公司", "销售方纳税人识别号":"91530103MA6Q798R69", "销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号 13629663610", "销售方开户行及账号":"中国建设银行昆明白云路支行 53050188863700000644", "项目名称":"*计算机网络设备*光纤箱\n*计算机网络设备*光纤箱\n*计算机网络设备*光纤适配器\n合计", "规格型号":"DS-ZGHK-24/E\nDS-ZGHK-12/E\nDS-ZMFG HLS-2/E", "单位":"个\n个\n个", "数量":"3\n8\n168", "单价":"143.362831858407\n106.194690265487\n4.42477876106195", "金额":"430.09\n849.56\n743.36\n¥2023.01", "税率":"13%\n13%\n13%", "税额":"55.91\n110.44\n96.64\n¥262.99", "价税合计(大写)":"贰仟贰佰捌拾陆圆整", "价税合计(小写)":"¥2286.00", "收款人":"蒋云芳", "复核":"赵建柏", "开票人":"董飞霞"}


qwen2.5-vl-7b-instruct模型:

接口调用时间:10181 毫秒


{"发票代码":"?53002100113","发票号码":"00195236","开票日期":"2022年12月05日","校验码":"02954951630544033053","机器编号":"497003211467","购买方名称":"云南艾拓信息技术有限公司","购买方纳税人识别号":"91530100792866047D","购买方地址、电话":"云南省昆明市高新区经典双城C1幢1单元15层1503室0871-65386218","购买方开户行及账号":"富滇银行昆明市五华支行242021010000495122","销售方名称":"云南亚彩科技有限公司","销售方纳税人识别号":"91530103MA6Q798R69","销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610","销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644","项目名称":"*计算机网络设备*光纤箱*计算机网络设备*光纤箱*计算机网络设备*光纤适配器","规格型号":"DS-ZGHK-24/EDS-ZGHK-12/EDS-ZMFGHLS-2/E","单位":"个元/套套","数量":"3?168","单价":"143,362831858407;106.194690265487;4.42477876106195","金额":"¥:430.09;849.56;743.36","税率":"13%","税额":"55.91元;110.44元;96.64元","价税合计(大写)":"(贰仟贰佰捌拾陆圆整)","价税合计(小写)":"¥?:2286.00?","收款人":"蒋云芳","复核":"赵建柏","开票人":"董飞霞"}

注意:由于部分价格和数字被省略或显示不完整,在解析时进行了简化处理,并用`?`表示无法确定的部分。实际操作中应根据具体情况进行调整和完善。

qwen2.5-vl-3b-instruct模型:

接口调用时间:7372 毫秒


{"发票代码":"053002100113","发票号码":"00195236","开票日期":"2022年12月05日","校验码":"02954951630544033053","机器编号":"497003211467","购买方名称":"云南艾拓信息技术有限公司","购买方纳税人识别号":"91530100792866047D","购买方地址、电话":"云南省昆明市高新区双城C1幢1单元15层1503室0871-65386218","购买方开户行及账号":"富滇银行昆明市五华支行242021010000495122","销售方名称":"云南亚彩科技有限公司","销售方纳税人识别号":"91530103MA6Q798R69","销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610","销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644","项目名称":"*计算机网络设备*光纤箱*计算机网络设备*光纤箱*计算机网络设备*光纤适配器合计","规格型号":"DS-ZGHK-24/E?DS-ZGHK-12/E?DS-ZMFGHLS-2/E","单位":"个","数量":"38168","单价":"143.362831858407106.1946902654874.42477876106195","金额":"430.09849.56743.36","税率":"13%","税额":"55.91110.4496.64","价税合计(大写)":"贰仟贰佰捌拾陆圆整","价税合计(小写)":"2286.00","收款人":"蒋云芳","复核":"赵建柏","开票人":"董飞霞"}


qwen-vl-ocr-latest模型:

接口调用时间:12434 毫秒


{"发票代码":"053002100113","发票号码":"00195236","开票日期":"2022年12月05日","校验码":"02954951630544033053","机器编号":"497003211467","购买方名称":"云南艾拓信息技术有限公司","购买方纳税人识别号":"91530100792866047D","购买方地址、电话":"云南省昆明市高新区经典双城C1幢1单元15层1503室0871-65386218","购买方开户行及账号":"富滇银行昆明市五华支行242021010000495122","销售方名称":"云南亚彩科技有限公司","销售方纳税人识别号":"91530103MA6Q798R69","销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610","销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644","项目名称":"*计算机网络设备*光纤箱*计算机网络设备*光纤箱*计算机网络设备*光纤适配器合计","规格型号":"DS-ZGHK-24/EDS-ZGHK-12/EDS-ZMFGHLS-2/E","单位":"个个个","数量":"38168","单价":"143.362831858407106.1946902654874.42477876106195","金额":"430.09849.56743.36","税率":"13%13%13%","税额":"55.91110.4496.64","价税合计(大写)":"贰仟贰佰捌拾陆圆整","价税合计(小写)":"2286.00","收款人":"蒋云芳","复核":"赵建柏","开票人":"董飞霞"}



第二轮测试

调整提示词后

请提取这张照片的内容,其中内容格式‘机器编号’、‘发票代码’、‘发票号码’、‘开票日期’、‘校验码’、‘购买方名称’、‘购买方纳税人识别号’、‘购买方地址、电话’、‘开户行及账号’、‘货物或应税劳务、服务名称’、‘规格型号’、‘单位’、‘数量’、‘单价’、‘金额’、‘税率’、‘税额’、‘价税合计(大写)’、‘价税合计(小写)’、‘销售方名称’、‘销售方纳税人识别号’、‘销售方地址、电话’、‘销售方地址、电话’、‘开户行及账号’、‘备注’、‘收款人’、‘复核’、‘开票人’字段返回信息,返回的结果信息以json格式返回


72B-VL

{
"机器编号":"497003211467",
"发票代码":"053002100113",
"发票号码":"00195236",
"开票日期":"2022年12月05日",
"校验码":"02954 95163 05440 33053",
"购买方名称":"云南艾拓信息技术有限公司",
"购买方纳税人识别号":"91530100792866047D",
"购买方地址、电话":"云南省昆明市高新区经典双城C幢1单元15层1503室 0871-65386218",
"开户行及账号":"富滇银行昆明市五华支行 242021010000495122",
"货物或应税劳务、服务名称":[
{
"项目名称":"*计算机网络设备*光纤箱",
"规格型号":"DS-ZGHK-24/E",
"单位":"个",
"数量":"3",
"单价":"143.302831858407",
"金额":"430.09",
"税率":"13%",
"税额":"55.91"
},
{
"项目名称":"*计算机网络设备*光纤箱",
"规格型号":"DS-ZGHK-12/E",
"单位":"个",
"数量":"8",
"单价":"106.194690265487",
"金额":"849.56",
"税率":"13%",
"税额":"110.44"
},
{
"项目名称":"*计算机网络设备*光纤适配器",
"规格型号":"DS-ZMFGHLS-2/E",
"单位":"个",
"数量":"168",
"单价":"4.42477876106195",
"金额":"743.36",
"税率":"13%",
"税额":"96.64"
}
],
"价税合计(大写)":"贰仟贰佰捌拾陆圆整",
"价税合计(小写)":"¥2286.00",
"销售方名称":"云南亚彩科技有限公司",
"销售方纳税人识别号":"91530103MA6Q798R69",
"销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610",
"销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644",
"备注":"",
"收款人":"蒋云芳",
"复核":"赵建柏",
"开票人":"董飞霞"
}

32B-VL

{
"机器编号":"497003211467",
"发票代码":"053002100113",
"发票号码":"00195236",
"开票日期":"2022年12月05日",
"校验码":"02954951630544033053",
"购买方名称":"云南艾拓信息技术有限公司",
"购买方纳税人识别号":"91530100792866047D",
"购买方地址、电话":"云南省昆明市高新区经典双城C1幢1单元15层1503室 0871-65386218",
"开户行及账号":"富滇银行昆明市五华支行 242021010000495122",
"货物或应税劳务、服务名称":[
"*计算机网络设备*光纤箱",
"*计算机网络设备*光纤箱",
"*计算机网络设备*光纤适配器"
],
"规格型号":[
"DS-ZGHK-24/E",
"DS-ZGHK-12/E",
"DS-ZMFG HLS-2/E"
],
"单位":[
"个",
"个",
"个"
],
"数量":[
"3",
"8",
"168"
],
"单价":[
"143.362831858407",
"106.194690265487",
"4.42477876106195"
],
"金额":[
"430.09",
"849.56",
"743.36"
],
"税率":[
"13%",
"13%",
"13%"
],
"税额":[
"55.91",
"110.44",
"96.64"
],
"价税合计(大写)":"贰仟贰佰捌拾陆圆整",
"价税合计(小写)":"¥2286.00",
"销售方名称":"云南亚彩科技有限公司",
"销售方纳税人识别号":"91530103MA6Q798R69",
"销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610",
"销售方开户行及账号":"中国建设银行昆明白云路支行 53050188863700000644",
"备注":"",
"收款人":"蒋云芳",
"复核":"赵建柏",
"开票人":"董飞霞"
}

VL-3b

{
"机器编号":"497003211467",
"发票代码":"053002100113",
"发票号码":"00195236",
"开票日期":"2022年12月05日",
"校验码":"02954 95163 05440 33053",
"购买方名称":"云南艾拓信息技术有限公司",
"购买方纳税人识别号":"91530100792866047D",
"购买方地址、电话":"云南省昆明市高新区双城C1幢1单元15层1503室 0871-65386218",
"开户行及账号":"富滇银行昆明市五华支行 242021010000495122",
"货物或应税劳务、服务名称":[
"*计算机网络设备*光纤箱",
"*计算机网络设备*光纤箱",
"*计算机网络设备*光纤适配器"
],
"规格型号":[
"DS-ZGHK-24/E",
"DS-ZGHK-12/E",
"DS-ZMFG HLS-2/E"
],
"单位":"个",
"数量":[
"3",
"8",
"168"
],
"单价":[
"143.362831858407",
"106.194690265487",
"4.42477876106195"
],
"金额":[
"430.09",
"849.56",
"743.36"
],
"税率":"13%",
"税额":[
"55.91",
"110.44",
"96.64"
],
"价税合计(大写)":"贰仟贰佰捌拾陆圆整",
"价税合计(小写)":"¥2286.00",
"销售方名称":"云南亚彩科技有限公司",
"销售方纳税人识别号":"91530103MA6Q798R69",
"销售方地址、电话":"云南省昆明市盘龙区北京路花园10幢503-2号13629663610",
"销售方开户行及账号":"中国建设银行昆明白云路支行53050188863700000644",
"备注":"",
"收款人":"蒋云芳",
"复核":"赵建柏",
"开票人":"董飞霞"
}

传统OCR与多模态整合


importcv2importnumpyasnpfromPILimportImageimportpytesseractfromtransformersimportAutoProcessor,AutoModelForVision2Seq#初始化QwenVL2.5模型processor=AutoProcessor.from_pretrained("Qwen/Qwen-VL-2-5-Int4")model=AutoModelForVision2Seq.from_pretrained("Qwen/Qwen-VL-2-5-Int4")defpreprocess_image(image_path):"""图像预处理流程"""img=Image.open(image_path).convert("RGB")#基础预处理img=img.resize((1024,1024))#统一尺寸img_array=np.array(img)#增强对比度(可选)#lab=cv2.cvtColor(img_array,cv2.COLOR_RGB2LAB)#l,a,b=cv2.split(lab)#clahe=cv2.createCLAHE(clipLimit=3.0,tileGridSize=(8,8))#cl=clahe.apply(l)#limg=cv2.merge((cl,a,b))#img_array=cv2.cvtColor(limg,cv2.COLOR_LAB2RGB)returnimg_arraydefocr_processing(image_array):"""OCR文字识别(使用Tesseract作为备用方案)"""text=pytesseract.image_to_string(Image.fromarray(image_array),lang="chi_sim+eng")returntextdefinvoice_parser(raw_text):"""发票信息结构化解析"""importrepatterns={"invoice_code":r"发票代码[::]\s*(\d+)","invoice_number":r"发票号码[::]\s*(\d+)","check_code":r"校验码[::]\s*(\d+)","date":r"开票日期[::]\s*(\d{4}年\d{1,2}月\d{1,2}日)","amount":r"合计[::]\s*(¥?\d+\.\d{2})","seller":r"销售方[::]\s*(.*?)(?=\n|购买方)","buyer":r"购买方[::]\s*(.*?)(?=\n|销售方)",}results={}forkey,patterninpatterns.items():match=re.search(pattern,raw_text,re.IGNORECASE)ifmatch:results[key]=match.group(1).strip()#金额数值化处理if"amount"inresults:try:results["amount"]=float(re.sub(r"[^\d.]","",results["amount"]))except:passreturnresultsdefqwenvl_processing(image_array,prompt):"""QwenVL2.5多模态处理"""inputs=processor(images=image_array,text=[prompt],return_tensors="pt",padding=True)outputs=model.generate(**inputs)returnprocessor.decode(outputs[0],skip_special_tokens=True)defmain_process(image_path):#1.图像预处理processed_img=preprocess_image(image_path)#2.混合识别流程#方案一:纯OCR方案ocr_result=ocr_processing(processed_img)#方案二:QwenVL2.5多模态方案vl_result=qwenvl_processing(processed_img,"请提取这张发票上的所有关键信息,包括发票代码、号码、日期、金额、销售方和购买方信息",)#3.结果融合combined_text=ocr_result+"\n"+vl_result#4.结构化解析structured_data=invoice_parser(combined_text)returnstructured_data#使用示例if__name__=="__main__":sample_image="sample_invoice.jpg"result=main_process(sample_image)print("识别结果:")fork,vinresult.items():print(f"{k.ljust(12)}:{v}")


结论

多模态融合架构
智能预处理
结构化解析引擎
QwenVL2.5优化使用
扩展性设计

性能优化建议:

对于批量处理场景,建议:
精度提升方向:
企业级部署考虑:

该方案通过融合传统OCR技术和先进的多模态大模型,在保证基础识别能力的同时,利用QwenVL2.5的语义理解优势,可以更好地处理复杂版式发票和非常规字段提取需求。







欢迎光临 链载Ai (https://www.lianzai.com/) Powered by Discuz! X3.5