ddddocr: Preprocessing images to improve recognition accuracy

1. Incorrect recognition

Data URL:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAPAAAABQCAMAAAAQlwhOAAAA81BMVEUAAAB/TiCdbD6GVSdoNwnPnnB7ShyDUiTDkmR5SBrVpHbMm22FVCa5iFrUo3WUYzW9jF5yQRPDkmS/jmCFVCZyQRNnNghvPhCVZDZfLgCbajy8i116SRundkjXpni7ilx2RRdlNAbRoHJuPQ/OnW+5iFrbqnyvflCFVCaIVynWpXfFlGaCUSOPXjB2RRdkMwV8Sx3JmGqcaz2kc0XWpXeMWy3ZqHrOnW+xgFJxQBJsOw1qOQuHViiATyG4h1m8i12ndkiUYzV2RRe7ilyHViiMWy2EUyWSYTPQn3GUYzXIl2lpOAraqXuNXC51RBbQn3F8Sx3mH0b2AAAAAXRSTlMAQObYZgAABnpJREFUeJzkm+lOIzkQx11CZEbiEEggAmHEh1E4IsQlQFGiwHAI5kMP7/84K9Lddl0++grH1q5GuGOX6+e/XbY7YD6JXX90AAu26+tvQnyRWjGZd612LIuwiwsf8WZNj2trn5zY83xzszYxLtzXdPIBFuQ9T3Ryf8+JR/Uj+kA7P08mZuXR6IsS125Zjfdrjk59+6rzob45XjAQqLe1kGiwhaLRbVbJO8zNh721pRP/qBxVWjDzcCo2m83SiaEEzvt5lTU8vD/CxFXGvIjEjn27Ct/QPoiZ11eFWLcIb+KYj8o4cCSpISTZzY0jBjSZ/QrXs0Te0QjksLdrTGFTdKD25V3YrdmIw9qAujDMqAEDHJdRdBkBTpodKOw6o97l4II5Pj6ulzYj3boC2yE6nFJivXCs+cfHoQ2L2TTep8sVKAYD3hhaNDlVFf5yTWlh/OIPptPI7U1ZpcD17jBjqILK+NS677y/BLG8vbEOkcKeRNmpwnwwRd9uJmuJS/B67+e93AebM+XDQAitmpynymDbsjdxsVXBPpl/2uv13JjleTjDCptoJy2Yb8+lFSKLi6zuWV60nOgQ0aPewWRZpgjcxZkj5JznLLZnKQOEq8xmM9AMO7eFl0wBS94M6lgCL1u1SgNKofOCBvPy8qJJSbo4bAYonYuxjOVLtYzcCFLxqWv6op0fyQAfHrZLrGesMKBI6mydA/vPcIVD3a90rLC6IsNJW0niRlEYVUC7mjZ8uLyystLlocOzgiNPiJp5cFJgrTpMooNpVjq/IymPlCl7qrcqFAMusF57MploCXJhJ46EFF0+Oj1ViW0G5wprlQuFpfPwg1YtmKLH+NmpHlQx/wKrjl6BZDVxEV2swph3PPbGUD6BwJ2iqGn4oUP35MpdvlfRlrDrbxyNCqIKg7zkKgmCluTQKc1q2bubLd45cT0s6hUxjykHyjhBhWmPMgZWEvs0ul405Ab6ah0fibP3f4bDIY6KTnJ6aQwpTHqMK4wbOy/r/GEtA9iSrzbef8qynJiGPWY1QT8wGurRX9QVtj/jiNbX1+3P9ZHpHAHoO6EyJapTFAsop4+If/VUym/Rbv6ysViff1wc62oSE5XAQL/f98Q9jwrtxVAeN1BZaYcAH008S4NL/ibwNRpA7Yltl2Lxo84rFeYJRFcYV3l8zIlDCrtYYkDAzzqpZleiOyF66wWf+BUuLUFhgFK7OAmeClVMvIMxf33+wa5deQCOK4z6Y3UM5TWhyWxt4saHPB9HW6L9Jef9qxLbJOVmnDe7kiZKb95K6d+QTiaT0iEhdvum34Bd3z28NhYA9gUf7/cIsYXOccawcUtjzW2CPaJWYV7wBkK9lf/TnIKQsU5HR0fOpeAVCpNdogoyoUhtFfIPVjcgCqNAOXTOd+RhK6lYBPgFfwT3yUvhy+q7gsmTWQpRqcJGqiG5MZnCC8Bvi+C2mIhOT0+cOHN+1Ra7u5yYncsB7U5EYVTdbWFG/HoCUDJdYTZPio5SVrDgzbJIh7uKFy4MURjbwIVLvPN9jaBo/ZFvHvBlq/LyzTTvSaYozG0wGBg3s0FsSh
OcLkNy0UxHE2BatAHHla/OgaoDgxYh8wrz13PoVB954wPYkW3QjLf0birucFGXIvMUCuMJVVHhlkKb2x1OTG05FZOe/v5PKdeyv7m7brWIu/3Oe3fXssKa8Qme2/Kylxjsbzu0GNb29pwY99Kec7NPSkBmc7k0fbz2BJAs71VSre00Z7Vsf3/fvNmSpnAERrnplPZPPLm6UoiHex7XO6F+S7tNqYRt37y9OWL63sIeiiOTysP7TyGW1YZ7ezrxzk4C8e1tZWKDFDb8POayryJkbNeVvKotWOGAuS215AZ7i+wiey617bCRWYXRWVRS9xr0sLT0uYhTFM5/mylkvwOffTLe0oKTOcr7O0T8He3/xvvt7KFZ835bcSzKHh4aEff7fuLHJo6NMRuNWr/5Pkjm1f+MKMD72Ix4Y6MJcX4YHTTwkH+b/adCi4YKnzVXeP6qqLbNef9UIW5kZ2dnzZ004c1tYbzGtMDbsR18dAARe27Z38FBbeLVdiPR7fk5nTjp1qcrfJfQcHW1KrH2lj9qFXhT7vW65S/2YpbIe2mKP8sS3+NE7KRS7WSFVUvhTbTLy0tjptNpZYVPTioTO/tZv2ltKy90VuGq1oT35+KJ41f2Lu0jFf5K9l8AAAD//4N+SSTpkEOFAAAAAElFTkSuQmCC

The image:

image

Recognition result:

$ python3 ocr1.py 
欢迎使用ddddocr,本项目专注带动行业内卷,个人博客:wenanzhe.com
训练数据支持来源于:http://146.56.204.113:19199/preview
爬虫框架feapder可快速一键接入,快速开启爬虫之旅:https://github.com/Boris-code/feapder
谷歌reCaptcha验证码 / hCaptcha验证码 / funCaptcha验证码商业级识别接口:https://yescaptcha.com/i/NSwk7i
识别结果: 00470

2. Preprocessing

To improve the recognition rate, preprocess the image before passing it to ddddocr: convert it to grayscale, then denoise it.
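To make the two steps concrete, here is a minimal pure-Python sketch of what grayscale conversion and median denoising compute. It is an illustration only, not Pillow's implementation. The function names `to_gray` and `median_filter_3x3` are hypothetical. The luma weights are the ITU-R 601-2 ones that Pillow documents for `convert('L')`. Truncating to an integer and leaving border pixels untouched are simplifications made for this sketch.

```python
from statistics import median


def to_gray(rgb):
    """Luma of one (R, G, B) pixel using the ITU-R 601-2 weights."""
    r, g, b = rgb
    # Integer truncation is a simplification; Pillow's rounding may differ by 1.
    return (r * 299 + g * 587 + b * 114) // 1000


def median_filter_3x3(gray):
    """3x3 median filter over a 2D list; border pixels are copied unchanged."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [gray[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


# A flat bright patch with a single dark speck in the middle.
patch = [[200] * 5 for _ in range(5)]
patch[2][2] = 0
cleaned = median_filter_3x3(patch)
print(cleaned[2][2])  # the speck is replaced by the neighbourhood median, 200
```

An isolated speck is outvoted by its eight neighbours, which is why a median filter removes salt-and-pepper noise while keeping larger strokes such as characters mostly intact. The script in this section uses Pillow's built-in `convert('L')` and `ImageFilter.MedianFilter()` for these two steps.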

Code:

import base64
from io import BytesIO

from ddddocr import DdddOcr
from PIL import Image, ImageFilter

ocr = DdddOcr(det=False, ocr=True)

data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAPAAAABQCAMAAAAQlwhOAAAA81BMVEUAAAB/TiCdbD6GVSdoNwnPnnB7ShyDUiTDkmR5SBrVpHbMm22FVCa5iFrUo3WUYzW9jF5yQRPDkmS/jmCFVCZyQRNnNghvPhCVZDZfLgCbajy8i116SRundkjXpni7ilx2RRdlNAbRoHJuPQ/OnW+5iFrbqnyvflCFVCaIVynWpXfFlGaCUSOPXjB2RRdkMwV8Sx3JmGqcaz2kc0XWpXeMWy3ZqHrOnW+xgFJxQBJsOw1qOQuHViiATyG4h1m8i12ndkiUYzV2RRe7ilyHViiMWy2EUyWSYTPQn3GUYzXIl2lpOAraqXuNXC51RBbQn3F8Sx3mH0b2AAAAAXRSTlMAQObYZgAABnpJREFUeJzkm+lOIzkQx11CZEbiEEggAmHEh1E4IsQlQFGiwHAI5kMP7/84K9Lddl0++grH1q5GuGOX6+e/XbY7YD6JXX90AAu26+tvQnyRWjGZd612LIuwiwsf8WZNj2trn5zY83xzszYxLtzXdPIBFuQ9T3Ryf8+JR/Uj+kA7P08mZuXR6IsS125Zjfdrjk59+6rzob45XjAQqLe1kGiwhaLRbVbJO8zNh721pRP/qBxVWjDzcCo2m83SiaEEzvt5lTU8vD/CxFXGvIjEjn27Ct/QPoiZ11eFWLcIb+KYj8o4cCSpISTZzY0jBjSZ/QrXs0Te0QjksLdrTGFTdKD25V3YrdmIw9qAujDMqAEDHJdRdBkBTpodKOw6o97l4II5Pj6ulzYj3boC2yE6nFJivXCs+cfHoQ2L2TTep8sVKAYD3hhaNDlVFf5yTWlh/OIPptPI7U1ZpcD17jBjqILK+NS677y/BLG8vbEOkcKeRNmpwnwwRd9uJmuJS/B67+e93AebM+XDQAitmpynymDbsjdxsVXBPpl/2uv13JjleTjDCptoJy2Yb8+lFSKLi6zuWV60nOgQ0aPewWRZpgjcxZkj5JznLLZnKQOEq8xmM9AMO7eFl0wBS94M6lgCL1u1SgNKofOCBvPy8qJJSbo4bAYonYuxjOVLtYzcCFLxqWv6op0fyQAfHrZLrGesMKBI6mydA/vPcIVD3a90rLC6IsNJW0niRlEYVUC7mjZ8uLyystLlocOzgiNPiJp5cFJgrTpMooNpVjq/IymPlCl7qrcqFAMusF57MploCXJhJ46EFF0+Oj1ViW0G5wprlQuFpfPwg1YtmKLH+NmpHlQx/wKrjl6BZDVxEV2swph3PPbGUD6BwJ2iqGn4oUP35MpdvlfRlrDrbxyNCqIKg7zkKgmCluTQKc1q2bubLd45cT0s6hUxjykHyjhBhWmPMgZWEvs0ul405Ab6ah0fibP3f4bDIY6KTnJ6aQwpTHqMK4wbOy/r/GEtA9iSrzbef8qynJiGPWY1QT8wGurRX9QVtj/jiNbX1+3P9ZHpHAHoO6EyJapTFAsop4+If/VUym/Rbv6ysViff1wc62oSE5XAQL/f98Q9jwrtxVAeN1BZaYcAH008S4NL/ibwNRpA7Yltl2Lxo84rFeYJRFcYV3l8zIlDCrtYYkDAzzqpZleiOyF66wWf+BUuLUFhgFK7OAmeClVMvIMxf33+wa5deQCOK4z6Y3UM5TWhyWxt4saHPB9HW6L9Jef9qxLbJOVmnDe7kiZKb95K6d+QTiaT0iEhdvum34Bd3z28NhYA9gUf7/cIsYXOccawcUtjzW2CPaJWYV7wBkK9lf/TnIKQsU5HR0fOpeAVCpNdogoyoUhtFfIPVjcgCqNAOXTOd+RhK6lYBPgFfwT3yUvhy+q7gsmTWQpRqcJGqiG5MZnCC8Bvi+C2mIhOT0+cOHN+1Ra7u5yYncsB7U5EYVTdbWFG/HoCUDJdYTZPio5SVrDgzbJIh7uKFy4MURjbwIVLvPN9jaBo/ZFvHvBlq/LyzTTvSaYozG0wGBg3s0FsShOcLkNy0UxHE2BatAHHla/OgaoDgxYh8wrz13PoVB954wPYkW3QjLf0birucFGXIvMUCuMJVVHhlkKb2x1OTG05FZOe/v5PKdeyv7m7brWIu/3Oe3fXssKa8Qme2/Kylxjsbzu0GNb29pwY99Kec7NPSkBmc7k0fbz2BJAs71VSre00Z7Vsf3/fvNmSpnAERrnplPZPPLm6UoiHex7XO6F+S7tNqYRt37y9OWL63sIeiiOTysP7TyGW1YZ7ezrxzk4C8e1tZWKDFDb8POayryJkbNeVvKotWOGAuS215AZ7i+wiey617bCRWYXRWVRS9xr0sLT0uYhTFM5/mylkvwOffTLe0oKTOcr7O0T8He3/xvvt7KFZ835bcSzKHh4aEff7fuLHJo6NMRuNWr/5Pkjm1f+MKMD72Ix4Y6MJcX4YHTTwkH+b/adCi4YKnzVXeP6qqLbNef9UIW5kZ2dnzZ004c1tYbzGtMDbsR18dAARe27Z38FBbeLVdiPR7fk5nTjp1qcrfJfQcHW1KrH2lj9qFXhT7vW65S/2YpbIe2mKP8sS3+NE7KRS7WSFVUvhTbTLy0tjptNpZYVPTioTO/tZv2ltKy90VuGq1oT35+KJ41f2Lu0jFf5K9l8AAAD//4N+SSTpkEOFAAAAAElFTkSuQmCC"


def data_url_to_image(data_url):
    # Split the data URL into its header and payload
    mediatype, data = data_url.split(',', 1)
    encoding = mediatype.split(';')[1] if ';' in mediatype else ''
    # Decode the payload (only base64 payloads are handled here)
    if encoding != 'base64':
        raise ValueError('only base64-encoded data URLs are supported')
    data = base64.b64decode(data)
    # Create the Image object
    return Image.open(BytesIO(data))


image = data_url_to_image(data_url)
image.save('x1.png')

# Open the image and process it
image = Image.open('x1.png')
# Convert to grayscale
image = image.convert('L')
# Denoise with a median filter
image = image.filter(ImageFilter.MedianFilter())
# Save the processed image
image.save('x1_opted.png')

# Output for the processed image
with open('x1_opted.png', 'rb') as f:
    img_bytes = f.read()
result = ocr.classification(img_bytes)
print(f"识别结果:{result}")

# Output for the original image
with open('x1.png', 'rb') as f:
    img_bytes = f.read()
result = ocr.classification(img_bytes)
print(f"识别结果:{result}")
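The `data_url_to_image` helper above assumes a base64 payload. As a hedged sketch, a slightly more defensive decoder could also accept percent-encoded payloads and reject strings that are not data URLs at all. The name `decode_data_url` is hypothetical and is not part of ddddocr or Pillow; it uses only the standard library and returns raw bytes, which can then be passed to `Image.open(BytesIO(...))`.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(data_url):
    """Split a data URL into (media_type, raw_bytes).

    Hypothetical helper for illustration; not part of ddddocr or Pillow.
    """
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no comma separator")
    parts = header.split(";")
    # An empty media type defaults to text/plain (charset parameter omitted here).
    media_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        raw = base64.b64decode(payload)
    else:
        # Payloads without the base64 marker are percent-encoded.
        raw = unquote_to_bytes(payload)
    return media_type, raw


# The eight-byte PNG signature, base64-encoded, as a tiny demonstration.
media_type, raw = decode_data_url("data:image/png;base64,iVBORw0KGgo=")
print(media_type, raw[:4])
```

Checking for `base64` anywhere after the media type, rather than at a fixed position, also covers headers that carry extra parameters such as a charset.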

Result (the first line is for the processed image, the second for the original image):

$ python3 ocr1.py 
欢迎使用ddddocr,本项目专注带动行业内卷,个人博客:wenanzhe.com
训练数据支持来源于:http://146.56.204.113:19199/preview
爬虫框架feapder可快速一键接入,快速开启爬虫之旅:https://github.com/Boris-code/feapder
谷歌reCaptcha验证码 / hCaptcha验证码 / funCaptcha验证码商业级识别接口:https://yescaptcha.com/i/NSwk7i
识别结果:004707
识别结果:00470

Original image:

image

Processed image:

image
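The round trip through `x1.png` and `x1_opted.png` is convenient for inspecting the images, but it is not required for recognition. As a sketch, assuming Pillow is installed, the same grayscale and median-filter steps can be done in memory and the resulting PNG bytes handed straight to `ocr.classification`. The function name `preprocess_png_bytes` and its `median_size` parameter are hypothetical.

```python
from io import BytesIO

from PIL import Image, ImageFilter


def preprocess_png_bytes(png_bytes, median_size=3):
    """Return grayscale, median-filtered PNG bytes without touching disk."""
    image = Image.open(BytesIO(png_bytes))
    image = image.convert("L")
    image = image.filter(ImageFilter.MedianFilter(size=median_size))
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# Demonstration on a synthetic image, so the sketch runs without ddddocr.
demo = Image.new("RGB", (8, 8), (200, 120, 40))
demo_buffer = BytesIO()
demo.save(demo_buffer, format="PNG")
out = Image.open(BytesIO(preprocess_png_bytes(demo_buffer.getvalue())))
print(out.mode, out.size)

# With ddddocr, assuming `ocr` and the raw image bytes exist as in the script:
# result = ocr.classification(preprocess_png_bytes(raw_png_bytes))
```

Whether a larger `median_size` helps or hurts depends on the captcha. A bigger window removes more noise but can also erase thin strokes, so it is worth comparing results on a few samples.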
