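Summary: This article walks through an invoice information extraction approach built on TensorFlow and OpenCV. It focuses on character segmentation and includes complete Python source code, to help developers build an invoice recognition system quickly.
In financial automation, invoice information extraction is a key step. Traditional OCR approaches have low recognition rates on invoices with complex layouts. This case study combines deep learning with image processing: TensorFlow locates the invoice region, OpenCV performs character segmentation, and the result is extracted as structured data.
Rationale for the technology choices:
The recognition system is divided into three core modules:
Character segmentation is a prerequisite for OCR and directly affects final accuracy. This case study uses a composite segmentation strategy: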
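```python
import cv2
import numpy as np


def preprocess_image(img):
    # Accept either a file path or an already-loaded BGR image array
    if isinstance(img, str):
        img = cv2.imread(img)
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Adaptive binarization (inverted, so text becomes white on black)
    binary = cv2.adaptiveThreshold(
        gray, 255,
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 11, 2,
    )
    # Denoise: morphological closing with a 3x3 kernel fills small gaps in strokes
    kernel = np.ones((3, 3), np.uint8)
    denoised = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    return denoised
```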
Summing the pixel values in each column reveals the blank regions between characters:
```python
def vertical_projection(img):
    # Compute the vertical projection (sum of pixel values per column)
    projection = np.sum(img, axis=0)
    # Find split points: the first column below the threshold after each character
    threshold = np.mean(projection) * 0.3
    split_points = []
    in_char = False
    for i, val in enumerate(projection):
        if val < threshold and in_char:
            split_points.append(i)
            in_char = False
        elif val >= threshold and not in_char:
            in_char = True
    return split_points
```
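The function above records only the column where each character ends. As an illustrative variant (not part of the original code), the sketch below returns `(start, end)` column spans instead and also closes a character that runs to the right edge of the image. The name `projection_spans` and the `ratio` parameter are assumptions introduced here; the default `ratio=0.3` mirrors the threshold factor used above.

```python
import numpy as np


def projection_spans(binary, ratio=0.3):
    """Return (start, end) column spans of characters; end is exclusive."""
    # Column sums of a binary image (text pixels are non-zero)
    projection = binary.sum(axis=0)
    threshold = projection.mean() * ratio
    spans = []
    start = None
    for col, val in enumerate(projection):
        if val >= threshold and start is None:
            start = col  # entering a character
        elif val < threshold and start is not None:
            spans.append((start, col))  # leaving a character
            start = None
    if start is not None:
        # The last character touches the right edge of the image
        spans.append((start, binary.shape[1]))
    return spans


# Synthetic 10x20 binary image with three white "characters"
demo = np.zeros((10, 20), dtype=np.uint8)
demo[:, 2:5] = 255
demo[:, 8:12] = 255
demo[:, 16:20] = 255  # touches the right edge
demo_spans = projection_spans(demo)
```

Returning spans rather than single split points makes it straightforward to crop each character directly with `binary[:, start:end]`.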
The projection-based segmentation results are then verified and corrected:
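```python
def refine_segments(img, split_points):
    # Note: split_points is accepted but not used in this version;
    # segments come from contour bounding boxes only.
    refined_segments = []
    # OpenCV 4.x returns (contours, hierarchy); 3.x returns three values
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Sort contours by x coordinate
    contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Filter out small noise blobs
        if w > 10 and h > 10:
            refined_segments.append((x, y, w, h))
    return refined_segments
```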
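`refine_segments` receives `split_points` but builds its result from contours alone. One way the two signals might be reconciled, offered as an assumption rather than the original method, is to group contour boxes by the column range that contains their horizontal center and merge each group into a single box. This keeps characters made of several disconnected strokes (for example 二) from being emitted as separate segments. The helper name `merge_boxes_by_span` and the `(start, end)` span format are hypothetical; such spans would have to be derived from the projection split points.

```python
def merge_boxes_by_span(boxes, spans):
    """Merge (x, y, w, h) boxes whose horizontal center lies in the same span.

    spans is a list of (start, end) column ranges, end exclusive.
    Boxes whose center falls in no span are dropped.
    """
    merged = []
    for start, end in spans:
        # Collect every box centered inside this column range
        group = [b for b in boxes if start <= b[0] + b[2] / 2 < end]
        if not group:
            continue
        # Union of the group's bounding boxes
        x0 = min(b[0] for b in group)
        y0 = min(b[1] for b in group)
        x1 = max(b[0] + b[2] for b in group)
        y1 = max(b[1] + b[3] for b in group)
        merged.append((x0, y0, x1 - x0, y1 - y0))
    return merged


# Two stacked strokes in the first span, one box in the second
demo_boxes = [(2, 1, 10, 3), (3, 8, 8, 3), (20, 0, 9, 12)]
demo_merged = merge_boxes_by_span(demo_boxes, [(0, 15), (18, 30)])
```

Note that merging should happen before any minimum-size filter such as `w > 10 and h > 10`, since individual strokes can be thinner than the filter allows.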
1. **Model loading**:
```python
import tensorflow as tf


def load_detection_model(model_path):
    model = tf.keras.models.load_model(model_path)
    return model
```
2. **Main processing flow**:
```python
def process_invoice(img_path, model):
    # 1. Detect the invoice region
    img = cv2.imread(img_path)
    input_tensor = tf.convert_to_tensor(img)
    input_tensor = input_tensor[tf.newaxis, ...]
    detections = model(input_tensor)
    boxes = detections['detection_boxes'][0].numpy()
    scores = detections['detection_scores'][0].numpy()
    # Take the highest-scoring detection box
    invoice_box = boxes[np.argmax(scores)]
    # Box order follows the source; TensorFlow Object Detection API exports
    # use [ymin, xmin, ymax, xmax], so swap if your model does
    xmin, ymin, xmax, ymax = invoice_box
    # 2. Crop the invoice region (normalized coordinates -> pixels)
    h, w = img.shape[:2]
    xmin, ymin = int(xmin * w), int(ymin * h)
    xmax, ymax = int(xmax * w), int(ymax * h)
    invoice_roi = img[ymin:ymax, xmin:xmax]
    # 3. Preprocess
    processed = preprocess_image(invoice_roi)
    # 4. Segment characters
    split_points = vertical_projection(processed)
    segments = refine_segments(processed, split_points)
    # 5. Extract the character ROIs
    characters = []
    for seg in segments:
        x, y, w, h = seg
        char_roi = processed[y:y + h, x:x + w]
        characters.append(char_roi)
    return characters
```
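The crop in step 2 scales normalized coordinates by the image size and slices directly. A detector can return coordinates slightly outside [0, 1], and a negative index in a NumPy slice wraps around silently, which can yield a wrong or empty crop. A minimal sketch of a guarded conversion follows. `to_pixel_box` is a hypothetical helper, and the `(xmin, ymin, xmax, ymax)` order matches the unpacking in the code above, which may differ from what a given model emits.

```python
def to_pixel_box(box, width, height):
    """Convert a normalized (xmin, ymin, xmax, ymax) box to clamped pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    # Scale to pixels, then clamp each coordinate into the image bounds
    x0 = min(max(int(xmin * width), 0), width)
    y0 = min(max(int(ymin * height), 0), height)
    x1 = min(max(int(xmax * width), 0), width)
    y1 = min(max(int(ymax * height), 0), height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("detection box is empty after clamping")
    return x0, y0, x1, y1


# A box that overshoots the image on the left and right
demo_box = to_pixel_box((-0.05, 0.25, 1.02, 0.75), width=200, height=100)
```

The returned tuple can be used as `img[y0:y1, x0:x1]`; raising on an empty box surfaces a failed detection early instead of passing a zero-size array to the preprocessing step.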
Model optimization:
Segmentation algorithm improvements:
Post-processing optimization:
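```python
def postprocess_characters(characters):
    # Normalize character size to a fixed height, preserving aspect ratio
    normalized = []
    for char in characters:
        h, w = char.shape
        aspect_ratio = w / h
        target_h = 32
        # Guard against a zero width for very narrow glyphs
        target_w = max(1, int(target_h * aspect_ratio))
        resized = cv2.resize(char, (target_w, target_h))
        normalized.append(resized)
    # Build the character sequence
    char_sequence = []
    for char in normalized:
        # An OCR engine (such as Tesseract) can be plugged in here,
        # or a pre-trained character recognition model
        char_sequence.append("CHAR_PLACEHOLDER")  # replace with the actual recognition result
    return char_sequence
```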
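After the resize above, every glyph is 32 pixels tall but widths still vary. If the downstream recognizer expects a fixed input shape (an assumption; the source does not specify one), each glyph could be centered on a fixed-width canvas. The sketch below uses only NumPy; `pad_to_width` and the default width of 32 are hypothetical.

```python
import numpy as np


def pad_to_width(glyph, target_w=32):
    """Center a 2-D glyph on a zero (black) canvas of width target_w.

    Glyphs at least target_w wide are returned unchanged.
    """
    h, w = glyph.shape
    if w >= target_w:
        return glyph
    # Split the missing columns between the left and right sides
    left = (target_w - w) // 2
    right = target_w - w - left
    return np.pad(glyph, ((0, 0), (left, right)), mode="constant", constant_values=0)


# A narrow 32x11 all-white glyph, padded out to 32x32
demo_glyph = np.full((32, 11), 255, dtype=np.uint8)
demo_padded = pad_to_width(demo_glyph)
```

Padding with zeros matches the inverted binarization used earlier, where the background is black and the text is white.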
```
invoice_recognition/
├── models/
│   └── invoice_detector.h5
├── utils/
│   ├── preprocessing.py
│   └── segmentation.py
├── main.py
└── requirements.txt
```
```python
from fastapi import FastAPI, File

app = FastAPI()


@app.post("/recognize")
async def recognize_invoice(file: bytes = File(...)):
    # Implement file handling and recognition logic here
    return {"result": "structured_data"}
```
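This case study presented an invoice recognition system built on TensorFlow and OpenCV, with a focus on the key problem of character segmentation. For practical use, the following are recommended:
Future directions include: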