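Summary: This article covers programming techniques for recognizing Chinese text in images. It uses deep learning frameworks for character detection and recognition and provides code examples for the whole workflow, from environment setup to model deployment, to help developers build a Chinese OCR system quickly.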
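Chinese image text recognition (Chinese Optical Character Recognition, COCR) is an important branch of computer vision. Its core goal is to convert Chinese text in images into editable electronic text. Compared with English recognition, Chinese OCR faces three major challenges: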
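Modern Chinese OCR systems generally adopt deep learning architectures. A typical pipeline is: text detection (locating text regions) → text recognition (converting each region into a character sequence) → post-processing (error correction, layout restoration). CRNN (CNN + RNN + CTC) and Transformer-based models are the mainstream approaches today.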
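The CTC part of a CRNN is easiest to see at decoding time: the network emits one class id per horizontal frame, and the final string is obtained by merging repeated ids and dropping the blank symbol. The following is a minimal sketch of greedy (best-path) CTC decoding. It assumes the per-frame argmax ids are already available and that index 0 is the blank; `ctc_greedy_decode` and the tiny `charset` are hypothetical names used only for illustration.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], charset: Sequence[str], blank: int = 0) -> str:
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    chars: List[str] = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    # Index 0 is reserved for the CTC blank (an assumption of this sketch).
    charset = ["<blank>", "中", "文", "识", "别"]
    frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 4, 4]
    print(ctc_greedy_decode(frames, charset))
```

A repeated character survives only if a blank separates its two occurrences, which is why the blank class is needed in the first place.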
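# Python environment setup (3.8+ recommended)
conda create -n ocr_env python=3.8
conda activate ocr_env

# Core dependencies
pip install torch torchvision opencv-python pillow \
    tensorflow==2.8.0 transformers==4.18.0 \
    easyocr paddleocr
# PaddleOCR also needs the PaddlePaddle runtime, which is installed separately (CPU build shown)
pip install paddlepaddle

# Download the Chinese pretrained models
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar
# Unpack the archives; they are expected to produce ./ch_PP-OCRv3_det_infer and ./ch_PP-OCRv3_rec_infer
tar -xf ch_PP-OCRv3_det_infer.tar
tar -xf ch_PP-OCRv3_rec_infer.tar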
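Before running the examples, it can help to confirm that the packages installed above are actually importable in the active environment. This is a minimal sketch, assuming the usual import names `cv2` (for opencv-python) and `PIL` (for pillow); `missing_modules` and `REQUIRED` are hypothetical names introduced here for illustration.

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages from the install step (assumed mapping:
# opencv-python -> cv2, pillow -> PIL).
REQUIRED = ["torch", "torchvision", "cv2", "PIL", "tensorflow",
            "transformers", "easyocr", "paddleocr"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates the modules; it does not verify versions or GPU support.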
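from paddleocr import PaddleOCR

# Initialize the recognizer (detection + recognition)
ocr = PaddleOCR(
    use_angle_cls=True,  # enable text-angle classification
    lang="ch",           # Chinese mode
    rec_model_dir="./ch_PP-OCRv3_rec_infer",
    det_model_dir="./ch_PP-OCRv3_det_infer",
)

# Recognition example
img_path = "test_chinese.jpg"
result = ocr.ocr(img_path, cls=True)

# Parse and print the results.
# In PaddleOCR 2.6 and later, result holds one entry per input image, so the
# lines for this image are in result[0] (None when no text is found).
# Older releases returned the list of lines directly.
for line in result[0] or []:
    box, (text, score) = line
    print(f"Box: {box}, text: {text}, confidence: {score:.2f}")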
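The raw output is a list of boxes with text and confidence, so a small post-processing step is usually needed before the text is usable. The sketch below filters out low-confidence lines and orders the rest top to bottom, then left to right. It assumes each line has the layout used in the loop above (a box of four `[x, y]` points, followed by a `(text, confidence)` pair); `extract_text` and the `min_score` threshold are hypothetical choices for illustration, not part of PaddleOCR.

```python
from typing import List, Sequence, Tuple

Box = Sequence[Sequence[float]]        # four [x, y] corner points
Line = Tuple[Box, Tuple[str, float]]   # (box, (text, confidence))


def extract_text(lines: Sequence[Line], min_score: float = 0.5) -> List[str]:
    """Drop low-confidence lines and return the texts in reading order."""
    kept = [(box, text) for box, (text, score) in lines if score >= min_score]
    # Sort by the top edge of each box, then by its left edge.
    kept.sort(key=lambda item: (min(p[1] for p in item[0]),
                                min(p[0] for p in item[0])))
    return [text for _, text in kept]


if __name__ == "__main__":
    sample = [
        ([[10, 50], [100, 50], [100, 70], [10, 70]], ("第二行", 0.95)),
        ([[10, 10], [100, 10], [100, 30], [10, 30]], ("第一行", 0.99)),
        ([[120, 10], [160, 10], [160, 30], [120, 30]], ("噪声", 0.20)),
    ]
    print(extract_text(sample))
```

Sorting by the exact top coordinate is fragile when boxes on the same visual line differ by a few pixels; grouping boxes into rows with a tolerance would be a natural refinement.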
import torch
import torch.nn as nn
from torchvision import models


class BidirectionalLSTM(nn.Module):
    def __init__(self, nIn, nHidden, nOut):
        super().__init__()
        self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
        self.embedding = nn.Linear(nHidden * 2, nOut)

    def forward(self, input):
        recurrent, _ = self.rnn(input)  # [T, b, 2 * nHidden]
        T, b, h = recurrent.size()
        output = self.embedding(recurrent.reshape(T * b, h))
        return output.view(T, b, -1)    # [T, b, nOut]


class CRNN(nn.Module):
    def __init__(self, imgH, nc, nclass, nh):
        super().__init__()
        assert imgH % 32 == 0, 'imgH must be a multiple of 32'
        # CNN feature extraction: build ResNet-18 once and reuse its stages.
        # (Recent torchvision releases prefer the `weights=` argument over `pretrained=True`.)
        backbone = models.resnet18(pretrained=True)
        if nc != 3:
            # The pretrained stem expects 3 input channels; use a fresh stem otherwise.
            backbone.conv1 = nn.Conv2d(nc, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Running through layer4 yields the 512 channels the first LSTM expects
        # (layer2 only provides 128) and downsamples by 32 in both dimensions.
        self.cnn = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
        )
        # RNN sequence modeling
        self.rnn = nn.Sequential(
            BidirectionalLSTM(512, nh, nh),
            BidirectionalLSTM(nh, nh, nclass),
        )

    def forward(self, input):
        # CNN stage
        conv = self.cnn(input)        # [b, 512, imgH / 32, w]
        conv = conv.mean(dim=2)       # average over height -> [b, c, w]
        conv = conv.permute(2, 0, 1)  # [w, b, c]; w is the number of time steps
        # RNN stage
        output = self.rnn(conv)       # [w, b, nclass]
        return output
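The sequence length seen by the LSTM is the width of the CNN feature map, and the squeeze step only works once the feature height has been reduced to 1, so it is worth checking how a ResNet-18 backbone shrinks the input. The sketch below applies the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, stage by stage. The kernel, stride, and padding values are those of the standard torchvision ResNet-18 as far as I know; `conv_out`, `trace_size`, and `RESNET18_STAGES` are hypothetical names for this illustration.

```python
from typing import List, Tuple

# (stage name, kernel, stride, padding) of the downsampling step in each stage.
RESNET18_STAGES = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_size(n: int) -> List[Tuple[str, int]]:
    """Return the size along one dimension after each ResNet-18 stage."""
    sizes = []
    for name, kernel, stride, padding in RESNET18_STAGES:
        n = conv_out(n, kernel, stride, padding)
        sizes.append((name, n))
    return sizes


if __name__ == "__main__":
    print("height:", trace_size(32))
    print("width: ", trace_size(100))
```

For a 32×100 input, the height is still 4 after `layer2` and reaches 1 only after `layer4`, where the width has shrunk to 4 time steps. A backbone cut earlier needs extra pooling over the height, and short sequences like this limit how many characters CTC can emit per image.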
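import albumentations as A  # pip install albumentations

transform = A.Compose([
    # Apply one of several blur types
    A.OneOf([
        A.GaussianBlur(p=0.3),
        A.MotionBlur(p=0.3),
        A.MedianBlur(blur_limit=3, p=0.3),
    ]),
    A.RandomBrightnessContrast(p=0.5),
    # Apply one of several geometric distortions.
    # alpha_affine is deprecated in newer albumentations releases; remove it if your version warns or rejects it.
    A.OneOf([
        A.ElasticTransform(alpha=30, sigma=5, alpha_affine=5, p=0.3),
        A.GridDistortion(num_steps=5, distort_limit=0.3, p=0.3),
    ]),
    A.RandomRotate90(p=0.5),
])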
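import torch
import torch.nn as nn

# `model` is the trained CRNN instance; dynamic quantization targets inference only.
model.eval()
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)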
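To make the effect of int8 quantization concrete, the following sketch shows the basic arithmetic on a plain list of weights: values are divided by a scale, rounded to integers in the int8 range, and multiplied back at inference time. It is a simplified symmetric per-tensor scheme written for illustration only. PyTorch's dynamic quantization uses its own observers and kernels and differs in detail; `quantize_int8` and `dequantize` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = round(x / scale), clamped to int8."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.3, 1.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    for w, c, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> {c:4d} -> {r:+.4f}")
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost paid for storing weights in one byte instead of four.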
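# Export the float `model` (the trained CRNN) in inference mode.
model.eval()
dummy_input = torch.randn(1, 3, 32, 100)
torch.onnx.export(
    model, dummy_input, "ocr_model.onnx",
    input_names=["input"], output_names=["output"],
    # The input is [batch, channels, height, width]; the CRNN output is
    # [T, batch, nclass], so its batch axis is 1 rather than 0.
    dynamic_axes={"input": {0: "batch_size"}, "output": {1: "batch_size"}},
)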
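#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <onnxruntime_cxx_api.h>

class ONNXOCR {
public:
    // Note: on Windows, ONNX Runtime expects a wide-character model path.
    explicit ONNXOCR(const std::string& model_path)
        : env_(ORT_LOGGING_LEVEL_WARNING, "OCR"),
          session_(env_, model_path.c_str(), Ort::SessionOptions{}) {}

    std::vector<std::string> predict(const cv::Mat& img) {
        // Image preprocessing
        cv::Mat resized;
        cv::resize(img, resized, cv::Size(100, 32));
        // Model inference
        Ort::AllocatorWithDefaultOptions allocator;
        std::vector<int64_t> input_shape = {1, 3, 32, 100};
        // ... complete the input/output handling
        return {};  // placeholder until the elided steps are implemented
    }

private:
    // The environment must outlive the session, so both are members
    // (env_ is declared first and therefore constructed first).
    Ort::Env env_;
    Ort::Session session_;
};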
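Chinese OCR has reached a mature, deep-learning-driven stage. By choosing pretrained models sensibly and optimizing the data pipeline and deployment approach, developers can build a complete system from scratch within 72 hours. Beginners are advised to start with a mature framework such as PaddleOCR and move gradually to custom model development.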