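Overview: This article walks through the API call flow for the general text recognition (OCR) feature of TextIn from Intsig (合合信息). It covers environment setup, API invocation, and result handling, and provides working code examples along with best-practice recommendations.
Developers must register an enterprise account on the official TextIn platform and submit qualification documents such as a business license to obtain API access. The enterprise verification channel is recommended, since it comes with a higher call quota and prioritized technical support.
TextIn provides SDKs for multiple languages, including Java, Python, and C++. Taking Python as an example, install the official SDK with pip: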
pip install textin-sdk
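After installation, verify that the SDK version matches the version required by the API documentation, to avoid interface incompatibilities caused by a version mismatch.
Generate an API Key and a Secret Key in the console. Storing the keys in environment variables is recommended: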
import os

os.environ['TEXTIN_API_KEY'] = 'your_api_key'
os.environ['TEXTIN_SECRET_KEY'] = 'your_secret_key'
A leaked key can lead to abnormal API usage or security risks, so rotate keys regularly and restrict access with an IP whitelist.
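To make the environment-variable approach concrete, here is a minimal sketch of reading the two keys at startup and failing fast when one is missing. The helper names `load_credentials` and `mask` are hypothetical and not part of any TextIn SDK. The variable names are the ones used in this article.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not set in the environment."""


def load_credentials(environ=os.environ):
    """Return (api_key, secret_key) read from the environment, or fail fast.

    Hypothetical helper: the variable names follow this article's example.
    """
    names = ("TEXTIN_API_KEY", "TEXTIN_SECRET_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingCredentialError(
            "missing environment variables: " + ", ".join(missing)
        )
    return environ[names[0]], environ[names[1]]


def mask(secret, visible=4):
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging `mask(api_key)` rather than the key itself keeps credentials out of log files while still letting you tell which key was in use.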
The general text recognition API supports a range of parameters. A typical request body looks like this:
from textin_sdk import TextInClient

client = TextInClient()
request = {
    "image_base64": "iVBORw0KGgoAAAAN...",         # Base64-encoded image
    "image_url": "https://example.com/image.jpg",  # or pass a URL directly
    "recognize_granularity": "word",               # recognition granularity: word/char
    "charset": "auto",                             # character set: auto/chs/cht/en
    "language_type": "CHN_ENG",                    # language type
    "is_pdf_polygon": False,                       # PDF polygon detection
    "is_return_char_box": True                     # return character-level coordinates
}
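The request above lists both `image_base64` and `image_url`, although the comment suggests they are alternatives. The sketch below builds a request from exactly one of the two sources, under that assumption. `build_request` is a hypothetical helper, and the remaining options are passed through unchanged.

```python
import base64


def build_request(image_path=None, image_url=None, **options):
    """Build a request dict from exactly one image source.

    Hypothetical helper: assumes image_base64 and image_url are
    mutually exclusive, as the comments in this article suggest.
    """
    if (image_path is None) == (image_url is None):
        raise ValueError("provide exactly one of image_path or image_url")
    request = dict(options)
    if image_url is not None:
        request["image_url"] = image_url
    else:
        # Read the file as bytes and encode it as an ASCII Base64 string.
        with open(image_path, "rb") as f:
            request["image_base64"] = base64.b64encode(f.read()).decode("ascii")
    return request
```

For example, `build_request(image_url="https://example.com/image.jpg", charset="auto")` yields a dict containing only the URL and the `charset` option.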
# Synchronous call: the recognition result is returned by the call itself
response = client.general_ocr_sync(request)
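import time

# Asynchronous call: submit the task, then poll for its result
task_id = client.general_ocr_async(request)
while True:
    result = client.get_async_result(task_id)
    if result['status'] == 'SUCCESS':
        break
    # Note: this loop never exits if the task fails; add a timeout in production
    time.sleep(1)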
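A polling loop that only checks for `'SUCCESS'` can spin forever if the task fails or stalls. The following is a minimal sketch of bounded polling. `wait_for_result` is a hypothetical helper, the `fetch` callable stands in for whatever returns the task status, and the failure status name `"FAILED"` is an assumption that should be checked against the API documentation.

```python
import time


def wait_for_result(fetch, timeout=30.0, interval=1.0,
                    failure_statuses=("FAILED",),
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it reports SUCCESS, a failure status, or a timeout.

    fetch must return a dict with a "status" key. The failure status
    names are assumptions, not confirmed API values.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        status = result.get("status")
        if status == "SUCCESS":
            return result
        if status in failure_statuses:
            raise RuntimeError("task ended with status %r" % status)
        if clock() >= deadline:
            raise TimeoutError("task did not finish within %.1f s" % timeout)
        sleep(interval)
```

With the client from this article, the call would look like `wait_for_result(lambda: client.get_async_result(task_id), timeout=60)`.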
import time

from textin_sdk.exceptions import TextInAPIException

max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.general_ocr_sync(request)
        break
    except TextInAPIException:
        if attempt == max_retries - 1:
            raise  # out of retries: propagate the last error
        time.sleep(2 ** attempt)  # exponential backoff: 1 s, then 2 s
The JSON returned by the API is hierarchical:
{
  "log_id": 123456,
  "words_result_num": 2,
  "words_result": [
    {
      "words": "合合信息",
      "location": {"width": 100, "height": 20, ...},
      "chars": [
        {"char": "合", "location": {...}},
        ...
      ]
    },
    ...
  ]
}
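To show how the nested structure can be walked, here is a small sketch that joins the line-level text and flattens the character-level entries. It assumes only the keys visible in the sample above (`words_result`, `words`, `chars`, `char`, `location`). The function names are illustrative.

```python
def full_text(response, sep="\n"):
    """Join the text of every entry in words_result."""
    return sep.join(item.get("words", "") for item in response.get("words_result", []))


def iter_chars(response):
    """Yield one flat record per recognized character.

    Each record carries the index of the parent entry so characters
    can be traced back to the text line they belong to.
    """
    for line_index, item in enumerate(response.get("words_result", [])):
        for char in item.get("chars", []):
            yield {
                "line": line_index,
                "char": char.get("char"),
                "location": char.get("location", {}),
            }
```

Using `.get` with defaults keeps the traversal from raising when `chars` is absent, for instance when character-level boxes were not requested.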
The result can be converted to a DataFrame with Pandas for easier analysis:
import pandas as pd

df = pd.DataFrame([
    {
        'text': item['words'],
        'x': item['location']['left'],
        'y': item['location']['top']
    }
    for item in response['words_result']
])
Filter the recognition results by quality (a confidence threshold above 0.9 is recommended):
high_confidence = [
    item for item in response['words_result']
    if item.get('probability', 1) > 0.9  # items without a score default to 1 and pass
]
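The filter above treats an item with no `probability` field as fully confident. An alternative, sketched here, is to route low-confidence and unscored items into a separate list for manual review instead of silently accepting or discarding them. `split_by_confidence` is a hypothetical helper, and the `probability` key is the one used in this article's example.

```python
def split_by_confidence(items, threshold=0.9):
    """Split items into (accepted, review) lists by confidence.

    Unlike a filter that defaults a missing score to 1, items without
    a "probability" value are sent to review rather than accepted.
    """
    accepted, review = [], []
    for item in items:
        probability = item.get("probability")
        if probability is not None and probability > threshold:
            accepted.append(item)
        else:
            review.append(item)
    return accepted, review
```

Which default is right depends on the cost of an undetected error in the specific business scenario.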
When the is_pdf_polygon parameter is enabled, the response also provides text-block coordinates.
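As one illustration of what coordinates make possible, the sketch below restores reading order by grouping text items into lines by vertical position and sorting each line from left to right. It assumes each item carries a `location` with `left`, `top`, and `height`, which are the keys used elsewhere in this article. `group_into_lines` and its `tolerance` parameter are hypothetical.

```python
def group_into_lines(items, tolerance=0.5):
    """Group text items into lines and return each line's text in reading order.

    Two items share a line when their vertical centers differ by at most
    tolerance * height of the item being placed.
    """
    lines = []
    for item in sorted(items, key=lambda it: it["location"]["top"]):
        loc = item["location"]
        center = loc["top"] + loc["height"] / 2
        if lines and abs(center - lines[-1]["center"]) <= tolerance * loc["height"]:
            lines[-1]["items"].append(item)
        else:
            # Start a new line anchored at this item's vertical center.
            lines.append({"center": center, "items": [item]})
    return [
        " ".join(
            it["words"]
            for it in sorted(line["items"], key=lambda it: it["location"]["left"])
        )
        for line in lines
    ]
```

This simple heuristic works for roughly horizontal text. Rotated or multi-column layouts would need a more careful approach.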
import base64
from concurrent.futures import ThreadPoolExecutor

def process_image(img_path):
    # Read the image and send it as a Base64 string
    with open(img_path, 'rb') as f:
        img_base64 = base64.b64encode(f.read()).decode()
    return client.general_ocr_sync({"image_base64": img_base64})

# Recognize up to 5 images concurrently; image_paths is a list of file paths
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(process_image, image_paths))
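With `executor.map`, the first exception aborts the collection of results. Here is a sketch of a batch runner that records failures per file so one bad image does not lose the rest of the batch. `run_batch` is a hypothetical helper, and `recognize` is any callable that takes a path, such as a function like `process_image`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(recognize, paths, max_workers=5):
    """Run recognize(path) concurrently and separate successes from failures.

    Returns (results, errors), both dicts keyed by path. Duplicate
    paths collapse into a single entry.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(recognize, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report the failure per file
                errors[path] = exc
    return results, errors
```

The paths in `errors` can then be retried or logged, and `max_workers` should stay within the account's concurrency quota.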
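import cv2

# Preprocess the image before recognition
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                  # convert to grayscale
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)  # binarize with a fixed threshold of 150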
import re

def extract_id_info(img_base64):
    response = client.general_ocr_sync({
        "image_base64": img_base64,
        "recognize_granularity": "word",
        "language_type": "CHN_ENG"
    })
    words = [w['words'] for w in response['words_result']]
    id_fields = {
        # First text item containing the label "姓名" (name)
        'name': next((w for w in words if '姓名' in w), None),
        # 18-character ID number: 17 digits plus a check character (digit or X)
        'id_number': next((w for w in words if re.fullmatch(r'\d{17}[\dXx]', w)), None)
    }
    return id_fields
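Matching on length alone cannot catch a misread digit. The sketch below validates a candidate with the check-character scheme of the 18-digit resident ID number (weighted sum modulo 11, as specified in GB 11643-1999). The function name is illustrative, and the check does not validate the embedded region code or birth date.

```python
import re

# Position weights and check-character table of the 18-digit ID scheme.
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CODES = "10X98765432"
_ID_PATTERN = re.compile(r"\d{17}[\dXx]")


def is_valid_id_number(text):
    """Return True if text is 18 characters long and its check character matches."""
    if not _ID_PATTERN.fullmatch(text):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(text[:17], _WEIGHTS))
    return _CHECK_CODES[total % 11] == text[17].upper()
```

A recognized number that fails this check is a good candidate for re-recognition or manual review.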
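import base64

def recognize_financial_report(pdf_path):
    with open(pdf_path, 'rb') as f:
        pdf_base64 = base64.b64encode(f.read()).decode()
    response = client.general_ocr_sync({
        "image_base64": pdf_base64,
        "is_pdf_polygon": True,
        "recognize_granularity": "char"
    })
    # Extract table regions and recognize the numbers
    tables = [item for item in response['words_result']
              if item['location']['width'] > 200 and item['location']['height'] > 50]
    # Further processing logic...
With a solid grasp of the workflow above, developers can digitize a wide range of documents efficiently. Tune parameters and validate results against the specific business scenario to keep improving recognition accuracy and processing throughput.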