Summary: This article walks through configuring a CSI camera on the Jetson Nano kit and deploying a YOLOv5 object detection model, covering hardware hookup, driver verification, model export and optimization, and real-time detection, so developers can get an edge-vision project running quickly.
The Jetson Nano, NVIDIA's edge-computing developer kit, has become a popular choice for computer vision and AIoT work thanks to its low power draw and GPU-accelerated performance. This article focuses on the complete workflow of CSI camera configuration and YOLOv5 object detection on the kit, helping developers build an end-to-end system from image capture to real-time inference.
The Jetson Nano Developer Kit (B01 revision) connects to compatible camera modules (such as the Raspberry Pi Camera V2) through its CSI port. Commonly used CSI camera modules:
| Model | Resolution | Frame rate | Interface | Typical use |
|---|---|---|---|---|
| Raspberry Pi Camera V2 | 8 MP | 30 fps | CSI-2 | General object detection |
| IMX219-77 | 720p | 60 fps | CSI-2 | Fast-moving target tracking |
| Arducam OV5647 | 5 MP | 15 fps | CSI-2 | Low-light environments |
Key consideration: choose a camera that outputs MJPEG or YUV422, which avoids the decoding overhead of an H.264 stream.
```shell
# Update system packages
sudo apt-get update
sudo apt-get upgrade -y
# Install GStreamer dependencies (for video-stream handling)
sudo apt-get install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
```
The Jetson Nano ships with the CSI driver integrated by default, but it is worth verifying that the kernel modules are loaded:
```shell
lsmod | grep tegra_video  # should list tegra_video_csi and related modules
```
If the modules are missing, load them manually:
```shell
sudo modprobe tegra_video_csi
sudo modprobe videobuf2_core
```
Test the camera with nvarguscamerasrc (NVIDIA's dedicated GStreamer plugin):
```shell
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1280,height=720,framerate=30/1' ! nvvidconv ! xvimagesink
```
Common issue: if opening the device fails with a permission error, add the current user to the `video` group, then log out and back in for the change to take effect:

```shell
sudo usermod -aG video $USER
```
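When the camera is consumed from OpenCV rather than `gst-launch`, the pipeline string tends to get copy-pasted around; keeping it in one small helper avoids drift. `gstreamer_pipeline` below is a hypothetical convenience function (not part of any NVIDIA API), and the BGR conversion stages at the end are what let `cv2.VideoCapture` read the frames:

```python
def gstreamer_pipeline(width=1280, height=720, fps=30, flip=0):
    """Build an nvarguscamerasrc pipeline string for cv2.VideoCapture."""
    return (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM),width={width},height={height},framerate={fps}/1 ! "
        f"nvvidconv flip-method={flip} ! "
        "video/x-raw,format=BGRx ! videoconvert ! "
        "video/x-raw,format=BGR ! appsink"
    )

# On the Nano (with a connected CSI camera):
# cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
```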
```shell
# Install PyTorch with CUDA support (prebuilt aarch64 wheel)
wget https://nvidia.box.com/shared/static/p57jwntq4h6ul5ebtgzq83vr8p7gqm8x.whl -O torch-1.8.0-cp36-cp36m-linux_aarch64.whl
pip3 install torch-1.8.0-cp36-cp36m-linux_aarch64.whl
pip3 install torchvision numpy opencv-python
# Install TensorRT (for optimized inference)
sudo apt-get install -y tensorrt
```
The pretrained YOLOv5s model (the lightweight variant) is the recommended starting point:
```shell
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt
```
Download the pretrained weights:
```shell
wget https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt
```
Export an ONNX model:
```python
import torch

# Load the checkpoint and put the model in inference mode
model = torch.load('yolov5s.pt', map_location='cpu')['model'].float().eval()
dummy_input = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy_input, 'yolov5s.onnx',
                  input_names=['images'], output_names=['output'],
                  dynamic_axes={'images': {0: 'batch'}, 'output': {0: 'batch'}})
```
Optimize the model with TensorRT:
```shell
trtexec --onnx=yolov5s.onnx --saveEngine=yolov5s.trt --fp16
```
Useful `trtexec` flags:

- `--fp16`: enable half-precision acceleration (supported on the Jetson Nano)
- `--workspace=1024`: allocate 1 GB of workspace memory
```python
import cv2
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return f"Host:\n{self.host}\nDevice:\n{self.device}"

class YOLOv5TRT:
    def __init__(self, engine_path):
        # Deserialize the TensorRT engine
        self.logger = trt.Logger(trt.Logger.INFO)
        with open(engine_path, "rb") as f, trt.Runtime(self.logger) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        # Allocate input/output buffers
        self.inputs, self.outputs, self.bindings = [], [], []
        for binding in self.engine:
            size = trt.volume(self.engine.get_binding_shape(binding))
            dtype = trt.nptype(self.engine.get_binding_dtype(binding))
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            self.bindings.append(int(device_mem))
            if self.engine.binding_is_input(binding):
                self.inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                self.outputs.append(HostDeviceMem(host_mem, device_mem))

    def infer(self, img):
        # Preprocess: resize to the network input and normalize to [0, 1]
        img = cv2.resize(img, (640, 640))
        img = img.transpose((2, 0, 1)).astype(np.float32) / 255.0
        np.copyto(self.inputs[0].host, img.ravel())
        # Inference
        stream = cuda.Stream()
        for inp in self.inputs:
            cuda.memcpy_htod_async(inp.device, inp.host, stream)
        self.context.execute_async_v2(bindings=self.bindings, stream_handle=stream.handle)
        for out in self.outputs:
            cuda.memcpy_dtoh_async(out.host, out.device, stream)
        stream.synchronize()
        # Raw output rows are [x, y, w, h, conf, class scores...]
        pred = self.outputs[0].host.reshape(-1, 85)
        return pred

# Open the CSI camera (convert to BGR so OpenCV can consume the frames)
pipeline = ('nvarguscamerasrc ! video/x-raw(memory:NVMM),width=640,height=480 ! '
            'nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! '
            'video/x-raw,format=BGR ! appsink')
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

# Initialize the YOLOv5 detector
detector = YOLOv5TRT('yolov5s.trt')

while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = detector.infer(frame)
    # Draw detections; box coordinates are in the 640x640 network input space
    sx, sy = frame.shape[1] / 640, frame.shape[0] / 640
    for det in results:
        if det[4] > 0.5:  # confidence threshold
            x, y, w, h = det[:4]
            x1, y1 = int((x - w / 2) * sx), int((y - h / 2) * sy)
            x2, y2 = int((x + w / 2) * sx), int((y + h / 2) * sy)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow('Detection', frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```
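The `infer` method returns the raw (N, 85) prediction tensor; producing clean final boxes still requires confidence filtering and non-maximum suppression. A minimal NumPy sketch of that post-processing step (class-agnostic NMS, assuming the `[x, y, w, h, conf, class scores...]` row layout used above; the `postprocess` helper is illustrative, not part of YOLOv5's own API):

```python
import numpy as np

def postprocess(pred, conf_thres=0.5, iou_thres=0.45):
    """Filter raw YOLOv5 output and apply class-agnostic NMS.

    pred: (N, 85) array of [cx, cy, w, h, obj_conf, 80 class scores].
    Returns an (M, 6) array of [x1, y1, x2, y2, score, class_id].
    """
    pred = pred[pred[:, 4] > conf_thres]  # objectness filter
    if len(pred) == 0:
        return np.zeros((0, 6), dtype=np.float32)
    # Center-format xywh -> corner-format xyxy
    boxes = np.empty((len(pred), 4), dtype=np.float32)
    boxes[:, 0] = pred[:, 0] - pred[:, 2] / 2
    boxes[:, 1] = pred[:, 1] - pred[:, 3] / 2
    boxes[:, 2] = pred[:, 0] + pred[:, 2] / 2
    boxes[:, 3] = pred[:, 1] + pred[:, 3] / 2
    scores = pred[:, 4] * pred[:, 5:].max(axis=1)  # obj_conf * best class prob
    classes = pred[:, 5:].argmax(axis=1)
    keep = []
    order = scores.argsort()[::-1]  # highest score first
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou < iou_thres]  # drop heavily overlapping boxes
    keep = np.array(keep)
    return np.concatenate(
        [boxes[keep], scores[keep].reshape(-1, 1), classes[keep].reshape(-1, 1)],
        axis=1)
```

The full YOLOv5 repository performs per-class NMS and letterbox-aware coordinate rescaling; this sketch keeps only the essential logic.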
Performance tuning tips:

- Export the engine with the `--fp16` flag; this alone can yield roughly a 30% speedup in inference.
- Use Python's `multiprocessing` module to run image capture and inference in parallel.
- In the capture pipeline, `queue leaky=2` drops stale frames instead of letting latency build up:
```shell
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=640,height=480,framerate=30/1,format=NV12' ! queue leaky=2 ! nvvidconv ! appsink
```
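The capture/inference parallelism tip can be sketched with a producer process feeding frames through a bounded queue while the main process runs detection. The frame generator and the mean-based "detector" below are stand-in stubs (on the Nano they would be `cap.read()` and `detector.infer(frame)` respectively); the bounded `maxsize` is what keeps capture from racing ahead of inference:

```python
import multiprocessing as mp
import numpy as np

def capture_loop(frame_q, n_frames, shape=(480, 640, 3)):
    """Producer: push frames into the queue.

    Stub: generates random frames; on the Nano this would call
    cap.read() on the CSI pipeline instead.
    """
    for _ in range(n_frames):
        frame_q.put(np.random.randint(0, 255, shape, dtype=np.uint8))
    frame_q.put(None)  # sentinel: capture finished

def inference_loop(frame_q, result_q):
    """Consumer: pull frames and run detection (stubbed with a mean)."""
    while True:
        frame = frame_q.get()
        if frame is None:
            break
        result_q.put(float(frame.mean()))  # stand-in for detector.infer(frame)

if __name__ == "__main__":
    frame_q, result_q = mp.Queue(maxsize=4), mp.Queue()
    producer = mp.Process(target=capture_loop, args=(frame_q, 8))
    producer.start()
    inference_loop(frame_q, result_q)  # runs concurrently with capture
    producer.join()
    results = [result_q.get() for _ in range(8)]
    print(len(results))  # 8
```

A plain `threading.Thread` also works here, since OpenCV releases the GIL during `cap.read()`; a separate process avoids GIL contention with the Python-side post-processing.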
Model compression options:

- Use the `torch.quantization` module to convert model weights to INT8.
- Use `torch.nn.utils.prune` to remove redundant channels.

| Symptom | Likely cause | Fix |
|---|---|---|
| Camera not detected | Loose ribbon cable | Reseat the CSI connector and check that the latch is locked |
| Slow YOLOv5 inference | TensorRT not enabled | Re-export the model as a .trt engine file |
| Flickering detection boxes | Frame-rate mismatch | Adjust the cv2.waitKey() delay |
| Out of memory (OOM) | Model too large | Switch to YOLOv5n (the extra-light "nano" variant) |
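The INT8 quantization mentioned in the compression tips can be illustrated without PyTorch: symmetric per-tensor quantization maps float weights to 8-bit integers through a single scale factor, cutting weight storage by 4x. A minimal NumPy sketch (the real `torch.quantization` workflow additionally calibrates activation ranges):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization of a weight array.

    Assumes w is not all zeros; the max magnitude maps to 127.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g. a conv kernel
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes / w.nbytes)  # 0.25: weights are 4x smaller
```

The reconstruction error per weight is at most half a quantization step (`scale / 2`), which is why INT8 usually costs little accuracy on well-conditioned layers.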
Following this guide, developers can quickly master the core techniques of CSI camera configuration and YOLOv5 object detection on the Jetson Nano kit, with a workable solution at every stage from hardware selection to model optimization. In practical testing on the 4 GB Jetson Nano, the YOLOv5s model achieved real-time detection at 15 FPS at 640x480, sufficient for most edge-computing scenarios. Multi-camera synchronization and model compression are worthwhile next steps for improving overall system performance.