Summary: This article provides detailed steps for deploying DeepSeek on a Linux system, covering environment preparation, dependency installation, code configuration, and runtime verification, helping developers complete a deployment quickly.

The article walks through the full workflow of deploying DeepSeek on Linux, including system environment preparation, dependency installation, repository setup, model loading, and inference testing. Staged explanations with command examples help developers pick up the deployment quickly, along with solutions to common problems and performance-tuning suggestions.
Ubuntu 20.04 LTS or CentOS 7/8 is recommended, on hardware that meets the model's requirements. First update the system and install the base dependencies:
```bash
# Update Ubuntu
sudo apt update && sudo apt upgrade -y
sudo apt install -y git wget curl python3-pip python3-dev build-essential

# Update CentOS
sudo yum update -y
sudo yum install -y git wget curl python3 python3-devel gcc-c++ make
```
It is recommended to create an isolated environment with conda:
```bash
# Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
source ~/miniconda3/bin/activate

# Create a virtual environment
conda create -n deepseek python=3.10
conda activate deepseek
```
Choose the install command that matches your GPU setup:
```bash
# CUDA 11.8 build (recommended)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# CPU-only build (no GPU)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
```bash
pip install transformers sentencepiece protobuf
pip install onnxruntime-gpu  # optional, for ONNX support
```
```python
import torch
print(torch.__version__)          # a GPU install should show a +cu118 build
print(torch.cuda.is_available())  # should return True in a GPU environment
```
```bash
git clone https://github.com/deepseek-ai/DeepSeek-V2.git
cd DeepSeek-V2
git checkout v1.0.3  # pin a stable release
```
Obtain the model weight files (.bin or .safetensors format) from official channels; storing them under /data/models/deepseek is recommended. Then set the environment variables:
```bash
export MODEL_PATH=/data/models/deepseek
export CONFIG_PATH=$MODEL_PATH/config.json
```
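Before loading the model, it can help to sanity-check the weight directory. A minimal sketch that honors the `MODEL_PATH` variable set above (the helper names are illustrative, not part of the DeepSeek codebase):

```python
import os
from pathlib import Path

def resolve_model_path(default="/data/models/deepseek"):
    """Return the model directory, honoring the MODEL_PATH variable."""
    return Path(os.environ.get("MODEL_PATH", default))

def find_weight_files(model_dir):
    """Collect .bin / .safetensors weight shards under model_dir."""
    return sorted(p.name for p in model_dir.glob("*")
                  if p.suffix in {".bin", ".safetensors"})

if __name__ == "__main__":
    model_dir = resolve_model_path()
    if model_dir.is_dir():
        print("weights:", find_weight_files(model_dir) or "none found")
    else:
        print(f"{model_dir} does not exist yet -- download the weights first")
```

Running this before the first inference catches a missing or empty weight directory early, instead of failing inside the model loader.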
Edit configs/inference.yaml, focusing on the following parameters:
```yaml
model:
  name: deepseek-v2
  quantization: bfloat16  # or int4/int8
  max_seq_len: 4096
device:
  type: cuda  # or cpu
  gpu_id: 0
```
```bash
python infer.py \
  --model_path $MODEL_PATH \
  --prompt "Explain the basic principles of quantum computing" \
  --max_tokens 512 \
  --temperature 0.7
```
```python
# batch_infer.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "/data/models/deepseek",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("/data/models/deepseek")

prompts = [
    "Five key steps for tuning a Linux system",
    "Best practices for asynchronous programming in Python",
]
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
```python
# app.py
import torch
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
generator = pipeline(
    "text-generation",
    model="/data/models/deepseek",
    device=0 if torch.cuda.is_available() else "cpu",
)

@app.post("/generate")
async def generate(prompt: str):
    result = generator(prompt, max_length=512, do_sample=True)
    return {"text": result[0]["generated_text"]}
```
Start the service:
```bash
pip install fastapi uvicorn
uvicorn app:app --host 0.0.0.0 --port 8000
```
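Once the service is up, it can be exercised from Python. A minimal stdlib client sketch (note: because the endpoint declares `prompt` as a bare `str`, FastAPI treats it as a query parameter even on POST; the host and port match the `uvicorn` command above, and the function names here are illustrative):

```python
# client.py -- minimal client for the FastAPI service above
import json
import urllib.parse
import urllib.request

def build_request(prompt, base_url="http://localhost:8000"):
    """Build the POST URL for the /generate endpoint."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return f"{base_url}/generate?{query}"

def generate(prompt, base_url="http://localhost:8000"):
    """Call the service and return the generated text."""
    req = urllib.request.Request(build_request(prompt, base_url), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]

# usage (with the service running):
#   print(generate("Explain the basic principles of quantum computing"))
```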
| Symptom | Likely cause | Fix |
|---|---|---|
| CUDA out of memory | Batch too large | Reduce max_tokens or batch_size |
| Model fails to load | Wrong path | Check the MODEL_PATH environment variable |
| High inference latency | Quantization not enabled | Switch to an int4-quantized build |
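The first row of the table can also be handled in code: retry with a smaller token budget when memory runs out. A sketch of that idea (in a real deployment `oom_error` would be `torch.cuda.OutOfMemoryError`; the stdlib `MemoryError` stands in here so the sketch is self-contained, and `with_backoff` is an illustrative name):

```python
def with_backoff(run, max_tokens=512, min_tokens=64, oom_error=MemoryError):
    """Retry run(max_tokens) with a halved token budget after each OOM."""
    while max_tokens >= min_tokens:
        try:
            return run(max_tokens)
        except oom_error:
            max_tokens //= 2  # halve the budget and try again
    raise RuntimeError("still out of memory at the minimum token budget")

# usage: with_backoff(lambda n: model.generate(**inputs, max_new_tokens=n))
```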
```yaml
# Tuned inference configuration
generation:
  do_sample: true
  top_k: 40
  top_p: 0.95
  temperature: 0.7
  repetition_penalty: 1.1
```
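These keys map one-to-one onto keyword arguments of Hugging Face `model.generate()`. A small sketch of carrying the tuned settings into code (`generate_kwargs` and the constant name are illustrative, not from the DeepSeek repo):

```python
# Tuned sampling settings, mirroring the YAML block above.
GENERATION_CONFIG = {
    "do_sample": True,
    "top_k": 40,
    "top_p": 0.95,
    "temperature": 0.7,
    "repetition_penalty": 1.1,
}

def generate_kwargs(max_new_tokens=256):
    """Merge the tuned sampling settings with a token budget."""
    return {**GENERATION_CONFIG, "max_new_tokens": max_new_tokens}

# usage (model and inputs as in batch_infer.py):
#   outputs = model.generate(**inputs, **generate_kwargs())
```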
```bash
# GPU usage monitoring
nvidia-smi -l 1

# System resource monitoring
htop
# or
sudo apt install sysstat
sar -u 1 3  # CPU utilization
```
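For programmatic monitoring, the same `nvidia-smi` data can be pulled into Python via its CSV query interface. A sketch (the helper names are illustrative; `sample_gpu` only works on a machine with the NVIDIA driver installed):

```python
import subprocess

def parse_gpu_util(csv_line):
    """Parse one line of nvidia-smi CSV output: 'util, mem_used, mem_total'."""
    util, mem_used, mem_total = (int(x) for x in csv_line.split(", "))
    return {"util_pct": util, "mem_used_mib": mem_used, "mem_total_mib": mem_total}

def sample_gpu(index=0):
    """Query one GPU's utilization and memory via nvidia-smi."""
    out = subprocess.check_output([
        "nvidia-smi", f"--id={index}",
        "--query-gpu=utilization.gpu,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ], text=True)
    return parse_gpu_util(out.strip())

# usage: print(sample_gpu())  # e.g. {'util_pct': 45, ...}
```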
```dockerfile
# Example Dockerfile
FROM nvidia/cuda:11.8.0-base-ubuntu20.04
RUN apt update && apt install -y python3-pip git
RUN pip3 install torch transformers fastapi uvicorn
COPY . /app
WORKDIR /app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```bash
docker build -t deepseek-service .
docker run -d --gpus all -p 8000:8000 deepseek-service
```
```yaml
# Example deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deepseek
  template:
    metadata:
      labels:
        app: deepseek
    spec:
      containers:
      - name: deepseek
        image: deepseek-service:latest
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "16Gi"
            cpu: "4"
```
Two operational notes: verify the model's provenance whenever the `--trust_remote_code` flag is disabled, and use `logging.basicConfig(level=logging.INFO)` to record inference requests.

This guide covers the complete workflow from environment preparation to containerized service deployment, lowering the barrier to entry through code examples and configuration notes. When you hit a specific problem in a real deployment, check the Issues board of the official GitHub repository or community forums for up-to-date solutions.