Introduction: This article walks through the complete workflow for deploying a DeepSeek model locally, covering environment preparation, model download, dependency installation, and inference service configuration, as a hands-on guide from scratch.
Example installation commands:
```bash
# Install system dependencies
sudo apt update && sudo apt install -y \
    build-essential \
    cmake \
    git \
    wget \
    python3-dev \
    libopenblas-dev

# Create a conda environment
conda create -n deepseek python=3.10
conda activate deepseek
```
Download with wget or curl:
```bash
wget https://model-repo.deepseek.ai/releases/v1.0/deepseek-7b.tar.gz

# Compare against the officially published hash
sha256sum deepseek-7b.tar.gz
```
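To automate the verification, a small Python sketch can stream the archive through hashlib; the `EXPECTED_HASH` placeholder is an assumption, so substitute the value from the official release page:

```python
import hashlib

EXPECTED_HASH = "<official-sha256>"  # placeholder: copy from the release page

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert sha256_of("deepseek-7b.tar.gz") == EXPECTED_HASH, "checksum mismatch"
```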
After extracting the archive, the directory should contain:

```
├── config.json             # model configuration
├── pytorch_model.bin       # weight file
├── tokenizer_config.json
└── tokenizer.model
```
```bash
# CUDA 11.8 build
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

# Verify the installation
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
```
```bash
pip install transformers==4.35.0  # pin the version to avoid compatibility issues
pip install accelerate sentencepiece
```
```bash
# Install Flash Attention 2
pip install flash-attn --no-build-isolation

# Or use the Triton-based optimization
pip install triton
```
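Installing the package alone does not activate it; the model must also be loaded with Flash Attention enabled. A sketch against the transformers version pinned above (note that newer releases replace this flag with `attn_implementation="flash_attention_2"`):

```python
import torch
from transformers import AutoModelForCausalLM

# transformers 4.35.x uses use_flash_attention_2=True;
# later releases switch to attn_implementation="flash_attention_2"
model = AutoModelForCausalLM.from_pretrained(
    "./deepseek-7b",
    torch_dtype=torch.float16,  # flash-attn kernels require fp16/bf16
    use_flash_attention_2=True,
    trust_remote_code=True,
    device_map="auto",
)
```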
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./deepseek-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
```
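A quick generation smoke test confirms that the tokenizer and weights work together (the prompt is arbitrary):

```python
# Smoke test: encode a prompt, generate a short completion, decode it
inputs = tokenizer("Hello, please introduce yourself.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```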
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig  # requires the bitsandbytes package

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map="auto",
)
```
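Whether the 4-bit load actually shrank the weights can be checked with `get_memory_footprint()`, which Transformers models expose:

```python
# A 4-bit 7B model should report a footprint of only a few GB
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```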
### 4.3 Building the Inference Service (FastAPI Example)

```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class Query(BaseModel):
    prompt: str
    max_tokens: int = 512

@app.post("/generate")
async def generate_text(query: Query):
    inputs = tokenizer(query.prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=query.max_tokens)
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
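Once the service is running, any HTTP client can exercise it; for example, a minimal `requests` sketch (host and port match the uvicorn settings above, and the prompt is arbitrary):

```python
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Explain quantization in one sentence.", "max_tokens": 128},
)
print(resp.json()["response"])
```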
Performance and debugging switches (applied in code in the sketch below):

- `torch.backends.cuda.enable_mem_efficient_sdp(True)`: enable memory-efficient scaled dot-product attention
- Set the environment variable `TORCH_DYNAMIC_SHAPES=1`
- Set `CUDA_LAUNCH_BLOCKING=1` when debugging memory problems (kernels launch synchronously, so errors surface at the offending call)
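If these switches are managed from Python rather than the shell, the environment variables must be set before CUDA is initialized; a minimal sketch using only the flag names listed above:

```python
import os

# Environment variables are only honored if set before CUDA initialization
os.environ["TORCH_DYNAMIC_SHAPES"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous launches, debugging only

import torch

# Prefer the memory-efficient scaled dot-product attention kernel
torch.backends.cuda.enable_mem_efficient_sdp(True)
```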
```bash
# Install nvtop for GPU monitoring
sudo apt install nvtop

# Or poll nvidia-smi
watch -n 1 nvidia-smi
```
```python
import logging

logging.basicConfig(
    filename="deepseek.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
```
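Building on section 4.3, the handler can record per-request latency through this logger; a minimal sketch assuming the same `app`, `tokenizer`, and `model` objects as above:

```python
import time

logger = logging.getLogger(__name__)

@app.post("/generate")
async def generate_text(query: Query):
    start = time.perf_counter()
    inputs = tokenizer(query.prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=query.max_tokens)
    # Count only the newly generated tokens, not the prompt
    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    logger.info("generated %d tokens in %.2fs", new_tokens, time.perf_counter() - start)
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```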
Common issues:

- **CUDA out of memory**: lower the `batch_size` parameter, call `model.gradient_checkpointing_enable()`, or split the model with the `--model-parallel` flag (see the sketch below)
- **Model fails to load its custom code**: add the `trust_remote_code=True` parameter
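For the out-of-memory case, the in-code mitigations look like this; a sketch assuming the `model` object from the earlier loading step:

```python
import torch

# Trade compute for memory: recompute activations instead of caching them
# (this mainly helps during fine-tuning, not plain inference)
model.gradient_checkpointing_enable()

# Return unused cached allocator blocks to the driver between runs
torch.cuda.empty_cache()
```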
```dockerfile
FROM nvidia/cuda:11.8.0-base-ubuntu22.04
RUN apt update && apt install -y python3-pip
RUN pip install torch transformers accelerate
COPY ./deepseek-7b /app/model
COPY app.py /app/
WORKDIR /app
# Use python3: the base image ships no bare "python" binary
CMD ["python3", "app.py"]
```
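Build and run with `docker build -t deepseek-model .` followed by `docker run --gpus all -p 8000:8000 deepseek-model`; the `--gpus` flag requires the NVIDIA Container Toolkit on the host, and port 8000 matches the FastAPI service from section 4.3.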
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deepseek
  template:
    metadata:
      labels:
        app: deepseek
    spec:
      containers:
      - name: deepseek
        image: deepseek-model:latest
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "64Gi"
            cpu: "8"
```
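Save the manifest as, say, `deepseek-deploy.yaml` (a hypothetical filename) and apply it with `kubectl apply -f deepseek-deploy.yaml`; scheduling onto `nvidia.com/gpu` resources requires the NVIDIA device plugin to be running in the cluster.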
This guide has covered the complete workflow from environment preparation to advanced deployment; choose the deployment option that fits your actual requirements. For a first deployment, start by testing with the 7B model and scale up gradually. In production, pair the service with a monitoring stack such as Prometheus + Grafana to build out a complete operations setup.