Overview: This article walks through deploying DeepSeek locally on your own machine, covering environment setup, model download, working code, and optimization tips. It is aimed at developers and enterprise users who want to get up and running quickly.
Create and activate a dedicated Conda environment:

```bash
conda create -n deepseek python=3.10
conda activate deepseek
```

Install the core libraries with pip:

```bash
pip install torch transformers accelerate sentencepiece
pip install bitsandbytes  # quantization support
```

Download the model weights with Git LFS:

```bash
git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-V2
```
| Version | Parameters | VRAM usage | Inference speed | Use case |
|---|---|---|---|---|
| Full | 67B | 65 GB+ | Slow | Server-grade deployment |
| 7B quantized | 7B | 12 GB | Fast | Local development / lightweight apps |
| 3.5B fine-tuned | 3.5B | 6 GB | Very fast | Edge devices / mobile |
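A quick way to sanity-check the VRAM column: the weights alone take roughly parameters × bytes-per-parameter, and the runtime adds activations and KV cache on top. A minimal back-of-envelope helper (these are weight-only estimates, not the full footprint):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * bytes_per_param

print(weight_memory_gb(67, 2))   # 67B in bf16 (2 bytes/param): 134 GB of weights
print(weight_memory_gb(7, 1))    # 7B in int8 (1 byte/param): 7 GB of weights
print(weight_memory_gb(7, 0.5))  # 7B in int4 (0.5 bytes/param): 3.5 GB of weights
```

This is why the table's 7B quantized row fits on a single consumer GPU while the full model does not.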
Load the model and run a first inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model_path = "./DeepSeek-V2"  # local model path
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # BF16 reduces memory use
    device_map="auto",           # automatically place layers on available devices
)

# Inference example
input_text = "Explain the principles of quantum computing:"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Key parameters:

- `device_map="auto"`: automatically distributes the model across available GPUs.
- `torch_dtype`: `torch.bfloat16` is recommended as a balance of precision and speed.
- `max_new_tokens`: caps the length of the generated text.

To cut memory use further, apply 8-bit quantization with `bitsandbytes`:
```python
from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map="auto",
)
```
For training, enable `gradient_checkpointing` to trade compute for memory:

```python
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is incompatible with checkpointing
```
Common issues:

- Out of memory: reduce the `batch_size` or switch to a quantized model.
- Monitor GPU memory usage with `nvidia-smi`.
- Model fails to load: make sure `trust_remote_code=True` is passed to `from_pretrained`.
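The "reduce the batch size" advice can be automated. Below is a sketch of a halve-on-OOM retry loop; `run_batch` and `fake_step` are hypothetical stand-ins for your own inference or training step:

```python
def with_batch_backoff(run_batch, batch_size: int, min_batch: int = 1):
    """Call run_batch(batch_size); on a CUDA OOM error, halve the batch and retry."""
    while batch_size >= min_batch:
        try:
            return run_batch(batch_size), batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated error: re-raise
            batch_size //= 2
    raise RuntimeError("out of memory even at the minimum batch size")

# Demo with a fake step that only fits when the batch is 2 or smaller
def fake_step(bs):
    if bs > 2:
        raise RuntimeError("CUDA out of memory")
    return f"ran batch of {bs}"

result, final_bs = with_batch_backoff(fake_step, 8)  # backs off 8 -> 4 -> 2
```

PyTorch raises CUDA OOM as a `RuntimeError` whose message contains "out of memory", which is what the string check relies on.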
For containerized deployment, a minimal Dockerfile:

```dockerfile
FROM nvidia/cuda:11.8.0-base-ubuntu20.04
RUN apt update && apt install -y python3-pip
WORKDIR /app
COPY . .
RUN pip install torch transformers accelerate
CMD ["python3", "inference.py"]
```

Build and run it with `docker build -t deepseek .` and `docker run --gpus all deepseek` (GPU passthrough requires the NVIDIA Container Toolkit).
For multi-node load balancing, orchestrate the containers with Kubernetes (k8s). To expose the model over HTTP, wrap inference in a FastAPI service:
```python
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.post("/generate")
async def generate(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=100)
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Note that with this signature FastAPI reads `prompt` from the query string (e.g. `POST /generate?prompt=hello`); declare a Pydantic request model instead if you want a JSON body.
```python
from transformers import Trainer, TrainingArguments
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("json", data_files="train.json")

# Define training arguments
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=5e-5,
    fp16=True,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
)
trainer.train()
```
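`load_dataset("json", ...)` accepts JSON Lines: one JSON object per line. The `"text"` field below is an assumed schema, and in practice the dataset must also be tokenized before `Trainer` can consume it. A sketch of producing and reading such a file with only the standard library:

```python
import json

# Assumed schema: one JSON object per line with a "text" field
records = [
    {"text": "DeepSeek is a family of large language models."},
    {"text": "Quantization trades a little accuracy for a lot of memory."},
]
with open("train.json", "w", encoding="utf-8") as f:
    for r in records:
        f.write(json.dumps(r, ensure_ascii=False) + "\n")

# Read it back the way the JSON loader sees it
with open("train.json", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```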
To adapt the vocabulary to your domain, train a domain-specific tokenizer with `sentencepiece`:
```bash
spm_train --input=corpus.txt --model_prefix=sp --vocab_size=32000
```
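`spm_train` reads a plain-text corpus with one sentence per line. A small helper for producing that file (the file name matches the command above; the sentences are placeholders):

```python
def write_corpus(sentences, path="corpus.txt"):
    """Write one sentence per line -- the input format spm_train expects."""
    with open(path, "w", encoding="utf-8") as f:
        for s in sentences:
            f.write(s.strip() + "\n")

write_corpus(["Domain sentence one.", "Domain sentence two."])
with open("corpus.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Note that training with `vocab_size=32000` needs a corpus large enough to support that vocabulary; `spm_train` will abort on a tiny input.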
For further optimization, consider:

- vLLM: a library for accelerated inference.
- Triton Inference Server: an enterprise-grade deployment option.

With the steps above, developers can deploy DeepSeek efficiently in a local environment and tune model size and inference performance to their needs. Start testing with the 7B quantized version and scale up to larger models as requirements grow.