A Comprehensive Guide to Deploying Ollama + Deepseek-r1 Locally on Windows

Author: 快去debug · 2025-11-06 11:10

Summary: This article is a complete guide to deploying Ollama and the Deepseek-r1 model locally on Windows, covering environment setup, dependency installation, model loading, and API invocation. It is aimed at developers and enterprise users who need an on-premises AI deployment.

Ollama + Deepseek-r1 Local Deployment on Windows: Full Workflow

1. Pre-Deployment Environment Preparation

1.1 Verifying System Requirements

  • Hardware: an NVIDIA GPU (CUDA 11.8+) with ≥8GB of VRAM is recommended; the CPU must support the AVX2 instruction set (PowerShell has no built-in instruction-set query; the Sysinternals Coreinfo tool, e.g. `coreinfo -f`, reports AVX2 support)
  • OS: Windows 10/11 Pro or Enterprise (Home edition needs an upgrade, or use WSL2)
  • Disk space: download size varies by model tag (the quantized 7b model is a few GB, while the largest variants run to tens of GB); keep at least 60GB free
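The disk-space requirement can be checked programmatically before installing anything. A minimal Python sketch (the 60GB threshold is the recommendation above; the path is a parameter because the install drive varies):

```python
import shutil

def free_space_gb(path: str) -> float:
    """Return the free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 2**30

if __name__ == "__main__":
    # On the deployment machine you would pass the install drive, e.g. "C:\\"
    free = free_space_gb(".")
    print(f"Free space: {free:.1f} GiB - "
          f"{'OK' if free >= 60 else 'below the 60 GB recommendation'}")
```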

1.2 Installing Dependencies

  1. Python environment

     ```powershell
     # Install Miniconda via PowerShell
     $ProgressPreference = 'SilentlyContinue'
     Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile "$env:TEMP\Miniconda3.exe"
     Start-Process "$env:TEMP\Miniconda3.exe" -ArgumentList "/S /D=C:\Miniconda3" -Wait
     ```

     Verify the installation:

     ```powershell
     & "C:\Miniconda3\Scripts\conda.exe" --version
     ```
  2. CUDA Toolkit

     • Download the local installer from the NVIDIA CUDA Toolkit page
     • Note that cuDNN is not a checkbox in the CUDA installer; it is a separate download from NVIDIA that is extracted into the CUDA install directory
     • Verify the environment variables:

       ```powershell
       $env:Path -split ";" | Select-String "CUDA"
       ```

2. Deploying the Ollama Core Components

2.1 Installing the Ollama Service

  1. Download the package

     ```powershell
     Invoke-WebRequest -Uri "https://ollama.com/download/windows/amd64/ollama-windows-amd64.zip" -OutFile "$env:TEMP\ollama.zip"
     Expand-Archive "$env:TEMP\ollama.zip" -DestinationPath "C:\ollama"
     ```

  2. Register the service

     ```powershell
     # Create a Windows service
     New-Service -Name "OllamaService" -BinaryPathName "C:\ollama\ollama.exe serve" -DisplayName "Ollama AI Service" -StartupType Automatic
     Start-Service -Name "OllamaService"
     ```

     Note: ollama.exe is a plain console program, not a native service binary, so `Start-Service` may fail to keep it running; if so, launch `C:\ollama\ollama.exe serve` directly or wrap it with a service manager such as NSSM.

  3. Verify the port

     ```powershell
     Test-NetConnection -ComputerName localhost -Port 11434
     ```
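The same port check can be done without PowerShell. A small Python sketch (port 11434 is Ollama's default, as above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    status = "listening" if port_open("localhost", 11434) else "not reachable"
    print(f"Ollama port 11434 is {status}")
```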

2.2 Model Management Configuration

  1. Pull the model

     ```powershell
     # Run from an elevated CMD or PowerShell prompt
     C:\ollama\ollama.exe pull deepseek-r1:7b
     ```

     Tag notes:

     • 7b: base variant (7 billion parameters)
     • Other published tags include 1.5b, 8b, 14b, 32b, and 70b

  2. Model quantization

     Ollama does not read a standalone YAML quantization config; instead, `ollama create` builds a new model from a Modelfile and accepts a `--quantize` flag (which requires a non-quantized, e.g. f16, base model):

     ```powershell
     # Create a Modelfile referencing the base model
     @'
     FROM deepseek-r1:7b
     '@ | Out-File -FilePath "C:\models\Modelfile" -Encoding utf8
     # Build a quantized variant
     C:\ollama\ollama.exe create deepseek-r1-quantized -f "C:\models\Modelfile" --quantize q4_K_M
     ```
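As a rough rule of thumb, a q4-family quantization stores about 0.5 bytes per parameter plus some overhead for embeddings and metadata, which is why a 7b model shrinks to a few GB on disk. A back-of-the-envelope helper (the 0.5 bytes/parameter figure and 10% overhead are approximations, not values from the Ollama API):

```python
def approx_q4_size_gib(params_billions: float,
                       bytes_per_param: float = 0.5,
                       overhead: float = 1.1) -> float:
    """Rough on-disk size of a ~4-bit quantized model in GiB.

    bytes_per_param is ~0.5 for q4-family formats; `overhead` accounts
    for embeddings, metadata, and higher-precision layers.
    """
    return params_billions * 1e9 * bytes_per_param * overhead / 2**30

if __name__ == "__main__":
    for tag, n in [("7b", 7), ("14b", 14), ("32b", 32)]:
        print(f"deepseek-r1:{tag} ~ {approx_q4_size_gib(n):.1f} GiB")
```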

3. Deepseek-r1 Model Integration

3.1 Building an API Service

  1. FastAPI server

     ```python
     # save as app.py
     from fastapi import FastAPI
     from pydantic import BaseModel
     import requests
     import uvicorn

     app = FastAPI()
     OLLAMA_API = "http://localhost:11434"

     class GenerateRequest(BaseModel):
         prompt: str

     @app.post("/generate")
     async def generate(req: GenerateRequest):
         # Forward the prompt to the local Ollama server
         response = requests.post(
             f"{OLLAMA_API}/api/generate",
             json={"model": "deepseek-r1:7b", "prompt": req.prompt, "stream": False},
         )
         return response.json()["response"]

     if __name__ == "__main__":
         uvicorn.run(app, host="0.0.0.0", port=8000)
     ```

     The request body is modeled with Pydantic so that the endpoint accepts JSON (`{"prompt": ...}`) rather than a query parameter.
  2. Start the service

     ```powershell
     conda activate base
     pip install fastapi uvicorn requests
     python app.py
     ```

3.2 Client Invocation Example

```python
# client_demo.py
import asyncio
import httpx

async def query_model():
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "http://localhost:8000/generate",
            json={"prompt": "解释量子计算的基本原理"},  # "Explain the basic principles of quantum computing"
        )
        print(response.json())

# Run the query
asyncio.run(query_model())
```
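The server example uses `"stream": False`. Ollama can also stream: with `"stream": true`, `/api/generate` returns one JSON object per line (NDJSON), each carrying a `response` fragment and a final `done` flag. A sketch of assembling such a stream (the sample lines are illustrative, not captured from a real run):

```python
import json

def join_stream(ndjson_lines):
    """Concatenate the `response` fragments of an Ollama-style NDJSON stream."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk carries done=true
            break
    return "".join(parts)

sample = [
    '{"response": "Quantum ", "done": false}',
    '{"response": "computing...", "done": true}',
]
print(join_stream(sample))  # Quantum computing...
```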

4. Performance Optimization

4.1 Memory Management

  1. Page file tuning

     ```powershell
     # Enlarge the page file via the registry (initial/maximum sizes in MB,
     # i.e. 32-64 GB here); takes effect after a reboot
     Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" -Name "PagingFiles" -Value "C:\pagefile.sys 32768 65536"
     ```
  2. VRAM allocation control

     In Ollama's API, the `num_gpu` option is the number of model layers offloaded to the GPU, so lowering it reduces VRAM usage:

     ```python
     # Add VRAM options to the generate request
     requests.post(
         f"{OLLAMA_API}/api/generate",
         json={
             "model": "deepseek-r1:7b",
             "prompt": "...",
             "options": {
                 "num_gpu": 50,  # layers offloaded to the GPU; lower to save VRAM
             },
         },
     )
     ```

4.2 Request Concurrency Control

```nginx
# Rate-limit requests with nginx (requires nginx for Windows).
# nginx listens on 8000 here, so run the FastAPI backend on a different
# port (e.g. 8001); proxying back to 8000 would create a request loop.
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=5r/s;
    server {
        listen 8000;
        location /generate {
            limit_req zone=one burst=20;
            proxy_pass http://127.0.0.1:8001;
        }
    }
}
```
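If running nginx is not an option, concurrency can also be capped on the client side. A sketch using `asyncio.Semaphore` (note this bounds in-flight requests, not requests per second, so it is a complement to rate limiting rather than a drop-in replacement; the worker here is a stand-in for an httpx call):

```python
import asyncio

async def limited_gather(worker, items, max_concurrent=5):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(item):
        async with sem:  # hold a slot for the duration of the call
            return await worker(item)

    return await asyncio.gather(*(run_one(i) for i in items))

async def demo():
    async def fake_request(i):  # stand-in for an httpx POST to /generate
        await asyncio.sleep(0.01)
        return i * 2
    return await limited_gather(fake_request, range(10))

if __name__ == "__main__":
    print(asyncio.run(demo()))  # [0, 2, 4, ..., 18], order preserved by gather
```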

5. Troubleshooting Guide

5.1 Common Issues

| Symptom | Resolution |
| --- | --- |
| Port 11434 unreachable | Check the firewall rule: `New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow` |
| Model fails to load | Verify the model path: `Get-ChildItem "C:\Users\<username>\.ollama\models"` |
| CUDA out of memory | Lower the `num_gpu` layer count or switch to a quantized model |
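Several of these symptoms are transient (the service is still starting, or a model is still loading), so health checks are usually retried with exponential backoff rather than failing on the first attempt. A small sketch of that pattern (the base delay and cap are illustrative defaults):

```python
import time

def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule: base * 2^n seconds, capped at `cap`."""
    return [min(base * 2**n, cap) for n in range(retries)]

def wait_until(check, retries: int = 5) -> bool:
    """Call `check()` until it returns True, sleeping between attempts."""
    for delay in backoff_delays(retries):
        if check():
            return True
        time.sleep(delay)
    return check()

if __name__ == "__main__":
    print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In practice `check` would be something like a TCP probe of port 11434.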

5.2 Log Analysis

  1. Ollama service logs

     Ollama does not write to the Windows event log by default; on Windows its logs are kept under `$env:LOCALAPPDATA\Ollama`:

     ```powershell
     Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 100
     ```
  2. FastAPI access logs

     ```python
     # Add a logging middleware to app.py
     from fastapi import Request
     from fastapi.middleware.base import BaseHTTPMiddleware

     class LoggingMiddleware(BaseHTTPMiddleware):
         async def dispatch(self, request: Request, call_next):
             print(f"Request: {request.method} {request.url}")
             response = await call_next(request)
             print(f"Response status: {response.status_code}")
             return response

     app.add_middleware(LoggingMiddleware)
     ```

6. Advanced Deployment

6.1 Containerized Deployment

  1. Example Dockerfile

     Note that `ollama serve` takes no `--model` flag; models are pulled or run once the server is up. GPU access from Windows containers is also very limited, so the official Linux `ollama/ollama` image is the more common container route; the Windows sketch below is illustrative:

     ```dockerfile
     FROM nvidia/cuda:11.8.0-base-windowsservercore-ltsc2019
     SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
     RUN Invoke-WebRequest -Uri "https://ollama.com/download/windows/amd64/ollama-windows-amd64.zip" -OutFile "ollama.zip"; \
         Expand-Archive ollama.zip -DestinationPath C:\ollama; \
         Remove-Item ollama.zip
     CMD ["C:\\ollama\\ollama.exe", "serve"]
     ```
  2. Kubernetes deployment

     ```yaml
     # ollama-deployment.yaml
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: ollama
     spec:
       replicas: 1
       selector:
         matchLabels:
           app: ollama
       template:
         metadata:
           labels:
             app: ollama
         spec:
           containers:
             - name: ollama
               image: ollama-windows:latest
               resources:
                 limits:
                   nvidia.com/gpu: 1
               ports:
                 - containerPort: 11434
     ```

6.2 Security Hardening

  1. API authentication

     ```python
     # Add JWT validation to app.py (pip install python-jose)
     from fastapi import Depends, HTTPException, status
     from fastapi.security import OAuth2PasswordBearer
     from jose import JWTError, jwt

     SECRET_KEY = "your-secret-key"
     oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

     async def get_current_user(token: str = Depends(oauth2_scheme)):
         credentials_exception = HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Could not validate credentials",
             headers={"WWW-Authenticate": "Bearer"},
         )
         try:
             payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
             username: str = payload.get("sub")
             if username is None:
                 raise credentials_exception
         except JWTError:
             raise credentials_exception
         return username
     ```
  2. Network isolation

     ```powershell
     # Create a private Hyper-V switch (requires the Hyper-V feature;
     # the cmdlet is New-VMSwitch, as New-VNetSwitch does not exist)
     New-VMSwitch -Name "OllamaNetwork" -SwitchType Private
     Set-NetConnectionProfile -InterfaceAlias "Ethernet" -NetworkCategory Private
     ```
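The JWT check in app.py relies on the third-party `python-jose` package. For illustration, HS256 tokens can also be minted and verified with nothing but the standard library, which makes the signing mechanics explicit. A minimal sketch (no `exp` or other registered claims; not a replacement for a vetted library):

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def hs256_encode(payload: dict, secret: str) -> str:
    """Build a signed HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"

def hs256_verify(token: str, secret: str) -> dict:
    """Check the HMAC signature and return the decoded payload."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

if __name__ == "__main__":
    token = hs256_encode({"sub": "alice"}, "your-secret-key")
    print(hs256_verify(token, "your-secret-key"))  # {'sub': 'alice'}
```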

This guide covers the full workflow from basic environment setup to advanced deployment options. With step-by-step instructions and executable code samples, it should help you run Ollama and the Deepseek-r1 model efficiently on Windows. Validate each step in a test environment before migrating to production.