Overview: This article is a complete guide to deploying Ollama with the Deepseek-r1 model locally on Windows, covering environment setup, dependency installation, model loading, and API access. It is aimed at developers and enterprise users who want an on-premises AI deployment.
(verify with the command `Get-CimInstance Win32_Processor | Select-Object -ExpandProperty L2CacheSize`)

Python environment:
```powershell
# Install Miniconda via PowerShell
$ProgressPreference = 'SilentlyContinue'
Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile "$env:TEMP\Miniconda3.exe"
Start-Process "$env:TEMP\Miniconda3.exe" -ArgumentList "/S /D=C:\Miniconda3" -Wait
```
Verify the installation:

```powershell
& "C:\Miniconda3\Scripts\conda.exe" --version
```
CUDA toolkit (check that it is on the PATH):

```powershell
$env:Path -split ";" | Select-String "CUDA"
```
Download the installation package:

```powershell
Invoke-WebRequest -Uri "https://ollama.com/download/windows/amd64/ollama-windows-amd64.zip" -OutFile "$env:TEMP\ollama.zip"
Expand-Archive "$env:TEMP\ollama.zip" -DestinationPath "C:\ollama"
```
Service registration:

```powershell
# Create and start a Windows service
New-Service -Name "OllamaService" -BinaryPathName "C:\ollama\ollama.exe serve" -DisplayName "Ollama AI Service" -StartupType Automatic
Start-Service -Name "OllamaService"
```
Port verification:

```powershell
Test-NetConnection -ComputerName localhost -Port 11434
```
Model pull:

```powershell
# Run from an elevated (administrator) prompt
C:\ollama\ollama.exe pull deepseek-r1:7b
```
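Besides the CLI, Ollama also exposes a pull endpoint (`/api/pull`) that streams newline-delimited JSON status lines. A small sketch of rendering one such status line as readable progress; the field names (`status`, `total`, `completed`) follow Ollama's streaming format, but treat the exact shape as an assumption and check it against your server version:

```python
import json

def pull_progress(line: str) -> str:
    """Render one /api/pull status line as human-readable progress."""
    chunk = json.loads(line)
    status = chunk.get("status", "")
    total, done = chunk.get("total"), chunk.get("completed")
    if total and done is not None:
        return f"{status}: {100 * done // total}%"
    return status

print(pull_progress('{"status": "pulling manifest"}'))
print(pull_progress('{"status": "downloading", "total": 200, "completed": 50}'))
```

Feeding each line of the streamed response through this function gives a simple progress display without parsing the CLI's terminal output.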
Parameter notes:

- `7b`: base version (7 billion parameters)
- `14b` (14 billion parameters) and `33b` (33 billion parameters): larger variants

Model optimization:
```powershell
# Create the quantization config quant.yaml
@'
quantize:
  method: q4_0
  group_size: 128
'@ | Out-File -FilePath "C:\models\quant.yaml" -Encoding utf8
C:\ollama\ollama.exe create deepseek-r1-quantized -f "C:\models\quant.yaml" --model deepseek-r1:7b
```
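As a rough sizing aid when choosing among the 7b/14b/33b variants and a q4_0 quantization, the weight footprint can be estimated from parameter count times bytes per parameter. This is a back-of-the-envelope sketch with assumed byte counts (fp16 = 2.0, q8_0 ≈ 1.0, q4_0 ≈ 0.5); actual usage also depends on context length, KV cache, and runtime overhead:

```python
# Assumed bytes per parameter for each precision (approximation, not a spec)
BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_0": 0.5}

def estimate_vram_gb(params_billions: float, quant: str = "q4_0") -> float:
    """Return an approximate weight footprint in GB for the given quantization."""
    bytes_total = params_billions * 1e9 * BYTES_PER_PARAM[quant]
    return round(bytes_total / 1024**3, 1)

for size in (7, 14, 33):
    print(f"deepseek-r1:{size}b at q4_0 ~ {estimate_vram_gb(size)} GB of weights")
```

By this estimate the 7b model at q4_0 needs only a few GB for weights, which is why quantization is the usual first step on consumer GPUs.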
FastAPI server:

```python
# save as app.py
from fastapi import FastAPI
import requests
import uvicorn

app = FastAPI()
OLLAMA_API = "http://localhost:11434"

@app.post("/generate")
async def generate(prompt: str):
    # NOTE: a bare `str` parameter is read from the query string, not the JSON body
    response = requests.post(
        f"{OLLAMA_API}/api/generate",
        json={"model": "deepseek-r1:7b", "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
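The server above sets `stream: False` and waits for the whole answer. With streaming enabled, Ollama's `/api/generate` returns newline-delimited JSON chunks, each carrying a partial `response` field and a `done` flag; reassembling them can be sketched like this (the chunk shape follows Ollama's streaming format, the helper name is ours):

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the partial `response` fields of Ollama streaming chunks."""
    parts = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk signals completion
            break
    return "".join(parts)

# Example with fabricated chunks in the streaming format:
sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "!", "done": true}',
]
print(assemble_stream(sample))  # Hello, world!
```

In a real handler you would iterate over `response.iter_lines()` from the streamed HTTP response instead of a list.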
Start the service:

```powershell
conda activate base
pip install fastapi uvicorn requests
python app.py
```
```python
# client_demo.py
import asyncio
import httpx

async def query_model():
    async with httpx.AsyncClient() as client:
        # The /generate endpoint reads `prompt` from the query string
        response = await client.post(
            "http://localhost:8000/generate",
            params={"prompt": "Explain the basic principles of quantum computing"},
        )
        print(response.json())

# Run the query
asyncio.run(query_model())
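Local model calls can fail transiently, for example while the model is still loading into VRAM, so a retry with backoff around the client call is often useful. A hypothetical wrapper (the helper name and parameters are illustrative, not part of Ollama or httpx):

```python
import asyncio

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 0.5):
    """Await coro_factory() up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, propagate the last error
            await asyncio.sleep(base_delay * 2**attempt)

# Usage with a fake flaky call that fails twice, then succeeds:
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("model still loading")
    return "ok"

print(asyncio.run(with_retries(lambda: flaky(), base_delay=0.01)))  # ok
```

In `client_demo.py` the factory would be a lambda wrapping the `client.post(...)` call.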
Page file optimization:

```powershell
# Enlarge the page file via the registry (initial 32 GB, maximum 64 GB, values in MB)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" -Name "PagingFiles" -Value "C:\pagefile.sys 32768 65536"
```
VRAM allocation control:

```python
# Add VRAM-related options to the generation request
requests.post(
    f"{OLLAMA_API}/api/generate",
    json={
        "model": "deepseek-r1:7b",
        "prompt": "...",
        "options": {
            "num_gpu": 1,
            "gpu_layers": 50,  # number of layers kept in VRAM
        },
    },
)
```
Rate limiting with Nginx (nginx for Windows required). Note that the proxy must listen on a different port than the FastAPI app, otherwise it proxies to itself:

```nginx
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=5r/s;

    server {
        # nginx listens on 8080 and forwards to the FastAPI app on 8000
        listen 8080;
        location /generate {
            limit_req zone=one burst=20;
            proxy_pass http://127.0.0.1:8000;
        }
    }
}
```
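The `limit_req` directive above is a token bucket: tokens refill at 5 per second and the bucket holds a burst capacity of 20. The same policy can be sketched in Python to make the behavior concrete (an illustration of the algorithm, not nginx's actual implementation):

```python
class TokenBucket:
    """Token bucket: `rate` tokens/second refill, `burst` maximum capacity."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=5, burst=20)
# 25 requests all arriving at t=0: the first 20 pass (burst), the rest are rejected
results = [bucket.allow(0.0) for _ in range(25)]
print(results.count(True))  # 20
```

One second later the bucket has refilled 5 tokens, so up to 5 more requests would pass; that is exactly the steady-state `rate=5r/s` nginx enforces.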
| Symptom | Solution |
|---|---|
| Port 11434 unreachable | Check firewall rules: `New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow` |
| Model fails to load | Verify the model path: `Get-ChildItem "C:\Users\<username>\.ollama\models"` |
| CUDA out of memory | Lower the `gpu_layers` option or use a quantized model |
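For the "model fails to load" case, the installed models can also be checked programmatically through Ollama's `/api/tags` endpoint, which returns a JSON list of local models. Keeping the parsing in a pure function lets it be tested offline (the response shape follows Ollama's API; the helper name is ours):

```python
def model_installed(tags_response: dict, name: str) -> bool:
    """Check whether `name` appears in an Ollama /api/tags response."""
    return any(m.get("name") == name for m in tags_response.get("models", []))

# Against a live server you would fetch the JSON first:
#   tags = requests.get("http://localhost:11434/api/tags").json()
sample_tags = {"models": [{"name": "deepseek-r1:7b"}, {"name": "llama3:8b"}]}
print(model_installed(sample_tags, "deepseek-r1:7b"))   # True
print(model_installed(sample_tags, "deepseek-r1:33b"))  # False
```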
Ollama service log:

```powershell
Get-EventLog -LogName Application -Source "OllamaService" -After (Get-Date).AddHours(-1) | Format-Table -AutoSize
```
FastAPI access log:

```python
# Add this middleware to app.py
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware  # middleware base class used by FastAPI

class LoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        print(f"Request: {request.method} {request.url}")
        response = await call_next(request)
        print(f"Response status: {response.status_code}")
        return response

app.add_middleware(LoggingMiddleware)
```
Dockerfile example:

```dockerfile
FROM nvidia/cuda:11.8.0-base-windowsservercore-ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN Invoke-WebRequest -Uri "https://ollama.com/download/windows/amd64/ollama-windows-amd64.zip" -OutFile "ollama.zip"; \
    Expand-Archive ollama.zip -DestinationPath C:\ollama; \
    Remove-Item ollama.zip
# `ollama serve` takes no model flag; pull the model once the server is running
CMD ["C:\\ollama\\ollama.exe", "serve"]
```
Kubernetes deployment:

```yaml
# ollama-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama-windows:latest
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 11434
```
API authentication:

```python
# Add JWT validation to app.py
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt

SECRET_KEY = "your-secret-key"
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

async def get_current_user(token: str = Depends(oauth2_scheme)):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        username: str = payload.get("sub")
        if username is None:
            raise credentials_exception
    except JWTError:
        raise credentials_exception
    return username
```
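The `jwt.decode` call above verifies an HS256 signature over the token's header and payload. The mechanics can be illustrated with the standard library alone; this is a simplified sketch of what python-jose does internally (no expiry or header validation, not a substitute for a real JWT library):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: str) -> str:
    """Produce a minimal HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> dict:
    """Re-compute the signature and return the payload if it matches."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    padded = body + "=" * (-len(body) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_hs256({"sub": "alice"}, "your-secret-key")
print(verify_hs256(token, "your-secret-key"))  # {'sub': 'alice'}
```

The `sub` claim extracted here is the same field `get_current_user` reads after `jwt.decode`.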
Network isolation:

```powershell
# Create a dedicated private virtual switch (requires Hyper-V)
New-VMSwitch -Name "OllamaNetwork" -SwitchType Private
Set-NetConnectionProfile -InterfaceAlias "Ethernet" -NetworkCategory Private
```
This guide has covered the full workflow from basic environment setup to advanced deployment options, with step-by-step instructions and executable code samples for running Ollama and the Deepseek-r1 model locally on Windows. For real deployments, validate everything in a test environment first, then migrate gradually to production.