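Summary: This article explains how to combine the DeepSeek R1 large language model with the Ollama local deployment tool to build a highly available RAG (Retrieval-Augmented Generation) system. It covers the full workflow: architecture design, code implementation, and performance optimization.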
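RAG (Retrieval-Augmented Generation) combines a retrieval system with a generative model. This addresses the weaknesses of conventional large models in knowledge freshness, domain expertise, and hallucination. This solution uses DeepSeek R1 as the generation core and Ollama as the local deployment framework, mainly for the following reasons: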
The system is divided into four layers:
Pull and start the model with `ollama run deepseek-r1:7b`.
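```bash
# Install dependencies (DirectoryLoader's default loader needs unstructured for PDF/DOCX)
pip install chromadb langchain langchain-community sentence-transformers fastapi uvicorn "unstructured[pdf,docx]"

# Start the Ollama service (Ollama must be installed first; this blocks, so use a separate terminal)
ollama serve

# Download the embedding model. Clone the whole repository: sentence-transformers
# needs the config and tokenizer files, not just pytorch_model.bin
git lfs install
git clone https://huggingface.co/BAAI/bge-small-en-v1.5 models/embedding
```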
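Before wiring up LangChain, it can help to confirm that `ollama serve` is reachable and that the model answers. Ollama exposes a REST API on port 11434, and a POST to `/api/generate` with `"stream": false` returns the whole completion in the `response` field. The sketch below uses only the standard library; the helper names `build_generate_request` and `smoke_test` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address of `ollama serve`


def build_generate_request(prompt, model="deepseek-r1:7b", base_url=OLLAMA_URL):
    """Build a non-streaming POST to Ollama's /api/generate endpoint (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(prompt="Reply with OK.", timeout=120):
    """Send one prompt and return the model's text; raises if the server is unreachable."""
    req = build_generate_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the full completion arrives in the "response" field
    return body["response"]
```

Calling `smoke_test()` after `ollama serve` is running should print a short reply. A connection error at this point usually means the service is not up yet, not that the RAG code is wrong.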
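```python
import chromadb


def init_vector_db():
    # Chroma >= 0.4 replaced the duckdb+parquet Settings with PersistentClient
    client = chromadb.PersistentClient(path="./chroma_persist")
    # get_or_create avoids an error when the collection already exists on restart
    collection = client.get_or_create_collection(
        name="knowledge_base",
        metadata={"hnsw:space": "cosine"},  # rank neighbours by cosine distance
    )
    return collection
```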
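With `"hnsw:space": "cosine"`, the distances Chroma returns from `collection.query()` are cosine distances, that is `1 - cosine similarity`. Smaller is closer, and the range is 0 to 2. The sketch below computes that distance by hand and converts a query result back to similarities for thresholding. The `filter_by_similarity` helper and its 0.5 threshold are hypothetical choices for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the distance used by "hnsw:space": "cosine"."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def filter_by_similarity(results, min_similarity=0.5):
    """Keep hits of the first query in a collection.query() result above a similarity threshold.

    `results` has the shape Chroma returns: one inner list per query text.
    """
    hits = zip(results["ids"][0], results["documents"][0], results["distances"][0])
    return [
        (doc_id, doc, 1.0 - dist)  # convert distance back to similarity
        for doc_id, doc, dist in hits
        if 1.0 - dist >= min_similarity
    ]
```

Filtering on similarity before generation keeps weakly related chunks out of the prompt, which tends to matter more than the exact `k`.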
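```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


def process_documents(doc_dir):
    # pathlib-style globs do not expand "{pdf,docx,txt}", so load each file type separately
    documents = []
    for pattern in ("**/*.pdf", "**/*.docx", "**/*.txt"):
        loader = DirectoryLoader(doc_dir, glob=pattern)
        documents.extend(loader.load())
    # ~500-character chunks with 50 characters of overlap between neighbours
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    split_docs = text_splitter.split_documents(documents)
    return split_docs
```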
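To make `chunk_size=500, chunk_overlap=50` concrete, here is a simplified sliding-window chunker. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on separators (`"\n\n"`, `"\n"`, `" "`, `""`). This sketch only shows how size and overlap interact. The function name is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Fixed-size character windows; each window repeats the last chunk_overlap chars of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks
```

With the defaults, a 1,200-character document yields windows starting at 0, 450 and 900 (lengths 500, 500 and 300). The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.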
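```python
import hashlib

import chromadb
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma


class RAGSystem:
    def __init__(self, persist_dir="./chroma_persist"):
        # Local bge-small-en-v1.5 model downloaded during environment setup
        self.embeddings = HuggingFaceEmbeddings(
            model_name="./models/embedding",
            model_kwargs={"device": "cuda"},
        )
        # The LangChain Ollama wrapper takes base_url, not url
        self.llm = Ollama(model="deepseek-r1:7b", base_url="http://localhost:11434")
        # Same persistent store and collection name as init_vector_db(), wrapped in
        # LangChain's Chroma class: a raw chromadb Collection has no as_retriever()
        self.vectorstore = Chroma(
            client=chromadb.PersistentClient(path=persist_dir),
            collection_name="knowledge_base",
            embedding_function=self.embeddings,
            collection_metadata={"hnsw:space": "cosine"},
        )

    def update_knowledge(self, documents):
        # Key each chunk by a hash of source + content, so re-ingesting a chunk updates it;
        # IDs "0", "1", ... would overwrite unrelated chunks from earlier batches
        records = {}
        for doc in documents:
            source = doc.metadata["source"]
            key = hashlib.sha256(f"{source}\n{doc.page_content}".encode("utf-8")).hexdigest()
            records[key] = (doc.page_content, {"source": source})
        if not records:
            return
        texts = [text for text, _ in records.values()]
        metadatas = [meta for _, meta in records.values()]
        # add_texts embeds the texts with self.embeddings and upserts them by ID
        self.vectorstore.add_texts(texts=texts, metadatas=metadatas, ids=list(records))

    def query(self, question, k=3):
        retriever = self.vectorstore.as_retriever(
            search_type="similarity", search_kwargs={"k": k}
        )
        qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=retriever,
            return_source_documents=True,
        )
        # invoke() returns a dict containing "result" and "source_documents"
        return qa_chain.invoke({"query": question})
```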
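`chain_type="stuff"` means every retrieved chunk is "stuffed" into a single prompt together with the question. The sketch below reproduces that idea without LangChain. `Doc` is a stand-in for LangChain's `Document`, and `PROMPT_TEMPLATE` is a hypothetical template (LangChain's built-in prompt is worded differently). The `max_chars` budget is an addition of this sketch, so the prompt stays within the 7B model's context window.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Stand-in for LangChain's Document: text plus metadata
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical template, not LangChain's default "stuff" prompt
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question, docs, max_chars=6000):
    """Concatenate retrieved chunks (in rank order) into one prompt under a rough character budget."""
    parts, used = [], 0
    for i, doc in enumerate(docs, start=1):
        # Number each chunk and show its source so the answer can cite it
        block = f"[{i}] ({doc.metadata.get('source', 'unknown')})\n{doc.page_content}"
        if used + len(block) > max_chars:
            break  # drop lower-ranked chunks once the budget is exhausted
        parts.append(block)
        used += len(block)
    return PROMPT_TEMPLATE.format(context="\n\n".join(parts), question=question)
```

Because "stuff" sends everything in one call, it is the simplest and lowest-latency chain type. However, it is bounded by the context window, which is why `k` stays small (3 by default in `query`).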
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
rag_system = RAGSystem()


class QueryRequest(BaseModel):
    question: str
    context_files: list[str] = []


@app.post("/query")
def query_endpoint(request: QueryRequest):
    # Declared with plain def: FastAPI runs it in a worker thread, so the
    # blocking retrieval + LLM call does not stall the event loop
    if request.context_files:
        # A real application should implement file upload logic here
        pass
    result = rag_system.query(request.question)
    return {
        "answer": result["result"],
        "sources": [doc.metadata["source"] for doc in result["source_documents"]],
    }
```
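A minimal client for the `/query` endpoint, assuming the app is served by uvicorn on port 8000 (the port the Compose file exposes). The helper names are hypothetical. `parse_query_response` also removes duplicate sources, because several retrieved chunks often come from the same file.

```python
import json
import urllib.request


def build_query_request(question, base_url="http://localhost:8000"):
    """POST body matching the QueryRequest model (hypothetical helper)."""
    payload = {"question": question, "context_files": []}
    return urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_query_response(raw):
    """Return (answer, sources) with repeated sources removed, keeping first-seen order."""
    body = json.loads(raw)
    sources = list(dict.fromkeys(body.get("sources", [])))
    return body["answer"], sources


def ask(question, base_url="http://localhost:8000", timeout=300):
    # Generous timeout: a 7B model on modest hardware can take a while to answer
    with urllib.request.urlopen(build_query_request(question, base_url), timeout=timeout) as resp:
        return parse_query_response(resp.read().decode("utf-8"))
```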
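```python
from langchain.retrievers import EnsembleRetriever


def create_hybrid_retriever(vectorstore):
    # vectorstore is a LangChain Chroma vector store: a raw chromadb Collection has
    # no as_retriever(), and the store already carries its embedding function
    vector_retriever = vectorstore.as_retriever(search_type="similarity")
    # In a real application, a BM25 retriever should be integrated here next to the dense one
    return EnsembleRetriever(retrievers=[vector_retriever], weights=[1.0])
```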
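A hybrid retriever needs a way to merge the sparse (BM25) and dense rankings. As we understand its documentation, LangChain's `EnsembleRetriever` uses weighted Reciprocal Rank Fusion with a constant `c = 60`: each document scores `sum(weight_i / (c + rank_i))` over the lists it appears in. The pure-Python sketch below shows that scoring on document IDs. It is an illustration, not the library's code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked ID lists: score(d) = sum_i weights[i] / (c + rank_i(d)), ranks starting at 1."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)
```

RRF uses only ranks, so it does not need to reconcile BM25 scores with cosine distances, which live on different scales. A document ranked moderately well by both retrievers can outrank one that only a single retriever likes. In LangChain, the sparse side would typically be `BM25Retriever.from_documents(split_docs)` from `langchain_community.retrievers` (requires the `rank_bm25` package), passed as `EnsembleRetriever(retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5])`. The 0.5/0.5 split is only a starting point to tune.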
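## 2. Generation-layer optimization

- **Temperature tuning**: adjust `temperature` (0.1-0.7) and `top_p` (0.8-0.95) to the use case
- **Streaming output**: implement SSE (Server-Sent Events) support

```python
from fastapi.responses import StreamingResponse


async def stream_response(llm, question):
    async def generate():
        # astream() is the async counterpart of stream(); "async for" over the
        # synchronous generator returned by stream() would raise a TypeError
        async for chunk in llm.astream(question):
            # An SSE event ends with a blank line; a chunk containing "\n" needs
            # one "data:" line per text line, or the client will mis-split it
            yield "".join(f"data: {line}\n" for line in chunk.split("\n")) + "\n"

    # StreamingResponse consumes an async generator; a plain Response cannot
    return StreamingResponse(generate(), media_type="text/event-stream")
```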
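To see what the `temperature` and `top_p` ranges above actually control, here is a small illustrative sampler over a toy logit vector. Temperature divides the logits before the softmax: lower values sharpen the distribution, and the value must be positive here. `top_p` (nucleus sampling) keeps the smallest set of most-likely tokens whose cumulative probability reaches `top_p`, then renormalizes. This is a textbook sketch. Ollama's sampler may apply its steps in a different order or combine them with other filters such as `top_k`.

```python
import math


def sampling_distribution(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch (0 would mean greedy decoding)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}
```

For factual RAG answers, the lower end of both ranges keeps the model close to the retrieved text. With the LangChain wrapper, both can be set on the constructor, e.g. `Ollama(model="deepseek-r1:7b", temperature=0.3, top_p=0.9)`.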
Containerized deployment: manage service dependencies with Docker Compose
```yaml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ./models:/root/.ollama/models
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  api:
    build: .
    ports:
      - "8000:8000"
    # Inside the Compose network the API reaches Ollama at http://ollama:11434, not localhost
    depends_on:
      - ollama
```
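`depends_on` in this short form only controls start order. It does not wait until Ollama is actually accepting requests, so the API container can come up first and fail its first calls. One option is a small readiness wait before starting uvicorn. The sketch below probes Ollama's `/api/tags` endpoint (which lists local models) at the Compose service name `ollama`. Both helper names are hypothetical, and `sleep`/`clock` are injectable so the loop can be tested without a network.

```python
import time
import urllib.request


def ollama_is_up(base_url="http://ollama:11434", timeout=2.0):
    """True if Ollama answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, connection refused and socket timeouts are all OSError subclasses
        return False


def wait_until_ready(probe, timeout_s=120.0, interval_s=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout_s elapses; returns whether it became ready."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the API container's entrypoint, `wait_until_ready(ollama_is_up)` would run before launching uvicorn, exiting non-zero on `False` so Compose's restart policy can take over.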
Data update mechanism:
Monitoring:
Security hardening:
Enterprise knowledge management:
Customer service automation:
Legal document analysis:
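This solution integrates DeepSeek R1 closely with Ollama to build an extensible RAG system framework. In real deployments, adjust the model size (7B/14B/32B parameters), the retrieval strategy (sparse/dense retrieval), and the deployment architecture (single-node/distributed) to the business scenario. We recommend validating with the 7B model first, then optimizing step by step until performance meets business requirements.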