Overview: This article takes a close look at how to use the Spring AI framework together with the Ollama toolchain to build and call an API service for the deepseek-r1 large language model. Through step-by-step explanations and code examples, it helps developers get from model deployment to API invocation quickly.
Spring AI, an AI-focused subproject in the Spring ecosystem, provides core capabilities such as model serving, streaming, and multi-model adaptation. Its deep integration with Spring Boot makes it straightforward to build RESTful API services. Ollama is an open-source framework for running models locally; it supports deploying LLMs in Docker containers and provides high-performance inference.
The solution uses a layered design: a REST API layer built with Spring Boot, a service layer that talks to the model through Spring AI's Ollama client, and an inference layer where Ollama serves deepseek-r1 locally. This design balances development efficiency with runtime performance and is a particularly good fit for private AI services that must be deployed on-premises.
The following environment is needed: a JDK 17+ toolchain for the Spring Boot service, Docker (used later for containerized deployment), and the Ollama runtime. A Linux server (Ubuntu 22.04+) is recommended for best performance; on Windows/macOS, configure WSL2 or Docker Desktop.
Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Download the deepseek-r1 model (the 7B-parameter variant is used as the example):
ollama pull deepseek-r1:7b
Verify that the model loads:
ollama run deepseek-r1:7b "测试指令"
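If you prefer to verify over HTTP, Ollama also exposes a REST endpoint at /api/generate on port 11434 by default. A minimal sketch using Java's built-in HttpClient (the class name and prompt text are illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaSmokeTest {
    public static void main(String[] args) throws Exception {
        // stream=false returns one JSON object; the generated text is in its "response" field
        String body = """
                {"model": "deepseek-r1:7b", "prompt": "Hello", "stream": false}
                """;
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}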
Recommended key runtime parameters:
- --num-gpu 1
- --temperature 0.7
- -v /path/to/models:/models to mount the model volume when running Ollama in Docker

Create the project with Spring Initializr and add the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        <version>0.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>
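Note that the 0.8.x Spring AI releases were published to the Spring milestone repository rather than Maven Central, so the build will likely also need a repositories entry such as:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>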
Example application.yml configuration:
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: deepseek-r1:7b
          temperature: 0.7
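Prompt templating is handled in code rather than in configuration: Spring AI's PromptTemplate renders named placeholders into a Prompt. A minimal sketch that keeps a 用户/AI framing around the user's input (the TemplatedChat class and method name are illustrative):

import java.util.Map;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.ollama.OllamaChatClient;

public class TemplatedChat {

    // PromptTemplate uses {name} placeholders filled in from a Map
    private static final PromptTemplate TEMPLATE = new PromptTemplate("用户:{prompt}\nAI:");

    public static String ask(OllamaChatClient chatClient, String userText) {
        Prompt prompt = TEMPLATE.create(Map.of("prompt", userText));
        ChatResponse response = chatClient.call(prompt);
        return response.getResult().getOutput().getContent();
    }
}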
Create the ChatService interface:
import java.util.stream.Stream;

public interface ChatService {
    String chat(String prompt);
    Stream<String> streamChat(String prompt);
}
The implementation class uses Spring AI's OllamaChatClient:
import java.util.stream.Stream;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.stereotype.Service;

@Service
public class OllamaChatService implements ChatService {

    private final OllamaChatClient chatClient;

    public OllamaChatService(OllamaChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @Override
    public String chat(String prompt) {
        ChatResponse response = chatClient.call(new Prompt(new UserMessage(prompt)));
        return response.getResult().getOutput().getContent();
    }

    @Override
    public Stream<String> streamChat(String prompt) {
        // Map each streamed ChatResponse chunk to its text content
        return chatClient.stream(new Prompt(new UserMessage(prompt)))
                .map(r -> r.getResult().getOutput().getContent())
                .toStream();
    }
}
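As a quick end-to-end check, a CommandLineRunner can exercise the service once at startup (this assumes Ollama is running locally; the ChatSmokeTest class is illustrative):

import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatSmokeTest {

    // Runs once at startup and prints a single model reply to confirm the wiring
    @Bean
    CommandLineRunner chatSmokeTest(ChatService chatService) {
        return args -> System.out.println(chatService.chat("Introduce yourself in one sentence."));
    }
}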
Example REST API endpoints:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@RestController
@RequestMapping("/api/chat")
public class ChatController {

    @Autowired
    private ChatService chatService;

    @PostMapping
    public ResponseEntity<String> chat(@RequestBody ChatRequestDto request) {
        String response = chatService.chat(request.getPrompt());
        return ResponseEntity.ok(response);
    }

    @GetMapping("/stream")
    public ResponseEntity<StreamingResponseBody> streamChat(@RequestParam String prompt) {
        // SSE streaming response; implemented in the next snippet
    }
}
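The controller depends on a ChatRequestDto that still needs to be defined; a minimal version could look like this:

// Minimal request payload: {"prompt": "..."}
public class ChatRequestDto {

    private String prompt;

    public String getPrompt() { return prompt; }
    public void setPrompt(String prompt) { this.prompt = prompt; }
}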
Streaming with Spring's StreamingResponseBody:
public ResponseEntity<StreamingResponseBody> streamChat(@RequestParam String prompt) {
    StreamingResponseBody stream = outputStream -> {
        // Pull streamed chunks from Ollama and write each one out as an SSE event
        chatService.streamChat(prompt).forEach(chunk -> {
            try {
                outputStream.write(("data: " + chunk + "\n\n").getBytes(StandardCharsets.UTF_8));
                outputStream.flush(); // flush per chunk so the client sees tokens as they arrive
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    };
    return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_TYPE, "text/event-stream")
            .body(stream);
}
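As an alternative to raw StreamingResponseBody, Spring MVC's SseEmitter takes care of the event framing. A sketch of an extra endpoint inside the same ChatController (the endpoint path and executor handling are illustrative):

// Extra members for ChatController (imports: java.io.IOException, java.io.UncheckedIOException,
// java.util.concurrent.ExecutorService, java.util.concurrent.Executors,
// org.springframework.http.MediaType, org.springframework.web.servlet.mvc.method.annotation.SseEmitter)
private final ExecutorService sseExecutor = Executors.newCachedThreadPool();

@GetMapping(value = "/stream-sse", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter streamChatSse(@RequestParam String prompt) {
    SseEmitter emitter = new SseEmitter(0L); // 0 = never time out
    sseExecutor.submit(() -> {
        try {
            chatService.streamChat(prompt).forEach(chunk -> {
                try {
                    emitter.send(SseEmitter.event().data(chunk));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            emitter.complete();
        } catch (Exception e) {
            emitter.completeWithError(e); // propagate failures to the client stream
        }
    });
    return emitter;
}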
Maintaining context across multi-turn conversations:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.ai.chat.messages.Message;

public class ConversationManager {

    private final Map<String, List<Message>> conversations = new ConcurrentHashMap<>();

    public List<Message> getConversation(String sessionId) {
        return conversations.computeIfAbsent(sessionId, k -> new ArrayList<>());
    }

    public void addMessage(String sessionId, Message message) {
        getConversation(sessionId).add(message);
    }
}
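Tying the manager into the chat flow: each turn appends the user message, sends the full history as the prompt, and stores the model's reply so the next turn sees it. A sketch assuming Spring AI 0.8.0's UserMessage/AssistantMessage types (the ConversationalChat class is illustrative):

import java.util.List;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatClient;

public class ConversationalChat {

    private final OllamaChatClient chatClient;
    private final ConversationManager conversations;

    public ConversationalChat(OllamaChatClient chatClient, ConversationManager conversations) {
        this.chatClient = chatClient;
        this.conversations = conversations;
    }

    public String chat(String sessionId, String userText) {
        // Append the new user turn, then send the whole history as context
        conversations.addMessage(sessionId, new UserMessage(userText));
        List<Message> history = conversations.getConversation(sessionId);
        ChatResponse response = chatClient.call(new Prompt(history));
        String reply = response.getResult().getOutput().getContent();
        // Store the model's reply for the next turn
        conversations.addMessage(sessionId, new AssistantMessage(reply));
        return reply;
    }
}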
Key optimization points:
Connection pool configuration:
spring:
  ai:
    ollama:
      connection-pool:
        max-size: 10
        idle-timeout: 30000
Batch-processing optimization:
@Bean
public OllamaChatClient ollamaChatClient(OllamaProperties properties) {
    return new OllamaChatClientBuilder(properties)
            .batchSize(512) // maximum token batch size
            .build();
}
Example Dockerfile:
FROM eclipse-temurin:17-jdk-jammy
WORKDIR /app
COPY target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","app.jar"]
docker-compose.yml configuration:
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ./models:/models
    ports:
      - "11434:11434"
  api:
    build: .
    ports:
      - "8080:8080"
    environment:
      # Inside Compose, Ollama is reachable by its service name, not localhost
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
Recommended monitoring metrics:
Example Prometheus configuration:
scrape_configs:
  - job_name: 'spring-ai'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['api:8080']
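For the scrape to work, the service needs Spring Boot Actuator and the Micrometer Prometheus registry on the classpath. Chat latency can then be recorded explicitly with a Micrometer Timer, as in this sketch (the TimedChatService class and metric name are illustrative):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class TimedChatService {

    private final ChatService delegate;
    private final Timer chatLatency;

    public TimedChatService(ChatService delegate, MeterRegistry registry) {
        this.delegate = delegate;
        // Exposed to Prometheus as chat_latency_seconds
        this.chatLatency = registry.timer("chat.latency");
    }

    public String chat(String prompt) {
        // record() times the full round trip to the model
        return chatLatency.record(() -> delegate.chat(prompt));
    }
}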
Model selection strategy:
Security practices:
Scaling options:
Connection timeouts:
Insufficient GPU memory:
Choppy streaming responses:
With the approach above, developers can quickly build a deepseek-r1 model service on top of Spring AI and Ollama, covering the complete path from local deployment to a production API. The solution is especially well suited to industries with data-sovereignty requirements, such as finance and healthcare.