
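A Comprehensive Guide to LLM-Based Intelligent Agents: History, Architecture, and a LangChain Demo

In the era of large language models (LLMs), LLM-based intelligent agents have made significant progress over the past year.
This article introduces LLM-based intelligent agents. The contents are as follows:
  1. The origins of agent technology.
  2. The development of AI agent technology.
  3. The architecture of LLM-based agents.
  4. Applications of LLM-based agents.
  5. Implementing an LLM-based agent with simple code.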

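The Origins of Agent Technology

What Is an Agent (代理 / 智能体)

The term "agent" is rendered in Chinese as either 代理 (proxy) or 智能体 (intelligent entity).
The definition and nature of an agent vary by discipline and cultural background. In general, an agent is an autonomous individual that can exercise its own will, make decisions, and take actions, rather than merely responding passively to external stimuli. Humans are the most complex agents on this planet.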

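Artificial Intelligence Agents

Since the mid-1980s, research on agents in the field of artificial intelligence has grown significantly. Against this background, Wooldridge defined artificial intelligence as a field that aims to design and build computer agents that exhibit intelligent behavior.
In essence, an AI agent is a concrete instantiation of the agent concept.
As shown in Figure 1, an AI agent is an artificial entity that perceives its environment through sensors, makes decisions, and responds accordingly.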

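Development Stages of AI Agents

The technical evolution of AI agent research falls into the following main stages.

Symbolic Agents

In the early days of AI research, the dominant approach was symbolic AI, which uses logical rules and symbolic representations to encode knowledge and support reasoning. The architecture of a symbolic agent is shown in Figure 2:
Earlier knowledge-based expert systems are the most common symbolic agents. Such systems mainly consist of a knowledge base, an inference engine, and an interpreter.
However, just as decision engines have gradually been displaced by AI models, hand-built decision logic is usually too rigid to be of much practical value.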
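To make the knowledge base and inference engine concrete, here is a minimal sketch of forward chaining, one common inference strategy in expert systems. The rules and fact names (has_fever, may_have_flu, and so on) are made up for illustration and do not come from any real system.

```python
# Toy symbolic agent: a knowledge base of facts plus if-then rules,
# and a forward-chaining inference engine that derives new facts.

# Hypothetical rules, each written as (set of premises, conclusion).
RULES = [
    ({"has_fever", "has_cough"}, "may_have_flu"),
    ({"may_have_flu"}, "recommend_rest"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"has_fever", "has_cough"}, RULES)
print(sorted(derived))
```

Every conclusion the agent can reach has to be written into RULES by hand, which is the rigidity described above.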

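Reactive Agents

Unlike symbolic agents, reactive agents do not use complex symbolic reasoning. They focus on the interaction between the agent and its environment and prioritize fast, real-time responses. A reactive agent typically uses a predefined rule set to guide its behavior, as shown in Figure 3:
Compared with symbolic agents, reactive agents use simpler strategies. As an analogy, a symbolic agent resembles a compiler, with a large number of logical inference rules in its decision engine, whereas a reactive agent is essentially a pile of if/else statements that read environment data and make quick judgments.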
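The "pile of if/else" description can be sketched in a few lines. The thermostat scenario and the thresholds below are assumptions chosen only to illustrate the idea.

```python
def reactive_thermostat(temperature_c):
    """Map a sensor reading directly to an action using fixed rules.

    There is no internal state and no reasoning: the same reading
    always produces the same action.
    """
    if temperature_c < 18:
        return "heat"
    elif temperature_c > 26:
        return "cool"
    else:
        return "idle"


for reading in (15, 22, 30):
    print(reading, reactive_thermostat(reading))
```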

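Reinforcement Learning-Based Agents

Before LLMs appeared, reinforcement learning-based agents were a research hotspot, and the best-known example is probably AlphaGo. The main concern in this area is how an agent can learn through interaction with its environment so as to maximize cumulative reward on a specific task.
After deep learning emerged, deep neural networks were combined with reinforcement learning, which allows agents to learn complex policies from high-dimensional inputs, as shown in Figure 4.
However, reinforcement learning has its problems: long training times, low sample efficiency, and model instability in complex real-world environments.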
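As a rough illustration of "learning from interaction to maximize cumulative reward", here is a small tabular Q-learning sketch. The five-state corridor environment, the hyperparameters, and the purely random exploration policy are all assumptions made for this example; it is far simpler than the deep reinforcement learning used in systems like AlphaGo.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Deterministic corridor: reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random exploration
            nxt, reward, done = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should choose "move right" in every state. Even this tiny problem needs a few hundred episodes of trial and error, which hints at the sample-efficiency issue mentioned above.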

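LLM-Based Agents

In recent years, large language models (LLMs) have attracted enormous attention and shown great potential. As a result, a new research area has emerged that uses an LLM as the agent's core controller, with the goal of giving agents human-level decision-making ability.
This is the focus of the article and is described in detail below.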

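Architecture of LLM-Based Agents

LLM-based agent architectures come in many forms. However, the core modules in all of them include memory, planning, and action.

The Four-Module Framework

Wang et al. proposed a unified framework, shown in Figure 5. It consists of a profile module, a memory module, a planning module, and an action module.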

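Profile Module

When performing a task, an agent is usually given a predefined identity, such as a teacher or a domain expert. The profile module defines a detailed profile of the role the agent plays. These profiles are written into the prompt to influence the LLM's behavior.
An agent profile generally contains basic information (such as age, gender, and occupation), psychological information related to personality, and information describing the social relationships between agents. Which information to include depends mainly on the application scenario.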
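One simple way to turn such a profile into prompt text is sketched below. The AgentProfile class, its fields, and the wording of the generated prompt are hypothetical; real systems choose their own schema and phrasing.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical profile record: basic info, personality, relationships."""
    name: str
    age: int
    occupation: str
    personality: str
    relationships: dict = field(default_factory=dict)


def profile_to_prompt(profile):
    """Render the profile as text to be placed at the start of the prompt."""
    lines = [
        f"You are {profile.name}, a {profile.age}-year-old {profile.occupation}.",
        f"Personality: {profile.personality}.",
    ]
    for other, relation in profile.relationships.items():
        lines.append(f"{other} is your {relation}.")
    return "\n".join(lines)


teacher = AgentProfile("Ms. Lin", 35, "teacher", "patient and encouraging", {"Tom": "student"})
print(profile_to_prompt(teacher))
```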

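Memory Module

The memory mechanism of an LLM-based agent is modeled on human memory. Human memory can be divided into short-term memory (holding information briefly) and long-term memory (consolidating information over a longer period).
In an LLM, short-term memory refers to the input within the context window, whose size is limited by the Transformer architecture. Long-term memory resembles an external vector store that the agent can query and retrieve from quickly as needed.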
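The two kinds of memory can be sketched with the standard library alone. In this toy version, a fixed-size deque plays the role of the context window, and a bag-of-words count vector stands in for a real embedding model and vector database. Both substitutions are assumptions made so the example runs offline.

```python
import math
from collections import Counter, deque


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AgentMemory:
    def __init__(self, window=3):
        self.short_term = deque(maxlen=window)  # recent turns that fit in the prompt
        self.long_term = []                     # (text, vector) pairs, never evicted

    def add(self, text):
        self.short_term.append(text)
        self.long_term.append((text, embed(text)))

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(qv, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = AgentMemory(window=2)
for fact in ("the capital of canada is ottawa", "alice likes green tea", "bob plays chess on sunday"):
    memory.add(fact)
print(list(memory.short_term))
print(memory.recall("what is the capital of canada"))
```

The oldest fact has already dropped out of the short-term window, but it can still be retrieved from long-term memory by similarity.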

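Planning Module

When facing a complex task, humans tend to break it down into simpler subtasks and solve each one separately. The planning module aims to give the agent this human ability and thereby make its behavior more capable. Chain of thought is a common planning strategy.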
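A minimal sketch of task decomposition follows: ask the model for numbered subtasks, then parse its reply into a list. The prompt wording is an assumption, and the model reply is hard-coded (fake_reply) so the example runs without an API call. This is a simplified relative of chain-of-thought prompting rather than a faithful reproduction of it.

```python
import re


def build_planning_prompt(task):
    """Ask the model to decompose a task into numbered subtasks."""
    return f"Task: {task}\nBreak the task into numbered subtasks, one per line."


def parse_plan(llm_output):
    """Extract lines of the form '1. ...' or '1) ...' into a list of subtasks."""
    steps = []
    for line in llm_output.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps


# Hypothetical model reply, hard-coded so the example runs offline.
fake_reply = (
    "1. Search for the population of Canada\n"
    "2. Read the most recent figure\n"
    "3. Report the figure"
)
print(build_planning_prompt("Find out how many people live in Canada"))
print(parse_plan(fake_reply))
```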

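Action Module

The action module turns the agent's decisions into concrete outputs. It interacts directly with the environment and is influenced by the profile, memory, and planning modules. It can be divided into four parts:
  • Action goal: the specific objective the agent wants to achieve by performing an action.
  • Action production: how an action arises from the agent's decision process, including decision logic and strategy selection.
  • Action space: the set of all possible actions the agent can take in a given environment.
  • Action impact: the consequences of an action for the environment, the agent's own state, or the progress of the overall task.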
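The idea of an action space can be sketched as a dictionary that maps action names to functions, with a dispatcher that rejects anything outside the set. The two actions and their stub implementations are placeholders invented for this example.

```python
def search(query):
    """Placeholder tool: a real agent would call a search API here."""
    return f"(pretend search results for {query!r})"


def finish(answer):
    """Terminal action: return the final answer unchanged."""
    return answer


# The action space: every action the agent is allowed to take.
ACTION_SPACE = {"Search": search, "Finish": finish}


def act(action_name, action_input):
    """Execute an action if it is in the action space, otherwise report an error."""
    if action_name not in ACTION_SPACE:
        return f"Unknown action {action_name!r}; choose from {sorted(ACTION_SPACE)}"
    return ACTION_SPACE[action_name](action_input)


print(act("Search", "Capital of Canada"))
print(act("Fly", "to Ottawa"))
```

Returning an error message instead of raising lets the message be fed back to the model as an observation, so it can pick a valid action on the next step.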

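The Three-Module Framework

There are other frameworks as well. Xi et al. proposed a general conceptual framework for LLM-based agents consisting of three key parts: brain, perception, and action, as shown in Figure 6.
The brain module serves as the controller and handles basic tasks such as memory, thinking, and decision-making. The perception module interprets and processes multimodal information from the external environment, while the action module carries out responses and uses tools to interact with the environment.
An example illustrates the workflow. Suppose someone asks whether it will rain today. The perception module converts the query into a format the LLM can understand. The brain module then reasons based on the current weather and online weather reports. Finally, the action module responds and hands the person an umbrella. Through this process, the agent can continuously receive feedback and interact with the environment.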
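The rain example can be written as a three-stage pipeline. In this sketch the "brain" is a hard-coded rule rather than an LLM, and the weather report is passed in as a plain dictionary with a made-up rain_probability field, so everything here is a stand-in for the real components.

```python
def perceive(raw_input):
    """Perception: normalize the raw query into a structured percept."""
    return {"question": raw_input.strip().lower()}


def brain(percept, weather_report):
    """Brain: decide based on the percept and external information.

    A real agent would call an LLM here; a simple threshold stands in for it.
    """
    if "rain" in percept["question"]:
        return "yes" if weather_report["rain_probability"] >= 0.5 else "no"
    return "unknown"


def action(decision):
    """Action: turn the decision into a response to the environment."""
    if decision == "yes":
        return "It will probably rain today. Here is an umbrella."
    if decision == "no":
        return "Rain is unlikely today."
    return "Sorry, I cannot answer that."


def run_agent(raw_input, weather_report):
    return action(brain(perceive(raw_input), weather_report))


print(run_agent("Will it rain today?", {"rain_probability": 0.8}))
```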

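Applications of LLM-Based Agents

By domain, applications of LLM-based agents fall into three categories: social science, natural science, and engineering, as shown in Figure 7.
By application scenario, they can also be divided into single-agent, multi-agent, and human-agent interaction settings, as shown in Figure 8.
A single agent has diverse capabilities. When multiple agents interact, they can improve performance through cooperative or adversarial interaction. In human-agent interaction, human feedback helps the agent perform tasks more effectively.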

Hands-On: Implementing an LLM-Based Agent

Next, we will implement an agent demo with LangChain and Python. The overall architecture of the demo is shown below:

Environment Setup

I used Anaconda to set up the environment with the following commands:
(base) Florian: conda create -n agent python=3.11
(base) Florian: conda activate agent
(agent) Florian: pip install langchain
(agent) Florian: pip install langchain_openai
(agent) Florian: pip install duckduckgo-search
The final library versions are:
langchain 0.1.15
langchain-community 0.0.32
langchain-core 0.1.41
langchain-openai 0.1.2
langchain-text-splitters 0.0.1
duckduckgo_search 5.3.0

First, import the libraries:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import OpenAI
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

Giving the Agent Search Capability

Here I use DuckDuckGo as a tool the agent can call, which gives the agent search capability.
tools defines a list of tools containing a single tool named "Search", which calls the search.run function.
search = DuckDuckGoSearchAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

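Planning Module and Action Module

As mentioned earlier, the planning and action modules are the core of an agent. Here I use the ReAct algorithm to build them. The algorithm is illustrated in the figure below: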

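How ReAct Works

Take cooking a dish as an example. Between chopping the vegetables and turning on the gas there is a piece of reasoning, or inner monologue: "I have finished chopping the vegetables and need to cook them next, so I should turn on the gas." If something unexpected happens during cooking, such as discovering there is no salt, there is another piece of reasoning: "There is no salt, so today I will season with pepper instead." Then we go and get the pepper.
Building on this insight, the authors propose a method:
have the LLM state its inner monologue and then act on it, which improves the accuracy of the LLM's answers.
Consider the following question:
Besides the Apple Remote, what other devices can control the relevant software? Here "the relevant software" means the software the Apple Remote was originally able to control.
Explanation:
  1. The Apple Remote could originally control only the Front Row software.
  2. Front Row can be controlled by two devices: the Apple Remote and the keyboard function keys.
  3. So the correct answer is the keyboard function keys.
ReAct, that is, reasoning plus acting, produces the following trace:
Thought 1: I need to search for the Apple Remote and find the software it was originally able to control.
Action 1: Search[Apple Remote]
Observation 1: The Apple Remote is a remote control ... originally able to control "Front Row" ...
Thought 2: The Apple Remote was originally able to control Front Row. Next I need to search for Front Row and find out what other devices can control it.
Action 2: Search[Front Row]
Observation 2: No results. Try "Front Row Seat to Earth" or "Front Row software".
Thought 3: Front Row was not found. I can search for "Front Row software".
Action 3: Search[Front Row software]
Observation 3: Front Row is a discontinued piece of software ... It can be controlled by the Apple Remote and the keyboard function keys.
Thought 4: Now I know the answer.
Action 4: Finish[keyboard function keys]
The answer is correct. With explicit reasoning plus actions, the LLM agent found the answer on its own. The whole process feels like a not especially bright child who has to write down each thought, then combine all the thoughts and observations, and only then take the corresponding action. But the method clearly works: it found the answer in the end.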
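The control flow behind such a trace is a short loop: call the model, parse the action it names, run the tool, append the observation, and repeat. The sketch below replays the Apple Remote example with a scripted stand-in for the LLM and a dictionary stand-in for the search engine, so it runs offline. The Action: Name[input] syntax, the helper names, and the canned texts are assumptions made for this example; this is not LangChain's implementation.

```python
import re

# Canned search results that paraphrase the trace above.
FAKE_INDEX = {
    "Apple Remote": "The Apple Remote ... originally controlled Front Row ...",
    "Front Row software": "Front Row is discontinued software ... controlled by "
                          "the Apple Remote and the keyboard function keys.",
}

# Scripted model replies, one per step, standing in for a real LLM.
SCRIPT = [
    "Thought: I need to find the software the Apple Remote first controlled.\nAction: Search[Apple Remote]",
    "Thought: It was Front Row. I should look it up.\nAction: Search[Front Row]",
    "Thought: Nothing found. I will try a more specific query.\nAction: Search[Front Row software]",
    "Thought: I now know the answer.\nAction: Finish[keyboard function keys]",
]


def search(query):
    return FAKE_INDEX.get(query, "No results.")


def make_scripted_llm(replies):
    """Return a fake LLM that ignores the prompt and replays the script."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def react(question, llm, max_steps=5):
    """Alternate model output (thought + action) with tool observations."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)            # the model sees everything so far
        scratchpad += reply + "\n"
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if match is None:
            break                          # unparseable reply: give up
        name, arg = match.groups()
        if name == "Finish":
            return arg, scratchpad
        scratchpad += f"Observation: {search(arg)}\n"
    return None, scratchpad


answer, trace = react("What else can control Front Row?", make_scripted_llm(SCRIPT))
print(trace)
print("Answer:", answer)
```

The growing scratchpad string is what lets each new thought build on earlier observations.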

Code Implementation

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
{chat_history}
Question: {input}
{agent_scratchpad}"""
# Use ZeroShotAgent.create_prompt to build the prompt that will be sent to the LLM.
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
This defines the prefix and suffix of the conversation, along with placeholders for the chat history, the user input, and the agent's thought process, and then builds a prompt with ZeroShotAgent.create_prompt.
Readers may wonder where the ReAct algorithm actually comes in. The answer is: in the prompt!
Here is the content of the generated prompt:
Have a conversation with a human, answering the following questions as best you can.
You have access to the following tools:
Search: useful for when you need to answer questions about current events
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!"
{chat_history}
Question: {input}
{agent_scratchpad}
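The prompt uses the single defined tool, [Search], and ends with three variables:
  1. chat_history: the content stored in memory, such as earlier conversation turns, the agent's internal state, or context from previous tasks.
  2. input: the question entered by the user. This is the main input the agent must process and respond to.
  3. agent_scratchpad: the agent's thought process so far, including thoughts, actions, action inputs, and observations. It is updated continuously while the agent runs and records its reasoning and the basis for its decisions.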
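To see how the three variables end up in the text sent to the model, the sketch below fills a shortened copy of the suffix using plain str.format. Treating the substitution as equivalent to str.format is my assumption about the default template behavior, and the example values are invented.

```python
# Shortened copy of the prompt tail, with the same three placeholders.
TEMPLATE = """Begin!
{chat_history}
Question: {input}
{agent_scratchpad}"""

filled = TEMPLATE.format(
    # What the memory module would supply on the second turn (invented values).
    chat_history="Human: How many people live in canada?\nAI: About 40 million.",
    # The new user question.
    input="what is their national anthem called?",
    # Reasoning accumulated so far during this turn.
    agent_scratchpad="Thought: I should use the Search tool.",
)
print(filled)
```

Because the earlier exchange about Canada is part of the text, the model has what it needs to resolve "their" in the follow-up question.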

Memory Module

LangChain already provides a ready-made class for the memory module:
memory = ConversationBufferMemory(memory_key="chat_history")
This creates a ConversationBufferMemory instance that stores the conversation history.
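Conceptually, a buffer memory just appends each exchange to a running transcript and returns it under the configured key. The toy class below illustrates that idea; it is not LangChain's implementation, and although the method names echo LangChain's, the signatures here are deliberately simplified.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer (not LangChain's implementation)."""

    def __init__(self, memory_key="chat_history"):
        self.memory_key = memory_key
        self.turns = []

    def save_context(self, user_input, ai_output):
        """Record one exchange between the user and the agent."""
        self.turns.append(f"Human: {user_input}")
        self.turns.append(f"AI: {ai_output}")

    def load_memory_variables(self):
        """Return the transcript under the key the prompt template expects."""
        return {self.memory_key: "\n".join(self.turns)}


toy_memory = SimpleBufferMemory(memory_key="chat_history")
toy_memory.save_context("How many people live in canada?", "About 40 million.")
print(toy_memory.load_memory_variables()["chat_history"])
```

The memory_key has to match the placeholder name in the prompt, which is why the demo passes memory_key="chat_history".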

Creating the Agent

# Create an LLMChain that uses the OpenAI model and the prompt built earlier.
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
# Create a ZeroShotAgent that uses the LLM chain and the tool list.
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
# Create an AgentExecutor, which is used to run the agent.
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory
)

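Test Cases

agent_executor.run(input="How many people live in canada?")
agent_executor.run(input="what is their national anthem called?")
agent_executor.run(input="what is their capital?")
Here the agent executor is run three times in a row, each time with a different input. The second and third runs test the agent's memory, that is, whether it can use information from earlier exchanges to answer follow-up questions.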

Complete Code

import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import OpenAI
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
search = DuckDuckGoSearchAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
{chat_history}
Question: {input}
{agent_scratchpad}"""
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory
)
agent_executor.run(input="How many people live in canada?")
# To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be answered correctly.
agent_executor.run(input="what is their national anthem called?")
agent_executor.run(input="what is their capital?")

Test Results

I used the GPT-3.5 API here. The output is as follows:
> Entering new AgentExecutor chain...
Thought: I should use the Search tool to find the most recent population data for Canada.
Action: Search
Action Input: "Population of Canada"
Observation: Canada population density map (2014) Top left: The Quebec City-Windsor Corridor is the most densely inhabited and heavily industrialized region accounting for nearly 50 percent of the total population Canada ranks 37th by population among countries of the world, comprising about 0.5% of the world's total, with 40 million Canadians. Despite being the second-largest country by total area ... As of July 1, 2023, NPRs were estimated to represent 5.5% of the population of Canada. Among provinces, this proportion was highest in British Columbia (7.3%) and Ontario (6.3%) and lowest in Newfoundland and Labrador (2.4%) and Saskatchewan (2.5%). The 2.2 million NPRs now outnumber the 1.8 million Indigenous people enumerated during the 2021 ... Historical population of Canada. Statistics Canada conducts a country-wide census that collects demographic data every five years on the first and sixth year of each decade. The 2021 Canadian census enumerated a total population of 36,991,981, an increase of around 5.2 percent over the 2016 figure. It is estimated that Canada's population surpassed 40 million in 2023 and 41 million in 2024. Canada's population reaches 40 million. On June 16, 2023, Statistics Canada announced that Canada's population passed the 40 million mark according to the Canada's population clock (real-time model). Today's release of total demographic estimates and related data tables for a reference date of July 1, 2023, is the first since reaching that ... Canada's population was estimated at 40,528,396 on October 1, 2023, an increase of 430,635 people (+1.1%) from July 1. This was the highest population growth rate in any quarter since the second quarter of 1957 (+1.2%), when Canada's population grew by 198,000 people. At the time, Canada's population was 16.7 million people, and this rapid population growth resulted from the high number of ...
Thought: Based on the data, I can see that the population of Canada is estimated to be around 40 million as of October 1, 2023.
Final Answer: The estimated population of Canada as of October 1, 2023 is 40 million.
> Finished chain.
> Entering new AgentExecutor chain...
Thought: I should use the search tool to find the answer.
Action: Search
Action Input: "Canada national anthem"
Observation: O Canada, national anthem of Canada.It was proclaimed the official national anthem on July 1, 1980. "God Save the Queen" remains the royal anthem of Canada. The music, written by Calixa Lavallée (1842-91), a concert pianist and native of Verchères, Quebec, was commissioned in 1880 on the occasion of a visit to Quebec by John Douglas Sutherland Campbell, marquess of Lorne (later 9th ... Learn about the history and lyrics of Canada's national anthem 'O Canada', which has both French and English versions. The song was composed by Calixa Lavallée in 1880 and was proclaimed the official anthem in 1980. It replaced 'God Save the Queen', which is Canada's royal anthem. O Canada (French: Ô Canada) is the national anthem of Canada. The song was originally commissioned by Lieutenant Governor of Quebec Théodore Robitaille for t... National Anthem of Canada - O Canada (English only) - featuring new lyricsOther versions:Bilingual: https://www.youtube.com/watch?v=wBCuyeoSURoFrench only: h... Enjoy this virtual choir rendition of 'O Canada' arranged by George Alfred Grant-Shaefer . Make sure to subscribe for more virtual choir videos!After 100 yea...
Thought: I now know the final answer.
Final Answer: The national anthem of Canada is "O Canada".
> Finished chain.
> Entering new AgentExecutor chain...
Thought: I should use the Search tool to find the answer.
Action: Search
Action Input: "Capital of Canada"
Observation: Ottawa is the capital city of Canada.It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the Rideau River.Ottawa borders Gatineau, Quebec, and forms the core of the Ottawa-Gatineau census metropolitan area (CMA) and the National Capital Region (NCR). As of 2021, Ottawa had a city population of 1,017,449 and a metropolitan population of ... Ottawa, city, capital of Canada, located in southeastern Ontario.In the eastern extreme of the province, Ottawa is situated on the south bank of the Ottawa River across from Gatineau, Quebec, at the confluence of the Ottawa (Outaouais), Gatineau, and Rideau rivers.The Ottawa River (some 790 miles [1,270 km] long), the principal tributary of the St. Lawrence River, was a key factor in the city ... Skyline of Toronto. The national capital is Ottawa, Canada's fourth largest city. It lies some 250 miles (400 km) northeast of Toronto and 125 miles (200 km) west of Montreal, respectively Canada's first and second cities in terms of population and economic, cultural, and educational importance. The third largest city is Vancouver, a centre ... Learn about Canada's location, climate, terrain, natural resources, and major lakes and rivers. Find out the population distribution, ethnic groups, languages, and religions of Canada. The national capital, Ottawa, is prominently marked in the province of Ontario. Where is Canada? Canada is the largest country in North America. Canada is bordered by non-contiguous US state of Alaska in the northwest and by 12 other US states in the south. The border of Canada with the US is the longest bi-national land border in the world.
Thought: I now know the final answer.
Final Answer: The capital of Canada is Ottawa.
> Finished chain.
————————————————
Copyright notice: this article is an original work by the Juejin (稀土掘金) blogger 大鲸鱼crush.