Fuyu-8B

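Last updated: 2025-02-27

Fuyu-8B is a multimodal image understanding model trained by Adept AI. It supports a variety of image resolutions and can answer questions about figures and charts. The model performs well on tasks such as visual question answering and image captioning. This document describes the related API.

Overview

Call this API to answer questions about an image, based on the image and text supplied by the user.

Usage

This API can be called through the Python SDK, Java SDK, and Node.js SDK. For the calling workflow, see the SDK installation and usage guide.

SDK invocation

Request example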

import os
import qianfan
import base64
from qianfan.resources import Image2Text

# Authenticate with security-credential AK/SK, initialized through environment variables.
# Replace your_iam_ak with your Access Key and your_iam_sk with your Secret Key.
os.environ["QIANFAN_ACCESS_KEY"] = "your_iam_ak"
os.environ["QIANFAN_SECRET_KEY"] = "your_iam_sk"

# Replace with the path to your image
with open("/xxx/.../image.jpg", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

# Select the model with the model parameter
i2t = Image2Text(model="Fuyu-8B")
resp = i2t.do(prompt="Describe what the picture shows", image=encoded_string)

print(resp["result"])

import com.baidubce.qianfan.Qianfan;
import com.baidubce.qianfan.model.image.Image2TextResponse;

public class test {
    public static void main(String[] args) throws IllegalAccessException {

        // Authenticate with security-credential AK/SK.
        // Replace your_iam_ak with your Access Key and your_iam_sk with your Secret Key.
        Qianfan qianfan = new Qianfan("your_iam_ak", "your_iam_sk");

        // Call the model
        Image2TextResponse response = qianfan.image2Text().model("Fuyu-8B")
                .image("/9j/4AAQSkZJRgABAQAAAQABAAD/xxxxxx")    // Replace with the base64 encoding of your image
                .prompt("introduce the picture")
                .execute();
        System.out.println(response.getResult());
    }
}

import {Image2Text, setEnvVariable} from "@baiducloud/qianfan";

// Authenticate with security-credential AK/SK, initialized through environment variables.
// Replace your_iam_ak with your Access Key and your_iam_sk with your Secret Key.
setEnvVariable('QIANFAN_AK','your_iam_ak');
setEnvVariable('QIANFAN_SK','your_iam_sk');

// Call the model
const client = new Image2Text();
async function main() {
    const resp = await client.image2Text({
        prompt: 'Describe what the picture shows',
        image: 'iVBORw0KGgoAAAANSUhEUgAAB4IAAxxxxxxxxxxxx=',  // Replace with the base64 encoding of your image
    }, 'Fuyu-8B');
    console.log(resp.result)
}

main();

Response example

The image portray s a black and white portrait of a beautiful young woman . She is wearing a red hat, giving her  a hat -like appearance. The black and white nature of the photograph enhances the visual appeal and adds depth to the image .

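Request parameters

Note: The following describes the Python SDK parameters. For the other SDKs, see the Java SDK parameter reference and the Node.js SDK parameter reference.

Name	Type	Required	Description
prompt	string	Yes	The request text
image	string	Yes	Image data, base64-encoded. The base64-encoded data must not exceed 4M; the shortest side must be at least 15px and the longest side at most 4096px. Supported formats: jpg/png/bmp. Note: remove the header from the encoded data
model	string	No	Model name, used to specify a preset-service model supported by the platform. (1) When calling the preset service, that is, the API described here, this field is required and its value is fixed to Fuyu-8B. (2) When specifying a model service you published yourself, leave this field empty and fill in the endpoint field instead; see the endpoint parameter
endpoint	string	No	Specifies a model service you published yourself. (1) This field is required when specifying such a service. (2) The value can be obtained from the service address: open the Model Service > Model Inference > My Services page, select the service you created, and open its details page to view the service address. The endpoint value is the part of the address that follows `https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/image2text/`. Note: on the service creation page, once a model is selected, a prefix is automatically added to the API address. For example, if you select the model Fuyu-8B and enter "fuyu8btest" as the API address, the endpoint value is "q24xxxb6_fuyu8btest". For how to publish a service, see Publishing a Platform Preset Model Service
stream	bool	No	Whether to return data as a stream. Default: False
retry_count	int	No	Number of retries when a network call fails. Default: 1
request_timeout	float	No	Request timeout. Default: 60 seconds
request_id	str	No	ID of the network request; generated automatically if omitted
backoff_factor	float	No	Growth factor for the wait time between retries of failed network calls. Default: 0. On each retry, the wait time increases by the timeout multiplied by this factor
temperature	float	No	(1) Higher values make the output more random; lower values make it more focused and deterministic. (2) Range: (0, 1.0]; must not be 0
top_k	int	No	Top-K sampling parameter: at each token generation step, the k most probable tokens are kept as candidates. (1) Affects the diversity of the output; larger values yield more diverse text. (2) Range: positive integers
top_p	float	No	(1) Affects the diversity of the output; larger values yield more diverse text. (2) Range: [0, 1.0]
penalty_score	float	No	Reduces repetition by penalizing tokens that have already been generated. (1) Larger values mean a stronger penalty. (2) Range: [1.0, 2.0]
stop	List[String]	No	Stop sequences. Generation stops when the output ends with any element of stop. (1) Each element is at most 20 characters long. (2) At most 4 elements
user_id	string	No	Unique identifier of the end user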
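
The image constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: the helper name `prepare_image` is hypothetical, it assumes "the header" refers to a data-URI prefix such as `data:image/png;base64,`, and it assumes "4M" means 4 × 1024 × 1024 bytes of base64 text. It does not check pixel dimensions or the image format.

```python
import base64

# Assumption: the documented "4M" limit is 4 MiB of base64 text.
MAX_BASE64_BYTES = 4 * 1024 * 1024


def prepare_image(data: str) -> str:
    """Strip an optional data-URI header and enforce the base64 size limit."""
    # Drop a header such as "data:image/png;base64," if one is present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    data = data.strip()
    # validate=True raises an error for characters outside the base64 alphabet.
    base64.b64decode(data, validate=True)
    if len(data) > MAX_BASE64_BYTES:
        raise ValueError("base64 image exceeds the 4M limit")
    return data
```

The returned string could then be passed as the `image` argument in the Python request example.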

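Response parameters

Name	Type	Description
id	string	ID of this conversation round
object	string	Response packet type. completion: text generation response
created	int	Timestamp
sentence_id	int	Index of the current sentence. Returned only in streaming mode
is_end	bool	Whether the current sentence is the last one. Returned only in streaming mode
result	string	The generated response
is_safe	int	1: the input content has no safety risk. 0: the input content has a safety risk
usage	usage	Token usage statistics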
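
In streaming mode, the `sentence_id` and `is_end` fields can be used to reassemble the full answer. The sketch below works on plain dictionaries shaped like the table above; the helper name `join_stream` is hypothetical, and it assumes each chunk's `result` holds an incremental fragment rather than the cumulative text.

```python
from typing import Iterable, Mapping


def join_stream(chunks: Iterable[Mapping]) -> str:
    """Concatenate streamed `result` fragments in sentence_id order."""
    parts = {}
    for chunk in chunks:
        parts[chunk["sentence_id"]] = chunk["result"]
        # Stop reading once the chunk marked as the last sentence arrives.
        if chunk.get("is_end"):
            break
    return "".join(parts[i] for i in sorted(parts))
```

Keying the fragments by `sentence_id` keeps the output ordered even if a caller buffers chunks out of order before passing them in.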

usage fields

Name	Type	Description
prompt_tokens	int	Number of tokens in the prompt
completion_tokens	int	Number of tokens in the response
total_tokens	int	Total number of tokens
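
As an illustration of reading these fields, the sketch below formats the token counts from a response treated as a plain dictionary. The helper name `summarize_usage` is hypothetical, and falling back to the sum of prompt and completion tokens when `total_tokens` is absent is an assumption, not documented behavior.

```python
def summarize_usage(resp: dict) -> str:
    """Format the token counts found under the `usage` key of a response."""
    usage = resp.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    # Assumption: if total_tokens is missing, use prompt + completion.
    total = usage.get("total_tokens", prompt + completion)
    return f"prompt={prompt} completion={completion} total={total}"
```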
