The OpenAI GPT-4 vision API is ready to play with: GPT4V, gpt-4-vision-preview, chatgpt
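The long-awaited GPT-4V has finally launched and is currently in preview, and people who have tried the OpenAI vision API have been impressed by what it can do.
People have already come up with all sorts of creative uses, such as having the AI narrate a video, and the results are remarkably smooth:
The whole process can be broken into 7 steps:
- Extract image frames from the video;
- Write a prompt asking the model to describe the frames;
- Send the request to GPT;
- Design a prompt template suited to voice narration;
- Produce a text script suited to voice narration;
- Convert the text script to audio;
- Merge the audio with the video file.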
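Steps 1 through 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline anyone actually used: it assumes the frames have already been decoded to JPEG bytes by some other tool, and the helper names `sample_frames`, `to_data_url`, and `build_frame_messages` are hypothetical. It builds the request payload only; sending it would go through `client.chat.completions.create` with `model="gpt-4-vision-preview"`.

```python
import base64


def sample_frames(frames, every_n):
    """Keep every n-th frame so the request stays small."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return frames[::every_n]


def to_data_url(jpeg_bytes):
    """Encode raw JPEG bytes as a base64 data URL."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"


def build_frame_messages(prompt, frames):
    """Build a chat `messages` list: one text part, then one image part per frame."""
    content = [{"type": "text", "text": prompt}]
    for frame in frames:
        content.append(
            {"type": "image_url", "image_url": {"url": to_data_url(frame)}}
        )
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    # Placeholder bytes stand in for real JPEG frames (assumption for illustration).
    fake_frames = [bytes([i]) * 4 for i in range(10)]
    picked = sample_frames(fake_frames, every_n=5)
    messages = build_frame_messages(
        "Describe what happens in these video frames.", picked
    )
    print(len(messages[0]["content"]))
```

Sampling before encoding keeps the payload and token cost down, since every frame sent is billed as a separate image.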
Feel free to try this out yourselves.
Let's start with a basic example:
First, get a key here: https://github.com/xing61/xiaoyi-robot
from openai import OpenAI

API_SECRET_KEY = "your-zhizengzeng-key"  # your 智增增 API key
BASE_URL = "https://flag.smarttrot.com/v1/"  # 智增增 base_url


# gpt4v: send one text prompt plus one image URL to the vision model
def gpt4v(query):
    client = OpenAI(api_key=API_SECRET_KEY, base_url=BASE_URL)
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": query},
                    {
                        "type": "image_url",
                        # the API documents image_url as an object with a "url" field
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        },
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    print(resp)  # full response object
    print(resp.choices[0].message.content)  # just the model's text reply


if __name__ == '__main__':
    gpt4v("What are in these images? Is there any difference between them?")

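This code sends a single image to the API and asks the model to identify and describe what it contains. There is one catch: the prompt asks about differences between images, but only one image was actually uploaded.
The image is the nature boardwalk photo at the URL in the code above.

The response we got back: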
I'm sorry, but I can only view one image at a time. The image you've provided is a beautiful landscape scene. It features a wooden boardwalk or path leading through a lush green meadow with tall grass on both sides. The sky is partly cloudy with a rich blue color and some gentle white clouds, suggesting a pleasant day. The scenery is tranquil and might be a nature reserve or park. There are no people or animals visible in the image. If you have another image for comparison, please provide it separately.
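The result is pretty good: the model describes the scene accurately and points out that it received only one image.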
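Since the reply notes that only one image was provided, a request that really compares images would need several image parts in the same user message. The sketch below only builds such a payload. The helper name `build_comparison_messages` and the two example URLs are placeholders, not from the original example, and it is an assumption that the endpoint behind `BASE_URL` handles multiple images the same way OpenAI's own API does.

```python
def build_comparison_messages(query, image_urls):
    """Build a chat `messages` list with one text part and several image parts."""
    if len(image_urls) < 2:
        raise ValueError("need at least two images to compare")
    content = [{"type": "text", "text": query}]
    content.extend(
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    )
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    urls = [
        "https://example.com/first.jpg",   # placeholder URL
        "https://example.com/second.jpg",  # placeholder URL
    ]
    messages = build_comparison_messages(
        "What are in these images? Is there any difference between them?", urls
    )
    print(messages[0]["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create` in place of the single-image list used in `gpt4v`.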
