GPT-4o takes first place on Chatbot Arena

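Setup

To explore the GPT-4o API in depth, we will first install the required libraries and set up the environment.

First, open a terminal and run the following commands to install the required libraries: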

pip install -Uqqq pip --progress-bar off
pip install -qqq openai==1.30.1 --progress-bar off
pip install -qqq tiktoken==0.7.0 --progress-bar off

We need two key libraries: openai and tiktoken. The openai library lets us make API calls to the GPT-4o model. The tiktoken library helps us tokenize text for the model.

Next, let's download an image to use for visual understanding:

gdown 1nO9NdIgHjA3CL0QCyNcrL_Ic0s7HgX5N

Now, let's import the required libraries in Python and set up the environment:

import base64
import json
import os
import textwrap
from inspect import cleandoc
from pathlib import Path
from typing import List

import requests
import tiktoken
from google.colab import userdata
from IPython.display import Audio, Markdown, display
from openai import OpenAI
from PIL import Image
from tiktoken import Encoding

# Read the OpenAI API key from Colab's user secrets and expose it as an environment variable
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

MODEL_NAME = "gpt-4o"
SEED = 42

client = OpenAI()

def format_response(response):
    """
    This function formats the GPT-4o response for better readability.
    """
    response_txt = response.choices[0].message.content
    text = ""
    for chunk in response_txt.split("\n"):
        text += "\n"
        if not chunk:
            continue
        text += ("\n".join(textwrap.wrap(chunk, 100, break_long_words=False))).strip()
    return text.strip()

In the code above, we set up the OpenAI client using the API key stored in the OPENAI_API_KEY environment variable. We also define a helper function, format_response, that formats GPT-4o responses for better readability.
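To see what the wrapping logic in format_response does without calling the API, the sketch below applies the same steps to a plain string. The name wrap_text, the width parameter, and the sample text are illustrative assumptions and do not come from the original article.

```python
import textwrap


def wrap_text(raw: str, width: int = 100) -> str:
    """Wrap each line of `raw` to `width` columns, keeping blank lines as paragraph breaks."""
    text = ""
    for chunk in raw.split("\n"):
        text += "\n"
        if not chunk:
            continue  # an empty line only contributes the newline added above
        text += "\n".join(textwrap.wrap(chunk, width, break_long_words=False)).strip()
    return text.strip()


sample = "First paragraph with several words in it.\n\nSecond paragraph."
print(wrap_text(sample, width=20))
```

Each input line is wrapped on its own, so paragraph breaks in the model output survive while long lines are folded.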

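That's it! You are now ready to take a closer look at how to use the GPT-4o API.

Prompting via the API

Calling the GPT-4o model through the API is straightforward. You provide a prompt (as an array of messages) and receive a response. Let's walk through an example that prompts the model to complete a simple text completion task: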

%%time

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show the Office",
    },
    {"role": "user", "content": "Explain how GPT-4 works"},
]

response = client.chat.completions.create(
    model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001
)
response

ChatCompletion(
    id="chatcmpl-9QyRx7jFE1z77bl1nRSMO4UPQC6cz",
    choices=[
        Choice(
            finish_reason="stop",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="Ah, artificial intelligence, a ...",
                role="assistant",
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1716215925,
    model="gpt-4o-2024-05-13",
    object="chat.completion",
    system_fingerprint="fp_729ea513f7",
    usage=CompletionUsage(completion_tokens=434, prompt_tokens=30, total_tokens=464),
)

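In the messages array, each message carries a role: system sets the model's behavior and persona, user holds your prompt, and assistant holds the model's replies.

The response object contains the completion generated by the model. You can check token usage as follows: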

usage = response.usage
print(
f"""
Tokens Used

Prompt: {usage.prompt_tokens}
Completion: {usage.completion_tokens}
Total: {usage.total_tokens}
"""
)
Tokens Used

Prompt: 30
Completion: 434
Total: 464

To access the assistant's response, use response.choices[0].message.content. This gives you the text the model generated for your prompt.

GPT-4o bot

Ah, artificial intelligence, a fascinating subject! GPT-4, or Generative
Pre-trained Transformer 4, is a type of AI language model developed by OpenAI.
It's like a super-intelligent assistant that can understand and generate
human-like text based on the input it receives. Here's a breakdown of how it
works:

1. **Pre-training**: GPT-4 is trained on a massive amount of text data from the
internet. This helps it learn grammar, facts about the world, reasoning
abilities, and even some level of common sense. Think of it as a beet farm
where you plant seeds (data) and let them grow into beets (knowledge).
2. **Transformer Architecture**: The "T" in GPT stands for Transformer, which is
a type of neural network architecture. Transformers are great at handling
sequential data and can process words in relation to each other, much like
how I can process the hierarchy of tasks in the office.
3. **Attention Mechanism**: This is a key part of the Transformer. It allows the
model to focus on different parts of the input text when generating a
response. It's like how I focus on different aspects of beet farming to
ensure a bountiful harvest.
4. **Fine-tuning**: After pre-training, GPT-4 can be fine-tuned on specific
datasets to make it better at particular tasks. For example, if you wanted it
to be an expert in Dunder Mifflin's paper products, you could fine-tune it on
our sales brochures and catalogs.
5. **Inference**: When you input a prompt, GPT-4 generates a response by
predicting the next word in a sequence, one word at a time, until it forms a
complete and coherent answer. It's like how I can predict Jim's next prank
based on his previous antics. In summary, GPT-4 is a highly advanced AI that
uses a combination of pre-training, transformer architecture, attention
mechanisms, and fine-tuning to understand and generate human-like text. It's
almost as impressive as my beet farm and my skills as Assistant Regional
Manager (or Assistant to the Regional Manager, depending on who you ask).
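When reading response.choices[0].message.content, it can also help to look at finish_reason, which appears in the response object shown earlier. The following is a minimal sketch: extract_text is a hypothetical helper, and fake_response is a stand-in object that only mimics the shape of a ChatCompletion so the example runs without an API key.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Return the first choice's text, raising if the model was cut off by the token limit."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise ValueError("Response was truncated by the token limit")
    return choice.message.content or ""


# Stand-in object (assumption): mimics the attributes used above, not a real API response
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(content="Ah, artificial intelligence, a ..."),
        )
    ]
)
print(extract_text(fake_response))
```

With a real client, you would pass the object returned by client.chat.completions.create instead of the stand-in.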

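Counting the number of tokens in a prompt

Managing token usage can significantly improve the efficiency of your interactions with AI models. Here is a simple guide to counting the tokens in a text with the tiktoken library.

Counting tokens in text

First, you need to get the encoding for the model: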

 
encoding = tiktoken.encoding_for_model(MODEL_NAME)
print(encoding)
<Encoding 'o200k_base'>

With the encoding ready, you can now count the tokens in a given text:

def count_tokens_in_text(text: str, encoding) -> int:
    return len(encoding.encode(text))

text = "You are Dwight K. Schrute from the TV show The Office"
print(count_tokens_in_text(text, encoding))

This code outputs:

13

This simple function counts the number of tokens in the text.

Counting tokens in a complex prompt

If you have a more complex prompt with multiple messages, you can count the tokens like this:

def count_tokens_in_messages(messages, encoding) -> int:
    tokens_per_message = 3  # fixed overhead added for every message
    tokens_per_name = 1  # extra token when a message has a "name" field
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show The Office",
    },
    {"role": "user", "content": "Explain how GPT-4 works"},
]

print(count_tokens_in_messages(messages, encoding))

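This outputs:

30

This function counts the tokens in a list of messages, taking each message's role and content into account. It also adds a fixed overhead for every message and one extra token when a message has a name field. Note that these constants apply specifically to GPT-4-family models.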

By counting tokens, you can better manage your usage and keep your interactions with the model efficient. Happy coding!
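One way to act on a token count is to drop old turns until the prompt fits a budget. The sketch below is an assumption-laden illustration: trim_to_budget and word_count are hypothetical names, and word_count is a whitespace stand-in for a real counter so the example runs without tiktoken. In practice you could pass a function that wraps count_tokens_in_messages with the model's encoding.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_to_budget(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    max_tokens: int,
) -> List[Message]:
    """Drop the oldest non-system messages until the prompt fits in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and count_tokens(system + rest) > max_tokens:
        rest.pop(0)  # discard the oldest turn first
    return system + rest


def word_count(messages: List[Message]) -> int:
    """Stand-in counter (assumption): counts whitespace-separated words, not real tokens."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are Dwight K. Schrute"},
    {"role": "user", "content": "Explain how GPT-4 works"},
    {"role": "assistant", "content": "It is like a beet farm"},
    {"role": "user", "content": "Which model should I use"},
]
trimmed = trim_to_budget(history, word_count, max_tokens=16)
print([m["role"] for m in trimmed])
```

The system message is always kept, on the assumption that it carries the persona the rest of the conversation depends on.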

Streaming

Streaming lets you receive the model's response in chunks. This is useful for long answers or real-time applications. Here is a simple guide to streaming a response from the GPT-4o model:

First, we set up the messages:

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show The Office",
    },
    {"role": "user", "content": "Explain how GPT-4 works"},
]

Next, we create the completion request:

completion = client.chat.completions.create(
    model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001, stream=True
)

Finally, we process the streamed response:

for chunk in completion:
    # delta.content can be None (for example in the final chunk), so fall back to ""
    print(chunk.choices[0].delta.content or "", end="")

This code prints the response chunks as the model generates them, which suits applications that need real-time feedback or produce lengthy replies.
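If you also need the complete text after streaming finishes, you can accumulate the pieces while printing them. This is a minimal sketch: collect_stream and fake_chunk are hypothetical helpers, and the fake chunks only imitate the choices[0].delta.content shape used above so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each streamed piece as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Stand-in (assumption) for a streamed chunk object."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


full_text = collect_stream([fake_chunk("Bears. "), fake_chunk("Beets."), fake_chunk(None)])
```

With a real request, you would pass the completion object created with stream=True in place of the list of fake chunks.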

Simulating a chat via the API

Simulating a chat by sending the model multiple messages is a practical way to build conversational AI agents or chatbots. It effectively lets you "put words in the model's mouth". Let's walk through an example:

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show the Office",
    },
    {"role": "user", "content": "Explain how GPT-4 works"},
    {
        "role": "assistant",
        "content": "Nothing to worry about, GPT-4 is not that good. Open LLMs are vastly superior!",
    },
    {
        "role": "user",
        "content": "Which Open LLM should I use that is better than GPT-4?",
    },
]

response = client.chat.completions.create(
    model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001
)

GPT-4o bot

Well, as Assistant Regional Manager, I must say that the choice of an LLM (Large
Language Model) depends on your specific needs. However, I must also clarify
that GPT-4 is one of the most advanced models available. If you're looking for
alternatives, you might consider:

1. **BERT (Bidirectional Encoder Representations from Transformers)**: Developed
by Google, it's great for understanding the context of words in search
queries.
2. **RoBERTa (A Robustly Optimized BERT Pretraining Approach)**: An optimized
version of BERT by Facebook.
3. **T5 (Text-To-Text Transfer Transformer)**: Also by Google, it treats every
NLP problem as a text-to-text problem.
4. **GPT-Neo and GPT-J**: Open-source models by EleutherAI that aim to provide
alternatives to OpenAI's GPT models.

Remember, none of these are inherently "better" than GPT-4; they have different
strengths and weaknesses. Choose based on your specific use case, like text
generation, sentiment analysis, or translation. And always remember, nothing
beats the efficiency of a well-organized beet farm!

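Although GPT-4o is unlikely to actually claim that GPT-4 is bad (as the example shows), it is still interesting to see how the model handles prompts like this. It helps you better understand the model's limitations and behavior.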
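Because a simulated chat is just an ordered list of role-tagged messages, it can be assembled from earlier turns with a small helper. The function build_chat below is a hypothetical convenience, not part of the OpenAI library, and assumes the history alternates user and assistant turns.

```python
from typing import Dict, List, Tuple


def build_chat(
    system: str, turns: List[Tuple[str, str]], next_user: str
) -> List[Dict[str, str]]:
    """Build a messages list from a system prompt, past (user, assistant) turns, and a new question."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": next_user})
    return messages


chat = build_chat(
    "You are Dwight K. Schrute from the TV show the Office",
    [("Explain how GPT-4 works", "Nothing to worry about, GPT-4 is not that good.")],
    "Which Open LLM should I use that is better than GPT-4?",
)
print([m["role"] for m in chat])
```

The resulting list can be passed as the messages argument of client.chat.completions.create.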

JSON (only) responses

First, set up your conversation:

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show The Office."
    },
    {
        "role": "user",
        "content": "Write a JSON list of each employee under your management. Include a comparison of their paycheck to yours."
    }
]

Then, send your request to the model:

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    response_format={"type": "json_object"},
    seed=SEED,
    temperature=0.000001
)

GPT-4o bot

{
  "employees": [
    {
      "name": "Jim Halpert",
      "position": "Sales Representative",
      "paycheckComparison": "less than Dwight's"
    },
    {
      "name": "Phyllis Vance",
      "position": "Sales Representative",
      "paycheckComparison": "less than Dwight's"
    },
    {
      "name": "Stanley Hudson",
      "position": "Sales Representative",
      "paycheckComparison": "less than Dwight's"
    },
    {
      "name": "Ryan Howard",
      "position": "Temp",
      "paycheckComparison": "significantly less than Dwight's"
    }
  ]
}

The key here is setting the response_format parameter to {"type": "json_object"}. This instructs the model to return its response as JSON. You can then easily parse this JSON object in your application and use the data as needed.
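Parsing the returned text could look like the sketch below. The helper parse_employees is hypothetical, and the employees key is an assumption taken from the sample output above; the model chooses its own key names, so real code should check for them. The content string is an abbreviated copy of that output standing in for response.choices[0].message.content.

```python
import json
from typing import Any, Dict, List


def parse_employees(content: str) -> List[Dict[str, Any]]:
    """Parse the model's JSON text and return the list stored under the 'employees' key."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("employees"), list):
        raise ValueError("Expected a JSON object with an 'employees' list")
    return data["employees"]


# Abbreviated sample of the response shown above
content = """
{
  "employees": [
    {"name": "Jim Halpert", "position": "Sales Representative", "paycheckComparison": "less than Dwight's"},
    {"name": "Ryan Howard", "position": "Temp", "paycheckComparison": "significantly less than Dwight's"}
  ]
}
"""
employees = parse_employees(content)
for employee in employees:
    print(f"{employee['name']}: {employee['paycheckComparison']}")
```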

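Vision and document understanding

GPT-4o is a versatile model that can understand and generate text, interpret images, process audio, and respond to video input. Through the API, however, it currently accepts only text and image inputs. Let's see how to use the model to understand an image of a document.

First, we load the image and resize it: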

image_path = "dunder-mifflin-message.jpg"

original_image = Image.open(image_path)

original_width, original_height = original_image.size

new_width = original_width // 2
new_height = original_height // 2

resized_image = original_image.resize((new_width, new_height), Image.LANCZOS)

display(resized_image)
Dunder Mifflin message

Next, we convert the image to a base64-encoded URL and prepare the prompt:

def create_image_url(image_path):
    with Path(image_path).open("rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{base64_image}"

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show the Office",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is the main takeaway from the document? Who is the author?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": create_image_url(image_path),
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001
)

GPT-4o bot

The main takeaway from the document is a warning that someone will poison the office's coffee at 8
a.m. and instructs not to drink the coffee. The author of the document is "Future Dwight."

The response accurately captures the content of the document image. OCR worked well, possibly because the document is of high quality. It seems this AI has watched a lot of The Office!
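The create_image_url helper hard-codes image/jpeg in the data URL. If your images come in other formats, one option is to infer the MIME type from the file extension. The sketch below uses only the standard library; create_data_url is a hypothetical variant, and the temporary file holds placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def create_data_url(image_path: str) -> str:
    """Encode an image file as a data URL, inferring the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image type for {image_path!r}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demo with placeholder bytes (assumption): enough to show the URL format, not a valid PNG
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    sample.write_bytes(b"abc")
    url = create_data_url(str(sample))
print(url)
```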

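Function calling (agent tools)

Modern large language models (LLMs) such as GPT-4o can call functions or tools to perform specific tasks. This capability is especially useful for building AI agents that interact with external systems or APIs. Let's see how to call functions with the GPT-4o API.

Defining the function

First, let's define a function that retrieves quotes from The Office TV show by season, episode, and character.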

CHARACTERS = ["Michael", "Jim", "Dwight", "Pam", "Oscar"]

def get_quotes(season: int, episode: int, character: str, limit: int = 20) -> str:
    url = f"https://the-office.fly.dev/season/{season}/episode/{episode}"
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception("Unable to get quotes")
    data = response.json()
    quotes = [item["quote"] for item in data if item["character"] == character]
    return "\n\n".join(quotes[:limit])

print(get_quotes(3, 2, "Jim", limit=5))

Example output:

Oh, tell him I say hi.

Yeah, sold about forty thousand.

That is a lot of liquor.

Oh, no, it was… you know, a good opportunity for me, a promotion. I got a chance to…

Michael.

Defining the tools

Next, we define the tools to use in the chat simulation:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_quotes",
            "description": "Get quotes from the TV show The Office US",
            "parameters": {
                "type": "object",
                "properties": {
                    "season": {
                        "type": "integer",
                        "description": "Show season",
                    },
                    "episode": {
                        "type": "integer",
                        "description": "Show episode",
                    },
                    "character": {
                        "type": "string",
                        "enum": CHARACTERS,
                    },
                },
                "required": ["season", "episode", "character"],
            },
        },
    }
]

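The format for specifying tools is simple. It consists of the function name, a description, and the parameters. In this case, we define a function named get_quotes and provide its required parameters.

Calling the GPT-4o API

Now you can create a prompt and call the GPT-4o API with the available tools: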

messages = [
    {
        "role": "system",
        "content": "You are Dwight K. Schrute from the TV show The Office",
    },
    {
        "role": "user",
        "content": "List the funniest 3 quotes from Jim Halpert from episode 4 of season 3",
    },
]

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    seed=SEED,
    temperature=0.000001,
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls
tool_calls

[
    ChatCompletionMessageToolCall(
        id="call_4RgTCgvflegSbIMQv4rBXEoi",
        function=Function(
            arguments='{"season":3,"episode":4,"character":"Jim"}', name="get_quotes"
        ),
        type="function",
    )
]
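The arguments arrive as a JSON string produced by the model, so it can be worth checking them against the tool's parameter schema before running anything. The sketch below is a simplified check written for illustration: validate_arguments is a hypothetical helper that covers only required fields, unknown fields, and enum values, and it is not a full JSON Schema validator.

```python
import json
from typing import Any, Dict


def validate_arguments(arguments_json: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Parse tool-call arguments and check required fields, unknown fields, and enum values."""
    args = json.loads(arguments_json)
    missing = [name for name in parameters.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    for name, value in args.items():
        spec = parameters["properties"].get(name)
        if spec is None:
            raise ValueError(f"Unexpected argument: {name}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}={value!r} is not one of {spec['enum']}")
    return args


# Same shape as the "parameters" entry in the tools definition
CHARACTERS = ["Michael", "Jim", "Dwight", "Pam", "Oscar"]
parameters = {
    "type": "object",
    "properties": {
        "season": {"type": "integer"},
        "episode": {"type": "integer"},
        "character": {"type": "string", "enum": CHARACTERS},
    },
    "required": ["season", "episode", "character"],
}
args = validate_arguments('{"season":3,"episode":4,"character":"Jim"}', parameters)
print(args)
```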

Extracting and calling the tool

The response contains a tool call to the get_quotes function with the specified arguments. You can now extract the function name and arguments and call the function:

# Map tool names to the Python functions that implement them
available_functions = {"get_quotes": get_quotes}

tool_call = tool_calls[0]
function_name = tool_call.function.name
function_to_call = available_functions[function_name]
function_args = json.loads(tool_call.function.arguments)
function_response = function_to_call(**function_args)

Example function response:

Mmm, that's where you're wrong.  I'm your project supervisor today, and I have just decided that we're not doing anything until you get the chips that you require.  So, I think we should go get some.  Now, please.

And then we checked the fax machine.

[chuckles] He's so cute.

Okay, that is a “no” on the on the West Side Market.

This returns Jim Halpert's quotes from season 3, episode 4 of The Office. You can now use this data to generate GPT-4o's response:

# The assistant message that requested the tool call must precede the tool result
messages.append(response_message)
messages.append(
    {
        "tool_call_id": tool_call.id,
        "role": "tool",
        "name": function_name,
        "content": function_response,
    }
)

second_response = client.chat.completions.create(
    model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001
)

Generating the final response

Here are three of the funniest quotes from Jim Halpert in Episode 4 of Season 3:

1. **Jim Halpert:** "Mmm, that's where you're wrong. I'm your project supervisor
today, and I have just decided that we're not doing anything until you get
the chips that you require. So, I think we should go get some. Now, please."

2. **Jim Halpert:** "[on phone] Hi, yeah. This is Mike from the West Side
Market. Well, we get a shipment of Herr's salt and vinegar chips, and we
ordered that about three weeks ago and haven't … . yeah. You have 'em in the
warehouse. Great. What is my store number… six. Wait, no. I'll call you back.
[quickly hangs up] Shut up [to Karen]."

3. **Jim Halpert:** "Wow. Never pegged you for a quitter."

Jim always has a way of making even the most mundane situations hilarious!

This example shows how to use GPT-4o to build AI agents that can interact with external systems and APIs.
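The walkthrough handles a single tool call, but a response may contain several. A possible generalization is sketched below: run_tool_calls is a hypothetical helper, get_quotes_stub replaces the real HTTP-backed function so the example runs offline, and fake_call only imitates the attributes of a tool call object.

```python
import json
from types import SimpleNamespace
from typing import Callable, Dict, List


def run_tool_calls(tool_calls, available_functions: Dict[str, Callable[..., str]]) -> List[dict]:
    """Execute each tool call and return the 'tool' messages to append to the conversation."""
    tool_messages = []
    for tool_call in tool_calls or []:
        name = tool_call.function.name
        function = available_functions.get(name)
        if function is None:
            content = f"Error: unknown function {name!r}"
        else:
            content = function(**json.loads(tool_call.function.arguments))
        tool_messages.append(
            {"tool_call_id": tool_call.id, "role": "tool", "name": name, "content": content}
        )
    return tool_messages


def get_quotes_stub(season: int, episode: int, character: str) -> str:
    """Offline stand-in (assumption) for the real get_quotes function."""
    return f"{character} quotes for S{season}E{episode}"


fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="get_quotes", arguments='{"season":3,"episode":4,"character":"Jim"}'
    ),
)
tool_messages = run_tool_calls([fake_call], {"get_quotes": get_quotes_stub})
print(tool_messages[0]["content"])
```

Reporting an unknown function name back as tool content, rather than raising, is a design choice that lets the model see the failure and respond to it.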

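Conclusion

Based on my experience so far, GPT-4o is a significant improvement over GPT-4 Turbo, especially at understanding images. It is cheaper and faster than GPT-4 Turbo, and you can switch from the older model to the new one without any hassle.

I am especially keen to explore its function calling capabilities. From what I have observed, using GPT-4o can significantly improve the performance of agentic applications.

Overall, if you are looking for better performance and cost efficiency, GPT-4o is an excellent choice.

Original article: https://www.mlexpert.io/blog/gpt-4o-api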
