
Calling OpenAI GPT-OSS Models from Go
In 2025, the LLM world gained an "open-source holy trinity":
Model | Parameters | Context | Local VRAM | Cloud price (1K in/out) | One-line pitch
---|---|---|---|---|---
GPT-OSS-20B | 21 B | 128 K | 16 GB | $0.05 / $0.2 | Runs on a dev box
GPT-OSS-120B | 117 B | 128 K | 80 GB | $0.1 / $0.5 | A coding monster
GPT-4.1 | Unknown | 200 K | Cloud only | $0.06 / $0.18 | Expensive and closed source
The OpenAI OSS models are fully open source under Apache 2.0, with an MoE architecture, RoPE, and a 128 K context. The official release ships an OpenAI-compatible REST endpoint, so Go developers get them working out of the box.
Platform | Highlights | URL
---|---|---
Novita AI | No VPN needed, Alipay payments, 120B callable in the cloud | novita.ai
OpenRouter | Multi-model routing, BYOK, unified billing | openrouter.ai
Ollama | Local from 16 GB of VRAM, zero network latency | ollama.ai
The walkthrough below uses Novita AI; the steps apply equally to OpenRouter/Ollama. Create an API key in the console; it looks like:
sk-nov-***
go get github.com/sashabaranov/go-openai
The endpoint speaks the official OpenAI-compatible format; one line installs the client.
# Install
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the 20B model
ollama pull gpt-oss:20b
# Start the server
ollama serve
package main

import (
	"context"
	"fmt"
	"log"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// ClientConfig's token field is unexported; build the config
	// via DefaultConfig, then point it at the local Ollama server.
	config := openai.DefaultConfig("ollama") // Ollama accepts any non-empty key
	config.BaseURL = "http://localhost:11434/v1"
	client := openai.NewClientWithConfig(config)

	req := openai.ChatCompletionRequest{
		Model: "gpt-oss:20b",
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: "Write a Go goroutine pool example"},
		},
		MaxTokens:   512,
		Temperature: 0.1,
	}

	resp, err := client.CreateChatCompletion(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
Run it:
go run 20b_local.go
Terminal output:
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	pool := make(chan func(), 10)
	var wg sync.WaitGroup

	for i := 0; i < 5; i++ {
		wg.Add(1)
		task := func(id int) func() {
			return func() {
				defer wg.Done()
				fmt.Printf("Worker %d done\n", id)
			}
		}(i)
		pool <- task
	}

	go func() {
		for t := range pool {
			t()
		}
	}()

	wg.Wait()
	close(pool)
	time.Sleep(time.Second)
}
config := openai.DefaultConfig(os.Getenv("GENIE3_API_KEY"))
config.BaseURL = "https://api.novita.ai/v3/openai"
client := openai.NewClientWithConfig(config)

req := openai.ChatCompletionRequest{
	Model: "openai/gpt-oss-120b",
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleSystem, Content: "You are a software architect; give a detailed design"},
		{Role: openai.ChatMessageRoleUser, Content: "Design an IM system that supports tens of millions of concurrent users"},
	},
	MaxTokens:   2048,
	Temperature: 0.3,
}
Concurrency | First-token latency | Success rate | Cost (1K in/out)
---|---|---|---
1 | 1.1 s | 100 % | $0.10 / $0.50
10 | 1.3 s | 100 % | $0.10 / $0.50
100 | 2.4 s | 99.7 % | $0.10 / $0.50
stream, err := client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
	Model: "openai/gpt-oss-20b",
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: "Tell me a joke"},
	},
	MaxTokens: 128,
	Stream:    true,
})
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for {
	resp, err := stream.Recv()
	if errors.Is(err, io.EOF) {
		break
	}
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(resp.Choices[0].Delta.Content)
}
One line of front-end WebSocket code appends each token:
ws.onmessage = e => document.body.insertAdjacentText("beforeend", e.data);
type WeatherReq struct {
	City string `json:"city"`
}
var tool = openai.Tool{
	Type: openai.ToolTypeFunction,
	Function: &openai.FunctionDefinition{
		Name:        "get_weather",
		Description: "Look up a city's weather",
		Parameters: json.RawMessage(`{
			"type": "object",
			"properties": {
				"city": {"type": "string"}
			},
			"required": ["city"]
		}`),
	},
}

req := openai.ChatCompletionRequest{
	Model: "openai/gpt-oss-20b",
	Messages: []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: "What's the weather in Beijing?"},
	},
	Tools: []openai.Tool{tool},
}

resp, err := client.CreateChatCompletion(ctx, req)
if err != nil {
	log.Fatal(err)
}
// Parse resp.Choices[0].Message.ToolCalls
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o app main.go

FROM gcr.io/distroless/base
COPY --from=builder /app/app /app
# Pass the key at runtime (docker run -e GENIE3_API_KEY=...); don't bake it into the image
EXPOSE 8080
ENTRYPOINT ["/app"]
# values.yaml
image:
  repository: your-registry/genie-go
  tag: latest
env:
  GENIE3_API_KEY: sk-nov-xxx
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"
Error | Cause | Fix
---|---|---
401 Unauthorized | Wrong API key | Re-copy the key
429 Rate Limit | Concurrency over quota | Upgrade in the console
500 Internal | Prompt too long | Trim the context
git clone https://github.com/yourname/genie-oss-go-demo.git
cd genie-oss-go-demo
go run main.go
The repo contains:
From the lightweight 20B quick blade to the 120B reasoning monster to the 128 K ultra-long context, OpenAI OSS has ground the barrier of "large models" down to the horizon.
The next time a product manager asks "can the AI write its own CI/CD scripts?", you can smile and say:
"Give me 10 minutes. Go will handle it."