GLM 5.1 is Live on EmberCloud. Try It Today
New: OpenAI-compatible chat + embeddings

Affordable tokens at blazing-fast speeds

Serverless GPU inference for open-source models with predictable latency, simple pricing, and drop-in OpenAI-compatible APIs.

Zero cold starts · Usage + rate limits · OpenAI compatible

Global Infrastructure

Worldwide reach, blazing-fast inference

Distributed cloud infrastructure with an inference engine built for speed.

Integrate in seconds

Our API is fully compatible with the OpenAI SDK. Simply change the base URL and API key to switch to open-source models.

1. Get your API Key

Sign up and generate a key in the dashboard.

2. Configure Client

Point your existing SDK to EmberCloud endpoints.

main.py
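from openai import OpenAI

# Reuse the standard OpenAI client; only the base URL and API key change.
client = OpenAI(
    base_url="https://api.embercloud.ai/v1",
    api_key="ember_sk_..."
)

# Send a single-turn chat request to a hosted open model.
completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Hello World!"}
    ]
)

# The reply text is in the first choice's message.
print(completion.choices[0].message.content)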
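
For readers who are not using the OpenAI SDK, the call above can be approximated with the standard library alone. This is a minimal sketch, not official EmberCloud code: it assumes the service exposes the usual OpenAI-style `/chat/completions` path under the base URL and accepts a Bearer token, and the `EMBERCLOUD_API_KEY` environment variable and both function names are hypothetical.

```python
import json
import os
import urllib.request

# Base URL taken from the snippet above.
BASE_URL = "https://api.embercloud.ai/v1"


def build_chat_request(prompt, model="glm-4.7", api_key=None):
    """Build (but do not send) an OpenAI-style chat completion request."""
    # EMBERCLOUD_API_KEY is a hypothetical variable name, not an official one.
    key = api_key or os.environ.get("EMBERCLOUD_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the first choice's text (needs network)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `send_chat_request(build_chat_request("Hello World!"))` with a valid key should behave like the SDK example, provided the assumptions about the endpoint path and authentication header hold.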
Model Library

Production-ready models

Access top-tier open models through a unified, OpenAI-compatible API.

New
Chat
GLM

GLM 5.1

203K ctx
$0.931 in · $2.93 out / 1M
View pricing
Chat
GLM

GLM 5

745B MoE 203K ctx
$0.720 in · $2.30 out / 1M
View pricing
Chat
GLM

GLM 4.7

355B MoE 200K ctx
$0.380 in · $1.98 out / 1M
View pricing
Fast
Fast
GLM

GLM 4.7 Flash

30B MoE 200K ctx
$0.060 in · $0.400 out / 1M
View pricing
Popular
Code
GLM

GLM 4.6

200K ctx
$0.380 in · $1.55 out / 1M
View pricing
Vision
GLM

GLM 4.5V

106B MoE 65K ctx
$0.540 in · $1.60 out / 1M
View pricing
Chat
GLM

GLM 4.5

355B MoE 131K ctx
$0.600 in · $2.20 out / 1M
View pricing
Value
Chat
GLM

GLM 4.5 Air

131K ctx
$0.130 in · $0.850 out / 1M
View pricing
Value
Code
Qwen

Qwen3 Coder Next

262K ctx
$0.108 in · $0.675 out / 1M
View pricing
Chat
Kimi

Kimi K2.5

262K ctx
$0.405 in · $1.98 out / 1M
View pricing
Chat
MiniMax

MiniMax M2.5

196K ctx
$0.200 in · $1.20 out / 1M
View pricing
Transparent Pricing

Flexible token pricing

Pay only for what you generate. No idle costs.

Provider  Model                     Context  Input         Output        Cached Input
GLM       GLM 5.1 (New)             203K     $0.931 / 1M   $2.93 / 1M    $0.173 / 1M
GLM       GLM 5                     203K     $0.720 / 1M   $2.30 / 1M    $0.144 / 1M
GLM       GLM 4.7                   200K     $0.380 / 1M   $1.98 / 1M    $0.190 / 1M
GLM       GLM 4.7 Flash (Fast)      200K     $0.060 / 1M   $0.400 / 1M   $0.010 / 1M
GLM       GLM 4.6 (Popular)         200K     $0.380 / 1M   $1.55 / 1M    $0.070 / 1M
GLM       GLM 4.5V                  65K      $0.540 / 1M   $1.60 / 1M    $0.090 / 1M
GLM       GLM 4.5                   131K     $0.600 / 1M   $2.20 / 1M    $0.110 / 1M
GLM       GLM 4.5 Air (Value)       131K     $0.130 / 1M   $0.850 / 1M   $0.025 / 1M
Qwen      Qwen3 Coder Next (Value)  262K     $0.108 / 1M   $0.675 / 1M   $0.060 / 1M
Kimi      Kimi K2.5                 262K     $0.405 / 1M   $1.98 / 1M    $0.225 / 1M
MiniMax   MiniMax M2.5              196K     $0.200 / 1M   $1.20 / 1M    $0.040 / 1M
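
To make the per-million-token rates concrete, here is a rough cost estimator. It is a sketch under stated assumptions: the prices are copied from the table above for three of the listed models, the `estimate_cost` helper is hypothetical, and it assumes cached input tokens are billed at the cached rate instead of the regular input rate, which the page does not spell out.

```python
# USD per 1M tokens, copied from the pricing table: (input, output, cached input).
PRICES = {
    "GLM 5.1": (0.931, 2.93, 0.173),
    "GLM 4.7": (0.380, 1.98, 0.190),
    "GLM 4.7 Flash": (0.060, 0.400, 0.010),
}


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of a workload on one model.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cached-input rate in place of the regular input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    input_price, output_price, cached_price = PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price
        + cached_tokens * cached_price
        + output_tokens * output_price
    )
    return total / 1_000_000


# Example: 2M input tokens and 500K output tokens on GLM 4.7 Flash.
cost = estimate_cost("GLM 4.7 Flash", 2_000_000, 500_000)
print(f"${cost:.2f}")
```

With these numbers the example works out to 2 × $0.060 + 0.5 × $0.400, or about $0.32. Actual invoices may differ if billing rounds per request or treats cached tokens differently.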