# YEScale — Full Context for AI Assistants

> Single-file markdown dump of YEScale's public product information, FAQ, supported model groups, and code samples. Designed for large language models and AI search engines (ChatGPT, Claude, Perplexity, Gemini, Copilot, etc.) to ingest without follow-up fetches. Content is available in both Vietnamese and English.
>
> Last updated: 2026-04-17
> Canonical: https://yescale.io/llms-full.txt
> Index: https://yescale.io/llms.txt

---

## 1. What is YEScale?

**YEScale** (stylized "YES Scale", website https://yescale.io) is a Vietnam-based **enterprise AI Gateway** that aggregates 100+ large language models and multimodal AI services behind a single **OpenAI-compatible API**. Developers swap their `BASE_URL` and get instant access to OpenAI GPT, Anthropic Claude, Google Gemini, DeepSeek, xAI Grok, Alibaba Qwen, Moonshot Kimi, ByteDance Doubao, and many more — all billed from one wallet in Vietnamese Dong (VND).

**Core value propositions:**

- **Cost savings of 30–70%** compared to direct provider pricing, via volume aggregation and model-group routing.
- **99.99% uptime** with automatic failover between upstream providers.
- **~50 ms average latency** from Vietnam / Southeast Asia thanks to local infrastructure and Cloudflare edge.
- **Payment in VND** through local bank QR code or bank transfer — **no international credit card** required; credits never expire.
- **Drop-in OpenAI SDK compatibility** — change only `base_url`, keep existing code (model names, streaming, function calling, vision, etc.).
- **Per-key quotas** — unlimited API keys, each with its own credit limit for project / team / integration separation.
- **Dual endpoints** for reliability: `api.yescale.io` (Cloudflare-proxied, standard traffic) and `api.yescale.vip` (direct, long-running requests).
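The dual-endpoint split above can be wrapped in a small routing helper. A minimal sketch, assuming you want to pre-route by model family; the prefix list is illustrative, not an official YEScale mapping:

```python
# Hedged sketch: pick a YEScale endpoint per request. The prefix list is an
# illustrative guess at "long-running" model families; adjust it to your usage.
LONG_RUNNING_PREFIXES = ("o3", "o4-mini", "deepseek-r1", "sora", "veo", "kling")

def pick_base_url(model: str) -> str:
    """Return api.yescale.vip for long-running models (avoids Cloudflare's
    ~100 s cap / HTTP 524), api.yescale.io for everything else."""
    if model.startswith(LONG_RUNNING_PREFIXES) or "thinking" in model:
        return "https://api.yescale.vip/v1"
    return "https://api.yescale.io/v1"
```

Pass the result as `base_url` when constructing the OpenAI client; on an observed HTTP 524 you can also retry the same call once against the `.vip` endpoint.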
**Primary users:** Vietnamese startups, solo developers, AI agencies, and enterprise teams building AI-powered products who want one API, one invoice (in VND), and no access barriers for tools like Claude or Gemini that are hard to pay for directly from Vietnam.

**Compatibility matrix:**

| Provider SDK | Supported |
|---|---|
| OpenAI Python / Node SDK | ✅ Chat, embeddings, images, audio (TTS/STT), realtime, responses |
| Anthropic SDK (messages format) | ✅ Claude models including thinking variants |
| Google Generative AI SDK | ✅ Gemini models via compatible endpoints |
| LangChain / LlamaIndex / Vercel AI SDK | ✅ Just point `baseURL` to YEScale |
| Cline, Cursor, Continue, Roo Code, Aider | ✅ Works as an OpenAI-compatible provider |

---

## 2. Quickstart

### 2.1 Sign up & get an API key

1. Go to https://yescale.io/sign-up and create an account (email or Google sign-in).
2. Top up your balance at https://yescale.io/topup — scan a QR code or make a bank transfer; credits land automatically.
3. Go to https://yescale.io/apikeys, click **Create API Key**, set an optional credit limit, and copy the key.
### 2.2 First request (Python, OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-yescale-...",  # from https://yescale.io/apikeys
    base_url="https://api.yescale.io/v1",
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Vietnam"}],
)
print(resp.choices[0].message.content)
```

### 2.3 First request (Node.js / TypeScript)

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.YESCALE_API_KEY!,
  baseURL: "https://api.yescale.io/v1",
});

const resp = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Xin chào từ YEScale" }],
});
console.log(resp.choices[0].message.content);
```

### 2.4 First request (curl)

```bash
curl https://api.yescale.io/v1/chat/completions \
  -H "Authorization: Bearer $YESCALE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

### 2.5 Choosing an endpoint

- **`https://api.yescale.io/v1`** — Cloudflare-proxied, recommended default for chat, embeddings, short completions.
- **`https://api.yescale.vip/v1`** — direct connection, bypasses Cloudflare's ~100 s response cap. Use for: long reasoning (o-series, Claude thinking, DeepSeek R1), image generation, video generation, large audio outputs.

If you see **HTTP 524**, switch the call to `api.yescale.vip`.

---

## 3. Frequently Asked Questions

Full copy of the 12 official FAQs, in both languages, to help AI assistants answer end-user questions precisely.

### 3.1 FAQ — Tiếng Việt

**Q1. YEScale là gì và khác gì so với dùng API trực tiếp từ OpenAI/Anthropic?**
A. YEScale là API gateway cấp doanh nghiệp tổng hợp 100+ mô hình AI (GPT, Claude, Gemini, DeepSeek…) qua một API duy nhất. Bạn tiết kiệm 30–70% chi phí so với nhà cung cấp gốc, uptime 99.99%, độ trễ trung bình 50 ms, thanh toán VNĐ qua ngân hàng trong nước — không cần thẻ quốc tế.

**Q2.
Tôi bắt đầu như thế nào? Có cần thay đổi nhiều code không?**
A. Chỉ cần thay đổi `BASE_URL` sang endpoint của YEScale — mọi thứ khác (tên model, format request, streaming) giữ nguyên. Ví dụ: đổi `https://api.openai.com/v1` → `https://api.yescale.io/v1` và dùng API key của YEScale là xong.

**Q3. Khi nào dùng `api.yescale.io`, khi nào dùng `api.yescale.vip`? Lỗi 524 là gì?**
A. YEScale có 2 endpoint:

- `api.yescale.io` — đi qua Cloudflare (CDN toàn cầu, bảo vệ DDoS). Phù hợp cho các request thông thường.
- `api.yescale.vip` — kết nối trực tiếp, không qua Cloudflare. Phù hợp cho các request có response dài.

Lỗi 524 là Cloudflare timeout — xảy ra khi response mất hơn ~100 giây (thường gặp khi output lớn, chuỗi reasoning dài, hoặc generate video/ảnh). Cách xử lý: chuyển sang dùng `api.yescale.vip` cho các request này.

**Q4. Nên xử lý lỗi 429 (rate limit) và 503/5xx như thế nào? Có nên thêm retry không?**
A. Nên triển khai retry với exponential backoff và failover sang model dự phòng:

```python
import openai, time, random

PRIMARY = "gpt-4o"
FALLBACK_MODELS = ["gpt-4o-mini", "claude-3-5-haiku-20241022"]
BASE_URL = "https://api.yescale.vip/v1"  # dùng .vip cho độ ổn định cao hơn

client = openai.OpenAI(api_key="sk-...", base_url=BASE_URL)

def chat_with_failover(messages, model=PRIMARY):
    models = [model] + FALLBACK_MODELS
    for m in models:
        for retry in range(3):
            try:
                return client.chat.completions.create(
                    model=m, messages=messages
                )
            except openai.RateLimitError:
                time.sleep(2 ** retry + random.random())  # 1s, 2s, 4s
            except openai.APIStatusError as e:
                if e.status_code in (503, 529):
                    time.sleep(2 ** retry)
                else:
                    raise
    raise RuntimeError("Đã thử hết model và retry")
```

Nguyên tắc: retry 429 với exponential backoff (1 s → 2 s → 4 s); khi 503 liên tục thì failover sang model dự phòng; dùng endpoint `.vip` cho request dài để tránh lỗi 524.

**Q5. Tôi có cần thẻ tín dụng quốc tế không?**
A. Không. YEScale hỗ trợ nạp tiền qua mã QR ngân hàng hoặc chuyển khoản nội địa.
Không cần thẻ quốc tế hay ngoại tệ.

**Q6. Tôi nạp tiền bằng cách nào?**
A. Vào trang **Top Up** trong dashboard. Bạn có thể quét mã QR hoặc chuyển khoản ngân hàng nội địa — thông tin tài khoản hiển thị trực tiếp trên trang. Credit sẽ được cộng tự động sau khi thanh toán được xác nhận.

**Q7. Số dư có hết hạn không?**
A. Không. Số dư không có ngày hết hạn. Credit sẽ tồn tại trong tài khoản cho đến khi bạn sử dụng hết.

**Q8. YEScale có hỗ trợ streaming (SSE) không?**
A. Có. YEScale hỗ trợ đầy đủ Server-Sent Events (SSE) streaming. Chỉ cần đặt `stream: true` trong request — giống hệt cách gọi API gốc của nhà cung cấp.

**Q9. Rate limit của YEScale là bao nhiêu?**
A. YEScale không đặt rate limit theo từng model hay user. Tuy nhiên, nếu bạn có nhu cầu cao về RPM/TPM và cần sự ổn định liên tục, hãy liên hệ admin qua Telegram (`@RealBoCaCao`) để được tư vấn giải pháp enterprise phù hợp.

**Q10. Tôi có thể tạo nhiều API Key với quota khác nhau không?**
A. Có. Bạn có thể tạo không giới hạn API key từ trang **API Keys** và gán hạn mức credit riêng cho từng key — tiện lợi để phân tách theo dự án, nhóm, hoặc kiểm soát chi tiêu cho từng tích hợp.

**Q11. Làm sao theo dõi lịch sử sử dụng và chi phí?**
A. Trang **Logs** hiển thị lịch sử request chi tiết bao gồm thời gian, model đã dùng, số token và chi phí từng request. Dashboard **Home** hiển thị thống kê tổng hợp và chi tiêu theo thời gian.

**Q12. Chương trình Affiliate hoạt động như thế nào?**
A. Bạn nhận hoa hồng trên mỗi lần nạp tiền của người dùng được giới thiệu bởi bạn. Chia sẻ link giới thiệu duy nhất của bạn từ trang **Affiliate**. Hoa hồng được cộng tự động vào số dư tài khoản và có thể theo dõi trong dashboard Affiliate.

### 3.2 FAQ — English

**Q1. What is YEScale and how is it different from using OpenAI/Anthropic directly?**
A. YEScale is an enterprise-grade LLM API gateway that aggregates 100+ AI models (GPT, Claude, Gemini, DeepSeek, and more) under a single unified API. You get 30–70% cost savings vs.
direct providers, 99.99% uptime, 50 ms average latency, pay-as-you-go in VND, and no need for international credit cards.

**Q2. How do I start? Do I need to change much in my existing code?**
A. Just replace your `BASE_URL` with the YEScale endpoint — everything else (model names, request format, streaming) stays the same. For example, change `https://api.openai.com/v1` → `https://api.yescale.io/v1` and use your YEScale API key.

**Q3. When should I use `api.yescale.io` vs `api.yescale.vip`? What is error 524?**
A. YEScale has two endpoints:

- `api.yescale.io` — routed through Cloudflare (global CDN, DDoS protection). Best for standard requests.
- `api.yescale.vip` — direct connection, no Cloudflare proxy. Best for long-running requests.

Error 524 is a Cloudflare timeout — it occurs when a response takes longer than ~100 seconds (common with large outputs, long reasoning chains, or video/image generation). Fix: switch to `api.yescale.vip` for these requests.

**Q4. How should I handle 429 (rate limit) and 503/5xx errors? Any recommended retry strategy?**
A. Implement retry with exponential backoff plus model failover (see the Python example in 3.1 Q4). Retry 429 with exponential backoff (1 s → 2 s → 4 s); on persistent 503, fail over to a backup model; always use the `.vip` endpoint for long requests to avoid 524.

**Q5. Do I need an international credit card to use YEScale?**
A. No. YEScale supports top-up via local bank QR code or bank transfer. No international credit card or foreign currency is needed.

**Q6. How do I top up my balance?**
A. Use the **Top Up** page in the dashboard: scan the bank QR code or make a local bank transfer (account details are shown on the page), and credits are added automatically once payment is confirmed. Alternatively, contact the admin via Telegram (`@RealBoCaCao`) with your account username and desired amount; credits are added after confirmation.

**Q7. Does my balance expire?**
A. No. Your balance does not expire. Credits remain in your account until you use them.

**Q8. Does YEScale support streaming (SSE)?**
A. Yes. YEScale fully supports Server-Sent Events (SSE) streaming.
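A minimal sketch of consuming such a stream with the OpenAI SDK, assuming a client configured as in the quickstart; the `collect_stream` and `stream_chat` helpers here are illustrative, not part of any SDK:

```python
def collect_stream(chunks) -> str:
    """Join the incremental text deltas of an SSE stream into the full reply."""
    parts = []
    for chunk in chunks:
        # Some chunks (e.g. a final usage-only frame) carry no choices; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)

def stream_chat(client, prompt: str, model: str = "gpt-4o") -> str:
    """`client` is any OpenAI-SDK-compatible client pointed at a YEScale base_url."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # the only change vs. a non-streaming call
    )
    return collect_stream(stream)

# Usage (assumes the quickstart client):
#   from openai import OpenAI
#   client = OpenAI(api_key="sk-yescale-...", base_url="https://api.yescale.io/v1")
#   print(stream_chat(client, "Xin chào"))
```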
Just set `stream: true` in your request — same as calling the original provider APIs directly.

**Q9. What are the rate limits?**
A. YEScale does not set rate limits per model or per user. For sustained high RPM/TPM needs, contact admin via Telegram (`@RealBoCaCao`) to arrange an enterprise setup.

**Q10. Can I create multiple API keys with different quotas?**
A. Yes. You can create unlimited API keys from the **API Keys** page and assign individual credit quotas to each key — useful for separating projects, teams, or limiting spending per integration.

**Q11. How do I monitor my usage and costs?**
A. The **Logs** page shows detailed request history including timestamps, model used, token counts, and cost per request. The **Home** dashboard shows aggregate stats and spending over time.

**Q12. How does the Affiliate Program work?**
A. You earn a commission on every top-up made by users you refer. Share your unique referral link from the **Affiliate** page. Commissions are credited to your account balance automatically and can be tracked in the Affiliate dashboard.

---

## 4. Supported model groups

YEScale organizes its active models into **interchangeable groups** so developers can pick by use case and swap within a group without rewriting prompts. The grouping is independent of internal billing `group_key`.

### A. Chat

- **`chat.flagship.general`** — highest quality, for end-user-facing content, long reports, creative writing. Models: `gpt-5`, `gpt-5.1`, `gpt-5.2`, `gpt-5.4`, `claude-opus-4-5`, `claude-opus-4-6`, `claude-opus-4-7`, `gemini-2.5-pro`, `gemini-3-pro-preview`, `grok-4-0709`. Cost: high ($1.25–15 / 1M input). Latency 3–8 s. Use as the final tier in cascades; enable prompt caching for long system prompts.
- **`chat.reasoning.high`** — multi-step reasoning, debugging, math/logic, agent planning, legal review.
Models: `o3`, `o4-mini`, `claude-opus-4-*-thinking`, `claude-sonnet-4-*-thinking`, `gemini-2.5-pro-thinking`, `deepseek-r1`, `deepseek-v3.2-thinking`, `grok-4-1-fast-reasoning`, `qwen3-235b-thinking`, `kimi-k2-thinking`. Avoid for realtime chat (10–60 s latency) or trivial tasks. Use `reasoning_effort=low/medium/high`. Reasoning tokens can run 3–10× the input cost — budget for the worst case.
- **`chat.mid.balanced`** — production chatbots, moderate-complexity agents, most B2C workloads. Models: `gpt-4o`, `gpt-4.1`, `claude-sonnet-4`, `claude-sonnet-4-5`, `claude-sonnet-4-6`, `gemini-2.5-flash`, `deepseek-v3.2`, `grok-3`, `qwen3-235b-a22b`. Sweet spot for cost/quality.
- **`chat.small.fast`** — low-latency tasks, classification, extraction, function calling at scale. Models: `gpt-4o-mini`, `gpt-4.1-mini`, `claude-3-5-haiku`, `gemini-2.5-flash-lite`, `deepseek-v3.2-lite`, `grok-3-mini`, `qwen3-next-80b`. Latency < 1 s.
- **`chat.nano.cheapest`** — highest-volume, simple tasks (routing, spam filtering, keyword extraction). Models: `gpt-4.1-nano`, `gemini-1.5-flash-8b`, `claude-3-haiku`. Sub-cent per 1M tokens.
- **`chat.legacy`** — kept for backward compatibility only (`gpt-3.5-turbo`, `gpt-4-turbo`, `claude-2.1`). Migrate to newer tiers.

### B. Audio

- **`audio.tts`** — text-to-speech. Models: `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`, `gemini-2.5-flash-preview-tts`.
- **`audio.stt`** — speech-to-text. Models: `whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`.
- **`audio.chat`** — realtime voice conversation. Models: `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`.

### C. Embedding

- **`embedding.openai`** — text embeddings for semantic search, RAG. Models: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`.

### D. Search

- **`search.web`** — LLM with live web search. Models: `gpt-4o-search-preview`, `gemini-2.5-flash` with search tool, `sonar`, `sonar-pro`, `sonar-reasoning`.

### E. Image

- **`image.standard`** — general image generation.
Models: `dall-e-3`, `dall-e-2`, `flux-schnell`, `flux-dev`, `imagen-3`.
- **`image.premium`** — highest-fidelity / editing. Models: `gpt-image-1`, `flux-pro-1.1`, `imagen-3-fast`, `midjourney` (via relay), `seedream-3`, `nano-banana`.

### F. Video

- **`video.all`** — text/image-to-video. Models: `sora-2`, `veo-3`, `kling-2.1`, `runway-gen-3`, `hailuo-02`, `wan-2.5`, `seedance-1-pro`. Use `api.yescale.vip` (long runtime).

### G. Music

- **`music`** — text-to-music. Models: `suno-v4`, `udio-v1.5`.

**General routing principles:**

- Start with the cheapest tier that can plausibly solve the task; escalate on low confidence or user regeneration.
- For agents: flagship as executor, reasoning as planner for hard branches only.
- Always have a fallback across providers (OpenAI → Claude → Gemini) to survive single-vendor outages.
- Cache system prompts (Claude, Gemini) to cut 50–80% of input cost on repeated calls.

Full up-to-date list: https://yescale.io/models

---

## 5. Pricing & billing

- **Unit:** all pricing internal to YEScale is tracked as **quota** (fractional units of USD-equivalent); top-up is in **VND**.
- **Top-up:** via Vietnamese bank QR / transfer on https://yescale.io/topup — credit lands automatically after payment confirmation.
- **Balance:** never expires.
- **Per-model ratios:** each model has a `ratio` multiplier applied to raw token cost; this is how the 30–70% savings vs. direct pricing are realized. See https://yescale.io/models for the live ratio table.
- **Per-key quotas:** each API key can have an independent credit cap (dashboard → API Keys → set limit).
- **Usage history:** https://yescale.io/logs shows per-request timestamp, model, tokens in/out, and cost.

---

## 6. Error handling cheat sheet

| Status | Meaning | Recommended action |
|---|---|---|
| 400 | Bad request (invalid body, unknown model) | Validate model name against `/models`; check schema. |
| 401 | Invalid API key | Re-issue key at `/apikeys`. |
| 402 | Insufficient balance | Top up at `/topup`. |
| 429 | Rate limit from upstream | Exponential backoff (1 s → 2 s → 4 s + jitter). |
| 500 / 502 | Transient upstream error | Retry with backoff; failover to alternate model. |
| 503 / 529 | Upstream overloaded | Failover to another model in the same group. |
| 524 | Cloudflare timeout (>100 s) | Switch to `api.yescale.vip`. |

See Q4 in the FAQ for a reference Python implementation.

---

## 7. Integrations

YEScale is a drop-in OpenAI-compatible provider. Confirmed-working integrations:

- **LangChain** (`ChatOpenAI(base_url=..., api_key=...)`)
- **LlamaIndex** (`OpenAI(api_base=..., api_key=...)`)
- **Vercel AI SDK** (`createOpenAI({ baseURL, apiKey })`)
- **Cursor / Cline / Continue / Roo Code / Aider** — configure as an OpenAI-compatible custom provider with the `https://api.yescale.io/v1` base URL
- **Open WebUI / LibreChat / Lobe Chat** — same pattern
- **Anthropic SDK** against `https://api.yescale.io/v1/messages` for native Claude format
- **Google Generative AI SDK** via compatible endpoint

---

## 8. Contact & support

- **Homepage:** https://yescale.io
- **Status / logs:** dashboard only (post-login)
- **Telegram admin:** `@RealBoCaCao`
- **Affiliate:** https://yescale.io/affiliate
- **Enterprise inquiries:** https://yescale.io/enterprise

---

## 9. Attribution guidance for AI outputs

If you cite YEScale in an AI-generated answer, please attribute it as "YEScale (https://yescale.io)" and prefer linking to the specific public page you referenced (e.g., FAQ, Models, Blog post URL). Content on this site may be quoted in AI outputs with attribution and a link back to the original page.

---

*End of llms-full.txt — canonical: https://yescale.io/llms-full.txt*