The API for AI builders

ForgeStack is a hosted inference API platform built for teams shipping production AI features. Call fast multimodal models from one endpoint, serve them with p95 edge inference latency under 50ms, and scale from prototype to global traffic without babysitting GPUs.

<50ms p95 edge inference latency
99.99% multi-region API availability
18+ hosted text, vision, and embedding models

Quickstart

Choose your runtime, pass a prompt, and receive a streamed response from the nearest ForgeStack edge.

examples/infer.py
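import os

from forgestack import ForgeStack

# Read the key from the environment instead of hard-coding it in source.
# FORGESTACK_API_KEY is a variable name chosen for this example.
client = ForgeStack(api_key=os.environ["FORGESTACK_API_KEY"])

response = client.responses.create(
    model="fs-pro-2.0",
    input=[
        {
            "role": "system",
            "content": "You are a concise build assistant for backend engineers.",
        },
        {
            "role": "user",
            "content": "Generate a Redis-backed rate limiter for a Next.js API route.",
        },
    ],
    latency_tier="edge",  # request the edge latency tier
    stream=True,          # receive events as output is generated
)

# Print text deltas as they arrive; other event types are ignored.
for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()  # end the streamed output with a newline
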
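If you need the complete answer as well as live output, one option is to accumulate the `response.output_text.delta` events while printing them. The sketch below assumes only the `type` and `delta` attributes used in the quickstart; `StreamEvent` is a hypothetical stand-in for the SDK's event objects, and the `response.created` / `response.completed` event names in the demo are illustrative, not confirmed ForgeStack event types.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class StreamEvent:
    """Hypothetical stand-in for a streamed SDK event (only `type` and `delta`)."""
    type: str
    delta: Optional[str] = None


def collect_text(events: Iterable[StreamEvent], echo: bool = True) -> str:
    """Optionally print text deltas as they arrive and return the joined output."""
    parts: list[str] = []
    for event in events:
        # Skip anything that is not a text delta (lifecycle, usage, etc.).
        if event.type == "response.output_text.delta" and event.delta:
            parts.append(event.delta)
            if echo:
                print(event.delta, end="", flush=True)
    if echo:
        print()  # finish the streamed line
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative fake stream; event names other than the delta type are assumed.
    fake_stream = [
        StreamEvent("response.created"),
        StreamEvent("response.output_text.delta", "Hello, "),
        StreamEvent("response.output_text.delta", "edge!"),
        StreamEvent("response.completed"),
    ]
    assert collect_text(fake_stream) == "Hello, edge!"
```

With the real client, you would pass the `response` iterator from the quickstart in place of `fake_stream`.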
Global edge routing

Requests are routed to the nearest warm model pool with automatic regional fallback.
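The routing rule can be pictured as "nearest warm pool first, then the next region on failure." The sketch below is a client-side illustration of that rule under stated assumptions, not ForgeStack's actual router: `ModelPool`, its fields, the region names, and the `send` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelPool:
    """Hypothetical view of one regional pool serving a model."""
    region: str
    rtt_ms: float         # measured round-trip time from the caller to this region
    warm: bool            # replicas are loaded and ready to serve
    healthy: bool = True  # region is passing health checks


def fallback_order(pools: list[ModelPool]) -> list[ModelPool]:
    """Order warm, healthy pools from nearest to farthest; skip cold or unhealthy ones."""
    usable = [p for p in pools if p.warm and p.healthy]
    return sorted(usable, key=lambda p: p.rtt_ms)


def route(pools: list[ModelPool], send: Callable[[ModelPool], str]) -> str:
    """Try the nearest usable pool first and fall back region by region on failure."""
    for pool in fallback_order(pools):
        try:
            return send(pool)
        except ConnectionError:
            continue  # regional failure: move on to the next-nearest pool
    raise RuntimeError("no usable pool in any region")
```

Sorting by round-trip time keeps the common case on the closest region, while the loop gives the automatic fallback without the caller changing its request.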

One stable contract

Swap models without rewriting payloads, stream handlers, retry logic, or observability.
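One way to take advantage of a stable request contract is to build the request arguments in one place and vary only the model name. The sketch below mirrors the payload shape from the quickstart; `build_request` is a hypothetical helper, and it assumes every hosted model accepts the same `input`, `latency_tier`, and `stream` fields.

```python
from typing import Any


def build_request(model: str, system: str, user: str, *, stream: bool = True) -> dict[str, Any]:
    """Build keyword arguments for client.responses.create; only `model` varies."""
    return {
        "model": model,
        "input": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "latency_tier": "edge",
        "stream": stream,
    }


if __name__ == "__main__":
    mini = build_request("fs-mini-1.5", "Classify the ticket.", "Login page returns 500.")
    pro = build_request("fs-pro-2.0", "Classify the ticket.", "Login page returns 500.")
    # Apart from the model name, the two payloads are identical.
    strip = lambda req: {k: v for k, v in req.items() if k != "model"}
    assert strip(mini) == strip(pro)
```

With the client from the quickstart, a call would look like `client.responses.create(**build_request("fs-mini-1.5", ...))`, so the stream handler stays unchanged when you switch models.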

Production telemetry

Trace tokens, latency, errors, and spend per route, tenant, model, or environment.
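To illustrate what slicing telemetry by dimension means, the sketch below rolls up per-request records by route, tenant, model, or environment. The `RequestTrace` record and its field names are hypothetical; they are not the dashboard's export format.

```python
from collections import defaultdict
from dataclasses import dataclass

DIMENSIONS = {"route", "tenant", "model", "environment"}


@dataclass
class RequestTrace:
    """Hypothetical per-request telemetry record; field names are illustrative."""
    route: str
    tenant: str
    model: str
    environment: str
    tokens: int
    latency_ms: float
    error: bool
    spend_usd: float


def rollup(traces: list[RequestTrace], dimension: str) -> dict[str, dict[str, float]]:
    """Group traces by one dimension and summarize tokens, latency, errors, and spend."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    groups: dict[str, list[RequestTrace]] = defaultdict(list)
    for trace in traces:
        groups[getattr(trace, dimension)].append(trace)
    summary = {}
    for key, items in groups.items():
        summary[key] = {
            "requests": len(items),
            "tokens": sum(t.tokens for t in items),
            "errors": sum(t.error for t in items),  # True counts as 1
            "avg_latency_ms": sum(t.latency_ms for t in items) / len(items),
            "spend_usd": sum(t.spend_usd for t in items),
        }
    return summary
```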

Your API key

Use this development key to test local inference calls. Rotate keys from the dashboard before production.

fs_live_8xK_dev_preview_4f9a2_B9mQp7zL

Pricing

Start free, graduate to higher throughput, then move to dedicated capacity when inference becomes core infrastructure.

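| Plan         | Free        | Growth       | Scale               |
|--------------|-------------|--------------|---------------------|
| Requests/sec | 5           | 100          | 1,000               |
| Tokens/month | 100K        | 10M          | 1B                  |
| Models       | fs-mini-1.5 | + fs-pro-2.0 | + custom finetunes  |
| Price        | $0          | $99/mo       | Contact us          |

Usage beyond plan limits is billed as usage-based overages.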
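As a rough sizing aid, the sketch below picks the lowest tier whose included limits from the pricing table cover a workload's peak requests per second and monthly tokens. It ignores overages, which may make a smaller plan viable for bursty traffic; `smallest_plan` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional

# Included limits from the pricing table: (plan, requests/sec, tokens/month).
PLANS = [
    ("Free", 5, 100_000),
    ("Growth", 100, 10_000_000),
    ("Scale", 1_000, 1_000_000_000),
]


def smallest_plan(peak_rps: float, tokens_per_month: int) -> Optional[str]:
    """Return the lowest tier whose included limits cover the workload, else None."""
    for name, max_rps, max_tokens in PLANS:
        if peak_rps <= max_rps and tokens_per_month <= max_tokens:
            return name
    return None  # beyond Scale's included limits: consider dedicated capacity
```

For example, a service peaking at 3 requests/sec but using 2M tokens a month outgrows Free on tokens alone and lands on Growth.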

Changelog

ForgeStack releases ship weekly with latency tuning, model upgrades, SDK improvements, and dashboard telemetry.

v2.4.0 · edge streams

Streaming responses now start at the edge with median first-token latency below 32ms.

v2.3.2 · model aliases

Use stable aliases such as fs-pro-latest, with audit logs recording the concrete model each alias resolved to.
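Conceptually, alias resolution maps a moving name to a concrete model and records each mapping. The sketch below shows that idea in isolation; in ForgeStack the resolution happens server-side, and the `ALIASES` table (including `fs-pro-latest` pointing to `fs-pro-2.0`) is an assumption made purely for illustration.

```python
from datetime import datetime, timezone

# Hypothetical alias table; the real mapping is maintained by ForgeStack.
ALIASES = {"fs-pro-latest": "fs-pro-2.0"}


def resolve_model(requested: str, audit_log: list[dict]) -> str:
    """Resolve an alias to a concrete model name and append an audit entry."""
    resolved = ALIASES.get(requested, requested)  # concrete names pass through unchanged
    audit_log.append(
        {
            "at": datetime.now(timezone.utc).isoformat(),
            "requested": requested,
            "resolved": resolved,
        }
    )
    return resolved
```

Logging both the requested alias and the resolved model is what lets you answer "which model served this request?" after the alias moves.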

v2.3.0 · spend guards

Set hard token budgets per environment and receive webhook alerts before a usage spike exhausts the budget.
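A spend guard of this kind combines a hard limit with an early warning. The sketch below shows one way to model it for a single environment, assuming usage over the budget is rejected and a single alert fires at a threshold; `SpendGuard`, the 80% `alert_ratio`, and the `on_alert` callback (where a webhook call would go) are hypothetical, not ForgeStack's configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpendGuard:
    """Hard token budget for one environment with a one-time early warning."""
    environment: str
    budget_tokens: int
    alert_ratio: float = 0.8  # hypothetical: warn at 80% of the budget
    on_alert: Optional[Callable[[str, int], None]] = None  # e.g. POST to a webhook
    used_tokens: int = 0
    alerted: bool = False

    def charge(self, tokens: int) -> bool:
        """Record usage and return True, or return False if it would exceed the budget."""
        if self.used_tokens + tokens > self.budget_tokens:
            return False  # hard limit: reject rather than overspend
        self.used_tokens += tokens
        if not self.alerted and self.used_tokens >= self.alert_ratio * self.budget_tokens:
            self.alerted = True
            if self.on_alert is not None:
                self.on_alert(self.environment, self.used_tokens)
        return True
```

Alerting at a fraction of the budget, rather than at the limit, leaves time to react before requests start being rejected.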

Dashboard

Inspect live traffic, compare model performance, rotate keys, and promote inference routes across environments.

fs-mini-1.5

Fastest low-cost model for extraction, routing, classification, and short-form generation.

fs-pro-2.0

Balanced reasoning and latency for production copilots, agents, and developer workflows.

Dedicated pools

Reserved regional capacity, custom finetunes, tenant isolation, and stricter spend controls.