# The API for AI builders.
ForgeStack is a hosted inference API platform built for teams shipping production AI features. Call fast multimodal models from one endpoint, keep p95 latency under 50ms, and scale from prototype to global traffic without babysitting GPUs.
## Quickstart
Choose your runtime, pass a prompt, and receive a streamed response from the nearest ForgeStack edge.
```python
from forgestack import ForgeStack

client = ForgeStack(api_key="fs_live_8xK_dev_preview_4f9a2")

response = client.responses.create(
    model="fs-pro-2.0",
    input=[
        {
            "role": "system",
            "content": "You are a concise build assistant for backend engineers.",
        },
        {
            "role": "user",
            "content": "Generate a Redis-backed rate limiter for a Next.js API route.",
        },
    ],
    latency_tier="edge",  # route to the nearest ForgeStack edge pool
    stream=True,          # receive tokens as they are generated
)

# Print the streamed output as text-delta events arrive.
for event in response:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```
- Requests are routed to the nearest warm model pool, with automatic regional fallback.
- Swap models without rewriting payloads, stream handlers, retry logic, or observability.
- Trace tokens, latency, errors, and spend per route, tenant, model, or environment (see the sketch below).
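Because request payloads are model-agnostic, switching models is a one-line change, and per-request tags can feed the same dashboards. A minimal sketch building on the quickstart client; the `metadata` parameter and the `output_text` accessor are assumptions about the SDK, not documented fields:

```python
from forgestack import ForgeStack

client = ForgeStack(api_key="fs_live_8xK_dev_preview_4f9a2")

response = client.responses.create(
    model="fs-mini-1.5",  # swapped from "fs-pro-2.0"; the rest of the payload is unchanged
    input=[
        {"role": "user", "content": "Classify this ticket: 'Billing page 500s on load.'"}
    ],
    latency_tier="edge",
    # Assumption: request tags the dashboard can group traces and spend by.
    metadata={"route": "/api/triage", "tenant": "acme", "env": "staging"},
)

# Assumption: non-streaming responses expose the full text on output_text.
print(response.output_text)
```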
## Your API key
Use the development key shown in the quickstart to test inference calls locally. Rotate keys from the dashboard before going to production.
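To keep keys out of source control, load them from the environment instead of hardcoding them; `FORGESTACK_API_KEY` is a placeholder name we chose, not an SDK convention:

```python
import os

from forgestack import ForgeStack

# Placeholder variable name; use whatever your deploy tooling already sets.
client = ForgeStack(api_key=os.environ["FORGESTACK_API_KEY"])
```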
## Pricing
Start free, graduate to higher throughput, then move to dedicated capacity when inference becomes core infrastructure.
| Plan | Free | Growth | Scale |
|---|---|---|---|
| Requests/sec | 5 | 100 | 1000 |
| Tokens/month | 100K | 10M | 1B |
| Models | fs-mini-1.5 | + fs-pro-2.0 | + custom finetunes |
| Price | $0 | $99/mo | Contact us |
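If you run near the Free tier's 5 requests/sec cap, client-side backoff keeps bursts from failing hard. A minimal sketch, assuming the SDK raises a dedicated exception on HTTP 429; `RateLimitError` below is a stand-in class, so swap in the SDK's real error type:

```python
import random
import time

from forgestack import ForgeStack

client = ForgeStack(api_key="fs_live_8xK_dev_preview_4f9a2")

class RateLimitError(Exception):
    """Stand-in for the SDK's real rate-limit exception."""

def create_with_backoff(payload: dict, max_retries: int = 5):
    """Retry responses.create with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return client.responses.create(**payload)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter so concurrent clients
            # don't retry in lockstep against the same edge pool.
            time.sleep(2 ** attempt + random.random())
```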
## Changelog
ForgeStack releases ship weekly with latency tuning, model upgrades, SDK improvements, and dashboard telemetry.
- Streaming responses now start at the edge, with median first-token latency below 32ms.
- Pin to stable aliases like fs-pro-latest while preserving audit logs for every resolved model.
- Set hard token budgets per environment and receive webhook alerts before usage spikes (see the receiver sketch below).
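On the receiving side, a budget alert is just an HTTP POST to an endpoint you control. A standard-library sketch; the payload fields (`environment`, `tokens_used`, `budget`) are our assumptions about the event shape, so verify them against the dashboard's webhook settings:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class BudgetAlertHandler(BaseHTTPRequestHandler):
    """Receives budget alert webhooks. Payload fields are assumed, not documented."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        # Assumed shape: {"environment": "...", "tokens_used": int, "budget": int}
        used, budget = event.get("tokens_used", 0), event.get("budget", 1)
        print(f"[{event.get('environment')}] {used}/{budget} tokens ({used / budget:.0%})")
        self.send_response(200)  # acknowledge quickly; do real work out of band
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), BudgetAlertHandler).serve_forever()
```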
## Dashboard
Inspect live traffic, compare model performance, rotate keys, and promote inference routes across environments.
## Models
- fs-mini-1.5: fastest low-cost model for extraction, routing, classification, and short-form generation.
- fs-pro-2.0: balanced reasoning and latency for production copilots, agents, and developer workflows.
- Dedicated capacity: reserved regional capacity, custom finetunes, tenant isolation, and stricter spend controls.
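In practice you can route short extraction and classification calls to fs-mini-1.5 and reserve fs-pro-2.0 for heavier reasoning, without changing payload shape. A sketch; the task-to-model mapping is illustrative, not a ForgeStack feature:

```python
from forgestack import ForgeStack

client = ForgeStack(api_key="fs_live_8xK_dev_preview_4f9a2")

# Illustrative routing table: the cheap model for short-form tasks,
# the larger model for multi-step reasoning.
MODEL_FOR_TASK = {
    "classify": "fs-mini-1.5",
    "extract": "fs-mini-1.5",
    "copilot": "fs-pro-2.0",
}

def run(task: str, prompt: str, stream: bool = False):
    """Dispatch a prompt to the model suited to the task tier."""
    return client.responses.create(
        model=MODEL_FOR_TASK[task],
        input=[{"role": "user", "content": prompt}],
        latency_tier="edge",
        stream=stream,
    )
```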