Volt Spark

Tokens-as-a-service, OpenAI-compatible

Frontier open-weights LLMs in production, served in your customer's metro. Bedrock-beating prices with zero egress.

$0.95/M tokens, Llama 70B standard

Request access Docs

OpenAI drop-in

Change the base URL and key. Your existing SDK code keeps working.

Zero egress, in-metro serving

Tokens are served in your customer’s metro. Data never leaves the city.

Sovereign tier

Pod-pinned inference with attestation. $1.45/M tokens. Zero egress, sovereign, multi-vendor.
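To see what the two quoted rates mean for a real bill, here is a minimal sketch of the arithmetic. It assumes a single flat rate applies to every token (the page does not say whether input and output tokens are priced differently), and the function name, tier labels, and the 250M-token workload are all hypothetical.

```python
# Per-million-token prices quoted on this page (USD).
# Assumption: one flat rate covers all tokens on a tier.
PRICE_PER_M_TOKENS = {
    "standard": 0.95,   # Llama 70B standard
    "sovereign": 1.45,  # Sovereign tier
}


def estimate_cost(tokens: int, tier: str = "standard") -> float:
    """Return the estimated USD cost for `tokens` tokens on the given tier."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[tier]


# Hypothetical workload: 250 million tokens in a month.
monthly_tokens = 250_000_000
for tier in PRICE_PER_M_TOKENS:
    print(f"{tier}: ${estimate_cost(monthly_tokens, tier):,.2f}")
```

At that volume the estimate comes to $237.50 on the standard rate and $362.50 on the Sovereign tier.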

Drop-in compatible

Point the OpenAI SDK at Volt. No rewrite, no new client library.

from openai import OpenAI

# Only the base URL and API key differ from a stock OpenAI setup.
client = OpenAI(
    base_url="https://api.cuemby.cloud/v1",
    api_key="volt-...",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello, Volt."}],
)

print(resp.choices[0].message.content)
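For teams that call the API without the SDK, the request the example above produces can be sketched with the standard library alone. This assumes Volt exposes the usual OpenAI-style `/chat/completions` path under the base URL and accepts a Bearer token, which is what the OpenAI SDK sends; `build_chat_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.cuemby.cloud/v1"  # base URL from the example above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The key is a placeholder, exactly as in the SDK example.
req = build_chat_request("volt-...", "llama-3.3-70b", "Hello, Volt.")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Passing `req` to `urllib.request.urlopen` with a real key would send it; the response body should then follow the same JSON shape the SDK parses into `resp.choices`.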

Standard catalog: Llama 3.3/4, Mistral, Gemma 3, Phi-4, and more.
Bring your own LoRA or full-weights fine-tunes.
99.9% uptime SLA, with credits at the 99.0% and 98.0% breach tiers.
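The uptime figures are easier to reason about as downtime budgets. The sketch below converts each percentage into minutes of downtime, assuming a 30-day measurement window; the page does not state the actual window, and the function name is hypothetical.

```python
# Assumption: uptime is measured over a 30-day month (43,200 minutes).
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes of downtime that correspond to an uptime percentage."""
    return MINUTES_PER_30_DAY_MONTH * (100 - uptime_percent) / 100


# The SLA target and the two credit thresholds quoted on this page.
for threshold in (99.9, 99.0, 98.0):
    minutes = allowed_downtime_minutes(threshold)
    print(f"{threshold}% uptime -> {minutes:.1f} min of downtime")
```

Under that assumption, 99.9% allows about 43.2 minutes of downtime per month, while the 99.0% and 98.0% thresholds correspond to 432 and 864 minutes (7.2 and 14.4 hours).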

Run frontier models in your metro.

Zero egress, in-metro serving, at Bedrock-beating prices. Your data never leaves the city.

Request access Read the docs