Place: Self-hosted · Hetzner
Role: Sole architect and builder
Team: 1
Stack: FastAPI · Python 3.12

With two AI-powered products running on a single server, Vera (a personal assistant) and Stepan (an Instagram sales agent), the recurring problem was API key sprawl and invisible cost. AIbroker is the single control plane: every LLM call across all projects routes through it, with per-project caps, automatic cooldown on 429s, and a Telegram alert the moment a key dies.

The cost incident that made it necessary

In June 2026, a stale cost multiplier in LiteLLM caused provider calls to be logged at 1/20th of their actual price. The cost guard thought it had budget remaining; it did not. A $25 bill arrived. AIbroker was already in development; the incident pushed it from "useful experiment" to "production requirement." The broker reads real USD from usage logs, not model-tier estimates.
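The failure mode can be reproduced in a few lines. This is a minimal sketch, not the actual guard: `CostGuard`, `simulate`, the cap, and the per-call price are all hypothetical, chosen only to show how a 1/20 logging error lets real spend run far past a cap that looks untouched.

```python
from dataclasses import dataclass


@dataclass
class CostGuard:
    """Hypothetical budget guard: allows calls while logged spend is under the cap."""
    cap_usd: float
    logged_usd: float = 0.0

    def record(self, actual_usd: float, multiplier: float = 1.0) -> None:
        # multiplier models a pricing-table error; 1.0 means costs are logged correctly
        self.logged_usd += actual_usd * multiplier

    def has_budget(self) -> bool:
        return self.logged_usd < self.cap_usd


def simulate(multiplier: float, cap_usd: float = 2.0,
             call_usd: float = 0.25, max_calls: int = 40) -> float:
    """Return the real USD spent before the guard trips or max_calls is reached."""
    guard = CostGuard(cap_usd=cap_usd)
    actual = 0.0
    for _ in range(max_calls):
        if not guard.has_budget():
            break
        guard.record(call_usd, multiplier)
        actual += call_usd
    return actual


correct = simulate(multiplier=1.0)   # guard sees true cost and stops at the cap
stale = simulate(multiplier=0.05)    # guard sees 1/20th of the cost and never trips
```

With these illustrative numbers the correctly priced run stops at the 2.00 USD cap, while the run with the stale multiplier spends 10.00 USD and the guard still reports budget remaining.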

Two modes for two problems

Proxy mode: the broker calls the provider via the LiteLLM SDK, returns the response, and logs the cost. The client project never sees the API key.

Vending mode: the broker issues a short-lived key lease; the client calls the provider directly and reports usage back on release. Vending handles providers that do not conform to the OpenAI-compatible interface.
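The vending flow (acquire a lease, call the provider directly, release with reported usage) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Lease` and `LeaseBroker` names, the in-memory storage, the TTL value, and the choice to hand each key to one lease at a time.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    project: str
    api_key: str
    expires_at: float


class LeaseBroker:
    """Hypothetical in-memory vending broker; one lease per key at a time."""

    def __init__(self, keys, ttl_seconds=60.0, clock=time.monotonic):
        self._free = list(keys)
        self._active = {}
        self._ttl = ttl_seconds
        self._clock = clock
        self.usage_usd = {}  # project name -> USD reported by clients

    def acquire(self, project):
        self._reap_expired()
        if not self._free:
            raise RuntimeError("no key available")
        key = self._free.pop(0)
        lease = Lease(secrets.token_hex(8), project, key,
                      self._clock() + self._ttl)
        self._active[lease.lease_id] = lease
        return lease

    def release(self, lease_id, reported_usd):
        lease = self._active.pop(lease_id, None)
        if lease is None:
            raise KeyError("unknown or expired lease")
        # Usage is self-reported by the client when it returns the key.
        total = self.usage_usd.get(lease.project, 0.0) + reported_usd
        self.usage_usd[lease.project] = total
        self._free.append(lease.api_key)

    def _reap_expired(self):
        # Return keys from leases that were never released before their TTL.
        now = self._clock()
        for lease_id, lease in list(self._active.items()):
            if lease.expires_at <= now:
                del self._active[lease_id]
                self._free.append(lease.api_key)
```

A client would call `acquire("vera")`, use `lease.api_key` against the provider, then call `release(lease.lease_id, reported_usd)`. In this sketch the short TTL bounds how long a key stays out of the pool if a client crashes before releasing it.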

What runs in production

LRU-aware key selection avoids hot-rotating one key while others sit idle. Per-project daily and monthly cost caps enforce budget discipline across all connected clients. A health monitor runs every 10 minutes, making the cheapest valid call to each provider, and marks dead keys immediately, sending a Telegram alert. A live dashboard shows cost, key health, and per-project usage.
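As an illustration of LRU-aware selection combined with 429 cooldown and dead-key marking, here is a minimal sketch. The `KeyPool` class, its method names, the cooldown length, and the `on_dead` callback (a stand-in for the Telegram alert) are hypothetical and not taken from the AIbroker code.

```python
import time
from dataclasses import dataclass


@dataclass
class KeyState:
    key_id: str
    last_used: int = 0          # sequence number of the most recent pick
    cooldown_until: float = 0.0
    dead: bool = False


class KeyPool:
    """Hypothetical pool: picks the least recently used healthy key."""

    def __init__(self, key_ids, cooldown_seconds=60.0,
                 clock=time.monotonic, on_dead=None):
        self._keys = {k: KeyState(k) for k in key_ids}
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._tick = 0
        self._on_dead = on_dead or (lambda key_id: None)

    def pick(self):
        now = self._clock()
        usable = [k for k in self._keys.values()
                  if not k.dead and k.cooldown_until <= now]
        if not usable:
            raise RuntimeError("no usable key")
        # Least recently used first; ties fall back to insertion order.
        chosen = min(usable, key=lambda k: k.last_used)
        self._tick += 1
        chosen.last_used = self._tick
        return chosen.key_id

    def report_rate_limited(self, key_id):
        # Called after an HTTP 429: bench the key for the cooldown window.
        self._keys[key_id].cooldown_until = self._clock() + self._cooldown

    def mark_dead(self, key_id):
        # Called by a health check when a key stops working.
        self._keys[key_id].dead = True
        self._on_dead(key_id)
```

In this sketch a key on cooldown is skipped only until its window expires, at which point it is the least recently used key and gets picked next, while a dead key is excluded until someone replaces it.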



AIbroker. One control plane for every API key you use.
