Simple, Transparent Pricing

One plan. Everything included. No per-request fees.

Everything Included

Managed Relay

$20/mo

Flat rate, no usage surprises

One managed relay instance
Unlimited workers
Unlimited requests
WebSocket streaming
Request queueing & cancellation
All three API formats (OpenAI, Anthropic, Responses)
TLS & authentication included
Priority support

Open source · No vendor lock-in · Cancel anytime

Why Not Self-Host?

Self-Hosted

Provision and maintain a VPS
Configure TLS certificates
Set up WebSocket reverse proxy
Monitor uptime and alerts
Keep dependencies updated

ModelRelay Managed

Ready in under a minute
TLS and auth handled for you
Automatic monitoring and scaling
Always up-to-date
Less than the cost of a small VPS

Frequently Asked Questions

What API formats does ModelRelay support?

ModelRelay supports three API formats: OpenAI Chat Completions (/v1/chat/completions), Anthropic Messages (/v1/messages), and OpenAI Responses (/v1/responses). Use whichever SDK you prefer — they all route to your own workers.

Are there any per-request or bandwidth fees?

No. The $20/month plan includes unlimited requests and unlimited data transfer. You only pay the flat monthly fee regardless of usage.

Can I connect multiple GPU machines?

Yes. You can connect as many worker machines as you want. ModelRelay automatically load-balances requests across all connected workers and handles failover if one goes offline.

What happens if I cancel?

You can cancel anytime from your dashboard. Your relay stays active until the end of your billing period. ModelRelay is open source, so you can always self-host if you prefer.

Is ModelRelay open source?

Yes. The relay server, worker SDK, and llamafile CLI are all open source on GitHub. The managed service saves you the work of hosting and maintaining it yourself.

Ready to get started?

From sign-up to your first inference request in under a minute.

Get Started →