Simple, Transparent Pricing
One plan. Everything included. No per-request fees.
Managed Relay
- One managed relay instance
- Unlimited workers
- Unlimited requests
- WebSocket streaming
- Request queueing & cancellation
- All three API formats (OpenAI, Anthropic, Responses)
- TLS & authentication included
- Priority support
Why Not Self-Host?
Self-Hosted
- Provision and maintain a VPS
- Configure TLS certificates
- Set up WebSocket reverse proxy
- Monitor uptime and alerts
- Keep dependencies updated
ModelRelay Managed
- Ready in under a minute
- TLS and auth handled for you
- Automatic monitoring and scaling
- Always up-to-date
- Less than the cost of a small VPS
Frequently Asked Questions
What API formats does ModelRelay support?
ModelRelay supports three API formats: OpenAI Chat Completions (/v1/chat/completions), Anthropic Messages (/v1/messages), and OpenAI Responses (/v1/responses). Use whichever SDK you prefer — they all route to your own workers.
Are there any per-request or bandwidth fees?
No. The $20/month plan includes unlimited requests and unlimited data transfer. You only pay the flat monthly fee regardless of usage.
Can I connect multiple GPU machines?
Yes. You can connect as many worker machines as you want. ModelRelay automatically load-balances requests across all connected workers and handles failover if one goes offline.
What happens if I cancel?
You can cancel anytime from your dashboard. Your relay stays active until the end of your billing period. ModelRelay is open source, so you can always self-host if you prefer.
Is ModelRelay open source?
Yes. The relay server, worker SDK, and llamafile CLI are all open source on GitHub. The managed service saves you the work of hosting and maintaining it yourself.