Rate limits & quotas
When RB_ENABLE_QUOTAS=true, two per-tenant quotas cap how many vectors a tenant can store and how many queries it can run per day, and a short-term token-bucket rate limiter protects the service from bursts. The limits below are the per-tenant defaults.
Default limits
A default tenant gets the limits below. The daily query counter resets at UTC midnight (00:00:00 UTC).
| Quota | Default | Resets |
|---|---|---|
| Vectors stored | 100,000 | never (cumulative) |
| Queries per day | 10,000 | daily, at UTC midnight |
Check usage
GET /auth/usage
Check current usage against your quotas at any time from any JWT-authenticated client.
curl -s http://localhost:8080/auth/usage \
  -H "Authorization: Bearer $RB_API_KEY"
Response (HTTP 200):
{
"vectors_used": 12500,
"vector_quota": 100000,
"queries_today": 342,
"daily_query_quota": 10000,
"queries_reset_at": "2026-05-16"
}
queries_reset_at is a YYYY-MM-DD date string: the UTC day on which the daily counter was last reset (effectively "today"). queries_today is lazily zeroed on the first usage call of a new UTC day; the call performs that reset before reading, so the values are never stale from a previous day.
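If you want to turn the usage response into remaining headroom, the arithmetic is simple. The following is a minimal Python sketch, not part of the service: the function name summarize_usage and the output keys are hypothetical, and the next reset is assumed to be the next UTC midnight, as described above.

```python
from datetime import datetime, timedelta, timezone


def summarize_usage(usage: dict, now: datetime) -> dict:
    """Derive remaining headroom from a GET /auth/usage response body.

    `now` must be a timezone-aware datetime.
    """
    # Clamp at zero in case a counter ever reads above its quota.
    vectors_left = max(usage["vector_quota"] - usage["vectors_used"], 0)
    queries_left = max(usage["daily_query_quota"] - usage["queries_today"], 0)

    # The daily counter resets at UTC midnight, so the next reset is the
    # start of the following UTC day.
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return {
        "vectors_left": vectors_left,
        "queries_left": queries_left,
        "seconds_until_reset": int((next_midnight - now_utc).total_seconds()),
    }


# The sample response shown above, evaluated at 18:30 UTC on the same day.
sample = {
    "vectors_used": 12500,
    "vector_quota": 100000,
    "queries_today": 342,
    "daily_query_quota": 10000,
    "queries_reset_at": "2026-05-16",
}
summary = summarize_usage(sample, datetime(2026, 5, 16, 18, 30, tzinfo=timezone.utc))
print(summary)
```

For the sample response this leaves 87,500 vectors and 9,658 queries of headroom.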
What happens at the limit
Quota breaches return HTTP 429 with the standard error envelope (see Authentication).
- vector_quota_exceeded: an upload to POST /v1/datasets/{name}/vectors (or a bulk POST /v1/datasets/{name}/imports) would push stored vectors past the 100,000 cap. The upload is rejected whole; there is no partial acceptance up to the cap. details carries limit and used.
- query_quota_exceeded: a POST /v1/query after the daily 10,000-query cap is reached. details carries limit and reset_at.
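Because an over-quota upload is rejected whole, a client may want to work out how many vectors still fit and send only those. The following Python sketch shows one way to do that from the limit and used fields; the helper names vector_headroom and split_batch, the sample message text, and the sample numbers are hypothetical, and whether trimming a batch is acceptable depends on your application.

```python
def vector_headroom(error_body: dict) -> int:
    """Vectors that still fit, from a vector_quota_exceeded error envelope."""
    details = error_body["error"]["details"]
    # Clamp at zero so a tenant already over the cap yields no headroom.
    return max(details["limit"] - details["used"], 0)


def split_batch(vectors: list, headroom: int) -> tuple[list, list]:
    """Split a batch into the part that fits now and the part to defer."""
    return vectors[:headroom], vectors[headroom:]


# Hypothetical 429 body: the tenant has room for two more vectors.
rejected = {
    "error": {
        "code": "vector_quota_exceeded",
        "message": "Vector quota exceeded for this tenant",
        "details": {"limit": 100000, "used": 99998},
    }
}

batch = ["v1", "v2", "v3", "v4", "v5"]
room = vector_headroom(rejected)
to_send, deferred = split_batch(batch, room)
print(room, to_send, deferred)
```

The used value is a snapshot, so a concurrent upload from the same tenant can still cause the trimmed batch to be rejected.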
Sample 429 query_quota_exceeded body:
{
"error": {
"code": "query_quota_exceeded",
"message": "Daily query quota exceeded for this tenant",
"details": {
"limit": 10000,
"reset_at": "2026-05-16"
}
}
}
Per-key rate limit
Separate from the quotas above: a short-term limit on request rate per API key. Each API key gets roughly 50 requests/second sustained. Bursting past that returns 429 rate_limited:
{
"error": {
"code": "rate_limited",
"message": "Rate limit exceeded; slow down and retry",
"details": { "limit_rps": 50, "burst": 100 }
}
}
Treat rate_limited as transient: retry the request with exponential backoff and jitter. The limiter is an in-memory token bucket and is best-effort in the MVP: it is process-local, so it resets on restart and is not shared across pods. It is applied to the customer-facing /v1/* endpoints only; the /auth/* surface is not rate-limited. Requests authenticated with a JWT are bucketed per tenant rather than per key.
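The retry advice above can be sketched as follows. This is a minimal Python example using the "full jitter" variant of exponential backoff, which is one common choice and not something the service prescribes. The send callable, the default delays, the attempt count, and the success body are all assumptions for illustration. Only rate_limited is retried; quota errors are returned immediately because retrying them does not help.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status, parsed JSON body)


def is_rate_limited(status: int, body: dict) -> bool:
    """True only for the transient 429 rate_limited error."""
    return status == 429 and body.get("error", {}).get("code") == "rate_limited"


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Response:
    """Call send() until it is not rate limited or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if not is_rate_limited(status, body) or attempt >= max_attempts:
            return status, body
        # Full jitter: wait a random time up to an exponentially growing cap.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(rng() * ceiling)


# Demo with a fake transport: two rate_limited responses, then a success.
limited = {"error": {"code": "rate_limited", "details": {"limit_rps": 50, "burst": 100}}}
replies = [(429, limited), (429, limited), (200, {"results": []})]
delays = []
result = call_with_backoff(
    lambda: replies.pop(0),
    sleep=delays.append,  # record delays instead of sleeping
    rng=lambda: 1.0,      # worst-case jitter, to make the demo deterministic
)

# A quota error is not transient, so it comes back after a single call.
quota = {"error": {"code": "query_quota_exceeded", "details": {"limit": 10000}}}
calls = []
quota_result = call_with_backoff(lambda: calls.append(1) or (429, quota), sleep=delays.append)
print(result, delays, quota_result[0], len(calls))
```

Injecting sleep and rng keeps the helper testable without real waiting; in production the defaults (time.sleep and random.random) apply.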