Best practices for handling API rate limits and 429 errors


Introduction

When integrating with our APIs, clients may occasionally receive an HTTP 429 – Too Many Requests response.

This indicates that the request rate from the client has temporarily exceeded the allowed limits.

Our rate-limiting implementation is designed to protect system stability and ensure fair usage across all clients.

This article describes how the rate limiter works and how client applications should safely handle HTTP 429 responses.

Rate limiting model (token bucket)

Rate limits are applied per identity:

  • The Logged-In Identity (User ID)

    • When a request is authenticated (using an API key, OAuth token, etc.), the identity is the specific User account.

      How it works

      • It doesn’t matter if that user is making requests from a laptop, a phone, and a server simultaneously; all those requests are pooled together because they share the same User ID.

      • The identity can be a human user or a specific service account.

  • The Anonymous Identity (IP Address)

    • When a request is unauthenticated, the identity is the IP address of the machine sending the request.

      How it works

      • Since the system doesn’t know who is calling, it tracks where the call is coming from. Multiple people sitting in the same office sharing one public IP address would likely be treated as a single "identity" in this context.

      • The identity can be a specific network location or device.

Please note

If you are building an integration, "identity" means your rate limit is tied to the credentials you use. If you are building a public-facing tool without login requirements, your rate limit is tied to the user's network.
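The identity selection described above can be sketched in a few lines of Python. This is a minimal server-side illustration only; the function name and key format are assumptions for the example, not the actual implementation.

```python
def rate_limit_identity(user_id, client_ip):
    """Return the key the rate limiter buckets a request under.

    user_id:   the authenticated account (API key / OAuth subject), or None.
    client_ip: the caller's IP address, used only for anonymous traffic.
    """
    if user_id is not None:
        # All of this user's devices share one bucket, regardless of origin.
        return f"user:{user_id}"
    # Everyone behind this IP address shares one bucket.
    return f"ip:{client_ip}"
```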

Each identity has a token bucket:

  • Each request consumes one token

  • Tokens refill at a fixed tokens-per-second rate

  • When no tokens remain, the API responds with HTTP 429

  • Once tokens accumulate again, the requests succeed normally
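The token bucket behavior above can be sketched as follows. This is an illustrative Python model of the algorithm, not our server-side code; the class name and parameters are assumptions for the example.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity            # maximum burst size
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)       # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self):
        """Consume one token if available; otherwise the caller gets a 429."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill at a fixed tokens-per-second rate, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A burst of requests drains the bucket; once it is empty, requests fail until enough time has passed for tokens to refill.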

Handling 429 responses: exponential backoff with jitter

What is Jitter?

Jitter is the addition of randomness to the delay time before a client retries a request. Instead of waiting for a fixed, exact duration, jitter ensures the retry occurs within a variable range (for example, between 500ms and 1000ms). This technique desynchronizes client activity, preventing synchronized bursts where multiple clients retry simultaneously, which helps spread traffic over time and stabilizes system recovery.
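Picking a jittered delay is a one-liner. The sketch below assumes a uniform distribution over the 500–1000 ms range used as the example above.

```python
import random

def jittered_delay_ms(base_ms):
    """Pick a retry delay uniformly from the range [base_ms, 2 * base_ms]."""
    return random.uniform(base_ms, 2 * base_ms)
```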

Because the API does not indicate how long to wait before retrying, clients must implement exponential backoff with jitter.

Recommended backoff schedule

  Retry attempt    Delay
  1                500–1000 ms
  2                1–2 s
  3                2–4 s
  4                4–8 s
  5                8–16 s
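The schedule above doubles the delay range on each attempt. A small sketch that reproduces those ranges from a 500 ms base (the helper name is illustrative):

```python
def backoff_range_ms(attempt, base_ms=500):
    """Delay range (low, high) in milliseconds for a 1-based retry attempt."""
    low = base_ms * 2 ** (attempt - 1)  # 500, 1000, 2000, ...
    return (low, 2 * low)
```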

Why jitter?

Without randomness, multiple clients retry at the same time, causing synchronized bursts that can generate further 429s and destabilize the system.

Jitter spreads retry attempts over time and stabilizes recovery.

Maximum retry cap

To avoid infinite retry loops or cascading failures:

  • Maximum delay: 30–60 seconds

  • Maximum attempts: 5–7

  • After exceeding the limit: stop retrying and return an error upstream.

What not to do

Clients should adhere to these guidelines:

  • Do not retry immediately after a 429

  • Do not retry requests in parallel

  • Do not spike request volume after recovery

  • Do not assume any fixed rate window or quota
    (for example, “we can do 10 requests per second” or “the limit resets at :00”)

Code example

This example is a generic Python sketch illustrating one way to implement basic exponential backoff with jitter. send_http_request stands in for your HTTP client; the jitter range matches the backoff schedule shown earlier.

  import random
  import time

  def send_with_retry(request):
      base_delay = 0.5   # seconds
      max_delay = 60.0   # seconds
      max_attempts = 7
      attempt = 0
      while attempt < max_attempts:
          response = send_http_request(request)
          if response.status != 429:
              return response
          attempt += 1
          # exponential delay: 0.5 s, 1 s, 2 s, ...
          delay = base_delay * 2 ** (attempt - 1)
          # jitter: spread the wait uniformly over [delay, 2 * delay]
          delay = random.uniform(delay, 2 * delay)
          delay = min(delay, max_delay)
          time.sleep(delay)
      raise RuntimeError("Max retry attempts exceeded")

What the rate limiter does not provide

Our current rate-limiting implementation does not return:

  • Retry-After header

  • Remaining quota for the current window

  • Information about the size or boundaries of the time window

Clients must implement their own backoff logic and treat every 429 as a signal to temporarily reduce the request rate.

Summary

  • An HTTP 429 indicates that the request rate has exceeded the allowed limits.

  • The server does not indicate how long to wait before retrying.

  • Clients must implement exponential backoff with jitter, with strict retry caps.

  • Avoid parallel retries and assumptions about rate windows.

  • The code example shown earlier illustrates correct and safe retry behavior.