System Design of Rate Limiter
Design deep dive
Considerations when Designing a Rate Limiter
When designing a rate limiter, we need to choose the right rate limiting strategy depending on who our clients are, what our clients' expectations are, and what level of service we can offer.
Specifically, we should find out the anticipated number of callers, the average and peak number of requests per client, the average size of incoming requests, the likelihood of traffic surges, and the maximum latency our callers can tolerate. We should also know what our overall infrastructure looks like and what throughput and latency it can achieve.
Only then can we start to define and implement rate-limiting rules:
the number of requests and the unit of time,
the maximum size of incoming requests,
which endpoints to rate limit,
how to respond to requests that exceed the limit,
how to react when the rate limiter itself fails,
where to locate the rate limiter,
where to store rate-limiting state,
whether some endpoints require different limits,
whether to limit per account, per IP address, or by some other key,
whether to institute a global rate limit,
whether different tiers of clients should be limited differently,
whether to impose a strict limit or allow bursts of requests, and
which rate-limiting algorithm to use.
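To make the list above concrete, here is a minimal sketch of how such rules might be written down as data before any limiting logic exists. Everything in it is an assumption made for illustration: the `RateLimitRule` fields, the `select_rule` helper, the tier names, and all the numbers are hypothetical, not taken from a particular library or product. The one fixed point is HTTP status 429 (Too Many Requests), the standard response for a rate-limited request.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KeyBy(Enum):
    """What identifies a caller for counting purposes."""
    ACCOUNT = "account"
    IP = "ip"


class OnFailure(Enum):
    """What to do when the rate limiter itself is unavailable."""
    FAIL_OPEN = "fail_open"      # let traffic through
    FAIL_CLOSED = "fail_closed"  # reject traffic


@dataclass(frozen=True)
class RateLimitRule:
    endpoint: str                 # path prefix; "/" acts as the global default
    tier: Optional[str]           # None matches every client tier
    max_requests: int             # number of requests ...
    window_seconds: int           # ... per unit of time
    burst: int = 0                # extra requests tolerated; 0 means a strict limit
    max_body_bytes: int = 1_048_576
    key_by: KeyBy = KeyBy.ACCOUNT
    reject_status: int = 429      # HTTP 429 Too Many Requests
    on_failure: OnFailure = OnFailure.FAIL_OPEN
    algorithm: str = "token_bucket"  # label only; no algorithm is implemented here


# Hypothetical rule set: a global default, a stricter login rule keyed by IP,
# and a more generous search rule for a "premium" tier that allows bursts.
RULES = [
    RateLimitRule("/", None, max_requests=100, window_seconds=60),
    RateLimitRule("/login", None, max_requests=5, window_seconds=60,
                  key_by=KeyBy.IP, on_failure=OnFailure.FAIL_CLOSED),
    RateLimitRule("/search", "premium", max_requests=1000, window_seconds=60,
                  burst=200),
]


def select_rule(path: str, tier: str, rules=RULES) -> RateLimitRule:
    """Pick the most specific rule: longest matching prefix, then a tier match."""
    candidates = [
        r for r in rules
        if path.startswith(r.endpoint) and r.tier in (None, tier)
    ]
    return max(candidates, key=lambda r: (len(r.endpoint), r.tier is not None))


if __name__ == "__main__":
    rule = select_rule("/search/items", "premium")
    print(rule.max_requests, rule.window_seconds, rule.burst)
```

Writing the rules as plain data keeps the policy questions (which endpoints, which key, strict or bursty, fail open or closed) separate from the choice of algorithm and storage. The naive prefix match is a simplification; a real router would match on path segments.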
Let's unpack some of these aspects.