Distributed System Refreshers :
A Rate Limiter limits the number of client requests allowed to be sent over a specified period. If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked.
There are plenty of algorithms for rate limiting: we will look at some of them including a Token bucket, Leaky bucket, Sliding window counter, etc.
Algorithms for rate limiting
1. Token Bucket
Concept:
A bucket holds a fixed number of tokens.
Tokens are added to the bucket at a constant rate.
A request can only proceed if it consumes a token.
If the bucket is empty (i.e., no tokens left), the request is rejected or delayed until new tokens arrive.
Use Cases:
✅ API rate limiting (e.g., AWS API Gateway)
✅ Traffic shaping in networks
✅ Ensuring fair usage of shared resources
Pros:
✔ Allows burst traffic as long as tokens are available.
✔ More flexible than strict time-based approaches.
Cons:
✖ If all tokens are consumed, new requests must wait for replenishment.
Code: Java
2. Leaky Bucket
Concept:
Similar to a physical bucket with a hole, requests enter the bucket but leak out at a fixed rate.
If the bucket overflows (too many incoming requests), excess requests are dropped.
Helps smooth out bursty traffic.
Use Cases:
✅ Network congestion control (e.g., traffic policing in routers)
✅ API rate limiting where a constant processing rate is needed
Pros:
✔ Ensures a steady outflow of requests, preventing overload.
✔ Prevents sudden spikes from overwhelming the system.
Cons:
✖ Can discard excess requests, leading to potential data loss.
Code : Leaky Bucket
3. Fixed Window Counter
The sliding window algorithm is actually a variation of the two algorithms, namely the fixed window and the leaky bucket. It keeps a moving time frame and restricts the number of requests to be made within that frame. It provides for a finer and basically a better rate limiting in that the window is most likely renewed and the rate ideally spread out evenly over a period such that there can be adequate control over the traffic direction and occurrence.
Use Cases:
✅ Simple API rate limiting (e.g., 100 requests per minute)
✅ Protecting resources from excessive requests
Pros:
✔ Simple and easy to implement.
✔ Works well when traffic is evenly distributed.
Cons:
✖ Not suitable for bursty traffic—many requests at the end of one window and the start of the next can cause overload (known as the boundary problem).
Code :
4. Sliding Window Algorithm
Concept:
Instead of using a fixed window, this method maintains a log of request timestamps.
Whenever a new request arrives, the system removes old requests outside the time window and counts the remaining ones.
If the count is within the allowed limit, the request is allowed.
Use Cases:
✅ More accurate rate limiting than fixed window counters
✅ Scenarios where smooth traffic distribution is required
Pros:
✔ Eliminates the boundary problem of fixed windows.
✔ Provides fine-grained control over request rates.
Cons:
✖ Memory-intensive—it needs to store all timestamps.
✖ Performance overhead due to frequent log updates.
Code
High Level Design of Rate Limiter
Keep reading with a 7-day free trial
Subscribe to Better Engineers to keep reading this post and get 7 days of free access to the full post archives.