Requirements:
Users can create posts.
Users can upvote and downvote other users' posts.
Resource estimation
Assume Reddit has 300 million Daily Active Users (DAU). Every estimate below follows from that assumption.
Resource Estimation for 300M DAU
1. Traffic Estimation
Daily Active Users (DAU): 300 million.
Requests per User per Day: 100.
Total Requests per Day: 300 million * 100 = 30 billion requests/day.
Requests per Second (RPS): 30 billion / 86400 ≈ 350,000 RPS.
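The traffic arithmetic above can be reproduced with a short calculation. This is a minimal sketch: the function name is hypothetical, and the result is a daily average, so real peak traffic would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def requests_per_second(dau: int, requests_per_user: int) -> float:
    """Average requests per second, assuming load is spread evenly over the day."""
    return dau * requests_per_user / SECONDS_PER_DAY


# Inputs taken from the estimate above: 300M DAU, 100 requests per user per day.
avg_rps = requests_per_second(300_000_000, 100)
print(f"{avg_rps:,.0f} RPS")  # 347,222 RPS, rounded up to 350,000 in the text
```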
2. Storage Estimation
Posts
New Posts per Day: 1 million (in practice this scales with DAU; held at a fixed assumed rate here for simplicity).
Average Post Size: 1 KB (text) + 100 KB (media) = 101 KB.
Storage per Day: 1 million * 101 KB = 101 GB/day.
Storage for 5 Years: 101 GB/day * 365 * 5 ≈ 185 TB.
Comments
New Comments per Day: 10 million (in practice this scales with DAU; held at a fixed assumed rate here for simplicity).
Average Comment Size: 500 bytes.
Storage per Day: 10 million * 500 bytes = 5 GB/day.
Storage for 5 Years: 5 GB/day * 365 * 5 ≈ 9 TB.
Votes
New Votes per Day: 100 million (in practice this scales with DAU; held at a fixed assumed rate here for simplicity).
Average Vote Size: 50 bytes.
Storage per Day: 100 million * 50 bytes = 5 GB/day.
Storage for 5 Years: 5 GB/day * 365 * 5 ≈ 9 TB.
Total Storage
Per Day: 101 GB (posts) + 5 GB (comments) + 5 GB (votes) = 111 GB/day.
For 5 Years: 185 TB + 9 TB + 9 TB ≈ 200 TB.
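The storage totals can be checked the same way. The sketch below assumes decimal units (1 KB = 1,000 bytes), which is what the figures above imply, and ignores replication, indexes, and compression. The dictionary layout is only illustrative.

```python
KB = 1_000
GB = 1_000_000_000
TB = 1_000_000_000_000
RETENTION_DAYS = 365 * 5  # five years, ignoring leap days

# (new items per day, average size in bytes), taken from the estimates above
workloads = {
    "posts": (1_000_000, 101 * KB),
    "comments": (10_000_000, 500),
    "votes": (100_000_000, 50),
}

# Bytes written per day for each workload
daily_bytes = {name: count * size for name, (count, size) in workloads.items()}

total_daily_gb = sum(daily_bytes.values()) / GB
total_5y_tb = sum(daily_bytes.values()) * RETENTION_DAYS / TB

print(f"{total_daily_gb:.0f} GB/day")      # 111 GB/day
print(f"{total_5y_tb:.1f} TB in 5 years")  # 202.6 TB, about 200 TB
```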
3. Bandwidth Estimation
Requests per Day: 30 billion.
Average Response Size: 10 KB (including text, media, and metadata).
Total Outgoing Data per Day: 30 billion * 10 KB = 300 TB/day.
Bandwidth Required: 300 TB/day * 8 bits / 86,400 s ≈ 28 Gbps on average, rounded up to 30 Gbps (assuming even distribution over 24 hours).
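Converting a daily volume into a line rate is easy to get wrong by a factor of 8, so here is the conversion as a small sketch. The function name is hypothetical, and the result is an average: provisioning for peak hours would need headroom on top.

```python
SECONDS_PER_DAY = 86_400


def average_egress_gbps(requests_per_day: int, response_bytes: int) -> float:
    """Average outgoing bandwidth in gigabits per second (decimal units)."""
    bits_per_day = requests_per_day * response_bytes * 8  # bytes -> bits
    return bits_per_day / SECONDS_PER_DAY / 1e9


# 30 billion requests/day at 10 KB per response, as estimated above
gbps = average_egress_gbps(30_000_000_000, 10_000)
print(f"{gbps:.1f} Gbps")  # 27.8 Gbps, rounded to 30 Gbps in the text
```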
4. Compute Resources
Web Servers
Requests per Second (RPS): 350,000.
Requests per Server: Assume each server can handle 1,000 RPS.
Number of Web Servers: 350,000 / 1,000 = 350 servers.
Redundancy: Double the number for high availability = 700 servers.
Database Servers
SQL Database:
Assume 1 primary and 4 replicas for high availability.
Total: 5 servers.
NoSQL Database:
Use a distributed database like Cassandra.
Assume 12 nodes for replication and fault tolerance.
Cache Servers:
Use Redis or Memcached.
Assume 12 servers for redundancy and load distribution.
Message Queue
Kafka/RabbitMQ:
Assume 6 brokers for fault tolerance.
Search Servers
Elasticsearch:
Assume 6 nodes for replication and fault tolerance.
Total Compute Resources
Web Servers: 700.
SQL Database: 5.
NoSQL Database: 12.
Cache Servers: 12.
Message Queue: 6.
Search Servers: 6.
Total: 741 servers.
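The server tally can be expressed as a short script so that changing one assumption (for example, RPS per server) updates the total. This is a sketch only: the node counts for databases, cache, queue, and search are the assumed values from the list above, not derived from load.

```python
import math


def web_servers(rps: int, rps_per_server: int, redundancy_factor: int = 2) -> int:
    """Servers needed to serve `rps`, multiplied by a redundancy factor."""
    return math.ceil(rps / rps_per_server) * redundancy_factor


fleet = {
    "web": web_servers(350_000, 1_000),  # 350 servers, doubled for availability
    "sql": 1 + 4,                        # 1 primary + 4 replicas
    "nosql": 12,                         # assumed Cassandra node count
    "cache": 12,                         # assumed Redis/Memcached server count
    "message_queue": 6,                  # assumed broker count
    "search": 6,                         # assumed Elasticsearch node count
}

total_servers = sum(fleet.values())
print(total_servers)  # 741
```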
Source of Truth Table
We will store comment data in multiple databases, but we still need one database that holds all of it and acts as the source of truth. This database stores every comment, including nested comments, so logically the data looks like a tree.
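One common way to represent such a tree in a flat table is an adjacency list, where each comment row stores the ID of its parent. The sketch below is an assumption for illustration, not the schema this post goes on to use: the `Comment` fields and the `build_tree` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    id: int
    post_id: int
    parent_id: Optional[int]  # None marks a top-level comment on the post
    body: str
    replies: List["Comment"] = field(default_factory=list)


def build_tree(rows: List[Comment]) -> List[Comment]:
    """Link flat rows into a tree and return the top-level comments."""
    by_id = {c.id: c for c in rows}
    roots = []
    for c in rows:
        if c.parent_id is None:
            roots.append(c)
        else:
            # Attach the reply to its parent; assumes the parent row is present
            by_id[c.parent_id].replies.append(c)
    return roots


rows = [
    Comment(1, post_id=10, parent_id=None, body="Great post"),
    Comment(2, post_id=10, parent_id=1, body="Agreed"),
    Comment(3, post_id=10, parent_id=2, body="Same here"),
    Comment(4, post_id=10, parent_id=None, body="Question about step 2"),
]
tree = build_tree(rows)
print([c.id for c in tree])             # [1, 4]
print([r.id for r in tree[0].replies])  # [2]
```

With this layout a single query by `post_id` returns every row for a post, and the nesting is rebuilt in memory in one pass.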