System Design Interview, Volume 2 | Chapter 2. Nearby Friends

This chapter designs a scalable backend system that supports a mobile app feature called Nearby Friends.

Facebook launches Nearby Friends

Step 1: Understand the Problem and Define Scope

  • “Nearby” means within a straight-line distance of 5 miles.
    • This value can be configured.
  • Store the user’s movement history.
    • It may be used for various purposes such as machine learning.
  • If a user is inactive for more than 10 minutes, the user disappears from the nearby friends list.
  • Privacy data protection laws are not considered.
    • They are excluded because they are complex.

Functional Requirements

  • Users must be able to check nearby friends in the mobile app.
    • Display the distance to each friend and the timestamp when the information was last updated.
  • This friend list must be updated every few seconds.

Non-functional Requirements

  • Low latency
    • Changes in nearby friends’ locations must not take too long.
  • Reliability
    • The system should generally be reliable, but losing a small amount of data can be acceptable at times.
  • Eventual consistency
    • Strong consistency is not required when storing location data.
    • It is acceptable if it takes a few seconds until replica data becomes identical to the original.

Rough Scale Estimation

  • Nearby Friends is defined as friends within a 5-mile radius.
  • Friend location information is updated every 30 seconds.
    • People do not walk extremely fast, so 30 seconds is sufficient.
  • Assume 100 million users use the nearby friends search feature per day.
  • Assume concurrent users are 10%.
    • 10% * 100 million = 10 million.
  • Assume an average user has 4,000 friends.
  • This feature displays 20 nearby friends per page.
    • If requested, more friends are shown.

Step 2: Propose a High-level Design and Get Agreement

Unlike other chapters, location information must be sent to all friends. Since HTTP cannot be used between server and client for this, it is hard to know what API to create before understanding the high-level design. Therefore, we first look at the high-level design.

High-level Design

Shared backend

  • Receive location change history from every active user.
  • Whenever a user’s location changes, deliver the change to active friends.
  • If two users are far apart, send the change history.

The problem is that this is not easy to apply at large scale.

  • If there are about 10 million active concurrent users and location information is updated every 30 seconds, there are about 334,000 location updates per second.
    • TPS: 10,000,000 transactions / 30 seconds = 333,333.333 TPS(about 334,000).
  • If the average user has 400 friends and 10% of them are nearby and active, there are 14 million location update requests.
    • 334,000 per second * 400 friends * 10% = 14 million.
  • This enormous volume of updates must be sent to user devices.

Design

High-level design

  • Load balancer
    • Located in front of WebSocket and HTTP servers.
    • Distributes traffic among servers to balance load.
  • RESTful API server
    • Handles auxiliary operations such as adding/deleting friends and updating user information.
  • WebSocket server
    • A server cluster that maintains real-time state for friend location changes.
      • Each client keeps one WebSocket connection.
      • Location changes are sent over this connection.
  • Redis location cache
    • Used to cache the latest location information of active users.
    • A TTL is specified so the location cache is deleted after time passes.
    • The TTL is also refreshed whenever cache information is updated.
  • User database
    • Stores user data and friend relationship information.
    • Either RDB or NoSQL can be used.
  • Location history database
    • Stores users’ location change history.
  • Redis Pub/Sub server
    • Uses Redis Pub/Sub as an ultra-lightweight message bus.

Periodic location update

  1. The client sends the location change to the load balancer.
  2. The load balancer sends the location change to the WebSocket server.
  3. The WebSocket server stores the event in the location history database.
  4. The WebSocket server stores it in the cache and updates the TTL.
  5. The WebSocket server publishes the location to that user’s channel in Redis Pub/Sub. Steps 3 to 5 are performed in parallel.
  6. Redis Pub/Sub broadcasts it to all subscribers(online friends).
  7. The distance between the user who sent the new location and the user who received the message is recalculated.
  8. If the distance calculated in step 7 does not exceed the search radius, the information is sent to each client. If it exceeds the radius, it is not sent.

API Design

  1. Server API: periodic location update.
  2. Client API: API used by the client to receive updated friend locations.
  3. Server API: WebSocket initialization API.
  4. Client API: new friend subscription API.
  5. Client API: unsubscribe API.

Data Model

Location cache

  • Key: user ID.
  • Value: {location, longitude, timestamp}.

Only current information is stored.
Redis is suitable because active users can be stored with TTL.

Location history database

  • Columns: user ID, location, longitude, timestamp.

As large-scale processing requirements grow, a database that supports sharding is needed.

Step 3: Detailed Design

Scalability by Important Component

API Server

  • There are many ways to automatically scale clusters based on CPU and I/O usage.

WebSocket Server

  • It is not difficult to automatically scale based on usage.
  • Automatic scaling can be handled at the load balancer.

Client Initialization

  1. Update the location cache.
  2. Fetch all friend information.
  3. Fetch the locations of all active friends at once.
  4. Inactive users should already be gone due to TTL.
  5. Establish WebSocket connections for each received friend location.
  6. The WebSocket server subscribes to each friend’s Redis Pub/Sub channel.
  7. The user’s current location is sent to all friends through the user’s dedicated channel in Redis Pub/Sub.

User Database

  • Stored data
    • User ID, username, profile image URL, and so on.
    • Friend relationship data.
  • A database sharded by user ID is needed for large scale.
  • At sufficient scale, a separate team may be needed to manage user and friend data.

Location Cache

  • For large-scale processing, load can be distributed by sharding across multiple servers based on user ID.
  • To increase availability, standby nodes can be replicated, and if the primary node fails, the standby node can be promoted to reduce downtime.

Redis Pub/Sub Server

  • Redis Pub/Sub was chosen because channel creation is very cheap.
  • The bottleneck of Redis Pub/Sub is not memory but CPU usage.
  • Expected memory usage is about 200 GB, and servers that can install 100 GB each can be used.
  • Location updates: 14 million per second / subscribers one server can handle: 100,000 = about 140 servers.
  • To handle large scale, a distributed Redis Pub/Sub cluster is needed.

Distributed Redis Pub/Sub Server Cluster

  • Pub/Sub servers can be sharded based on the user ID that publishes messages.
    • All channels are independent.
  • In operations, a service discovery component can be introduced to solve this problem.
    • Examples include etcd and ZooKeeper.
  • One approach is to use the hash ring of consistent hashing.

Adding/Deleting Friends

  • When friends are added or deleted, callbacks can be registered in the app.
  • When called, this callback sends a message to the WebSocket server to subscribe or unsubscribe from the friend’s Pub/Sub channel.

Users with Many Friends

  • Handle only up to 5,000 friends, and exclude discussion of one-way relationships such as follower models.
  • The Pub/Sub subscription relationships needed to subscribe to thousands of friends are distributed across many WebSocket servers in the cluster.
  • Because the load is shared by each socket server, hotspot problems should not occur.

Nearby Arbitrary Users

  • Maintain a pool of Pub/Sub channels built according to geohash.
    • Details are omitted.

Alternatives to Redis Pub/Sub

  • Erlang can be used as an alternative.
    • Details are omitted.

Step 4: Wrap-up

  • We designed a system that efficiently delivers users’ location changes to their friends.
  • WebSocket, Redis, and Redis Pub/Sub are the key components of this design.
  • We also looked at solutions as the system grows from small scale to large scale.