System Design Interview - URL Shortener

Problem Statement

Persona:

  1. User: someone who creates the shortened URL
  2. Audience: recipient of the shortened URL

Stories:

  1. As a user
    1. I want to create a randomised URL with fewer than ten characters based on a URL I provide
    2. So that I can copy and paste the URL easily
  2. As a user
    1. I want to create a vanity URL based on a URL I provide
    2. So that my audience can easily distinguish my URL
  3. As a user
    1. I want to be able to replace the target URL behind a randomised URL
    2. So that my audience gets redirected to the new URL when I need to change it because of a mistake or a new development
  4. As a user
    1. I want to be able to replace the target URL behind a vanity URL
    2. So that my audience gets redirected to the new URL when I need to change it because of a mistake or a new development
  5. As a user
    1. I do not want other users to be able to replace the URL I provided
    2. So that my audience does not get scammed or phished through the shortened URL I provided
  6. As an audience member
    1. I want the shortened URL I received to redirect to the URL the user provided
    2. So that I get the content that I am expecting
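
Stories 1 and 2 can be sketched as below. This is a minimal sketch, not a prescribed implementation: the base62 alphabet, the 7-character length, the vanity rules, and the names `random_code` and `is_valid_vanity` are all assumptions made for illustration.

```python
import re
import secrets
import string

# Assumption: base62 alphabet (digits + ASCII letters) and 7 characters,
# which satisfies "fewer than ten characters".
ALPHABET = string.digits + string.ascii_letters
CODE_LENGTH = 7

# Assumption: vanity codes are 3 to 10 characters of letters, digits, '-' or '_'.
VANITY_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,10}")


def random_code(length: int = CODE_LENGTH) -> str:
    """Return a random code drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_valid_vanity(code: str) -> bool:
    """Check a user-chosen vanity code against the assumed rules."""
    return VANITY_PATTERN.fullmatch(code) is not None
```

With these assumed values there are 62^7 (about 3.5 * 10^12) possible random codes, so collisions should be rare, but the store still has to reject duplicates rather than rely on luck.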

Non-functional requirements:

  1. High availability
  2. High read throughput: roughly 1000x the write (create + update) throughput
  3. Significant write throughput: 100 RPS
  4. Low redirection latency
  5. Fault tolerance

Discussions

Estimations:

  1. Create throughput: 10 RPS
  2. Dataset size:
    1. A record stores: 512 char URL + 10 char shortened / vanity URL, UTF-8 at 1 byte per char -> 522 * 8 bits -> about 4 kb (kilobits)
    2. 1 year of storage: 4 kb * 365 * 24 * 3600 * 10 -> about 1.3 Tb (terabits, roughly 160 GB) per year, 13 Tb (roughly 1.6 TB) of storage for 10 years
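
The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that works in bits, as the estimate does (522 * 8), and assumes ASCII-only URLs so that each character is one byte in UTF-8. It ignores indexes, row overhead, and any owner or timestamp columns, and it uses the unrounded record size, so the result is slightly higher than the rounded figure.

```python
# Back-of-envelope storage estimate, using the figures from the list above.
URL_CHARS = 512            # long URL length
CODE_CHARS = 10            # shortened / vanity URL length
BITS_PER_CHAR = 8          # assumption: ASCII-only, 1 byte per char in UTF-8
CREATES_PER_SECOND = 10
SECONDS_PER_YEAR = 365 * 24 * 3600

bits_per_record = (URL_CHARS + CODE_CHARS) * BITS_PER_CHAR   # 4176 bits
records_per_year = CREATES_PER_SECOND * SECONDS_PER_YEAR     # 315,360,000
bits_per_year = bits_per_record * records_per_year

terabits_per_year = bits_per_year / 1e12
gigabytes_per_year = bits_per_year / 8 / 1e9

print(f"{terabits_per_year:.2f} Tb per year")    # 1.32 Tb per year
print(f"{gigabytes_per_year:.0f} GB per year")   # 165 GB per year
```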

Authorisation options:

  1. Username and password
  2. Single sign on
  3. Token based (admin URL, randomised string password)
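
Option 3 can be sketched as below: the service hands the user a random secret when the link is created and later checks it before allowing an update. This is a minimal sketch under assumptions: the function names are hypothetical, and storing only a SHA-256 hash of the token is one reasonable choice, not something the options above specify.

```python
import hashlib
import hmac
import secrets


def issue_admin_token() -> tuple[str, str]:
    """Return (token, token_hash). Show the token once; store only the hash."""
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, token_hash


def verify_admin_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

The token could be embedded in an admin URL or typed in as a password; either way, anyone holding it can edit the link, so it should be treated like a credential.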

Frontend options:

  1. Server rendered pages (Thymeleaf, Razor, Pug, etc.)
  2. SPA (React + Redux, Angular, Vue)

Backend options:

  1. MVC backend (ASP.NET MVC, Spring MVC, Express)
  2. FaaS (Knative, AWS Lambda, Google Cloud Functions, etc.)

Additional features:

  1. Metrics (redirection counts)
    1. Inaccuracy acceptable?
  2. Redirection countdown display
    1. Ads?
    2. Other information?

System Design

Here we will explore designs with:

  1. Username and password + single sign on authorisation
  2. No discussion of the front end, because the practical choice there depends on skill set (mine is React + Redux)
  3. FaaS backend
  4. Metrics inaccuracy unacceptable
  5. No redirection countdown
  6. Transactional database that prevents non-repeatable reads
    1. To prevent inserting the same ID twice
    2. Unlikely to happen with randomised URL, but highly possible with vanity URL
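
The duplicate-ID concern in item 6 can be sketched as below. Note the swapped-in technique: instead of a read-then-insert check that depends on the isolation level, this sketch lets a primary key constraint reject the second insert. It uses SQLite in memory as a stand-in for the OLTP database; the table layout and the function names are assumptions. The same pattern also covers the story about other users not being able to replace a URL, by matching on the owner in the update.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical links table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links ("
        " code TEXT PRIMARY KEY,"   # uniqueness enforced by the database
        " target TEXT NOT NULL,"
        " owner TEXT NOT NULL)"
    )
    return conn


def create_link(conn: sqlite3.Connection, code: str, target: str, owner: str) -> bool:
    """Insert a link; return False if the code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO links (code, target, owner) VALUES (?, ?, ?)",
                (code, target, owner),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def replace_target(conn: sqlite3.Connection, code: str, new_target: str, owner: str) -> bool:
    """Update the target only if the caller owns the link."""
    with conn:
        cur = conn.execute(
            "UPDATE links SET target = ? WHERE code = ? AND owner = ?",
            (new_target, code, owner),
        )
    return cur.rowcount == 1
```

For a randomised URL, a `False` from `create_link` would simply trigger a retry with a fresh code; for a vanity URL it would be reported to the user as "already taken".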

Main reasoning:

  1. Use managed services as much as possible to reduce bootstrap code
  2. FaaS functions are easy to spawn and destroy (ports and adapters)
    1. Fast replacement of features
    2. Independent scaling between differing components (scale read more than write)
  3. The codebase for each function is independent
  4. Data structure will not need to change frequently
  5. Data structure can change independently between all functions
    1. Data definition first approach instead of code first
    2. Backward compatibility of data definition is required
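
The backward compatibility requirement in item 5 can be illustrated with a small sketch. It assumes a hypothetical `LinkRecord` definition shared by all functions and a hypothetical `created_at` field added after the first release; none of these names come from the design above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class LinkRecord:
    code: str
    target: str
    owner: str
    # Hypothetical field added later; the default keeps old records readable.
    created_at: Optional[str] = None


def parse_link(raw: Mapping[str, Any]) -> LinkRecord:
    """Build a record, defaulting missing optional fields and ignoring unknown ones."""
    return LinkRecord(
        code=raw["code"],
        target=raw["target"],
        owner=raw["owner"],
        created_at=raw.get("created_at"),
    )
```

A function deployed before the field existed keeps working because it never reads `created_at`, and a newer function can still read old records because the field is optional.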

Backend Design

Components:

  1. API gateway (AWS API Gateway, Kong, Istio)
  2. Identity management (Amazon Cognito, Auth0, Okta)
  3. FaaS (AWS Lambda, Knative)
  4. OLTP database (AWS RDS, PostgreSQL)
  5. Cache (AWS ElastiCache, Redis, Memcached, Hazelcast)
  6. Emailer (AWS SES, Mailbird, Mailgun)
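
How the FaaS, cache, and database components fit together on the read path can be sketched as a cache-aside lookup. This is a minimal sketch under assumptions: `InMemoryCache` stands in for the real cache, `load_from_db` stands in for a database query, and the response dictionary is only loosely modelled on an API gateway proxy response.

```python
from typing import Callable, Dict, Optional


class InMemoryCache:
    """Stand-in for the cache component (hypothetical interface)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def resolve(code: str, cache: InMemoryCache,
            load_from_db: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside: try the cache first, fall back to the database, then fill the cache."""
    target = cache.get(code)
    if target is not None:
        return target
    target = load_from_db(code)
    if target is not None:
        cache.set(code, target)
    return target


def handle_redirect(code: str, cache: InMemoryCache,
                    load_from_db: Callable[[str], Optional[str]]) -> dict:
    """Return a redirect response for a known code, or 404 otherwise."""
    target = resolve(code, cache, load_from_db)
    if target is None:
        return {"statusCode": 404, "body": "Not found"}
    # 302 rather than 301: the target may be replaced later, so clients
    # should not cache the redirect permanently.
    return {"statusCode": 302, "headers": {"Location": target}}
```

The write path would call `cache.delete(code)` after a successful update, so that a replaced target is not served from a stale cache entry.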

Metrics Capture

We can use clickstream analytics (Firebase, Supabase, CleverTap, etc.) in the frontend to record each visit, as a step before redirecting to the URL the user provided.

Design Diagram

[Figure: backend design diagram]