System Design Interview - URL Shortener
Problem Statement
Persona:
- User: someone who creates the shortened URL
- Audience: recipient of the shortened URL
Stories:
- As a user
- I want to create a randomised URL with fewer than ten characters based on a URL I provide
- So that I can copy and paste the URL easily
- As a user
- I want to create a vanity URL based on a URL I provide
- So that my audience can easily distinguish my URL
- As a user
- I want to be able to replace the URL behind the randomised URL
- So that my audience gets redirected to the new URL when I need to change it due to mistakes or new developments
- As a user
- I want to be able to replace the URL behind the vanity URL
- So that my audience gets redirected to the new URL when I need to change it due to mistakes or new developments
- As a user
- I do not want other users to be able to replace the URL I provided
- So that my audience does not get scammed or phished through the shortened URL I provided
- As an audience member
- I want the shortened URL I received to redirect to the URL the user provided
- So that I get the content that I am expecting
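The first two stories can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the base62 alphabet, the code length of 7 (any length under ten satisfies the story), and the vanity rule (3 to 10 letters, digits, or hyphens) are all assumptions made for the example.

```python
import re
import secrets
import string

# Assumption: base62 alphabet (a-z, A-Z, 0-9), which is safe in a URL path.
ALPHABET = string.ascii_letters + string.digits
# Assumption: 7 characters, giving 62**7 (about 3.5 trillion) possible codes.
CODE_LENGTH = 7

# Assumption: a vanity code is 3 to 10 letters, digits, or hyphens.
VANITY_PATTERN = re.compile(r"[A-Za-z0-9-]{3,10}")


def random_code(length: int = CODE_LENGTH) -> str:
    """Return a random base62 code drawn from a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_valid_vanity(code: str) -> bool:
    """Check a user-chosen vanity code against the assumed naming rule."""
    return VANITY_PATTERN.fullmatch(code) is not None
```

Random generation alone does not guarantee uniqueness, so the store still has to reject a code that is already taken.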
Non functional requirements:
- High availability
- High read throughput, about 1000x the write throughput (updates + creates)
- Significant write throughput, 100 RPS
- Low redirection latency
- Fault tolerance
Discussions
Estimations:
- Create throughput: 10 RPS
- Dataset size:
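- A request takes: 512 char URL, 10 char shortened / vanity URL, UTF-8 at 1 byte per ASCII character -> 522 * 8 bits -> about 4 kb (kilobits, roughly 0.5 kB)
- 1 year of storage: 4 kb * 365 * 24 * 3600 s * 10 RPS -> about 1.3 Tb (terabits, roughly 160 GB) per year, 13 Tb (roughly 1.6 TB) of storage for 10 years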
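The estimate above can be checked with a few lines of arithmetic. The sketch assumes ASCII-only URLs (1 byte per character in UTF-8) and counts only the raw URL and code, ignoring indexes, row overhead, and replication; the constant names are made up for the example.

```python
URL_CHARS = 512
CODE_CHARS = 10
BITS_PER_CHAR = 8  # assumption: ASCII-only URLs, 1 byte per character
CREATES_PER_SECOND = 10
SECONDS_PER_YEAR = 365 * 24 * 3600

# Raw size of one record in bits (about 4 kilobits).
bits_per_record = (URL_CHARS + CODE_CHARS) * BITS_PER_CHAR

# Raw growth per year, expressed in terabits and in gigabytes.
bits_per_year = bits_per_record * CREATES_PER_SECOND * SECONDS_PER_YEAR
terabits_per_year = bits_per_year / 1e12
gigabytes_per_year = bits_per_year / 8 / 1e9

print(f"{bits_per_record} bits per record")
print(f"{terabits_per_year:.2f} Tb per year")
print(f"{gigabytes_per_year:.0f} GB per year")
```

Note that the figures are in bits; divided by eight, the yearly growth is well under a terabyte.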
Authorisation options:
- Username and password
- Single sign on
- Token based (admin URL, randomised string password)
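The token based option could look roughly like the sketch below: the creator receives a random token once (for example embedded in an admin URL), and only its hash is stored. The function names and the choice of SHA-256 are assumptions for illustration, not part of the notes.

```python
import hashlib
import hmac
import secrets


def new_admin_token() -> str:
    """Generate a random URL-safe token to hand to the creator once."""
    return secrets.token_urlsafe(32)


def hash_token(token: str) -> str:
    """Hash the token so that a database leak does not expose usable tokens."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def is_owner(presented_token: str, stored_hash: str) -> bool:
    """Compare hashes in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_token(presented_token), stored_hash)
```

A request to replace a URL would then be accepted only when `is_owner` returns True for the token supplied with it.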
Frontend options:
- Server rendered pages (Thymeleaf, Razor, Pug, etc.)
- SPA (React + Redux, Angular, Vue)
Backend options:
- MVC backend (ASP.NET MVC, Spring MVC, Express)
- FaaS (Knative, AWS Lambda, Google Cloud Functions, etc.)
Additional features:
- Metrics (redirection counts)
- Inaccuracy acceptable?
- Redirection countdown display
- Ads?
- Other information?
System Design
Here we will explore designs with:
- Username and password + single sign on authorisation
- No discussion of the front end, because the practical choice depends on skill set (mine is React + Redux)
- FaaS backend
- Metrics inaccuracy unacceptable
- No redirection countdown
- Transactional database that prevents non-repeatable reads
- To prevent inserting the same ID twice
- Unlikely to happen with randomised URL, but highly possible with vanity URL
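One way to keep the same ID from being inserted twice is to let the database enforce it. The sketch below relies on a primary-key constraint rather than on an isolation level, which is a swapped-in technique and not necessarily what the notes intend. It uses the standard-library `sqlite3` module as a stand-in for the OLTP database; the table name `links`, its columns, and the function names are hypothetical.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # assumption: base62 codes
CODE_LENGTH = 7  # assumption: any length under ten would do


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical links table; the primary key enforces uniqueness."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS links ("
            " code TEXT PRIMARY KEY,"
            " target_url TEXT NOT NULL,"
            " owner_id TEXT NOT NULL)"
        )


def insert_link(conn: sqlite3.Connection, code: str, target_url: str, owner_id: str) -> bool:
    """Insert a link; return False when the code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO links (code, target_url, owner_id) VALUES (?, ?, ?)",
                (code, target_url, owner_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def create_random_link(conn: sqlite3.Connection, target_url: str, owner_id: str,
                       max_attempts: int = 5) -> str:
    """Retry with a fresh random code on the (unlikely) collision."""
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
        if insert_link(conn, code, target_url, owner_id):
            return code
    raise RuntimeError("could not allocate a unique code")


def replace_target(conn: sqlite3.Connection, code: str, new_url: str, owner_id: str) -> bool:
    """Replace the target URL only when the caller owns the code."""
    with conn:
        cursor = conn.execute(
            "UPDATE links SET target_url = ? WHERE code = ? AND owner_id = ?",
            (new_url, code, owner_id),
        )
    return cursor.rowcount == 1
```

For a vanity URL a collision is reported back to the user instead of retried, since the code was chosen deliberately.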
Main reasonings:
- Managed services as much as possible to reduce bootstrap code
- FaaS functions are easy to spawn and destroy (ports and adapters)
- Fast replacement of features
- Independent scaling between differing components (scale read more than write)
- The codebase of each function is independent
- Data structure will not need to change frequently
- Data structure can change independently between all functions
- Data definition first approach instead of code first
- Backward compatibility of data definition is required
Backend Design
Components:
- API gateway (AWS API Gateway, Kong, Istio)
- Identity management (Cognito, Auth0, Okta)
- FaaS (AWS Lambda, Knative)
- OLTP database (AWS RDS, PostgreSQL)
- Cache (AWS ElastiCache, Redis, Memcached, Hazelcast)
- Emailer (AWS SES, Mailbird, Mailgun)
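To illustrate how the cache and the OLTP database might cooperate on the read-heavy redirect path, here is a minimal cache-aside sketch. `TtlCache` is a tiny in-process stand-in for Redis or Memcached, `load_from_db` is a hypothetical callable that queries the database, and the response dictionary loosely follows the shape of a FaaS HTTP proxy response; all of these are assumptions for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """In-process stand-in for an external cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def delete(self, key: str) -> None:
        # Would be called after a URL replacement so readers see the new target.
        self._items.pop(key, None)


def resolve(code: str, cache: TtlCache,
            load_from_db: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside lookup: try the cache, fall back to the database, then fill."""
    target = cache.get(code)
    if target is None:
        target = load_from_db(code)
        if target is not None:
            cache.set(code, target)
    return target


def redirect_handler(code: str, cache: TtlCache,
                     load_from_db: Callable[[str], Optional[str]]) -> dict:
    """Return a redirect response for a known code, or 404 otherwise."""
    target = resolve(code, cache, load_from_db)
    if target is None:
        return {"statusCode": 404, "headers": {}, "body": "Not found"}
    return {"statusCode": 302, "headers": {"Location": target}, "body": ""}
```

Because users can replace the target URL, the sketch answers with a temporary redirect (302) so that browsers are less likely to hold on to a stale target; that choice is an assumption rather than something the notes specify.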
Metrics Capture
We can use clickstream analytics (Firebase, Supabase, CleverTap, etc.) in the frontend, as a step before redirecting to the URL provided.
Design Diagram
