Designing a URL Shortener Database Backend
Introduction
Imagine your URL shortener gets hacked because attackers easily guess sequential short URLs, scanning your entire database. This lecture tackles designing a secure, scalable database backend for a URL shortener. We compare two approaches—auto-incrementing IDs vs. hashed/custom URLs—and discuss scaling strategies. The key revelation? Simplicity and scalability often clash; the right design depends on your use case.
Core Concepts
Design 1: Auto-incrementing IDs
- Structure: A relational table with ID(primary key, 64-bit serial) andLong URL.
- Workflow:
    - Shorten: Insert Long URL, returnID(e.g.,domain.com/7).
- Redirect: Query by ID(fast due to indexed integer).
 
- Shorten: Insert 
- Pros:
    - ⚡ Blazing-fast reads/writes (indexed integer lookups).
- 🛠️ No collision risk (unique IDs guaranteed).
 
- Cons:
    - 🔍 Predictable URLs: Attackers can scan IDs to harvest URLs.
- 🚫 No custom URLs.
 
Design 2: Hashed/Custom URLs
- Structure: Table with Short URL(primary key, 8-character hash) andLong URL.
- Workflow:
    - Shorten: Hash the Long URL(e.g., SHA-256 + base64), insert. On collision, retry with a salt.
- Redirect: Query by Short URL(indexed but slower than integers).
 
- Shorten: Hash the 
- Pros:
    - 🔒 Unpredictable URLs: Mitigates scanning attacks.
- 🎨 Supports custom URLs (user-defined short paths).
 
- Cons:
    - 🐢 Slower writes: Collision handling adds retries.
- 💾 Larger indexes: Strings vs. integers.
 
Scaling Strategies
Replication & Read Replicas
- Master Database: Handles all writes (e.g., creating short URLs).
- Read Replicas:
    - Deploy globally to serve GET requests (redirects).
- Reduce load on the master and improve read latency for users.
- Trade-off: Eventual consistency—replicas may lag behind the master by milliseconds.
 
Reverse Proxy (Layer 7 Load Balancer)
- Role:
    - Routes POST requests (writes) to the master database.
- Directs GET requests (reads) to the nearest read replica.
 
- Geographical Distribution:
    - Place proxies in regions (e.g., Asia, Europe) to minimize latency.
- Example: A user in Europe accesses a European replica, while writes go to a master in North America.
 
Sharding (Horizontal Partitioning)
- When to Use: Only if the master database cannot handle write volume.
- Approach:
    - Split data by a shard key (e.g., Short URLprefix or geographic region).
- Each shard operates independently, reducing write congestion.
 
- Split data by a shard key (e.g., 
- Trade-off: Complexity—joins and global queries become harder.
“Sharding is a last resort. Start with replication and optimize until you absolutely need it.” — Key scaling insight.
Key Characteristics
- Indexing:
    - Design 1 uses a compact integer index; Design 2 uses a string index (larger but secure).
 
- Collision Handling:
    - Design 2 requires salting and retries on insertion conflicts.
 
Advantages & Disadvantages
| Aspect | Design 1 (Auto-ID) | Design 2 (Hashed/Custom) | 
|---|---|---|
| Speed | Faster reads/writes | Slower writes (collision retries) | 
| Security | Predictable, vulnerable | Secure, unpredictable | 
| Customization | No custom URLs | Supports custom URLs | 
| Scalability | Easier to scale reads | Same as Design 1 + sharding | 
Practical Implementations
- 
    Database Setup: -- Design 1 CREATE TABLE short_urls ( id SERIAL PRIMARY KEY, long_url TEXT NOT NULL ); -- Design 2 CREATE TABLE custom_urls ( short_url VARCHAR(8) PRIMARY KEY, long_url TEXT NOT NULL );
- Scaling Infrastructure:
    - Use PostgreSQL or MySQL for replication (supports built-in read replicas).
- Configure a reverse proxy (e.g., NGINX) to route traffic.
 
- Redirect Logic:
    - Use HTTP 301(permanent) or302(temporary) redirects after querying the database.
 
- Use HTTP 
Conclusion
- Choose Design 1 for internal systems where speed matters and URLs aren’t user-facing.
- Choose Design 2 for public services (e.g., Bitly) requiring security and custom URLs.
- Scaling Takeaway:
    - Start with replication for reads.
- Use a reverse proxy to manage traffic.
- Shard only when writes overwhelm the master.
 
“Simplicity sometimes rules. A simple design leads to solid, maintainable systems.” — Key insight from the lecture.