System Design of URL Shortening Service like TinyURL

If you are looking for a great resource to not just improve your distributed system design skills but also to ace your distributed system design interviews and targeting level L5/L6 in companies like Facebook, Google, etc. then you should check “The Distributed System Design Interviews Bible”.

The course also includes a section on mock system design interviews. This is a great resource because it will provide you the experience of an actual distributed system design interview. More chapters and mock system design interviews are being added as well.

The following article on designing a URL shortening service like TinyURL is a chapter from the above course.

TinyURL is a URL shortening web service, which provides short aliases for redirection of long URLs.

The first step would be to collect the functional and non-functional requirements of the service.

Functional Requirements

The functional requirements include:

1. Given a big URL, the service should generate a unique small URL (or link) for the big URL (this is a write operation).

2. When provided a short URL, the service should redirect the user to the big URL, or else return “not found” if the big URL does not exist (this is a read operation).

3. The length of the short URL should be at most 6 characters (of course, in the future, we can increase the length to 7, 8, or more characters, if needed. However, we will see that even with 6 characters, we can generate a very large number of unique short URLs).

4. The generated short URL should be somewhat random. This is not a hard requirement, though, as we do not see why it would be bad for the service if someone can guess the next generated short URL.

Let us know in the comments why you think it would be bad for the service if someone could figure out the next generated short URL.

5. If two users try to create a short URL for the same big URL, the service should return different short URLs.

6. We would like the users to have an account with the service to create short URLs. Thus, we are not allowing an anonymous user to create short URLs. This will protect our service from some misbehaviors where an anonymous user could try to generate millions of short URLs, thus exhausting our short URL space. This would also allow us to monetize our service, where the service could have two types of users.

  • Free users — who can create up to a certain number (e.g., up to 10) of short URLs with some default expiration time (e.g., one month)
  • Premium users (or paying users) — who can create up to a certain number of short URLs which is much bigger than 10 (e.g., 100 or 1000 or 10000) and also with a longer expiration time (e.g., the expiration time in years, or never gets expired as long as the user is a premium user)

7. There should be a way for the developer to monitor the health and other metrics of the service (e.g., the number of reads/writes happening per minute, the total number of short URLs created).

Non-Functional Requirements

The non-functional requirements include:

1. The service should be highly available (e.g., having 99.999% or five 9s).

2. The service should be fault-tolerant.

  • Let us know what you think it means for a service to be fault-tolerant.

3. The read and write operations (i.e., creating short URLs from big URLs and redirecting users to big URLs from short URLs) should occur with minimal latencies.

4. The service should be scalable with increasing load.

5. Minimum cost possible — it dictates that the system should start with few servers to minimize the cost but should be elastic enough to scale with increasing user load.

6. The service should be strongly consistent in the sense that once we have created and returned a short URL to the user, a query for the big URL using that short URL must return the big URL.

7. At the same time, we also want data to be durable. So, once a short URL has been created, it should be present in the system during its lifetime (i.e., before its expiration time).

Of course, some of the non-functional requirements are somewhat related. E.g.

  • the service cannot be highly available if it is not fault-tolerant
  • similarly, the service cannot have minimal latencies if it is not scalable with increasing load

We are restricting the functional and non-functional requirements to only the above set, although there can be more requirements. These requirements will dictate how we are going to design our service.

Application Programming Interface (API)

The service will support the following set of APIs.

The first set of APIs is related to user logon/logoff.


This API returns the user token after authenticating the user.

userLogoff ()

The service will use an external Identity Provider, such as Facebook or Google (i.e., the user can use their Facebook or Google account to log in to the service).

Now, let’s discuss the other set of APIs, which relate directly to our service functionality. We perform two operations: creating a short URL for a big URL, and retrieving a big URL from a short URL. So, our service will provide the following two APIs:

createShortURL (userToken, bigURL, optional expirationTime)

This API takes a big URL and an optional expiration time as an input, and then creates and returns a unique short URL. Here, the userToken is returned by the logon API and is used to get the user identifier. The user identifier is then used to determine whether the user is a free user or a premium user. Based on this, we can decide whether the user has reached his allocated quota for the short URLs or not. It can also be used to throttle a user based on his allocated quota.

getBigURL (shortURL)

This API returns the big URL given a short URL if it exists.
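A minimal sketch of how these two APIs and the quota check could fit together, assuming an in-memory store. The names `UrlService`, `QuotaExceeded`, and `QUOTAS` are hypothetical, the quota numbers come from the examples in the requirements, and the optional expiration time is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical quotas, taken from the example numbers in the requirements.
QUOTAS = {"free": 10, "premium": 1000}


class QuotaExceeded(Exception):
    """Raised when a user has used up the allocated number of short URLs."""


@dataclass
class UrlService:
    # userToken -> (userId, tier). A real service would validate the token
    # with the identity provider instead of using a local dictionary.
    sessions: dict = field(default_factory=dict)
    url_count: dict = field(default_factory=dict)  # userId -> URLs created
    mapping: dict = field(default_factory=dict)    # shortURL -> bigURL
    next_id: int = 1

    def create_short_url(self, user_token: str, big_url: str) -> str:
        user_id, tier = self.sessions[user_token]
        if self.url_count.get(user_id, 0) >= QUOTAS[tier]:
            raise QuotaExceeded(user_id)
        # Placeholder key: a zero-padded decimal number stands in for the
        # real key generation scheme.
        short_url = f"{self.next_id:06d}"
        self.next_id += 1
        self.mapping[short_url] = big_url
        self.url_count[user_id] = self.url_count.get(user_id, 0) + 1
        return short_url

    def get_big_url(self, short_url: str) -> Optional[str]:
        # None stands for the "not found" response.
        return self.mapping.get(short_url)
```

Because every call takes a fresh key, two requests for the same big URL get different short URLs, which matches functional requirement 5.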

High-Level Design

In a simple design, the user talks to a single application server in our service. That application server stores the mapping between the short URLs and the big URLs in a local database. However, this design has several flaws. It is not scalable, as a single server cannot handle tens to hundreds of thousands of requests per second. The design is also neither fault-tolerant nor highly available. This single server can go down at any time when some failure happens. It can also go down for periodic maintenance, thus affecting the availability of the service.

In addition, since the data is stored on that single server’s hard drive, the design does not ensure data durability. A hard drive crash can cause instant data loss at any time.

Obviously, even though the design can fully serve the functional requirements, it cannot meet the non-functional requirements. This clearly shows that the non-functional requirements are an important factor in dictating the design of a service.

Now consider the following design:

Here, we have a minimum of three application servers that serve the users’ read/write requests. The users talk to the app servers via a load balancer, which distributes the read/write requests equally among the servers in a round-robin fashion. We use a minimum of three application servers because two are not enough to keep the service highly available at all times. One server could be down for some time for periodic maintenance (e.g., host OS patching or deployment of new service builds). During that time, the single remaining server could fail, causing service unavailability. With at least three app servers, and with periodic maintenance done on one server at a time, we avoid a situation where only a single server is serving the read/write requests.
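As a rough illustration of round-robin distribution, here is a small sketch. The class name `RoundRobinBalancer` and the server names are made up for the example; a real load balancer would also track server health.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        if len(servers) < 3:
            raise ValueError("the design calls for at least three app servers")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.next_server() for _ in range(6)]
# picks == ["app-1", "app-2", "app-3", "app-1", "app-2", "app-3"]
```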

This design is also highly scalable. We will be monitoring each server. When the resource consumption on each server crosses some threshold (e.g., CPU usage above 80% or more than 10K read/write requests per second per server), we can add more servers and configure the load balancer to send requests to the new servers as well. Conversely, if the resource consumption drops below a certain threshold (e.g., CPU usage below 30% and/or fewer than 1K requests per second), we can remove servers to minimize the cost. Similarly, if a server dies, we can remove it from the load balancer as well.

The load balancer can also use mechanisms other than round-robin to decide which server to forward a request to. We will discuss these in future chapters.
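The scaling rule can be sketched as a single decision function. This is only an illustration: the function name `desired_server_count` is hypothetical, the thresholds are the example numbers from the text, and the sketch assumes that both the CPU and the request rate must be low before a server is removed (the more conservative reading of "and/or").

```python
MIN_SERVERS = 3  # never drop below the three-server minimum


def desired_server_count(current: int, avg_cpu: float, avg_rps: float) -> int:
    """Return the server count after one scaling decision.

    avg_cpu is a fraction between 0.0 and 1.0; avg_rps is the average
    number of requests per second handled by each server.
    """
    # Scale out when either signal crosses its upper threshold.
    if avg_cpu > 0.80 or avg_rps > 10_000:
        return current + 1
    # Scale in only when both signals are low (an assumption of this sketch).
    if avg_cpu < 0.30 and avg_rps < 1_000 and current > MIN_SERVERS:
        return current - 1
    return current
```

Adding or removing one server per decision keeps the sketch simple; a real autoscaler would usually also wait for a cool-down period between decisions.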

By introducing at least three app servers, we have tried to ensure that the service’s design is highly available and scalable. However, the overall service design cannot be highly available, scalable, and durable if the datastore used is not highly available, scalable, and durable. The data store needs to exhibit these properties for the overall service to exhibit these properties. We will discuss the design of the datastore in a future chapter in detail.

In-Memory Cache:

The number of read requests would be much higher than the number of write requests. Initially, we can safely assume that the number of reads is at least 10x the number of writes. Since we also have an analytics and monitoring component that measures the number of read/write requests, we can later determine the actual ratio between read and write requests. Either way, we can be confident that reads will far outnumber writes. So, we can introduce an in-memory cache that stores the mapping between the short and big URLs to reduce the number of read requests hitting the datastore.

There are two approaches to using an in-memory cache. In the first approach, each app server has its own local in-memory cache. In the second approach, all app servers share a global in-memory cache. We have discussed the pros and cons of each approach in more detail in the course.

Key/Short URL Generation:

The first thing we will discuss is how the short URL will be encoded. We have different encoding mechanisms that we can use.

  • If we use base62 (0-to-9, a-to-z, A-to-Z) then we can encode up to 62 ^ 6 = 56.8 billion short URLs
  • If we use base64 (0-to-9, a-to-z, A-to-Z, -, _) then we can encode up to 64 ^ 6 = 68.7 billion short URLs

⚠ Some online resources discuss using the ‘+’ and ‘/’ characters in base64 encoding. Please note that these characters are not safe to use as-is in a URL. If you use them, you need to percent-encode ‘+’ and ‘/’ as ‘%2B’ and ‘%2F’ respectively, but this will increase the size of the short URL to more than 6 characters. We will therefore use base64 encoding with the characters (0-to-9, a-to-z, A-to-Z, -, _).
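The key-space numbers and the percent-encoding issue can be checked with a few lines of Python. The sample keys `ab+/cd` and `ab-_cd` are made up for the illustration.

```python
from urllib.parse import quote

KEY_LENGTH = 6
base62_keys = 62 ** KEY_LENGTH  # 56,800,235,584 (about 56.8 billion)
base64_keys = 64 ** KEY_LENGTH  # 68,719,476,736 (about 68.7 billion)

# '+' and '/' have to be percent-encoded, which makes the key longer.
standard_key = "ab+/cd"
escaped = quote(standard_key, safe="")    # 'ab%2B%2Fcd', 10 characters

# '-' and '_' are left untouched, so the key stays at 6 characters.
url_safe_key = "ab-_cd"
unchanged = quote(url_safe_key, safe="")  # 'ab-_cd'
```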

Now, let us discuss how the short URL will be generated. There are different mechanisms of generating short URLs that have been discussed by other online resources, including some other YouTube videos and online books/courses on system design interviews.

Some other online resources have suggested using a key-generation service (KGS), which generates unique short URLs and returns them to the app servers. However, this just moves the complexity of generating a unique short URL to a different service. We then need to discuss how this KGS is designed and whether it comprises a single server or several. A single server breaks our non-functional requirements around availability, scalability, durability, etc. With more than one server, the design of the KGS becomes much more complex: how does it generate unique keys and pass them to the app servers when several app servers contact it? Do all the requests from different app servers go to a single KGS server (which then becomes a bottleneck), or to multiple KGS servers (in which case, how do the KGS servers coordinate so that they only hand out unique short URLs)? The KGS is also an extra dependency in our design, and it needs to be maintained as well.

In short, we do not need to rely on a zookeeper or a separate key generation service. We can generate unique short URLs as follows. In our data store, we store a counter as a key-value pair. When an app server comes up, it goes to the datastore and, in a single database transaction, reads this counter value and increments it by some increment value. The increment value could be passed to the server as a startup configuration and could be 10, 100, or 1000. Let us take 100 as an example. Once the app server has read the counter, incremented it by 100, and committed the transaction successfully, it can safely assume that it can use all the counter values from the read value up to the read value plus 99 for generating short URLs.

Consider an example below, where we have three app servers, and in the datastore, the counter value is 1.

First, app server 1 goes to the datastore, reads the counter value of 1, and increments the counter value from 1 to 101 in a database transaction. If the transaction is successful, then app server 1 can safely assume that it can generate short URLs using counter values 1 to 100. When it exhausts all these values, it will go to the datastore again to reserve the next set of counter values.

After app server 1, app server 2 goes to the datastore and reads and increments the counter’s value by 100 in a database transaction. This time app server 2 will read the value of 101 and increment the counter value in the datastore to 201. Thus, it can safely assume that it can use values from 101 to 200 for short URL generation.

After app server 2, app server 3 goes to the datastore and reads and increments the counter’s value by 100 in a transaction. This time app server 3 will read the value of 201 and increment the counter value in the datastore to 301. Thus, it can safely assume that it can use values from 201 to 300 for short URL generation.

Now, when one of the above servers exhausts all its counter values, it will go to the datastore again and try to read and increment the counter value in a database transaction. E.g., if app server 2 now goes to the datastore, it will read the value of 301 and increment the counter to 401 in the datastore. It can then safely assume that it can use counter values from 301 to 400 to generate short URLs.

The important thing here is that each app server reads and increments the value in a database transaction. This matters when two app servers try to read and increment the counter at the same time. In this case, one app server will succeed in committing the transaction and can use the counter value it read to generate short URLs. The other app server will get a commit conflict while committing its transaction, and it needs to retry reading and incrementing the counter. Reading and incrementing the counter value in the data store may look like a bottleneck, but this is not the case. Since we increment the counter value by 100, each app server only goes to the datastore after generating 100 short URLs locally. Instead of a key-value counter, we can also use a database construct such as a sequence or an auto-increment column, found in databases like Oracle DB (sequences), PostgreSQL (sequences), and MySQL (AUTO_INCREMENT).
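The range reservation scheme can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the hypothetical `CounterStore` class stands in for the datastore, and its `compare_and_set` method models a transaction that fails with a commit conflict when another server changed the counter in between.

```python
import threading


class CounterStore:
    """Stand-in for the datastore row that holds the global counter."""

    def __init__(self, start: int = 1):
        self._value = start
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        """Commit only if nobody changed the counter since it was read."""
        with self._lock:
            if self._value != expected:
                return False  # models a commit conflict
            self._value = new
            return True


class AppServer:
    def __init__(self, store: CounterStore, increment: int = 100):
        self._store = store
        self._increment = increment
        self._next = 0
        self._end = 0  # exclusive upper bound of the reserved range

    def _reserve_range(self) -> None:
        while True:  # retry until the "transaction" commits
            start = self._store.read()
            if self._store.compare_and_set(start, start + self._increment):
                self._next, self._end = start, start + self._increment
                return

    def next_counter(self) -> int:
        # Only go to the datastore when the local range is used up.
        if self._next >= self._end:
            self._reserve_range()
        value = self._next
        self._next += 1
        return value


store = CounterStore(start=1)
s1, s2, s3 = AppServer(store), AppServer(store), AppServer(store)
first_values = [s.next_counter() for s in (s1, s2, s3)]  # [1, 101, 201]
```

Each server touches the datastore once per 100 keys, which is why the shared counter does not become a bottleneck.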

There is one issue in this design, though. There is a possibility of losing up to 100 short URLs if the app server somehow crashes after reading the next counter value. However, this is tolerable as we can generate up to 68.7 billion short URLs. The other mechanisms, like using a zookeeper or a key generation service, also have the same flaws.

Now, this counter value (which is a decimal integer) can be easily converted to a base64 number as follows (considering we have the following mapping of decimal integer to base64 number):

Decimal integer 1 becomes 1 in base64, or 000001 when padded to 6 digits. Similarly, decimal integer 10 becomes 00000a.
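A minimal sketch of this conversion is shown below. It assumes the digit order 0-to-9, a-to-z, A-to-Z, -, _ (the order in which the characters were listed earlier, and the one consistent with 10 mapping to `a`); the function names `encode` and `decode` are chosen for the example.

```python
ALPHABET = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "-_"
)
BASE = len(ALPHABET)  # 64
KEY_LENGTH = 6


def encode(counter: int) -> str:
    """Convert a counter value to a zero-padded 6-character key."""
    if not 0 <= counter < BASE ** KEY_LENGTH:
        raise ValueError("counter does not fit in a 6-character key")
    digits = []
    for _ in range(KEY_LENGTH):
        counter, remainder = divmod(counter, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode(key: str) -> int:
    """Convert a key back to the counter value it was generated from."""
    value = 0
    for ch in key:
        value = value * BASE + ALPHABET.index(ch)
    return value


# encode(1) == "000001" and encode(10) == "00000a"
```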

There are two types of databases that we could use for our datastore: a relational database or a NoSQL database. As discussed before, the non-functional requirements dictate that the datastore needs to be highly available, scalable, performant, and durable. Also, the database schema required for our service does not have any relations that would require a relational database. Thus, a NoSQL database seems to be the best choice for the datastore. Relational databases are traditionally scaled vertically and are usually harder to make highly available and performant at this scale. We need the following two tables (sometimes called buckets in several databases).

Since we are now using a NoSQL database, we have two choices: either a simple key-value datastore or a more advanced one, such as a document datastore.

Let us know which NoSQL datastore you would choose and why.

Also, note that the Users table has an optional column for URLCount, which is the count of short URLs created by the user. If we do not need to maintain this count, then creating a short URL only requires generating a short URL and then storing the mapping between the short and the big URL in the URL_Mapping table. Since we are already ensuring that the short URL generated by an app server is unique, we do not even need a database transaction to store the mapping in this bucket. However, if we also need to increment the count of short URLs created by the user, then we need to update both tables, and we need to do that in a database transaction as follows:

1. Generate short URL

2. Open datastore transaction

3. Write to URL_Mapping table

4. Read user information from Users table using userId

5. Increment the URLCount and update user information in the Users table

6. Commit transaction

The database transaction enables us to update both tables atomically: either the writes to both tables succeed, or both fail, so the datastore is never left in an inconsistent state. These design decisions will affect which datastore we choose to store the URL mapping and user information. If a NoSQL database does not provide a transaction guarantee for updating the records in both tables, then we need to either choose a relational database or find some other mechanism to increment the URLCount. One approach is to add the operation to a persistent queue and then process the messages in the queue to increment the URLCount. However, in this case, the URLCount update is only eventually consistent.
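The atomic update of both tables can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for a datastore with multi-record transactions; it is not the datastore proposed in this design. The column `BigURL` and the function names are assumptions made for the example.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE URL_Mapping ("
        "ShortURL TEXT PRIMARY KEY, BigURL TEXT NOT NULL, UserId TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Users (UserId TEXT PRIMARY KEY, URLCount INTEGER NOT NULL)"
    )


def store_mapping(conn: sqlite3.Connection, short_url: str,
                  big_url: str, user_id: str) -> None:
    # "with conn" commits if the block succeeds and rolls back if it
    # raises, so both writes are applied together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO URL_Mapping (ShortURL, BigURL, UserId) VALUES (?, ?, ?)",
            (short_url, big_url, user_id),
        )
        cursor = conn.execute(
            "UPDATE Users SET URLCount = URLCount + 1 WHERE UserId = ?",
            (user_id,),
        )
        if cursor.rowcount != 1:
            # Unknown user: raising here also undoes the INSERT above.
            raise LookupError(f"unknown user {user_id}")
```

If the second write fails, the mapping written by the first statement is rolled back as well, which is the all-or-nothing behavior described above.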

This is how the write operation will happen in our service. Now let us discuss how the read operation will happen in our service, i.e., how the service returns a big URL given a short URL. When an app server receives the read request from the user, it will first check the in-memory cache with the short URL key (either local or global cache). If it is present in the cache, it will check whether the short URL is expired or not. If not expired, then it will return the big URL corresponding to the short URL from the mapping entry in the cache. If the mapping is not present in the cache, then the app server will go to the data store and read the mapping from there, store it in the cache and return the big URL to the caller/user.
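The read path can be sketched as a small cache-aside lookup. The class name `UrlReader` is hypothetical, a plain dictionary stands in for both the cache and the datastore, and the sketch assumes that an expired short URL is treated as "not found", which the text does not spell out.

```python
import time
from typing import Optional


class UrlReader:
    def __init__(self, datastore: dict, clock=time.time):
        self.datastore = datastore  # shortURL -> (bigURL, expires_at)
        self.cache = {}             # local or global cache stand-in
        self.clock = clock
        self.datastore_reads = 0    # counted only to show the cache effect

    def get_big_url(self, short_url: str) -> Optional[str]:
        entry = self.cache.get(short_url)
        if entry is None:
            # Cache miss: read from the datastore and populate the cache.
            self.datastore_reads += 1
            entry = self.datastore.get(short_url)
            if entry is None:
                return None  # no such short URL
            self.cache[short_url] = entry
        big_url, expires_at = entry
        if expires_at <= self.clock():
            return None  # expired (assumption: reported as not found)
        return big_url
```

The `clock` parameter exists only so that expiration can be exercised without waiting for real time to pass.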

How will we scale the datastore?

We scale the datastore by partitioning it. The URL_Mapping table can be partitioned by the ShortURL as the partition key. Now there are two approaches to partition the table.

  • Hash-based partitioning: we partition the database by the hash of the partition key.
  • Range-based partitioning: we store short URLs in the partitions by certain ranges. E.g., we can decide to store short URLs from “000001” to “00____” in one partition, “010000” to “01____” in another partition, and so on (“_” is the largest digit in our alphabet).

If we look closely at the two partitioning schemes, we can see that, for our purpose, hash-based partitioning is better because the generated short URLs will be uniformly distributed across all the partitions. With range-based partitioning, because we generate short URLs from a counter value, all the writes at any given time go to a single database partition, making that partition write-heavy. Most of the time, a short URL is read more frequently just after it is created, so most of the read requests will go to the same partition as well. Thus, range-based partitioning is not suitable, as it sends all writes and most reads to a single partition, making that partition a hot spot. Hash-based partitioning, on the other hand, distributes short URLs, and therefore reads and writes, uniformly across all partitions.

Similarly, the Users table can be partitioned by the hash of the UserId.

Let us know how you would purge the database.
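The difference between the two schemes can be seen in a small experiment. This is a sketch with assumed details: four partitions, SHA-256 as the hash function, a range scheme keyed on the first two characters, and zero-padded decimal keys standing in for consecutive counter-generated short URLs.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4


def hash_partition(short_url: str) -> int:
    # A stable hash, so every app server maps a key to the same partition.
    digest = hashlib.sha256(short_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def range_partition(short_url: str) -> str:
    # Partition by the first two characters of the key.
    return short_url[:2]


# 1,000 consecutive keys, as a counter-based generator would produce them.
keys = [f"{n:06d}" for n in range(1, 1001)]

hash_load = Counter(hash_partition(k) for k in keys)
range_load = Counter(range_partition(k) for k in keys)
# range_load has a single entry ("00"): every new key lands in one partition.
# hash_load spreads the same keys over all four partitions.
```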

Global Cache vs. Local Cache

The trade-offs between global and local caches, and the design of both, are discussed in more detail in the course. Here, we would like to discuss one scenario where we prefer a global cache over a local cache. Consider the case where you start receiving a large number of requests for some short URL that has not yet been created. It could be because a malicious user is trying to perform a denial of service (DoS) attack. You will not find that URL in your in-memory cache, so you will go to the datastore for every such request. This puts unnecessary stress on your datastore.

To avoid this stress, you could store a sentinel value in the in-memory cache for that short URL, indicating that this short URL does not yet have any big URL mapping. Then, instead of going to the datastore, you can return “not found” as soon as you see that sentinel value in the cache. However, this now requires you to remove the sentinel value from the cache if that short URL is later used in creating a URL mapping. This removal is only possible with a global cache: the create request could come to any server, and that server can invalidate (or remove) the sentinel value from the global cache.

If we had used a local in-memory cache, we could not use the sentinel value. Since the local in-memory cache is local to each app server, the app server that receives the write request cannot invalidate the local caches of all the other app servers. This causes a consistency problem if a read request then goes to an app server that still has the sentinel value in its local cache, suggesting that the short URL mapping is not present. This app server will return “big URL not found”, which would be an incorrect response. Another choice is to use both a local and a global cache but store the sentinel values only in the global cache.
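The sentinel idea can be sketched as follows. The classes `GlobalCache` and `AppServer` and the `NOT_FOUND` marker are hypothetical; a plain in-process object stands in for the shared cache, whereas a real networked cache would need a reserved value that can be serialized.

```python
NOT_FOUND = object()  # sentinel: "no mapping exists for this short URL"


class GlobalCache:
    """Stand-in for a cache shared by all app servers."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class AppServer:
    def __init__(self, cache: GlobalCache, datastore: dict):
        self.cache = cache
        self.datastore = datastore  # shared dict: shortURL -> bigURL
        self.datastore_reads = 0

    def get_big_url(self, short_url):
        cached = self.cache.get(short_url)
        if cached is NOT_FOUND:
            return None  # answered from the cache, datastore not touched
        if cached is not None:
            return cached
        self.datastore_reads += 1
        big_url = self.datastore.get(short_url)
        # Cache the result, or the sentinel when there is no mapping.
        self.cache.set(short_url, NOT_FOUND if big_url is None else big_url)
        return big_url

    def create_mapping(self, short_url, big_url):
        self.datastore[short_url] = big_url
        # Any server can drop the sentinel because the cache is shared.
        self.cache.delete(short_url)
```

With per-server local caches, the `delete` call in `create_mapping` would only reach the cache of the server that handled the write, which is exactly the consistency problem described above.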

❔ We designed the TinyURL service based on the requirements that we discussed in the beginning. If we change those requirements, then it will affect the design of the service as well. Now, there is a question for you. If we allow anonymous users to create short URLs, then how will the design change? Let us know what you think in the discussion section.

How do you think spam and malicious website links should be handled?

