Snowflake algorithm

In this article, we will explore the Snowflake algorithm implementation in details. I will be touching every little detail as much as I can because when I was trying to understand and use this algorithm, I couldn’t find a detail explanation like the one that is provided in this article.

In today's data-driven world, generating unique identifiers is crucial for many systems, from databases to distributed applications. Unique IDs are essential for ensuring that each entity in a system can be uniquely identified and accessed, which helps maintain data integrity and consistency. However, as systems scale and the need for distributed data management grows, generating these unique IDs efficiently and without conflicts becomes a complex challenge.

This challenge led to the development of the Snowflake algorithm, a powerful solution designed to address this challenge. Developed by Twitter, the Snowflake algorithm provides a scalable and distributed approach to generating unique IDs. It combines time-based components with machine-specific identifiers to produce unique 64-bit integers that are sortable by creation time. This ensures that IDs are both unique and ordered, making it an ideal choice for systems that require high-throughput, distributed ID generation.

This article explains a practical application of this algorithm in a URL shortening service. You can check out the live project here. Additionally, you can view the code here.

With all the components needed to generate a single ID, some may have opinions on simpler ways to achieve this. For this reason, let me take some time to debunk the most popular and naive assumptions, which I, too, have been guilty of holding. 😜:

Using some sort of calculation, for example
```
  const min = 1000000
  const max = 9999999999

  const id = Math.floor(Math.random()*max) + min;
```
When we try to design a distributed, scalable system, the approach above will fall short in several ways. First, the code above does not generate unique IDs but random numbers, so sooner or later, there will be collisions where the same ID is generated more than once.

Secondly, unlike Snowflake, which generates time-ordered IDs, these random IDs are not ordered, making them unsuitable for sequencing operations. Finally, in a distributed system, generating random numbers on multiple nodes increases the likelihood of collisions, especially with a large number of requests. So imagine what would happen if you had the exact same code on different nodes generating the IDs.
Using the auto-incrementing integer feature of a database - first, auto-increment IDs are usually generated by a single database node, creating a bottleneck as the system scales, especially in distributed environments. Having different nodes of the database auto-generate their IDs will definitely lead to collisions. Then one might suggest, 'just make sure only one database generates the IDs.' Well, this would mean that we have a single point of failure. Additionally, we still face the problems of a lack of uniqueness and the generation of non-time-ordered IDs.
Using UUIDs (Universally Unique Identifiers) presents a significant issue: the generation of non-time-ordered IDs. Another less obvious concern is the size of a UUID. A typical UUID, like 67DB31FB-1887-4372-A8B0-E87C092D7D11, is 128 bits. For a service like tinyurl.com, which shortens around 100 million URLs per month (according to estimates), this means a substantial amount of storage is needed just for IDs. Additionally, indexing these large IDs in a database can be challenging.

Now that we have established the reason for using Snowflake, let's explain what the Snowflake algorithm is. The Snowflake algorithm is a distributed unique ID generator developed by Twitter. It creates 64-bit unique identifiers by combining a timestamp, data center ID (node ID), machine ID (worker ID), and a sequence number. The IDs are time-ordered, which is useful for sequencing events, and are generated without a central coordination point, allowing for high scalability. The idea of combining all these components to generate one ID guarantees uniqueness across the network. Even if two machines in the same network generate IDs at the exact millisecond, these IDs will be different due to the different machine IDs.

To effectively explain this algorithm, I will use some code snippets a diagram to aid the explanation.

Notes

Please note that most of the operations in this algorithm are bitwise operations, so for that reasons, I will love you to take note of the following

A single digit in a binary number is called a bit.
A long is a 64-bit data type. And if you imagine a long to be a box. Then it has 64 slots and each slot can have a value of 1 or 0 that is, it is binary format. This imply that even when you have small number say 1, all other 63 slots will have to contain zeros. Thus 1 is represented as
```
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
```
The first binary digit that represents whether a number is positive or negative is called the sign bit or most significant bit (MSB). 0 means positive while 1 means negative. Just an early notice, all Snowflake IDs will start with 0 since they are always positive numbers when converted to base 10.
Given a positive byte (an 8-bit data type) say 00000001, the negative can be obtained as shown below using the two's complement method. Here’s how it works:
1. Invert all the bits (also known as taking the one's complement):
  - 00000001 becomes 11111110.
2. Add 1 to the inverted bits:
  - 11111110 + 1 = 11111111.

So, the negative of 00000001 in an 8-bit byte is 11111111, which represents -1 in decimal (base 10)

In a shifting operation, such as 27 << 2, the number on the left is treated as a binary value, 00011011, and the number on the right represents the number of positions to shift. So, 27 << 2 means shifting the binary representation 00011011 two positions to the left. And this will results in 01101100 (108). The operation essentially adds two zeroes to the right of the binary representation.

Similarly, In a shifting operation, such as 27 >> 2, the number on the left is treated as a binary value, 00011011, and the number on the right represents the number of positions to shift. So, 27 >> 2 means shifting the binary representation 00011011 two positions to the right. This results in 00000110 (6). The operation essentially removes the two rightmost bits from the binary representation and adds two zeroes to the left.
Given the bitwise OR operation 1100 | 0011 , we will perform the following the operations on each bit
- 1 OR 0 = 1
- 1 OR 0 = 1
- 0 OR 1 = 1
- 0 OR 1 = 1

This results in 1111. From a human perspective, the bitwise OR operation effectively merges the two numbers, where 0s are treated as gaps that can be filled with 1s from the other number.

Concept of wrapping around zero - The idea here is that if you have a binary number with a fixed number of bits, increasing this number by 1 after it has reached its maximum value will result in zero. For example, consider a 4-bit binary number. The largest number you can represent with 4 bits is 1111, which is 15 in decimal. If you increment 1111 by 1, you get 10000 in binary. However, since only 4 bits are available, the leftmost bit (the 5th bit) is discarded, leaving you with 0000 in binary, which is 0 in decimal.

Now, let’s examine each section of the algorithm. I will be selecting code snippets from the full implementation and explaining them until everything is covered. I also recommend that you open the full code implementation in another window to better understand how each snippet fits into the overall code.

Epoch

 private final long epoch = 1,609,459,200,000L;

The main idea behind incorporating the current time into the ID generation process is to ensure that each ID is unique and sequential. To achieve this, we retrieve the current time in milliseconds from the machine generating the ID. However, the time value we obtain represents the total number of milliseconds that have elapsed since 00:00:00 UTC on January 1, 1970, which is known as the Unix epoch. This time reference is commonly used across systems to maintain consistency in timekeeping.

One challenge with using the Unix epoch is that the resulting time value is very large. For example, at the time of writing this article, the number of milliseconds since the Unix epoch is approximately 1,724,623,452,621. This large number might not fit into the space allocated for the timestamp component in the final ID generated by the Snowflake algorithm.

To address this issue, I’ve chosen to use a custom epoch starting on January 1, 2021. This custom epoch shifts the reference point forward, resulting in a smaller timestamp value. By doing this, I can ensure that the generated ID fits within the designated space and remains efficient.

Setting bit size

    private final long datacenterIdBits = 5L;
    private final long workerIdBits = 5L;
    private final long sequenceBits = 12L;

When you look at the diagram I provided earlier, you see that in the full 64 bits or slots that make up a Snowflake ID, the 12 bits on the far right are designated for the sequence number. The next 5 bits are allocated for the nodeId or workerId (representing the machine), and the following 5 bits are used for the datacenterId.

Thus, when we say long datacenterIdBits = 5L;, we're not directly saying that the datacenter ID will be represented as 5 in binary (which is 101). Instead, we're saying that 5 bits out of the total 64 bits that make up a Snowflake ID are allocated to represent the datacenter ID.

With 5 bits, we can represent 2^5=32 different values, which range from 00000 (0 in decimal) to 11111 (31 in decimal). This allows the system to support up to 32 different data centers, each with a unique identifier. Also, with 5 bits for the worker id, we can have up to 32 unique worker or machines. Then for the sequence, the sequenceBits allocates 12 bits for the sequence, allowing it to represent 2^12 = 4096 unique IDs that can be generated per worker, per millisecond. The sequence is a counter that is incremented whenever more than one ID is generated within the same millisecond. Once the sequence reaches its maximum value (4095), the generator will wait until the next millisecond to continue generating IDs.

Determining maximum values

In the previous section, we where able to quickly identity that the maximum value for the workerId and the datacenterId are 32 while that of the sequence is 4096 by just inserting 1 into their bit size. This was possible for humans but for a computer we have to tell it programmatically.

    private final long maxWorkerId = ~(-1L << workerIdBits);
    private final long maxDatacenterId = ~(-1L << datacenterIdBits);
    private final long sequenceMask = ~(-1L << sequenceBits);

Now to explain what is happening here, you will have to recall some things that were mentioned in the notes section. Also, since all the operations are basically the same, I will be explain only one of them.

1L → is as shown below since a long is 64-bit number.

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001

-1L → Then we use minus sign on it. Check the 4th point in the note section to understand what is happening here.

11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111

-1L << workerIdBits → Then starting from the right to left, we move 5 (where workerIdBits = 5L) and in each move, we change what ever was there to 0s. Check the 5th point in the notes section.

11111111 11111111 11111111 11111111 11111111 11111111 11111111 11100000

Then we apply NOT (~) operator which then changes all 1s to 0s and 0s to 1s.

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011111

Specifying shift positions

To generate the final 64-bit long ID, we need to position the different components (e.g., workerId) at specific locations within the 64-bit structure. Each component we've prepared so far resides at the far right of its respective value, padded with leading zeros to fit into a 64-bit number. To correctly position these components according to the Snowflake ID design, we need to shift them to the left. This ensures that the worker ID, for example, is shifted left until its last bit aligns with (is just front of) the first bit of the 12-bit sequence ID, placing it in the correct range within the 64-bit ID.

    private final long workerIdShift = sequenceBits;
    private final long datacenterIdShift = sequenceBits + workerIdBits;
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

The code above defines the bit shifts required to correctly position each component within the 64-bit Snowflake ID, starting from the right. As depicted in the picture above, the first 12 bits are reserved for the sequence number (sequenceBits), so no shifting is needed for this component.

The workerIdShift is positioned immediately after the sequenceBits, so we will be shifting the workerId by the number of bits equal to sequenceBits.
The datacenterIdShift comes next, after the workerId. To correctly position it, we will be shifting the datacenterId by the sum of sequenceBits + workerIdBits.
Finally, the timestampLeftShift positions the timestamp component. Since it comes after all the previous components, we will be shifting the timestamp by the sum of sequenceBits + workerIdBits + datacenterIdBits.

Other variables

    private long lastTimestamp = -1L;
    private long sequence = 0L;
    private final long workerId;
    private final long datacenterId;

The lastTimestamp is initialized to 1L as a placeholder to indicate that no IDs have been generated yet. During the first ID generation, any valid current timestamp will naturally be greater than 1L, ensuring the logic proceeds correctly.
The sequence variable is initialized to 0L, representing the starting point of the sequence for the first millisecond. Since no IDs have been generated yet, the sequence is set to zero.
The workerId and datacenterId are declared as final and will be assigned values when the SnowflakeIdGenerator instance is created. These values are provided by the client code, representing the specific worker and datacenter in a distributed system.

The constructor

    public SnowflakeIdGenerator(long workerId, long datacenterId) {
        //Ensures that the machine generating the id is valid machine on the network
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }

        //Ensures that the data center hosting the machine is a valid one in the network
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

Here, the constructor receives workerId and datacenterId and checks if they exceed the maximum expected number of workers or data centers in the network. It also verifies that they are not smaller than the minimum acceptable values.

The `tilNextMillis` method

The idea here is that in a system receiving many requests to create IDs, there will eventually be multiple requests to a particular machine to generate an ID at the exact same millisecond. If this happens, any ID generated within this millisecond will be numbered from 1 to the maximum allowed number according to the algorithm's design. However, if the number of requests exceeds the maximum number of IDs that can be generated within a millisecond, we need a way to pause and wait until the next millisecond before generating a new ID.

    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

The method first retrieves the current time using the timeGen() method.
It then enters a while loop that continuously checks if the current timestamp is less than or equal to lastTimestamp.
If the current timestamp is indeed less than or equal to lastTimestamp, the method keeps calling timeGen() to get a new timestamp, effectively waiting for the next millisecond.
Once the system clock advances and the new timestamp is greater than lastTimestamp, the loop exits, and the method returns the updated timestamp.

The `nextId` method

    public synchronized long nextId() {

        // Gets the current time in the machine that the id is to be generated
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards.");
        }

        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }

        lastTimestamp = timestamp;

        return ((timestamp - epoch) << timestampLeftShift) |
                (datacenterId << datacenterIdShift) |
                (workerId << workerIdShift) |
                sequence;
    }

Now let’s break it down.

The first thing here is checking the system time.

        if (timestamp < lastTimestamp) {
            throw new RuntimeException("Clock moved backwards.");
        }

Here the algorithm needs to guarantee that all IDs are unique and monotonically increasing based on the time they were generated. If the system clock were to shift backward after generating an ID, there would be a risk of generating an ID with a timestamp that is earlier than the last generated ID, potentially leading to duplicate IDs or out-of-order IDs. To prevent this, the algorithm checks that the current timestamp is greater than the timestamp of the last generated ID. If the current time is found to be earlier, it indicates a clock rollback, and the system raises an error to avoid producing invalid IDs.

Next we have

      if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0L;
        }

        lastTimestamp = timestamp;

The condition if (lastTimestamp == timestamp) checks if the current timestamp (timestamp) is the same as the lastTimestamp (the timestamp when the last ID was generated). This situation arises when multiple ID generation requests are received within the same millisecond. To ensure that each ID remains unique, the sequence number is incremented with sequence = (sequence + 1) & sequenceMask;. The & sequenceMask operation ensures that the sequence number stays within the allowed range by masking the higher bits, effectively keeping the sequence within the desired bit length (e.g., 12 bits).

In some cases, IDs may be generated up to the maximum allowed within a single millisecond. When this happens, the sequence number wraps around to 0 (as explained in point 7 in the notes section). This necessitates the check if (sequence == 0), which forces the generator to wait for the next millisecond to maintain ID uniqueness.

If there is no timestamp issue (i.e., the current timestamp differs from lastTimestamp), the sequence number is reset to 0 to start a new sequence for the new millisecond, and lastTimestamp is updated to the current time.

Finally we have code below where all the components are fixed in their respective positions in the 64-bit ID to get a valid ID

        return ((timestamp - epoch) << timestampLeftShift) |
                (datacenterId << datacenterIdShift) |
                (workerId << workerIdShift) |
                sequence;

Positioning Components:
- Each component (timestamp, datacenterId, workerId, and sequence) is shifted left by a specific number of bits to place its value in the correct position within the 64-bit ID.
- The << operator is used to perform this left shift. By shifting, we move the bits of the component to the left, filling the vacated positions with 0s.
Combining Components:
- The bitwise OR (|) operator is then used to combine these shifted values into a single 64-bit number. The OR operation effectively overlays the components, ensuring that each occupies its designated space within the final ID.
- Because of the left shifts, there are no overlapping bits between the components. The OR operation simply combines the non-overlapping bits, resulting in the final ID.
Result:
- The final result is a 64-bit unique ID where each component (timestamp, datacenterId, workerId, and sequence) occupies its designated position within the ID, with the remaining positions filled with 0s.

This process ensures that each ID generated is unique based on the combination of its components, with each component properly placed within the 64-bit structure.

If you found this article helpful, you can follow me through the links below:

https://x.com/unyimeudoh22