AutoInc: Distributed unique Id generator

Reflections on the design of AutoInc: a distributed Id generator.

AutoInc

I started AutoInc early in 2020 as an attempt to replace SnowMaker, which wasn't being maintained and wasn't accepting PRs, yet was used in production. SnowMaker is a distributed unique Id generator. Its purpose is to generate sequence numbers for business applications (for example: invoice number, product number, registration number) that are guaranteed to be unique. It does this by leveraging Azure Blob Storage and using some smart techniques.

When I started on AutoInc as a replacement for SnowMaker, I wanted to support multiple databases, multiple cloud providers and create a provider abstraction so that applications which require unique Ids could easily swap providers without re-engineering. Looking back at one of the earliest commits, I started with an interface.

public interface IValueStore
{
    Task<long> GetValue(string scopeName);
    Task<bool> TryWriteValue(string scopeName, long value);
}

This seemed reasonable. IValueStore is the interface that each provider's store implements. A unique Id generator should be able to support multiple sequences, so we identify each sequence by its scopeName. As well as retrieving the current value, we need to write a new one. A write is not guaranteed to succeed because the store might be locked (being updated by another system), so we have TryWriteValue, which returns true if the write succeeds.
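To make the "try to write, and retry if the store is busy" idea concrete, here is a minimal sketch. InMemoryValueStore and NextValueAsync are hypothetical names made up for illustration, not part of AutoInc, and the interface is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Repeated from above so the sketch compiles on its own.
public interface IValueStore
{
    Task<long> GetValue(string scopeName);
    Task<bool> TryWriteValue(string scopeName, long value);
}

// Hypothetical in-memory provider, for illustration only.
public sealed class InMemoryValueStore : IValueStore
{
    private readonly ConcurrentDictionary<string, long> _values =
        new ConcurrentDictionary<string, long>();

    // A scope that has never been written starts at 0.
    public Task<long> GetValue(string scopeName) =>
        Task.FromResult(_values.TryGetValue(scopeName, out var value) ? value : 0L);

    // This store is never locked, so the write always succeeds.
    public Task<bool> TryWriteValue(string scopeName, long value)
    {
        _values[scopeName] = value;
        return Task.FromResult(true);
    }
}

public static class ValueStoreExtensions
{
    // Hypothetical helper: read the current value, add one, and retry
    // with a growing delay while the store reports the write failed.
    public static async Task<long> NextValueAsync(
        this IValueStore store, string scopeName, int maxAttempts = 5)
    {
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            var next = await store.GetValue(scopeName) + 1;
            if (await store.TryWriteValue(scopeName, next))
            {
                return next;
            }
            await Task.Delay(TimeSpan.FromMilliseconds(50 * (attempt + 1)));
        }
        throw new InvalidOperationException(
            $"Could not write the next value for scope '{scopeName}'.");
    }
}
```

Note that the read and the write here are two separate calls, so this loop is only safe with several concurrent callers if the provider itself rejects conflicting writes. The interface as written gives it no way to know which value the caller read, which is one hint that the abstraction needed more thought.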

I added a basic FileStore provider for testing purposes. The FileStore simply generates a text file for each sequence number. Then I started on the AzureStore provider. The idea was to generate something similar to (and compatible with) SnowMaker so that we could replace it with AutoInc in our system and off we go.
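A file-backed provider along those lines might look roughly like the sketch below. This is not the original FileStore code: the class name FileValueStore, the one-file-per-scope ".txt" layout, and the choice to treat an IOException as "the store is locked" are all assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;

// Repeated from above so the sketch compiles on its own.
public interface IValueStore
{
    Task<long> GetValue(string scopeName);
    Task<bool> TryWriteValue(string scopeName, long value);
}

// Hypothetical file-backed provider: one text file per scope.
public sealed class FileValueStore : IValueStore
{
    private readonly string _directory;

    public FileValueStore(string directory)
    {
        _directory = directory;
        Directory.CreateDirectory(directory);
    }

    // Assumes scope names are valid file names.
    private string PathFor(string scopeName) =>
        Path.Combine(_directory, scopeName + ".txt");

    public async Task<long> GetValue(string scopeName)
    {
        var path = PathFor(scopeName);
        if (!File.Exists(path))
        {
            return 0; // nothing written yet for this scope
        }
        var text = await File.ReadAllTextAsync(path);
        return long.Parse(text, CultureInfo.InvariantCulture);
    }

    public async Task<bool> TryWriteValue(string scopeName, long value)
    {
        try
        {
            // FileShare.None takes exclusive access; if another writer
            // already holds the file, opening it throws IOException.
            using var stream = new FileStream(
                PathFor(scopeName), FileMode.Create, FileAccess.Write, FileShare.None);
            using var writer = new StreamWriter(stream);
            await writer.WriteAsync(value.ToString(CultureInfo.InvariantCulture));
            return true;
        }
        catch (IOException)
        {
            return false; // treat "file in use" as "store is locked"
        }
    }
}
```

This is good enough for tests on a single machine, which is all the FileStore was meant for.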

But something didn't sit right with me. We use SnowMaker to generate sequence numbers that are then stored in a Neo4j database. We designed (or should I say Tatham Oddie designed) SnowMaker for us (he was working on contract at Barnardos Australia at the time) because Neo4j didn't (and still doesn't) support generating sequence numbers out of the box. We need sequence numbers. We generate an Id for each person who is recorded in the system. In fact, we generate quite a lot of sequence numbers for all sorts of business data. We had started using Azure Cloud Services, and it all sort of worked, so we went with it.

But while working on SnowMaker's successor, I started thinking again: why should I generate my sequence numbers in the cloud? I don't need them there. I only need them in the database. And I only have one database. And if I had multiple databases, they would all be replicas, so I still only need to generate and store sequence numbers in one place: the database. So I took another look at what we could do in Neo4j. It doesn't have a sequence number generator, but it IS an ACID-compliant database. So I should be able to create my own sequence number using Cypher, and the database (through ACID transactions) should ensure that if another operation tries to read or update the sequence number while it is being generated, locking prevents incorrect updates and duplicates. All standard stuff you would expect from a modern database.

Generating sequence numbers in Cypher

Step #1: Using Cypher, we set the initial value. We MERGE (create if it doesn't exist) the node with a label UniqueId that has a Scope (name) of myscope and set a Value of 1.

MERGE (n:UniqueId {Scope: 'myscope'})
SET n.Value = 1

Step #2: Increment its value. We MATCH (fetch) the UniqueId node that has the Scope of myscope and increment its Value by 1.

MATCH (n:UniqueId {Scope: 'myscope'})
SET n.Value = n.Value + 1

Step #3: Return the sequence number. We MATCH (fetch) the UniqueId node that has the Scope of myscope and return its Value.

MATCH (n:UniqueId {Scope: 'myscope'})
RETURN n.Value

All good so far, but the creation, increment and return steps are separate, and running step #1 again would reset the counter to 1. We want to combine them all into one query that fetches the next sequence number whether or not the node already exists.

The solution

MERGE (n:UniqueId {Scope: 'myscope'})
SET n.Value = COALESCE(n.Value, 0) + 1
RETURN n.Value

And here we have it. We MERGE (create if it doesn't exist) the node, then increment with SET and COALESCE: if the Value property doesn't exist yet, COALESCE substitutes 0; otherwise the existing Value is used. Either way the result is incremented by 1, stored, and returned.

And that's all there is to it. We can now generate as many sequence numbers as we need using different scopes, and they will be managed through ACID transactions.

So where does that leave AutoInc?

AutoInc is dead. Long live AutoInc.Neo4j

Now that we have the Cypher to create sequence numbers on demand, we just need a bit of syntactic sugar so that the sequence numbers can be expressed in code.

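var session = Driver.AsyncSession();

try
{
    await session.WriteTransactionAsync(async tx =>
    {
        // Fetch the next invoice number inside the write transaction.
        var id = await tx.NextUniqueIdAsync(Id.InvoiceNumber);

        // Create the invoice in the same transaction.
        var parameters = new { id };
        var query = "CREATE (invoice:Invoice {Id: $id})";
        await tx.RunAsync(query, parameters);
    });
}
finally
{
    // Close the session even if the transaction throws.
    await session.CloseAsync();
}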

Most of this code is the work of Neo4j.Driver. What we've added is the NextUniqueIdAsync extension method, which takes a transaction and a scope name and runs the query above to generate and return the next sequence number, in this case the invoice number. Then, within the same transaction, we run our business query: create an Invoice. With this code, each invoice obtains a unique Id.
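For the curious, an extension method of that shape can be sketched in a few lines. This is a minimal sketch rather than the actual AutoInc.Neo4j source: the class name UniqueIdExtensions is made up, the scope is assumed to be a plain string (so Id.InvoiceNumber would be a string constant), and the scope is passed as a $scope parameter instead of the hard-coded 'myscope'.

```csharp
using System.Threading.Tasks;
using Neo4j.Driver;

// Hypothetical class name; a sketch, not the published AutoInc.Neo4j code.
public static class UniqueIdExtensions
{
    // The combined query from earlier, with the scope as a parameter.
    private const string NextUniqueIdQuery = @"
        MERGE (n:UniqueId {Scope: $scope})
        SET n.Value = COALESCE(n.Value, 0) + 1
        RETURN n.Value";

    // Runs the query inside the caller's transaction and returns the new value.
    public static async Task<long> NextUniqueIdAsync(
        this IAsyncTransaction tx, string scope)
    {
        var cursor = await tx.RunAsync(NextUniqueIdQuery, new { scope });

        // The query returns exactly one row with one column.
        var record = await cursor.SingleAsync();
        return record[0].As<long>();
    }
}
```

Because the method runs on the transaction it is given, the Id is generated as part of the same transaction as the business query. One thing worth checking in your own setup: MERGE on its own does not guarantee uniqueness when two transactions create the same scope at the same moment, so a uniqueness constraint on the Scope property of UniqueId nodes is a sensible safeguard.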

Reflection

So what started as an ambitious project to create a multi-cloud, multi-database, distributed solution ended up being a 3-line Cypher query and a couple of extension methods.

But it works, and that's the main thing. No need to over-engineer for the sake of writing code.