Distributed System Partitioning

Partitioning is the action of breaking up large datasets into multiple partitions. There are several names for this:

  1. Shard - MongoDB, Elasticsearch, SolrCloud
  2. Region - HBase
  3. Tablet - Bigtable
  4. Vnode - Cassandra, Riak

Partitioning are often combined with replication, so that copies of each partitions are stored on multiple nodes. The choice of partitioning scheme are usually independent of the replication scheme.

Terminologies

Key Value Data Partitioning

Assumptions:

Partitioning by Key Range

Notes:

Concerns:

Partitioning by Hash of Key

Concerns:

Skew and Relieving Hot Spots

Secondary Indexes

Partitioning Secondary Index by Document

Each partition is completely separate, each partition maintains its own secondary indexes. Document partitioned secondary index is also called local index for that reason.

However, to search by an index, the query needs to be sent to all partitions, and combine the results received (scatter gather). This approach is prone to tail latency amplification.

Partitioning Secondary Index by Term

Also called a global index, it stores all document references for a term in the same partition. The global index can be partitioned differently from the primary key index.

The term we are looking for determines which partition of the index we find.

Benefits:

  1. Possibility to perform range scans (if index is partitioned by range)
  2. Reads are more efficient

Drawbacks:

  1. Writes are slower and more complicated, writing a document can result in writes to multiple partitions / nodes

Updates to secondary index partitioned this way are often performed asynchronously, because synchronous updates will require distributed transactions across multiple nodes and partitions. The changes to index are usually not immediate.

Rebalancing Partitions

Some minimum requirements for rebalancing:

Hash Mod N

This is a bad idea, because when the number of nodes changes, more than necessary partitions are moved because of the partition node modulo.

Fixed Number of Partitions

Dynamic Partitioning

Benefits:

Caveats:

Partitioning Proportionally to Nodes

Automatic vs Manual Rebalancing

Request Routing

An instance of a more general problem of service discovery – partition discovery.

Approaches:

  1. Allow clients to contact any node, handle if the partition happens to be in the node, otherwise forwards to the appropriate node. This requires partition assignment metadata to be present in all the nodes.
  2. Use a routing tier, which all clients connect to. It does not handle the request, but forwards all the request accordingly (partition aware load balancer). This requires partition assignment metadata to be present in the routing tier.
  3. Use partition aware client. This requires partition assignment metadata to be present in the client.

How to distribute partition assignment metadata?

Parallel Query Execution

Query optimisers (spanner, Trino) can break down complex query into execution stages and partitions, that can be executed in parallel.

References