Over time, things change in a database:
- • The query throughput increases, so you want to add more CPUs to handle the load.
- • The dataset size increases, so you want to add more disks and RAM to store it.
- • A machine fails, and other machines need to take over the failed machine’s responsibilities.
All of these changes call for data and requests to be moved from one node to another. The process of moving load from one node in the cluster to another is called rebalancing.
No matter which partitioning scheme is used, rebalancing is usually expected to meet some minimum requirements:
- • After rebalancing, the load (data storage, read and write requests) should be shared fairly between the nodes in the cluster.
- • While rebalancing is happening, the database should continue accepting reads and writes.
- • No more data than necessary should be moved between nodes, to make rebalancing fast and to minimize the network and disk I/O load.
Strategies for Rebalancing
There are a few different ways of assigning partitions to nodes . Let’s briefly discuss each in turn.