You are viewing...

RocksDB Merge Operations

Updated on July 28, 2022 at the 02th hour
Posted under:

DISCLAIMER: Expressed views on this blog are my own.

I've been an in memory database fan for a long time. They are easy to work with. Hell, SQLlite is used in browsers and mobile apps because relational DBMS are powerful. RocksDB is a powerful key/value store and is used on the server side and embedded devices. Sometimes, I want to merge data just like how SQL updates work where you update part of a row or redis where you update a data structure such as list/set.

Merging data is not uncommon. I store treemaps in rocksdb. I'd like to add and remove integers from the treemaps in response to user action. Well, turns out I use Rust rocksdb which is just bindings to the C++ library. You can read about Merge Operators for the C++ library to learn more.

With rust's rocksdb, you have a choice between associative merge operator and generic merge operator.

Associative merge operation is for simple data structures where you can do operations such as addition, subtraction, concatenation. This merge operation is a specific implementation of the generic merge operator. Essentially, the data structure has implied operations.

The Generic Merge Operator expects you to understand Full Merge and Partial Merge. When you call rocksdb.merge(serialized operand), the data structure you input must have a way to reach an output at the end of a full merge. A full merge will contain an existing value for you to apply the merges to and give back the new value. A partial merge on the other hand must combine multiple merge(serialized operand) into one merged operand or else fail it so a full merge takes care of applying the operand.

Let's say you call merge("foobar", UNION([1])), merge("foobar", UNION([2])), merge("foobar", DIFFERENCE([1])) then you'd expect get("foobar") to return [2]. If we applied the ideas of partial and full merge. You'd expect a partial merge to generate a stack like this (UNION[1, 2], DIFFERENCE(1]) then a full merge input to be (UNION[1, 2], DIFFERENCE(1]) and produce the final output of [2]. A full merge can also do (UNION[1], UNION[2], DIFFERENCE(1]) to produce the same value.

With a partial merge, whether it can always return a new operand is dependent on the data structure operations. You could have an operand data structure that smashes unions and differences into buckets to be applied at the full merge. You'd probably be creating a stack of combined operations if the order of operations matter otherwise a hashmap of list<Operand>.

If you look at rocksdb set generic merge operator, it has callbacks for full merge and partial merge. Full merge follows the C++ documentation for FullMergeV2, but partial merge follows PartialMergeMulti which isn't shown as an example. Partial Merge will contain a list of operands and no existing value. You must smash down the list of operands into one operand or else return None to fail the partial merge. It's inefficient, in my opinion, since if you received a list of union, union, difference, union then you couldn't smash it down to union, difference, union.

You just read "RocksDB Merge Operations". Please share if you liked it!
You can read more recent posts here.