March 28th, 2021 JSON on the Brain By Jeffrey M. Barber

Since I am tweaking the API, I am going to write some pre-docs here on my passion for JSON. I’m only using JSON because I’m lazy and don’t want to write yet another binary encoder for Java, JavaScript, and eventually Rust. I may at some point, but I’m kind of OK with JSON for now. The key thing I like about JSON is that you can treat it algebraically.

For a brief primer, check out RFC7396 as merge is an exceptionally powerful idea. At core, merge enables the set of all objects expressible via JSON to be an almost algebraic group. If you interpret null as nothing equivalently, then you have a group.

As a math guy, these group properties tend to signal a general substrate for useful stuff. Unfortunately, RFC7396 is not efficient when it comes to arrays. In this document, I’ll introduce how Adama leverages RFC7396 in two contexts. The first context is the delta log persisted on disk or within a database, and the second context is the change log for a connected client. The second context requires a non-trivial extension.

Adama and the Delta Log

The key deficit of RFC7396 is that arrays don’t work well. Arrays can have elements inserted, removed, re-arranged and the merge just treats the array like a big value. Given this, Adama doesn’t use arrays within the delta log. The entire data model of Adama is a giant object with just objects and values. The way Adama represents collections is via either tables or maps, and this works well with the language integrated query and indexing. The efficiency gains of an array are manifested during the runtime, and we tolerate the representation overhead on the wire.

The motivation for is to enable efficient replication of potentially large documents, and the value proposition of Adama is to translate domain messages from people into deltas. This means that Adama, as a document store, achieves replication efficiency matched by traditional database solutions. A document in Adama behaves much like a reasonably sized database which can fit within memory. Note: the log is a powerful concept.

Putting efficiency aside, this also enables a powerful feature for board games (and any application). First, the game’s state can be rewound into the past at any point. Simply trim the head of the log, and rebuild the state. For usability, games powered by Adama should allow people to make mistakes and roll back time; the only downside of this feature is enabling crafty players to know what the future holds.

Applications can leverage this as well because it provides a universal undo which is also collaborative. While the rewind operation is required for board games, a collaborative undo requires the contributions of multiple people to be considered. For instance, consider the log:

seq who redo undo
1 Alice {"name":"Phobia","balance":42} {"name":null,"balance":null}
2 Carol {"name":"Fear"} {"name":"Phobia"}
3 Bob {"balance":100} {"balance":42}

If Alice wishes to undo her contribution, then we can take her undo and then run it forward. In this case, there is nothing to undo as Carol and Bob have made contributions which conflict. On the other hand, Carol and Bob are able to undo their messages. Fortunately, this algorithm is fairly easy. We simply match the fields within the undo to the fields within the future redo objects, and then trim down the undo operation that we will apply to the log.

undo(at) redo(at + k) behavior
object object recurse
* not defined preserve
* * remove

Sadly, this may not be sound within all application domains. However, this is worth researching, and it may work in enough domains to not be a big problem. The plan here is to simply describe the “Undo” API and provide guidance on how to use it.

Since good can sometimes be the enemy of the perfect, an item on the roadmap is to enable message handlers to emit an undo message which can be played at a future date. This will be nicer for some scenarios, but it will not be collaborative in spirit. For instance, if Alice creates an object and Bob comes along to manipulate it, then should Alice’s undo remove the object?

This is clearly a complex subject, but it’s fun to play with super powers! Things that I need to read and digest:

Thankfully, I can launch with rewind which is within my domain…

Adama and Clients

While the philosophy of having everything being a map or value works well enough on the server, clients require the order that arrays provide. This is where we extend RFC7396.

Fundamentally, we will divide arrays into two classes. The first class is an array of values like [1,2,3] which will isn’t interesting, so we lean on RFC7396 for merging that array like a giant value. The second class is an array of objects with an unique integer id field, and it’s worth noting that is the pareto-major for Adama. Arrays of objects without an integer id field will be treated like a value, and these map into the first class.

We will transform the array of objects into an object with a special field. The conversion process takes every element in the array, and creates a mapping between the element’s id and the element. The special field with key “@o” is then the order of the elements by id. Immediately, this has the benefit that using RFC7396 will then work on data within the new object.

For example, the given array:

[
  {id:42, name:"Jeff"},
  {id:50, name:"Jake"},
  {id:66, name:"Bob"}
]

will convert into

{
  42: {name:"Jeff"},
  50: {name:"Jake"},
  66: {name:"Bob"},
  '@o': [42, 50, 66];
]

The client can now reconstruct the original array, and this enables a delta to change “Jeff” to “Jeffrey” be represented as:

{
  42: {name:"Jeffrey"},
  '@o': [42, 50, 66];
]

This allows RFC7396 to just work as elements can be inserted, rearranged, removed with the overhead being concentrated within the ‘@o’ field. Now, this can be optimized as well if we allow ourselves to do a bit of transcoding on the server. The server can monitor the ordering of the array before and after execution. Take the given array:

[1, 5, 6, 7, 10, 24, 50, 51, 52, 53]

as the before execution ordering, and then

[1, 5, 6, 7, 10, 100, 24, 50, 51, 52, 53]

as the after execution ordering. Sending this large array is an exceptional waste, and here we will hack JSON and have the ‘@o’ array be heterogeneous with two types of elements. The first is the id as a value, and the second is an array representing a range. The key idea being that subsequences from the before execution ordering can be reused. The above change would then be represented as:

[[0, 4], 100, [5,9]]

which saves tremendous space for small data changes. The values within the sub-arrays are pairs of inclusive beginning and end index values. This extension requires tweaks to RFC7396 to detect objects containing the ‘@o’.

Thoughts for the API

Rebuilding the data is one thing, the other interesting potential of having clients receiving deltas is that they can update their application proportionally. This is represented in the poorly implemented current client, and the weak point is the emitting events of tree changes. It works well enough for my experiments, but I have a way to go.

As I build confidence that the code is working via all my hacking, my aspiration is to document this differential tree event firing in great detail and test it fully. Fun times!