Performance Funday

A non-productive theme for July is to jump into performance and measure a few things. The hope is that this search will produce some low-hanging fruit that I can exploit, and I also intend to validate correctness along the way since there is some slapdash, stupid stuff in the code. A core reason for the urgency, beyond being interesting work, is that I want to utilize permutation testing as a way of finding novel test cases. Given that it took a bit over half a day to simulate playing 1.2M games, I would like to be able to cut that time down.

Test Setup

I am going to leverage the current game prototype, which comes in at a massive 4.8 KLOC, and the goal is to simulate random players playing a game. To factor out the randomness, I’m going to seed the random number generator such that the game play and decisions are predictable. I tweaked the seed parameters until I found a meaty enough game to stress the system.

This meaty game will be played 101 times, and the results from the first run will be thrown away as they are heavily influenced by the JVM warming up. Each run afterwards will produce these data columns (a sketch of the harness follows the list):

  • Time: proxy measure for CPU (and memory induced GC pressure) within a single thread.
  • “Billing Cost”: number proportional to the amount of work done (the language bills by the statement and a few other things)
  • Decisions: number of decisions taken by all the agents within a game
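
To make the setup concrete, here is a rough sketch of the measurement harness (the playSeededGame call and its result type are hypothetical stand-ins, not the actual prototype API):

// Sketch of the loop: 101 plays of the same seeded game; the first is discarded as JVM warm-up.
long totalMs = 0;
for (int run = 0; run < 101; run++) {
  long started = System.currentTimeMillis();
  GameResult result = playSeededGame(MEATY_SEED); // hypothetical: same seed, so identical game play
  long elapsed = System.currentTimeMillis() - started;
  if (result.decisions != 798) {
    throw new IllegalStateException("correctness check failed: " + result.decisions + " decisions");
  }
  if (run > 0) { // run 0 is warm-up; ignore it
    totalMs += elapsed;
  }
}
System.out.println("average ms = " + (totalMs / 100.0));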

It is worth noting that “billing cost” is expected to remain constant within a single test run, and the number of decisions is expected to remain constant over all runs because it is a measure of correctness. Decisions must be fixed at 798. Over those 100 runs, the average time will be a sufficient measure of performance. I intend to eyeball variance, but I am leaning on an average to factor out GC cost. With all this said and done, the initial data is thusly:

ms    billing cost
739   12810563

That is approximately 1 ms per game decision (739 ms / 798 decisions ≈ 0.93 ms), which actually feels super slow. A secondary goal of this work is to build tools to dive into issues like this. My intent for the day is to address only the language and not push the changes back into user-space.

Exploit Primary Key

Every record has a primary key called id, which is unique and an integer, and it is fairly common to leverage this to find items by primary key. Since the only way to get access to data within a table is via a language integrated query, we must contend with the query syntax. The first task was to analyze where expressions to extract associations mapping the primary key to an expression. The initial algorithm for this is simple, and I'll write it in pseudo-ish-code here (I'm actually using Java, but it is too verbose):

function extract(expr):
  if expr is Parenthesis:
    return extract(expr.inner)  // recurse into the wrapped expression
  if expr is BinaryOp(&&):
    result = extract(expr.left)
    if !result:
      result = extract(expr.right)
    return result
  if expr is BinaryOp(==):
    if expr.left == Lookup("id"):
      return expr.right
    if expr.right == Lookup("id"):
      return expr.left
    return null
  return null
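
When extract returns a non-null expression, the where clause can be short-circuited into a direct lookup by primary key. A rough sketch of the idea (the getById, evaluate, evaluateWhere, and scanAll helpers here are hypothetical, not the actual implementation):

// Sketch: prefer an O(1) lookup by id over scanning every item in the table.
Expression keyExpr = extract(whereClause);
if (keyExpr != null) {
  Item item = table.getById(evaluate(keyExpr)); // hypothetical direct lookup by primary key
  return (item != null && evaluateWhere(whereClause, item)) ? List.of(item) : List.of();
}
return scanAll(table, whereClause); // otherwise, fall back to the full scan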

This identified many cases where the primary key was part of the where clause, and the results were.... disappointing. The billing cost went down 2.5% while the time cost went down only 1.4%.

ms    billing cost
728   12483470

Ok, this is going to be a slog.

Direction Seeking: What if we don’t update client views

The tool at hand doesn’t really need to see the individual view per agent, so as a way of finding a direction we can simply turn that feature off and see what happens. This is a quick way of developing a sense of where time is worth spending. We do that, and the data is revealing.

ms    billing cost
510   3689641

This shows potential: the client views represent close to a third of our CPU cost, yet billing was reduced by a whopping 70%, which suggests the billing cost for views is overblown and fundamentally requires investigation. Now, this is not the way, but it gives a tingle to explore. Please ignore the above data.

Cache Record to JSON

In thinking about how to optimize client views, we can exploit caching of viewable records between the four agents. However, this requires consulting each record’s privacy policy. Statically, we can devise a very simple predicate to eliminate the need to compute per-user views (a sketch follows the list). A record's per-user view can be cached if:

  • it has no bubbles
  • it has no visibility requirements
  • the privacy for each field is either public or private
  • the fields are all primary data (i.e. ints and strings; neither records nor tables)
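
As a rough sketch of that predicate (the RecordType and FieldDef accessors are hypothetical stand-ins for the real metadata, not the actual implementation):

// Sketch: a record's JSON view can be shared across viewers only when nothing is viewer-specific.
static boolean canCachePerViewerJson(RecordType record) {
  if (record.hasBubbles() || record.hasVisibilityRequirements()) {
    return false;
  }
  for (FieldDef field : record.fields()) {
    boolean simplePolicy = field.isPublic() || field.isPrivate(); // no per-viewer policy logic
    boolean primaryData = field.isInt() || field.isString();      // neither records nor tables
    if (!simplePolicy || !primaryData) {
      return false;
    }
  }
  return true;
}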

With this, we can stash a copy of the JSON within the record and let record invalidation blow it away on change. By leveraging the reactive elements, we not only cache between viewers but also over time. This results in new data:

ms    billing cost
647   10125290

This is very encouraging as the billing cost has now been reduced by 20% and CPU cost is down 12.4%. In terms of the goal, that’s over an hour of time saved in the experimental test.

Quick Experiment: Turning Off Code Coverage

Currently, the document keeps track of every single statement which gets executed in a giant list... The question at hand is how much that costs.

ms    billing cost
620   10125290

This is slightly exciting for production as it takes us towards a 16% CPU reduction, but this tracking is a requirement for the testing I hope to achieve. As I hope to guide agent decisions towards maximizing code coverage, I must turn this experiment off... for now. Please ignore the above data.

Direction Seeking: Bad Indexing the Tables

Indexing is the bee's knees, so let’s introduce indexing to our tables!

This is not an easy endeavor, so we first set out to measure the opportunity. We do that by doing half the hard work now, then exploiting that work to measure and validate correctness. What we measure is the potential to do good work if we finish the other half of the hard work.

First, we have to leverage and extend the analysis work done for extracting the primary key, generalizing it to any field. This was added to the code-generation such that a where clause could generate a flat array of which columns map to which values. Second, each item must reveal its data in an easy-to-consume way. Third, we must measure how effectively the where clause could be enhanced with indexing. This work enables a very simple algorithm to test:

/**
 * @param clause an integer array mapping columns to test values [col0, testValue0, col1, testValue1, ..., colN, testValueN]
 * @param value an integer array representing a particular item's columns [value0, value1, ..., valueM]
 * @param effectiveness counters of how effective each column was at rejecting the item
 */
private static boolean slowTest(int[] clause, int[] value, int[] effectiveness) {
  boolean result = true;
  for (int k = 0; k + 1 < clause.length; k += 2) {
    if (value[clause[k]] != clause[k + 1]) {
      effectiveness[k / 2]++;
      result = false;
    }
  }
  return result;
}
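
Roughly, the measurement hook runs this test in front of the real where clause so only the survivors pay for the billed evaluation (the table iteration and evaluateWhere call here are hypothetical):

// Sketch: pre-filter items with the cheap equality tests, tallying per-column rejection counts.
int[] effectiveness = new int[clause.length / 2];
for (Item item : table.items()) { // hypothetical iteration over a table's items
  if (slowTest(clause, item.columnValues(), effectiveness)) {
    evaluateWhere(whereClause, item); // hypothetical: the normal, billed where clause
  }
}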

Now, this code is not great for a variety of reasons, but we can run it prior to executing the where clause and measure changes to billing cost. Well, the data came in, and it was both surprising and not so.

ms    billing cost
810   2170041

A 10% CPU bump (yikes), but an 83% reduction in billing cost (woah). OK, this is very pro-customer; however, it does not help me. It is, however, encouraging that if I roll up my sleeves, then I can get some better results.

Index all things?!?

The challenge of indexing in a reactive system is not paying the implicit insertion cost. That is, stuff will change, and you don’t want to constantly insert and remove items. You need to be a bit lazy, and for indexing this requires some book-keeping and allowing some slack to emerge. Each column must have its own index, but when things change they must either be removed or included in a catch-all bucket since the indexing is indeterminate in a lazy system.

private final ReactiveIndex<Ty>[] indices; // one index per indexed column
private final TreeSet<Ty> unknowns;        // catch-all bucket for items whose placement is currently indeterminate
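
As a sketch of how these are meant to be used when evaluating "column == value" (the of accessor is hypothetical, not the actual ReactiveIndex API):

// Sketch: the candidate set is the column's indexed matches plus everything whose placement is lazy/unknown.
private TreeSet<Ty> candidatesFor(int column, int value) {
  TreeSet<Ty> result = new TreeSet<>(unknowns); // the catch-all bucket must always be considered
  result.addAll(indices[column].of(value));     // hypothetical accessor: items indexed under this value
  return result;                                // a super-set of the true matches
}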

The key is that the combination of column-specific indices plus this catch-all bucket forms a super-set of the right answers you seek, and the hope is that the resulting set is a smaller consideration pool. First, we index everything and measure the overhead:

ms    billing cost
695   10125290

Yikes! It already adds cost just to index the data. However, we have yet to use the index. One more blind trial: this time we compute the needed super-set (still without using it).

ms    billing cost
832   10125290

Oh no, this is not trending well at all! Let's actually use it...

ms    billing cost
734   2361554

I am Jack's sense of disappointment. OK, this is proving to be my white whale. There are many problems with the approach I have taken on this... Part of the problem is that I’m indexing every enum, integer, and client variable. Pareto is proving a relevant observation, so I’m going to need to take a break... Several hours pass

I’m going to restore the code I used for the bad indexing, then put it behind a condition and introduce a DocumentMonitor. The role of the DocumentMonitor is to stream data out to me which I can use, and I have a flag called “measureTablePerformance” which will use that bad code in special circumstances.

public interface DocumentMonitor {
  /** should the runtime measure table performance */
  public boolean measureTablePerformance();

  /** emit a single datapoint about table performance */
  public void recordTablePerformanceInstance(
      String tableName,
      String columnName,
      int total,
      int effectiveness);
}
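
For illustration, a minimal monitor that simply prints each datapoint might look like the sketch below (this is not the monitor used to gather the data that follows):

public class LoggingMonitor implements DocumentMonitor {
  @Override
  public boolean measureTablePerformance() {
    return true; // opt in to the extra (slow) measurement path
  }

  @Override
  public void recordTablePerformanceInstance(String tableName, String columnName, int total, int effectiveness) {
    System.out.println(tableName + "." + columnName + ": rejected " + effectiveness + " of " + total);
  }
}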

With hope, the if statement doesn't regress the performance...

ms    billing cost
646   10125290

It's basically the same, so let's use it to emit some data, collect that data into a table, and then review the table.

table              column         calls   total     effectiveness   %
skill_cards        location       71199   6763905   5298816         78.34%
skill_cards        skill          57681   5479695   4383756         80%
skill_cards        owner          30504   2897880   2260607         78.01%
civilian_ships     status         31507   378084    363870          96.24%
military           type           9394    396314    340569          85.93%
civilian_ships     space_key      28281   339372    284757          83.91%
crisis_deck        revealed_to    4000    280000    280000          100%
military           space_key      9093    383151    237907          62.09%
destinations       revealed_to    4000    174328    174328          100%
civilian_ships     revealed_to    3924    47088     47088           100%
loyalty_deck       revealed_to    3924    42396     42395           100%
loyalty_deck       owner          4068    43938     38841           88.4%
players            play_id        7454    29816     22364           75.01%
players            location_key   5079    20316     19538           96.17%
players            space_key      5052    20208     16086           79.6%
locations          key            549     10431     9971            95.59%
characters         owner          119     1190      934             78.49%
base_stars         space_key      496     992       846             85.28%
players            character      188     752       684             90.96%
characters         type           51      510       357             70%
characters         key            22      220       198             90%
players            link           4       16        12              75%
super_crisis_deck  owned_by       1       5         0               0%

Clearly, I want to exploit the skill_cards table since it is highly rejective. Out of 6.7M tests, 5.2M could be quickly rejected via an index. This is great! We should exploit that, so let's do it. This requires a bunch of work: upgrading the parser, updating the code generation to present the indexing information to tables, and then upgrading where clause code generation to build the needed set. Given that the last code was mostly thrown away due to complexity, this is going to require care... Even more hours pass

ms    billing cost
630   2912569

OK... seriously. All that work for an additional 2% in CPU reduction. Sigh. On the bright side, the billing is now more customer-friendly and is down 77%. If the where clauses were more expensive, then this would be a very productive thing, so we will keep it as a customer-friendly feature that encourages the customer to focus on more controllable things.

Fixing a bug in the reactive tree#

As part of this work, something felt off as to why so much computation was happening in the first place. This triggered an audit of the code, and I had to rethink a few things from core first principles. At hand, how formulas were being computed was done poorly due to a sloppy mixing of invalidation and dirty signals. The two core principles at play were (sketched in code after the list):

  • invalidations always flow from data down to formulas
  • dirty signals flow up from data to the root
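
A minimal sketch of the two flows, assuming a hypothetical node type (this is not the actual class structure in the prototype):

import java.util.ArrayList;
import java.util.List;

// Sketch only: a change invalidates the formulas computed below a node and raises dirt up to the root.
class RxNode {
  RxNode parent;                             // chain up toward the document root
  List<RxNode> formulas = new ArrayList<>(); // formulas computed from this node
  boolean dirty;                             // "something under me changed"
  boolean invalid;                           // "my cached value must be recomputed"

  void change() {
    // principle 1: invalidations always flow from data down to formulas
    for (RxNode formula : formulas) {
      formula.invalidateDown();
    }
    // principle 2: dirty signals flow up from data to the root
    for (RxNode p = parent; p != null; p = p.parent) {
      p.dirty = true;
    }
  }

  void invalidateDown() {
    if (!invalid) {
      invalid = true;
      for (RxNode formula : formulas) {
        formula.invalidateDown();
      }
    }
  }
}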

(Diagram: invalidations and dirty signals)

This diagram became crucial for investigating each class and making sure it conformed to the principles. All but two of the classes behaved, and one of the two that did not had a giant TODO on it. Fixing that TODO and the other class caused the system to behave, and produced new data.

ms    billing cost
550   2328882

At this point, CPU time has been reduced by 25% and billing cost by 82%.

Until next time.

I'll take 25% as a win for now, but I want to invest in tooling to provide better insights. There is some super dumb code within the prototype at the moment because the focus then was to ship the game in a workable form.

June went by fast! Woah!

I’ll be honest, I didn’t get as much done as I wanted beyond a bunch of clean-up on the language and developing a few new language features. I am, however, much happier with the state of the world now, but I’m avoiding a few things like deprecating some dumb ideas.

I finished going ham on the parser. Rewriting the entire parser from scratch by hand required rebuilding my trust in it, so I set out to get test coverage on the parser and the type checker. The core language has 100% test coverage, and this process uncovered many bugs. It further required resolving many TODOs within the code. The parser swap was a deep change, and while I’m very happy with it and the correctness of the language with my over 1,500 tests, the time has come to polish the error messages and align the character positions with the errors so they are meaningful. This is the way for July.

Before I went ham on the parser, I made progress on simulation. I simulated 1.2M games overnight. I let all my cores go, and I discovered that complete games could be played in under half a second. The purpose of these simulations was to randomly play games looking for dead-ends, crashes, or game-ending problems. The time per game-decision was about 2.1 ms, which is exceptionally high. Performance being a non-goal meant I’d move on.

Something amazing happened after going ham on the parser, fixing issues, eliminating some hacks, and improving the type system. The simulation performance tripled and now the time is about 0.7 ms per game-decision. w00t. I hope to spend a touch of time in July optimizing performance a bit because it is fun, and I hope to double the performance again.

Going HAM on the parser with next level testing

Finding the balance of what to work on is a challenge when there are so many interesting problems ahead. Testing the back-end for the current game has proved to be a challenge due to the entropy involved, so I’m thinking about ways of building tooling to corral that complexity and chaos.

The game works, but I have yet to reach the reality of the first milestone of playing with friends (the UI sucks, and I'm afraid of a game-stopping bug). I am going to focus on improving the core in two ways. First, I want to improve the error messages. Second, I want to bring real-time code coverage into the picture as part of the test tooling.

I started with ANTLR to do the parsing and help me build the trees backing the language primitives. However, I started to get frustrated with what ANTLR does with white space and the error messages it produces. While that is relatively minor, it also requires me to rethink all my trees to accomplish some goals. For instance, I want to make it easy to do refactoring and code completion. I also want to support LSP to bring comments on fields via the hover mechanism. In using ANTLR, I have found truth in the wisdom of those who recommend against using parser tools.

The objectives of the new parser are thusly:

  • Have a unified way of thinking about comments which can annotate anything and everything
  • Have the parser become an ally in how to format the code
  • Have great error messages
  • Make it easy to associate environments to the token space such that code completion queries are fast and relevant.
  • Be fast (it is already 20% faster)
  • Fix some keyword issues such that it is clear which identifiers are forbidden versus allowed
  • Achieve 100% code coverage on the parser
  • Fail on the first error; it is simply better that way

With the new parser, I will have plenty of information to think about code coverage at the character level. With code coverage and the code indexed with comments, I can build a tool where the test cases are generated for me and the transitions are generated to achieve code coverage. The aim is to use AI techniques to minimize the total number of automated tests to simplify the validation.

Now, this requires more thought and more words to elucidate, but imagine having a UI which shows you the raw state, the code with comments that is being tested, and the related state change of running that code. This can be certified as a good and right state change (by potentially multiple people), and this test can be saved for posterity and re-run later. However, if changes happen, then those changes can be validated as either a new data change, the loss of a data change, or a different data change, and require new certification.

Existing tests are fed in to establish a baseline of code coverage, and new tests can be generated such that code coverage is maximized with minimal test certification. For instance, the introduction of a new card could require several test cases to be randomly generated such that the shuffling of the card happens early. Only the test cases which extend code coverage need to be stored and certified.

It feels exciting that I’ll never need to mock or write test code ever again since the goal of testing is primarily about reducing entropy and maximizing determinism.

How this all started from building a custom browser

Conceptually, a user interface is a simple thing. It is a pretty and delightful picture which makes it easy to understand and interact with a product. That's it.

Since Adama is a programming language for board games, it stands to reason that Adama does not exist in a vacuum and it must be workable with existing UI technologies. That is, it must sanely integrate with a variety of frameworks to achieve some measure of success and be usable beyond my myopic view of reality.

When it comes to modern day application building, I am an old man screaming from my porch for you damn kids to get off my lawn.

An interesting point of history is that my recent work on board games actually started using SDL and then eventually Skia with C#. I was trying to build a new style of browser where (1) a single socket was the only way to get data and send messages, (2) there is exceptionally limited client side logic to eliminate abuse, (3) anti-entropy protocols would reconcile data between client and server, (4) the browser was 100% reactive to data changes, and (5) the server was in complete control because "the network is the computer". Now, there is a lot to unpack here, but the point is that building a new style of browser is a monumental task within itself.

The reason for starting a new browser is that I feel the web is a complete shit-show, and I hate building web products. I hate that your stupid website runs code on my machine. I also hate how a lack of privacy is a given in today's world. Remember, I'm full of hate.

Putting aside the bucket of rage I feel when using the internet, I wanted to bring board games to mobile devices. That was the mission. My hope was to ship a simple binary which would utilize a single secure socket to do some magic and let the game play happen like an efficient remote desktop protocol.

Unfortunately, when you start building new UI frameworks with new idioms and low-level technologies, you kick off a massive empire building project which will require support and tools. Once again, I was shaving a Yak and building a new empire in praise of the glorious understanding that I have acquired about these stupid machines. I wish I had realized this before getting a working UI editor sort of done. sigh

The socket was a key thing to focus on. I prefer raw sockets because they are stateful and conversational. For board games, they are essential because of the inherent complexity of communication between players. The socket simplifies this because you can use the socket to mirror the server state, and all the complexity can be held within the single server.

(Diagram: key reason for the socket)

The moment you have all state within a single server, the challenge of shipping a complex product is several orders of magnitude less due to practically zero failure modes. There is just one tiny problem... The reliability of a single server in today's cloud with crazy orchestration and almost constantly induced failures is not great.

Ignoring that tiny problem, I persisted since I had built a prototype tiny browser and if I control the hardware then I could probably be ok (right?). There I was building my favorite game with node.js as my back-end, and I had finally made some decent progress on the game. The usage of a socket and some of the new UI idioms were proving fruitful. However, the server-side complexity became exceptionally overwhelming. Board games are non-trivial endeavors.

This is where the impetus for Adama was born. There were three key mission questions to drive the work. First, can that pesky single-server limitation be overcome such that machine failures are handled gracefully without user impact? Second, are there useful primitives which reduce the total complexity? Third, what essential truths were learned from the simplified UI idioms?

The first two questions will be addressed at length in the future, but the last question gave pause. Systems do not live in an isolated vacuum, and it is the role of the system to make itself useful beyond its immediate peer all the way to the user. After all, distributed systems are a UX problem, so we must operate on primitives which are useful to the UI and the UI developers.

First, we study the UI idioms which the toy browser used. They were straightforward and have been incorporated into the prototype of Adama:

  • An Adama server can only receive messages from clients of two core types.
    • The first type is a free form message like "say hello" which has no rules associated to it. It's only for the pesky humans.
    • The second type is a response in request-response where the server asks the client a question (which piece do you want to move, where do you want to move it). This is the magic for implementing the board game logic in a sane way, and also the secret for enabling AI. This will be written about at length in the future.
    • Note: there is a possible third type where the client may send a request to Adama, and Adama will respond, but that is open to debate. It feels natural (and may be useful outside of the current domain), but it introduces dealing with the failures of the RPC. Instead, messages are stored in a queue on the client side and must replicate to the Adama server with exactly-once semantics (i.e. at least once with deduping), and Adama must keep the client state up to date and consistent.
  • The entire application state is a giant object represented by JSON
    • An Adama server can compute differentials of the state and keep the client up to date with JSON merge (RFC 7386) for the win (a small example follows this list).
    • The UI must then process a stream of state differentials which are congruent to the initial payload (i.e. they have the same shape and form, but differentials have vastly less information)
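
As a rough illustration of RFC 7386 semantics (the field names below are invented for this example): scalars are replaced, nested objects are merged recursively, and a null value deletes its key.

state before:
{ "turn": 3, "prompt": "discard a card", "hand": { "size": 5 } }

diff from the server:
{ "turn": 4, "prompt": null, "hand": { "size": 4 } }

state after applying the merge ("turn" replaced, "prompt" deleted, "hand" merged):
{ "turn": 4, "hand": { "size": 4 } }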

(Diagram: ideal user interface)

The entire application state being a giant JSON object is reminiscent of Redux and the notion of an application state container. Now, this has the property that the UI simply needs to react to changes of a single object. In the toy browser, the role was to simply render the scene and then index the scene in a way that converts interactions into messages. With the web browser, this requires work to expose new idioms, and this is where I am at. Since the JSON is predictable, there is a maybe-new (or not) concept of an "object sieve subscribe".

var sieve = GOAT.CreateSieve()
GOAT.SieveSubscribe(sieve, wrappedCallback);

The implementation of the above is basically to implement JSON merge (RFC 7386), and then publish out changes as they happen by walking a parallel object callback structure (i.e. the sieve). We can glean a basic idea of how this works with a single example.

GOAT.SieveSubscribe(sieve, {'turn': function(change) {
  GameLog.write(
    [ "The turn has changed from ",
      change.before,
      " to ",
      change.after
    ].join(""));
  document.getElementById('turn').innerHTML = change.after;
}});
GOAT.SieveMerge(state, diff, sieve); // <- powers the entire engine

This example shows an interesting idiom where updates on the UI are ONLY derived from the update stream. There is no global re-computation, and no giant reconstruction for small changes. This property was important for the toy browser because I was aiming for a battery efficient engine where only updates would refresh the screen rather than periodically polling the scene.

However, there is a small hitch. JSON Merge (rfc7386) does not handle arrays well, but this can be overcome by constructing a new merge operation which enables array differences. This requires the server to craft and embed meta signals in the delta, so that's what I am working on at this very moment until I got distracted by my thoughts. This will be described in more detail at a later date.

The core observation that I have had throughout this journey is that when the back-end and front-end work together, amazing properties are to be had. In this case, a small change from client to back-end results in a small change from back-end to other clients using a small amount of network. When clients learn of changes, they update proportionally to the change at hand. No excess. No fuss. Just niceness.

Progress is Slow, May 2020 Update

"Wow"

I forgot how much work there is in building a programming language, but it is super fun. So here we are in May of the remarkably interesting 2020. I am coming out of a slump of depression (I think) from this world-changing covid-19, and I recently made good progress towards the first milestone.

I am lurching forward ever so slowly every weekend I invest in this madness.

The first milestone is, fundamentally, a real game. After all, if this thing cannot ship real online board games, then it is a failure. The good news is that I am testing a real game, filling in missing rules, adding content, fixing bugs, re-reading the rule book, and working on the UI. I do not expect much from the UI for a game I do not intend to release (friends only due to licensing issues). The key thing is that I am suffering my language while I make progress on a product rather than being stuck in the infinite meta-game of improving the language.

It is a remarkable feeling to have over 3,000 lines of code in my own language.

This first milestone will be a success when I play a few games with people I know, and I am getting closer to this every weekend. The real question is what happens after this milestone, and this is where I need to find a good balance between playing the product-game (i.e. shipping real games) versus the build-the-empire (i.e. the meta-game of developing the programming language). What is a successful strategy?

This is something to think about during May, but one thing on my mind is that I must have a ruthless focus on shipping games rather than focusing on meta-problems. “Write Games, Not Engines” is particularly good advice. I am finding that I’ve solved a sufficiently hard problem with my language to justify its existence because I’m using the language to do exceptionally complex things, but I must resist the siren’s call to improve the language because that is a never-ending task.

So, my current premature attempt at a strategy is:

  1. Fix bugs in the language which are show-stoppers.
  2. Ship a game
  3. Do a retrospective after shipping a game, write about what was good versus what was awkward.
  4. Pick exactly 3 issues to improve within the language, and do them
  5. Goto 2

First Announcement

Welcome, welcome, welcome...

This is the first update on the Adama language project!

I finally have set forth on the journey of telling people about my latest passion project. Surprise, it's a programming language! However, it is not a generic programming language, and I want to make this exceptionally clear. It's a domain specific programming language meant for board games. Yep, that's right! Board Games!!! However, I have discovered this language has some very interesting properties which make it broadly applicable, and I have a vision for a new type of infrastructure!

Since there is not much to share about the project at the moment, and I am really just filling in the blog so I have some content, I'll share a bit about who I am and rant a bit. Maybe it will be entertaining! Let's see.

My day job is one of those "technical architects", "engineering leaders", and sometimes "code machine". I recently gave myself the title of "Dark Lord of Infrastructure Engineer". That is seriously the title on my business card, so it is definitely legit, right? Well, it simply means I've succumbed to the dark side, and I don't mean management. Instead, I find myself more driven by hate than anything else.

I hate.

I hate bad infrastructure.

I hate leaky abstractions which become infrastructure.

I hate unreliable infrastructure.

Seriously, I hate the way we define Infrastructure via web services, but what other choice is there? This is a serious question.

I also hate having to read a long ass list of products and realize that my core option is to buy stupid shit rather than build my own stupid shit where I have control. I hate buying shit because buying creates hard boundaries which more often than not requires some kludge or hack to make work well. I value reliability, and I believe reliability only comes from simplification where less is more. And, I don't mean less stuff to buy, but less total stuff period. However, simplification is exceptionally hard and extremely expensive for a multitude of reasons.

I'm that fucker that shaves the Yak, and I have issues. Worse yet, I sometimes will allow myself to be the enemy of the good and seek perfection. I have issues. After-all, one doesn't set forth to build a programming language unless one has serious fucking psychotic issues and hatreds!

The interesting thing, however, is that the psychotic issues which drive me to write this language have given me a deeper appreciation and understanding of the science in the field. In a way, I've gotten closer to the root of things and how we build, and I see a different evolutionary branch to take. I believe I've discovered something, and I intend to pull that thread to build a new glorious empire!

Adama is the first step in building out my glorious empire dedicated towards building a new web and redefining the landscape for how products are built, and it all started with the desire to make board games easy to build.

I hope you enjoy reading what I write, and I hope you eventually decide to drink the Kool-Aid about Adama.

Thanks for your time.

And, if you hate it, then that's ok. I get it.