A non-productive theme for July is to jump into performance and measure a few things. The hope is that this search will produce some low-hanging fruit that I can exploit, and I also intend to validate correctness on many things since there are some slap dashing stupid stuff. A core reason for the urgency beyond being interesting work is that I want to utilize permutation testing as a way of finding novel test-cases. Given that it took a bit over half a day to simulate playing 1.2 M games, I would like to be able to cut that time down.
I am going to leverage the current game prototype which comes in at a massive 4.8 KLOC, and the goal is simulate random players playing a game. To factor out the randomness, I’m going to seed the random number generator such that the game play and decisions are predictable. I tweaked the seed parameters until I found a meaty enough game to stress the system.
This meaty game will be played 101 times, and the results from the first test will be thrown away as it is heavily influenced by the JVM warming up. Each test afterwards will produce these data columns
- Time: proxy measure for CPU (and memory induced GC pressure) within a single thread.
- “Billing Cost”: number proportional to the amount of work done (the language bills by the statement and a few other things)
- Decisions: number of decisions taken by all the agents within a game
It is worth noting that “billing cost“ is expected to remain constant within a single test run, and the number of decisions is expected to remain constant over all runs because it is a measure of correctness. Decisions must be fixed at 798. Over those 100 runs, the average time will be a sufficient measure of performance. I intend to eyeball variance, but I leaning on an average to factor out GC cost. With all this said and done, the initial data is thusly:
That is approximate 1ms per game decision. This actually feels super slow, and as a secondary goal of this work is to build tools to dive into issues. My intent during this day is to just address the language and not push back on the user-space changes.
Every record has a primary key called
id which is unique and an integer, and it is fairly common to leverage this to find items by primary key. Since the only way to get access to data within a table is via a language integrated query, we must contend with the query syntax. The first task was to analyze
where expressions to extract associations mapping the primary key to an expression. The initial algorithm for this is simple, and I'll write with pseudo-ish-code here (I'm actually using Java, but it is too verbose):
This identified many cases where the primary key was part of the where clause, and the results were.... disappointing. The billing cost went down 2.5% while the time cost went down only 1.4%.
Ok, this is going to be a slog.
The tool at hand doesn’t really need to see the individual view per agent, so as a method of developing a direction we can experiment and simply turn that feature off and see what happens. This enables a quick way of developing a sense of where time is worth spending. We do that, and the data is revealing.
This shows potential in that the billing cost is overblown and the client views represent close a third of our CPU cost, but billing was reduced by a whopping 70%. This fundamentally requires investigation. Now, this is not the way, but it gives a tingle to explore. Please ignore the above data.
- Has no bubbles
- Has no visibility requirements.
- The privacy for each field is either public or private
- The fields are all primary data (i.e. ints and strings) (i.e. neither records nor tables)
With this, we can stash a copy of the JSON within the record and let record invalidation blow it away on change. By leveraging the reactive elements, we not only cache between viewers but also over time as well. This result in new data.
This is very encouraging as the billing cost has now been reduced by 20% and CPU cost is down 12.4%. In terms of the goal, that’s over an hour of time saved in the experimental test.
Currently, the document keeps track of every single statement which gets executed in a giant list... The question at hand is how much that costs.
This is slightly exciting for production as this takes us towards a 16% CPU reduction, but this is a requirement for the testing I hope to achieve. As I hope to guide agent decisions towards maximimzing code coverage, I must turn this off... for now. Please ignore the above data.
Indexing is the bees knees, so let’s introduce indexing to our tables!
This is not an easy endeavor, so we first set out to measure opportunity. We do that by doing half the hard work now, then exploit that work to measure and validate correctness. What we measure is the potential to do good work if we finish the other half of the hard work.
First, we have to leverage and extend the analysis work done via extracting the primary key and generalizing it to any field. This was added to the code-generation such that a
where could generate a flat array of which columns map to which value. Second, each item must reveal their data in an easy to consume way. Third, we must measure how effective the where cause could be enhanced with indexing. This work enables a very simple algorithm to test
Now, this code is not great for a variety of reasons, but we can run it prior to executing the
where clause and measure changes to billing cost. Well, data came in, and it was both suprising and not so.
A 10% CPU bump (yikes), but a 83% reduction to billing cost (woah). OK, this is very pro-customer, however this does not help me. It is however encouraging that if I roll up my sleeves, then I can get some better results.
The challenge of indexing in a reactive system is not paying the implicit insertion cost. That is, stuff will change, and you don’t want to constantly insert and remove items. You need to be a bit lazy, and for indexing this requires some book-keeping and allowing some slack to emerge. Each column must have its own index, but when things change they must be either be removed or included in a catch-all bucket since the indexing is indeterminate in a lazy system.
The key is that the combinatation of column specific indicies plus this catch all bucket form a super-set of the right answers you seek, and the hope is that the resulting set is smaller consideration pool. First, we index everything and measure the overhead:
Yikes! It is already adding cost to just index the data. However, we have yet to use the data. One more blind trial, but let's compute the needed super set.
Oh no, this is not trending good at all! Let's use it.....
I am Jack's sense of disappointment. OK, this is proving to be my white whale. There are many problems with the approach I have taken on this... Part of the problem is that I’m indexing every enum, integer, and client variable. Pareto is proving a relevant observation, so I’m going to need to take a break... Several hours pass
I’m going to restore the code I used for the bad indexing, then put it behind a condition and introduce a DocumentMonitor. The role of the DocumentMonitor is to stream data out to me which I can use, and I have a flag called “measureTablePerformance” which will use that bad code in special circumstances.
With hope, the if statement doesn't regress the performance...
It's basically the same, so let's use it to emit some data, collect that data into a table, and then review the table.
Clearly, I want to exploit the skillcards table since it is highly rejective. Out of 6.7M tests, 5.2M could be quickly rejected via an index. This is great! We should exploit that, so let's do it. This requires a bunch of working like: upgrading the parser, updating the code generation to present the indexing information to tables, and then upgrade where clause code generation to build the needed set. Given that the last code was mostly thrown away due to complexity, this is going to require care... _Even more hours pass
OK... seriously. All that work for an additional 2% in CPU reduction. Sigh. On the bright side, the billing is now more customer-friendly and is down 77%. If the
where clauses were more expensive, then this would be a very productive thing, so we will keep it as it is customer-friendly feature that encourages better focus on more controllable things by the customer.
As part of this work, something felt off as to why so much computation was happening in the first place. This triggered an audit into the code, and I had to rethink a few things from core first principles. At hand, how formulas were being computed was done poorly due to sloppy mixing of invalidation and dirty signals. The core two principles at play were:
- invalidation alwayss flow from data down to formulas
- dirty signals flow up from data to root
This diagram became crucial for investigating each class and making sure it conformed to the principles. All but two of the classes behaved, and the one that did not behave had a giant TODO on it. Fixing that TODO and the other class caused the system to behave, and produced new data.
At this point, 25% of the CPU has been reduced and 82% of the billing cost has been reduced.
I'll take 25% as a win for now, but I want to invest in tooling to provide better insights. There is some super dumb code within the prototype at the moment because the focus then was to ship the game in a workable form.