July 4th, 2020 Performance Funday By Jeffrey M. Barber

A non-productive theme for July is to jump into performance and measure a few things. The hope is that this search will produce some low-hanging fruit that I can exploit, and I also intend to validate correctness along the way since there is some slapdash, stupid stuff in there. A core reason for the urgency, beyond this being interesting work, is that I want to utilize permutation testing as a way of finding novel test cases. Given that it took a bit over half a day to simulate playing 1.2M games, I would like to cut that time down.

Test Setup

I am going to leverage the current game prototype, which comes in at a massive 4.8 KLOC, and the goal is to simulate random players playing a game. To factor out the randomness, I’m going to seed the random number generator such that the game play and decisions are predictable. I tweaked the seed parameters until I found a meaty enough game to stress the system.

This meaty game will be played 101 times, and the results from the first test will be thrown away as they are heavily influenced by the JVM warming up. Each test afterwards will produce these data columns (a rough sketch of the harness follows the list):

  • Time: a proxy measure for CPU (and memory-induced GC pressure) within a single thread.
  • “Billing cost”: a number proportional to the amount of work done (the language bills by the statement and a few other things).
  • Decisions: the number of decisions taken by all the agents within a game.
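
For reference, the measurement loop is roughly the sketch below. The harness shape, the LongSupplier workload, and the names are my assumptions rather than the prototype’s actual code, but it captures the warm-up discard and the correctness check on the decision count.

import java.util.function.LongSupplier;

public final class Harness {
  /** play the seeded game 101 times, discard the warm-up run, and report the average time */
  public static double averageMillis(LongSupplier playOneGameReturningDecisions) {
    final int runs = 101;
    long totalMs = 0;
    for (int run = 0; run < runs; run++) {
      long started = System.currentTimeMillis();
      long decisions = playOneGameReturningDecisions.getAsLong();
      long elapsed = System.currentTimeMillis() - started;
      if (decisions != 798) {
        // correctness: the seeded game must always make exactly 798 decisions
        throw new IllegalStateException("non-deterministic run: " + decisions);
      }
      if (run > 0) {
        // the first run is dominated by the JVM warming up, so we throw it away
        totalMs += elapsed;
      }
    }
    return totalMs / (double) (runs - 1);
  }
}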

It is worth noting that “billing cost“ is expected to remain constant within a single test run, and the number of decisions is expected to remain constant over all runs because it is a measure of correctness. Decisions must be fixed at 798. Over those 100 runs, the average time will be a sufficient measure of performance. I intend to eyeball variance, but I am leaning on the average to factor out GC cost. With all this said and done, the initial data is as follows:

ms billing cost
739 12810563

That is approximately 1 ms per game decision. This actually feels super slow, and a secondary goal of this work is to build tools to dive into issues like this. My intent for the day is to address just the language and not push back on user-space changes.

Exploit Primary Key

Every record has a primary key called id, which is a unique integer, and it is fairly common to leverage this to find items by primary key. Since the only way to get access to data within a table is via a language-integrated query, we must contend with the query syntax. The first task was to analyze where clauses to extract associations mapping the primary key to an expression. The initial algorithm for this is simple, and I’ll write it with pseudo-ish-code here (I’m actually using Java, but it is too verbose):

function extract(expr):
  if expr is Parenthesis:
    return extract(expr.inner)
  if expr is BinaryOp(&&):
    result = extract(expr.left)
    if !result:
      result = extract(expr.right)
    return result
  if expr is BinaryOp(==):
    if expr.left == Lookup("id"):
      return expr.right
    if expr.right == Lookup("id"):
      return expr.left
    return null
  return null
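
Once an id expression is extracted, the generated query can consult the primary key directly instead of scanning the whole table. The sketch below is illustrative only; Table, Record, getById, and scan are stand-ins I made up rather than the runtime’s actual types, and the java.util imports are omitted.

// sketch: evaluate "id == <expr> && <rest>" by jumping straight to the record
static List<Record> evaluateWhere(Table table, Integer extractedId, Predicate<Record> restOfWhere) {
  if (extractedId != null) {
    Record candidate = table.getById(extractedId); // direct lookup instead of a full scan
    if (candidate != null && restOfWhere.test(candidate)) {
      return List.of(candidate);
    }
    return List.of();
  }
  // no primary key found in the where clause, so fall back to scanning everything
  List<Record> results = new ArrayList<>();
  for (Record item : table.scan()) {
    if (restOfWhere.test(item)) {
      results.add(item);
    }
  }
  return results;
}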

This identified many cases where the primary key was part of the where clause, and the results were…. disappointing. The billing cost went down 2.5% while the time cost went down only 1.4%.

ms billing cost
728 12483470

Ok, this is going to be a slog.

Direction Seeking: What if we don’t update client views

The tool at hand doesn’t really need to see the individual view per agent, so as a way of finding a direction we can simply turn that feature off and see what happens. This gives a quick sense of where time is worth spending. We do that, and the data is revealing.

ms billing cost
510 3689641

This shows potential: the client views represent close to a third of our CPU cost, and the billing cost is clearly overblown given that it dropped by a whopping 70%. That gap fundamentally requires investigation. Now, this is not the way, but it gives a tingle to explore. Please ignore the above data.

Cache Record to JSON

In thinking about how to optimize client views, we can exploit caching of viewable records between the four agents. However, this requires a consultation with each record’s privacy policy. Statically, we can devise a very simple predicate to eliminate the need to compute per-user views. A record’s per-user view can be cached if (a sketch of this check follows the list):

  • It has no bubbles.
  • It has no visibility requirements.
  • The privacy for each field is either public or private.
  • The fields are all primary data (i.e. ints and strings, neither records nor tables).
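
As a sketch of what that static check could look like in the code generator; RecordDefinition, FieldDefinition, and Privacy are made-up stand-ins for the compiler’s model, not its real classes.

// sketch: decide at compile time whether a record's per-viewer JSON can be shared and cached
static boolean canCachePerViewerJson(RecordDefinition record) {
  if (record.hasBubbles() || record.hasVisibilityRequirements()) {
    return false;
  }
  for (FieldDefinition field : record.fields()) {
    boolean simplePrivacy = field.privacy() == Privacy.PUBLIC || field.privacy() == Privacy.PRIVATE;
    boolean primaryData = field.isPrimaryData(); // ints and strings; neither records nor tables
    if (!simplePrivacy || !primaryData) {
      return false;
    }
  }
  return true;
}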

With this, we can stash a copy of the JSON within the record and let record invalidation blow it away on change. By leveraging the reactive elements, we not only cache between viewers but also over time. This results in new data.

ms billing cost
647 10125290

This is very encouraging as the billing cost has now been reduced by 20% and CPU cost is down 12.4%. In terms of the goal, that’s over an hour of time saved in the experimental test.

Quick Experiment: Turning Off Code Coverage

Currently, the document keeps track of every single statement which gets executed in a giant list… The question at hand is how much that costs.

ms billing cost
620 10125290

This is slightly exciting for production as it takes us towards a 16% CPU reduction, but coverage tracking is a requirement for the testing I hope to achieve. As I hope to guide agent decisions towards maximizing code coverage, I must shelve this optimization… for now. Please ignore the above data.

Direction Seeking: Bad Indexing the Tables

Indexing is the bee’s knees, so let’s introduce indexing to our tables!

This is not an easy endeavor, so we first set out to measure the opportunity. We do that by doing half the hard work now, then exploiting that work to measure and validate correctness. What we measure is the potential payoff if we finish the other half of the hard work.

First, we have to leverage and extend the analysis work done for extracting the primary key, generalizing it to any field. This was added to the code generation such that a where clause could generate a flat array of which columns map to which values. Second, each item must reveal its data in an easy-to-consume way. Third, we must measure how effectively the where clause could be enhanced with indexing. This work enables a very simple algorithm to test:

  /**
   * @param clause an integer array mapping columns to test values [col0, testValue0, col1, testValue1, ..., colN, testValueN]
   * @param value an integer array representing a particular item's columns [value0, value1, ..., valueM]
   * @param effectiveness a running count of how effective each column was at rejecting items
   */
  private static boolean slowTest(int[] clause, int[] value, int[] effectiveness) {
    boolean result = true;
    for (int k = 0; k + 1 < clause.length; k += 2) {
      if (value[clause[k]] != clause[k + 1]) {
        effectiveness[k / 2]++;
        result = false;
      }
    }
    return result;
  }

Now, this code is not great for a variety of reasons, but we can run it prior to executing the where clause and measure changes to billing cost. Well, the data came in, and it was both surprising and not so.

ms billing cost
810 2170041

A 10% CPU bump (yikes), but an 83% reduction to billing cost (woah). OK, this is very pro-customer, but it does not help me. It is encouraging, though, that if I roll up my sleeves I can get better results.

Index all things?!?

The challenge of indexing in a reactive system is not paying the implicit insertion cost. That is, stuff will change, and you don’t want to constantly insert and remove items. You need to be a bit lazy, and for indexing this requires some book-keeping and allowing some slack to emerge. Each column must have its own index, but when things change, items must either be removed from their index or included in a catch-all bucket, since the indexing is indeterminate in a lazy system.

  private final ReactiveIndex<Ty>[] indices;
  private final TreeSet<Ty> unknowns;
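
As a rough sketch of how those two fields could be combined to answer an AND-only where clause; the of(value) accessor on ReactiveIndex and the flat clause encoding (mirroring the slowTest layout above) are assumptions on my part, not the final API.

  // sketch: build a super-set of candidate items for a where clause of ANDed equality tests
  private TreeSet<Ty> candidatesFor(int[] clause) {
    TreeSet<Ty> candidates = null;
    for (int k = 0; k + 1 < clause.length; k += 2) {
      TreeSet<Ty> matches = indices[clause[k]].of(clause[k + 1]); // items currently indexed under that value
      if (candidates == null) {
        candidates = new TreeSet<>(matches);
      } else {
        candidates.retainAll(matches); // AND semantics: intersect per-column matches
      }
    }
    if (candidates == null) {
      return null; // nothing indexable; caller falls back to a full scan
    }
    candidates.addAll(unknowns); // recently mutated items are indeterminate, so always include them
    return candidates;
  }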

The key is that the combination of column-specific indices plus this catch-all bucket forms a super-set of the right answers you seek, and the hope is that the resulting set is a smaller consideration pool. First, we index everything and measure the overhead:

ms billing cost
695 10125290

Yikes! It is already adding cost just to index the data. However, we have yet to use the data. One more blind trial, but let’s compute the needed super-set.

ms billing cost
832 10125290

Oh no, this is not trending well at all! Let’s use it…

ms billing cost
734 2361554

I am Jack’s sense of disappointment. OK, this is proving to be my white whale. There are many problems with the approach I have taken here… Part of the problem is that I’m indexing every enum, integer, and client variable. Pareto is proving to be a relevant observation, so I’m going to need to take a break… Several hours pass

I’m going to restore the code I used for the bad indexing, put it behind a condition, and introduce a DocumentMonitor. The role of the DocumentMonitor is to stream data out to me, and I have a flag called “measureTablePerformance” which will enable that bad code in special circumstances.

public interface DocumentMonitor {

  /** should the runtime measure table performance */
  public boolean measureTablePerformance();

  /** emit a single datapoint about table performance */
  public void recordTablePerformanceInstance(
    String tableName,
    String columnName,
    int total,
    int effectiveness);
}
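
To turn those callbacks into the table below, the monitor only needs to aggregate per (table, column). A minimal sketch of such an implementation follows; the aggregation and the dump format are mine, not necessarily what the real tooling does.

import java.util.HashMap;

// sketch: aggregate datapoints per (table, column) and dump them when the run finishes
public class TablePerformanceMonitor implements DocumentMonitor {
  private final HashMap<String, long[]> rows = new HashMap<>(); // key -> {calls, total, effectiveness}

  @Override
  public boolean measureTablePerformance() {
    return true;
  }

  @Override
  public void recordTablePerformanceInstance(String tableName, String columnName, int total, int effectiveness) {
    long[] row = rows.computeIfAbsent(tableName + " " + columnName, key -> new long[3]);
    row[0]++;               // calls
    row[1] += total;         // items tested
    row[2] += effectiveness; // items rejected by this column
  }

  public void dump() {
    rows.forEach((key, row) -> System.out.println(key + " " + row[0] + " " + row[1] + " " + row[2]));
  }
}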

With hope, the if statement doesn’t regress the performance…

ms billing cost
646 10125290

It’s basically the same, so let’s use it to emit some data, collect that data into a table, and then review the table.

table              column        calls  total    effectiveness  %
skill_cards        location      71199  6763905  5298816        78.34%
skill_cards        skill         57681  5479695  4383756        80%
skill_cards        owner         30504  2897880  2260607        78.01%
civilian_ships     status        31507  378084   363870         96.24%
military           type          9394   396314   340569         85.93%
civilian_ships     space_key     28281  339372   284757         83.91%
crisis_deck        revealed_to   4000   280000   280000         100%
military           space_key     9093   383151   237907         62.09%
destinations       revealed_to   4000   174328   174328         100%
civilian_ships     revealed_to   3924   47088    47088          100%
loyalty_deck       revealed_to   3924   42396    42395          100%
loyalty_deck       owner         4068   43938    38841          88.4%
players            play_id       7454   29816    22364          75.01%
players            location_key  5079   20316    19538          96.17%
players            space_key     5052   20208    16086          79.6%
locations          key           549    10431    9971           95.59%
characters         owner         119    1190     934            78.49%
base_stars         space_key     496    992      846            85.28%
players            character     188    752      684            90.96%
characters         type          51     510      357            70%
characters         key           22     220      198            90%
players            link          4      16       12             75%
super_crisis_deck  owned_by      1      5        0              0%

Clearly, I want to exploit the skill_cards table since it is highly rejective. Out of 6.7M tests, 5.2M could be quickly rejected via an index. This is great! We should exploit that, so let’s do it. This requires a bunch of work: upgrading the parser, updating the code generation to present the indexing information to tables, and then upgrading the where-clause code generation to build the needed set. Given that the last code was mostly thrown away due to complexity, this is going to require care… Even more hours pass

ms billing cost
630 2912569

OK… seriously. All that work for an additional 2% in CPU reduction. Sigh. On the bright side, the billing is now more customer-friendly and is down 77%. If the where clauses were more expensive, then this would be a very productive change, so we will keep it, as it is a customer-friendly feature that encourages better focus on the things the customer can control.

Fixing a bug in the reactive tree

As part of this work, something felt off about why so much computation was happening in the first place. This triggered an audit of the code, and I had to rethink a few things from first principles. At hand: how formulas were being computed was done poorly due to sloppy mixing of invalidation and dirty signals. The two core principles at play were (a sketch follows the list):

  • invalidations always flow from data down to formulas
  • dirty signals flow up from data to the root
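
To make the two directions concrete, here is a tiny sketch of how I think about the split; the class and method names are illustrative, not the actual reactive classes.

import java.util.ArrayList;

// sketch: the two signals travel in opposite directions and must never be conflated
abstract class ReactiveNode {
  private ReactiveNode parent; // towards the document root
  private final ArrayList<ReactiveNode> dependentFormulas = new ArrayList<>(); // formulas reading this value

  /** a value changed: invalidate every formula computed from it (flows down to formulas) */
  protected void invalidateDependents() {
    for (ReactiveNode formula : dependentFormulas) {
      formula.invalidateDependents();
    }
  }

  /** a value changed: tell the ancestors that something below them needs attention (flows up to the root) */
  protected void raiseDirty() {
    if (parent != null) {
      parent.raiseDirty();
    }
  }
}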

[Diagram: invalidations and dirty signals]

This diagram became crucial for investigating each class and making sure it conformed to the principles. All but two of the classes behaved, and one of the misbehaving ones had a giant TODO on it. Fixing that TODO and the other class brought the system in line and produced new data.

ms billing cost
550 2328882

At this point, CPU cost has been reduced by 25% and billing cost by 82%.

Until next time.

I’ll take 25% as a win for now, but I want to invest in tooling to provide better insights. There is some super dumb code within the prototype at the moment because the focus then was to ship the game in a workable form.