3 posts tagged with "infrastructure"


Five Secrets of the Adama Programming Language

Now that I am committed to my path with this language, let's talk about secrets. These are the things I know which I believe could lead to a niche empire. I'll share them directly along with insights about why I'm continuing with a language-centric approach. As a moment of clarity, this isn't my best writing since I'm fumbling around with ideas. Perhaps secrets today are just messy facts of tomorrow?

DIY database within a document#

We start with the first experience of writing code. It's an amazing experience where you can take inputs in, do things, and produce a happy output. Making fractals and other graphical things with mode 13h was pure joy. I remember making calculator programs for my TI-80 which helped me in my mathematical journey. I remember building a simple game for a BBS using QBasic, and then I started the journey beyond compute: persistence.

Persisting data from your program to something like a disk, a database, or a network service is problematic for many reasons. We start with the sheer tedium of taking state from memory and marshalling it out into a block of data. Assuming you got everything, the other side may fail for a variety of reasons. Worse yet, you may have a partial failure and be left in a funky state. Having spent a decade within infrastructure, I can attest that there are many things that can go wrong.

This problem is so hard that many developers bias towards some degree of stateless programming, letting the emergent cloud (i.e., the availability of many specialized services) handle those pesky hard problems. This has been a fruitful decoupling for moving faster, but it is not without its own pain points. The first secret rests in a return to the basics of just writing code, letting the language persist everything. We throw away the notion that we must load data, do our work, then save data.

We can even model this using existing ideas. Take, for instance, your computer's ability to suspend, power off, then power back on to where you were. When it works, it is amazing, so let's draw it out:

(figure: mental model of the VM flow)

Suppose the VM holds the state of your game (or app/experience). When the first player joins the game, the VM wakes up (1) by loading a snapshot from the disk (2). As more players join and interact (3.a), changes to the VM's memory need to be persisted (3.b). Once all the players leave or the VM needs to run on a different host (cloud shenanigans), the VM needs to be shut down and put to sleep (4). This mental model is precisely the model that Adama uses for back-ends written in the language.

With this model, we can start asking questions like: how many VMs can a host hold? How do we upgrade the VM's logic and data structures? What is the memory layout? What is the throughput to persist changes from the VM? How expensive is it to load state? How quickly can a deployment cycle the VM from sleep back to loaded? These questions provide insights into why databases are designed the way they are.

The key property that I’ve come to appreciate is differentiability. When you use a database, your updates and deletes are effectively data differentials which the database can simply ingest and proxy out to the disk with a log structure. Databases are beautifully simple once you see through the complex layers on top of the transactional log powering them.

Unfortunately, application developers tend to want different ways of persisting data beyond emitting data differentials. So, we get things like ORMs or GraphQL. The object-relational impedance mismatch is a pernicious problem requiring either your app or API to submit to the relational model or your relational model to submit to your app (i.e., the emergence of NoSQL).

Making persistence easier is both a siren song and essential research. Personally, using a database doesn't spark joy with me, which is why I've biased towards the chaos of NoSQL. However, that chaos is not without its pain and suffering. Perhaps the ultimate truth is that persistence, like life, is pain. I have hope that there is a better way, and here I am to push this rock up the hill once again. However, doing better requires a pilgrimage of understanding why existing solutions are the way they are.

Adama takes the view that developers should work within the previously mentioned VM model. However, instead of mirroring the memory as a giant block of bytes, the memory should have a layout that is compatible with logic upgrades and is cheaply differentiable. This layout is precisely why the key container within Adama is the table, because tables have a lot of good properties.

As your code runs, Adama's runtime is building up a data differential which can be emitted to a log around a transactional boundary. With Adama, the transactional boundary is a single message. A single message from a person enters and a single data differential is emitted. This has the beautiful consequence that you think in terms of your data, which is declared directly within your code. This is then compounded by the sheer lack of failures for you to handle. The message and data differential are tied together such that your back-end code need not think about failures. You just write code, change state, and the magic of persistence is handled for you.
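To make that concrete, here is a minimal sketch in TypeScript of "one message in, one differential out." The names (`GameState`, `handleMessage`, `win_point`) are invented for illustration; in Adama this plumbing is generated by the runtime rather than written by hand.

```typescript
// A minimal sketch of tying one message to one data differential.
type Diff = Record<string, unknown>;

interface GameState { score: number; turn: string; }

function handleMessage(state: GameState, msg: { kind: string; by: string }): Diff {
  const before = { ...state };

  // developer code: just mutate state as if persistence didn't exist
  if (msg.kind === "win_point") { state.score += 1; state.turn = msg.by; }

  // runtime code: observe what changed and emit only the differential
  const diff: Diff = {};
  for (const key of Object.keys(state) as (keyof GameState)[]) {
    if (before[key] !== state[key]) diff[key] = state[key];
  }
  return diff; // appended to the log; if this fails, the message never happened
}
```

The developer's half is just the mutation; the runtime's half is the diff wrapped around the transactional boundary.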

The secret then is that you just focus on the experience directly as if you are a beginner again.

UNDO!#

As previously stated, Adama is translating messages into data differentials by monitoring the game/app state. This monitoring enables recording the inverse differential as well, and this imbues products with an automatic Undo feature.

This comes in two forms.

The first form is to simply rewind the state of the document. This alone is worth the complexity of building the language because the biggest complaint my social group has when playing any online board game is the lack of undo. As I get older with my peers, the game is less about cut-throat competition and more about socially connecting in a casual fun way. Mistakes happen, and that’s ok.
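As a minimal sketch of how rewind can work under the hood (hypothetical shapes, not Adama's internals): every forward differential records the prior values, so the inverse differential is just those priors replayed.

```typescript
// Each log entry pairs a forward diff with its inverse.
type Entry = { forward: Record<string, unknown>; inverse: Record<string, unknown> };

function applyWithHistory(doc: Record<string, unknown>, changes: Record<string, unknown>, log: Entry[]) {
  const inverse: Record<string, unknown> = {};
  for (const key of Object.keys(changes)) inverse[key] = doc[key]; // capture priors
  Object.assign(doc, changes);
  log.push({ forward: changes, inverse });
}

function undo(doc: Record<string, unknown>, log: Entry[]) {
  const last = log.pop();
  if (last) Object.assign(doc, last.inverse); // rewind to the prior state
}
```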

This first form is also global to the document, which is problematic when using undo in a collaborative setting. The second form is unsend, which speaks to how the inverse differential can be isolated and pulled forward so that it preserves the work of multiple people. This second form is much more theoretical and has potential problems, so I'll probably need to invent a way for some messages to describe an inverse message for undo rather than relying on an algebraic manipulation.

Fortunately, I can focus on the first form since the second form is much harder to think about. However, the nerd-snipe is to focus on the academic problem of collaborative undo which leads into various CRDTs.

Await/async and the dungeon master role#

Await and async are language mechanisms which greatly simplify much of the burden that asynchronous code places on developers. For anyone who has had to deal with callback hell and lots of janky code running in various threads, await/async are a tall glass of iced tea.

Since this is (or is becoming) a common idiom, the secret rests with breathing life into Adama. Each Adama game/document instance has a labeled state machine which will execute code either on transition or after some time. For a board game, this state machine behaves like a dungeon master who controls the flow of the game.

This dungeon master is then able to ask people questions like “which card would you like to play?” or “would you like to roll dice or pass?” and this ability to reach out to people is the key that greatly simplifies writing a board game on the server. Alas, this creates problems.

The first problem relates to the transactional boundary. While the dungeon master incorporates messages from different people, the boundary must happen at the end of the state machine transition. This means that a state machine transition becomes a multi-message transaction boundary where all the messages commit together.

The second problem arises from how some languages implement async-await by converting the logic into a state machine. This complicates the implementation of the transaction boundaries, as it is possible for messages to be handled (i.e., chat) which must commit a transaction while the dungeon master is waiting on asking a player a question. Furthermore, the code running the dungeon master should be able to restart on a different machine for operations, and this is problematic with how current languages implement async-await.

The third problem arises in a few ways. What happens if a player goes away forever? Can the players cancel the current state machine, reduce the player count, and try again? The ability to cancel the multi-message transaction feels important.

These problems are fixable by (1) forcing each message asked for by the dungeon master into a queue within a data differential, (2) having await throw an exception when data is unavailable, and (3) having the document drain the queue and emit a data differential only on successful completion of the state machine logic.

This is possible because the ability to monitor state grants the ability to revert changes, but it also requires handling non-determinism for functions like rolling dice (their results must be recorded so that a replay is deterministic).
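Here is a rough sketch of that queue-and-revert trick, assuming the runtime reverts the document whenever an await cannot be satisfied yet. All names are invented for illustration.

```typescript
class AwaitingInput extends Error {}

interface Doc {
  inbox: string[];   // queued player answers live *inside* the document,
  log: string[];     // so reverting the document also restores the queue
}

// "await" a player's answer: return it if queued, otherwise abort the whole
// transition; the runtime will re-run it later when input arrives.
function ask(doc: Doc): string {
  const answer = doc.inbox.shift();
  if (answer === undefined) throw new AwaitingInput();
  return answer;
}

function runTransition(doc: Doc) {
  const snapshot = JSON.stringify(doc);
  try {
    const card = ask(doc);            // may abort here
    doc.log.push(`played ${card}`);   // mutations are buffered
    const target = ask(doc);          // or abort here; prior mutations revert
    doc.log.push(`targeted ${target}`);
    // on success: commit one differential for the whole transition
  } catch (e) {
    if (e instanceof AwaitingInput) {
      Object.assign(doc, JSON.parse(snapshot)); // revert everything
      return;
    }
    throw e;
  }
}
```

Either the whole transition commits as one multi-message transaction, or it leaves no trace, which is also what makes restarting on another machine unremarkable.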

Many of these problems were discovered while trying to build a game using node.js, where the operational problems of restarting node.js were painfully apparent. Being able to restart a program such that no one notices it restarted is a hard problem, and it manifests as a secret of Adama.

Reactivity and privacy#

A key challenge with board games is privacy between players. It's a goldilocks problem of sharing too much or not sharing enough, as it is tedious to copy, transform, and serialize the state for each user. A key secret of Adama is to define state with privacy rules about who can see what.

Laying out your state with privacy in mind lets the language automate the copying, transformation, and serialization; this is a great productivity boon, mirroring how interface description languages like protobuf and Thrift improve productivity with serialization.
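A minimal sketch of the idea (hypothetical TypeScript, not Adama syntax): attach a visibility rule to each field and let the server project a per-viewer view automatically.

```typescript
type Rule = (viewer: string, owner: string) => boolean;
interface Field { value: unknown; owner: string; visibleTo: Rule; }

const publicRule: Rule = () => true;
const ownerOnly: Rule = (viewer, owner) => viewer === owner;

const doc: Record<string, Field> = {
  hand:  { value: ["ace", "king"], owner: "alice", visibleTo: ownerOnly },
  board: { value: ["jack"],        owner: "-",     visibleTo: publicRule },
};

function project(viewer: string): Record<string, unknown> {
  const view: Record<string, unknown> = {};
  for (const [name, f] of Object.entries(doc)) {
    if (f.visibleTo(viewer, f.owner)) view[name] = f.value;
  }
  return view; // only this projection is serialized and shipped to the viewer
}

// project("bob") -> { board: ["jack"] } : bob never sees alice's hand
```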

The automation of this transformation from global to personalized private state then reveals the tedium of pushing out changes on each state change. Furthermore, the failure mode of reconnecting requires that clients be able to pull state as well. We leverage reactivity to combine the pulling and pushing into a single idiom.

Reactivity becomes a perfect pairing with privacy such that developers can dial into the perfect balance of what to share or not whilst also minimizing costs. The networking cost is minimized by the server sending only updates to the clients. The client cost is minimized by only ingesting the change. The server cost is minimized by recomputing privacy only on data changes (courtesy of the server's ability to monitor change for persistence).

Reactivity also enables sharing of cached results internally like how Excel enables reuse of computation. It is entirely possible to use Adama in ways that are compatible with Excel, and an interesting future project would be to convert an Excel spreadsheet into an Adama script + data blob.
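For flavor, here is a small sketch of Excel-style reactive reuse (an illustrative toy, not Adama's engine): a derived value recomputes only when a dependency changes, and every reader shares the cached result.

```typescript
class Cell<T> {
  private subs: (() => void)[] = [];
  constructor(private v: T) {}
  get(): T { return this.v; }
  set(v: T) { if (v !== this.v) { this.v = v; this.subs.forEach((s) => s()); } }
  onChange(f: () => void) { this.subs.push(f); }
}

// A derived computation that is re-evaluated lazily, only when marked dirty.
function derived<T>(deps: Cell<unknown>[], compute: () => T): () => T {
  let cache = compute();
  let dirty = false;
  deps.forEach((d) => d.onChange(() => { dirty = true; }));
  return () => {
    if (dirty) { cache = compute(); dirty = false; }
    return cache; // shared by every reader until a dependency changes
  };
}

const price = new Cell(10), qty = new Cell(3);
const total = derived([price, qty], () => price.get() * qty.get());
qty.set(4);
console.log(total()); // 40, recomputed exactly once
```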

Massive scale + client-side prediction for low latency#

As a final secret, here is something that I believe is a new mode of building low latency applications.

A key criticism of Adama currently is that it requires consistency, mirroring how ACID databases work. This means that if players within the US want to play a game, then a server must be elected near the centroid of the users to keep latency balanced. For board games, latency isn't a huge deal. However, for collaborative applications or real-time games, this becomes a challenge.

We can exploit the ability to log data changes out first to achieve massive scale, such that millions of people can observe a game with a reasonable number of participants writing state. This mirrors how read replicas of databases function, and this then raises the question of what a write against a read replica could become.

A write against a read replica is, in some ways, the beginnings of client-side prediction: if that write doesn't conflict, then absorbing it is not a big deal. The name of the game is how to deal with conflicts. The reason to investigate this is that a read replica could be geographically much closer to the user, providing exceptional latency for users who are close to each other.

Furthermore, this raises the question of whether writes can be predicted on the client side. If the state of the game is mostly public, then the answer is yes. Privacy is a confounding issue, but given we have a language, it raises the question of whether a specialized client-side predictor could be generated to estimate how local messages manifest as local data changes.

The future game here is to figure out many things. For instance, inserting items into a table requires id generation to be federated by writers. Well, we could set a maximum number of writers and then provision an id space per client such that new items can't conflict on ids. Similarly, we could give each writer a randomization seed such that randomness is deterministic between multiple writers. These estimates would only work on independent messages rather than messages requested by the dungeon master.
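A sketch of that id-space provisioning under an assumed maximum writer count: writer k mints ids k, k + MAX, k + 2·MAX, ..., so two writers can never collide. The parameters here are assumptions for illustration, not Adama's actual scheme.

```typescript
const MAX_WRITERS = 64; // assumed cap on concurrent writers

class WriterIds {
  private next: number;
  constructor(writerIndex: number) {
    if (writerIndex < 0 || writerIndex >= MAX_WRITERS) throw new Error("bad writer index");
    this.next = writerIndex;
  }
  mint(): number {
    const id = this.next;
    this.next += MAX_WRITERS; // stride by the writer count: disjoint id spaces
    return id;
  }
}

const alice = new WriterIds(0); // mints 0, 64, 128, ...
const bob = new WriterIds(1);   // mints 1, 65, 129, ...
```

A per-writer randomization seed could be layered on the same way, so that dice rolls stay deterministic across multiple writers.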

This area is fascinating to explore, and I feel I am at the beginning of the low latency journey here. Unfortunately, I must prioritize away from low latency. However, the possibilities feel endless!

Summary#

Well, I'm exploring an area at the intersection of many things. The platform I imagine is serverless in nature, powered by a language that enables developers to build an application-specific database.

I may be making a mistake by focusing too much on the language, and perhaps I should rebrand this as a new kind of data store. Regardless, I need to shift my thinking about how I execute and find a balanced strategy between my long-term research interests and short-term results to drive interest. Expect a future post on a strategy shift. And, if you got this far in my rambling, then I thank you.

Micro-monoliths and the future of infrastructure...

I’ve got my silly opinions on this site, but generally my silly thoughts have a way of manifesting over time. In this post, I’m thinking about possible alignments of my thoughts with the broader industry. The key question is which of my ideas are worth going to production with the limited innovation tokens I would have if leveraged in a business.

First, the Adama language itself will take a tremendous amount of time to finish in terms of quality, features, tooling, documentation, idioms, and what-not. I believe strongly in the language, but this would not make a great foundation for an enterprise business. Programming languages tend to gain a religious zeal to them, and it tends to be best to focus on making either a great library or a service with a robust API.

Aside: speaking of religious zeal, I am now a fan of Rust. It’s great, and I am playing with WebAssembly. A personal goal this year is to be somewhat competent at both Rust and WebAssembly because the artifacts produced make me happy. Rust gives me confidence that we can have good software, and WebAssembly is the modern JVM that will be ubiquitous.

Focusing on the language could be a sketchy lifestyle business, and maybe that is ok? But... if the goal is to align with and lead the industry, then I could find success by building an infrastructure business around WebAssembly such that people could bring their own language. This would allow that business to focus on supporting a black box of logic, and then orchestration and state management would be the business.

This language could power a "state-machine/actor as a service" business, and it would be very similar to Durable Objects from Cloudflare. The key difference is that I’d hook up a WebSocket to it and invent yet another robust streaming protocol (sigh).

It would feel a bit like Firebase, but it would be much more expressive as you would build your own database operations within the document. Is there an open-source Firebase?

Beyond WebAssembly's current popularity and the tremendous investments in it, I believe that WebAssembly has huge potential to disrupt how we think about building software. Generally speaking, the hot shit these days is Docker/k8s/micro-services. All the cool kids want their software to be super scalable, and they cargo-cult practices that only make sense with hundreds of engineers. That's fine and par for the course, but as the guy that generally cleans up messes and makes services reliable, I shudder: I have a bipolar relationship with micro-services.

On one hand, they let engineers scope and focus their world view. Micro-service architectures tend to solve very real people problems, but this comes at great cost. That cost manifests in extra machines, but it also requires everyone to understand that networks are not perfectly reliable. Having a bunch of stateless services sounds great until failures mount and your reliability sucks.

The other perspective relates to monoliths, which enable engineers to build (more) reliable software (due to fewer moving pieces), but slow build times and hard release schedules make them undesirable because of people. Distributed systems are hard asynchronous systems that are exceptionally expensive in terms of hardware, but people can move fast. Monoliths have all the nice properties, but scaling them is people-expensive and requires vertical growth.

This is where WebAssembly can enter the picture as you can re-create a monolith with WebAssembly containers which can be changed during normal operation. This is a “micro-monolith” which fits as many conceptual services within a single machine as possible such that you get the people benefits of micro-services with a monolithic runtime. This thinking mirrors mainframes where hardware can be replaced while the machine is running.
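As a sketch of the shape of such a host (an illustrative in-process bus, not a real framework): conceptual services live in one process and can be swapped while requests keep flowing.

```typescript
type Handler = (request: unknown) => Promise<unknown>;

class MicroMonolith {
  private services = new Map<string, Handler>();

  deploy(name: string, handler: Handler) {
    this.services.set(name, handler); // hot-swap without restarting the host
  }

  async call(name: string, request: unknown): Promise<unknown> {
    const handler = this.services.get(name);
    if (!handler) throw new Error(`no such service: ${name}`);
    return handler(request); // no serialization, no network hop
  }
}

const host = new MicroMonolith();
host.deploy("greeter", async (req) => `hello, ${String(req)}`);
// later: redeploy "greeter" with a new version while requests keep flowing,
// like replacing a board in a running mainframe
```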

Developers already contend with asynchronous interfaces for services, so it is a productivity wash with respect to hands-on-keyboard coding when compared to micro-service frameworks. The upside comes from reduced operational cost and better responsiveness due to locality, as compute can chase data and caches; this has the potential to nullify the advantages of a traditional monolith.

The potential for removing the need to serialize requests, queue a write, let the network do its magic, read from the network queue, and deserialize the request is exceptionally interesting. This reduces CPU work, decreases latency, increases reliability, and reduces heap and GC pressure. It's full of win, especially when you consider that a diverse fleet of stateless services feels exceptionally pointless and wasteful of network resources.

Paradoxically, it would seem these ideas are not new. We can check out the actor model or Erlang/Elixir/BEAM VM for spiritual alignment. It's always good when ideas are not new as it represents harmonization, and I feel my appreciation and education deepening within this field. I've come to believe that this mode of programming is superior (as many of the Erlang zealots would promote), but it has been held back by languages. WebAssembly feels like the way to escape that dogma, and the key is to produce the platform.

Having written about DFINITY with a technical lens, I'm realizing it may be a bigger deal in the broader sense. I now realize that decentralized compute will fundamentally transform the landscape of computing and manifest infrastructure as a public utility without corporate governance. What will computing look like if both compute and storage are public utilities? What happens when racks can be installed on-premise and become assets that serve the broader community?

This is an exciting time to be alive, and the question then is what do I do? Stay tuned.

Wrapping my head around DFINITY's Internet Computer

I hope to ship a single simple game this year, so I will need some infrastructure to run the game. Alas, I'm at a crossroads.

One enjoyable path is that I could build the remaining infrastructure myself. While this would require starting a bunch of fun projects and then trying to collapse them into a single offering, it potentially creates an operational burden that I'm not willing to live with. Wisdom requires me to put aside my excitement about combining gossip failure detection with Raft and handling replication myself.

Another path is to simply not worry about scale nor care about durability. The simple game to launch only lasts for five minutes, so my core failure scenario of concern is deployment. The only thing that I should do here is to use a database like MySQL on the host instead of my ad-hoc file log. Ideally, the stateful-yet-stateless process should be able to quickly recover the state on deployment. Scale is easy enough to achieve if you are willing to have a static mapping of games to hosts, and this currently is a reasonable assumption for a single game instance. This is not ideal for the future serverless cloud offering empire, but the key is to get a tactical product off the ground and evolve. As durability demands and scale demands go up, the key is to balance the operational cost with engineering cost.

So, it's clear that I should start the model as simple as possible. Starting simple is never perfect and takes too long, but it's how to navigate uncertain waters. This is where DFINITY's Internet Computer (IC) enters the picture as it is almost the perfect infrastructure. Today, I want to give a brief overview of what the Internet Computer (IC) is, and I want to compare and contrast it with Adama. For context, I'm basing my opinions on their website as I'm a buyer looking at brochures; some of this might be inaccurate.

Canister / Document#

At the core, the Internet Computer is a series of stateful compute containers called canisters. A canister is a stateful application running WebAssembly, so it is a foundational building block. The state within the canister is made durable via replication of a log of messages (i.e., the blockchain), and the state is the deterministic result of ingesting those messages. Canisters can be assembled together into a graph to build anything at a massive scale. However, the techniques to do so are not commonly understood since the canister relies on actor-centric programming.
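In essence (a sketch with invented message and state shapes, not the IC's actual API), the state is a deterministic fold over the replicated log:

```typescript
interface Msg { from: string; kind: "join" | "move"; payload?: string; }
interface State { players: string[]; moves: string[]; }

const initial: State = { players: [], moves: [] };

// Deterministic transition: same message, same prior state, same result.
function step(state: State, msg: Msg): State {
  switch (msg.kind) {
    case "join": return { ...state, players: [...state.players, msg.from] };
    case "move": return { ...state, moves: [...state.moves, msg.payload ?? ""] };
  }
}

// Replaying the log always yields the same state, so durability reduces to
// making the log itself durable (replication, or the blockchain in IC's case).
function replay(log: Msg[]): State {
  return log.reduce(step, initial);
}
```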

This is very similar to an Adama document where each document is a stateful document fed via a message queue from users, but Adama doesn't address the graph element. What does it mean for two documents to talk to each other? There are three reasons that Adama doesn't do this (yet):

  • Allowing any kind of asynchronous behavior requires error handling, as bumps in the night will increase dramatically. The moment you send a message, that message may succeed, fail outright, time out, or have side effects; a design goal of Adama is to eliminate as much failure handling as possible (no disk, no network).
  • I don't believe direct message exchange is compatible with reactivity. Instead of two documents sending messages as commands to each other, I'd rather have one document export a reactive view to another. This collapses the error handling to whether the data has or hasn't arrived. Adama's design for document linking is on the back burner since I currently lack a use case within board games.
  • While humans are somewhat limited in their ability to generate messages, machines are not. It is entirely appropriate to have a queue from a device driven by a human and then leverage flow control to limit the human. Many operational issues can be traced to an unbounded queue somewhere, and handling failures is non-trivial to reason about. This further informs the need to have a reactive shared view between documents, since reactivity enables flow control.

From my perspective, the IC's canister concept is a foundational building block. Time and experience will build a lingo around the types of canisters. For instance, Adama fits within a "data-only canister" which only accepts messages and yields data. I'll talk more about possible types of canisters later on in the document.

Access Control / Privacy#

The beauty of the IC canister is that it can be a self-enclosed security container as well via the use of principals and strong crypto. There is no need for an external access control list. Each canister has the responsibility of authorizing what principals can see and do, and this is made possible since each canister is a combination of state and logic. The moment you separate state and logic, a metagame is required to protect that state with common access control logic or access rules.

This is what Adama does via the combination of @connected and privacy controls (without strong crypto). The Adama constructor allows the document to initialize state around a single principal (the creator) and anything within the constructor argument. The @connected event enables principals to be authorized to interact with the document or be rejected. Each message sent requires the handler to validate whether the sender can send that message.

Since the IC canister is service-oriented, there is nothing that different about updating state within the canister. The big difference happens on the read side, where Adama has all state readable to clients up to privacy rules, and privacy rules are attached to data. This means that Adama clients get only the data the document opens up to them, and they don't need to ask for it explicitly.

Since Adama proactively forwards updates to connected clients, this is the most significant technical hurdle to cross in using the IC. However, this is solvable in a variety of ways. It would be great to connect a socket up to a canister, and I don't see this as an impossible feat.

Cost#

The canister also has an elegant cost model around "cycles", which Adama somewhat shares, except Adama can bill for partial compute. The Adama language "solves" the halting problem by preventing infinite cost! A finite budget is the key aspect that lets a language guarantee a halt: when you run out of budget, the program stops, and you are broke. The critical technical decision that ensures randomly stopping a program isn't a horrific idea is that state must be transactional, so you can go back in time. Given the canister's all-or-nothing billing model, it seems that it also has transactional state.
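A sketch of what partial-compute billing could look like: a finite cycle budget, a charge on every step, and a transactional revert when the budget runs out. The metering hooks here are assumptions for illustration, not Adama's or the IC's actual mechanism.

```typescript
class OutOfBudget extends Error {}

class Meter {
  constructor(private cycles: number) {}
  charge(cost: number) {
    this.cycles -= cost;
    if (this.cycles < 0) throw new OutOfBudget();
  }
  remaining() { return this.cycles; }
}

function run(doc: { counter: number }, meter: Meter) {
  const snapshot = { ...doc };
  try {
    for (let i = 0; i < 1_000_000; i++) {
      meter.charge(1);   // a compiler could insert charges on loops and calls
      doc.counter += 1;
    }
  } catch (e) {
    if (e instanceof OutOfBudget) {
      Object.assign(doc, snapshot); // transactional state: revert and halt
      return;
    }
    throw e;
  }
}
```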

The IC canister talks abstractly about cycles, but I'm not sure how cycles relate to storage and memory costs. Perhaps cycles are associated with paging memory in and out of disk? This is where I'm not clear about how the IC canister communicates its costs to cost-conscious people. Furthermore, it's not clear how cost-competitive it is with centralized infrastructure.

Usage and Scale#

The IC canister is far more generic than Adama, but this is where we have to think about the roles of canisters as they evolve with people building products. With both Adama and IC's canister deeply inspired by actors, this is a shared problem about relating the ideas to people. There will be off-the-shelf canisters that can be configured since scale is part of every business's journey.

On the read side, the good news is that both Adama and canisters that only store data and respond to queries can scale to infinity and beyond. Since replication is a part of both stories, one primary replica cluster can handle the writes while read replicas tail a stream of updates. Thus the "data primary actor" and "data replica" roles are born.

On the write side, data at scale must be sharded across a variety of "data primary actors" in a variety of ways, so even more roles are born.

First, you need a "stateless router actor" which will route requests to the appropriate "data primary actor" either with stickiness or randomness (a random router could also be called a "stateless load balancer actor").

Second, with writes splayed across different machines, you will at some point require an "aggregator actor" which will aggregate/walk the shards building something like a report, index, or what-not.

Third, replication gives you the ability to reactively inform customers of changes; an "ephemeral data replica actor" or "fan out actor" would be a decent way for clients to listen to data changes to know when to poll or to receive a delta. This is where Adama would put the privacy logic and reactivity layer. Given the infinite read scale of replicas, this also offers infinite reactive scale.

Fourth, perhaps sharding doesn't take the heat off a "data primary actor"; then a "fan in actor" (reducer) role would be able to buffer writes and aggregate them into the main "data primary actor" (see the sketch after this list). The ability to fan in would enable view counters to accurately bump up with many viewers.

Fifth, beyond data, there is the problem of static assets and user-generated content. A "web server actor" makes sense for front-ends, which the platform already has. I imagine IPFS would be the place for user-generated content, so long as the IC canister and IPFS got along.
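As promised above, here is a sketch of the fan-in role with an invented interface: buffer cheap local increments and periodically flush one aggregated write to the primary, so a hot counter doesn't melt it.

```typescript
class FanInCounter {
  private pending = 0;
  constructor(private flushToPrimary: (delta: number) => Promise<void>,
              private intervalMs = 250) {
    setInterval(() => this.flush(), this.intervalMs);
  }

  bump(by = 1) { this.pending += by; }  // cheap local write, no network

  private async flush() {
    if (this.pending === 0) return;
    const delta = this.pending;
    this.pending = 0;
    try {
      await this.flushToPrimary(delta); // one aggregated write to the primary
    } catch {
      this.pending += delta;            // keep the delta; retry on a later tick
    }
  }
}
```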

There are more actors for sure, but the key thing to note is that building a new internet requires rethinking how to transform the old world ideas into the new world. Some ideas will die, but some may be foundational.

Operations#

I haven't started to play with the DFINITY SDK yet, so I'm not sure about a few things. Adama was built around easy deployment, monitoring, and everything I know about operating services. This is where the IC canister feels murky to me. For instance, how do I deploy and validate a new actor? How does the state up-rev? Can the state down-rev during a rollback? This is why Adama has all of its state represented by a giant JSON document: it is easy to understand how data can change both forward and backward.

Deploying with Adama should be a cakewalk, and I want Adama to replicate state changes from a running Adama document into a new shadow document so I can validate the lack of crashing, behavior changes, and data changes, and manually audit things without impacting users.

I'm interested to see how DFINITY addresses the engineering process side of their offering.

Concluding Thoughts & Interesting Next Steps#

Overall, I'm excited about the technology. I'm wary of the cost and the lack of communal know-how, but these are addressable over time. I also feel like there is a synergy between how Adama and the canister think about state. For instance, I chose JSON purely as a way to move faster. However, I hope to design a binary format with similar properties. Perhaps DFINITY will release a compatible memory model that is vastly more efficient?

Ultimately, the future in this space will require first adopters willing to slog through the issues and build the vocabulary about how to create products with this offering.

A clear next step for me in 2022/2023 is to figure out a WebAssembly strategy that would enable me to offload my infrastructure to this new cloud. It makes sense that I keep my infrastructure investments low and focus on killer products to make tactical progress towards a product that could fund deeper engineering. This translates into a more casual approach to durability and scale. For durability, I'll just use block storage and hope that my blocks do not go poof. As protection against catastrophe, I'll integrate object storage into the mix and move cold state off block storage into Amazon S3 or a compatible storage tier. For availability, I'll avoid treating my servers like cattle and use a proper hand-off. For now, I just have to accept that machine failures will result in an availability loss.