Join-Accumulate Machine: A Mostly-Coherent Trustless Supercomputer
Draft 0.8.0
- June 3, 2026
Abstract.
We present a comprehensive and formal definition of Jam, a protocol combining elements of both Polkadot and Ethereum. In a single coherent model, Jam provides a global singleton permissionless object environment—much like the smart-contract environment pioneered by Ethereum—paired with secure sideband computation parallelized over a scalable node network, a proposition pioneered by Polkadot.
Jam introduces a decentralized hybrid system offering smart-contract functionality structured around a secure and scalable in-core/on-chain dualism. While the smart-contract functionality implies some similarities with Ethereum’s paradigm, the overall model of the service offered is driven largely by underlying architecture of Polkadot.
Jam is permissionless in nature, allowing anyone to deploy code as a service on it for a fee commensurate with the resources this code utilizes and to induce execution of this code through the procurement and allocation of core-time, a metric of resilient and ubiquitous computation, somewhat similar to the purchasing of gas in Ethereum. We already envision a Polkadot-compatible CoreChains service.
1. Introduction
1.1. Nomenclature
In this paper, we introduce a decentralized, crypto-economic protocol to which the Polkadot Network will transition itself in a major revision on the basis of approval by its governance apparatus.
An early, unrefined, version of this protocol was first proposed in Polkadot Fellowship rfc31, known as CoreJam. CoreJam takes its name after the collect/refine/join/accumulate model of computation at the heart of its service proposition. While the CoreJam rfc suggested an incomplete, scope-limited alteration to the Polkadot protocol, Jam refers to a complete and coherent overall blockchain protocol.
1.2. Driving Factors
Within the realm of blockchain and the wider Web3, we are driven by the need first and foremost to deliver resilience. A proper Web3 digital system should honor a declared service profile—and ideally meet even perceived expectations—regardless of the desires, wealth or power of any economic actors including individuals, organizations and, indeed, other Web3 systems. Inevitably this is aspirational, and we must be pragmatic over how perfectly this may really be delivered. Nonetheless, a Web3 system should aim to provide such radically strong guarantees that, for practical purposes, the system may be described as unstoppable.
While Bitcoin is, perhaps, the first example of such a system within the economic domain, it was not general purpose in terms of the nature of the service it offered. A rules-based service is only as useful as the generality of the rules which may be conceived and placed within it. Bitcoin’s rules allowed for an initial use-case, namely a fixed-issuance token, ownership of which is well-approximated and autonomously enforced through knowledge of a secret, as well as some further elaborations on this theme.
Later, Ethereum would provide a categorically more general-purpose rule set, one which was practically Turing complete.11 1 The gas mechanism did restrict what programs can execute on it by placing an upper bound on the number of steps which may be executed, but some restriction to avoid infinite-computation must surely be introduced in a permissionless setting. In the context of Web3 where we are aiming to deliver a massively multiuser application platform, generality is crucial, and thus we take this as a given.
Beyond resilience and generality, things get more interesting, and we must look a little deeper to understand what our driving factors are. For the present purposes, we identify three additional goals:
-
(1)
Resilience: highly resistant from being stopped, corrupted and censored.
-
(2)
Generality: able to perform Turing-complete computation.
-
(3)
Performance: able to perform computation quickly and at low cost.
-
(4)
Coherency: the causal relationship possible between different elements of state and thus how well individual applications may be composed.
-
(5)
Accessibility: negligible barriers to innovation; easy, fast, cheap and permissionless.
As a declared Web3 technology, we make an implicit assumption of the first two items. Interestingly, items 3 and 4 are antagonistic according to an information theoretic principle which we are sure must already exist in some form but are nonetheless unaware of a name for it. For argument’s sake we shall name it size-coherency antagonism.
1.3. Scaling under Size-Coherency Antagonism
Size-coherency antagonism is a simple principle implying that as the state-space of information systems grow, then the system necessarily becomes less coherent. It is a direct implication of principle that causality is limited by speed. The maximum speed allowed by physics is the speed of light in a vacuum, however other information systems may have lower bounds: In biological system this is largely determined by various chemical processes whereas in electronic systems is it determined by the speed of electrons in various substances. Distributed software systems will tend to have much lower bounds still, being dependent on a substrate of software, hardware and packet-switched networks of varying reliability.
The argument goes:
-
(1)
The more state a system utilizes for its data-processing, the greater the amount of space this state must occupy.
-
(2)
The more space used, then the greater the mean and variance of distances between state-components.
-
(3)
As the mean and variance increase, then time for causal resolution (i.e. all correct implications of an event to be felt) becomes divergent across the system, causing incoherence.
Setting the question of overall security aside for a moment, we can manage incoherence by fragmenting the system into causally-independent subsystems, each of which is small enough to be coherent. In a resource-rich environment, a bacterium may split into two rather than growing to double its size. This pattern is rather a crude means of dealing with incoherency under growth: intra-system processing has low size and total coherence, inter-system processing supports higher overall sizes but without coherence. It is the principle behind meta-networks such as Polkadot, Cosmos and the predominant vision of a scaled Ethereum (all to be discussed in depth shortly). Such systems typically rely on asynchronous and simplistic communication with “settlement areas” which provide a small-scoped coherent state-space to manage specific interactions such as a token transfer.
The present work explores a middle-ground in the antagonism, avoiding the persistent fragmentation of state-space of the system as with existing approaches. We do this by introducing a new model of computation which pipelines a highly scalable, mostly coherent element to a synchronous, fully coherent element. Asynchrony is not avoided, but we bound it to the length of the pipeline and substitute the crude partitioning we see in scalable systems so far with a form of “cache affinity” as it typically seen in multi-cpu systems with a shared ram.
Unlike with snark-based L2-blockchain techniques for scaling, this model draws upon crypto-economic mechanisms and inherits their low-cost and high-performance profiles and averts a bias toward centralization.
1.4. Document Structure
We begin with a brief overview of present scaling approaches in blockchain technology in section 2. In section 3 we define and clarify the notation from which we will draw for our formalisms.
We follow with a broad overview of the protocol in section 4 outlining the major areas including the Polkadot Virtual Machine (pvm), the consensus protocols Safrole and Grandpa, the common clock and build the foundations of the formalism.
We then continue with the full protocol definition split into two parts: firstly the correct on-chain state-transition formula helpful for all nodes wishing to validate the chain state, and secondly, in sections 14 and 19 the honest strategy for the off-chain actions of any actors who wield a validator key.
The main body ends with a discussion over the performance characteristics of the protocol in section 20 and finally conclude in section 21.
The appendix contains various additional material important for the protocol definition including the pvm in appendices A & B, serialization and Merklization in appendices C & D and cryptography in appendices E, G & H. We finish with an index of terms which includes the values of all simple constant terms used in the work in appendix I, and close with the bibliography.
2. Previous Work and Present Trends
In the years since the initial publication of the Ethereum YP, the field of blockchain development has grown immensely. Other than scalability, development has been done around underlying consensus algorithms, smart-contract languages and machines and overall state environments. While interesting, these latter subjects are mostly out scope of the present work since they generally do not impact underlying scalability.
2.1. Polkadot
In order to deliver its service, Jam co-opts much of the same game-theoretic and cryptographic machinery as Polkadot known as Elves and described by [4]. However, major differences exist in the actual service offered with Jam, providing an abstraction much closer to the actual computation model generated by the validator nodes its economy incentivizes.
It was a major point of the original Polkadot proposal, a scalable heterogeneous multichain, to deliver high-performance through partition and distribution of the workload over multiple host machines. In doing so it took an explicit position that composability would be lowered. Polkadot’s constituent components, parachains are, practically speaking, highly isolated in their nature. Though a message passing system (xcmp) exists it is asynchronous, coarse-grained and practically limited by its reliance on a high-level slowly evolving interaction language xcm.
As such, the composability offered by Polkadot between its constituent chains is lower than that of Ethereum-like smart-contract systems offering a single and universal object environment and allowing for the kind of agile and innovative integration which underpins their success. Polkadot, as it stands, is a collection of independent ecosystems with only limited opportunity for collaboration, very similar in ergonomics to bridged blockchains though with a categorically different security profile. A technical proposal known as spree would utilize Polkadot’s unique shared-security and improve composability, though blockchains would still remain isolated.
Implementing and launching a blockchain is hard, time-consuming and costly. By its original design, Polkadot limits the clients able to utilize its service to those who are both able to do this and raise a sufficient deposit to win an auction for a long-term slot, one of around 50 at the present time. While not permissioned per se, accessibility is categorically and substantially lower than for smart-contract systems similar to Ethereum.
Enabling as many innovators to participate and interact, both with each other and each other’s user-base, appears to be an important component of success for a Web3 application platform. Accessibility is therefore crucial.
2.2. Ethereum
The Ethereum protocol was formally defined in this paper’s spiritual predecessor, the Yellow Paper, by [36]. This was derived in large part from the initial concept paper by [8]. In the decade since the YP was published, the de facto Ethereum protocol and public network instance have gone through a number of evolutions, primarily structured around introducing flexibility via the transaction format and the instruction set and “precompiles” (niche, sophisticated bonus instructions) of its scripting core, the Ethereum virtual machine (evm).
Almost one million crypto-economic actors take part in the validation for Ethereum.22 2 Practical matters do limit the level of real decentralization. Validator software expressly provides functionality to allow a single instance to be configured with multiple key sets, systematically facilitating a much lower level of actual decentralization than the apparent number of actors, both in terms of individual operators and hardware. Using data collated by [10] on Ethereum 2, one can see one major node operator, Lido, has steadily accounted for almost one-third of the almost one million crypto-economic participants. Block extension is done through a randomized leader-rotation method where the physical address of the leader is public in advance of their block production.33 3 Ethereum’s developers hope to change this to something more secure, but no timeline is fixed. Ethereum uses Casper-ffg introduced by [7] to determine finality, which with the large validator base finalizes the chain extension around every 13 minutes.
Ethereum’s direct computational performance remains broadly similar to that with which it launched in 2015, with a notable exception that an additional service now allows 1mb of commitment data to be hosted per block (all nodes to store it for a limited period). The data cannot be directly utilized by the main state-transition function, but special functions provide proof that the data (or some subsection thereof) is available. According to [12], the present design direction is to improve on this over the coming years by splitting responsibility for its storage amongst the validator base in a protocol known as Dank-sharding.
According to [11], the scaling strategy of Ethereum would be to couple this data availability with a private market of roll-ups, sideband computation facilities of various design, with zk-snark-based roll-ups being a stated preference. Each vendor’s roll-up design, execution and operation comes with its own implications.
One might reasonably assume that a diversified market-based approach for scaling via multivendor roll-ups will allow well-designed solutions to thrive. However, there are potential issues facing the strategy. A research report by [29] on the level of decentralization in the various roll-ups found a broad pattern of centralization, but notes that work is underway to attempt to mitigate this. It remains to be seen how decentralized they can yet be made.
Heterogeneous communication properties (such as datagram latency and semantic range), security properties (such as the costs for reversion, corruption, stalling and censorship) and economic properties (the cost of accepting and processing some incoming message or transaction) may differ, potentially quite dramatically, between major areas of some grand patchwork of roll-ups by various competing vendors. While the overall Ethereum network may eventually provide some or even most of the underlying machinery needed to do the sideband computation it is far from clear that there would be a “grand consolidation” of the various properties should such a thing happen. We have not found any good discussion of the negative ramifications of such a fragmented approach.44 4 Some initial thoughts on the matter resulted in a proposal by [28] to utilize Polkadot technology as a means of helping create a modicum of compatibility between roll-up ecosystems!
2.2.1. Snark Roll-ups
While the protocol’s foundation makes no great presuppositions on the nature of roll-ups, Ethereum’s strategy for sideband computation does centre around snark-based rollups and as such the protocol is being evolved into a design that makes sense for this. Snarks are the product of an area of exotic cryptography which allow proofs to be constructed to demonstrate to a neutral observer that the purported result of performing some predefined computation is correct. The complexity of the verification of these proofs tends to be sub-linear in their size of computation to be proven and will not give away any of the internals of said computation, nor any dependent witness data on which it may rely.
Zk-snarks come with constraints. There is a trade-off between the proof’s size, verification complexity and the computational complexity of generating it. Non-trivial computation, and especially the sort of general-purpose computation laden with binary manipulation which makes smart-contracts so appealing, is hard to fit into the model of snarks.
To give a practical example, risc-zero (as assessed by [2]) is a leading project and provides a platform for producing snarks of computation done by a risc-v virtual machine, an open-source and succinct risc machine architecture well-supported by tooling. A recent benchmarking report by [26] showed that compared to risc-zero’s own benchmark, proof generation alone takes over 61,000 times as long as simply recompiling and executing even when executing on 32 times as many cores, using 20,000 times as much ram and an additional state-of-the-art gpu. According to hardware rental agents https://cloud-gpus.com/, the cost multiplier of proving using risc-zero is 66,000,000x of the cost55 5 In all likelihood actually substantially more as this was using low-tier “spare” hardware in consumer units, and our recompiler was unoptimized. to execute using the Polkavm recompiler.
Many cryptographic primitives become too expensive to be practical to use and specialized algorithms and structures must be substituted. Often times they are otherwise suboptimal. In expectation of the use of snarks (such as plonk as proposed by [14]), the prevailing design of the Ethereum project’s Dank-sharding availability system uses a form of erasure coding centered around polynomial commitments over a large prime field in order to allow snarks to get acceptably performant access to subsections of data. Compared to alternatives, such as a binary field and Merklization in the present work, it leads to a load on the validator nodes orders of magnitude higher in terms of cpu usage.
In addition to their basic cost, snarks present no great escape from decentralization and the need for redundancy, leading to further cost multiples. While the need for some benefits of staked decentralization is averted through their verifiable nature, the need to incentivize multiple parties to do much the same work is a requirement to ensure that a single party not form a monopoly (or several not form a cartel). Proving an incorrect state-transition should be impossible, however service integrity may be compromised in other ways; a temporary suspension of proof-generation, even if only for minutes, could amount to major economic ramifications for real-time financial applications.
Real-world examples exist of the pit of centralization giving rise to monopolies. One would be the aforementioned snark-based exchange framework; while notionally serving decentralized exchanges, it is in fact centralized with Starkware itself wielding a monopoly over enacting trades through the generation and submission of proofs, leading to a single point of failure—should Starkware’s service become compromised, then the liveness of the system would suffer.
It has yet to be demonstrated that snark-based strategies for eliminating the trust from computation will ever be able to compete on a cost-basis with a multi-party crypto-economic platform. All as-yet proposed snark-based solutions are heavily reliant on crypto-economic systems to frame them and work around their issues. Data availability and sequencing are two areas well understood as requiring a crypto-economic solution.
We would note that snark technology is improving and the cryptographers and engineers behind them do expect improvements in the coming years. In a recent article by [34] we see some credible speculation that with some recent advancements in cryptographic techniques, slowdowns for proof generation could be as little as 50,000x from regular native execution and much of this could be parallelized. This is substantially better than the present situation, but still several orders of magnitude greater than would be required to compete on a cost-basis with established crypto-economic techniques such as Elves.
2.3. Fragmented Meta-Networks
Directions for general-purpose computation scalability taken by other projects broadly centre around one of two approaches; either what might be termed a fragmentation approach or alternatively a centralization approach. We argue that neither approach offers a compelling solution.
The fragmentation approach is heralded by projects such as Cosmos (proposed by [22]) and Avalanche (by [33]). It involves a system fragmented by networks of a homogenous consensus mechanic, yet staffed by separately motivated sets of validators. This is in contrast to Polkadot’s single validator set and Ethereum’s declared strategy of heterogeneous roll-ups secured partially by the same validator set operating under a coherent incentive framework. The homogeneity of said fragmentation approach allows for reasonably consistent messaging mechanics, helping to present a fairly unified interface to the multitude of connected networks.
However, the apparent consistency is superficial. The networks are trustless only by assuming correct operation of their validators, who operate under a crypto-economic security framework ultimately conjured and enforced by economic incentives and punishments. To do twice as much work with the same levels of security and no special coordination between validator sets, then such systems essentially prescribe forming a new network with the same overall levels of incentivization.
Several problems arise. Firstly, there is a similar downside as with Polkadot’s isolated parachains and Ethereum’s isolated roll-up chains: a lack of coherency due to a persistently sharded state preventing synchronous composability.
More problematically, the scaling-by-fragmentation approach, proposed specifically by Cosmos, provides no homogenous security—and therefore trustlessness—guarantees. Validator sets between networks must be assumed to be independently selected and incentivized with no relationship, causal or probabilistic, between the Byzantine actions of a party on one network and potential for appropriate repercussions on another. Essentially, this means that should validators conspire to corrupt or revert the state of one network, the effects may be felt across other networks of the ecosystem.
That this is an issue is broadly accepted, and projects propose for it to be addressed in one of two ways. Firstly, to fix the expected cost-of-attack (and thus level of security) across networks by drawing from the same validator set. The massively redundant way of doing this, as proposed by [9] under the name replicated security, would be to require each validator to validate on all networks and for the same incentives and punishments. This is economically inefficient in the cost of security provision as each network would need to independently provide the same level of incentives and punishment-requirements as the most secure with which it wanted to interoperate. This is to ensure the economic proposition remain unchanged for validators and the security proposition remained equivalent for all networks. At the present time, replicated security is not a readily available permissionless service. We might speculate that these punishing economics have something to do with it.
The more efficient approach, proposed by the OmniLedger team, [21], would be to make the validators non-redundant, partitioning them between different networks and periodically, securely and randomly repartitioning them. A reduction in the cost to attack over having them all validate on a single network is implied since there is a chance of having a single network accidentally have a compromising number of malicious validators even with less than this proportion overall. This aside it presents an effective means of scaling under a basis of weak-coherency.
Alternatively, as in Elves by [4], we may utilize non-redundant partitioning, combine this with a proposal-and-auditing game which validators play to weed out and punish invalid computations, and then require that the finality of one network be contingent on all causally-entangled networks. This is the most secure and economically efficient solution of the three, since there is a mechanism for being highly confident that invalid transitions will be recognized and corrected before their effect is finalized across the ecosystem of networks. However, it requires substantially more sophisticated logic and their causal-entanglement implies some upper limit on the number of networks which may be added.
2.4. High-Performance Fully Synchronous Networks
Another trend in the recent years of blockchain development has been to make “tactical” optimizations over data throughput by limiting the validator set size or diversity, focusing on software optimizations, requiring a higher degree of coherency between validators, onerous requirements on the hardware which validators must have, or limiting data availability.
The Solana blockchain is underpinned by technology introduced by [37] and boasts theoretical figures of over 700,000 transactions per second, though according to [25] the network is only seen processing a small fraction of this. The underlying throughput is still substantially more than most blockchain networks and is owed to various engineering optimizations in favor of maximizing synchronous performance. The result is a highly-coherent smart-contract environment with an api not unlike that of YP Ethereum (albeit using a different underlying vm), but with a near-instant time to inclusion and finality which is taken to be immediate upon inclusion.
Two issues arise with such an approach: firstly, defining the protocol as the outcome of a heavily optimized codebase creates structural centralization and can undermine resilience. [19] writes “since January 2022, 11 significant outages gave rise to 15 days in which major or partial outages were experienced”. This is an outlier within the major blockchains as the vast majority of major chains have no downtime. There are various causes to this downtime, but they are generally due to bugs found in various subsystems.
Ethereum, at least until recently, provided the most contrasting alternative with its well-reviewed specification, clear research over its crypto-economic foundations and multiple clean-room implementations. It is perhaps no surprise that the network very notably continued largely unabated when a flaw in its most deployed implementation was found and maliciously exploited, as described by [16].
The second issue is concerning ultimate scalability of the protocol when it provides no means of distributing workload beyond the hardware of a single machine.
In major usage, both historical transaction data and state would grow impractically. Solana illustrates how much of a problem this can be. Unlike classical blockchains, the Solana protocol offers no solution for the archival and subsequent review of historical data, crucial if the present state is to be proven correct from first principle by a third party. There is little information on how Solana manages this in the literature, but according to [30], nodes simply place the data onto a centralized database hosted by Google.66 6 Earlier node versions utilized Arweave network, a decentralized data store, but this was found to be unreliable for the data throughput which Solana required.
Solana validators are encouraged to install large amounts of ram to help hold its large state in memory (512 gb is the current recommendation according to [31]). Without a divide-and-conquer approach, Solana shows that the level of hardware which validators can reasonably be expected to provide dictates the upper limit on the performance of a totally synchronous, coherent execution model. Hardware requirements represent barriers to entry for the validator set and cannot grow without sacrificing decentralization and, ultimately, transparency.
3. Notational Conventions
Much as in the Ethereum Yellow Paper, a number of notational conventions are used throughout the present work. We define them here for clarity. The Ethereum Yellow Paper itself may be referred to henceforth as the YP.
3.1. Typography
We use a number of different typefaces to denote different kinds of terms. Where a term is used to refer to a value only relevant within some localized section of the document, we use a lower-case roman letter e.g. , (typically used for an item of a set or sequence) or e.g. , (typically used for numerical indices). Where we refer to a Boolean term or a function in a local context, we tend to use a capitalized roman alphabet letter such as , . If particular emphasis is needed on the fact a term is sophisticated or multidimensional, then we may use a bold typeface, especially in the case of sequences and sets.
For items which retain their definition throughout the present work, we use other typographic conventions. Sets are usually referred to with a blackboard typeface, e.g. refers to all natural numbers including zero. Sets which may be parameterized may be subscripted or be followed by parenthesized arguments. Imported functions, used by the present work but not specifically introduced by it, are written in calligraphic typeface, e.g. the Blake2 cryptographic hashing function. For other non-context dependent functions introduced in the present work, we use upper case Greek letters, e.g. denotes the state transition function.
Values which are not fixed but nonetheless hold some consistent meaning throughout the present work are denoted with lower case Greek letters such as , the state identifier. These may be placed in bold typeface to denote that they refer to an abnormally complex value.
3.2. Functions and Operators
We define the precedes relation to indicate that one term is defined in terms of another. E.g. indicates that may be defined purely in terms of :
| (3.1) |
The substitute-if-nothing function is equivalent to the first argument which is not , or if no such argument exists:
| (3.2) |
Thus, e.g. and .
3.3. Sets
Given some set , its power set and cardinality are denoted as and . When forming a power set, we may use a numeric subscript in order to restrict the resultant expansion to a particular cardinality. E.g. .
Sets may be operated on with scalars, in which case the result is a set with the operation applied to each element, e.g. . Functions may also be applied to all members of a set to yield a new set, but for clarity we denote this with a superscript, e.g. .
We denote set-disjointness with the relation . Formally:
We commonly use to indicate that some term is validly left without a specific value. Its cardinality is defined as zero. We define the operation such that indicating the same set but with the addition of the element.
The term is utilized to indicate the unexpected failure of an operation or that a value is invalid or unexpected. (We try to avoid the use of the more conventional here to avoid confusion with Boolean false, which may be interpreted as some successful result in some contexts.)
3.4. Numbers
denotes the set of naturals including zero whereas implies a restriction on that set to values less than . Formally, and .
denotes the set of integers. We denote to be the set of integers within the interval . Formally, . E.g. . We denote the offset/length form of this set as , a short form of .
It can sometimes be useful to represent lengths of sequences and yet limit their size, especially when dealing with sequences of octets which must be stored practically. Typically, these lengths can be defined as the set . To improve clarity, we denote as the set of lengths of octet sequences and is equivalent to .
We denote the % operator as the modulo operator, e.g. . Furthermore, we may occasionally express a division result as a quotient and remainder with the separator R , e.g. .
3.5. Dictionaries
A dictionary is a possibly partial mapping from some domain into some co-domain in much the same manner as a regular function. Unlike functions however, with dictionaries the total set of pairings are necessarily enumerable, and we represent them in some data structure as the set of all pairs. (In such data-defined mappings, it is common to name the values within the domain a key and the values within the co-domain a value, hence the naming.)
Thus, we define the formalism to denote a dictionary which maps from the domain to the range . It is a subset of the power set of pairs :
| (3.3) |
The subset is caused by a constraint that a dictionary’s members must associate at most one unique value for any given key :
| (3.4) |
In the context of a dictionary we denote the pairs with a mapping notation:
| (3.5) | |||
| (3.6) |
This assertion allows us to unambiguously define the subscript and subtraction operator for a dictionary :
| (3.7) | |||
| (3.8) |
Note that when using a subscript, it is an implicit assertion that the key exists in the dictionary. Should the key not exist, the result is undefined and any block which relies on it must be considered invalid.
To denote the active domain (i.e. set of keys) of a dictionary , we use and for the range (i.e. set of values), . Formally:
| (3.9) | ||||
| (3.10) |
Note that since the co-domain of is a set, should different keys with equal values appear in the dictionary, the set will only contain one such value.
Dictionaries may be combined through the union operator , which priorities the right-side operand in the case of a key-collision:
| (3.11) |
3.6. Tuples
Tuples are groups of values where each item may belong to a different set. They are denoted with parentheses, e.g. the tuple of the naturals and is denoted , and it exists in the set of natural pairs sometimes denoted , but denoted in the present work as .
We have frequent need to refer to a specific item within a tuple value and as such find it convenient to declare a name for each item. E.g. we may denote a tuple with two named natural components and as . We would denote an item through subscripting its name, thus for some , and .
3.7. Sequences
A sequence is a series of elements with particular ordering not dependent on their values. The set of sequences of elements all of which are drawn from some set is denoted , and it defines a partial mapping . The set of sequences containing exactly elements each a member of the set may be denoted and accordingly defines a complete mapping . Similarly, sets of sequences of at most elements and at least elements may be denoted and respectively. Finally, the set of sequences with length in set may be denoted .
Sequences are subscriptable, thus a specific item at index within a sequence may be denoted , or where unambiguous, . A range may be denoted using an ellipsis for example: and . The length of such a sequence may be denoted .
We denote modulo subscription as . We denote the final element of a sequence through the function .
3.7.1. Construction
We may wish to define a sequence in terms of incremental subscripts of other values: denotes a sequence of values beginning continuing up to . Furthermore, we may also wish to define a sequence as elements each of which are a function of their index ; in this case we denote . Thus, when the ordering of elements matters we use rather than the unordered notation . The latter may also be written in short form . This applies to any set which has an unambiguous ordering, particularly sequences, thus . Multiple sequences may be combined, thus .
As with sets, we use explicit notation to denote a function mapping over all items of a sequence.
Sequences may be constructed from sets or other sequences whose order should be ignored through sequence ordering notation , which is defined to result in the set or sequence of its argument except that all elements are placed in ascending order of the corresponding value .
The key component may be elided in which case it is assumed to be ordered by the elements directly; i.e. . does the same, but excludes any duplicate values of . E.g. assuming , then and .
Sets may be constructed from sequences with the regular set construction syntax, e.g. assuming , then would be equivalent to .
Sequences of values which themselves have a defined ordering have an implied ordering akin to a regular dictionary, thus and .
3.7.2. Editing
We define the sequence concatenation operator such that . For sequences of sequences, we define a unary concatenate-all operator: . Further, we denote element concatenation as . We denote the sequence made up of the first elements of sequence to be , and only the final elements as .
We define as the transposition of the sequence-of-sequences , fully defined in equation H.4. We may also apply this to sequences-of-tuples to yield a tuple of sequences.
We denote sequence subtraction with a slight modification of the set subtraction operator; specifically, some sequence excepting the left-most element equal to would be denoted .
3.7.3. Boolean values
denotes the set of Boolean strings of length , thus . When dealing with Boolean values we may assume an implicit equivalence mapping to a bit whereby and , thus . We use the function to denote the sequence of bits, ordered with the most significant first, which represent the octet sequence , thus .
The unary-not operator applies to both boolean values and sequences of boolean values, thus and .
3.7.4. Octets and Blobs
denotes the set of octet strings (“blobs”) of arbitrary length. As might be expected, denotes the set of such sequences of length . denotes the subset of which are ascii-encoded strings. Note that while an octet has an implicit and obvious bijective relationship with natural numbers less than 256, and we may implicitly coerce between octet form and natural number form, we do not treat them as exactly equivalent entities. In particular for the purpose of serialization, an octet is always serialized to itself, whereas a natural number may be serialized as a sequence of potentially several octets, depending on its magnitude and the encoding variant.
3.7.5. Shuffling
We define the sequence-shuffle function , originally introduced by [13], with an efficient in-place algorithm described by [35]. This accepts a sequence and some entropy and returns a sequence of the same length with the same elements but in an order determined by the entropy. The entropy may be provided as either an indefinite sequence of naturals or a hash. For a full definition see appendix F.
3.8. Cryptography
3.8.1. Hashing
denotes the set of 256-bit values equivalent to . All hash functions in the present work output to this type and is the value equal to . We assume a function denoting the Blake2b 256-bit hash introduced by [27] and a function denoting the Keccak 256-bit hash as proposed by [1] and utilized by [36].
The inputs of a hash function should be expected to be passed through our serialization codec to yield an octet sequence to which the cryptography may be applied. (Note that an octet sequence conveniently yields an identity transform.) We may wish to interpret a sequence of octets as some other kind of value with the assumed decoder function . In both cases, we may subscript the transformation function with the number of octets we expect the octet sequence term to have. Thus, would assert and , whereas would assert and .
3.8.2. Signing Schemes
is the set of valid Ed25519 signatures, defined by [20], made through knowledge of a secret key whose public key counterpart is and whose message is . To aid readability, we denote the set of valid public keys .
We denote the set of valid Bandersnatch public keys as , defined in appendix G. is the set of valid singly-contextualized vrf signatures utilizing the secret counterpart to the public key , some context and message .
, meanwhile, is the set of valid Bandersnatch Ringvrf deterministic singly-contextualized proofs of knowledge of a secret within some sequence of public keys identified by a root in the set of valid roots . We denote to be the root specific to the sequence of public key counterparts . A root implies a specific sequence of Bandersnatch key pairs, knowledge of one of the secrets would imply being capable of making a unique, valid—and anonymous—proof of knowledge of a unique secret within the sequence.
Both the Bandersnatch signature and Ringvrf proof strictly imply that a member utilized their secret key in combination with both the context and the message ; the difference is that the member is identified in the former and is anonymous in the latter. Furthermore, both define a vrf output, a high entropy hash influenced by but not by , formally denoted and .
We use to denote the set of public keys for the bls signature scheme, described by [3], on curve bls12-381 defined by [17]. We correspondingly use the notation to denote the set of valid bls signatures for public key and message .
We define the signature functions for creating valid signatures; , . We assert that the ability to compute a result for this function relies on knowledge of a secret key.
4. Overview
As in the Yellow Paper, we begin our formalisms by recalling that a blockchain may be defined as a pairing of some initial state together with a block-level state-transition function. The latter defines the posterior state given a pairing of some prior state and a block of data applied to it. Formally, we say:
| (4.1) |
Where is the prior state, is the posterior state, is some valid block and is our block-level state-transition function.
Broadly speaking, Jam (and indeed blockchains in general) may be defined simply by specifying and some genesis state .77 7 Practically speaking, blockchains sometimes make assumptions of some fraction of participants whose behavior is simply honest, and not provably incorrect nor otherwise economically disincentivized. While the assumption may be reasonable, it must nevertheless be stated apart from the rules of state-transition. We also make several additional assumptions of agreed knowledge: a universally known clock, and the practical means of sharing data with other systems operating under the same consensus rules. The latter two were both assumptions silently made in the YP.
4.1. The Block
To aid comprehension and definition of our protocol, we partition as many of our terms as possible into their functional components. We begin with the block which may be restated as the header and some input data external to the system and thus said to be extrinsic, :
| (4.2) | ||||
| (4.3) |
The header is a collection of metadata primarily concerned with cryptographic references to the blockchain ancestors and the operands and result of the present transition. As an immutable known a priori, it is assumed to be available throughout the functional components of block transition. The extrinsic data is split into its several portions:
- tickets:
-
Tickets, used for the mechanism which manages the selection of validators for the permissioning of block authoring. This component is denoted .
- preimages:
-
Static data which is presently being requested to be available for workloads to be able to fetch on demand. This is denoted .
- reports:
-
Reports of newly completed workloads whose accuracy is guaranteed by specific validators. This is denoted .
- availability:
-
Assurances by each validator concerning which of the input data of workloads they have correctly received and are storing locally. This is denoted .
- disputes:
-
Information relating to disputes between validators over the validity of reports. This is denoted .
4.2. The State
Our state may be logically partitioned into several largely independent segments which can both help avoid visual clutter within our protocol description and provide formality over elements of computation which may be simultaneously calculated (i.e. parallelized). We therefore pronounce an equivalence between (some complete state) and a tuple of partitioned segments of that state:
| (4.4) |
In summary, is the portion of state dealing with services, analogous in Jam to the Yellow Paper’s (smart contract) accounts, the only state of the YP’s Ethereum. The identities of services which hold some privileged status are tracked in .
Validators, who are the set of economic actors uniquely privileged to help build and maintain the Jam chain, are identified within , archived in and enqueued from . All other state concerning the determination of these keys is held within . Note this is a departure from the YP proof-of-work definitions which were mostly stateless, and this set was not enumerated but rather limited to those with sufficient compute power to find a partial hash-collision in the sha2-256 cryptographic hash function. An on-chain entropy pool is retained in .
Our state also tracks two aspects of each core: , the authorization requirement which work done on that core must satisfy at the time of being reported on-chain, together with the queue which fills this, ; and , each of the cores’ currently assigned work-report guarantees, the availability of whose work-package must yet be assured by a super-majority of validators.
Finally, details of the most recent blocks and timeslot index are tracked in and respectively, work-reports which are ready to be accumulated and work-packages which were recently accumulated are tracked in and respectively and, judgments are tracked in and validator statistics are tracked in .
4.2.1. State Transition Dependency Graph
Much as in the YP, we specify as the implication of formulating all items of posterior state in terms of the prior state and block. To aid the architecting of implementations which parallelize this computation, we minimize the depth of the dependency graph where possible. The overall dependency graph is specified here:
| (4.5) | ||||
| (4.6) | ||||
| (4.7) | ||||
| (4.8) | ||||
| (4.9) | ||||
| (4.10) | ||||
| (4.11) | ||||
| (4.12) | ||||
| (4.13) | ||||
| (4.14) | ||||
| (4.15) | ||||
| (4.16) | ||||
| (4.17) | ||||
| (4.18) | ||||
| (4.19) | ||||
| (4.20) |
The only synchronous entanglements are visible through the intermediate components superscripted with a dagger and defined in equations 4.6, 4.12, 4.13, 4.14, 4.16, 4.17 and 4.18. The latter two mark a merge and join in the dependency graph and, concretely, imply that the availability extrinsic may be fully processed and accumulation of work happen before the preimage lookup extrinsic is folded into state.
4.3. Which History?
A blockchain is a sequence of blocks, each cryptographically referencing some prior block by including a hash of its header, all the way back to some first block which references the genesis header. We already presume consensus over this genesis header and the state it represents already defined as .
By defining a deterministic function for deriving a single posterior state for any (valid) combination of prior state and block, we are able to define a unique canonical state for any given block. We generally call the block with the most ancestors the head and its state the head state.
It is generally possible for two blocks to be valid and yet reference the same prior block in what is known as a fork. This implies the possibility of two different heads, each with their own state. While we know of no way to strictly preclude this possibility, for the system to be useful we must nonetheless attempt to minimize it. We therefore strive to ensure that:
-
(1)
It be generally unlikely for two heads to form.
-
(2)
When two heads do form they be quickly resolved into a single head.
-
(3)
It be possible to identify a block not much older than the head which we can be extremely confident will form part of the blockchain’s history in perpetuity. When a block becomes identified as such we call it finalized and this property naturally extends to all of its ancestor blocks.
These goals are achieved through a combination of two consensus mechanisms: Safrole, which governs the (not-necessarily forkless) extension of the blockchain; and Grandpa, which governs the finalization of some extension into canonical history. Thus, the former delivers point 1, the latter delivers point 3 and both are important for delivering point 2. We describe these portions of the protocol in detail in sections 6 and 19 respectively.
While Safrole limits forks to a large extent (through cryptography, economics and common-time, below), there may be times when we wish to intentionally fork since we have come to know that a particular chain extension must be reverted. In regular operation this should never happen, however we cannot discount the possibility of malicious or malfunctioning nodes. We therefore define such an extension as any which contains a block in which data is reported which any other block’s state has tagged as invalid (see section 10 on how this is done). We further require that Grandpa not finalize any extension which contains such a block. See section 19 for more information here.
4.4. Time
We presume a pre-existing consensus over time specifically for block production and import. While this was not an assumption of Polkadot, pragmatic and resilient solutions exist including the ntp protocol and network. We utilize this assumption in only one way: we require that blocks be considered temporarily invalid if their timeslot is in the future. This is specified in detail in section 6.
Formally, we define the time in terms of seconds passed since the beginning of the Jam Common Era, 1200 utc on January 1, 2025.88 8 1,735,732,800 seconds after the Unix Epoch. Midday utc is selected to ensure that all major timezones are on the same date at any exact 24-hour multiple from the beginning of the common era. Formally, this value is denoted .
4.5. Best block
Given the recognition of a number of valid blocks, it is necessary to determine which should be treated as the “best” block, by which we mean the most recent block we believe will ultimately be within of all future Jam chains. The simplest and least risky means of doing this would be to inspect the Grandpa finality mechanism which is able to provide a block for which there is a very high degree of confidence it will remain an ancestor to any future chain head.
However, in reducing the risk of the resulting block ultimately not being within the canonical chain, Grandpa will typically return a block some small period older than the most recently authored block. (Existing deployments suggest around 1-2 blocks in the past under regular operation.) There are often circumstances when we may wish to have less latency at the risk of the returned block not ultimately forming a part of the future canonical chain. E.g. we may be in a position of being able to author a block, and we need to decide what its parent should be. Alternatively, we may care to speculate about the most recent state for the purpose of providing information to a downstream application reliant on the state of Jam.
In these cases, we define the best block as the head of the best chain, itself defined in section 19.
4.6. Economics
The present work describes a crypto-economic system, i.e. one combining elements of both cryptography and economics and game theory to deliver a self-sovereign digital service. In order to codify and manipulate economic incentives we define a token which is native to the system, which we will simply call tokens in the present work.
A value of tokens is generally referred to as a balance, and such a value is said to be a member of the set of balances, , which is exactly equivalent to the set of naturals less than (i.e. 64-bit unsigned integers in coding parlance). Formally:
| (4.21) |
Though unimportant for the present work, we presume that there be a standard named denomination for tokens. This is different to both Ethereum (which uses a denomination of ), Polkadot (which uses a denomination of ) and Polkadot’s experimental cousin Kusama (which uses ).
The fact that balances are constrained to being less than implies that there may never be more than around tokens (each divisible into portions of ) within Jam. We would expect that the total number of tokens ever issued will be a substantially smaller amount than this.
We further presume that a number of constant prices stated in terms of tokens are known. However we leave the specific values to be determined in following work:
- :
-
the additional minimum balance implied for a single item within a mapping.
- :
-
the additional minimum balance implied for a single octet of data within a mapping.
- :
-
the minimum balance implied for a service.
4.7. The Virtual Machine and Gas
In the present work, we presume the definition of a Polkadot Virtual Machine (pvm). This virtual machine is based around the risc-v instruction set architecture, specifically the rv64em variant, and is the basis for introducing permissionless logic into our state-transition function.
The pvm is comparable to the evm defined in the Yellow Paper, but somewhat simpler: the complex instructions for cryptographic operations are missing as are those which deal with environmental interactions. Overall it is far less opinionated since it alters a pre-existing general purpose design, risc-v, and optimizes it for our needs. This gives us excellent pre-existing tooling, since pvm remains essentially compatible with risc-v, including support from the compiler toolkit llvm and languages such as Rust and C++. Furthermore, the instruction set simplicity which risc-v and pvm share, together with the register size (64-bit), active number (13) and endianness (little) make it especially well-suited for creating efficient recompilers on to common hardware architectures.
The pvm is fully defined in appendix A, but for contextualization we will briefly summarize the basic invocation function which computes the resultant state of a pvm instance initialized with some registers () and ram (), within the limits of some amount of gas (), a number of approximately time-proportional computational steps:
| (4.22) |
We refer to the time-proportional computational steps as gas (much like in the YP) and limit it to a 64-bit quantity. Within the context of the pvm, is typically used to denote gas, but we may also use internally within the definition of the pvm where it may be convenient.
| (4.23) |
It is left as a rather important implementation detail to ensure that the amount of time taken while computing the function has a maximum computation time approximately proportional to the value of regardless of other operands.
The pvm is a very simple risc register machine and as such has 13 registers, each of which is a 64-bit quantity, denoted as , a natural less than .99 9 This is three fewer than risc-v’s 16, however the amount that program code output by compilers uses is 13 since two are reserved for operating system use and the third is fixed as zero Within the context of the pvm, is typically used to denote the registers.
| (4.24) | ||||
| (4.25) |
The pvm assumes a simple pageable ram of 32-bit addressable octets situated in pages of octets where each page may be either immutable, mutable or inaccessible. The ram definition includes two components: a value and access . If the component is unspecified while being subscripted then the value component may be assumed. Within the context of the virtual machine, is typically used to denote ram.
| (4.26) | ||||
| (4.27) |
We define two sets of indices for the ram : is the set of indices which may be read from; and is the set of indices which may be written to.
Invocation of the pvm has an exit-reason as the first item in the resultant tuple. It is either:
-
•
Regular program termination caused by an explicit halt instruction, .
-
•
Irregular program termination caused by some exceptional circumstance, .
-
•
Exhaustion of gas, .
-
•
A page fault (attempt to access some address in ram which is not accessible), F . This includes the address of the page at fault.
-
•
An attempt at progressing a host-call, . This allows for the progression and integration of a context-dependent state-machine beyond the regular pvm.
The full definition follows in appendix A.
4.8. Epochs and Slots
Unlike the YP Ethereum with its proof-of-work consensus system, Jam defines a proof-of-authority consensus mechanism, with the authorized validators presumed to be identified by a set of public keys and decided by a staking mechanism residing within some system hosted by Jam. The staking system is out of scope for the present work; instead there is an api which may be utilized to update these keys, and we presume that whatever logic is needed for the staking system will be introduced and utilize this api as needed.
The Safrole mechanism subdivides time following genesis into fixed length epochs with each epoch divided into timeslots each of uniform length seconds, given an epoch period of seconds or one hour.
This six-second slot period represents the minimum time between Jam blocks, and through Safrole we aim to strictly minimize forks arising both due to contention within a slot (where two valid blocks may be produced within the same six-second period) and due to contention over multiple slots (where two valid blocks are produced in different time slots but with the same parent).
Formally when identifying a timeslot index, we use a natural less than (in compute parlance, a 32-bit unsigned integer) indicating the number of six-second timeslots from the Jam Common Era. For use in this context we introduce the set :
| (4.28) |
This implies that the lifespan of the proposed protocol takes us to mid-August of the year 2840, which with the current course that humanity is on should be ample.
4.9. The Core Model and Services
Whereas in the Ethereum Yellow Paper when defining the state machine which is held in consensus amongst all network participants, we presume that all machines maintaining the full network state and contributing to its enlargement—or, at least, hoping to—evaluate all computation. This “everybody does everything” approach might be called the on-chain consensus model. It is unfortunately not scalable, since the network can only process as much logic in consensus that it could hope any individual node is capable of doing itself within any given period of time.
4.9.1. In-core Consensus
In the present work, we achieve scalability of the work done through introducing a second model for such computation which we call the in-core consensus model. In this model, and under normal circumstances, only a subset of the network is responsible for actually executing any given computation and assuring the availability of any input data it relies upon to others. By doing this and assuming a certain amount of computational parallelism within the validator nodes of the network, we are able to scale the amount of computation done in consensus commensurate with the size of the network, and not with the computational power of any single machine. In the present work we expect the network to be able to do upwards of 300 times the amount of computation in-core as that which could be performed by a single machine running the virtual machine at full speed.
Since in-core consensus is not evaluated or verified by all nodes on the network, we must find other ways to become adequately confident that the results of the computation are correct, and any data used in determining this is available for a practical period of time. We do this through a crypto-economic game of three stages called guaranteeing, assuring, auditing and, potentially, judging. Respectively, these attach a substantial economic cost to the invalidity of some proposed computation; then a sufficient degree of confidence that the inputs of the computation will be available for some period of time; and finally, a sufficient degree of confidence that the validity of the computation (and thus enforcement of the first guarantee) will be checked by some party who we can expect to be honest.
All execution done in-core must be reproducible by any node synchronized to the portion of the chain which has been finalized. Execution done in-core is therefore designed to be as stateless as possible. The requirements for doing it include only the refinement code of the service, the code of the authorizer and any preimage lookups it carried out during its execution.
When a work-report is presented on-chain, a specific block known as the lookup-anchor is identified. Correct behavior requires that this must be in the finalized chain and reasonably recent, both properties which may be proven and thus are acceptable for use within a consensus protocol.
We describe this pipeline in detail in the relevant sections later.
4.9.2. On Services and Accounts
In YP Ethereum, we have two kinds of accounts: contract accounts (whose actions are defined deterministically based on the account’s associated code and state) and simple accounts which act as gateways for data to arrive into the world state and are controlled by knowledge of some secret key. In Jam, all accounts are service accounts. Like Ethereum’s contract accounts, they have an associated balance, some code and state. Since they are not controlled by a secret key, they do not need a nonce.
The question then arises: how can external data be fed into the world state of Jam? And, by extension, how does overall payment happen if not by deducting the account balances of those who sign transactions? The answer to the first lies in the fact that our service definition actually includes multiple code entry-points, one concerning refinement and the other concerning accumulation. The former acts as a sort of high-performance stateless processor, able to accept arbitrary input data and distill it into some much smaller amount of output data, which together with some metadata is known as a digest. The latter code is more stateful, providing access to certain on-chain functionality including the possibility of transferring balance and invoking the execution of code in other services. Being stateful this might be said to more closely correspond to the code of an Ethereum contract account.
To understand how Jam breaks up its service code is to understand Jam’s fundamental proposition of generality and scalability. All data extrinsic to Jam is fed into the refinement code of some service. This code is not executed on-chain but rather is said to be executed in-core. Thus, whereas the accumulator code is subject to the same scalability constraints as Ethereum’s contract accounts, refinement code is executed off-chain and subject to no such constraints, enabling Jam services to scale dramatically both in the size of their inputs and in the complexity of their computation.
While refinement and accumulation take place in consensus environments of a different nature, both are executed by the members of the same validator set. The Jam protocol through its rewards and penalties ensures that code executed in-core has a comparable level of crypto-economic security to that executed on-chain, leaving the primary difference between them one of scalability versus synchroneity.
As for managing payment, Jam introduces a new abstraction mechanism based around Polkadot’s Agile Coretime. Within the Ethereum transactive model, the mechanism of account authorization is somewhat combined with the mechanism of purchasing blockspace, both relying on a cryptographic signature to identify a single “transactor” account. In Jam, these are separated and there is no such concept of a “transactor”.
In place of Ethereum’s gas model for purchasing and measuring blockspace, Jam has the concept of coretime, which is prepurchased and assigned to an authorization agent. Coretime is analogous to gas insofar as it is the underlying resource which is being consumed when utilizing Jam. Its procurement is out of scope in the present work and is expected to be managed by a system parachain operating within a parachains service itself blessed with a number of cores for running such system services. The authorization agent allows external actors to provide input to a service without necessarily needing to identify themselves as with Ethereum’s transaction signatures. They are discussed in detail in section 8.
5. The Header
We must first define the header in terms of its components. The header comprises a parent hash and prior state root ( and ), an extrinsic hash , a time-slot index , the epoch, winning-tickets and offenders markers , and , a block author index and two Bandersnatch signatures; the entropy-yielding vrf signature and a block seal . Headers may be serialized to an octet sequence with and without the latter seal component using and respectively. Formally:
| (5.1) |
The blockchain is a sequence of blocks, each cryptographically referencing some prior block by including a hash derived from the parent’s header, all the way back to some first block which references the genesis header. We already presume consensus over this genesis header and the state it represents defined as .
Excepting the Genesis header, all block headers have an associated parent header, whose hash is . We denote the parent header :
| (5.2) |
is thus defined as being the mapping from one block header to its parent block header. With , we are able to define the set of ancestor headers :
| (5.3) |
We only require implementations to store headers of ancestors which were authored in the previous hours of any block they wish to validate.
The extrinsic hash is a Merkle commitment to the block’s extrinsic data, taking care to allow for the possibility of reports and preimages to individually have their inclusion proven. Given any block , then formally:
| (5.4) | ||||
| (5.5) | ||||
| (5.6) | ||||
| (5.7) |
A block may only be regarded as valid once the time-slot index is in the past. It is always strictly greater than that of its parent. Formally:
| (5.8) |
Blocks considered invalid by this rule may become valid as advances.
The parent state root is the root of a Merkle trie composed by the mapping of the prior state’s Merkle root, which by definition is also the parent block’s posterior state. This is a departure from both Polkadot and the Yellow Paper’s Ethereum, in both of which a block’s header contains the posterior state’s Merkle root. We do this to facilitate the pipelining of block computation and in particular of Merklization.
| (5.9) |
We assume the state-Merklization function is capable of transforming our state into a 32-octet commitment. See appendix D for a full definition of these two functions.
All blocks have an associated public key to identify the author of the block. We identify this as an index into the posterior current validator set . We denote the Bandersnatch key of the author as though note that this is merely an equivalence, and is not serialized as part of the header.
| (5.10) |
5.1. The Markers
If not , then the epoch marker specifies key and entropy relevant to the following epoch in case the ticket contest does not complete adequately (a very much unexpected eventuality). Similarly, the winning-tickets marker, if not , provides the series of 600 slot sealing “tickets” for the next epoch (see the next section). Finally, the offenders marker is the sequence of Ed25519 keys of newly misbehaving validators, to be fully explained in section 10. Formally:
| (5.11) |
6. Block Production and Chain Growth
As mentioned earlier, Jam is architected around a hybrid consensus mechanism, similar in nature to that of Polkadot’s Babe/Grandpa hybrid. Jam’s block production mechanism, termed Safrole after the novel Sassafras production mechanism of which it is a simplified variant, is a stateful system rather more complex than the Nakamoto consensus described in the YP.
The chief purpose of a block production consensus mechanism is to limit the rate at which new blocks may be authored and, ideally, preclude the possibility of “forks”: multiple blocks with equal numbers of ancestors.
To achieve this, Safrole limits the possible author of any block within any given six-second timeslot to a single key-holder from within a prespecified sequence of validators. Furthermore, under normal operation, the identity of the key-holder of any future timeslot will have a very high degree of anonymity. As a side effect of its operation, we can generate a high-quality pool of entropy which may be used by other parts of the protocol and is accessible to services running on it.
Because of its tightly scoped role, the core of Safrole’s state, , is independent of the rest of the protocol. It interacts with other portions of the protocol through and , the prospective and active sequences of validator keys respectively; , the most recent block’s timeslot; and , the entropy accumulator.
The Safrole protocol determines, once per epoch, a slot-sealer sequence of entries, one for each potential block within the epoch. Under normal operation, each entry is a ticket: an anonymous claim to a timeslot. Each block header includes its timeslot index (the number of six-second periods since the Jam Common Era began) and a valid seal signature , signed by the validator holding the secret key corresponding to the entry at the relevant index of . Each ticket is in fact a pseudonym for some validator which was agreed the privilege of authoring a block in the corresponding timeslot.
In order to generate while keeping the correspondence between tickets and validators anonymous, we use a novel Ringvrf cryptographic scheme built on the Bandersnatch curve. This scheme allows validators to provide a proof which simultaneously: (1) guarantees the author controlled a key within the validator key sequence, and (2) produces an unbiasable deterministic hash output, giving us a secure verifiable random function (vrf). This anonymous vrf output is the ticket identifier. Validators submit their tickets throughout the epoch, accumulating them in . The tickets with the best scores are then selected to populate the next epoch’s slot-sealer sequence , determining which validator may author a block in each corresponding timeslot.
6.1. Timekeeping
Here, defines the most recent block’s slot index, which we transition to the slot index as defined in the block’s header:
| (6.1) |
We track the slot index in state as in order that we are able to easily both identify a new epoch and determine the slot at which the prior block was authored. We denote as the prior’s epoch index and as the prior’s slot phase index within that epoch and and are the corresponding values for the present block:
| (6.2) |
6.2. Safrole Basic State
We restate into a number of components:
| (6.3) |
is the epoch’s root, a Bandersnatch ring root composed with the one Bandersnatch key of each of the next epoch’s validators, defined in (itself defined in the next section).
| (6.4) |
Finally, is the ticket accumulator, a sequence of highest-scoring ticket identifiers to be used for the next epoch. is the current epoch’s slot-sealer sequence, which is either a full complement of tickets or, in the case of a fallback mode, a sequence of Bandersnatch keys:
| (6.5) |
Here, is used to denote the set of tickets, a combination of a verifiably random ticket identifier and the ticket’s entry-index :
| (6.6) |
As we state in section 6.4, Safrole requires that every block header contain a valid seal , which is a Bandersnatch signature produced with the private key corresponding to the entry at index of the current epoch’s slot-sealer sequence .
6.3. Key Rotation
In addition to the active sequence of validator keys and staging sequence , internal to the Safrole state we retain a pending sequence . The active validator keys identifies the nodes which are currently privileged to author blocks and carry out the validation processes, whereas the pending keys , which is reset to at the beginning of each epoch, is the sequence of keys which will be active in the next epoch and which determine the Bandersnatch ring root which authorizes tickets into the slot-sealer contest for the next epoch. The length of each sequence is always a multiple of 3 between 6 and .
| (6.7) | |||
| (6.8) |
We must introduce , the set of validator key tuples. This is a combination of a set of cryptographic public keys and metadata which is an opaque octet sequence, but utilized to specify practical identifiers for the validator, not least a hardware address.
The set of validator keys itself is equivalent to the set of 336-octet sequences. However, for clarity, we divide the sequence into four easily denoted components. For any validator key , the Bandersnatch key is denoted , and is equivalent to the first 32-octets; the Ed25519 key, , is the second 32 octets; the bls key denoted is equivalent to the following 144 octets, and finally the metadata is the last 128 octets. Formally:
| (6.9) | ||||
| (6.10) | ||||
| (6.11) | ||||
| (6.12) | ||||
| (6.13) |
With a new epoch under regular conditions, validator keys get rotated and the epoch’s Bandersnatch ring root is updated into :
| (6.14) | ||||
| (6.15) |
Note that on epoch changes the posterior queued validator key sequence is defined such that incoming keys belonging to the offenders are replaced with a null key containing only zeroes. The origin of the offenders is explained in section 10.
6.4. Sealing and Entropy Accumulation
The header must contain a valid seal and valid entropy source. These are two vrf signatures both using the private key corresponding to the slot-sealer sequence entry for the current timeslot; the message data of the former is the header’s serialization omitting the seal component , whereas the latter is used as a bias-resistant entropy source and thus its message must already have been fixed: we use the entropy stemming from the vrf of the seal signature. Formally:
| (6.16) | ||||
| (6.17) | ||||
| (6.18) | ||||
| (6.19) | ||||
| (6.20) | ||||
| (6.21) | ||||
Sealing using the ticket is of greater security, and we utilize this knowledge when determining a candidate block on which to extend the chain, detailed in section 19. We thus note that the block was sealed under the regular security with the boolean marker . We define this only for the purpose of ease of later specification.
In addition to the entropy accumulator , we retain three additional historical values of the accumulator at the point of each of the three most recently ended epochs, , and . The second-oldest of these is utilized to help ensure future entropy is unbiased (see equation 6.30) and seed the fallback slot-sealer generation function with randomness (see equation 6.25). The oldest is used to regenerate this randomness when verifying the seal above (see equations 6.17 and 6.16).
| (6.22) |
defines the state of the randomness accumulator to which the provably random output of the vrf, the signature over some unbiasable input, is combined each block. , and meanwhile retain the state of this accumulator at the end of the three most recently ended epochs in order.
| (6.23) |
On an epoch transition (identified as the condition ), we therefore rotate the accumulator value into the history , and :
| (6.24) |
6.5. The Slot-Sealer Sequence
The posterior slot-sealer sequence is one of three expressions depending on the circumstance of the block. If the block is not the first in an epoch, then it remains unchanged from the prior . If the block signals the next epoch (by epoch index) and the previous block’s slot was within the closing period of the previous epoch, then it takes the value of the prior ticket accumulator . Otherwise, it takes the value of the fallback key sequence. Formally:
| (6.25) |
Here, we use as the outside-in sequencer function, defined as follows:
| (6.26) |
Finally, is the fallback key sequence function which selects an epoch’s worth of validator Bandersnatch keys () from the validator key sequence using the entropy collected on-chain :
| (6.27) |
6.6. The Markers
The epoch and winning-tickets markers are information placed in the header in order to minimize data transfer necessary to determine the validator keys associated with any given epoch. They are particularly useful to nodes which do not synchronize the entire state for any given block since they facilitate the secure tracking of changes to the validator key sequences using only the chain of headers.
As mentioned earlier, the header’s epoch marker is either empty or, if the block is the first in a new epoch, then a tuple of the next and current epoch randomness, along with a sequence of tuples containing both Bandersnatch keys and Ed25519 keys for each validator defining the validator keys beginning in the next epoch. Formally:
| (6.28) |
The winning-tickets marker is either empty or, if the block is the first after the end of the submission period for tickets and if the ticket accumulator is saturated, then the final sequence of ticket identifiers. Formally:
| (6.29) |
6.7. The Extrinsic and Tickets
The extrinsic is a sequence of proofs of valid tickets; a ticket implies an entry in our epochal “contest” to determine which validators are privileged to author a block for each timeslot in the following epoch. Tickets specify an entry index (a natural number less than the maximum number of ticket entries per validator, defined in equation 6.30) together with a proof of ticket’s validity. The proof implies a ticket identifier, a high-entropy unbiasable 32-octet sequence, which is used both as a score in the aforementioned contest and as input for the block’s entropy source vrf signature.
Towards the end of the epoch (i.e. slots from the start) this contest is closed implying successive blocks within the same epoch must have an empty tickets extrinsic. At this point, the following epoch’s slot-sealer sequence becomes fixed.
We define the extrinsic as a sequence of proofs of valid tickets, each of which is a tuple of an entry index and a proof of ticket validity. To ensure the accumulator can be saturated, when there are fewer validators, each validator is permitted more tickets. Formally:
| (6.30) | ||||
| (6.31) | ||||
We define as the sequence of new tickets, with the ticket identifier, a hash, defined as the output component of the Bandersnatch Ringvrf proof:
| (6.32) |
The tickets submitted via the extrinsic must already have been placed in order of their implied identifier. Duplicate identifiers are never allowed lest a validator submit the same ticket multiple times:
| (6.33) | ||||
| (6.34) |
The new ticket accumulator is constructed by merging new tickets into the previous accumulator value (or the empty sequence if it is a new epoch):
| (6.35) |
The maximum size of the ticket accumulator is . On each block, the accumulator becomes the lowest items of the sorted union of tickets from prior accumulator and the submitted tickets. It is invalid to include useless tickets in the extrinsic, so all submitted tickets must exist in their posterior ticket accumulator. Formally:
| (6.36) |
Note that it can be shown that in the case of an empty extrinsic , as implied by , and unchanged epoch (), then .
7. Recent History
We retain in state information on the most recent blocks. This is used to preclude the possibility of duplicate or out of date work-reports from being submitted.
| (7.1) | ||||
| (7.2) | ||||
| (7.3) | ||||
| (7.4) |
For each recent block, we retain its header hash, its state root, its accumulation-result mmb and the corresponding work-package hashes of each item reported (which is no more than the total number of cores, ).
During the accumulation stage, a value with the partial transition of this state is provided which contains the correction for the newly-known state-root of the parent block:
| (7.5) |
We define the new Accumulation Output Log . This is formed from the block’s accumulation-output sequence (defined in section 12), taking its root using the basic binary Merklization function (, defined in appendix E) and appending it to the previous log value with the mmb append function (defined in appendix E.2). Throughout, the Keccak hash function is used to maximize compatibility with legacy systems:
| (7.6) | ||||
| (7.7) |
The final state transition for appends a new item including the new block’s header hash, a Merkle commitment to the block’s Accumulation Output Log and the set of work-reports made into it (for which we use the guarantees extrinsic, ). Formally:
| (7.8) | ||||
The new state-trie root is the zero hash, , which is inaccurate but safe since is not utilized except to define the next block’s , which contains a corrected value for this, as per equation 7.5.
8. Authorization
We have previously discussed the model of work-packages and services in section 4.9, however we have yet to make a substantial discussion of exactly how some coretime resource may be apportioned to some work-package and its associated service. In the YP Ethereum model, the underlying resource, gas, is procured at the point of introduction on-chain and the purchaser is always the same agent who authors the data which describes the work to be done (i.e. the transaction). Conversely, in Polkadot the underlying resource, a parachain slot, is procured with a substantial deposit for typically 24 months at a time and the procurer, generally a parachain team, will often have no direct relation to the author of the work to be done (i.e. a parachain block).
On a principle of flexibility, we would wish Jam capable of supporting a range of interaction patterns both Ethereum-style and Polkadot-style. In an effort to do so, we introduce the authorization system, a means of disentangling the intention of usage for some coretime from the specification and submission of a particular workload to be executed on it. We are thus able to disassociate the purchase and assignment of coretime from the specific determination of work to be done with it, and so are able to support both Ethereum-style and Polkadot-style interaction patterns.
8.1. Authorizers and Authorizations
The authorization system involves three key concepts: Authorizers, Tokens and Traces. A Token is simply a piece of opaque data to be included with a work-package to help make an argument that the work-package should be authorized. Similarly, a Trace is a piece of opaque data which helps characterize or describe some successful authorization. An Authorizer meanwhile, is a piece of logic which executes within some pre-specified and well-known computational limits and determines whether a work-package—including its Token—is authorized for execution on some particular core and yields a Trace on success.
Authorizers are identified as the hash of their pvm code hash concatenated with their Configuration blob, the latter being, like Tokens and Traces, opaque data meaningful to the pvm code. The process by which work-packages are determined to be authorized (or not) is not the competence of on-chain logic and happens entirely in-core and as such is discussed in section 14.3. However, on-chain logic must identify each set of authorizers assigned to each core in order to verify that a work-package is legitimately able to utilize that resource. It is this subsystem we will now define.
8.2. Pool and Queue
We define the set of authorizers allowable for a particular core as the authorizer pool . To maintain this value, a further portion of state is tracked for each core: the core’s current authorizer queue , from which we draw values to fill the pool. Formally:
| (8.1) |
Note: The portion of state may be altered only through an exogenous call made from the accumulate logic of an appropriately privileged service.
The state transition of a block involves placing a new authorization into the pool from the queue:
| (8.2) | |||
| (8.3) |
Since is dependent on , practically speaking, this step must be computed after accumulation, the stage in which is defined. Note that we utilize the guarantees extrinsic to remove the oldest authorizer which has been used to justify a guaranteed work-package in the current block. This is further defined in equation 11.25.
9. Service Accounts
As we already noted, a service in Jam is somewhat analogous to a smart contract in Ethereum in that it includes amongst other items, a code component, a storage component and a balance. Unlike Ethereum, the code is split over two isolated entry-points each with their own environmental conditions; one, Refinement, is essentially stateless and happens in-core, and the other, Accumulation, which is stateful and happens on-chain. It is the latter which we will concern ourselves with now.
Service accounts are held in state under , a partial mapping from a service identifier into a tuple of named elements which specify the attributes of the service relevant to the Jam protocol. Formally:
| (9.1) | ||||
| (9.2) |
The service account is defined as the tuple of storage dictionary , preimage lookup dictionaries and , code hash , balance and gratis storage offset , as well as the two code gas limits & . We also record certain usage characteristics concerning the account: the time slot at creation , the time slot at the most recent accumulation and the parent service . Formally:
| (9.3) |
Thus, the balance of the service of index would be denoted and the storage item of key for that service is written .
9.1. Code and Gas
The code and associated metadata of a service account is identified by a hash which, if the service is to be functional, must be present within its preimage lookup (see section 9.2) and have a preimage which is a proper encoding of the two blobs. We thus define the actual code and metadata :
| (9.4) |
There are two entry-points in the code:
- 0 refine:
-
Refinement, executed in-core and stateless.1010 10 Technically there is some small assumption of state, namely that some modestly recent instance of each service’s preimages. The specifics of this are discussed in section 14.3.
- 1 accumulate:
-
Accumulation, executed on-chain and stateful.
As stated in appendix A, execution time in the Jam virtual machine is measured deterministically in units of gas, represented as a natural number less than and formally denoted . We may also use to denote the set if the quantity may be negative. There are two limits specified in the account, which determine the minimum gas required in order to execute the Accumulate entry-point of the service’s code. is the minimum gas required per work-item, while is the minimum gas required per deferred-transfer.
9.2. Preimage Lookups
In addition to storing data in arbitrary key/value pairs available only on-chain, an account may also solicit data to be made available also in-core, and thus available to the Refine logic of the service’s code. State concerning this facility is held under the service’s and components.
There are several differences between preimage-lookups and storage. Firstly, preimage-lookups act as a mapping from a hash to its preimage, whereas general storage maps arbitrary keys to values. Secondly, preimage data is supplied extrinsically, whereas storage data originates as part of the service’s accumulation. Thirdly preimage data, once supplied, may not be removed freely; instead it goes through a process of being marked as unavailable, and only after a period of time may it be removed from state. This ensures that historical information on its existence is retained. The final point especially is important since preimage data is designed to be queried in-core, under the Refine logic of the service’s code, and thus it is important that the historical availability of the preimage is known.
We begin by reformulating the portion of state concerning our data-lookup system. The purpose of this system is to provide a means of storing static data on-chain such that it may later be made available within the execution of any service code as a function accepting only the hash of the data and its length in octets.
During the on-chain execution of the Accumulate function, this is trivial to achieve since there is inherently a state which all validators verifying the block necessarily have complete knowledge of, i.e. . However, for the in-core execution of Refine, there is no such state inherently available to all validators; we thus name a historical state, the lookup anchor which must be considered recently finalized before the work’s implications may be accumulated hence providing this guarantee.
By retaining historical information on its availability, we become confident that any validator with a recently finalized view of the chain is able to determine whether any given preimage was available at any time within the period where auditing may occur. This ensures confidence that judgments will be deterministic even without consensus on chain state.
Restated, we must be able to define some historical lookup function which determines whether the preimage of some hash was available for lookup by some service account at some timeslot, and if so, provide it:
| (9.5) |
This function is defined shortly below in equation LABEL:eq:historicallookup.
The preimage lookup for some service of index is denoted is a dictionary mapping a hash to its corresponding preimage. Additionally, there is metadata associated with the lookup denoted which is a dictionary mapping some hash and presupposed length into historical information.
9.2.1. Invariants
9.2.2. Semantics
The historical status component is a sequence of up to three time slots and the cardinality of this sequence implies one of four modes:
-
•
: The preimage is requested, but has not yet been supplied.
-
•
: The preimage is available and has been from time .
-
•
: The previously available preimage is now unavailable since time . It had been available from time .
-
•
: The preimage is available and has been from time . It had previously been available from time until time .
The historical lookup function may now be defined as:
| (9.7) | ||||
9.3. Account Footprint and Threshold Balance
We define the dependent values and as the storage footprint of the service, specifically the number of items in storage and the total number of octets used in storage. They are defined purely in terms of the storage map of a service, and it must be assumed that whenever a service’s storage is changed, these change also.
Furthermore, as we will see in the account serialization function in section C, these are expected to be found explicitly within the Merklized state data. Because of this we make explicit their set.
We may then define a third dependent term , the minimum, or threshold, balance needed for any given service account in terms of its storage footprint.
| (9.8) |
9.4. Service Privileges
Jam includes the ability to bestow privileges on a number of services. The portion of state in which this is held is denoted and includes five kinds of privilege. The first, , is the index of the manager service which is the service able to effect an alteration of from block to block as well as bestow services with storage deposit credits. The next, , is able to set . Then alone is able to create new service accounts with indices in the protected range. The following, , are the service indices capable of altering the authorizer queue , one for each core.
Finally, is a small dictionary containing the indices of services which automatically accumulate in each block together with a basic amount of gas with which each accumulates. Formally:
| (9.9) | ||||
| (9.10) | ||||
| (9.11) |
10. Disputes, Verdicts and Judgments
Jam provides a means of recording judgments: consequential votes amongst most of the validators over the validity of a work-report (a unit of work done within Jam, see section 11). Such collections of judgments are known as verdicts. Jam also provides a means of registering offenses, judgments and guarantees which dissent with an established verdict. Together these form the disputes system.
The registration of a verdict is not expected to happen very often in practice, however it is an important security backstop for removing and banning invalid work-reports from the processing pipeline as well as removing troublesome keys from the validator set where there is consensus over their malfunction. It also helps coordinate nodes to revert chain-extensions containing invalid work-reports and provides a convenient means of aggregating all offending validators for punishment in a higher-level system.
Judgement statements come about naturally as part of the auditing process and are expected to be positive, further affirming the guarantors’ assertion that the work-report is valid. In the event of a negative judgment, then all validators audit said work-report and we assume a verdict will be reached. Auditing and guaranteeing are off-chain processes properly described in sections 14 and 17.
A judgment against a report implies that the chain is already reverted to some point prior to the accumulation of said report, usually forking at the block immediately prior to that at which accumulation happened. The specific strategy for chain selection is described fully in section 19. Authoring a block with a non-positive verdict has the effect of cancelling its imminent accumulation, as can be seen in equation 10.14.
Registering a verdict also has the effect of placing a permanent record of the event on-chain and allowing any offending keys to be placed on-chain both immediately or in forthcoming blocks, again for permanent record.
Having a persistent on-chain record of misbehavior is helpful in a number of ways. It provides a very simple means of recognizing the circumstances under which action against a validator must be taken by any higher-level validator-selection logic. Should Jam be used for a public network such as Polkadot, this would imply the slashing of the offending validator’s stake on the staking parachain.
As mentioned, recording reports found to have a high confidence of invalidity is important to ensure that said reports are not allowed to be resubmitted. Conversely, recording reports found to be valid ensures that additional disputes cannot be raised in the future of the chain.
10.1. The State
The disputes state includes four items, three of which concern verdicts: a good-set (), a bad-set () and a wonky-set () containing the hashes of all work-reports which were respectively judged to be correct, incorrect or that it appears impossible to judge. The fourth item, the punish-set (), is a set of Ed25519 keys representing validators which were found to have misjudged a work-report.
| (10.1) |
10.2. Extrinsic
The disputes extrinsic is functional grouping of three otherwise independent extrinsics. It comprises verdicts , culprits , and faults . Verdicts are a compilation of judgments coming from either the active validator set or the previous epoch’s validator set, i.e. the Ed25519 keys of or . Culprits and faults are proofs of the misbehavior of one or more validators, respectively either by guaranteeing a work-report found to be invalid, or by signing a judgment found to be in contradiction to a work-report’s validity. Both of these are considered a kind of offense. Formally:
| (10.2) | ||||
The signatures of all judgments must be valid in terms of one of the two allowed validator key-sets, identified by the verdict’s second term which must be either the epoch index of the prior state or one less. Each verdict must contain judgments from exactly two-thirds plus one of the identified validators. Formally:
| (10.3) | |||
| (10.4) | |||
| (10.5) |
Offender signatures must be similarly valid and reference work-reports with judgments and may not report keys which are already in the punish-set:
| (10.6) | ||||
| (10.7) | ||||
Verdicts must be ordered by report hash. Offender signatures and must each be ordered by the validator’s Ed25519 key. There may be no duplicate report hashes within the extrinsic, nor amongst any past reported hashes. Formally:
| (10.8) | |||
| (10.9) | |||
| (10.10) |
The judgments of all verdicts must be ordered by validator index and there may be no duplicates:
| (10.11) |
We define to derive from the sequence of verdicts introduced in the block’s extrinsic. For each verdict, contains only the report hash and whether the report is good (), bad () or wonky (), which is determined from the sum of positive judgments. We require this sum to be either exactly two-thirds-plus-one, zero or one-third of the validator set indicating, respectively, that the report is good, that it’s bad, or that it’s wonky.1111 11 This requirement may seem somewhat arbitrary, but these happen to be the decision thresholds for our three possible actions and are acceptable since the security assumptions include the requirement that at least two-thirds-plus-one validators are live ([4] discusses the security implications in depth). Formally:
| (10.12) | ||||
Any verdict containing solely valid judgments implies the same report having at least one valid entry in the faults sequence . Formally:
| (10.13) |
We clear any availability assignments for work-reports which we judged as uncertain or invalid:
| (10.14) |
The state’s good-set, bad-set and wonky-set assimilate the hashes of the reports from each verdict. Finally, the punish-set accumulates the keys of any validators who have been found guilty of offending. Formally:
| (10.15) | ||||
| (10.16) | ||||
| (10.17) | ||||
| (10.18) | ||||
10.3. Header
The offenders markers must contain exactly the keys of all new offenders, respectively. Formally:
| (10.19) |
11. Reporting and Assurance
Reporting and assurance are the two on-chain processes we do to allow the results of in-core computation to make their way into the state of service accounts, . A work-package, which comprises several work-items, is transformed by validators acting as guarantors into its corresponding work-report, which similarly comprises several work-digests and then presented on-chain within the guarantees extrinsic. At this point, the work-package is erasure coded into a multitude of segments and each segment distributed to the associated validator who then attests to its availability through an assurance placed on-chain. After enough assurances the work-report is considered available, and the work-digests transform the state of their associated service by virtue of accumulation, covered in section 12. The report may also be timed-out, implying it may be replaced by another report without accumulation.
From the perspective of the work-report, therefore, the guarantee happens first and the assurance afterwards. However, from the perspective of a block’s state-transition, the assurances are best processed first since each core may only have one availability assignment (a guaranteed work-report pending availability). Thus, we will first cover the transition arising from processing the availability assurances followed by the work-report guarantees. This synchroneity can be seen formally through the requirement of an intermediate state , utilized later in equation 11.32.
11.1. State
The state of the reporting and availability portion of the protocol is largely contained within , which tracks each core’s availability assignment, if any. An availability assignment is a work-report guarantee which has been reported but is not yet known to be available to a super-majority of validators, together with the time at which it was reported. Formally:
| (11.1) |
As usual, intermediate and posterior values (, , ) are held under the same constraints as the prior value.
A work-report guarantee, of the set , consists of a work-report together with signatures from guarantors attesting to its correctness. is formally defined and guarantees discussed further in section 11.4. For now, we concern ourselves with the work-reports that guarantees attest to.
11.1.1. Work Report
A work-report, of the set , is defined as a tuple of the work-package specification, ; the refinement context, ; the core-index (i.e. on which the work is done), ; as well as the authorizer hash and trace ; a segment-root lookup dictionary ; the gas consumed during the Is-Authorized invocation, ; and finally the work-digests which comprise the results of the evaluation of each of the items in the package together with some associated data. Formally:
| (11.2) |
We limit the sum of the number of items in the segment-root lookup dictionary and the number of prerequisites to :
| (11.3) |
11.1.2. Refinement Context
A refinement context, denoted by the set , describes the context of the chain at the point that the report’s corresponding work-package was evaluated. It identifies two historical blocks, the anchor, header hash , timeslot , posterior state-root and accumulation output log super-peak ; and the lookup-anchor, header hash , timeslot and posterior state-root . Finally, it identifies the hash of any prerequisite work-packages . Formally:
| (11.4) |
11.1.3. Availability
We define the set of availability specifications, , as the tuple of the work-package’s hash , an auditable work bundle length (see section 14.4.1 for more clarity on what this is), together with an erasure-root , the total number of erasure-coded chunks , a segment-root and segment-count . Work-reports include this availability specification in order to ensure the original work-package is able to be correctly reconstructed and the purported ramifications audited. Formally:
| (11.5) |
The erasure-root () is the root of a binary Merkle tree whose leaves are the chunks produced by erasure-coding the work-package bundle and exported segments. As one chunk is distributed to each assurer, the number of chunks must equal the size of the assuring validator set. The erasure-root functions as a commitment to all data required for the auditing of the report and for use by later work-packages should they need to retrieve any data yielded. It is thus used by assurers to verify the correctness of data they have been sent by guarantors, and it is later verified as correct by auditors. It is discussed fully in section 14.
The segment-root () is the root of a constant-depth, left-biased and zero-hash-padded binary Merkle tree committing to the hashes of each of the exported segments of each work-item. These are used by guarantors to verify the correctness of any reconstructed segments they are called upon to import for evaluation of some later work-package. It is also discussed in section 14.
11.1.4. Work Digest
We finally come to define a work-digest, , which is the data conduit by which services’ states may be altered through the computation done within a work-package.
| (11.6) |
Work-digests are a tuple comprising several items. Firstly , the index of the service whose state is to be altered and thus whose refine code was already executed. We include the hash of the code of the service at the time of being reported , which must be accurately predicted within the work-report according to equation 11.45.
Next, the hash of the payload () within the work item which was executed in the refine stage to give this result. This has no immediate relevance, but is something provided to the accumulation logic of the service. We follow with the gas limit for executing this item’s accumulate.
There is the work result, the output blob or error of the execution of the code, , which may be either an octet sequence in case it was successful, or a member of the set , if not. This latter set is defined as the set of possible errors, formally:
| (11.7) |
The first two are special values concerning execution of the virtual machine, denoting an out-of-gas error and denoting an unexpected program termination. Of the remaining four, the first indicates that the number of exports made was invalidly reported, the second that the size of the digest (refinement output) would cross the acceptable limit, the third indicates that the service’s code was not available for lookup in state at the posterior state of the lookup-anchor block. The fourth indicates that the code was available but was beyond the maximum size allowed .
Finally, we have five fields describing the level of activity which this workload imposed on the core in bringing the result to bear. We include the actual amount of gas used during refinement; and the number of segments imported from, and exported into, the D3L respectively; and and the number of, and total size in octets of, the extrinsics used in computing the workload. See section 14 for more information on the meaning of these values.
In order to ensure fair use of a block’s extrinsic space, work-reports are limited in the maximum total size of the successful refinement output blobs together with the authorizer trace, effectively limiting their overall size:
| (11.8) | ||||
| (11.9) | ||||
| (11.10) |
11.2. Package Availability Assurances
We first define , the intermediate state to be utilized next in section 11.4, as well as , the set of available work-reports, which we will utilize later in section 12. Both require the integration of information from the assurances extrinsic .
11.2.1. The Assurances Extrinsic
The assurances extrinsic is a sequence of assurance values, at most one per validator. Each assurance is a sequence of binary values (i.e. a bitstring), one per core, together with a signature and the index of the validator who is assuring. A value of (or , if interpreted as a Boolean) at any given index implies that the validator assures they are contributing to its availability.1212 12 This is a “soft” implication since there is no consequence on-chain if dishonestly reported. For more information on this implication see section 16. Formally:
| (11.11) |
The assurances must all be anchored on the parent and ordered by validator index:
| (11.12) | ||||
| (11.13) |
The signature must be one whose public key is that of the validator assuring and whose message is the serialization of the parent hash and the aforementioned bitstring:
| (11.14) | ||||
| (11.15) |
A bit may only be set if the corresponding core has an availability assignment:
| (11.16) |
11.2.2. Available Reports
A work-report is said to become available if and only if there are a clear super-majority of validators who have marked its core as set within the block’s assurance extrinsic. Formally, we define the sequence of newly available work-reports as:
| (11.17) |
This value is utilized in the definition of both and which we will define presently as equivalent to except for the removal of items which are either now available or have timed out:
| (11.18) | ||||
Note that all items are cleared when the size of the active validator key set changes. Items cleared in this way can be viewed as having timed out early.
11.3. Guarantor Assignments
As discussed in section 4.9, the amount of in-core computation that is possible scales with the number of validator nodes. Concretely, while there are a fixed number of cores , only the first are active and capable of processing work-packages.
Every block, each active core has three validators uniquely assigned to guarantee work-reports for it. The core index assigned to each of the validators, as well as the validators’ keys are denoted by :
| (11.19) |
We determine the core to which any given validator is assigned through a shuffle using epochal entropy and a periodic rotation to help guard the security and liveness of the network. We use for the epochal entropy rather than to avoid the possibility of fork-magnification where uncertainty about chain state at the end of an epoch could give rise to two established forks before it naturally resolves.
We define the rotation function , the permute function and finally the guarantor assignments as follows:
| (11.20) | ||||
| (11.21) | ||||
| (11.22) |
We also define , which is equivalent to the value as it would have been under the previous rotation:
| (11.23) | ||||
11.4. Work Report Guarantees
We begin by defining , the set of guarantees. A guarantee is a tuple of a work-report , a credential and its corresponding timeslot . The credential is a sequence of two or three tuples of a unique validator index and a signature. Formally:
| (11.24) |
We continue by defining the guarantees extrinsic, , a series of guarantees. The core index of each guarantee in must be unique and guarantees must be in ascending order of this. Formally:
| (11.25) | ||||
| (11.26) |
Credentials must be ordered by their validator index:
| (11.27) |
The signature must be one whose public key is that of the validator identified in the credential, and whose message is the serialization of the hash of the work-report. The signing validators must have been assigned to the core in the given timeslot , which must either be in the same rotation as this block’s timeslot or in the previous rotation. Use of an inactive core is not permitted even if a timeslot in the previous rotation is used and the core was active then. Formally:
| (11.28) | |||
| (11.29) |
We note that the Ed25519 key of each validator whose signature is in a credential is placed in the reporters set . This is utilized by the validator activity statistics bookkeeping system section 13.
We denote to be the set of work-reports in the present extrinsic :
| (11.30) |
The total number of erasure-coded chunks for each report must match the number of active validators:
| (11.31) |
No guarantees may be placed on cores which already have availability assignments. A guarantee is valid only if the report’s authorizer hash is present in the authorizer pool of the core on which the work is reported. Formally:
| (11.32) |
We require that the gas allotted for accumulation of each work-digest in each work-report respects its service’s minimum gas requirements. We also require that all work-reports’ total allotted accumulation gas is no greater than the overall gas limit :
| (11.33) |
11.4.1. Contextual Validity of Reports
For convenience, we define two equivalences and to be, respectively, the set of all contexts and work-package hashes within the extrinsic:
| (11.34) |
There must be no duplicate work-package hashes (i.e. two work-reports of the same package). Therefore, we require the cardinality of to be the length of the work-report sequence :
| (11.35) |
We require that the anchor block be within the last blocks and that its details be correct by ensuring that it appears within our most recent blocks :
| (11.36) |
We require that each lookup-anchor block be within the last timeslots:
| (11.37) |
We also require that we have a record of it; this is one of the few conditions which cannot be checked purely with on-chain state and must be checked by virtue of retaining the series of the last headers as the ancestor set . Since it is determined through the header chain, it is still deterministic and calculable. Formally:
| (11.38) |
We require that the work-package of the report not be the work-package of some other report made in the past. We ensure that the work-package not appear anywhere within our pipeline. Formally:
| (11.39) | |||
| (11.40) | |||
| (11.41) |
We require that the prerequisite work-packages, if present, and any work-packages mentioned in the segment-root lookup, be either in the extrinsic or in our recent history.
| (11.42) |
We require that any segment roots mentioned in the segment-root lookup be verified as correct based on our recent work-package history and the present block:
| (11.43) | |||
| (11.44) |
(Note that these checks leave open the possibility of accepting work-reports in apparent dependency loops. We do not consider this a problem: the pre-accumulation stage effectively guarantees that accumulation never happens in these cases and the reports are simply ignored.)
Finally, we require that all work-digests within the extrinsic predicted the correct code hash for their corresponding service:
| (11.45) |
11.5. Transitioning for Reports
We define as being equivalent to , except where the extrinsic replaced an entry. In the case an entry is replaced, the new value includes the present time so that it can be cleared if it is not made available quickly enough (see equation 11.18).
| (11.46) |
It is worth noting that is always for ; this in particular ensures that we will never have more reports to audit than can safely be managed with the number of active validators.
This concludes the section on reporting and assurance. We now have a complete definition of together with to be utilized in section 12, describing the portion of the state transition happening once a work-report is guaranteed and made available.
12. Accumulation
Accumulation may be defined as some function whose arguments are and together with selected portions of (at times partially transitioned) state and which yields the posterior service state together with additional state elements , and .
The proposition of accumulation is in fact quite simple: we merely wish to execute the Accumulate logic of the service code of each of the services which has at least one work-digest, passing to it relevant data from said digests together with useful contextual information. However, there are three main complications. Firstly, we must define the execution environment of this logic and in particular the host functions available to it. Secondly, we must define the amount of gas to be allowed for each service’s execution. Finally, we must determine the nature of transfers within Accumulate.
12.1. History and Queuing
Accumulation of a work-report is deferred in the case that it has a not-yet-fulfilled dependency and is cancelled entirely in the case of an invalid dependency. Dependencies are specified as work-package hashes and in order to know which work-packages have been accumulated already, we maintain a history of what has been accumulated. This history, , is sufficiently large for an epoch worth of work-reports. Formally:
| (12.1) | ||||
| (12.2) |
We also maintain knowledge of ready (i.e. available and/or audited) but not-yet-accumulated work-reports in the state item . Each of these were made available at most one epoch ago but have or had unfulfilled dependencies. Alongside the work-report itself, we retain its unaccumulated dependencies, a set of work-package hashes. Formally:
| (12.3) |
The newly available work-reports, , are partitioned into two sequences based on the condition of having zero prerequisite work-reports. Those meeting the condition, , are accumulated immediately. Those not, , are for queued execution. Formally:
| (12.4) | ||||
| (12.5) | ||||
| (12.6) |
We define the queue-editing function , which is essentially a mutator function for items such as those of , parameterized by sets of now-accumulated work-package hashes (those in ). It is used to update queues of work-reports when some of them are accumulated. Functionally, it removes all entries whose work-report’s hash is in the set provided as a parameter, and removes any dependencies which appear in said set. Formally:
| (12.7) |
We further define the accumulation priority queue function , which provides the sequence of work-reports which are able to be accumulated given a set of not-yet-accumulated work-reports and their dependencies.
| (12.8) |
Finally, we define the mapping function which extracts the corresponding work-package hashes from a set of work-reports:
| (12.9) |
We may now define the sequence of accumulatable work-reports in this block as :
| (12.10) | ||||
| (12.11) | ||||
| (12.12) |
12.2. Execution
We work with a limited amount of gas per block and therefore may not be able to process all items in in a single block. There are two slightly antagonistic factors allowing us to optimize the amount of work-items, and thus work-reports, accumulated in a single block:
Firstly, while we have a well-known gas-limit for each work-item to be accumulated, accumulation may still result in a lower amount of gas used. Only after a work-item is accumulated can it be known if it uses less gas than the advertised limit. This implies a sequential execution pattern.
Secondly, since pvm setup cannot be expected to be zero-cost, we wish to amortize this cost over as many work-items as possible. This can be done by aggregating work-items associated with the same service into the same pvm invocation. This implies a non-sequential execution pattern.
We resolve this by defining a function which accumulates work-reports sequentially, and which itself utilizes a function which accumulates work-reports in a non-sequential, service-aggregated manner. In all but the first invocation of , we also integrate the effects of any deferred-transfers implied by the previous round of accumulation, thus the accumulation function must accept both the information contained in work-digests and that of deferred-transfers.
Rather than passing whole work-digests into accumulate, we extract the salient information from them and combine with information implied by their work-reports. We call this kind of combined value an operand tuple, . Likewise, we denote the set characterizing a deferred transfer as , noting that a transfer includes a memo component of octets, together with the service index of the sender , the service index of the receiver , the balance to be transferred and the gas limit for the transfer. Formally:
| (12.13) | ||||
| (12.14) |
Note that the union of the two characterizes inputs to the Accumulation invocation function (defined in equation B.9).
Our formalisms continue by defining as a characterization of (i.e. values capable of representing) state components which are both needed and mutable by the accumulation process. This comprises the service accounts state (as in ), the upcoming validator keys , the queue of authorizers and the privileges state . Formally:
| (12.15) |
Finally, we define and , the sets characterizing service-indexed commitments to accumulation output and service-indexed gas usage respectively:
| (12.16) |
We define the outer accumulation function which transforms a gas-limit, a sequence of deferred transfers, a sequence of work-reports, an initial partial-state and a dictionary of services enjoying free accumulation, into a tuple of the number of work-reports accumulated, a posterior state-context, the resultant accumulation-output pairings, the service-indexed gas usage and the sequence of processed transfers:
| (12.17) |
We come to define the parallelized accumulation function which, with the help of the single-service accumulation function , transforms an initial state-context, together with a sequence of deferred transfers, a sequence of work-reports and a dictionary of privileged always-accumulate services, into a tuple of the posterior state-context, the resultant deferred-transfers and accumulation-output pairings, and the service-indexed gas usage. Note that for the privileges we employ a function which selects the service to which the manager service changed, or if no change was made, then that which the service itself changed to. This allows privileges to be ‘owned‘ and facilitates the removal of the manager service which we see as a helpful possibility. Formally:
| (12.18) |
| (12.19) |
Δ R
And is the preimage integration function, which transforms a dictionary of service states and a set of service/blob pairs into a new dictionary of service states. Preimage provisions into services which no longer exist or whose relevant request is dropped are disregarded:
| (12.20) | ||||
| (12.21) |
We note that while forming the union of all altered, newly added service and newly removed indices, defined in the above context as , different services may not each contribute the same index for a new, altered or removed service. This cannot happen for the set of removed and altered services since the code hash of removable services has no known preimage and thus cannot execute itself to make an alteration. For new services this should also never happen since new indices are explicitly selected to avoid such conflicts. In the unlikely event it does happen, the block must be considered invalid.
The single-service accumulation function, , transforms an initial state-context, a sequence of deferred-transfers, a sequence of work-reports, a dictionary of services enjoying free accumulation (with the values indicating the amount of free gas) and a service index into an alterations state-context, a sequence of transfers, a possible accumulation-output, the actual pvm gas used and a set of preimage provisions. This function wrangles the work-digests of a particular service from a set of work-reports and invokes pvm execution with said data:
| (12.22) |
| (12.23) |
This draws upon , the gas limit implied by the selected deferred-transfers, work-reports and gas-privileges.
12.3. Final State Integration
Given the result of the top-level , we may define the posterior state , and as well as the first intermediate state of the service-accounts and the Accumulation Output Log :
| (12.24) | ||||
| (12.25) | ||||
| (12.26) |
From this formulation, we also receive , the total number of work-reports accumulated and , the gas used in the accumulation process for each service. We compose , our accumulation statistics, which is a mapping from the service indices which were accumulated to the amount of gas used throughout accumulation and the number of work-items and transfers accumulated. Formally:
| (12.27) | ||||
| (12.28) | ||||
| where | ||||
| and | ||||
| and | ||||
| and |
The second intermediate state may then be defined with the last-accumulation record being updated for all accumulated services:
| (12.29) | ||||
| (12.30) |
We define the final state of the ready queue and the accumulated map by integrating those work-reports which were accumulated in this block and shifting any from the prior state with the oldest such items being dropped entirely:
| (12.31) | ||||
| (12.32) | ||||
| (12.33) |
12.4. Preimage Integration
After accumulation, we must integrate all preimages provided in the lookup extrinsic to arrive at the posterior account state. The lookup extrinsic is a sequence of pairs of service indices and data. These pairs must be ordered and without duplicates (equation 12.35 requires this). The data must have been solicited by a service but not yet provided in the prior state. Formally:
| (12.34) | ||||
| (12.35) | ||||
| (12.36) |
We disregard, without prejudice, any preimages which due to the effects of accumulation are no longer useful. We define as the state after the integration of the still-relevant preimages:
| (12.37) |
13. Statistics
13.1. Validator Activity
The Jam chain does not explicitly issue rewards—we leave this as a job to be done by the staking subsystem (in Polkadot’s case envisioned as a system parachain—hosted without fees—in the current imagining of a public Jam network). However, much as with validator punishment information, it is important for the Jam chain to facilitate the arrival of information on validator activity in to the staking subsystem so that it may be acted upon.
Such performance information cannot directly cover all aspects of validator activity; whereas block production, guarantor reports and availability assurance can easily be tracked on-chain, Grandpa, Beefy and auditing activity cannot. In the latter case, this is instead tracked with validator voting activity: validators vote on their impression of each other’s efforts and a median may be accepted as the truth for any given validator. With an assumption of 50% honest validators, this gives an adequate means of oraclizing this information.
The validator statistics are made on a per-epoch basis and we retain one record of completed statistics () together with one record which serves as an accumulator for the present epoch (). For each epoch we track a performance record for each validator. Together with the core and service statistics defined in the following section and , these form , which is thus a tuple of four elements.
| (13.1) | ||||
| (13.2) | ||||
| (13.3) |
The six validator statistics we track are:
- :
-
The number of blocks produced by the validator.
- :
-
The number of tickets introduced by the validator.
- :
-
The number of preimages introduced by the validator.
- :
-
The total number of octets across all preimages introduced by the validator.
- :
-
The number of reports guaranteed by the validator.
- :
-
The number of availability assurances made by the validator.
The objective statistics are updated in line with their description, formally:
| (13.4) | ||||
| (13.5) | ||||
| (13.6) | ||||
Note that is the Reporters set, as defined in equation 11.28.
13.2. Cores and Services
The other two components of statistics are the core and service activity statistics. These are tracked only on a per-block basis unlike the validator statistics which are tracked over the whole epoch.
| (13.7) | ||||
| (13.8) |
The core statistics are updated using several intermediate values from across the overall state-transition function; , the incoming work-reports, as defined in 11.30 and , the newly available work-reports, as defined in 11.17. We define the statistics as follows:
| (13.9) | ||||
| (13.10) | ||||
| (13.11) | ||||
| (13.12) |
Finally, the service statistics are updated using the same intermediate values as the core statistics, but with a different set of calculations:
| (13.13) | ||||
| (13.14) | ||||
| (13.15) | ||||
| (13.16) | ||||
| (13.17) |
14. Work Packages and Work Reports
14.1. Honest Behavior
We have so far specified how to recognize blocks for a correctly transitioning Jam blockchain. Through defining the state transition function and a state Merklization function, we have also defined how to recognize a valid header. While it is not especially difficult to understand how a new block may be authored for any node which controls a key which would allow the creation of the two signatures in the header, nor indeed to fill in the other header fields, readers will note that the contents of the extrinsic remain unclear.
We define not only correct behavior through the creation of correct blocks but also honest behavior, which involves the node taking part in several off-chain activities. This does have analogous aspects within YP Ethereum, though it is not mentioned so explicitly in said document: the creation of blocks along with the gossiping and inclusion of transactions within those blocks would all count as off-chain activities for which honest behavior is helpful. In Jam’s case, honest behavior is well-defined and expected of at least of validators.
Beyond the production of blocks, incentivized honest behavior includes:
-
•
the guaranteeing and reporting of work-packages, along with chunking and distribution of both the chunks and the work-package itself, discussed in section 15;
-
•
assuring the availability of work-packages after being in receipt of their data;
-
•
determining which work-reports to audit, fetching and auditing them, and creating and distributing judgments appropriately based on the outcome of the audit;
-
•
submitting the correct amount of auditing work seen being done by other validators, discussed in section 13.
14.2. Segments and the Manifest
Work-packages are generally small to ensure guarantors need not invest a lot of bandwidth in order to discover whether they can get paid for their evaluation into a work-report. Rather than having much data inline, they instead reference data through commitments. The simplest commitments are extrinsic data.
Extrinsic data are blobs which are being introduced into the system alongside the work-package itself generally by the work-package builder. They are exposed to the Refine logic as an argument. We commit to them through including each of their hashes in the work-package.
Work-packages have two other types of external data associated with them: A cryptographic commitment to each imported segment and finally the number of segments which are exported.
14.2.1. Segments, Imports and Exports
The ability to communicate large amounts of data from one work-package to some subsequent work-package is a key feature of the Jam availability system. An export segment, defined as the set , is an octet sequence of fixed length . It is the smallest datum which may individually be imported from—or exported to—the long-term D3L during the Refine function of a work-package.
| (14.1) |
Exported segments are data which are generated through the execution of the Refine logic and thus are a side effect of transforming the work-package into a work-report. Since their data is deterministic based on the execution of the Refine logic, we do not require any particular commitment to them in the work-package beyond knowing how many are associated with each Refine invocation in order that we can supply an exact index.
On the other hand, imported segments are segments which were exported by previous work-packages. In order for them to be easily fetched and verified they are referenced not by hash but rather the root of a Merkle tree which includes any other segments introduced at the time, together with an index into this sequence. This allows for justifications of correctness to be generated, stored, included alongside the fetched data and verified. This is described in depth in the next section.
14.2.2. Data Collection and Justification
It is the task of a guarantor to reconstitute all imported segments through fetching said segments’ erasure-coded chunks from enough unique validators. Reconstitution alone is not enough since corruption of the data would occur if one or more validators provided an incorrect chunk. For this reason we ensure that the import segment specification (a Merkle root and an index into the tree) be a kind of cryptographic commitment capable of having a justification applied to demonstrate that any particular segment is indeed correct.
Justification data must be available to any node over the course of its segment’s potential requirement. At around 350 bytes to justify a single segment, justification data is too voluminous to have all validators store all data. We therefore use the same overall availability framework for hosting justification metadata as the data itself.
The guarantor is able to use this proof to justify to themselves that they are not wasting their time on incorrect behavior. We do not force auditors to go through the same process. Instead, guarantors build an Auditable Work Package, and place this in the Audit da system. This is the original work-package, its extrinsic data, its imported data and a concise proof of correctness of that imported data. This tactic routinely duplicates data between the D3L and the Audit da, however it is acceptable in order to reduce the bandwidth cost for auditors who must justify the correctness as cheaply as possible as auditing happens on average 30 times for each work-package whereas guaranteeing happens only twice or thrice.
14.3. Packages and Items
We begin by defining a work-package, of set , and its constituent work-items, of set . A work-package includes a simple blob acting as an authorization token , the index of the service which hosts the authorization code , an authorization code hash and a configuration blob , a context and a sequence of work items :
| (14.2) |
A work item includes: the identifier of the service to which it relates, the code hash of the service at the time of reporting (whose preimage must be available from the perspective of the lookup anchor block), a payload blob , gas limits for Refinement and Accumulation & , and the three elements of its manifest, a sequence of imported data segments which identify a prior exported segment through an index and the identity of an exporting work-package, , a sequence of extrinsic data hashes and lengths and the number of data segments exported by this work item.
| (14.3) |
Note that an imported data segment’s work-package is identified through the union of sets and a tagged variant . A value drawn from the regular implies the hash value is of the segment-root containing the export, whereas a value drawn from implies the hash value is the hash of the exporting work-package. In the latter case it must be converted into a segment-root by the guarantor and this conversion reported in the work-report for on-chain validation. Work-packages referenced in this manner are considered dependencies, as are work-packages explicitly listed as prerequisites in the context . We limit the total number of dependencies to :
| (14.4) |
We limit the total number of exported items to , the total number of imported items to , and the total number of extrinsics to :
| (14.5) |
We make an assumption that the preimage to each extrinsic hash in each work-item is known by the guarantor. In general this data will be passed to the guarantor alongside the work-package.
We limit the total size of the auditable work-bundle, containing the work-package, import and extrinsic items, together with all payloads, the authorizer configuration and the authorization token to around 13.6mb. This limit allows 2mb/s/core D3L imports, and thus a full complement of 3,072 imports, assuming no extrinsics, 64 bytes for each of the authorization token and trace, and a work-item payload of 4kb:
| (14.6) | ||||
| (14.7) | ||||
| (14.8) |
We limit the sums of each of the two gas limits to be at most the maximum gas allocated to a core for the corresponding operation:
| (14.9) |
Given the result and gas used of some work-item, we define the item-to-digest function as:
| (14.10) |
We define the work-package’s implied authorizer as , the hash of the authorization code hash concatenated with the configuration. We define the authorization code as and require that it be available at the time of the lookup anchor block from the historical lookup of service . Formally:
| (14.11) |
(The historical lookup function, , is defined in equation LABEL:eq:historicallookup.)
14.3.1. Exporting
Any of a work-package’s work-items may export segments and a segments-root is placed in the work-report committing to these, ordered according to the work-item which is exporting. It is formed as the root of a constant-depth binary Merkle tree as defined in equation E.4.
Guarantors are required to erasure-code and distribute two data sets: one blob, the auditable bundle containing the encoded work-package, extrinsic data and self-justifying imported segments which is placed in the short-term Audit da store; and a second set of exported-segments data together with the Paged-Proofs metadata. Items in the first store are short-lived; assurers are expected to keep them only until finality of the block in which the availability of the work-result’s work-package is assured. Items in the second, meanwhile, are long-lived and expected to be kept for a minimum of 28 days (672 complete epochs) following the reporting of the work-report. This latter store is referred to as the Distributed, Decentralized, Data Lake or D3L owing to its large size.
We define the paged-proofs function which accepts a series of exported segments and defines some series of additional segments placed into the D3L via erasure-coding and distribution. The function evaluates to pages of hashes, together with subtree proofs, such that justifications of correctness based on a segments-root may be made from it:
| (14.12) |
14.4. Computation of Work-Report
We now come to the work-report computation function . This forms the basis for all utilization of cores on Jam. It accepts some work-package , nominated core , segment-root dictionary , and assurer set size , and results in either an error or the work-report. This function is deterministic and requires only that it be evaluated within eight epochs of a recently finalized block thanks to the historical lookup functionality. It can thus comfortably be evaluated by any node within the auditing period, even allowing for practicalities of imperfect synchronization. Formally:
| (14.13) |
Where:
Note that we gracefully handle both the case where the size of the work output would take the work-report beyond its acceptable size and where number of segments exported by a work-item’s Refinement execution is incorrectly reported in the work-item’s export segment count. In both cases, the work-package continues to be valid as a whole, but the work-item’s exported segments are replaced by a sequence of zero-segments equal in size to the export segment count and its output is replaced by an error.
The first term to be introduced, , is the authorization trace, the result of the Is-Authorized function together with the amount of gas it used. The Is-Authorized logic must be executed first in order to ensure that the work-package warrants the needed core-time.
The next term, , is if there is no valid report for the given arguments. This is the case if the Is-Authorized function fails, or if the segment-root dictionary does not contain the expected entries. is expected to have entries for all unique work-package hashes of imported segments not identified directly via a segment-root but rather through a work-package hash.
In order to expect to be compensated for a work-report they are building, guarantors must compose a value for to ensure not only the above but also a further constraint that all pairs of work-package hashes and segment-roots do properly correspond:
| (14.14) | ||||
As long as the guarantor is unable to satisfy the above constraints, then it should consider the work-package unable to be guaranteed. Auditors are not expected to populate but rather to reuse the value in the work-report they are auditing.
The third term, is the sequence of results for each of the work-items in the work-package together with all segments exported by each work-item. The fourth definition performs an ordered accumulation (i.e. counter) in order to ensure that the Refine function has access to the total number of exports made from the work-package up to the current work-item.
The above relies on two functions, and which, respectively, define the import segment data (utilising the segment-root dictionary ) and the extrinsic data for some work-item argument . We also define the segment-root lookup function , which collapses a segment-root or work-package hash into a segment-root, and , which compiles justifications of segment data:
| (14.15) | ||||
Note that while the formulations of and seem to require (due to the inner term ) all segments exported by all work-packages exporting a segment to be imported, such a vast amount of data is not generally needed. In particular, each justification can be derived through a single paged-proof. This reduces the worst case data fetching for a guarantor to two segments for every one to be imported. In the case that contiguously exported segments are imported (which we might assume is a fairly common situation), then a single proof-page should be sufficient to justify many imported segments.
Using these three functions, we may now define the Bundle Assembly function , which assembles an audit-friendly bundle from a work-package and a segment-root dictionary. The bundle comprises the work-package itself, the extrinsic data, and the concatenated import segments along with their proofs of correctness.
| (14.16) |
Note the lack of length prefixes: only the Merkle paths for the justifications have a length prefix. All other sequence lengths are determinable through the work package itself.
Finally, we may define as the data availability specification of the package using together with the yet to be defined Availability Specifier function (see section 14.4.1):
| (14.17) |
Before executing the Refine function, the guarantor should ensure that all segment-tree roots which form imported segment commitments are known and have not expired. Additionally, the guarantor should ensure that they can fetch all preimage data referenced as the commitments of extrinsic segments.
The guarantor must reconstruct imported segments, though this process may in fact be lazy as the Refine function makes no usage of the data until the fetch host-call is made. Fetching generally implies that, for each imported segment, erasure-coded chunks are retrieved from enough unique validators (approximately one third) and is described in more depth in appendix H. (Since we specify systematic erasure-coding, its reconstruction is trivial in the case that the correct validators are responsive.) Chunks must be fetched for both the data itself and for justification metadata which allows us to ensure that the data is correct.
Validators, in their role as availability assurers, should index such chunks according to the index of the segments-tree whose reconstruction they facilitate. Since the data for segment chunks is so small (12 octets in the case of 1023 assurers), fixed communications costs should be kept to a bare minimum. A good network protocol (out of scope at present) will allow guarantors to specify only the segments-tree root and index together with a Boolean to indicate whether the proof chunk need be supplied. Since we assume enough other validators are online and benevolent, we can assume that the guarantor can compute and above with confidence, based on the general availability of data committed to with , which is specified below. Note that each imported segment must be fetched from the validator set which assured its availability; this set may no longer be active and its size may differ from that of the active validator set.
14.4.1. Availability Specifier
We define the availability specifier function , which creates an availability specifier from the package hash, an octet sequence of the audit-friendly work-package bundle, the sequence of exported segments, and the size of the assuring validator set:
| (14.18) |
The paged-proofs function , defined earlier in equation 14.12, accepts a sequence of segments and returns a sequence of paged-proofs sufficient to justify the correctness of every segment. There are exactly paged-proof segments as the number of yielded segments, each composed of a page of 64 hashes of segments, together with a Merkle proof from the root to the subtree-root which includes those 64 segments.
The functions and are the fixed-depth and simple binary Merkle root functions, defined in equations E.4 and E.3. The functions and are the erasure-coding and original-chunk-count functions, defined in appendix H.
And is the zero-padding function to take an octet array to some multiple of in length:
| (14.19) |
Validators are incentivized to distribute each newly erasure-coded data chunk to the relevant validator, since they are not paid for guaranteeing unless a work-report is considered to be available by a super-majority of validators. Given our work-package , we should therefore send the corresponding work-package bundle chunk and exported segments chunks to each validator, together with proofs such that each validator can justify completeness according to the work-report’s erasure-root. In the case of a coming epoch change, they may also maximize expected reward by distributing to the new validator set. Note that as erasure-coding depends on the size of the assuring validator set, distribution to two validator sets of different sizes necessitates performing erasure-coding twice and production of two distinct work-reports, only one of which may be reported on-chain.
We will see this function utilized in the next sections, for guaranteeing, auditing and judging.
A C I S P R X b
15. Guaranteeing
Guaranteeing work-packages involves the creation and distribution of a corresponding work-report which requires certain conditions to be met. Along with the report, a signature demonstrating the validator’s commitment to its correctness is needed. With two guarantor signatures, the work-report may be distributed to the forthcoming Jam chain block author in order to be used in the , which leads to a reward for the guarantors.
We presume that in a public system, validators will be punished severely if they malfunction and commit to a report which does not faithfully represent the result of applied on a work-package. Overall, the process is:
-
(1)
Evaluation of the work-package’s authorization, and cross-referencing against the authorization pool in the most recent Jam chain state.
-
(2)
Evaluation of each work-item’s refine logic.
-
(3)
Chunking of the work-package bundle and the data exported by the refine logic, according to the erasure codec.
-
(4)
Assembly and publication of a work-package report.
-
(5)
Distributing the aforementioned chunks across the validator set.
-
(6)
Providing the work-package bundle and exported data to other validators on request is also helpful for optimal network performance.
For any work-package a guarantor is in receipt of, the work-report , if any, may be determined by:
| (15.1) |
When Jam chain state is needed, the chain state of the most recent block should always be utilized. The guarantor may choose values for (the core index), (the segment-root dictionary), and (the size of the assuring validator set), but should be careful to choose values which allow the work-report to be included on-chain (see section 11.4). Typically should be the core currently assigned to the guarantor, should be the size of the active validator set, and should be derived from previously guaranteed work-reports.
Following production of a work-report , a guarantor may safely create and distribute the payload , where identifies the guarantor by index and is a signature created according to equation 11.28. Specifically, is a signature using the validator’s registered Ed25519 key on a payload :
| (15.2) |
To maximize profit, the guarantor should require the work-report meets all expectations which are in place during the guarantee extrinsic described in section 11.4. This includes contextual validity and inclusion of the authorization in the authorization pool. Not doing so does not result in punishment, but will prevent the block author from including the report and so reduces rewards.
Advanced nodes may maximize the likelihood that their reports will be includable on-chain by attempting to predict the state of the chain at the time that the report will get to the block author. Naive nodes may simply use the current chain head when verifying the work-package. To minimize work done, nodes should make all such evaluations prior to evaluating the function to calculate the report’s work-digests.
Once evaluated as a reasonable work-package to guarantee, guarantors should maximize the chance that their work is not wasted by attempting to form consensus over the core. To achieve this they should send the work-package to any other guarantors on the same core which they do not believe already know of it.
In order to minimize the work for block authors and thus maximize expected profits, guarantors should attempt to construct their core’s next guarantee extrinsic from the work-report and set of attestations including their own and as many others as possible.
In order to minimize the chance of any block authors disregarding the guarantor for anti-spam measures, guarantors should sign an average of no more than two work-reports per timeslot.
16. Availability Assurance
Every timeslot, each validator should issue a signed statement, called an assurance, indicating which of the current availability assignments the validator has received its erasure-coded shards for. There are two classes of shard a validator must have for an availability assignment before claiming possession in an assurance:
Firstly, their erasure-coded shard for the assigned report’s bundle. This shard is needed to verify the work-report’s validity and completeness and need not be retained after the work-report is considered audited. Until then, it should be provided on request to validators.
Secondly, the validator should have in hand the corresponding erasure-coded shard for each of the exported segments referenced by the assigned report’s segments root. These should be retained for 28 days and provided to any validator on request.
The validity of a validator’s shards for a work-report can be trivially proven through the work-report’s work-package erasure-root and a Merkle-proof of inclusion in the correct location. Shards distributed by guarantors should be accompanied by such proofs, and a validator should not claim possession of shards in an assurance unless it has, and has verified, the corresponding proofs.
17. Auditing and Judging
The auditing and judging system is theoretically equivalent to that in Elves, introduced by [4]. For a full security analysis of the mechanism, see this work. There is a difference in terminology, where the terms backing, approval and inclusion there refer to our guaranteeing, auditing and accumulation, respectively.
17.1. Overview
The auditing process involves each node requiring themselves to fetch, evaluate and issue judgment on a random but deterministic set of work-reports from each Jam chain block in which the work-report becomes available (i.e. from ). Prior to any evaluation, a node declares and proves its requirement. At specific common junctures in time thereafter, the set of work-reports which a node requires itself to evaluate from each block’s may be enlarged if any declared intentions are not matched by a positive judgment in a reasonable time or in the event of a negative judgment being seen. These enlargement events are called tranches.
If all declared intentions for a work-report are matched by a positive judgment at any given juncture, then the work-report is considered audited. Once all of any given block’s newly available work-reports are audited, then we consider the block to be audited. One prerequisite of a node finalizing a block is for it to view the block as audited. Note that while there will be eventual consensus on whether a block is audited, there may not be consensus at the time that the block gets finalized. This does not affect the crypto-economic guarantees of this system.
In regular operation, no negative judgments will ultimately be found for a work-report, and there will be no direct consequences of the auditing stage. In the unlikely event that a negative judgment is found, then one of several things happens; if greater than of the validators still issue positive judgments, then validators issuing negative judgments may receive a punishment for time-wasting. If greater than of the validators issue negative judgments, then the block which includes the work-report is ban-listed. It and all its descendants are disregarded and may not be built on. In all cases, once there are enough votes, a verdict can be constructed by a block author and placed in a disputes extrinsic to denote the outcome. See section 10 for details on this.
All announcements and judgments are published to all validators along with metadata describing the signed material. On receipt of sure data, validators are expected to update their perspective accordingly (later defined as and ).
17.2. Data Fetching
For each work-report to be audited, we use its erasure-root to request erasure-coded chunks from enough assurers, and then reconstruct the work-package bundle from these chunks. The bundle contains the work-package together with its extrinsic data, imported segments, and imported segment justifications.
We may validate the work-package by ensuring its hash is equivalent to the hash included as part of the work-package specification in the work-report. We may validate the extrinsic data through ensuring their hashes are each equivalent to those found in the relevant work-item.
Finally, we may validate each imported segment as a justification must follow the concatenated segments which allows verification that each segment’s hash is included in the referencing Merkle root and index of the corresponding work-item.
Exported segments need not be reconstructed in the same way, but rather should be determined in the same manner as with guaranteeing, i.e. through the execution of the Refine logic.
All items in the work-package specification field of the work-report should be recalculated from this now known-good data and verified, essentially retracing the guarantors steps and ensuring correctness.
17.3. Selection of Reports
Each validator shall perform auditing duties on each valid block received. Since we are entering off-chain logic, and we cannot assume consensus, we henceforth consider ourselves a specific validator of index and assume ourselves focused on some recent block with other terms corresponding to the state-transition implied by that block, so is said block’s prior availability assignments, is its prior validator set, is its header &c. Practically, all considerations must be replicated for all blocks and multiple blocks’ considerations may be underway simultaneously.
We define the sequence of work-reports which we may be required to audit as , a sequence of length equal to the number of active cores, which functions as a mapping of core index to a work-report which has just become available, or if no report became available on the core. Formally:
| (17.1) | ||||
| (17.2) |
We define our initial audit tranche in terms of a verifiable random quantity created specifically for it:
| (17.3) | ||||
| (17.4) |
We may then define as the non-empty items to audit through a verifiably random selection of ten cores:
| (17.5) |
Every seconds following a new time slot, a new tranche begins, and we may determine that additional cores warrant an audit from us. Such items are defined as where is the current tranche. Formally:
| (17.6) |
New tranches may contain items from stemming from one of two reasons: either a negative judgment has been received; or the number of judgments from the previous tranche is less than the number of announcements from said tranche. In the first case, the validator is always required to issue a judgment on the work-report. In the second case, a new special-purpose vrf must be constructed to determine if an audit and judgment is warranted from us.
In all cases, we publish a signed statement of which of the cores we believe we are required to audit (an announcement) together with evidence of the vrf signature to select them and the other validators’ announcements from the previous tranche unmatched with a judgment in order that all other validators are capable of verifying the announcement. Publication of an announcement should be taken as a contract to complete the audit regardless of any future information.
Formally, for each tranche we ensure the announcement statement is published and distributed to all other validators along with our validator index , evidence and all signed data. Validator’s announcement statements must be in the set :
| (17.7) | ||||
| (17.8) | ||||
| (17.9) |
We define as our perception of which validators are required to audit each of the work-reports at tranche . This comes from each other validators’ announcements (defined above). It cannot be correctly evaluated until is current. We have absolute knowledge about our own audit requirements.
| (17.10) |
We further define and to be the validator indices who we know to have made respectively, positive and negative, judgments mapped from each work-report. We don’t care from which tranche a judgment is made.
| (17.12) |
We are able to define for tranches beyond the first on the basis of the number of validators who we know are required to conduct an audit yet from whom we have not yet seen a judgment. It is possible that the late arrival of information alters and nodes should reevaluate and act accordingly should this happen.
We can thus define beyond the initial tranche through a new vrf which acts upon the set of no-show validators.
| (17.13) | ||||
| (17.14) | ||||
| (17.15) |
We define our bias factor , which is the expected number of validators which will be required to issue a judgment for a work-report given a single no-show in the tranche before. Modeling by [4] shows that this is optimal.
Later audits must be announced in a similar fashion to the first. If audit requirements lessen on the receipt of new information (i.e. a positive judgment being returned for a previous no-show), then any audits already announced are completed and judgments published. If audit requirements raise on the receipt of new information (i.e. an additional announcement being found without an accompanying judgment), then we announce the additional audit(s) we will undertake.
As increases with the passage of time becomes known and defines our auditing responsibilities. We must attempt to reconstruct all work-package bundles corresponding to each work-report we must audit. This may be done through requesting erasure-coded chunks from one-third of the validators, verified through the erasure coding’s Merkle root, and reconstructing the bundles as per the recovery function defined in section H.
Thus, for any such work-report we are assured we will be able to reconstruct some candidate work-package bundle . We decode this candidate bundle into a work-package and its associated extrinsic data, imported segments, and imported segment proofs. These are verified as described in section 17.2. We then attempt to reproduce the report on the core to give , a mapping from reports to evaluations:
| (17.16) | ||||
Here, is the bundle assembly function as defined in equation 14.16 and is the work-report computation function as defined in equation 14.13. Note that a failure to decode or verify the bundle implies an invalid work-report.
It may be possible to bypass the fetching of erasure-coded chunks and reconstruction of the bundle from them by asking a cooperative third party (e.g. an original guarantor) directly for the full bundle data. If this data cannot be decoded or verified however, we must fall back to reconstruction from erasure-coded chunks.
From the mapping the validator issues a set of judgments :
| (17.17) |
All judgments should be published to other validators in order that they build their view of and in the case of a negative judgment arising, can form an extrinsic for .
We consider a work-report as audited under two circumstances. Either, when it has no negative judgments and there exists some tranche in which we see a positive judgment from all validators who we believe are required to audit it; or when we see positive judgments for it from greater than two-thirds of the validator set.
| (17.18) |
Our block may be considered audited, a condition denoted , when all the work-reports which were made available are considered audited. Formally:
| (17.19) |
For any block we must judge it to be audited (i.e. ) before we vote for the block to be finalized in Grandpa. See section 19 for more information here.
Furthermore, we pointedly disregard chains which include the accumulation of a report which we know at least of validators judge as being invalid. Any chains including such a block are not eligible for authoring on. The best block, i.e. that on which we build new blocks, is defined as the chain with the most regular Safrole blocks which does not contain any such disregarded block. Implementation-wise, this may require reversion to an earlier head or alternative fork.
As a block author, if we have observed a sufficient number of judgments for a work-report, and at least one negative judgment, we may construct a verdict for the work-report and include it in the disputes extrinsic. If the verdict establishes that the work-report is not valid (i.e. fewer than two-thirds-plus-one of judgments confirm validity) then, as per the preceding paragraph, it should be introduced on a chain where the report has not yet been accumulated. The verdict will cause the corresponding availability assignment to be cleared from , preventing accumulation of the invalid work-report. Refer to section 10 for more details on this.
18. Beefy Distribution
For each finalized block which a validator imports, said validator shall make a bls signature on the bls12-381 curve, as defined by [17], affirming the Keccak hash of the block’s most recent Beefy mmr. This should be published and distributed freely, along with the signed material. These signatures may be aggregated in order to provide concise proofs of finality to third-party systems. The signing and aggregation mechanism is defined fully by [5].
Formally, let be the signed commitment of validator index which will be published:
| (18.1) | ||||
| (18.2) |
19. Grandpa and the Best Chain
Nodes take part in the Grandpa protocol as defined by [32].
We define the latest finalized block as . All associated terms concerning block and state are similarly superscripted. We consider the best block, to be that which is drawn from the set of acceptable blocks of the following criteria:
-
•
Has the finalized block as an ancestor.
-
•
Contains no unfinalized blocks where we see an equivocation (two valid blocks at the same timeslot).
-
•
Is considered audited.
Formally:
| (19.1) | ||||
| (19.2) | ||||
| (19.3) |
Of these acceptable blocks, that which contains the most ancestor blocks whose author used a slot-sealer ticket, rather than a fallback key should be selected as the best head, and thus the chain on which the participant should make Grandpa votes.
Formally, we aim to select to maximize the value where:
| (19.4) |
The specific data to be voted on in Grandpa shall be the block header of the best block, together with its posterior state root, . The state root has no direct relevance to the Grandpa protocol, but is included alongside the header during voting/signing into order to ensure that systems utilizing the output of Grandpa are able to verify the most recent chain state as possible.
This implies that the posterior state must be known at the time that Grandpa voting occurs in order to finalize the block. However, since Grandpa is relied on primarily for state-root verification it makes little sense to finalize a block without an associated commitment to the posterior state.
The posterior state only affects Grandpa voting in so much as votes for the same block hash but with different associated posterior state roots are considered votes for different blocks. This would only happen in the case of a misbehaving node or an ambiguity in the present document.
20. Discussion
20.1. Technical Characteristics
In total, with our stated target of 1,023 validators and three validators per core, along with requiring a mean of ten audits per validator per timeslot, and thus 30 audits per work-report, Jam is capable of trustlessly processing and integrating 341 work-packages per timeslot.
We assume node hardware is a modern 16 core cpu with 64gb ram, 8tb secondary storage and 0.5gbe networking.
Our performance models assume a rough split of cpu time as follows:
| Proportion | |
| Audits | |
| Merklization | |
| Block execution | |
| Grandpa and Beefy | |
| Erasure coding | |
| Networking & misc |
Estimates for network bandwidth requirements are as follows:
| Throughput, mb/slot | Tx | Rx |
| Guaranteeing | 106 | 48 |
| Assuring | 144 | 13 |
| Auditing | 0 | 133 |
| Authoring | 53 | 87 |
| Grandpa and Beefy | 4 | 4 |
| Total | 304 | 281 |
| Implied bandwidth, mb/s | 387 | 357 |
Thus, a connection able to sustain 500mb/s should leave a sufficient margin of error and headroom to serve other validators as well as some public connections, though the burstiness of block publication would imply validators are best to ensure that peak bandwidth is higher.
Under these conditions, we would expect an overall network-provided data availability capacity of 2pb, with each node dedicating at most tb to availability storage.
Estimates for memory usage are as follows:
| gb | ||
| Auditing | 20 | 2 10 pvm instances |
| Block execution | 2 | 1 pvm instance |
| State cache | 40 | |
| Misc | 2 | |
| Total | 64 |
As a rough guide, each parachain has an average footprint of around 2mb in the Polkadot Relay chain; a 40gb state would allow 20,000 parachains’ information to be retained in state.
What might be called the “virtual hardware” of a Jam core is essentially a regular cpu core executing at somewhere between 25% and 50% of regular speed for the whole six-second portion and which may draw and provide 2mb/s average in general-purpose i/o and utilize up to 2gb in ram. The i/o includes any trustless reads from the Jam chain state, albeit in the recent past. This virtual hardware also provides unlimited reads from a semi-static preimage-lookup database.
Each work-package may occupy this hardware and execute arbitrary code on it in six-second segments to create some result of at most 48kb. This work-result is then entitled to 10ms on the same machine, this time with no “external” i/o, but instead with full and immediate access to the Jam chain state and may alter the service(s) to which the results belong.
20.2. Illustrating Performance
In terms of pure processing power, the Jam machine architecture can deliver extremely high levels of homogeneous trustless computation. However, the core model of Jam is a classic parallelized compute architecture, and for solutions to be able to utilize the architecture well they must be designed with it in mind to some extent. Accordingly, until such use-cases appear on Jam with similar semantics to existing ones, it is very difficult to make direct comparisons to existing systems. That said, if we indulge ourselves with some assumptions then we can make some crude comparisons.
20.2.1. Comparison to Polkadot
Polkadot is at present capable of validating at most 80 parachains each doing one second of native computation and 5mb of i/o in every six. This corresponds to an aggregate compute performance of around 13x native cpu and a total 24-hour distributed availability of around 67mb/s. Accumulation is beyond Polkadot’s capabilities and so not comparable.
For comparison, in our basic models, Jam should be capable of attaining around 85x the computation load of a single native cpu core and a distributed availability of 682mb/s.
20.2.2. Simple Transfers
We might also attempt to model a simple transactions-per-second amount, with each transaction requiring a signature verification and the modification of two account balances. Once again, until there are clear designs for precisely how this would work we must make some assumptions. Our most naive model would be to use the Jam cores (i.e. refinement) simply for transaction verification and account lookups. The Jam chain would then hold and alter the balances in its state. This is unlikely to give great performance since almost all the needed i/o would be synchronous, but it can serve as a basis.
A 12mb work-package can hold around 96k transactions at 128 bytes per transaction. However, a 48kb work-result could only encode around 6k account updates when each update is given as a pair of a 4 byte account index and 4 byte balance, resulting in a limit of 3k transactions per package, or 171k tps in total. It is possible that the eight bytes could typically be compressed by a byte or two, increasing maximum throughput a little. Our expectations are that state updates, with highly parallelized Merklization, can be done at between 500k and 1 million reads/write per second, implying around 250k-350k tps, depending on which turns out to be the bottleneck.
A more sophisticated model would be to use the Jam cores for balance updates as well as transaction verification. We would have to assume that state and the transactions which operate on them can be partitioned between work-packages with some degree of efficiency, and that the 12mb of the work-package would be split between transaction data and state witness data. Our basic models predict that a 32-bit account system paginated into accounts/page and 128 bytes per transaction could, assuming only around 1% of oraclized accounts were useful, average upwards of 1.4mtps depending on partitioning and usage characteristics. Partitioning could be done with a fixed fragmentation (essentially sharding state), a rotating partition pattern or a dynamic partitioning (which would require specialized sequencing).
Interestingly, we expect neither model to be bottlenecked in computation, meaning that transactions could be substantially more sophisticated, perhaps with more flexible cryptography or smart-contract functionality, without a significant impact on performance.
20.2.3. Computation Throughput
The tps metric does not lend itself well to measuring distributed systems’ computational performance, so we now turn to another slightly more compute-focussed benchmark: the evm. The basic YP Ethereum network, now approaching a decade old, is probably the best known example of general purpose decentralized computation and makes for a reasonable yardstick. It is able to sustain a computation and i/o rate of 1.25M gas/sec, with a peak throughput of twice that. The evm gas metric was designed to be a time-proportional metric for predicting and constraining program execution. Attempting to determine a concrete comparison to pvm throughput is non-trivial and necessarily opinionated owing to the disparity between the two platforms, including word size, endianness, stack/register architecture and memory model. However, we will attempt to determine a reasonable range of values.
Evm gas does not directly translate into native execution as it also combines state reads and writes as well as transaction input data, implying it is able to process some combination of up to 595 storage reads, 57 storage writes and 1.25M computation-gas as well as 78kb input data in each second, trading one against the other.1313 13 The latest “proto-danksharding” changes allow it to accept 87.3kb/s in committed-to data though this is not directly available within state, so we exclude it from this illustration, though including it with the input data would change the results little. We cannot find any analysis of the typical breakdown between storage i/o and pure computation, so to make a very conservative estimate, we assume it does all four. In reality, we would expect it to be able to do on average of each.
Our experiments1414 14 This is detailed at {https://hackmd.io/@XXX9CM1uSSCWVNFRYaSB5g/HJarTUhJA} and intended to be updated as we get more information. show that on modern, high-end consumer hardware with a high-quality evm implementation, we can expect somewhere between 100 and 500 gas/µs in throughput on pure-compute workloads (we specifically utilized Odd-Product, Triangle-Number and several implementations of the Fibonacci calculation). To make a conservative comparison to pvm, we propose transpilation of the evm code into pvm code and then re-execution of it under the Polkavm prototype.1515 15 It is conservative since we don’t take into account that the source code was originally compiled into evm code and thus the pvm machine code will replicate architectural artifacts and thus is very likely to be pessimistic. As an example, all arithmetic operations in evm are 256-bit and 64-bit native pvm is being forced to honor this even if the source code only actually required 64-bit values.
To help estimate a reasonable lower-bound of evm gas/µs, e.g. for workloads which are more memory and i/o intensive, we look toward real-world permissionless deployments of the evm and see that the Moonbeam network, after correcting for the slowdown of executing within the recompiled WebAssembly platform on the somewhat conservative Polkadot hardware platform, implies a throughput of around 100 gas/µs. We therefore assert that in terms of computation, 1µs approximates to around 100-500 evm gas on modern high-end consumer hardware.1616 16 We speculate that the substantial range could possibly be caused in part by the major architectural differences between the evm isa and typical modern hardware.
Benchmarking and regression tests show that the prototype pvm engine has a fixed preprocessing overhead of around 5ns/byte of program code and, for arithmetic-heavy tasks at least, a marginal factor of 1.6-2% compared to evm execution, implying an asymptotic speedup of around 50-60x. For machine code 1mb in size expected to take of the order of a second to compute, the compilation cost becomes only 0.5% of the overall time. 1717 17 As an example, our odd-product benchmark, a very much pure-compute arithmetic task, execution takes 58s on evm, and 1.04s within our pvm prototype, including all preprocessing. For code not inherently suited to the 256-bit evm isa, we would expect substantially improved relative execution times on pvm, though more work must be done in order to gain confidence that these speed-ups are broadly applicable.
If we allow for preprocessing to take up to the same component within execution as the marginal cost (owing to, for example, an extremely large but short-running program) and for the pvm metering to imply a safety overhead of 2x to execution speeds, then we can expect a Jam core to be able to process the equivalent of around 1,500 evm gas/µs. Owing to the crudeness of our analysis we might reasonably predict it to be somewhere within a factor of three either way—i.e. 500-5,000 evm gas/µs.
Jam cores are each capable of 2mb/s bandwidth, which must include any state i/o and data which must be newly introduced (e.g. transactions). While writes come at comparatively little cost to the core, only requiring hashing to determine an eventual updated Merkle root, reads must be witnessed, with each one costing around 640 bytes of witness conservatively assuming a one-million entry binary Merkle trie. This would result in a maximum of a little over 3k reads/second/core, with the exact amount dependent upon how much of the bandwidth is used for newly introduced input data.
Aggregating everything across Jam, excepting accumulation which could add further throughput, numbers can be multiplied by 341 (with the caveat that each one’s computation cannot interfere with any of the others’ except through state oraclization and accumulation). Unlike for roll-up chain designs such as Polkadot and Ethereum, there is no need to have persistently fragmented state. Smart-contract state may be held in a coherent format on the Jam chain so long as any updates are made through the 8kb/core/sec work-results, which would need to contain only the hashes of the altered contracts’ state roots.
Under our modelling assumptions, we can therefore summarize:
| Eth. L1 | Jam Core | Jam | |
| Compute (evm gas/µs) | 500-5,000 | 0.15-1.5m | |
| State writes (s-1) | n/a | n/a | |
| State reads (s-1) | 4k‡ | 1.4m‡ | |
| Input data (s-1) | 78kb† | 2mb‡ | 682mb‡ |
What we can see is that Jam’s overall predicted performance profile implies it could be comparable to many thousands of that of the basic Ethereum L1 chain. The large factor here is essentially due to three things: spacial parallelism, as Jam can host several hundred cores under its security apparatus; temporal parallelism, as Jam targets continuous execution for its cores and pipelines much of the computation between blocks to ensure a constant, optimal workload; and platform optimization by using a vm and gas model which closely fits modern hardware architectures.
It must however be understood that this is a provisional and crude estimation only. It is included only for the purpose of expressing Jam’s performance in tangible terms. Specifically, it does not take into account:
-
•
that these numbers are based on real performance of Ethereum and performance modelling of Jam (though our models are based on real-world performance of the components);
-
•
any L2 scaling which may be possible with either Jam or Ethereum;
-
•
the state partitioning which uses of Jam would imply;
-
•
the as-yet unfixed gas model for the pvm;
-
•
that pvm/evm comparisons are necessarily imprecise;
-
•
(†) all figures for Ethereum L1 are drawn from the same resource: on average each figure will be only of this maximum.
-
•
(‡) the state reads and input data figures for Jam are drawn from the same resource: on average each figure will be only of this maximum.
We leave it as further work for an empirical analysis of performance and an analysis and comparison between Jam and the aggregate of a hypothetical Ethereum ecosystem which included some maximal amount of L2 deployments together with full Dank-sharding and any other additional consensus elements which they would require. This, however, is out of scope for the present work.
21. Conclusion
We have introduced a novel computation model which is able to make use of pre-existing crypto-economic mechanisms in order to deliver major improvements in scalability without causing persistent state-fragmentation and thus sacrificing overall cohesion. We call this overall pattern collect-refine-join-accumulate. Furthermore, we have formally defined the on-chain portion of this logic, essentially the join-accumulate portion. We call this protocol the Jam chain.
We argue that the model of Jam provides a novel “sweet spot”, allowing for massive amounts of computation to be done in secure, resilient consensus compared to fully-synchronous models, and yet still have strict guarantees about both timing and integration of the computation into some singleton state machine unlike persistently fragmented models.
21.1. Further Work
While we are able to estimate theoretical computation possible given some basic assumptions and even make broad comparisons to existing systems, practical numbers are invaluable. We believe the model warrants further empirical research in order to better understand how these theoretical limits translate into real-world performance. We feel a proper cost analysis and comparison to pre-existing protocols would also be an excellent topic for further work.
We can be reasonably confident that the design of Jam allows it to host a service under which Polkadot parachains could be validated, however further prototyping work is needed to understand the possible throughput which a pvm-powered metering system could support. We leave such a report as further work. Likewise, we have also intentionally omitted details of higher-level protocol elements including cryptocurrency, coretime sales, staking and regular smart-contract functionality.
A number of potential alterations to the protocol described here are being considered in order to make practical utilization of the protocol easier. These include:
-
•
Synchronous calls between services in accumulate.
-
•
Restrictions on the transfer function in order to allow for substantial parallelism over accumulation.
-
•
The possibility of reserving substantial additional computation capacity during accumulate under certain conditions.
-
•
Introducing Merklization into the Work Package format in order to obviate the need to have the whole package downloaded in order to evaluate its authorization.
The networking protocol is also left intentionally undefined at this stage and its description must be done in a follow-up proposal.
Validator performance is not presently tracked on-chain. We do expect this to be tracked on-chain in the final revision of the Jam protocol, but its specific format is not yet certain and it is therefore omitted at present.
22. Acknowledgements
Much of this present work is based in large part on the work of others. The Web3 Foundation research team and in particular Alistair Stewart and Jeff Burdges are responsible for Elves, the security apparatus of Polkadot which enables the possibility of in-core computation for Jam. The same team is responsible for Sassafras, Grandpa and Beefy.
Safrole is a mild simplification of Sassafras and was made under the careful review of Davide Galassi and Alistair Stewart.
The original CoreJam rfc was refined under the review of Bastian Köcher and Robert Habermeier and most of the key elements of that proposal have made their way into the present work.
The pvm is a formalization of a partially simplified PolkaVM software prototype, developed by Jan Bujak. Cyrill Leutwiler contributed to the empirical analysis of the pvm reported in the present work.
The PolkaJam team and in particular Arkadiy Paronyan, Emeric Chevalier and Dave Emett have been instrumental in the design of the lower-level aspects of the Jam protocol, especially concerning Merklization and i/o.
Numerous contributors to the repository since publication have helped correct errors. Thank you to all.
And, of course, thanks to the awesome Lemon Jelly, a.k.a. Fred Deakin and Nick Franglen, for three of the most beautiful albums ever produced, the cover art of the first of which was inspiration for this paper’s background art.
Appendix A Polkadot Virtual Machine
A.1. Basic Definition
We declare the general pvm function . We assume a single-step invocation function define and define the full pvm recursively as a sequence of such mutations up until the single-step mutation results in a halting condition. We additionally define the function deblob which extracts the instruction data, opcode bitmask and dynamic jump table from a pvm program blob, validates its structure, and verifies whether the given is a valid instruction counter location within the program:
| (A.1) | ||||
| (A.4) | deblob |
The pvm exit reason may be one of regular halt , panic or out-of-gas , or alternatively a host-call , in which the host-call identifier is associated, or page-fault F in which case the address into ram is associated.
In the case of a final halt, either through panic or success, the instruction counter returned is zero. In all other cases, the return value of the instruction counter indexes the one which caused the exit to happen and the machine state represents the prior state of said instruction, thus ensuring de facto consistency. In order to continue beyond these exit cases, some environmental factor must be adjusted; for a page-fault, ram must be changed, for a gas-underflow, more gas must be supplied and for a host-call, the instruction-counter must be incremented and the relevant host-call state-transition performed.
A.2. Instructions, Opcodes and Skip-distance
The pvm program blob is split into a series of octets which make up the instruction data and the opcode bitmask as well as the dynamic jump table, . The former two imply an instruction sequence, and by extension a basic-block sequence, itself a sequence of indices of the instructions which follow a block-termination instruction.
The latter, dynamic jump table, is a sequence of indices into the instruction data blob and is indexed into when dynamically-computed jumps are taken. It is encoded as a sequence of natural numbers (i.e. non-negative integers) each encoded with the same length in octets. This length, term above, is itself encoded prior.
The pvm counts instructions in octet terms (rather than in terms of instructions) and it is thus necessary to define which octets represent the beginning of an instruction, i.e. the opcode octet, and which do not. This is the purpose of , the instruction-opcode bitmask. We assert that the length of the bitmask is equal to the length of the instruction blob.
We define the Skip function skip which provides the number of octets, minus one, to the next instruction’s opcode, given the index of instruction’s opcode index into (and by extension ):
| (A.5) |
The Skip function appends with a sequence of set bits in order to ensure a well-defined result for the final instruction .
Given some instruction-index , its opcode is readily expressed as and the distance in octets to move forward to the next instruction is . However, each instruction’s “length” (defined as the number of contiguous octets starting with the opcode which are needed to fully define the instruction’s semantics) is left implicit though limited to being at most 16.
We define as being equivalent to the instructions except with an indefinite sequence of zeroes suffixed to ensure that no out-of-bounds access is possible. This effectively defines any otherwise-undefined arguments to the final instruction and ensures that a trap will occur if the program counter passes beyond the program code. Formally:
| (A.6) |
A.3. Basic Blocks and Termination Instructions
Instructions of the following opcodes are considered basic-block termination instructions; other than trap & fallthrough, they correspond to instructions which may define the instruction-counter to be something other than its prior value plus the instruction’s skip amount:
-
•
Trap and fallthrough: trap , fallthrough
-
•
Jumps: jump , jump_ind
-
•
Load-and-Jumps: load_imm_jump , load_imm_jump_ind
-
•
Branches: branch_eq , branch_ne , branch_ge_u , branch_ge_s , branch_lt_u , branch_lt_s , branch_eq_imm , branch_ne_imm
-
•
Immediate branches: branch_lt_u_imm , branch_lt_s_imm , branch_le_u_imm , branch_le_s_imm , branch_ge_u_imm , branch_ge_s_imm , branch_gt_u_imm , branch_gt_s_imm
We denote this set, as opcode indices rather than names, as , which is a subset of all valid opcode indices . We define the instruction opcode indices denoting the beginning of basic-blocks as :
| (A.7) |
For convenience we will define function which represents the start of a basic block for any within that basic block. Formally:
| (A.8) |
A.4. Single-Step State Transition
We must now define the single-step pvm state-transition function :
| (A.9) |
On the very first step of execution, and every time the execution enters a new basic block or jumps back to the beginning of the current basic block, the gas counter of the machine is updated according to the gas cost function (A.56) of the target basic block. No instruction is allowed to execute within a basic block unless the gas cost for the entire basic block has been charged in advance. In case there’s not enough gas remaining to cover the full gas cost, the execution is interrupted and the gas counter remains unchanged. Formally:
| (A.10) |
During the course of executing instructions ram may be accessed. When an index of ram below is required, the machine always panics immediately without further changes to its state regardless of the apparent (in)accessibility of the value. Otherwise, should the given index of ram not be accessible then machine state remains unchanged and the exit reason is a fault with the lowest inaccessible page address to be read. Similarly, where ram must be mutated and yet mutable access is not possible, then machine state is unchanged, and the exit reason is a fault with the lowest page address to be written which is inaccessible.
Formally, let and be the set of indices by which must be subscripted for inspection and mutation respectively in order to calculate the result of . We define the memory-access exceptional execution state as following:
| (A.11) | ||||
We define the final execution state, the value of the instruction counter, the values of the registers and the memory as follows:
| (A.12) |
We also have to adjust the state based on the executed instruction, to force a gas charge on the next step when necessary:
| (A.13) |
We define together with the posterior values of regular execution (denoted as prime) of each of the items of the machine state as being in accordance with the table below. When transitioning machine state for an instruction, a number of conditions typically hold true and instructions are defined essentially by their exceptions to these rules. Specifically, the machine does not halt, the instruction counter increments by one and ram & registers are unchanged. Formally:
| (A.14) |
We define signed/unsigned transitions for various octet widths:
| (A.15) | ||||
| (A.16) | ||||
| (A.17) | ||||
| (A.18) | ||||
| (A.19) | ||||
| (A.20) |
Immediate arguments are encoded in little-endian format with the most-significant bit being the sign bit. They may be compactly encoded by eliding more significant octets. Elided octets are assumed to be zero if the msb of the value is zero, and 255 otherwise. This allows for compact representation of both positive and negative encoded values. We thus define the signed extension function operating on an input of octets as :
| (A.21) |
Any alterations of the program counter stemming from an unconditional static jump must be to the start of a basic block or else a panic occurs. Formally:
| (A.22) |
Similarly, conditional jumps are valid if and only if both branches point to the start of a basic block (regardless of whether a branch is taken), otherwise a panic occurs:
| (A.23) |
Jumps whose next instruction is dynamically computed must use an address which may be indexed into the jump-table . Through a quirk of tooling1818 18 The popular code generation backend llvm requires and assumes in its code generation that dynamically computed jump destinations always have a certain memory alignment. Since at present we depend on this for our tooling, we must acquiesce to its assumptions., we define the dynamic address required by the instructions as the jump table index incremented by one and then multiplied by our jump alignment factor .
As with other irregular alterations to the program counter, target code index must be the start of a basic block or else a panic occurs. Formally:
| (A.24) |
A.5. Instruction Tables
We assume the skip length is well-defined:
| (A.25) |
A.5.1. Instructions without Arguments
| Name | Mutations | |
|---|---|---|
| 0 | trap | |
| 1 | fallthrough | |
| 2 | unlikely |
A.5.2. Instructions with Arguments of One Immediate
| (A.26) |
| Name | Mutations | |
|---|---|---|
| 10 | ecalli |
A.5.3. Instructions with Arguments of One Register and One Extended Width Immediate
| (A.27) |
| Name | Mutations | |
|---|---|---|
| 20 | load_imm_64 |
A.5.4. Instructions with Arguments of Two Immediates
| (A.28) | ||||||
| Name | Mutations | |
|---|---|---|
| 30 | store_imm_u8 | |
| 31 | store_imm_u16 | |
| 32 | store_imm_u32 | |
| 33 | store_imm_u64 |
A.5.5. Instructions with Arguments of One Offset
| (A.29) |
| Name | Mutations | |
|---|---|---|
| 40 | jump |
A.5.6. Instructions with Arguments of One Register & One Immediate
| (A.30) | ||||||
| Name | Mutations | |
|---|---|---|
| 50 | jump_ind | |
| 51 | load_imm | |
| 52 | load_u8 | |
| 53 | load_i8 | |
| 54 | load_u16 | |
| 55 | load_i16 | |
| 56 | load_u32 | |
| 57 | load_i32 | |
| 58 | load_u64 | |
| 59 | store_u8 | |
| 60 | store_u16 | |
| 61 | store_u32 | |
| 62 | store_u64 |
A.5.7. Instructions with Arguments of One Register & Two Immediates
| (A.31) | ||||||
| Name | Mutations | |
|---|---|---|
| 70 | store_imm_ind_u8 | |
| 71 | store_imm_ind_u16 | |
| 72 | store_imm_ind_u32 | |
| 73 | store_imm_ind_u64 |
A.5.8. Instructions with Arguments of One Register, One Immediate and One Offset
| (A.32) | ||||||
| Name | Mutations | |
|---|---|---|
| 80 | load_imm_jump | |
| 81 | branch_eq_imm | |
| 82 | branch_ne_imm | |
| 83 | branch_lt_u_imm | |
| 84 | branch_le_u_imm | |
| 85 | branch_ge_u_imm | |
| 86 | branch_gt_u_imm | |
| 87 | branch_lt_s_imm | |
| 88 | branch_le_s_imm | |
| 89 | branch_ge_s_imm | |
| 90 | branch_gt_s_imm |
A.5.9. Instructions with Arguments of Two Registers
| (A.33) | ||||||
| Name | Mutations | |
|---|---|---|
| 100 | move_reg | |
| 101 | count_set_bits_64 | |
| 102 | count_set_bits_32 | |
| 103 | leading_zero_bits_64 | |
| 104 | leading_zero_bits_32 | |
| 105 | trailing_zero_bits_64 | |
| 106 | trailing_zero_bits_32 | |
| 107 | sign_extend_8 | |
| 108 | sign_extend_16 | |
| 109 | zero_extend_16 | |
| 110 | reverse_bytes |
A.5.10. Instructions with Arguments of Two Registers & One Immediate
| (A.34) | ||||||
| Name | Mutations | |
|---|---|---|
| 120 | store_ind_u8 | |
| 121 | store_ind_u16 | |
| 122 | store_ind_u32 | |
| 123 | store_ind_u64 | |
| 124 | load_ind_u8 | |
| 125 | load_ind_i8 | |
| 126 | load_ind_u16 | |
| 127 | load_ind_i16 | |
| 128 | load_ind_u32 | |
| 129 | load_ind_i32 | |
| 130 | load_ind_u64 | |
| 131 | add_imm_32 | |
| 132 | and_imm | |
| 133 | xor_imm | |
| 134 | or_imm | |
| 135 | mul_imm_32 | |
| 136 | set_lt_u_imm | |
| 137 | set_lt_s_imm | |
| 138 | shlo_l_imm_32 | |
| 139 | shlo_r_imm_32 | |
| 140 | shar_r_imm_32 | |
| 141 | neg_add_imm_32 | |
| 142 | set_gt_u_imm | |
| 143 | set_gt_s_imm | |
| 144 | shlo_l_imm_alt_32 | |
| 145 | shlo_r_imm_alt_32 | |
| 146 | shar_r_imm_alt_32 | |
| 147 | cmov_iz_imm | |
| 148 | cmov_nz_imm | |
| 149 | add_imm_64 | |
| 150 | mul_imm_64 | |
| 151 | shlo_l_imm_64 | |
| 152 | shlo_r_imm_64 | |
| 153 | shar_r_imm_64 | |
| 154 | neg_add_imm_64 | |
| 155 | shlo_l_imm_alt_64 | |
| 156 | shlo_r_imm_alt_64 | |
| 157 | shar_r_imm_alt_64 | |
| 158 | rot_r_64_imm | |
| 159 | rot_r_64_imm_alt | |
| 160 | rot_r_32_imm | |
| 161 | rot_r_32_imm_alt |
A.5.11. Instructions with Arguments of Two Registers & One Offset
| (A.35) | ||||||
| Name | Mutations | |
|---|---|---|
| 170 | branch_eq | |
| 171 | branch_ne | |
| 172 | branch_lt_u | |
| 173 | branch_lt_s | |
| 174 | branch_ge_u | |
| 175 | branch_ge_s |
A.5.12. Instruction with Arguments of Two Registers and Two Immediates
| (A.36) | ||||||
| Name | Mutations | |
|---|---|---|
| 180 | load_imm_jump_ind |
A.5.13. Instructions with Arguments of Three Registers
| (A.37) | ||||||
| Name | Mutations | |
|---|---|---|
| 190 | add_32 | |
| 191 | sub_32 | |
| 192 | mul_32 | |
| 193 | div_u_32 | |
| 194 | div_s_32 | |
| 195 | rem_u_32 | |
| 196 | rem_s_32 | |
| 197 | shlo_l_32 | |
| 198 | shlo_r_32 | |
| 199 | shar_r_32 | |
| 200 | add_64 | |
| 201 | sub_64 | |
| 202 | mul_64 | |
| 203 | div_u_64 | |
| 204 | div_s_64 | |
| 205 | rem_u_64 | |
| 206 | rem_s_64 | |
| 207 | shlo_l_64 | |
| 208 | shlo_r_64 | |
| 209 | shar_r_64 | |
| 210 | and | |
| 211 | xor | |
| 212 | or | |
| 213 | mul_upper_s_s | |
| 214 | mul_upper_u_u | |
| 215 | mul_upper_s_u | |
| 216 | set_lt_u | |
| 217 | set_lt_s | |
| 218 | cmov_iz | |
| 219 | cmov_nz | |
| 220 | rot_l_64 | |
| 221 | rot_l_32 | |
| 222 | rot_r_64 | |
| 223 | rot_r_32 | |
| 224 | and_inv | |
| 225 | or_inv | |
| 226 | xnor | |
| 227 | max | |
| 228 | max_u | |
| 229 | min | |
| 230 | min_u |
Note that the two signed modulo operations have an idiosyncratic definition, operating as the modulo of the absolute values, but with the sign of the numerator. Formally:
| (A.38) |
Division operations always round their result towards zero. Formally:
| (A.39) |
A.6. Host Call Definition
An extended version of the pvm invocation which is able to progress an inner host-call state-machine in the case of a host-call halt condition is defined as :
| (A.40) | |||
| (A.41) | |||
| (A.42) |
As with , on exit the instruction counter references the instruction which caused the exit and the machine state is that prior to this instruction. Should the machine be invoked again using this instruction counter and code, then the same instruction which caused the exit would be executed on the proper (prior) machine state.
With , host-calls (i.e. ecalli instructions) are in effect handled internally with the state-mutator function provided as an argument, preventing the possibility of the result being a host-call fault. Note that in the case of a successful host-call transition, we must provide the new instruction counter value explicitly alongside the fresh posterior state for said instruction.
A.7. Standard Program Initialization
The software programs which will run in each of the three instances where the pvm is utilized in the main document have a very typical setup pattern characteristic of an output of a compiler and linker. This means that ram has sections for program-specific read-only data, read-write (heap) data and the stack. An adjunct to this, very typical of our usage patterns is an extra read-only section via which invocation-specific data may be passed (i.e. arguments). It thus makes sense to define this properly in a single initializer function. These sections are quantized into major zones, and one major zone is always left unallocated between sections in order to reduce accidental overrun. Sections are padded with zeroes to the nearest pvm memory page boundary.
We thus define the standard JAM program blob format , which includes not only the raw pvm program blob , but also information on the state of the ram at program start. Given JAM program blob and argument data , we can decode the pvm program blob , registers , and ram by invoking the standard initialization function :
| (A.43) |
With conditions:
| (A.44) | ||||
| (A.45) | ||||
| (A.46) | ||||
| (A.47) |
Thus, if the above conditions cannot be satisfied with unique values, then the result is , otherwise it is a tuple of as above and , such that:
| (A.48) |
| (A.49) |
A.8. Argument Invocation Definition
The three instances where the pvm is utilized each expect to be able to pass argument data in and receive some return data back. We thus define the common pvm program-argument invocation function :
| (A.50) |
Note that the first tuple item is the amount of gas consumed by the operation, but never greater than the amount of gas provided for the operation.
A.9. Gas Cost Model
The gas cost model for the pvm is a simplified model of a modern CPU microarchitecture, heavily inspired by what’s used by production-grade compilers to predict how much time a given piece of code will take. For each basic block in the program the model simulates its execution flow and computes the required number of virtual CPU cycles that would be needed to execute it.
A gas simulation state, of set , contains the current instruction counter , a cycle counter , the number of decode slots remaining in the current cycle, the maximum number of instructions which are still allowed to start execution in the current cycle, the remaining available execution units , and the reorder buffer . Formally:
| (A.51) |
A reorder buffer entry, of set , contains its current state , the number of cycles left , a set of reorder buffer indices considered its dependencies , a set of clobbered registers , and the execution units used . Formally:
| (A.52) |
A tuple of the set maps each execution unit kind () into a number:
| (A.53) |
For convenience we define addition and subtraction of two tuples of the set to be equivalent to the memberwise operation of the same kind; formally:
| (A.54) | ||||
Similarly, a comparison between two tuples of the set is true if and only if the chosen comparison holds for all corresponding members:
| (A.55) | ||||
We define the function which returns the set of source registers read by the instruction at , and the function which returns the set of destination registers written by the instruction at , as described by the equations in A.5. This is regardless of whether those registers would actually have been modified by that instruction when executed at runtime. ecalli is assumed to neither read nor write to any registers in this model.
We also define the function which returns the number of cycles the instruction at needs to finish execution, which returns the number of decoding slots necessary to decode it, and which returns the number of virtual CPU execution units required to start its execution. These simply return the values as specified in A.10.
The gas cost of a given basic block starting at instruction opcode index and given the instruction data and the opcode bitmask is defined by the number of virtual CPU cycles as determined by the gas cost model transition function, up until every instruction of the basic block it has ingested has been retired and the simulation has converged. Formally:
| (A.56) |
The gas cost model simulation function is defined as follows:
| (A.57) |
The state transition function which decodes the instructions without triggering the virtual CPU pipeline simulation is defined as follows:
| (A.58) |
The move_reg instruction is special-cased to be handled by the frontend of our virtual CPU, without being added to the reorder buffer:
| (A.59) |
Every other instruction is fully decoded and added to the reorder buffer as follows:
| (A.60) |
The state transition function which starts the execution of the next pending instruction is defined as follows:
| (A.61) |
The function which checks which instruction inside of the reorder buffer is ready to start executing (and whether such an instruction even exists) is defined as follows:
| (A.62) |
The state transition function which simulates the rest of the virtual CPU pipeline is defined as follows:
| (A.63) |
A.10. Gas Cost Tables
For some of the instructions their cost depends on whether the destination register overlaps with any of the source registers:
| (A.64) |
For non-immediate shift and rotate instructions only the first source register matters:
| (A.65) |
The cost of memory accesses is defined as follows:
| (A.66) |
The cost of a branch depends on whether any of its targets (either the jump target or the implicit fallthrough) point to an instruction byte which is equal to the opcode for the unlikely or the trap instruction; formally:
| (A.67) |
In the following table the , , and arguments are omitted for clarity.
| Instruction | |||||||
|---|---|---|---|---|---|---|---|
| move_reg | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| and | 1 | 1 | 0 | 0 | 0 | 0 | |
| xor | 1 | 1 | 0 | 0 | 0 | 0 | |
| or | 1 | 1 | 0 | 0 | 0 | 0 | |
| add_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| sub_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| add_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| sub_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| and_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| xor_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| or_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| add_imm_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shlo_r_imm_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shar_r_imm_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shlo_l_imm_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| rot_r_64_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| reverse_bytes | 1 | 1 | 0 | 0 | 0 | 0 | |
| add_imm_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shlo_r_imm_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shar_r_imm_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shlo_l_imm_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| rot_r_32_imm | 2 | 1 | 0 | 0 | 0 | 0 | |
| count_set_bits_64 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| count_set_bits_32 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| leading_zero_bits_64 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| leading_zero_bits_32 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| sign_extend_8 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| sign_extend_16 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| zero_extend_16 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| trailing_zero_bits_64 | 2 | 1 | 2 | 0 | 0 | 0 | 0 |
| trailing_zero_bits_32 | 2 | 1 | 2 | 0 | 0 | 0 | 0 |
| shlo_l_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shlo_r_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shar_r_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| rot_l_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| rot_r_64 | 1 | 1 | 0 | 0 | 0 | 0 | |
| shlo_l_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shlo_r_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shar_r_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| rot_l_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| rot_r_32 | 2 | 1 | 0 | 0 | 0 | 0 | |
| shlo_l_imm_alt_64 | 1 | 3 | 1 | 0 | 0 | 0 | 0 |
| shlo_r_imm_alt_64 | 1 | 3 | 1 | 0 | 0 | 0 | 0 |
| shar_r_imm_alt_64 | 1 | 3 | 1 | 0 | 0 | 0 | 0 |
| rot_r_64_imm_alt | 1 | 3 | 1 | 0 | 0 | 0 | 0 |
| shlo_l_imm_alt_32 | 2 | 4 | 1 | 0 | 0 | 0 | 0 |
| shlo_r_imm_alt_32 | 2 | 4 | 1 | 0 | 0 | 0 | 0 |
| shar_r_imm_alt_32 | 2 | 4 | 1 | 0 | 0 | 0 | 0 |
| rot_r_32_imm_alt | 2 | 4 | 1 | 0 | 0 | 0 | 0 |
| set_lt_u | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| set_lt_s | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| set_lt_u_imm | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| set_lt_s_imm | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| set_gt_u_imm | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| set_gt_s_imm | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
| cmov_iz | 2 | 2 | 1 | 0 | 0 | 0 | 0 |
| cmov_nz | 2 | 2 | 1 | 0 | 0 | 0 | 0 |
| cmov_iz_imm | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
| cmov_nz_imm | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
| max | 3 | 1 | 0 | 0 | 0 | 0 | |
| max_u | 3 | 1 | 0 | 0 | 0 | 0 | |
| min | 3 | 1 | 0 | 0 | 0 | 0 | |
| min_u | 3 | 1 | 0 | 0 | 0 | 0 | |
| load_ind_u8 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_i8 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_u16 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_i16 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_u32 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_i32 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_ind_u64 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_u8 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_i8 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_u16 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_i16 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_u32 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_i32 | 1 | 1 | 1 | 0 | 0 | 0 | |
| load_u64 | 1 | 1 | 1 | 0 | 0 | 0 | |
| store_imm_ind_u8 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_ind_u16 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_ind_u32 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_ind_u64 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_ind_u8 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_ind_u16 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_ind_u32 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_ind_u64 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_u8 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_u16 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_u32 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_imm_u64 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_u8 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_u16 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_u32 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| store_u64 | 25 | 1 | 1 | 0 | 1 | 0 | 0 |
| branch_eq | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ne | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_lt_u | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_lt_s | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ge_u | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ge_s | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_eq_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ne_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_lt_u_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_le_u_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ge_u_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_gt_u_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_lt_s_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_le_s_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_ge_s_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| branch_gt_s_imm | 1 | 1 | 0 | 0 | 0 | 0 | |
| div_u_32 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| div_s_32 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| rem_u_32 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| rem_s_32 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| div_u_64 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| div_s_64 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| rem_u_64 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| rem_s_64 | 60 | 4 | 1 | 0 | 0 | 0 | 1 |
| and_inv | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
| or_inv | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
| xnor | 2 | 1 | 0 | 0 | 0 | 0 | |
| neg_add_imm_64 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
| neg_add_imm_32 | 3 | 4 | 1 | 0 | 0 | 0 | 0 |
| load_imm | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| load_imm_64 | 1 | 2 | 0 | 0 | 0 | 0 | 0 |
| mul_64 | 3 | 1 | 0 | 0 | 1 | 0 | |
| mul_32 | 4 | 1 | 0 | 0 | 1 | 0 | |
| mul_imm_64 | 3 | 1 | 0 | 0 | 1 | 0 | |
| mul_imm_32 | 4 | 1 | 0 | 0 | 1 | 0 | |
| mul_upper_s_s | 4 | 4 | 1 | 0 | 0 | 1 | 0 |
| mul_upper_u_u | 4 | 4 | 1 | 0 | 0 | 1 | 0 |
| mul_upper_s_u | 6 | 4 | 1 | 0 | 0 | 1 | 0 |
| trap | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| fallthrough | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| unlikely | 40 | 1 | 0 | 0 | 0 | 0 | 0 |
| jump | 15 | 1 | 0 | 0 | 0 | 0 | 0 |
| load_imm_jump | 15 | 1 | 0 | 0 | 0 | 0 | 0 |
| jump_ind | 22 | 1 | 0 | 0 | 0 | 0 | 0 |
| load_imm_jump_ind | 22 | 1 | 0 | 0 | 0 | 0 | 0 |
| ecalli | 100 | 4 | 1 | 0 | 0 | 0 | 0 |
Appendix B Virtual Machine Invocations
We now define the three practical instances where we wish to invoke a pvm instance as part of the protocol. In general, we avoid introducing unbounded data as part of the basic invocation arguments in order to minimize the chance of an unexpectedly large ram allocation, which could lead to gas inflation and unavoidable underflow. This makes for a more cumbersome interface, but one which is more predictable and easier to reason about.
B.1. Host-Call Result Constants
- :
-
The return value indicating an item does not exist.
- :
-
Name unknown.
- :
-
The inner pvm memory index provided for reading/writing is not accessible.
- :
-
Index unknown.
- :
-
Storage full or resource already allocated.
- :
-
Core index unknown.
- :
-
Insufficient funds.
- :
-
Gas limit too low.
- :
-
The operation is invalid. For example, the item is already solicited, cannot be forgotten, or the service is insufficiently privileged.
- :
-
The return value indicating general success.
Inner pvm invocations have their own set of result codes:
- :
-
The invocation completed and halted normally.
- :
-
The invocation completed with a panic.
- :
-
The invocation completed with a page fault.
- :
-
The invocation completed with a host-call fault.
- :
-
The invocation completed by running out of gas.
Note return codes for a host-call-request exit are any non-zero value less than .
B.2. Is-Authorized Invocation
The Is-Authorized invocation is the first and simplest of the three, being totally stateless. It provides only host-call functions for inspecting its environment and parameters. It accepts as arguments only the core on which it should be executed, . Formally, it is defined as :
| (B.1) | ||||
| (B.2) |
Note for the Is-Authorized host-call dispatch function in equation B.2, we elide the host-call context since, being essentially stateless, it is always .
B.3. Refine Invocation
We define the Refine service-account invocation function as . It has no general access to the state of the Jam chain, with the slight exception being the ability to make a historical lookup. Beyond this it is able to create inner instances of the pvm and dictate pieces of data to export.
The historical-lookup host-call function, , is designed to give the same result regardless of the state of the chain for any time when auditing may occur (which we bound to be less than two epochs from being accumulated). The lookup anchor may be up to timeslots before the recent history and therefore adds to the potential age at the time of audit. We therefore set to have a safety margin of eight hours:
| (B.3) |
The inner pvm invocation host-calls, meanwhile, depend on an integrated pvm type, which we shall denote . It holds a pvm program blob, instruction counter, ram and a flag denoting whether gas was already charged for the currently executing basic block:
| (B.4) |
The Export host-call depends on two pieces of context; one sequence of segments (blobs of length ) to which it may append, and the other an argument passed to the invocation function to dictate the number of segments prior which may assumed to have already been appended. The latter value ensures that an accurate segment index can be provided to the caller.
Unlike the other invocation functions, the Refine invocation function implicitly draws upon some recent service account state item . The specific block from which this comes is not important, as long as it is no earlier than its work-package’s lookup-anchor block. It explicitly accepts the work-package and the index of the work item to be refined, together with the core which is doing the refining . Additionally, the authorizer trace is provided together with all work items’ import segments and an export segment offset . It results in a tuple of some error or the refinement output blob (signalling success), the export sequence in the case of success and the gas used in evaluation. Formally:
| (B.5) | ||||
| (B.6) |
B.4. Accumulate Invocation
Since this is a transition which can directly affect a substantial amount of on-chain state, our invocation context is accordingly complex. It is a tuple with elements for each of the aspects of state which can be altered through this invocation and beyond the account of the service itself includes the deferred transfer list and several dictionaries for alterations to preimage lookup state, core assignments, validator key assignments, newly created accounts and alterations to account privilege levels.
Formally, we define our result context to be , and our invocation context to be a pair of these contexts, (and thus for any value there exists ), with one dimension being the regular dimension and generally named and the other being the exceptional dimension and being named . The only function which actually alters this second dimension is , and so it is rarely seen.
| (B.7) | ||||
| (B.8) |
We define a convenience equivalence to easily denote the accumulating service account.
We track both regular and exceptional dimensions within our context mutator, but collapse the result of the invocation to one or the other depending on whether the termination was regular or exceptional (i.e. out-of-gas or panic).
We define , the Accumulation invocation function as:
| (B.9) | ||||
| (B.10) | ||||
| (B.11) | ||||
| (B.12) | ||||
| (B.13) |
The mutator governs how this context will alter for any given parameterization, and the collapse function selects one of the two dimensions of context depending on whether the virtual machine’s halt was regular or exceptional.
The initializer function maps some partial state along with a service account index to yield a mutator context such that no alterations to the given state are implied in either exit scenario. Note that the component utilizes the random accumulator and the block’s timeslot to create a deterministic sequence of identifiers which are extremely likely to be unique.
Concretely, we create the identifier from the Blake2 hash of the identifier of the creating service, the current random accumulator and the block’s timeslot. Thus, within a service’s accumulation it is almost certainly unique, but it is not necessarily unique across all services, nor at all times in the past. We utilize a check function to find the first such index in this sequence which does not already represent a service:
| (B.14) |
nb In the highly unlikely event that a block executes to find that a single service index has inadvertently been attached to two different services, then the block is considered invalid. Since no service can predict the identifier sequence ahead of time, they cannot intentionally disadvantage the block author.
e
B.5. General Functions
We come now to defining the host functions which are utilized by the pvm invocations. Generally, these map some pvm state, including invocation context, possibly together with some additional parameters, to a new pvm state.
The general functions are all broadly of the form . Functions which have a result component which is equivalent to the corresponding argument may have said components elided in the description. Functions may also depend upon particular additional parameters.
Unlike the Accumulate functions in appendix B.7, these do not mutate an accumulation context. Some, such as mutate a service account and both accept and return some . Others are more general functions, such as and do not assume any context but have a parameter list suffixed with an ellipsis to denote that the context parameter may be taken and is provided transparently into its result. This allows it to be easily utilized in multiple pvm invocations.
Elements of pvm state are each assumed to remain unchanged by the host-call unless explicitly specified.
| (B.15) | ||||
| (B.16) |
Memory-sized gas for host-calls is given by a rate (gas per 1024 octets) and size (octets). The following defines the gas cost for such a size:
| (B.17) |
With base cost , total gas is . The host-call table below uses this formula for all memory-sized terms.
| Function | |
|---|---|
| Identifier | |
| Gas usage () | Mutations |
| gas = 0 | |
| grow_heap = 1 | |
| fetch = 2 | |
| lookup = 3 | |
| read = 4 | |
| write = 5 | |
| info = 6 | |
B.6. Refine Functions
These assume some refine context pair , which are both initially empty. Other than the gas-counter which is explicitly defined, elements of pvm state are each assumed to remain unchanged by the host-call unless explicitly specified.
| (B.18) | ||||
| (B.19) |
| Function | |
|---|---|
| Identifier | |
| Gas usage () | Mutations |
| historical_lookup = 7 | |
| export = 8 | |
| machine = 9 | |
| peek = 10 | |
| poke = 11 | |
| pages = 12 | |
| invoke = 13 | |
| expunge = 14 | |
B.7. Accumulate Functions
This defines a number of functions broadly of the form . Functions which have a result component which is equivalent to the corresponding argument may have said components elided in the description. Functions may also depend upon particular additional parameters.
Other than the gas-counter which is explicitly defined, elements of pvm state are each assumed to remain unchanged by the host-call unless explicitly specified.
| (B.20) | ||||
| (B.21) |
| Function | |
|---|---|
| Identifier | |
| Gas usage () | Mutations |
| bless = 15 | |
| assign = 16 | |
| designate = 17 | |
| checkpoint = 18 | |
| new = 19 | |
| upgrade = 20 | |
| transfer = 21 | |
| eject = 22 | |
| query = 23 | |
| solicit = 24 | |
| forget = 25 | |
| yield = 26 | |
| provide = 27 | |
i
Appendix C Serialization Codec
C.1. Common Terms
Our codec function is used to serialize some term into a sequence of octets. We define the deserialization function as the inverse of and able to decode some sequence into the original value. The codec is designed such that exactly one value is encoded into any given sequence of octets, and in cases where this is not desirable then we use special codec functions.
C.1.1. Trivial Encodings
We define the serialization of as the empty sequence:
| (C.1) |
We also define the serialization of an octet-sequence as itself:
| (C.2) |
We define anonymous tuples to be encoded as the concatenation of their encoded elements:
| (C.3) |
Passing multiple arguments to the serialization functions is equivalent to passing a tuple of those arguments. Formally:
| (C.4) |
We define general natural number serialization, able to encode naturals of up to , as:
| (C.5) |
C.1.2. Sequence Encoding
We define the sequence serialization function for any which is itself a subset of the domain of . We simply concatenate the serializations of each element in the sequence in turn:
| (C.6) |
Thus, conveniently, fixed length octet sequences (e.g. hashes and its variants) have an identity serialization.
C.1.3. Discriminator Encoding
When we have sets of heterogeneous items such as a union of different kinds of tuples or sequences of different length, we require a discriminator to determine the nature of the encoded item for successful deserialization. Discriminators are encoded as a natural and are encoded immediately prior to the item.
We generally use a length discriminator when serializing sequence terms which have variable length (e.g. general blobs or unbound numeric sequences ) (though this is omitted in the case of fixed-length terms such as hashes ).1919 19 Note that since specific values may belong to both sets which would need a discriminator and those that would not then we are sadly unable to introduce a function capable of serializing corresponding to the term’s limitation. A more sophisticated formalism than basic set-theory would be needed, capable of taking into account not simply the value but the term from which or to which it belongs in order to do this succinctly. In this case, we simply prefix the term its length prior to encoding. Thus, for some term , we would generally define its serialized form to be . To avoid repetition of the term in such cases, we define the notation to mean that the term of value is variable in size and requires a length discriminator. Formally:
| (C.7) |
We also define a convenient discriminator operator specifically for terms defined by some serializable set in union with (generally denoted for some set as ):
| (C.8) |
C.1.4. Bit Sequence Encoding
A sequence of bits is a special case since encoding each individual bit as an octet would be very wasteful. We instead pack the bits into octets in order of least significant to most, and arrange into an octet stream. In the case of a variable length sequence, then the length is prefixed as in the general case.
| (C.9) |
C.1.5. Dictionary Encoding
In general, dictionaries are placed in the Merkle trie directly (see appendix E for details). However, small dictionaries may reasonably be encoded as a sequence of pairs ordered by the key. Formally:
| (C.10) |
C.1.6. Set Encoding
For any values which are sets and don’t already have a defined encoding above, we define the serialization of a set as the serialization of the set’s elements in proper order. Formally:
| (C.11) |
C.1.7. Fixed-length Integer Encoding
We first define the trivial natural number serialization functions which are subscripted by the number of octets of the final sequence. Values are encoded in a regular little-endian fashion. This is utilized for almost all integer encoding across the protocol. Formally:
| (C.12) |
For non-natural arguments, corresponds to the definitions of , except that recursive elements are made as rather than . Thus:
| (C.13) | ||||
| (C.14) | ||||
| (C.15) |
And so on.
C.2. Block Serialization
A block is serialized as a tuple of its elements in regular order, as implied in equations 4.2, 4.3 and 5.1. For the header, we define both the regular serialization and the unsigned serialization . Formally:
| (C.16) | ||||
| (C.17) | ||||
| (C.18) | ||||
| (C.19) | ||||
| (C.20) | ||||
| (C.21) | ||||
| (C.22) | ||||
| (C.23) | ||||
| (C.24) | ||||
| (C.25) | ||||
| (C.26) | ||||
| (C.27) | ||||
| (C.28) | ||||
| (C.29) | ||||
| (C.30) | ||||
| (C.31) | ||||
| (C.32) | ||||
| (C.33) | ||||
| (C.34) | ||||
| (C.35) | ||||
| (C.36) | ||||
| (C.37) |
Note the use of above to succinctly encode the result of a work item and the slight transformations of e.g. and to take account of the fact their inner tuples contain variable-length sequence terms and which need length discriminators.
Appendix D State Merklization
The Merklization process defines a cryptographic commitment from which arbitrary information within state may be provided as being authentic in a concise and swift fashion. We describe this in two stages; the first defines a mapping from 31-octet sequences to (unlimited) octet sequences in a process called state serialization. The second forms a 32-octet commitment from this mapping in a process called Merklization.
D.1. Serialization
The serialization of state primarily involves placing all the various components of into a single mapping from 31-octet sequence state-keys to octet sequences of indefinite length. The state-key is constructed from a hash component and a chapter component, equivalent to either the index of a state component or, in the case of the inner dictionaries of , a service index.
We define the state-key constructor functions as:
| (D.1) |
The state serialization is then defined as the dictionary built from the amalgamation of each of the components. Cryptographic hashing ensures that there will be no duplicate state-keys given that there are no duplicate inputs to . Formally, we define which transforms some state into its serialized form:
| (D.2) |
Note that most rows describe a single mapping between a key derived from a natural and the serialization of a state component. However, the final four rows each define sets of mappings since these items act over all service accounts and in the case of the final three rows, the keys of a nested dictionary with the service.
Also note that all non-discriminator numeric serialization in state is done in fixed-length according to the size of the term.
Finally, be aware that Jam does not allow service storage keys to be directly inspected or enumerated. Thus the key values themselves are not required to be known by implementations, and only the Merklisation-ready serialisation is important, which is a fixed-size hash (alongside the service index and item marker). Implementations are free to use this fact in order to avoid storing the keys themselves.
D.2. Merklization
With defined, we now define the rest of which primarily involves transforming the serialized mapping into a cryptographic commitment. We define this commitment as the root of the binary Patricia Merkle Trie with a format optimized for modern compute hardware, primarily by optimizing sizes to fit succinctly into typical memory layouts and reducing the need for unpredictable branching.
D.2.1. Node Encoding and Trie Identification
We identify (sub-)tries as the hash of their root node, with one exception: empty (sub-)tries are identified as the zero-hash, .
Nodes are fixed in size at 512 bit (64 bytes). Each node is either a branch or a leaf. The first bit discriminate between these two types.
In the case of a branch, the remaining 511 bits are split between the two child node hashes, using the last 255 bits of the 0-bit (left) sub-trie identity and the full 256 bits of the 1-bit (right) sub-trie identity.
Leaf nodes are further subdivided into embedded-value leaves and regular leaves. The second bit of the node discriminates between these.
In the case of an embedded-value leaf, the remaining 6 bits of the first byte are used to store the embedded value size. The following 31 bytes are dedicated to the state key. The last 32 bytes are defined as the value, filling with zeroes if its length is less than 32 bytes.
In the case of a regular leaf, the remaining 6 bits of the first byte are zeroed. The following 31 bytes store the state key. The last 32 bytes store the hash of the value.
Formally, we define the encoding functions and :
| (D.3) | ||||
| (D.4) |
We may then define the basic Merklization function as:
| (D.5) | ||||
| (D.6) |
Appendix E General Merklization
E.1. Binary Merkle Trees
The Merkle tree is a cryptographic data structure yielding a hash commitment to a specific sequence of values. It provides computation and proof size for inclusion. This well-balanced formulation ensures that the maximum depth of any leaf is minimal and that the number of leaves at that depth is also minimal.
The underlying function for our Merkle trees is the node function , which accepts some sequence of blobs of some length and provides either such a blob back or a hash:
| (E.1) |
The astute reader will realize that if our happens to be equivalent then this function will always evaluate into . That said, for it to be secure care must be taken to ensure there is no possibility of preimage collision. For this purpose we include the hash prefix $node to minimize the chance of this; simply ensure any items are hashed with a different prefix and the system can be considered secure.
We also define the trace function , which returns each opposite node from top to bottom as the tree is navigated to arrive at some leaf corresponding to the item of a given index into the sequence. It is useful in creating justifications of data inclusion.
| (E.2) |
From this we define our other Merklization functions.
E.1.1. Well-Balanced Tree
We define the well-balanced binary Merkle function as :
| (E.3) |
This is suitable for creating proofs on data which is not much greater than 32 octets in length since it avoids hashing each item in the sequence. For sequences with larger data items, it is better to hash them beforehand to ensure proof-size is minimal since each proof will generally contain a data item.
Note: In the case that no hash function argument is supplied, we may assume Blake 2b.
E.1.2. Constant-Depth Tree
We define the constant-depth binary Merkle function as . We define two corresponding functions for working with subtree pages, and . The latter provides a single page of leaves, themselves hashed, prefixed data. The former provides the Merkle path to a single page. Both assume size-aligned pages of size and accept page indices.
| (E.4) | ||||
| (E.5) | ||||
| (E.6) |
For the latter justification to be acceptable, we must assume the target observer also knows not merely the value of the item at the given index, but also all other leaves within its size subtree, given by .
As above, we may assume a default value for of Blake 2b.
For justifications and Merkle root calculations, a constancy preprocessor function is applied which hashes all data items with a fixed prefix “leaf” and then pads the overall size to the next power of two with the zero hash :
| (E.7) |
E.2. Merkle Mountain Ranges and Belts
The Merkle Mountain Range (mmr) is an append-only cryptographic data structure which yields a commitment to a sequence of values. Appending to an mmr and proof of inclusion of some item within it are both in time and space for the size of the set.
We define a Merkle Mountain Range as being within the set , a sequence of peaks, each peak the root of a Merkle tree containing items where is the index in the sequence. Since we support set sizes which are not always powers-of-two-minus-one, some peaks may be empty, rather than a Merkle root.
Since the sequence of hashes is somewhat unwieldy as a commitment, Merkle Mountain Ranges are themselves generally hashed before being published. Hashing them removes the possibility of further appending so the range itself is kept on the system which needs to generate future proofs.
We define the mmb append function as:
| (E.8) | ||||
We define the mmr encoding function as :
| (E.9) |
We define the mmr super-peak function as :
| (E.10) |
Appendix F Shuffling
The Fisher-Yates shuffle function is defined formally as:
| (F.1) |
Since it is often useful to shuffle a sequence based on some random seed in the form of a hash, we provide a secondary form of the shuffle function which accepts a 32-byte hash instead of the numeric sequence. We define , the numeric-sequence-from-hash function, thus:
| (F.2) | ||||
| (F.3) |
Appendix G Bandersnatch VRF
The Bandersnatch curve is defined by [24].
The singly-contextualized Bandersnatch Schnorr-like signatures are defined as a formulation under the IETF vrf template specified by [18] (as IETF VRF) and further detailed by [15].
| (G.1) | ||||
| (G.2) |
The singly-contextualized Bandersnatch Ringvrf proofs are a zk-snark-enabled analogue utilizing the Pedersen vrf, also defined by [18] and further detailed by [6].
| (G.3) | ||||
| (G.4) | ||||
| (G.5) |
Note that in the case a key has no corresponding Bandersnatch point when constructing the ring, then the Bandersnatch padding point as stated by [18] should be substituted.
Appendix H Erasure Coding
The foundation of the data-availability and distribution system of Jam is a systematic Reed-Solomon erasure coding function in gf(), the same transform as done by the algorithm of [23]. The coding rate is :, where is the number of validators.
| (H.1) |
This rate is derived from the fact we wish to be able to reconstruct even should almost two-thirds of the validators be malicious or incapacitated, the 16-bit Galois field on which the erasure-code is based and the desire to support, for simplicity, encoding segments of size without padding. The rate is optimal, i.e. there is no more redundancy than necessary, when . This occurs when . Note that the rate is least efficient when is slightly below one of these values. For example, when , the rate is approximately 1:4.5. It is recommended, for efficiency of the data availability system, that validator set sizes be chosen from the given set.
We use a little-endian form of the 16-bit gf points with a functional equivalence given by . From this we may assume the encoding function and the recovery function . Encoding is done by extrapolating a data blob of size octets (provided in here as octet pairs) into octet pairs. Recovery is done by collecting together any distinct octet pairs, together with their indices, and transforming this into the original sequence of octet pairs.
Practically speaking, this allows for the efficient encoding and recovery of data whose size is a multiple of octets. Data whose length is not divisible by must be padded (we pad with zeroes). We use this erasure-coding in two contexts within the Jam protocol; one where we encode variable sized (but typically very large) data blobs for the Audit da and block-distribution system, and the other where we encode much smaller fixed-size data segments for the Import da system.
For the Import da system, we deal with an input size of octets resulting in data-parallelism of order 6 with 1023 validators. We may attain a greater degree of data parallelism if encoding or recovering more than one segment at a time though for recovery, we may be restricted to requiring each segment to be formed from the same set of indices (depending on the specific algorithm).
H.1. Blob Encoding and Recovery
We assume validators and some data blob . This blob is split into a whole number of pieces, each a sequence of octet pairs. Each piece is erasure-coded using as above to give octet pairs per piece.
The resulting matrix is grouped by its pair-index and concatenated to form chunks, each of octet-pairs. Any of these chunks may then be used to reconstruct the original data .
Formally we begin by defining two utility functions for splitting some large sequence into a number of equal-sized sub-sequences and for reconstituting such subsequences back into a single large sequence:
| (H.2) | ||||
| (H.3) |
We define the transposition operator hence:
| (H.4) |
We may then define our erasure-code chunking function which accepts an arbitrary sized data blob whose length divides wholly into octets and results in a sequence of smaller blobs:
| (H.5) |
The original data may be reconstructed with any of the resultant items (along with their indices). If the original items are known then reconstruction is just their concatenation.
| (H.6) |
Segment encoding/decoding may be done using the same functions albeit with a fixed . Note that the definition of ensures this is always an integer, i.e. no padding is required.
H.2. Code Word representation
For the sake of brevity we call each octet pair a word. The code words (including the message words) are treated as element of finite field. The field is generated as an extension of using the irreducible polynomial:
| (H.7) |
Hence:
| (H.8) |
We name the generator of , the root of the above polynomial, as such: .
Instead of using the standard basis , we opt for a representation of which performs more efficiently for the encoding and the decoding process. To that aim, we name this specific representation of as and define it as a vector space generated by the following Cantor basis:
Every message word consists of 16 bits. As such it could be regarded as binary vector of length 16:
| (H.9) |
Where is the least significant bit of message word . Accordingly we consider the field element to represent that message word.
Similarly, we assign a unique index to each validator between 0 and and we represent validator with the field element:
| (H.10) |
where is the binary representation of .
H.3. The Generator Polynomial
To erasure code a message of words into code words, we represent each message as a field element as described in previous section and we interpolate the polynomial of maximum degree which satisfies the following equalities:
| (H.11) |
After finding with such properties, we evaluate at the following points:
| (H.12) |
We then distribute the message words and the extra code words among the validators according to their corresponding indices.
Appendix I Index of Notation
I.1. Sets
I.1.1. Regular Notation
- :
-
The set of finite fields.
- :
-
The set of non-negative integers. Subscript denotes one greater than the maximum. See section 3.4.
- :
-
The set of positive integers (not including zero).
- :
-
The set of balance values. Equivalent to . See equation 4.21.
- :
-
The set of unsigned gas values. Equivalent to . See equation 4.23.
- :
-
The set of blob length values. Equivalent to . See section 3.4.
- :
-
The set of register values. Equivalent to . See equation 4.23.
- :
-
The set from which service indices are drawn. Equivalent to . See section 9.1.
- :
-
The set of timeslot values. Equivalent to . See equation 4.28.
- :
-
The set of rational numbers. Unused.
- :
I.1.2. Custom Notation
- :
-
The set of dictionaries making a partial bijection of domain to range . See section 3.5.
- :
-
The set of service ccounts. See equation 9.3.
- :
-
The set of itstrings (Boolean sequences). Subscript denotes length. See section 3.7.
- :
- :
-
The set of work-ontexts. See equation 11.4. Not used as the set of complex numbers.
- :
-
The set of work-igests. See equation 11.6.
- :
-
The set of work execution rrors. See equation 11.7.
- :
-
The set of work-report uarantees. See equation 11.24.
- :
- :
-
The set representing the state of an nner pvm instance. See equation B.4.
- :
-
The set of data segments, equivalent to . See equation 14.1.
- :
-
The set of validator ey-sets. See equation 6.7.
- :
-
The set representing implications of accumulation. See equation B.7.
- :
-
The set of pvm emory (ram) states. See equation 4.24.
- :
-
The set of single-service accumulation utputs. See equation 12.22.
- :
-
The set of work-ackages. See equation 14.2.
- :
-
The set of work-eports. See equation 11.2. Note used for the set of real numbers.
- :
-
The set representating a portion of overall tate, used during accumulation. See equation 12.15.
- :
-
The set of slot-sealer ickets. See equation 6.6.
- :
-
The information concerning a single work-item once prepared as an operand for the accumulation function. See equation 12.13.
- :
-
The set of alidator key-set sizes. See equation 6.8.
- :
-
The set of alidly readable indices for pvm ram . See appendix A.
- :
-
The set of alidly writable indices for pvm ram . See appendix A.
- :
-
The set of alid Ed25519 signatures of the key and message . A subset of . See section 3.8.
- :
-
The set of alid Bandersnatch signatures of the public key , context and message . A subset of . See section 3.8.
- :
-
The set of alid Bandersnatch Ringvrf proofs of the root , context and message . A subset of . See section 3.8.
- :
-
The set of ork items. See equation 14.3.
- :
-
The set of deferred transfers. See equation 12.14.
- :
-
The set of availability specifications. See equation 11.5.
I.2. Functions
- :
- :
-
The historical lookup function. See equation LABEL:eq:historicallookup.
- :
-
The work-report computation function. See equation 14.13.
- :
-
The bundle assembly function. See equation 14.16.
- :
- :
-
The key-nullifier function. See equation 6.15.
- :
-
The whole-program pvm machine state-transition function. See equation A.
- :
-
The single-step (pvm) machine state-transition function. See appendix A.
- :
-
The Accumulate pvm invocation function. See appendix B.
- :
-
The host-function invocation (pvm) with host-function marshalling. See appendix A.
- :
-
The Is-Authorized pvm invocation function. See appendix B.
- :
-
The marshalling whole-program pvm machine state-transition function. See appendix A.
- :
-
The Refine pvm invocation function. See appendix B.
- :
-
Virtual machine host-call functions. See appendix B.
- :
-
Assign-core host-call.
- :
-
Empower-service host-call.
- :
-
Checkpoint host-call.
- :
-
Designate-validators host-call.
- :
-
Export segment host-call.
- :
-
Forget-preimage host-call.
- :
-
Gas-remaining host-call.
- :
-
Historical-lookup-preimage host-call.
- :
-
Information-on-service host-call.
- :
-
Eject-service host-call.
- :
-
Kickoff-pvm host-call.
- :
-
Lookup-preimage host-call.
- :
-
Make-pvm host-call.
- :
-
New-service host-call.
- :
-
Poke-pvm host-call.
- :
-
Peek-pvm host-call.
- :
-
Query-preimage host-call.
- :
-
Read-storage host-call.
- :
-
Solicit-preimage host-call.
- :
-
Transfer host-call.
- :
-
Upgrade-service host-call.
- :
-
Write-storage host-call.
- :
-
Expunge-pvm host-call.
- :
-
Fetch data host-call.
- :
-
Pages inner-pvm memory host-call.
- :
-
Yield accumulation trie result host-call.
- :
-
Provide preimage host-call.
- :
-
Grow heap host-call.
I.3. Utilities, Externalities and Standard Functions
- :
-
The Merkle mountain range append function. See equation E.8.
- :
-
The octets-to-bits function for octets. Superscripted -1 to denote the inverse. See equation A.17.
- :
-
The erasure-coding function for validators and pieces. See equation H.5.
- :
-
The original-chunk-count function. See equation H.1.
- :
-
The octet-sequence encode function. Superscripted -1 to denote the inverse. See appendix C.
- :
-
The Fisher-Yates shuffle function. See equation F.1.
- :
-
The memory-sized gas function. See equation B.17.
- :
-
The Blake 2b 256-bit hash function. See section 3.8.
- :
-
The Keccak 256-bit hash function. See section 3.8.
- :
-
The justification path to a specific size page of a constant-depth Merkle tree. See equation E.5.
- :
-
The domain, or set of keys, of a dictionary. See section 3.5.
- :
-
The size page function for a constant-depth Merkle tree. See equation E.6.
- :
-
The constant-depth binary Merklization function. See appendix E.
- :
-
The well-balanced binary Merklization function. See appendix E.
- :
-
The state Merklization function. See appendix D.
- :
- :
-
The octet-array zero-padding function. See equation 14.19.
- :
-
The numeric-sequence-from-hash function. See equation F.3.
- :
-
The erasure-coding recovery function for validators and pieces. See equation H.6.
- :
-
The Ed25519 signing function. See section 3.8.
- :
-
The bls signing function. See section 3.8.
- :
-
The current time expressed in seconds after the start of the Jam Common Era. See section 4.4.
- :
-
The substitute-if-nothing function. See equation 3.2.
- :
-
The range, or set of values, of a dictionary or sequence. See section 3.5.
- :
-
The signed-extension function for a value in . See equation A.21.
- :
- :
-
The into-signed function for a value in . Superscripted with -1 to denote the inverse. See equation A.15.
I.4. Values
I.4.1. Block-context Terms
These terms are all contextualized to a single block. They may be superscripted with some other term to alter the context and reference some other block.
- :
-
The ancestor set of the block. See equation 5.3.
- :
-
The block. See equation 4.2.
- :
-
The block extrinsic. See equation 4.3.
- :
-
The Beefy signed commitment of validator . See equation 18.1.
- :
-
The set of Ed25519 guarantor keys who made a work-report. See equation 11.28.
- :
-
The block header. See equation 5.1.
- :
- :
-
The current core assignments. See section 11.3.
- :
-
The core assignments in the previous rotation. See section 11.3.
- :
-
The sequence of work-reports which have now become available and ready for accumulation. See equation 11.17.
- :
- :
-
The audit condition, equal to once the block is audited. See section 17.
Without any superscript, the block is assumed to the block being imported or, if no block is being imported, the head of the best chain (see section 19). Explicit block-contextualizing superscripts include:
I.4.2. State components
Here, the prime annotation indicates posterior state. Individual components may be identified with a letter subscript.
- :
-
The core uthorizations pool. See equation 8.1.
- :
- :
-
State concerning Safrole. See equation 6.3.
- :
-
The sealing lottery ticket accumulator. See equation 6.5.
- :
-
The keys for the validators of the next epoch, equivalent to those keys which constitute . See equation 6.7.
- :
-
The slot-sealer sequence of the current epoch. See equation 6.5.
- :
-
The Bandersnatch root for the current epoch’s ticket submissions. See equation 6.4.
- :
- :
-
The entropy accumulator and epochal randomness. See equation 6.22.
- :
-
The validator keys and metadata to be drawn from next. See equation 6.7.
- :
-
The validator keys and metadata currently active. See equation 6.7.
- :
-
The validator keys and metadata which were active in the prior epoch. See equation 6.7.
- :
-
The availability assignments. A core’s availability assignment is the work-report guarantee which is being made available prior to accumulation. See equation 11.1.
- :
- :
-
The most recent block’s timeslot. See equation 6.1.
- :
-
The authorization queue. See equation 8.1.
- :
-
Past judgments on work-reports and validators. See equation 10.1.
- :
-
The privileged service indices. See equation 9.9.
- :
-
The index of the blessed service. See equation 12.26.
- :
-
The indices of the services able to assign each core’s authorizer queue. See equation 12.26.
- :
-
The index of the designate service. See equation 12.26.
- :
-
The index of the registrar service. See equation 12.26.
- :
-
The always-accumulate service indices and their basic gas allowance. See equation 12.26.
- :
-
The activity statistics for the validators. See equation 13.1.
- :
-
The accumulation queue. See equation 12.3.
- :
-
The accumulation history. See equation 12.1.
- :
I.4.3. Virtual Machine components
- :
-
The exit-reason resulting from all machine state transitions.
- :
-
The immediate values of an instruction.
- :
-
The memory sequence; a member of the set .
- :
-
The gas counter.
- :
-
The registers.
- :
-
The instruction sequence.
- :
-
The sequence of basic blocks of the program.
- :
-
The instruction counter.
I.4.4. Constants
- :
-
The period, in seconds, between audit tranches. See section 17.3.
- :
-
The additional minimum balance required per item of elective service state. See equation 9.8.
- :
-
The additional minimum balance required per octet of elective service state. See equation 9.8.
- :
-
The basic minimum balance which all services require. See equation 9.8.
- :
-
The total number of cores.
- :
-
The period in timeslots after which an unreferenced preimage may be expunged. See eject definition in section B.7.
- :
-
The length of an epoch in timeslots. See section 4.8.
- :
-
The audit bias factor, the expected number of additional validators who will audit a work-report in the following tranche for each no-show in the previous. See equation 17.13.
- :
-
The gas allocated to invoke a work-report’s Accumulation logic.
- :
-
The gas allocated to invoke a work-package’s Is-Authorized logic.
- :
-
The gas allocated to invoke a work-package’s Refine logic.
- :
-
The total gas allocated across for all Accumulation. Should be no smaller than .
- :
-
The size of recent history, in blocks. See equation 7.8.
- :
- :
-
The maximum sum of dependency items in a work-report. See equation 11.3.
- :
-
The maximum number of tickets which may be submitted in a single extrinsic. See equation 6.31.
- :
-
The maximum age in timeslots of the lookup anchor. See equation 11.37.
- :
-
Host-function gas costs, see below.
- :
-
The maximum number of verdicts which may be included in a single extrinsic. See equation 10.2.
- :
-
The maximum number each of culprits or faults which may be included in a single extrinsic. See equation 10.2.
- :
-
The maximum number of items in the authorizations pool. See equation 8.1.
- :
-
The slot period, in seconds. See equation 4.8.
- :
-
The number of items in the authorizations queue. See equation 8.1.
- :
- :
-
The minimum public service index. Services of indices below these may only be created by the Registrar. See equation B.14.
- :
-
The maximum number of extrinsics in a work-package. See equation 14.5.
- :
-
The period in timeslots after which reported but unavailable work is cleared. See equation 11.18.
- :
-
The maximum size of is-authorized code in octets. See equation B.1.
- :
-
The maximum size of the concatenated variable-size blobs, extrinsics and imported segments of a work-package, in octets. See equation 14.6.
- :
- :
-
The size of a segment in octets. See section 14.2.1.
- :
-
The additional footprint in the Audits DA of a single imported segment. See equation 14.7.
- :
-
The maximum number of imports in a work-package. See equation 14.5.
- :
-
The maximum total size of all unbounded blobs in a work-report, in octets. See equation 11.8.
- :
-
The size of a transfer memo in octets. See equation 12.14.
- :
-
The maximum number of exports in a work-package. See equation 14.5.
- :
-
Context strings, see below.
- :
- :
-
The pvm dynamic address alignment factor. See equation A.24.
- :
-
The standard pvm program initialization input data size. See equation A.7.
- :
-
The pvm memory page size. See equation 4.24.
- :
-
The standard pvm program initialization zone size. See section A.7.
I.4.5. Host-function gas costs
These constants are used in the host function gas cost equations in appendix B. The constant term for the host function is denoted , or simply if there are no other terms. The octet-linear term, if there is one, is denoted and specifies gas per 1024 octets. The page-linear term, again optional, is denoted and specifies gas per page. The gas for a host call with constant term and octet-linear term with octets is (see equation B.17).
- :
-
Gas cost charged for an unknown host-call.
- :
-
(assign) base gas cost.
- :
-
(bless) base gas cost.
- :
-
(bless) gas per item.
- :
-
(checkpoint) base gas cost.
- :
-
(designate) base gas cost.
- :
-
(designate) gas per validator.
- :
-
(export) base gas cost.
- :
-
(forget) base gas cost.
- :
-
(gas) base gas cost.
- :
-
(historical_lookup) base gas cost.
- :
-
(historical_lookup) gas per 1024 octets ().
- :
-
(info) base gas cost.
- :
-
(eject) base gas cost.
- :
-
(invoke) base gas cost.
- :
-
(lookup) base gas cost.
- :
-
(lookup) gas per 1024 octets ().
- :
-
(machine) base gas cost.
- :
-
(machine) gas per 1024 octets, program size ().
- :
-
(new) base gas cost.
- :
-
(poke) base gas cost.
- :
-
(poke) gas per 1024 octets ().
- :
-
(peek) base gas cost.
- :
-
(peek) gas per 1024 octets ().
- :
-
(query) base gas cost.
- :
-
(read) base gas cost.
- :
-
(read) key gas per 1024 octets ().
- :
-
(read) value gas per 1024 octets ().
- :
-
(solicit) base gas cost.
- :
-
(transfer) base gas cost.
- :
-
(upgrade) base gas cost.
- :
-
(write) base gas cost.
- :
-
(write) gas per 1024 octets ().
- :
-
(write) key gas per 1024 octets ().
- :
-
(expunge) base gas cost.
- :
-
(fetch) case 0: protocol parameters.
- :
-
(fetch) case 1: entropy.
- :
-
(fetch) case 2: auth trace.
- :
-
(fetch) case 3: any extrinsic, by index.
- :
-
(fetch) case 4: our extrinsic, by index.
- :
-
(fetch) case 5: any import, by index.
- :
-
(fetch) case 6: our import, by index.
- :
-
(fetch) case 7: encoded work-package.
- :
-
(fetch) case 8: auth config.
- :
-
(fetch) case 9: auth token.
- :
-
(fetch) case 10: refine context.
- :
-
(fetch) case 11: items summary.
- :
-
(fetch) case 12: any item summary.
- :
-
(fetch) case 13: any payload.
- :
-
(fetch) case 14: accumulate items.
- :
-
(fetch) case 15: any accumulate item.
- :
-
(fetch) otherwise: nothing.
- :
-
(pages) alloc base gas cost.
- :
-
(pages) alloc gas per page.
- :
-
(pages) free base gas cost.
- :
-
(pages) free gas per page.
- :
-
(pages) setmode base gas cost.
- :
-
(pages) setmode gas per page.
- :
-
(pages) invalid- base gas cost.
- :
-
(yield) base gas cost.
- :
-
(provide) base gas cost.
- :
-
(provide) gas per 1024 octets ().
- :
-
(grow_heap) base gas cost.
- :
-
(grow_heap) gas per additional page.
I.4.6. Signing Contexts
- :
-
Ed25519 Availability assurances. See equation 11.14.
- :
-
bls Accumulate-result-root-mmr commitment. See equation 18.1.
- :
-
On-chain entropy generation. See equation 6.18.
- :
-
Bandersnatch Fallback block seal. See equation 6.17.
- :
-
Ed25519 Guarantee statements. See equation 11.28.
- :
-
Ed25519 Audit announcement statements. See equation 17.7.
- :
-
Bandersnatch Ringvrf Ticket generation and regular block seal. See equation 6.16.
- :
- :
-
Ed25519 Judgments for valid work-reports. See equation 17.17.
- :
-
Ed25519 Judgments for invalid work-reports. See equation 17.17.
References
- [1] (2013) Keccak. In Annual international conference on the theory and applications of cryptographic techniques, pp. 313–314. Cited by: §3.8.1.
- [2] (2024) Assessing risc zero using zkit: an extensible testing and benchmarking suite for zkp frameworks. Ph.D. Thesis, OST Ostschweizer Fachhochschule. Cited by: §2.2.1.
- [3] (2004) Short signatures from the weil pairing. J. Cryptology 17, pp. 297–319. External Links: Document Cited by: §3.8.2.
- [4] (2024) Efficient execution auditing for blockchains under byzantine assumptions. Note: Cryptology ePrint Archive, Paper 2024/961 External Links: Link Cited by: §17.3, §17, §2.1, §2.3, footnote 11.
- [5] (2022) Efficient aggregatable bls signatures with chaum-pedersen proofs. Note: Cryptology ePrint Archive, Paper 2022/1611 External Links: Link Cited by: §18.
- [6] (2023) Ring verifiable random functions and zero-knowledge continuations. Note: Cryptology ePrint Archive, Paper 2023/002 External Links: Link Cited by: Appendix G.
- [7] (2019) Casper the friendly finality gadget. External Links: 1710.09437, Link Cited by: §2.2.
- [8] (2013) Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. External Links: Link Cited by: §2.2.
- [9] (2023) Interchain security begins a new era for cosmos. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.3.
- [10] (2024) Ethereum Staking. Note: Fetched 18th March, 2024 External Links: Link Cited by: footnote 2.
- [11] (2024) A digital future on a global scale. Note: Fetched 4th April, 2024 External Links: Link Cited by: §2.2.
- [12] (2024) Danksharding. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.2.
- [13] (1938) Statistical tables for biological, agricultural and medical research. Oliver and Boyd. Cited by: §3.7.5.
- [14] (2019) PLONK: permutations over lagrange-bases for oecumenical noninteractive arguments of knowledge. Note: Cryptology ePrint Archive, Paper 2019/953 External Links: Link Cited by: §2.2.1.
- [15] (2023-08) Verifiable Random Functions (VRFs). Request for Comments, RFC Editor. Note: RFC 9381 External Links: Document, Link Cited by: Appendix G.
- [16] (2016) So, ethereum’s blockchain is still under attack…. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.4.
- [17] (2020) BLS12-381. External Links: Link Cited by: §18, §3.8.2.
- [18] (2024) Bandersnatch vrf-ad specification. Note: Draft 29, Fetched 17th March, 2026 External Links: Link Cited by: Appendix G, Appendix G, Appendix G.
- [19] (2024) Solana outage raises questions about client diversity and beta status. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.4.
- [20] (2017-01) Edwards-Curve Digital Signature Algorithm (EdDSA). Request for Comments, RFC Editor. Note: RFC 8032 External Links: Document, Link Cited by: §3.8.2.
- [21] (2017) OmniLedger: a secure, scale-out, decentralized ledger via sharding. Note: Cryptology ePrint Archive, Paper 2017/406 External Links: Link Cited by: §2.3.
- [22] (2019) Cosmos whitepaper. A Netw. Distrib. Ledgers 27, pp. 1–32. Cited by: §2.3.
- [23] (2014) Novel polynomial basis and its application to reed-solomon erasure codes. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, Vol. , pp. 316–325. External Links: Document Cited by: Appendix H.
- [24] (2021) Bandersnatch: a fast elliptic curve built over the bls12-381 scalar field. Note: Cryptology ePrint Archive, Paper 2021/1152 External Links: Link Cited by: Appendix G.
- [25] (2024) Is measuring blockchain transactions per second stupid in 2024?. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.4.
- [26] (2024) PolkaVM/risc0 benchmark results. Note: Fetched 3rd April, 2024 External Links: Link Cited by: §2.2.1.
- [27] (2015-11) The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC). Request for Comments, RFC Editor. Note: RFC 7693 External Links: Document, Link Cited by: §3.8.1.
- [28] (2024) Bringing polkadot tech to ethereum. Note: Fetched 18th March, 2024 External Links: Link Cited by: footnote 4.
- [29] (2023) Ethereum’s rollups are centralized. External Links: Link Cited by: §2.2.
- [30] (2023) Solana data goes live on google cloud bigquery. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.4.
- [31] (2024) Solana validator requirements. Note: Fetched 18th March, 2024 External Links: Link Cited by: §2.4.
- [32] (2020) Grandpa: a byzantine finality gadget. arXiv preprint arXiv:2007.01560. Cited by: §19.
- [33] (2019) Avalanche blockchain protocol for distributed computing security. In 2019 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), pp. 1–3. Cited by: §2.3.
- [34] (2023) A technical faq on lasso, jolt, and recent advancements in snark design. Note: Fetched 3rd April, 2024 External Links: Link Cited by: §2.2.1.
- [35] (2024) Fisher-Yates shuffle: The modern algorithm. External Links: Link Cited by: §3.7.5.
- [36] (2014) Ethereum: a secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, pp. 1–32. Cited by: §2.2, §3.8.1.
- [37] (2018) Solana: a new architecture for a high performance blockchain v0. 8.13. Cited by: §2.4.