Skip to main content

EVM state manager

The state manager is the part of the execution client responsible for updating the state of the network globally, and the state of every account individually.

info

"State", in this context, refers to the data stored on the blockchain at any given point in time. To update state is to update the record of the contents of every account — if the contents have changed.

Its main task is to receive blocks that have been executed by the sequencer and use the trace data from their execution to update the state of the network. The state manager uses a variation of the Merkle tree called a sparse Merkle tree to track, manage, and update storage slots representing accounts more efficiently.

It then passes this updated network state information to the prover in the form of Merkle proofs for submission to Ethereum Mainnet (L1).

Below, we'll explain the element of Linea's state management in greater detail.

Merkle trees​

Overview: how Merkle trees work​

The Merkle tree and its variations are commonly used across EVM chains for efficient storage and retrieval of data about the state of every account on the blockchain.

A Merkle tree is comprised of 'nodes' that branch off from each other. At the base is the 'root', or state root, from which branches stem; and from the branches stem leaves. Each node, regardless of type, is represented by a cryptographic hash which encodes data about its properties — for example, the contents of your account.

Each hash encodes the hashes of its child nodes. Taken to its full extent, this cascading system means the root encodes data of the state of every single account on the blockchain.

Cryptographic hashes are deterministic, which means you can reverse the hash function to get the data which it encoded. If you have the hash of the root—the only node without a parent—you can theoretically derive from it the data of any node in the entire tree.

As a layer 2 (L2) network, Linea is in the business of making transacting faster and more efficient. Viewed from this angle, the standard implementation of a Merkle tree, described above, can present problems.

A Merkle tree must be computed anew for every single block, accounting for any state changes resulting from the transactions included in the block. This can quickly snowball to the point where an incredible amount of computation is required to finalize a block.

Sparse Merkle trees​

Linea's state management uses sparse Merkle trees to minimize computation and contribute to the blockchain's efficiency.

A sparse Merkle tree is a variation of a standard Merkle tree where not all leaf nodes are filled with data; instead, data is only stored in nodes where it's needed. It is a complete tree of fixed depth, meaning that all branches of the tree have the same length—i.e. the same number of leaves.

At initialization, all leaf nodes are set to a default value, which is typically a hash of a specific value, such as zero. Because all leaf nodes have the same hash value, the parent nodes and higher-level nodes also have the same hash value. A node whose hash is the default value for its level is therefore considered to represent an empty subtree.

With this construction, we do not need to keep track of every individual node's hash. Instead, we can assume hashes that reflect the default value are empty, and the subtree or node that lies further down the chain of nodes can be disregarded; we only need to pay attention to the ones that correspond to non-empty subtrees.

Cryptographic accumulator​

A sparse Merkle tree can be considered a cryptographic accumulator—a term for a system where a hash function is able to take multiple values as an input, and output a single hash. (This differs from a traditional hash function, which would only take one value as an input.) The accumulator is a data structure used in the state manager of Linea's zkEVM to efficiently track updates to the storage of Linea accounts.

The sparse Merkle tree, or accumulator, allows us to track updates to a large, pre-allocated 'vector', in mathematical terms. In this context, the vector is a data structure mapping integer indices (corresponding to each node in the tree) to leaves, and it is 'pre-allocated' because we already know that the sparse Merkle tree is populated with default values at initialization.

The Linea state manager uses the sparse Merkle tree as a powerful cryptographic accumulator that supports key-value addressing, rather than position addressing as in a traditional vector, which defines the location of the node relative to its origin. This means every leaf node is indexed into a sorted, doubly-linked list, where each leaf 'points' to the leaves whose keys are immediately above and below it in the list.

Leaves can also be referred to as storage slots, in that they contain data about the contents of the account in question.

Tracking empty leaves​

As mentioned (above)[#sparse-merkle-trees], all leaves in the tree are populated with default/zero values at initialization. Since a deterministic hashing function will ensure that these leaves are always represented by the same hash, empty leaves can be easily recognized by the accumulator.

However, in order for the state manager to update a storage slot with data about a Linea account's contents, it must know which empty leaf to overwrite, and exactly where these empty leaves are. A further consideration is that we require the index of any 'new' leaf—an empty leaf being updated so that it stores data—to be overwritten in a deterministic way. This requirement means that anyone can theoretically reconstruct the tree simply by looking at transaction history.

To ensure consistency in the leaves' position, the state manager only ever inserts 'new' leaves to the left of the previous leaf in the tree. If this wasn't the case, and the state manager was able to insert any node in any position, it would be impossible to reconstitute the tree in the exact same configuration, severely impacting the ability of L1 to verify the Merkle proof provided.

Applying the accumulator​

The Ethereum Virtual Machine (EVM) uses a variant of a Merkle tree known as a Merkle-Patricia Trie to track:

  • World state, which keeps track of accounts, and;
  • Account storage state (or simply 'storage'), which keeps track of the contents of each account.

On Linea, we adapt this structure, incorporating the custom crytographic accumulator described above.

The accumulator can perform the following operations:

  • Insertion: adding a new storage slot to the tree, triggered by storing a non-zero value in a previously zero-valued slot.
  • Update: changing the value of an existing storage slot in the tree, triggered by storing a new non-zero value in a previously non-zero slot.
  • Deletion: removing a storage slot from the tree, triggered by storing the zero value in a previously non-zero slot.
  • Read zero: proving non-membership, triggered when a storage slot has been accessed, but not updated, and its value is zero.
  • Read non-zero: proving membership, triggered when a storage slot has been accessed, but not updated and its value is non-zero.

These operations are applied to two trees:

World state​

The world state tree maps all accounts that exist on the blockchain—contracts and externally-owned accounts (EOAs)—and points towards the account storage state for each. While on Ethereum Mainnet, this data is stored in a standard trie, Linea uses the the accumulator to map accounts as key:value pairs. Otherwise, the implementation is similar to the EVM.

Their structure is as follows:

  • HKey: Hash(address)
  • Val: Hash(nonce, balance, storageRoot, codeHash, keccakCodeHash, CodeSize)

Critically, every piece of data fed into the Val (value) hash function must have a finite field interpretation. The data must be formatted this way to enable the Linea prover to correctly access the world state when verifying proofs. Each element is formatted as follows (all elements require one field, other than keccakCodeHash):

  • nonce: The nonce is written in big-endian form into a byte32. For instance if the nonce is 10, then the nonce should be encoded as 0x000000000000000000000000000000000000000000000000000000000000000a.
  • balance: Formatted the same as the nonce; big-endian byte32.
  • storageRoot: The storage root should not be the Keccak of the Patricia trie root as in the EVM, but the “custom Merkle tree” root of the account storage state that we describe in the following section.
  • codeHash: The code hash should not be the Keccak of the code, it should instead be the one obtained as described in the following section.
  • 2 field elements for keccakCodeHash: One for the 128 most significant bits and one for the 128 least significant bits. The Keccak code hash corresponds exactly to the Keccak hash as specified by the EVM (i.e. the output of EXTCODEHASH. We keep the Keccak and the “custom” version for practical reasons.
  • codeSize: The code size should be the same value as that returned by the CODESIZE/EXTCODESIZE opcodes.

Account storage state​

Also referred to as the storage trie, the account storage state is the database the state manager accesses to retrieve data about the contents of accounts. Account storage is mainly relevant for contract accounts; for EOAs, the data about assets and transactions is stored in the world state Val, and the codeHash and its variants are empty.

Since the main function of account storage is to record contracts in such a way that they can be easily retrieved and processed, it must efficiently encode the contract. It does this using the following format:

  • HKey: Hash(StorageKeyMSB, StorageKeyLSB)
  • Val: Hash(StorageValueMSB, StorageValueLSB)

In both cases, the MSB refers to the first 16 bytes of a 'word', and LSB the last 16. 'Word' in this context refers to the natural unit of data used by the EVM, which is 256-bit (32 byte) chunks.

For example, if the data regarding a contract's code was encoded in a byte32, the standard data type for words on the EVM and equivalents like Linea, it might look like this:

[a0, a1, a2, …., a15, b0, b1, …, b15]

That byte32 would be split into an MSB and LSB like this:

  • MSB: [0, 0, .., 0, a0, a1, a2, a3, .., a15]
  • LSB: [0, 0, .., 0, b0, b1, b2, b3, .., b15]

The MSB takes the first 16 bytes, and the LSB the second 16 bytes.

Generating state-root-transition witnesses​

The accumulator, built using a sparse Merkle tree, is simultaneously:

  • A data structure on which we can perform operations;
  • A dataset that we can summarize using a short string at any time (i.e. the root hash);
  • A tool that can be used by the Linea protocol to to verify that a given operation triggered a transition from hash A to hash B.

Once the accumulator has processed the trace information it receives about a new block and updated state accordingly, it can pass a new state root hash to the prover, via the coordinator. The state root hash can then be used by the prover as a "witness": a verifiable method of proving that the transactions in each block have taken place, without having to divulge the nature of those transactions.