One of the important attributes of a good blockchain user experience is fast transaction confirmation time. Today, Ethereum is much better on this front than it was five years ago. Thanks to EIP-1559 and the stable block times that followed the transition to PoS (The Merge), transactions sent by users on L1 are usually confirmed within 5-20 seconds, roughly comparable to paying with a credit card. However, there is value in improving the user experience further, and some applications require latencies of a few hundred milliseconds or less. This article explores some practical options for improving transaction confirmation time on Ethereum.
Table of Contents
1. Overview of existing ideas and technologies
2. Single Slot Finality
3. Rollup Preconfirmation
4. Based Preconfirmations
5. What are we actually looking at?
6. How should L2 proceed?
Currently, Ethereum’s Gasper consensus uses a slot-and-epoch architecture. There is one slot every 12 seconds, in which a subset of validators votes on the head of the chain, and every validator gets a chance to vote once within 32 slots (6.4 minutes). These votes are then reinterpreted as messages in a consensus algorithm loosely similar to PBFT, which after two epochs (12.8 minutes) gives a very strong economic guarantee called finality.
Over the past few years, we have become increasingly dissatisfied with this approach for two main reasons. First, it is complex, with many interaction bugs between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism. Second, 12.8 minutes is too long; no one wants to wait that long.
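The timing figures above follow directly from the slot length and epoch size. A minimal sketch of the arithmetic, using only the constants quoted in the text (the function names are illustrative, not taken from any client):

```python
# Constants quoted in the text.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
EPOCHS_TO_FINALITY = 2


def epoch_seconds() -> int:
    """Length of one epoch in seconds."""
    return SECONDS_PER_SLOT * SLOTS_PER_EPOCH


def finality_minutes() -> float:
    """Time until finality, in minutes, under normal conditions."""
    return EPOCHS_TO_FINALITY * epoch_seconds() / 60


print(epoch_seconds() / 60)  # one epoch, in minutes
print(finality_minutes())    # time to finality, in minutes
```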
Single Slot Finality (SSF) replaces this architecture with a mechanism similar to Tendermint consensus, where block N is finalized before block N+1 is generated. The main difference from Tendermint is that we retain the “inactivity leak” mechanism, which allows the chain to continue running and recover when more than one-third of validators are offline.
The main challenge of SSF is that, done naively, it requires every Ethereum staker to publish two messages every 12 seconds, which is a significant load on the chain. There are clever ideas for mitigating this problem, including the recent Orbit SSF proposal. However, although SSF improves the user experience by making finality arrive much sooner, it does not change the fact that users still need to wait 5-20 seconds for a confirmation.
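To get a feel for the load, the following sketch compares messages per slot under the two designs. It assumes the simplified model from the text (each validator votes once per 32-slot epoch today, and sends two messages per slot under naive SSF) and ignores aggregation. The validator count of 1,000,000 is a round illustrative number, not a measured figure.

```python
SLOTS_PER_EPOCH = 32          # from the text
SSF_MESSAGES_PER_SLOT = 2     # from the text: two messages per staker per slot


def gasper_messages_per_slot(validators: int) -> float:
    """Each validator votes once per epoch, so load is spread over 32 slots."""
    return validators / SLOTS_PER_EPOCH


def ssf_messages_per_slot(validators: int) -> int:
    """Naive SSF: every validator sends two messages in every slot."""
    return validators * SSF_MESSAGES_PER_SLOT


validators = 1_000_000  # hypothetical round number for illustration
ratio = ssf_messages_per_slot(validators) / gasper_messages_per_slot(validators)
print(ratio)  # the ratio is independent of the validator count
```

Under this model the increase is a fixed factor of 2 × 32 = 64, which is why proposals that shrink the set of validators signing in each slot matter so much.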
Over the past few years, Ethereum has followed a rollup-centric roadmap: the Ethereum base layer (L1) is designed to provide data availability and other functionality that L2 protocols such as rollups, validiums, and plasmas can build on, giving users the same level of security as Ethereum at much larger scale.
This has led to a separation of concerns within the Ethereum ecosystem: Ethereum L1 focuses on being censorship-resistant, reliable, and stable, and on maintaining and improving the core functionality of a base layer, while L2s focus on reaching users directly through different cultures and technological approaches. But if we continue down this path, one implication is inevitable: L2s will want to offer confirmations much faster than 5-20 seconds.
So far, at least in theory, it has been the responsibility of L2s to create their own “decentralized sequencer” networks. A small group of validators would sign blocks every few hundred milliseconds and put their stake behind those blocks. Eventually, the headers of these L2 blocks would be published to L1.
However, an L2 validator set can cheat: it can sign block B1 first, and then sign a conflicting block B2 and get it onto the chain before B1. If the validators do this, they are caught and lose their stake. We have already seen centralized versions of this in practice, but rollups have been slow to develop decentralized sequencer networks. One can also argue that requiring every L2 to do decentralized sequencing is unfair: we are asking rollups to do almost the same work as creating a whole new L1. For this reason, Justin Drake has been advocating an approach that gives all L2s (and L1) access to a shared, Ethereum-wide preconfirmation mechanism: based preconfirmations.
The idea behind based preconfirmations starts from the assumption that Ethereum proposers will become highly sophisticated actors for MEV-related reasons. The approach takes advantage of this sophistication by incentivizing these proposers to accept the responsibility of providing preconfirmation services.
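The double-signing case (B1, then a conflicting B2) can be sketched as a simple equivocation check. This is an illustration only: real systems verify cryptographic signatures and apply protocol-specific penalties, whereas here a signed header is just a record, and the names `SignedHeader`, `find_equivocators`, and `slash` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedHeader:
    signer: str       # validator identifier (stand-in for a public key)
    height: int       # L2 block height the signature applies to
    block_hash: str   # hash of the signed block


def find_equivocators(headers):
    """Return signers who signed two different blocks at the same height."""
    first_seen = {}   # (signer, height) -> first block_hash observed
    offenders = set()
    for h in headers:
        earlier = first_seen.setdefault((h.signer, h.height), h.block_hash)
        if earlier != h.block_hash:
            offenders.add(h.signer)
    return offenders


def slash(stakes, offenders):
    """Assumed penalty rule for illustration: offenders lose their whole stake."""
    return {v: (0 if v in offenders else s) for v, s in stakes.items()}


headers = [
    SignedHeader("alice", 7, "B1"),
    SignedHeader("bob", 7, "B1"),
    SignedHeader("alice", 7, "B2"),  # conflicts with alice's earlier signature
]
offenders = find_equivocators(headers)
print(offenders)
print(slash({"alice": 32, "bob": 32}, offenders))
```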
The basic idea is to create a standardized protocol in which a user can pay an additional fee to receive an immediate guarantee that their transaction will be included in the next block, possibly along with a claim about the result of executing that transaction. If a proposer violates any commitment it has made to any user, it can be penalized.
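One way to picture such a commitment is as a small record that can later be checked against the block the proposer actually produced. The sketch below is a toy model under stated assumptions: the field names, the `bond` parameter, and the all-or-nothing penalty are hypothetical and do not come from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preconfirmation:
    proposer: str
    tx_hash: str
    slot: int             # slot in which inclusion is promised
    claimed_result: str   # proposer's claim about the execution result
    fee_paid: int         # extra fee the user paid for the guarantee


def is_honored(p, block_results):
    """block_results maps tx_hash -> execution result for the block at p.slot.

    The commitment holds only if the transaction is present and its result
    matches the claim.
    """
    return block_results.get(p.tx_hash) == p.claimed_result


def settle(p, block_results, bond):
    """Return (fee earned by proposer, penalty taken from proposer's bond)."""
    if is_honored(p, block_results):
        return p.fee_paid, 0
    return 0, bond


p = Preconfirmation("proposer-1", "0xabc", 100, "success", 5)
print(settle(p, {"0xabc": "success"}, bond=1000))  # commitment kept
print(settle(p, {}, bond=1000))                    # transaction missing
```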
As described, based preconfirmations provide guarantees for L1 transactions. If rollups are “based,” then all L2 blocks are L1 transactions, so the same mechanism can be used to provide preconfirmations for any L2.
Suppose we have achieved single slot finality. We use techniques similar to Orbit to reduce the number of validators signing in each slot, but not by too much, so that we can also make progress on lowering the 32 ETH staking minimum. As a result, the slot time may increase, perhaps to 16 seconds. We then use either rollup preconfirmations or based preconfirmations to give users faster confirmations. What we end up with is, once again, an epoch-and-slot architecture.
There is a profound philosophical reason why the epoch-and-slot architecture seems so difficult to avoid: it takes less time to achieve rough consensus on something than it does to achieve maximal “economic finality” on that thing.
One simple reason is the number of nodes. Although the old tradeoff between decentralization, finality time, and overhead has been softened by highly optimized BLS aggregation and, soon, ZK-STARKs, the following still seems fundamentally true:
– “Rough consensus” requires only a small number of nodes, while economic finality requires a significant fraction of all nodes to participate.
– Once the number of nodes exceeds a certain scale, you need to spend more time collecting signatures.
In today’s Ethereum, the 12-second slot is divided into three sub-slots: block publication and distribution, attestation, and attestation aggregation. If the number of attesters were significantly lower, we could drop to two sub-slots and use an 8-second slot time. Another factor is the “quality” of nodes. If we could also rely on a specialized subset of nodes to reach approximate consensus (while still using the full validator set to determine finality), we could plausibly reduce the slot time to about 2 seconds.
Therefore, in my view, the epoch-and-slot architecture is clearly the right way to go, but not all epoch-and-slot architectures are equal, and it is valuable to explore the design space more fully. The direction worth exploring is one that is not tightly coupled like Gasper, but instead has a stronger separation of concerns between the two mechanisms.
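The 8-second figure follows from the sub-slot structure if each sub-slot is assumed to take the same amount of time (12 / 3 = 4 seconds). A small sketch of that arithmetic, with the equal-length assumption made explicit:

```python
CURRENT_SLOT_SECONDS = 12  # from the text
CURRENT_SUB_SLOTS = 3      # from the text

# Assumption for illustration: all sub-slots have equal length.
SECONDS_PER_SUB_SLOT = CURRENT_SLOT_SECONDS / CURRENT_SUB_SLOTS


def slot_seconds(sub_slots: int) -> float:
    """Slot time if each sub-slot keeps its current (assumed equal) length."""
    return sub_slots * SECONDS_PER_SUB_SLOT


print(slot_seconds(3))  # today's three sub-slots
print(slot_seconds(2))  # after dropping one sub-slot
```

The roughly 2-second figure is not derived here: it depends on how short each remaining phase can be made with a smaller, specialized node set, which this sketch does not model.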
In my view, L2s currently have three reasonable strategies:
1. Being “based” both technically and spiritually. That is, they optimize for passing through the Ethereum base layer’s technical properties and its values (high decentralization, censorship resistance, and so on). In their simplest form, you can think of these rollups as “branded shards,” but they can also be more ambitious and experiment with new virtual machine designs and other technical improvements.
2. Becoming “servers with blockchain scaffolding” and making the most of it. If you start from a server and then add STARK validity proofs to ensure the server follows the rules, guaranteed rights for users to exit or force transactions, and possibly the freedom of collective choice (through coordinated mass exit or the ability to vote to change the sequencer), then you get most of the benefits of being on-chain while keeping most of the efficiency of a server.
3. A compromise approach: a fast chain with a hundred nodes, with Ethereum providing additional interoperability and security. This is the de facto roadmap of many L2 projects today.
For some applications, such as ENS, keystores, and some payment protocols, a 12-second block time is already sufficient. For applications where it is not, the only solution is an epoch-and-slot architecture. In all three cases, the “epoch” is Ethereum’s SSF, but the slot is different in each:
– A native Ethereum epoch-and-slot architecture
– Server preconfirmations
– Committee preconfirmations
A key question is how good we can make the first category. In particular, if it becomes very good, the third category loses much of its significance: if a native Ethereum epoch-and-slot architecture can bring the slot time down to 1 second, the space left for the third category becomes much smaller. The second category will always exist, because no “based” solution works for L2s such as plasmas and validiums.
Today, we are far from having final answers to these questions. One important open question is how sophisticated block proposers will become, which remains an area of considerable uncertainty. Designs like Orbit SSF are very new, so there is still plenty of unexplored design space, for example epoch-and-slot schemes in which something like Orbit SSF serves as the epoch. The more options we have, the better we can do for both L1 and L2 users, and the easier we can make the work of L2 developers.