Reclaiming Disk Space on Bitcoin Cash: Pruning, Fast Sync, & UTXO Commitments

By joshmgreen, 3 years ago

Earlier this year Bitcoin Verde committed to implementing "pruning mode" and "utxo commitments" as a part of our 2021 Flipstarter. But why? What do these things even mean?

Disclaimer: This post is targeted at the mildly to moderately knowledgeable Bitcoin user. It contains simplifications of concepts that are not 100% accurate but are "close enough" that knowledge of the intricacies of Bitcoin is not required. The complete implementation proposal will be outlined in a CHIP.

What is Pruning Mode?

Satoshi's white paper describes pruning mode in Section 7, "Reclaiming Disk Space", but what does that mean to the everyday user? Basically, "pruning mode" allows users who run nodes to do so with less disk space.

To understand how this works it is necessary to understand how data in the blockchain is used.

Blocks describe an update to the blockchain's state, and once a block is processed, only a small portion of its data needs to be kept. This "state" is often described as the "UTXO Set", which is essentially all of the spendable coins on the network; once a coin is spent it is removed from the UTXO set and often another one takes its place.
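As a rough illustration of the idea that a block is an update to the state, here is a minimal Python sketch. The data layout (a dict keyed by a transaction id and output index) and the `apply_block` helper are assumptions made for this example; it ignores scripts, signatures, and fees, and is not how any particular node stores its UTXO set.

```python
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]      # (transaction id, output index); hypothetical layout
UtxoSet = Dict[Outpoint, int]   # outpoint -> amount


def apply_block(utxo_set: UtxoSet, block: List[dict]) -> None:
    """Update utxo_set in place with every transaction in the block."""
    for tx in block:
        # Spending a coin removes it from the set.
        for outpoint in tx["inputs"]:
            if outpoint not in utxo_set:
                raise ValueError(f"missing or already-spent coin: {outpoint}")
            del utxo_set[outpoint]
        # Each new output becomes a spendable coin.
        for index, amount in enumerate(tx["outputs"]):
            utxo_set[(tx["txid"], index)] = amount


utxo_set: UtxoSet = {}
# Toy blocks: the first creates one coin, the second spends it into two.
block_1 = [{"txid": "coinbase-1", "inputs": [], "outputs": [50]}]
block_2 = [{"txid": "tx-a", "inputs": [("coinbase-1", 0)], "outputs": [30, 20]}]

for block in (block_1, block_2):
    apply_block(utxo_set, block)

print(utxo_set)  # {('tx-a', 0): 30, ('tx-a', 1): 20}
```

Once both blocks are applied, the spent coin from the first block is gone and only the two new coins remain, which is all a node needs to validate future spends.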

For perspective, keeping only the current state of the blockchain reduces 150+ GB of blockchain data to somewhere around 25 GB.

However, the typical implementation of "pruning mode" requires that the node download the entire blockchain so that it can apply each block's changes to the blockchain's state. So while the disk usage may be reduced to 25 GB after it's done syncing, the node still downloaded the full 150 GB over the network. More importantly, applying and validating each block against the blockchain's state is a long process. Even the most optimized implementation, BCHN, can take ~8 hours to sync the blockchain via the traditional method, and other implementations, like Bitcoin Verde, take even longer (~20 hours).

As the blockchain grows older and its size increases, syncing a node via the traditional method may take longer and longer, and far enough in the future it may even become a barrier to syncing new nodes.

But is there a way to avoid downloading the full blockchain since its "extra" data is immediately thrown away?

UTXO Commitments / Fast-Sync

As mentioned earlier, the "UTXO Set" is the current "state" of the blockchain. As an example, picture an accounting log recording your purchases:

  1. Received $4 from DayJob.

  2. Received $1 from SideGig.

  3. Sent $5 to GroceryStore.

  4. Received $3 from NightJob.

  5. Received $4 from DayJob.

  6. Sent $3 to ElectricCompany.

  7. Sent $4 to MeowMeowCatSupplies.

In this example, the blockchain is the full list of transactions. It's pretty long. Meanwhile, the UTXO Set only consists of coins still available for spending, and looks like:

  • GroceryStore has $5.

  • ElectricCompany has $3.

  • MeowMeowCatSupplies has $4.

This is obviously a lot less data to store and sync, and is therefore much faster to process. However, the downside is a lack of accountability for where "GroceryStore" received its $5: how does a user know the balance was not made up or tampered with? Sending and syncing only the UTXO Set data is often referred to as "fast sync", but it relies on trust.
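The accounting log above can be replayed in a few lines of Python to show how the long history collapses into a small state. This is only a sketch of the analogy: the `LOG` tuples, the `ME` label, and the `current_state` helper are names invented for this example, and payers such as DayJob are treated as sources outside the log.

```python
from collections import defaultdict

ME = "Me"  # hypothetical label for the owner of the log

# (sender, receiver, amount) for each of the seven log entries.
LOG = [
    ("DayJob", ME, 4),
    ("SideGig", ME, 1),
    (ME, "GroceryStore", 5),
    ("NightJob", ME, 3),
    ("DayJob", ME, 4),
    (ME, "ElectricCompany", 3),
    (ME, "MeowMeowCatSupplies", 4),
]


def current_state(log):
    """Replay the full history and keep only who still holds money."""
    holdings = defaultdict(int)
    for sender, receiver, amount in log:
        if sender == ME:
            holdings[ME] -= amount  # only the log owner's spending is tracked
        holdings[receiver] += amount
    # Anyone left with nothing drops out of the state entirely.
    return {owner: amount for owner, amount in holdings.items() if amount > 0}


print(current_state(LOG))
# {'GroceryStore': 5, 'ElectricCompany': 3, 'MeowMeowCatSupplies': 4}
```

Seven entries reduce to three, and the log owner disappears from the state because everything received was spent. GroceryStore ends up holding the $5 it was sent in entry 3.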

This is where UTXO Commitments come into play. Without UTXO Commitments (that is to say, just using "fast sync"), the user running the node must trust that the UTXO state wasn't tampered with. But Bitcoin isn't about trust. In Bitcoin, in order to trust something we must first verify it. Once UTXO Commitments are implemented by miners, that verification can take place, but for now, users can only trust their node implementation (or use the traditional syncing method).

In what ways are the users trusting the node implementation if they only use fast sync?

While the UTXO Set contains all of the blockchain's state (~4 GB), a UTXO Commitment is only a 32-byte hash of that data and is put into the coinbase of a block. This hash gives users the ability to verify that all of the data included in their UTXO Set matches up with the rest of the network. Without miners "committing" to this hash, the validation hashes must be hard coded into the node software. While this current state is obviously not ideal, the improved user experience for syncing a node might be worth the short-term reliance on trust. After all, the node implementation developers would compromise their reputation if they intentionally published an invalid UTXO commitment hash. Additionally, an invalid UTXO Set would eventually cause the node to fail (or "get stuck"), so the incentive to intentionally corrupt the UTXO commitment is quite low.
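To make the verification step concrete, here is a minimal sketch of checking a downloaded UTXO snapshot against a trusted 32-byte hash. It assumes a plain SHA-256 over a sorted text serialization, chosen only for brevity; real implementations may use a different construction (such as a multiset hash) and a binary format. The names `commitment_hash`, `snapshot_matches`, and `TRUSTED_HASH` are hypothetical.

```python
import hashlib
from typing import Dict, Tuple

UtxoSet = Dict[Tuple[str, int], int]  # (txid, output index) -> amount


def commitment_hash(utxo_set: UtxoSet) -> bytes:
    """Return a 32-byte digest of the snapshot (assumed scheme: SHA-256)."""
    hasher = hashlib.sha256()
    # Sorting makes the digest independent of the order coins arrived in.
    for (txid, index), amount in sorted(utxo_set.items()):
        hasher.update(f"{txid}:{index}:{amount}\n".encode("utf-8"))
    return hasher.digest()


def snapshot_matches(utxo_set: UtxoSet, trusted_hash: bytes) -> bool:
    """Accept the snapshot only if it hashes to the trusted value."""
    return commitment_hash(utxo_set) == trusted_hash


snapshot: UtxoSet = {("tx-a", 0): 30, ("tx-a", 1): 20}

# Stands in for a value hard coded into the node software (or, one day,
# read from a block's coinbase). Here it is simply computed locally.
TRUSTED_HASH = commitment_hash(snapshot)

tampered = dict(snapshot)
tampered[("tx-a", 1)] = 2000  # someone inflates a balance

print(len(TRUSTED_HASH))                         # 32
print(snapshot_matches(snapshot, TRUSTED_HASH))  # True
print(snapshot_matches(tampered, TRUSTED_HASH))  # False
```

The check only moves the trust question to where `TRUSTED_HASH` comes from. With a hard-coded value the user trusts the node developers; with a miner-enforced commitment the value would be backed by the network's consensus rules.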

So why not add UTXO Commitments to the coinbase today? Well, BCH could do that, but it's not something Verde is recommending, at least not right now. In order for the UTXO Commitments to be valuable, the entire network must also enforce that they are correct. This enforcement is what is often called a "consensus rule": if the UTXO Commitment is incorrect, then the whole block is invalid. In order to justify the cost and complexity of this change, it is prudent to first ensure the demand (and realized benefit) for such a change exists. Therefore, creating a method to "fast sync" while minimizing (but not eliminating) trust should be the first step. Only after the improved fast-sync feature is adopted, and both users and developers gain confidence in generating and validating compatible UTXO Sets, should Bitcoin Cash consider making the commitment a consensus rule.

Ultimately, pruning mode and fast sync (with UTXO Commitments) are important parts of the long-term scaling model of Bitcoin Cash. They may not completely replace the traditional method of syncing, as there are benefits to "archival nodes", such as SLP support (unless OP_GROUP becomes the main token system), but these methods significantly reduce the barrier to entry for users and businesses needing or wanting to run a BCH node.

History of Fast Sync and Future Plans

BCHD became the first implementation to provide fast sync. Currently, BCHD supports fast sync by hardcoding the commitment hash within its codebase and downloading the UTXO set from IPFS (IPFS is not a part of the Bitcoin network). BCHD does not create UTXO commitments automatically, but it does provide a tool to generate one, and an updated commitment is published with each new release. The BCHD implementation is based on a proposal from Tomas van der Wansem in 2018. His proposal was never accepted into Bitcoin ABC's codebase, despite a PR having been published.

Bitcoin Verde recently completed a beta version of fast syncing, and like BCHD, the commitment hashes are hard coded into the node. However, unlike BCHD, Bitcoin Verde introduces extensions to the existing P2P protocol for downloading and distributing commitments. The goal is to have other implementations generate and distribute UTXO commitments dynamically, as well as enable P2P distribution; this progress will allow developers to gain confidence that the nodes are in consensus before enforcing new consensus rules. Even if commitments are not generated by the miners, this first phase would still be a successful user-experience improvement.

Non-coordinated implementation of this proposal does not affect block or network consensus, nor does it affect 0-conf security. Furthermore, it is impossible to introduce any accidental or intentional risk of forking the network at this phase, since UTXO Commitment Hashes are not appended to the coinbase.

This proposal will be published to https://bitcoincashresearch.org/ for feedback within the next couple of weeks. Stay tuned!


Comments

Reclaiming disk space remains very important on the bitcoin onchain.

2 years ago

I think "reclaiming disk space" is a key point for scaling Bitcoin on-chain. Two questions: is there any reference on how many blocks are required to consider a transaction buried under enough blocks? And how is the chain verified? Satoshi refers to RAM and to root hashes in block hashes. I guess the nodes need to verify old blocks by verifying the root hash and block hash, so to verify quickly, these hashes need to be quickly accessible; that's why RAM needs to store this compressed data for verification and propagation purposes. Satoshi says that interior hashes do not need to be stored for old blocks, so I guess the root hash and block hash are required to help other nodes sync with the network. Please correct me if I am wrong.

User's avatar Val
3 years ago

Good content

3 years ago

Without ABC in the game, I hope UTXO commitments become a reality soon.

3 years ago