CashFusion on Bitcoin Cash: what it is and let's try to crack it!

26 1749
Avatar for Read.Cash
4 years ago

What is CashFusion?

CashFusion is an anonymity technology for Bitcoin Cash currently in alpha testing. It lets people mix coins anonymously and trustlessly. Unlike usual mixer services - there's no trace of who got how much in and who got how much out.

So, even if a CashFusion server is busted or undercover agents are listening on the network - there's no chance to be deanonymized.

As you know Bitcoin Cash's blockchain is a public ledger (like many other blockchains).

For example we can know that someone, let's name her Alice, has currently 1 satoshi.

We don't know who Alice is, but if she sends this satoshi to an exchange and reveals the identity - suddenly you know who she is and you can see all her previous operations. Also, you can trace other people who she interacted with.

This is called "chain analysis".

How can poor Alice hide from the prying eyes? She needs help.

So, Bob comes to help.

Two people come into a room. Alice has 1 satoshi in one hand and 3 satoshis in the other. Bob has 2 satoshis in one hand and 4 satoshis in the other.

Something happens inside.

Two people in black robes come out of the room. One has 3 satoshis and another one has 4 satoshis. We don't know who of them is Alice and who is Bob.

Can we try to guess who is who?

So, what happened inside? CashFusion!

Alice took her 1 satoshi and 3 satoshis and have put it on a table as 4 identical satoshis. Bob did the same with 2 and 4 satoshis. In total, there were 1+3+2+4=10 identical satoshis on the table. Anyone can take any amount, but less than they brought in. Now, Alice takes 3 satoshis, Bob takes 4 satoshis and leave the room.

3 satoshis that are left on the table are taken by a Bitcoin Cash miner to process this transaction.

(In reality nobody put anything on any table, it was just math)

Alice: 1 + 3  =  3 (+1 to the miner)
Bob:   2 + 4  =  4 (+2 to the miner)
One more way to see what happened as money flow

Note that "3 satoshis" that Alice takes out are not the same "3 satoshis" that Alice brought in - they were divided and mixed with others and "fused" back again, so they are not linked in any way. I didn't think about this coincidence until after it was a bit too late, but the second example should be more clear. Same with Bob - just a coincidence.

What do we see on the blockchain?

The numbers are satoshi amounts. 3 sats on the left are not the same as 3 on the right.

Nice, huh? We don't even know if Person 1 and 2 are the same person.

Even if we somehow knew that 1 sat and 3 sat previously belonged to Alice - we can't know whether Alice is now Person 1 or Person 2.

It could be that Alice took 1+3=4 satoshis, paid the miner nothing and left with 4 satoshis. But it's totally possible otherwise (which is correct) that she brought 1+3, gave miner 1 satoshi and left with 3. (In reality, of course, you must pay miners fee)

This is a simple example. Let's get a bit more complex.

Charlie joins the room and they do the following:

Alice:   2 + 8      = 10  =  9 + (1 to a miner)
Bob:     7 + 5 + 3  = 15  =  5 + 8 + (2 to a miner)
Charlie: 7 + 2      =  9  =  7 + (2 to a miner)

Notice that Bob does a clever thing. He brings in 7, 5 and 3 satoshis, but leaves as "two people", each holding 5 and 8 satoshis respectively.

So, suddenly we have this on the blockchain:

Looking just at this: how many people participated? What did everyone do?

From the looks of it, it might have been four people, who took 9, 5, 8 and 7 satoshis each.

But, really it was three, because Bob got outside with 5 and 8, looking like two people.

Just looking at the transaction - we can tell that it's plausible that one to four people participated.

But we can't tell who is who here, can we?

Let's ask our good friend computer to help us. We will try all possible combinations and see what we get.

At first we can try to deanonymize any random person (Alice) and see what possible thing could she have done.

8 - 5 = 3 → It's possible that Alice got in with 8 satoshis and got out with 5 satoshis and left 3 satoshis to the miner. Yep.

8 - 7 = 1 → Put in 8, got out with 7, leaving 1 satoshi. Yep.

7 - 5 = 2 → Put in 7, got out with 5, leaving 2. Yep.

7+7 - 5-8 = 1 → Went in with 7 and 7 (14), got out with 5 and 8 (13), leaving 1 satoshi to the miner. Yep, plausible.

So, we've got a few plausible combinations. How many more are there?

7 - 5 = 2
7 - 5 = 2
2+8 - 9 = 1
...
7+2 - 8 = 1
7+2 - 7 = 2
5+3 - 5 = 3
5+3 - 7 = 1
...
2+8+5 - 9-5 = 1
2+8+5 - 5-8 = 2
2+8+5 - 5-7 = 3
2+8+3 - 5-7 = 1
...
8+3+7 - 9-8 = 1
8+3+7 - 9-7 = 2
8+3+7 - 8-7 = 3
8+3+2 - 5-7 = 1
8+7+2 - 9-5 = 3
8+7+2 - 9-7 = 1
...
2+8+7+5 - 9-5-7 = 1
2+8+7+5 - 5-8-7 = 2
2+8+7+3 - 9-8 = 3
2+8+7+7 - 9-5-8 = 2
2+8+7+7 - 9-5-7 = 3
...
2+8+7+5+7+2 - 9-5-8-7 = 2
2+8+5+3+7+2 - 9-8-7 = 3
2+7+5+3+7+2 - 9-8-7 = 2
8+7+5+3+7+2 - 9-5-8-7 = 3

What? How many were there?

231 possible combinations!

So, now to deanonymize the second person - we need to take each of these as a guess and see how many possible combinations left for each of these cases.

But still, 231 combinations for a first try isn't that bad.

Let's look at a real example. This is a real CashFusion transaction: 9e28..887.

Here are the inputs and outputs in satoshis (truncated):

The miner got 16342 satoshis. The transaction has 111 inputs ("people" getting in) and 17 outputs ("people" getting out).

It seems like these numbers would be easier to deanonymize, since they look like not many of those inputs would add up to some outputs, leaving less than 16342 satoshis to a miner...

When we try to run our tester that generated all of the previous outputs - it starts giving theories slowly, like this:

21348863 - 9538973-11802770 = 7120 
# got in with 21348863 sat, got out with 9538973 sat and 11802770 sat, 
# leaving 7120 sats to a miner

52741370 - 11802770-18998471-21932001 = 8128

85075026 - 937750-21932001-25985765-36204143 = 15367

... but then it gets stuck. What happened?

To explain it, we need to try a simpler example first.

Let's try a much simpler transaction. I've generated a similar transaction, where only 3 players, just like before (Alice, Bob and Charlie) have participated with unique enough sums, that seem like the ones from the real transaction.

inputs = [827625, 311935, 219481, 305525, 438511, 644967, 515276]
outputs = [663392, 474748, 1607187, 514230]
the miner got 3763

If we run our program - it seems OK, it gives out that indeed, despite there being crazy amounts - there are only 6 plausible ways to combine them that are consistent with what the miner got.

Only 6 things that the first person could have done.

That's much lower than our original 231, so it seems that our theory about more unique numbers being easier to deanonymize holds.

But is it?

What we're looking at is "7 inputs and 4 outputs" and we get 6 plausible explanations almost instantly.

Let's bump it up!

10 inputs, 5 outputs. 4 players. Instant. 16 plausible combinations.

14 inputs, 8 outputs. 5 players. About 1 second. 71 plausible combinations.

17 inputs, 9 outputs. 6 players. 19 seconds. 860 plausible combinations.

Note that the program is written in Go language, so it's pretty close to what the computer can achieve. It's fast. Yet it takes 19 seconds to get through all possibilities.

7 players. 18 inputs, 11 outputs. 2 minutes! 7859 plausible combinations.

Ouch! Now that's getting slow.

If we try to chart how long our experiments take, considering how many inputs and outputs go into the transaction:

The time just skyrockets. 150 seconds is 2.5 minutes! Each next input and output is going to be even slower.

So, back to our transaction with 111 inputs and 17 outputs. How much time would we need to get all plausible combinations that might be what person one did?

I've measured how fast my computer can try out random combinations - it turns out that it can test about 53 million combinations per minute.

How many tries do we need?

That's where we go into crazy numbers. We need to try about 2*2*2*2*2....(repeat this 111+17=128 times) combinations - it's a very big number, it has 38 zeroes in it!

To imagine that number - imagine all the stars of our galaxy (Milky Way), now imagine each grain of sand on each of these stars... wow, right? Now multiply that by 1 billion! That's how many tries we need to do to go through all of the possible combinations to find all things that person one could have done. (Source)

So, at the speed of 53 million tries per minute... lets take all the computers in the world (about 2 billion) and ask them to try these combinations. Now, just to be safe - we'll start building computers and won't stop until we create 100 000 times more computers and ask them to join us too. 100 000 times more than there are computers in the World. Surely, we're going to end this computation fast...

Alas, it would take us 61 billion years.... or about 4 times longer than the universe has existed. Just to try all possible combinations for person one.

Ok... maybe there are some optimizations that would allow us to cut the whole huge chunks of data and we can somehow end it up in some reasonable time...

Well, we will still need to find all the plausible combinations and try them out. Would we be able to even store them? Somewhere?

Let's see at how many combinations are there, just like we looked at the time:

Oh, no...

You see where this is going, right?

I've asked my computer to try random guesses from these 111 inputs and 17 outputs to estimate how many plausible results there would be. In about 1 minute, my computer found 1723 plausible combinations out of 53616189 possibilities it tried. So, each minute that my computer works - it would find 1723 more combinations.

Remember these 100 000 times the number of computers in the World? They will be finding 1723 combinations per minute per each computer for the next 61 billion years.

This is just to find all the possible things that person one did in that transaction.

In total all these computers after 61 billion years would find about 10 000 000 000 000 000 000 000 000 000 000 000 possible things that person one did.

Here's the problem. After we're done with that, we would need to take first of this unimaginable number of combinations and then spend about 6 more billion years trying to figure out all plausible combinations of what person two did assuming person one did that first thing... then about 600 million years for person three... then... then.. then... after we have finally found some billions or trillions of plausible explanations for what all people did, we would move to second number, with 9 999 999 999 999 999 999 999 999 999 999 998 left, each taking probably about 7 billion years to finish.

Even if we try to do "depth-first" search - i.e. as soon as we find anything reasonable - we try to find what person two did, then three, then four, etc... we would probably find something pretty fast... But there is no way to prove that this is the only possible thing that happened. You still need to try all other combinations... one way or another.

That's why it really seems that CashFusion math checks out and it would be a pretty great way to anonymity in Bitcoin Cash.

Do you want to test it? See here.

Additional attack vector

Late addition: @ZeusOnPills gives an interesting attack vector on CashFusion in the comment below.

Let's imagine a a few CashFusions (or CashShuffles). So a user does a few Fusions, then closes the wallet.

Looks anonymous enough, very hard to guess who's who.

Then the user opens the wallet. Electron Cash, being a HD wallet, requests balances of all of your previous addresses, which makes the previous scheme kind of look like this:

The addresses in red are the addresses that Electron Cash just requested from the ElectrumX server that it just connected to. You can't see it on the blockchain, but if the ElectrumX server is logging this information - that could be really terrible, because not only you deanonymized yourself, but also severely reduced anonymity for other people...

One possible solution that I think can be done is to include additional addresses from each CashFusion in the future requests. This increases the load, but protects anonymity (somewhat). You should read the comments below - @ZeusOnPills has some interesting thoughts.

8
$ 8.36
$ 5.00 from @unitedstatian
$ 2.00 from @sploit
$ 0.50 from @nyusternie
+ 10
Sponsors of Read.Cash
empty
empty
Avatar for Read.Cash
4 years ago

Comments

Very interesting! What is the difference with CashShuffle?

$ 0.00
4 years ago

CashShuffle requires equal outputs. So if Alice, Bob and Charlie come with 5, 6 an 7 satoshis - all they get out is 3 anonymized "5 satoshi" outputs and not really anonymous change. It can't join two coins, so you end up with lots of anonymous outputs, that you usually need to spend later together, thereby deanonymizing yourself and other users in the process too.

$ 0.10
4 years ago

If you want to deanonymize CashFusion or CashShuffle users, all you need to do is plant a backdoor or otherwise compromise one of the ElectronX server that the wallet connects to. You see, when you check the balance of your addresses, the server can see your full set of addresses, which includes the addresses which went as inputs in all your past shuffles or fusions and all the addresses that were outputs to all your past shuffles or fusions. All of them are used in requests for their balance. You don't even need to compromise all the ElectronX servers. Just compromise one of them and then DoS the rest to force users to connect to the compromised one. And you just need to connect ONCE to a compromised server, to leak to them ALL your past history of shuffles and fusions.

The only way to avoid this issue as a user, is to run your own ElectronX server. Which pretty much nobody does. To make things worse, even if you use your own ElectronX server to lookup the balance of your addresses, other users in your fusions and shuffles don't do that. And the more of these users are deanonymized using the above method, the closer one comes to deanonymizing you as well (by deduction).

Maybe I can write an article about this because many people seem to sort of gloss over the problem and present BCH as having great anonymity. It does not. Shuffle and Fusion are great protocols that WILL bring anonymity eventually, but right now they are overkill given the way more serious privacy leaks of most wallets.

$ 1.62
4 years ago

I think the BCHD team is working on addressing this problem of SPV privacy (Neutrino wallet?)

Still, trusting one server to not leak your data is much better than having all your data publicly visible on the blockchain.

$ 0.86
4 years ago

Yep, Neutrino it is.

$ 0.00
4 years ago

trusting one server to not leak your data is much better than having all your data publicly visible on the blockchain.

I would also very much agree with that.

$ 0.11
4 years ago

It would be fine if it was one server. Unfortunately as long as you keep your wallet, you just need to connect to one compromised server to retroactively deanonymize all your past shuffles or fusions. So you are trusting all the servers that you will ever connect in the future with that wallet, not just one server.

$ 0.00
4 years ago

Yeah they use bloom filters to check whether there might be a payment to one of your addresses in a block, and then they ask random nodes for the block data. The have to download the full block if it is suspected to contain a payment though, which right now is fine, but if blocks get bigger it may exclude users that pay per GB.

Yes it is better not to have the history on the blockchain. It's just that if literally a couple of the ElectronX servers are compromised, whether because the operator is malicious, or has been coerced to comply and gagged, or simply because someone hacked in the server to compromise them, we could right now be in a state where the actual privacy of the system is way lower than what we think it is. Someone could be actively creating a database of de-anonymization data for shuffles and fusions right now and we wouldn't even know about it.

It is still useful! Don't take me wrong. As they say: when you pay for your VPN the company doesn't see your salary. So it's better to do it rather than not do it. Better to have it rather than not have it. But if someone is trying to sell stuff that could end up putting them in prison in some country, then perhaps they should not be relying on any of these. But I feel like we are inviting people to rely on these because of the way we promote them. I keep seeing happy posts about how awesome these protocols are and people are doing comparisons with Monero when, essentially, whoever uses CashFusion/Shuffle is like they are using it publicly outside a police station without ever looking who is behind them. I don't see as many people warning people about the issues with SPV wallets, and I think there should be a footnote about what's the threat model that Electron Cash with CashFusion combats.

I hope Electron Cash also implements the Neutrino protocol. Or that Neutrino implements CashFusion.

$ 0.10
4 years ago

I've decided to update the article with your concerns. Thank you!

$ 0.00
4 years ago

Thank you. Nice graphs for it as well!

$ 0.00
4 years ago

Also probably an interesting idea would be to actually request balances and history of all other participants as well... thereby adding fake data in the process.

$ 0.01
4 years ago

Yeah I've thought of that as well. The problem is that while it's perfect for a single shuffle or fusion, the more you shuffle the less it helps. Here's an example:

You shuffled 1 Sent 0.6 to someone and got back 0.4 as change You shuffled those 0.4

You made two shuffles, and you are requesting the balance of all addresses in these shuffles. Now it is very likely that there was no other user which participated in BOTH of these shuffles with you, so you are the only person who happens to be asking for both of their address sets. The more shuffles you do, the more probable that you are the only user asking for these shuffles inputs and outputs. So as long as you connect ONCE to a compromised server, they can deduce which coins were yours in these fusions and trace them through them. They would be unable to do that if you only made one fusion.

Basically, take the set of addresses that a users asks the balance for. Find the biggest connected subgraph of the fusions which has the property that the user is asking for all addresses that are on the edges of that subgraph. Now you know the user's fusions, and the edges will usually belong to the user or maybe some of them will belong to other users that happened to take part in two adjacent fusions of the user (but that should more rare probably).

I haven't tried to simulate this in order to verify it, I just think it is doable.

$ 0.35
4 years ago

I actually think you have a very valid concern that needs to be thought through.

$ 0.10
4 years ago

Yeah, I think you're right, we do need something like Neutrino SPV:

What is client-side filtering?

Neutrino's client-side filtering means you no longer have to disclose your Bitcoin address to find out if you have received a transaction. Neutrino downloads a hyper-compressed representation of each block, and your mobile phone checks to see if any transactions are related to you.

Doesn't this increase bandwidth?

Yes it does, but when you crunch the numbers it’s not that bad. Even with large blocks (e.g. 100MB) the data usage is within the range of a typical mobile data plan. If you need to save even more bandwidth you can always adjust how the wallet syncs unconfirmed transactions.

$ 0.15
4 years ago

Network privacy is an ongoing problem for all SPV wallets in general; the tried-and-true way to address it is simply running a full node yourself... which obviously isn't applicable for most people (it does provide a strong incentive to run nodes though).

There are, however, ways to mitigate this vector to a certain extent. BIP157/158 filtering is one solution as some people mentioned - however I don't see it as a true solution in the long run, due to both the need to download all filters and the need to download blocks (there are initiatives for partial blocks, but those aren't here yet).

Another route that additionally benefits usability is, of course, reusable addresses. Sadly not nearly enough resources has been dedicated to it, so it's stuck in Proof-of-Concept land. =\

In the near term, I see a more practical path forward - implementable purely on the client side - to be multiplexing: request only a small subset of addresses from each Electrumx/Electronx/Fulcrum/Electrs server, and bumping redundancy of requests up to 2 or 3x for reliability. Privacy will then depend on the number of non-sybil servers available. If implemented this has the benefit of improving privacy and robustness, and does not require modifying server infrastructure; however again lack of resources is limiting its deployment.

$ 0.30
4 years ago

be multiplexing: request only a small subset of addresses from each Electrumx

Though this still needs to be implemented right - if today you're asking a server for addresses 2,3,7,9 and tomorrow for 1,3,9,15 - then it's pretty linkable :) I still think that adding spurious(fake) requests is a good addition to that - it adds potential confusion.

Totally forgot about reusable addresses - added it to What's so great about Bitcoin Cash

$ 0.10
4 years ago

We had a long discussion about this in Electron-Cash channel, the way to do it properly is to be deterministic - always request the same addresses from the same server as much as possible. One possible way is to have a "server rank" for each address based on hash of address+server-domain.

False positives help, but they'll have to be deterministic as well, else they're easily weeded out. :P

$ 0.30
4 years ago

Totally agree about both points. Good to know that Electron Cash devs are aware of this. Kudos!

$ 0.10
4 years ago