There has been a lot of discussion lately about CashFusion and in particular: the combinatoric math behind it. I had written an article analyzing it but not everyone is convinced. Bitcoin magazine covered the development, so thanks to those guys for the coverage, but it really got me thinking if there was a better way to prove the math is right.
The approach to the math in the spec (and my previous article) is basically to compare the total number of partitions available to the size of the space available for different combinations of amounts. While this approach seems roughly sensible, one could still ask "How do you know the vast majority of partitions aren't invalid due to overlap?" In other words -- if we "assign" one input to a player, it's no longer available to another player, and that might have been the very input that player needed to complete a correct sum.
So, here, I present a different approach that attempts a more stringent accounting of the transaction components, without allowing any "overlap".
It goes like this: Assume 10 players, each with 10 inputs and 10 outputs. We start with the inputs, and with the first player. To generate a possible combination, the first player can select 10 inputs from the entire pool of 100 inputs. (They could contain all, none, or some of his actual inputs.)
We calculate the number of ways this is possible using an "n choose k" formula, which means given a set of n elements, how many ways can we chose k items?
This is also known as the binomial coefficient.
Continuing on, player 2 cannot choose any of the inputs player 1 already chose, so he can choose 10 inputs from only 90 possibilities "90 choose 10". Player 3's combinations are "80 choose 10", and so on. This continues all the way up to player 9, who is "20 choose 10". (Player 10 has nothing to choose at all because he simply has the "leftovers")
100 choose 10 is greater than 10^13. But 20 choose 10 is only 184,756. This may seem like a small number compared to, say, a billion possibilities for a BCH amount. However, in addition to choosing 10 inputs, each player also chooses 10 outputs. Player 9 is the last to choose, so he will have 184,756 possible input partitions, AND 184,756 possible output partitions. The total is 184,756^2, which is 34,134,779,536.
When calculating the likelihood of finding a match, all players 1-9 must have a match, but the number of partitions are so large for most of them, that the probabilities are almost equivalent to certainty. We can calculate it all of it if desired, but we can also simplify in practice by focusing on the last and least likely player, and so we get 34 billion partitions compared to 1 billion possible values.
If this ratio seems unimpressive, keep in mind it is a very rough lower bound. The reason is that in this example, we are only calculating for combinations in which all 10 players have exactly 10 inputs and 10 outputs. In reality, the number of partitions is higher because it could be any number of inputs and outputs for each player, so long as each input and output is assigned to someone.
Also, even the most modest "quantizing" increases the number of valid partitions by orders of magnitude. For example, a 1.34394038 output could be rounded to 1.3439403, cutting the number of values from 10^9 to 10^8, and making it 10 times more likely to find a valid partitions.
One caveat is that the chance for each player's sum to be correct is still assuming each sum value is random (with a uniform distribution) within a range of values (in this case 1-10^9). In reality we know the distribution is probably not uniform, but it still seems a reasonable assumption as a sum of random numbers is more or less random, and that the input values are arbitrary, and the output values are quasi-random based on the inputs.