There are two minor trade-offs required to make long chain mining possible. Those are the use of “dirty” transaction chains and also the concept of grouping transactions by ancestor state.
As for dirty chain data, typical dirty chains will have the correct data because diamond transaction graphs are relatively rare, therefore, even chains marked dirty will typically have the correct ancestor state data. It is only the diamonds (where a transaction has two or more outputs then converge one or more of them into a single input) that are left dirty because of their excessive need to parse through the entire transaction chain to find correct ancestor values, but again, these are quite rare. In a recent mainnet survey it was found that only 0.1% of transactions have a diamond pattern in their ancestor chain. This means that in a typical 1MB block, where there are roughly 3000 transactions, you would typically have only 3 that had “dirty” and slightly inflated numbers for their ancestor states.
The other trade-off concerns considering a group of ancestors as a single transaction. We can call these transactions, Ancestor Grouped Transactions (AGT). This approach to grouping allows us to process packages orders of magnitude faster than other methods of package mining since we no longer have to continuously update the descendant state as we mine part of an unconfirmed chain.
However, there is a theoretical limitation in this approach which could happen when a block is almost full. We could for instance end up including a lower fee transaction as part of an ancestor group when in fact it would be better, in terms of fees, to include some other single transaction. This would result in slightly less fees (perhaps a few hundred satoshis) rewarded to the miner. However, this situation is not likely to be seen for two reasons. One, long unconfirmed chains are typically having transactions with all the same fees and two, the typical child pays for parent scenario has only two transactions with the child having the higher fee. And neither of these two types of packages could cause any loss of fees with this mining algorithm, when the block is nearly full.
The mining algorithm is surprisingly simple and entails parsing through the mempool’s “ancestor_score” index and adding the AGT's into the new block. There is however a pathological case which has to be accounted for where a child transaction has less fees per KB than its parent which causes child transactions to show up later as we parse through the ancestor index. In this case we then have to recalculate the ancestor sigops and package size which can be time consuming, given we have to parse through the ancestor tree each time. However we get around that by shortcutting the process by parsing through only the portion of the tree that is currently not in the block. This shortcutting happens in “_CalculateMempoolAncestors()” where we pass in the “inBlock” data structure containing the already added transactions.