Ensure txPricedList reheaps regularly (avoid tx memory leak) #1816
Conversation
Coverage from tests:
- 49.4% of statements across all listed packages
- 65.9% of statements in consensus/istanbul
- 39.8% of statements in consensus/istanbul/announce
- 54.5% of statements in consensus/istanbul/backend
- 0.0% of statements in consensus/istanbul/backend/backendtest
- 24.3% of statements in consensus/istanbul/backend/internal/replica
- 65.3% of statements in consensus/istanbul/core
- 45.0% of statements in consensus/istanbul/db
- 0.0% of statements in consensus/istanbul/proxy
- 75.3% of statements in consensus/istanbul/uptime
- 100.0% of statements in consensus/istanbul/uptime/store
- 51.8% of statements in consensus/istanbul/validator
- 79.2% of statements in consensus/istanbul/validator/random
Re the ratio: is it possible for the 4 to be taken in as a geth argument / flag, with the default being 4? Or even make…
LGTM
```diff
@@ -336,7 +336,8 @@ func NewTxPool(config TxPoolConfig, chainconfig *params.ChainConfig, chain block
 	pool.reset(nil, chain.CurrentBlock().Header())
 	// TODO: Does this reordering cause a problem?
 	// priced list depends on the current ctx which is set in reset
-	pool.priced = newTxPricedList(pool.all, &pool.currentCtx)
+	// Use the global slots as the max amount of stale transactions in the priced heap before a re-heap.
+	pool.priced = newTxPricedList(pool.all, &pool.currentCtx, int64(pool.config.GlobalSlots))
```
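To make the new third parameter concrete, here is a minimal, self-contained Go sketch of the stale-count/reheap mechanism. The names are modeled on go-ethereum's txPricedList, but the signature is simplified (the ctx parameter from the diff is dropped) and the txLookup stub and field layout are illustrative assumptions, not the actual implementation:

```go
package txpool

import "sync/atomic"

// txLookup stands in for the pool's canonical tx set (stub for this sketch).
type txLookup struct{ count int64 }

// txPricedList sketch: the price heaps grow eagerly but shrink lazily, so
// removals are only counted until the stale total crosses staleThreshold.
// The explicit threshold is what this PR adds; upstream go-ethereum instead
// compares stales against a fraction of the heap sizes.
type txPricedList struct {
	all            *txLookup
	stales         int64 // heap entries whose tx already left the pool
	staleThreshold int64 // e.g. int64(config.GlobalSlots), per the diff
}

func newTxPricedList(all *txLookup, staleThreshold int64) *txPricedList {
	return &txPricedList{all: all, staleThreshold: staleThreshold}
}

// Removed notes that count txs left the pool. Heap entries are not deleted
// eagerly; once enough of them are stale, the heaps are rebuilt wholesale.
func (l *txPricedList) Removed(count int) {
	if atomic.AddInt64(&l.stales, int64(count)) <= l.staleThreshold {
		return
	}
	l.Reheap()
}

// Reheap rebuilds the price heaps from `all`, releasing stale entries.
func (l *txPricedList) Reheap() {
	atomic.StoreInt64(&l.stales, 0)
	// ... heap reconstruction from l.all omitted
}
```

With GlobalSlots as the ceiling, the heaps can accumulate at most roughly one pool's worth of stale entries before being rebuilt.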
I guess the logic here is that we can have as many stale entries as there are txs in the pool, if we consider that a tx usually takes 1 slot, which I'm not sure is true on average.
Still, it works as a default number, just a bit awkward.
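For context on the "1 slot" point: upstream go-ethereum accounts pool capacity in 32KiB slots, so a typical transaction occupies exactly one slot and only oversized ones take more. A tiny standalone sketch of that accounting (numSlots mirrors geth's helper, but takes a plain byte count here for self-containment):

```go
package txpool

// txSlotSize mirrors go-ethereum's 32KiB accounting unit: GlobalSlots caps
// the pool in slots rather than tx count, so a tx larger than one slot
// consumes several of them.
const txSlotSize = 32 * 1024

// numSlots returns how many slots a tx of the given encoded size occupies
// (ceiling division). Most txs are far under 32KiB: exactly one slot.
func numSlots(sizeBytes uint64) int {
	return int((sizeBytes + txSlotSize - 1) / txSlotSize)
}
```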
If stales > GlobalSlots and a reheap hadn't been done before, it means the heaps are at a minimum 4x the size they should be (due to the 25% rule), so something is off. A lower number could be set, but as a ceiling I feel it should be OK.
While adding them as arguments is possible, it probably adds more noise than anything, since the Reheap will now be run on every pool reorg with a reset, which happens on every ChainHeadEvent, i.e. every new block. We could argue that it's not even needed, except for extreme cases where the blockchain is not advancing.
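A sketch of the reorg-time safeguard described here, reusing the txPricedList stub from the earlier sketch. The real runReorg does much more; the stub types and this reduced body are assumptions showing only where the unconditional Reheap lands:

```go
// Stubs for this sketch; txPricedList comes from the sketch above.
type header struct{}
type txpoolResetRequest struct{ oldHead, newHead *header }

type TxPool struct{ priced *txPricedList }

func (pool *TxPool) reset(oldHead, newHead *header) { /* recompute pending state */ }

// runReorg sketch: after every reset -- one per ChainHeadEvent, i.e. every
// new block -- the priced heaps are rebuilt unconditionally, bounding the
// stale entries even when the counter never crosses its threshold.
func (pool *TxPool) runReorg(reset *txpoolResetRequest) {
	if reset != nil {
		pool.reset(reset.oldHead, reset.newHead)
		pool.priced.Reheap() // workaround pattern from ethereum/go-ethereum#23690
	}
	// ... promotion/demotion of executable txs omitted
}
```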
This PR fixes a memory leak found while running v1.5.0 nodes before the Espresso fork.

txPricedList keeps both a list of all txs in the pool and a heap for fast access to the cheapest txs. These heaps are not always synchronized with the txs list: they grow at the same time as the list, but they do not shrink at the same time. To shrink, the txpool counts how many txs are 'stale', and past a critical point Reheap() is called, releasing the heaps' memory entirely and recreating them.

The issue in the code was that the critical point was never reached, so the heaps grew indefinitely. Notably, the behavior differs after Espresso HF activation, since from then on Reheap() is called periodically on base fee changes; before the HF nothing forces this, so the heaps grow forever. One of the reasons the critical point was never reached is the tx pool not respecting the contract of calling Removed every time it should.

The issue seems similar to ethereum/go-ethereum#23690, which was worked around by reheaping after every pool reorg. Here we also add a couple of safeguards based on the size of the tx pool to ensure that, even if the node is stuck (due to a bad block or any other reason), proper reheaps are triggered when necessary.

A follow-up investigation should be made to compile an exhaustive list of the reasons the de-synchronization occurs. One example is the method demoteUnexecutables, where many txs are removed from the pool but not from the pricedlist.
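To illustrate the de-synchronization pattern the description calls out, here is a standalone hedged sketch. The names echo core/tx_pool.go, but all the types and bodies are illustrative stubs, not the actual code: removing a tx from the lookup without notifying the priced list leaves a permanently stale heap entry, which is how the leak accumulates.

```go
package txpool

// Standalone stubs for this sketch only.
type Hash [32]byte

type lookup struct{ txs map[Hash]struct{} }

func (l *lookup) Remove(h Hash) { delete(l.txs, h) }

type pricedList struct{ stales int64 }

func (p *pricedList) Removed(count int) { p.stales += int64(count) }

type pool struct {
	all    *lookup
	priced *pricedList
}

// demoteUnexecutablesSketch shows the leak pattern: txs leave the lookup,
// but priced.Removed is never called, so the stale counter (and therefore
// the Reheap trigger) never advances while the heap entries linger.
func (p *pool) demoteUnexecutablesSketch(drops []Hash) {
	for _, h := range drops {
		p.all.Remove(h)
		// Missing: p.priced.Removed(1) -- this omission is the kind of
		// de-synchronization the follow-up investigation should catalog.
	}
}
```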