Kusama parachains not producing blocks for a session #3058

Closed
alexggh opened this issue Jan 25, 2024 · 2 comments · Fixed by #3063
Assignees
Labels
T8-polkadot This PR/Issue is related to/affects the Polkadot network.

Comments


alexggh commented Jan 25, 2024

Issue

After the v1,001,000 runtime upgrade (https://kusama.subsquare.io/referenda/328), Kusama parachain blocks were very rarely included in the relay chain until the next session.

Investigation

The v1,001,000 runtime upgrade brought with it the async_backing_params API: https://github.com/polkadot-fellows/runtimes/blob/e115160682a87798cb6557db9c73e77d00ddb0ab/relay/kusama/src/lib.rs#L1973. That caused several new subsystems (prospective-parachains, statement-distribution-v2) to activate mid-session, which in turn caused the backing of candidates to slow down significantly; see https://grafana.teleport.parity.io/goto/6jZXXmtSg?orgId=1

Screenshot 2024-01-25 at 12 46 11

Reproducibility

The issue can be reproduced easily with zombienet if we hack the runtime-api subsystem to fake the presence of async_backing_params at a block after the session start, something like this:

-               Request::AsyncBackingParams(sender) => {
-                       query!(
-                               AsyncBackingParams,
-                               async_backing_params(),
-                               ver = Request::ASYNC_BACKING_STATE_RUNTIME_REQUIREMENT,
-                               sender
-                       )
+               Request::AsyncBackingParams(block_number, sender) => {
+                       let sender = sender;
+                       let version: u32 = (Request::ASYNC_BACKING_STATE_RUNTIME_REQUIREMENT);
+                       let runtime_version = client
+                               .api_version_parachain_host(relay_parent)
+                               .await
+                               .unwrap_or_else(|e| {
+                                       gum::warn!(target:LOG_TARGET,"cannot query the runtime API version: {}",e,);
+                                       Some(0)
+                               })
+                               .unwrap_or_else(|| {
+                                       gum::warn!(target:LOG_TARGET,"no runtime version is reported");
+                                       0
+                               });
+                       let res = if runtime_version >= version && block_number > ENABLE_ASYNC_PARAMS_BLOCK {
+                               client.async_backing_params(relay_parent).await.map_err(|e| {
+                                       RuntimeApiError::Execution {
+                                               runtime_api_name: stringify!(async_backing_params),
+                                               source: std::sync::Arc::new(e),
+                                       }
+                               })
+                       } else {
+                               Err(RuntimeApiError::NotSupported {
+                                       runtime_api_name: stringify!(async_backing_params),
+                               })
+                       };
+                       metrics.on_request(res.is_ok());
+                       let _ = sender.send(res.clone());
+                       res.ok().map(|res| RequestResult::AsyncBackingParams(relay_parent, res))

Root-cause

The problem was that NewGossipTopology is emitted at the beginning of each session and we store it in statement-distribution v2 in what is called per_session_state, here: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/network/statement-distribution/src/v2/mod.rs#L456.

However, the per_session_state is actually created the first time we encounter a leaf that has prospective parachains enabled, here: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/network/statement-distribution/src/lib.rs#L320.

So, since we receive the topology at the beginning of the session, and the first block of the session did not have prospective parachains enabled, we discarded the topology. Once we started processing blocks with async backing enabled, we did not have a valid topology, so we weren't able to distribute our backing statements to other peers. QED.
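The sequence above can be modelled in a few lines. This is only a simplified sketch of the failure mode, not the actual statement-distribution code: the `State`, `Topology`, `on_new_gossip_topology`, `on_active_leaf` and `can_distribute` names are hypothetical stand-ins, and the session number 7 is arbitrary.

```rust
use std::collections::HashMap;

type SessionIndex = u32;

/// Hypothetical stand-in for the payload of NewGossipTopology.
#[derive(Clone, Debug)]
struct Topology {
    session: SessionIndex,
}

/// Hypothetical, heavily simplified model of the v2 state.
#[derive(Default)]
struct State {
    /// An entry is created lazily, only once a leaf with prospective
    /// parachains enabled is seen for that session.
    per_session: HashMap<SessionIndex, Option<Topology>>,
}

impl State {
    /// Models the behaviour described above: a topology that arrives
    /// before the session entry exists is silently dropped.
    fn on_new_gossip_topology(&mut self, topology: Topology) {
        if let Some(slot) = self.per_session.get_mut(&topology.session) {
            *slot = Some(topology);
        }
    }

    /// Creates the session entry only when prospective parachains is enabled.
    fn on_active_leaf(&mut self, session: SessionIndex, prospective_parachains_enabled: bool) {
        if prospective_parachains_enabled {
            self.per_session.entry(session).or_insert(None);
        }
    }

    /// Statements can only be distributed once a topology is known.
    fn can_distribute(&self, session: SessionIndex) -> bool {
        matches!(self.per_session.get(&session), Some(Some(_)))
    }
}

/// Replays the incident: the runtime upgrade enables async backing
/// only after the session (and its topology) has already started.
fn upgrade_mid_session() -> bool {
    let mut state = State::default();
    state.on_active_leaf(7, false); // first leaf of the session, old runtime
    state.on_new_gossip_topology(Topology { session: 7 }); // dropped
    state.on_active_leaf(7, true); // upgrade enacted mid-session
    state.can_distribute(7)
}
```

In this model `upgrade_mid_session()` returns `false`: the session entry exists after the upgrade, but its topology stays empty until the next session delivers a new one.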

Conclusions

@alexggh alexggh self-assigned this Jan 25, 2024
@alexggh alexggh added the T8-polkadot This PR/Issue is related to/affects the Polkadot network. label Jan 25, 2024
@alexggh alexggh moved this from Backlog to In Progress in parachains team board Jan 25, 2024
alexggh added a commit that referenced this issue Jan 25, 2024
…lement

Topology is coming only at the beginning of each session, so we might
lose it if prospective parachains was not enabled at the beginning of the
session, so cache it for later use

Fixes: #3058

Signed-off-by: Alexandru Gheorghe <[email protected]>
@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-digest-26-jan-2024/5848/1

github-merge-queue bot pushed a commit that referenced this issue Jan 29, 2024
…lement (#3063)

Topology is coming only at the beginning of each session, so we might
lose it if prospective parachains was not enabled at the beginning of the
session, so cache it for later use.

Fixes: #3058

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
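The caching approach described in the commit message could look roughly like the sketch below. It is a minimal model under stated assumptions, not the code merged in #3063: `State`, `Topology`, the `unused_topologies` field and all method names are hypothetical, and the session number 7 is arbitrary.

```rust
use std::collections::HashMap;

type SessionIndex = u32;

/// Hypothetical stand-in for the payload of NewGossipTopology.
#[derive(Clone, Debug, PartialEq)]
struct Topology {
    session: SessionIndex,
}

/// Hypothetical, heavily simplified model of the v2 state.
#[derive(Default)]
struct State {
    per_session: HashMap<SessionIndex, Option<Topology>>,
    /// Assumed cache: topologies received before their session entry existed.
    unused_topologies: HashMap<SessionIndex, Topology>,
}

impl State {
    /// Stores the topology in the session entry if it exists,
    /// otherwise keeps it in the cache instead of dropping it.
    fn on_new_gossip_topology(&mut self, topology: Topology) {
        match self.per_session.get_mut(&topology.session) {
            Some(slot) => *slot = Some(topology),
            None => {
                self.unused_topologies.insert(topology.session, topology);
            }
        }
    }

    /// Creates the session entry on the first enabled leaf and
    /// fills it from the cache when a topology was received earlier.
    fn on_active_leaf(&mut self, session: SessionIndex, prospective_parachains_enabled: bool) {
        if !prospective_parachains_enabled {
            return;
        }
        let cached = self.unused_topologies.remove(&session);
        let slot = self.per_session.entry(session).or_insert(None);
        if slot.is_none() {
            *slot = cached;
        }
    }

    fn topology_for(&self, session: SessionIndex) -> Option<&Topology> {
        self.per_session.get(&session).and_then(|t| t.as_ref())
    }
}

/// Same event order as in the incident, now with the cache in place.
fn upgrade_mid_session_with_cache() -> Option<Topology> {
    let mut state = State::default();
    state.on_active_leaf(7, false); // first leaf of the session, old runtime
    state.on_new_gossip_topology(Topology { session: 7 }); // cached, not dropped
    state.on_active_leaf(7, true); // upgrade enacted mid-session
    state.topology_for(7).cloned()
}
```

With the cache, the same event order yields a usable topology as soon as the first leaf with prospective parachains enabled is processed, so the node does not have to wait for the next session.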
@github-project-automation github-project-automation bot moved this from In Progress to Completed in parachains team board Jan 29, 2024
@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/2024-04-21-polkadot-parachains-stalled-until-next-session/7526/1
