Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DPE-4799] Fix upgrade bugs when switching charms #268

Merged
merged 9 commits into from
Jun 28, 2024

Conversation

dragomirp
Copy link
Contributor

@dragomirp dragomirp commented Jun 27, 2024

  • Fixing new snap install failure due to writing to SNAP_DATA prematurely
  • Fixing upgrade not triggering on a single unit

Lib changes should be upstreamed

Fixes #267

Copy link

codecov bot commented Jun 27, 2024

Codecov Report

Attention: Patch coverage is 56.25000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 70.96%. Comparing base (6c83f6e) to head (a95f137).

Files Patch % Lines
src/charm.py 50.00% 5 Missing ⚠️
src/upgrade.py 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   71.06%   70.96%   -0.11%     
==========================================
  Files           7        7              
  Lines        1182     1195      +13     
  Branches      209      210       +1     
==========================================
+ Hits          840      848       +8     
- Misses        264      269       +5     
  Partials       78       78              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dragomirp dragomirp changed the title [DPE-4799] Suppress rendering hooks when upgrading [DPE-4799] Fix upgrade bugs when switching charms Jun 27, 2024
@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None:
self.charm.unit.status = WaitingStatus("other units upgrading first...")
self.peer_relation.data[self.charm.unit].update({"state": "ready"})

if self.charm.app.planned_units() == 1:
if len(self.app_units) == 1:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Planned units don't work on subordinate charms

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls
You can shoot your leg.

We need subordinate test on dpe-lib... as LIBPATCH=17 should notice this regression.

CC: @marceloneppel : canonical/data-platform-libs#176

Proposal: copy tests from data-integrator to dpe-libs? Yeah, yeah... shared tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls You can shoot your leg.

Let's make sure the changes work here first. Rollback doesn't seem to work anymore in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started a draft for the lib changes: canonical/data-platform-libs#179

However, there are issues with CI there and it will take me time to fix them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I executed some tests. All passed, as I commented on canonical/data-platform-libs#179 (review).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer early exit events that may trigger cfg reload and dir creation during upgrade

Comment on lines +490 to +493
if not self.upgrade.idle:
logger.debug("Early exit on_update_status: Cluster is upgrading")
return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early exit update status, since it will retrigger eventually. Not strictly necessary, but if upgrade is stuck or failed it will overwrite the status.

Comment on lines +91 to +96
logger.info("Wait for upgrade to start")
await ops_test.model.block_until(
lambda: ("waiting" if "pre-upgrade-check" in actions else "maintenance")
in {unit.workload_status for unit in application.units},
timeout=TIMEOUT,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambda didn't work for me locally, when upgrading a single unit.

@dragomirp dragomirp marked this pull request as ready for review June 28, 2024 02:04
@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None:
self.charm.unit.status = WaitingStatus("other units upgrading first...")
self.peer_relation.data[self.charm.unit].update({"state": "ready"})

if self.charm.app.planned_units() == 1:
if len(self.app_units) == 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls
You can shoot your leg.

We need subordinate test on dpe-lib... as LIBPATCH=17 should notice this regression.

CC: @marceloneppel : canonical/data-platform-libs#176

Proposal: copy tests from data-integrator to dpe-libs? Yeah, yeah... shared tests...

Copy link
Contributor

@lucasgameiroborges lucasgameiroborges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see all 3 test_upgrade.py integration tests failing in (apparently) the same way. Is it expected / known issue?

@dragomirp
Copy link
Contributor Author

I see all 3 test_upgrade.py integration tests failing in (apparently) the same way. Is it expected / known issue?

Rollback doesn't work. Investigating ATM.

@dragomirp
Copy link
Contributor Author

Rollback test was broken. Should be fixed now.

@@ -152,7 +152,7 @@ async def test_fail_and_rollback(ops_test, continuous_writes, pgb_charm_jammy) -
application = ops_test.model.applications[PGB]

logger.info("Refresh the charm")
await application.refresh(path=pgb_charm_jammy)
await application.refresh(path=fault_charm)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We weren't using the correct charm and hitting other blocked statuses.

Copy link
Member

@marceloneppel marceloneppel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None:
self.charm.unit.status = WaitingStatus("other units upgrading first...")
self.peer_relation.data[self.charm.unit].update({"state": "ready"})

if self.charm.app.planned_units() == 1:
if len(self.app_units) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I executed some tests. All passed, as I commented on canonical/data-platform-libs#179 (review).

@dragomirp
Copy link
Contributor Author

Merging this for the sake of fixing upgrades, I will sync up lib later if it diverges.

@dragomirp dragomirp merged commit c379b0d into main Jun 28, 2024
52 checks passed
@dragomirp dragomirp deleted the dpe-4799-snap-switch-upgrade branch June 28, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unstable upgrade when switching snaps
4 participants