-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DPE-4799] Fix upgrade bugs when switching charms #268
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #268 +/- ##
==========================================
- Coverage 71.06% 70.96% -0.11%
==========================================
Files 7 7
Lines 1182 1195 +13
Branches 209 210 +1
==========================================
+ Hits 840 848 +8
- Misses 264 269 +5
Partials 78 78 ☔ View full report in Codecov by Sentry. |
@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None: | |||
self.charm.unit.status = WaitingStatus("other units upgrading first...") | |||
self.peer_relation.data[self.charm.unit].update({"state": "ready"}) | |||
|
|||
if self.charm.app.planned_units() == 1: | |||
if len(self.app_units) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Planned units don't work on subordinate charms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls
You can shoot your leg.
We need subordinate test on dpe-lib... as LIBPATCH=17 should notice this regression.
CC: @marceloneppel : canonical/data-platform-libs#176
Proposal: copy tests from data-integrator to dpe-libs? Yeah, yeah... shared tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls You can shoot your leg.
Let's make sure the changes work here first. Rollback doesn't seem to work anymore in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started a draft for the lib changes: canonical/data-platform-libs#179
However, there are issues with CI there and it will take me time to fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I executed some tests. All passed, as I commented on canonical/data-platform-libs#179 (review).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defer early exit events that may trigger cfg reload and dir creation during upgrade
if not self.upgrade.idle: | ||
logger.debug("Early exit on_update_status: Cluster is upgrading") | ||
return | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early exit update status, since it will retrigger eventually. Not strictly necessary, but if upgrade is stuck or failed it will overwrite the status.
logger.info("Wait for upgrade to start") | ||
await ops_test.model.block_until( | ||
lambda: ("waiting" if "pre-upgrade-check" in actions else "maintenance") | ||
in {unit.workload_status for unit in application.units}, | ||
timeout=TIMEOUT, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lambda didn't work for me locally, when upgrading a single unit.
@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None: | |||
self.charm.unit.status = WaitingStatus("other units upgrading first...") | |||
self.peer_relation.data[self.charm.unit].update({"state": "ready"}) | |||
|
|||
if self.charm.app.planned_units() == 1: | |||
if len(self.app_units) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see PR for LIBPATCH = 18: https://github.com/canonical/data-platform-libs/pulls
You can shoot your leg.
We need subordinate test on dpe-lib... as LIBPATCH=17 should notice this regression.
CC: @marceloneppel : canonical/data-platform-libs#176
Proposal: copy tests from data-integrator to dpe-libs? Yeah, yeah... shared tests...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see all 3 test_upgrade.py
integration tests failing in (apparently) the same way. Is it expected / known issue?
Rollback doesn't work. Investigating ATM. |
Rollback test was broken. Should be fixed now. |
@@ -152,7 +152,7 @@ async def test_fail_and_rollback(ops_test, continuous_writes, pgb_charm_jammy) - | |||
application = ops_test.model.applications[PGB] | |||
|
|||
logger.info("Refresh the charm") | |||
await application.refresh(path=pgb_charm_jammy) | |||
await application.refresh(path=fault_charm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't using the correct charm and hitting other blocked statuses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -921,7 +921,7 @@ def _on_upgrade_charm(self, event: UpgradeCharmEvent) -> None: | |||
self.charm.unit.status = WaitingStatus("other units upgrading first...") | |||
self.peer_relation.data[self.charm.unit].update({"state": "ready"}) | |||
|
|||
if self.charm.app.planned_units() == 1: | |||
if len(self.app_units) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I executed some tests. All passed, as I commented on canonical/data-platform-libs#179 (review).
Merging this for the sake of fixing upgrades, I will sync up lib later if it diverges. |
Lib changes should be upstreamed
Fixes #267