Large deployments - remove-relation error in _promote_failover #566

Open
smethnani opened this issue Feb 27, 2025 · 1 comment
Labels
bug Something isn't working

smethnani commented Feb 27, 2025

Steps to reproduce

  1. Deploy main, failover and data apps
juju deploy -n 3 opensearch main \
    --config cluster_name="app" \
    --channel 2/edge \
    --config profile="testing"

juju deploy -n 1 opensearch failover \
    --config cluster_name="app" \
    --config init_hold="true" \
    --config roles='cluster_manager' \
    --channel 2/edge \
    --config profile="testing"

juju deploy -n 1 opensearch data-hot \
    --config cluster_name="app" \
    --config init_hold="true" \
    --channel 2/edge \
    --config roles='data.hot' \
    --config profile="testing"

juju deploy self-signed-certificates --config ca-common-name="Test CA"
juju integrate self-signed-certificates main
juju integrate self-signed-certificates failover
juju integrate self-signed-certificates data-hot
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
juju integrate main:peer-cluster-orchestrator data-hot:peer-cluster
juju integrate failover:peer-cluster-orchestrator data-hot:peer-cluster
  2. Remove the main-failover relation: juju remove-relation failover main

Expected behavior

The relation should be removed without errors.

Actual behavior

The failover unit moves to an error state with hook failed: "peer-cluster-relation-departed"

Log output

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-failover-0/charm/src/charm.py", line 221, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 553, in main
    manager.run()
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 529, in run
    self._emit()
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 518, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name, self._juju_context)
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 139, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/framework.py", line 347, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-failover-0/charm/venv/lib/python3.10/site-packages/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-failover-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 750, in _on_peer_cluster_relation_departed
    self._promote_failover(orchestrators, cms)
  File "/var/lib/juju/agents/unit-failover-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 787, in _promote_failover
    main_cms = [cm for cm in cms if cm.app.id == orchestrators.main_app.id]
  File "/var/lib/juju/agents/unit-failover-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 787, in <listcomp>
    main_cms = [cm for cm in cms if cm.app.id == orchestrators.main_app.id]
AttributeError: 'NoneType' object has no attribute 'id'
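
The failing expression can be reproduced in isolation. This is a minimal sketch, assuming that `orchestrators.main_app` is the value that is `None` once the relation to main is gone (a `None` in `cm.app` would raise the same error). `App`, `Node` and `Orchestrators` below are hypothetical stand-ins, not the charm's real models.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the charm's models (assumption: only `.id` and
# `.app` matter for the failing expression).
@dataclass
class App:
    id: str


@dataclass
class Node:
    name: str
    app: Optional[App]


@dataclass
class Orchestrators:
    main_app: Optional[App] = None
    failover_app: Optional[App] = None


def filter_main_cms(cms: List[Node], orchestrators: Orchestrators) -> List[Node]:
    # Same list comprehension as the line reported in the traceback.
    return [cm for cm in cms if cm.app.id == orchestrators.main_app.id]


cms = [Node("main-0", App("main-id"))]
# Assumption: main_app is unset from the failover's point of view.
orchestrators = Orchestrators(main_app=None, failover_app=App("failover-id"))

error_message = None
try:
    filter_main_cms(cms, orchestrators)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

With a non-empty `cms` list the comparison is evaluated for the first node and fails on `orchestrators.main_app.id`; with an empty list the comprehension would not raise at all.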

Additional context

Related to #555. The error is raised in _promote_failover, but the failover shouldn't be promoted in this case, as main is still the elected leader.
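
One possible defensive shape for the filter, as a rough sketch only: it avoids the crash when the main app is unknown, but it does not decide whether promotion should happen at all, which is the actual bug described here. `App`, `Node` and `select_main_cms` are hypothetical names, not part of the charm.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the charm's models.
@dataclass
class App:
    id: str


@dataclass
class Node:
    name: str
    app: Optional[App]


def select_main_cms(cms: List[Node], main_app: Optional[App]) -> List[Node]:
    """Return the nodes that belong to main_app, or [] if main_app is unknown."""
    if main_app is None:
        # Nothing to match against; the caller can treat this as "skip promotion".
        return []
    return [cm for cm in cms if cm.app is not None and cm.app.id == main_app.id]


cms = [
    Node("main-0", App("main-id")),
    Node("failover-0", App("failover-id")),
    Node("orphan-0", None),
]

print(select_main_cms(cms, None))
print([cm.name for cm in select_main_cms(cms, App("main-id"))])
```

Returning an empty list keeps the caller's control flow simple, but a real fix would probably need to check earlier whether main is still the elected leader before calling into the promotion path.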


Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6713.

This message was autogenerated
