Large deployments - Data node Active status without cluster manager #564

Open
smethnani opened this issue Feb 25, 2025 · 1 comment
Labels
bug Something isn't working

smethnani commented Feb 25, 2025

Steps to reproduce

  1. Deploy and relate applications:
juju deploy -n 3 opensearch main \
--config cluster_name="app" \
--channel 2/edge \
--config profile="testing"

juju deploy -n 2 opensearch failover \
--config cluster_name="app" \
--config init_hold="true" \
--config roles='cluster_manager' \
--channel 2/edge \
--config profile="testing"

juju deploy -n 1 opensearch data-hot \
--config cluster_name="app" \
--config init_hold="true" \
--channel 2/edge \
--config roles='data.hot' \
--config profile="testing"

juju deploy self-signed-certificates --config ca-common-name="Test CA"
juju integrate self-signed-certificates main
juju integrate self-signed-certificates failover
juju integrate self-signed-certificates data-hot
juju integrate main:peer-cluster-orchestrator failover:peer-cluster
juju integrate main:peer-cluster-orchestrator data-hot:peer-cluster
juju integrate failover:peer-cluster-orchestrator data-hot:peer-cluster
  2. Remove the main-data-hot relation, then immediately remove the failover-data-hot relation:
juju remove-relation main data-hot
juju remove-relation failover data-hot
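
Since the outcome depends on whether the units have settled before the second removal, it can help to inspect unit state between the two commands. The following is a minimal sketch, assuming the usual `juju status --format=json` layout in which each unit carries `workload-status` and `juju-status` entries. The helper names (`unit_states`, `all_idle`, `fetch_status`) are hypothetical and not part of the charm or of Juju.

```python
import json
import subprocess


def unit_states(status: dict, app: str) -> dict:
    """Map each unit of `app` to (workload status, agent status)."""
    units = status.get("applications", {}).get(app, {}).get("units", {})
    return {
        name: (
            unit.get("workload-status", {}).get("current"),
            unit.get("juju-status", {}).get("current"),
        )
        for name, unit in units.items()
    }


def all_idle(status: dict, apps: list[str]) -> bool:
    """True when every unit of every listed app reports an idle agent.

    Note: an app with no units counts as idle (nothing left to wait for).
    """
    return all(
        agent == "idle"
        for app in apps
        for _, agent in unit_states(status, app).values()
    )


def fetch_status() -> dict:
    """Read the current model status; needs the juju CLI and a live model."""
    out = subprocess.run(
        ["juju", "status", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling `all_idle(fetch_status(), ["main", "failover", "data-hot"])` between the two `remove-relation` commands shows which of the two paths described under the additional context is being exercised, and `unit_states(fetch_status(), "data-hot")` afterwards shows whether the unit reports `active` or `blocked`.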

Expected behavior

The data node should move to the blocked state.

Actual behavior

The data node stays in the active state.

Additional context

This happens only when juju remove-relation failover data-hot is run immediately after juju remove-relation main data-hot. If you instead wait until the units have settled to the idle state before removing the failover relation, the data unit moves to an error state with hook-failed: "opensearch-peers-relation-changed". The debug-log for that case is below:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 3213, in _run
    result = subprocess.run(args, **kwargs)  # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-data-hot-0/relation-set', '-r', '16', '--app', '--file', '-')' returned non-zero exit status 2.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 3316, in relation_set
    self._run(*args, input_stream=content)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 3215, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR invalid value "16" for option -r: relation not found


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-data-hot-0/charm/src/charm.py", line 221, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 553, in main
    manager.run()
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 529, in run
    self._emit()
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/main.py", line 515, in _emit
    self.framework.reemit()
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/framework.py", line 863, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 555, in _on_peer_cluster_relation_changed
    self._put_current_app(event, deployment_desc)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 695, in _put_current_app
    self.put_in_rel(data={"app": current_app.to_str()}, rel_id=event.relation.id)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/lib/charms/opensearch/v0/opensearch_relation_peer_cluster.py", line 104, in put_in_rel
    relation.data[self.charm.app].update(data)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 1833, in update
    super().update(other, **kwargs)
  File "/usr/lib/python3.10/_collections_abc.py", line 999, in update
    self[key] = other[key]
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 1810, in __setitem__
    self._commit(key, value)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 1814, in _commit
    self._backend.update_relation_data(self.relation.id, self._entity, key, value)
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 3556, in update_relation_data
    self.relation_set(relation_id, key, value, isinstance(_entity, Application))
  File "/var/lib/juju/agents/unit-data-hot-0/charm/venv/lib/python3.10/site-packages/ops/model.py", line 3319, in relation_set
    raise RelationNotFoundError() from e
ops.model.RelationNotFoundError
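
The last frames show `put_in_rel` updating the application databag of relation 16 after Juju has already removed that relation, which surfaces as `ops.model.RelationNotFoundError`. The sketch below illustrates one possible way to tolerate a vanished relation at the write site. It is not the maintainers' fix: `safe_put_in_rel` is a hypothetical helper, and whether skipping the write is the right behavior (rather than, say, setting a blocked status) is an assumption left open here.

```python
import logging

try:
    from ops.model import RelationNotFoundError
except ImportError:  # stand-in so the sketch runs without ops installed
    class RelationNotFoundError(Exception):
        pass

logger = logging.getLogger(__name__)


def safe_put_in_rel(databag, data: dict, rel_id: int) -> bool:
    """Write `data` into a relation databag, tolerating a removed relation.

    Hypothetical helper, not part of the charm. Returns True if the write
    went through and False if the relation was reported as not found.
    """
    try:
        databag.update(data)
    except RelationNotFoundError:
        logger.warning("relation %s no longer exists; skipping write", rel_id)
        return False
    return True
```

In the charm, `databag` would correspond to `relation.data[self.charm.app]`, and a `False` return could let the handler exit cleanly instead of failing the hook.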

smethnani added the bug label on Feb 25, 2025

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6694.

This message was autogenerated
