This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Tribe problems (join/leave, syncing tasks) #1226

Closed
evilezh opened this issue Sep 21, 2016 · 2 comments

Comments

evilezh commented Sep 21, 2016

Over the last few days I have been trying to get tribe to work. As the documentation is quite poor, some things have to be discovered by browsing the source code or monitoring network activity.
First of all:
For tribe you need TCP/UDP 6000 and TCP 8181 at a minimum (not sure about 8082).
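
A quick way to verify the TCP side of this from another node is a small reachability check. This is only a sketch using the Python standard library; the port numbers are the ones listed above, the function names are hypothetical, and UDP 6000 cannot be verified this way because UDP has no connection handshake.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_tribe_ports(host: str, ports=(6000, 8181)) -> dict:
    """Check each TCP port and return {port: reachable}.

    The defaults are the ports named in this issue, not an official list.
    """
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, `check_tribe_ports("10.0.0.2")` (a placeholder address) returns a dict such as `{6000: True, 8181: False}` depending on what is reachable.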

To begin with, the way modules/tasks are added to an agreement is awkward.
Something like - direct command - load this module/task into this agreement would be beneficial.
Currently it is a mess, especially if you want to join multiple agreements and upload tasks/modules to one specific agreement.
The procedure appears to be: join the node to agreementA, upload modules/tasks, leave, join agreementB, upload modules/tasks, and so on.
I would suggest that, in tribe mode, an extra attribute specifies which agreement a task/module should go to.

Next is a bug.
With nodes A and B both joined to the tribe, I could not successfully run a command on node A that joins node B to agreement X.
Node B does join, but this produces an endless loop.
On node A, requests from node B come in on port 8181 at a rapid rate.
Node A log sample:
time="2016-09-21T18:13:30Z" level=debug msg="API request" _module="_mgmt-rest" index=126 method=GET url="/v1/tasks/5fbeb443-17e3-4545-88a3-a1e3d778e375"
time="2016-09-21T18:13:30Z" level=debug msg="API response" _module="_mgmt-rest" index=126 method=GET status=OK status-code=200 url="/v1/tasks/5fbeb443-17e3-4545-88a3-a1e3d778e375"
time="2016-09-21T18:13:31Z" level=debug msg="API request" _module="_mgmt-rest" index=127 method=GET url="/v1/tasks/5fbeb443-17e3-4545-88a3-a1e3d778e375"
time="2016-09-21T18:13:31Z" level=debug msg="API response" _module="_mgmt-rest" index=127 method=GET status=OK status-code=200 url="/v1/tasks/5fbeb443-17e3-4545-88a3-a1e3d778e375"

Node B log:
time="2016-09-20T14:14:17Z" level=error msg="error starting task" _block=start-task _error="Task not found" _module=scheduler task-id=d2533f04-e093-414b-a248-4f15009e7158
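
To quantify how fast the same URL is being polled, the log lines above can be parsed as `key=value` pairs and counted. This is a rough sketch: it assumes every line follows the quoting style shown in the samples, and the function names are hypothetical.

```python
import shlex
from collections import Counter


def parse_log_line(line: str) -> dict:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def count_requests(lines) -> Counter:
    """Count "API request" entries per (method, url)."""
    counts = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if fields.get("msg") == "API request":
            counts[(fields.get("method"), fields.get("url"))] += 1
    return counts
```

Running `count_requests` over a node's log and looking at `most_common()` makes a loop like the one described here stand out, since a single task URL dominates the counts.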

Also, when you tell node A that node B should leave the agreement,
it produces plenty of errors.

But it seems fine if you issue the join/leave command on the same node that is meant to join or leave.

I will add more as I find more.

Contributor

IRCody commented Sep 21, 2016

@evilezh: Thanks for the feedback. In your exploration of tribe, if you identify gaps in the documentation we'd be happy to accept/merge PRs to close those gaps. The tribe docs could definitely use some love.

Something that I think would be helpful to know, and that is probably behind most of the issues you're seeing, is that tribe is only meant to be run with 1 agreement. The idea is to keep a cluster of snapd's in "sync" as much as possible.

Supporting multiple agreements is something that has been talked about but has not been implemented. It introduces a decent amount of complexity (some that isn't immediately obvious). There was an RFC regarding this and other improvements to tribe but I'm not sure what the timeline is on implementing those, see #640.

Something like - direct command - load this module/task into this agreement would be beneficial.

If a node is part of an agreement -- This should be as easy as snapctl plugin load x executed against any node that is part of the agreement. This action should be propagated to all nodes in the agreement. Same idea works for tasks.
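
As a toy illustration of that propagation, here is an in-memory model with a single agreement per node. It is not snap's implementation: the class and function names are hypothetical, and the rule that a newly joined node catches up on existing plugins/tasks is an assumption.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.plugins = set()
        self.tasks = set()
        self.agreement = None  # at most one agreement per node


class Agreement:
    def __init__(self, name):
        self.name = name
        self.members = []
        self.plugins = set()
        self.tasks = set()

    def join(self, node):
        if node.agreement is not None:
            raise ValueError(f"{node.name} is already in {node.agreement.name}")
        node.agreement = self
        self.members.append(node)
        # Assumption: a new member catches up with the agreement's state.
        node.plugins |= self.plugins
        node.tasks |= self.tasks


def load(node, kind, name):
    """Load a plugin or task on one node and propagate it to its agreement."""
    if kind not in ("plugins", "tasks"):
        raise ValueError(f"unknown kind: {kind}")
    if node.agreement is None:
        getattr(node, kind).add(name)  # standalone node: nothing to propagate
        return
    getattr(node.agreement, kind).add(name)
    for member in node.agreement.members:
        getattr(member, kind).add(name)
```

In this model, loading on any member has the same effect, which is the behavior described for `snapctl plugin load x` above.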

Author

evilezh commented Sep 22, 2016

Yeah, but snap does allow you to join multiple agreements, by the way. :)

As for the RFC, it seems to propose something similar, only with new commands, new tribes, and so on.

I was proposing to just add a few extra attributes to tasks/modules when snap is in tribe mode.
To simplify things even further,
we could assume that tribe is always on and that a standalone node is a tribe with one member. Then you do not need special cases and can have one code base.

Here is how I intended to use snap:

  1. Create an "initial node" (one or more), join them together, and upload agreements and modules to them.
  2. On each server, snap joins the tribe.
  3. Each server computes its role and, based on that, joins one or more agreements, which results in modules and tasks being uploaded to it.

I did some initial tests and it seemed that one node can join one or more agreements.
Only afterwards did I figure out that it is not so straightforward, and that uploading tasks/modules to agreements requires the node to join1/leave1/join2/leave2, and so on.

I did not dig deep into the source code, but such behavior seems quite straightforward to implement.
Each agreement would execute in its own context:
new agreement -> new runner
leave agreement -> kill runner
It's like running n snaps on the same machine.
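
To make the idea concrete, here is a minimal sketch of the "one runner per agreement" lifecycle using threads. Everything in it is hypothetical (class names, methods, and the placeholder run loop); it only illustrates the proposal and says nothing about how snap is structured internally.

```python
import threading


class Runner:
    """Isolated context for one agreement: its own plugins, tasks, and thread."""

    def __init__(self, agreement):
        self.agreement = agreement
        self.plugins = set()
        self.tasks = set()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Placeholder: a real runner would schedule this agreement's tasks here.
        self._stop_event.wait()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()


class NodeRunners:
    """Per-node registry: new agreement -> new runner, leave -> kill runner."""

    def __init__(self):
        self._runners = {}

    def join(self, agreement):
        if agreement in self._runners:
            raise ValueError(f"already joined {agreement}")
        runner = Runner(agreement)
        runner.start()
        self._runners[agreement] = runner
        return runner

    def leave(self, agreement):
        self._runners.pop(agreement).stop()

    def load_plugin(self, agreement, plugin):
        # The extra "agreement" attribute decides which context gets the plugin.
        self._runners[agreement].plugins.add(plugin)

    def agreements(self):
        return sorted(self._runners)
```

Under this sketch a standalone node is simply a `NodeRunners` with a single agreement, so no special case is needed.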

Either way, thanks for the explanation. It seems I need to run in standalone mode and manage modules/tasks from outside.
