failed to setup transport: failed to connect to NATS #4860

Open
YuriyGavrilov opened this issue Feb 20, 2025 · 5 comments
Labels
request/new Request: Indicates a new request that has been submitted and awaits initial triage type/bug Type: Something is not working as expected

Comments


YuriyGavrilov commented Feb 20, 2025

Bug Description

I just installed a fresh CLI with curl -sL https://get.bacalhau.org/install.sh | bash

Next, I started the orchestrator with:

bacalhau serve --orchestrator 


rock64@rockpro64:~$ sudo bacalhau serve --orchestrator 
20:33:41 | INF Config loaded from: [/home/rock64/.bacalhau/config.yaml], and with data-dir /home/rock64/.bacalhau
20:33:41 | INF Starting bacalhau...
20:33:42 | INF Starting server backend=http://0.0.0.0:1234 listen=0.0.0.0:8438
20:33:42 | INF bacalhau node running name=n-f4ce17b5-ebea-4e28-9ee7-27af9d6a0522

Then I started the compute node with:

bacalhau serve --compute API.Host=192.168.0.105

(base) yuriygavrilov@MBP-Yuriy evidence % bacalhau serve --compute API.Host=192.168.0.105
23:31:49 | INF Config loaded from: [/Users/yuriygavrilov/.bacalhau/config.yaml], and with data-dir /Users/yuriygavrilov/.bacalhau
23:31:49 | INF Starting bacalhau...
23:31:50 | WRN failed to start legacy connection manager. falling back to ncl protocol error="failed to create NATS client: no orchestrator available for connection at nats://127.0.0.1:4222"
23:31:50 | INF Starting connection manager node_id=QmTeDSDo6QCUuZw17qEU9LHMMtNFTWs1vLP46nwe7V5txw start_time=2025-02-20T23:31:50.709878+03:00
23:31:50 | INF Attempting to establish connection node_id=QmTeDSDo6QCUuZw17qEU9LHMMtNFTWs1vLP46nwe7V5txw
23:31:50 | ERR Connection attempt failed error="failed to setup transport: failed to connect to NATS: no orchestrator available for connection at nats://127.0.0.1:4222" backoffDuration=10s consecutiveFailures=1
23:31:50 | INF bacalhau node running name=QmTeDSDo6QCUuZw17qEU9LHMMtNFTWs1vLP46nwe7V5txw orchestrators=["nats://127.0.0.1:4222"]
^C23:31:54 | INF bacalhau node shutting down...

Expected Behavior

No NATS connection errors.

Steps to Reproduce

As described above.

Bacalhau Versions

1.6.4

Host Environment

macOS: client and compute node
Linux: orchestrator

Job Specification

None. I only tried to start the nodes.

Logs

Provided above.

Client Logs:

Provided above.

@YuriyGavrilov YuriyGavrilov added request/new Request: Indicates a new request that has been submitted and awaits initial triage type/bug Type: Something is not working as expected labels Feb 20, 2025

linear bot commented Feb 20, 2025


YuriyGavrilov commented Feb 20, 2025

It seems I started the compute node incorrectly.

It needs to be started like this:

bacalhau serve --compute --config Compute.Orchestrators=192.168.0.105

With that, it starts without errors:

(base) yuriygavrilov@MBP-Yuriy evidence % bacalhau serve --compute --config Compute.Orchestrators=192.168.0.105
23:53:07 | INF Config loaded from: [/Users/yuriygavrilov/.bacalhau/config.yaml], and with data-dir /Users/yuriygavrilov/.bacalhau
23:53:07 | INF Starting bacalhau...
23:53:09 | INF Starting connection manager node_id=n-668f1331-5ee4-4638-bf38-3f32bb316acb start_time=2025-02-20T23:53:08.994416+03:00
23:53:09 | INF Attempting to establish connection node_id=n-668f1331-5ee4-4638-bf38-3f32bb316acb
23:53:09 | INF bacalhau node running name=n-668f1331-5ee4-4638-bf38-3f32bb316acb orchestrators=["192.168.0.105"]

So the docs are out of date: https://docs.bacalhau.org/getting-started/create-private-network

@YuriyGavrilov

Then I tried to run a simple example: bacalhau docker run alpine echo hello -c API.Host=192.168.0.105

The compute node panicked:

23:53:09 | INF bacalhau node running name=n-668f1331-5ee4-4638-bf38-3f32bb316acb orchestrators=["192.168.0.105"]
panic: runtime error: index out of range [-1]

goroutine 60 [running]:
github.com/bacalhau-project/bacalhau/pkg/docker.(*Client).SupportedPlatforms(0x1?, {0xedfc2b0?, 0xc000632460?})
	github.com/bacalhau-project/bacalhau/pkg/docker/docker.go:244 +0x250
github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic.(*ImagePlatformBidStrategy).ShouldBid(0xc000122680, {0xedfc2b0, 0xc000632460}, {{{0xc0006ae3c0, 0x26}, {0xc0006ae3f0, 0x26}, {0xc000806b50, 0x7}, {0xc000806b57, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic/image_platform.go:52 +0x114
github.com/bacalhau-project/bacalhau/pkg/executor/docker.(*Executor).ShouldBid(0x20?, {0xedfc2b0, 0xc000632460}, {{{0xc0006ae3c0, 0x26}, {0xc0006ae3f0, 0x26}, {0xc000806b50, 0x7}, {0xc000806b57, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/executor.go:106 +0x88
github.com/bacalhau-project/bacalhau/pkg/executor/util.(*bidStrategyFromExecutor).ShouldBid(0xc00090c280?, {0xedfc2b0, 0xc000632460}, {{{0xc0006ae3c0, 0x26}, {0xc0006ae3f0, 0x26}, {0xc000806b50, 0x7}, {0xc000806b57, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/util/executors_bid_strategy.go:47 +0xc8
github.com/bacalhau-project/bacalhau/pkg/bidstrategy.(*ChainedBidStrategy).ShouldBid(0xc000888960, {0xedfc2b0, 0xc000632460}, {{{0xc0006ae3c0, 0x26}, {0xc0006ae3f0, 0x26}, {0xc000806b50, 0x7}, {0xc000806b57, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/bidstrategy/chained.go:54 +0x10e
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.runSemanticBidding({{0xee15cd0, 0xc0008a47e0}, {0xedde040, 0xc00061aa20}, {0xc0000d8e10, 0x9, 0x9}, {0xc0008a4960, 0x2, 0x2}}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:100 +0x1d0
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.doBidding({{0xee15cd0, 0xc0008a47e0}, {0xedde040, 0xc00061aa20}, {0xc0000d8e10, 0x9, 0x9}, {0xc0008a4960, 0x2, 0x2}}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:69 +0x58
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.RunBidding({{0xee15cd0, 0xc0008a47e0}, {0xedde040, 0xc00061aa20}, {0xc0000d8e10, 0x9, 0x9}, {0xc0008a4960, 0x2, 0x2}}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:49 +0x58
github.com/bacalhau-project/bacalhau/pkg/compute/watchers.(*ExecutionUpsertHandler).HandleEvent(0xc0004a2270?, {0xedfc2b0?, 0xc000632460?}, {0x1, {0xe0ef0bc, 0x6}, {0xe104811, 0xf}, {0xec2eac0, 0xc000395560}, ...})
	github.com/bacalhau-project/bacalhau/pkg/compute/watchers/executor_watcher.go:34 +0x165
github.com/bacalhau-project/bacalhau/pkg/lib/watcher.(*watcher).processEventWithRetry(0xc00067e5b0, {0xedfc2b0, 0xc000632460}, {0x1, {0xe0ef0bc, 0x6}, {0xe104811, 0xf}, {0xec2eac0, 0xc000395560}, ...})
	github.com/bacalhau-project/bacalhau/pkg/lib/watcher/watcher.go:269 +0xd8
github.com/bacalhau-project/bacalhau/pkg/lib/watcher.(*watcher).run(0xc00067e5b0, {0xedfc2b0, 0xc000632460})
	github.com/bacalhau-project/bacalhau/pkg/lib/watcher/watcher.go:212 +0x210
created by github.com/bacalhau-project/bacalhau/pkg/lib/watcher.(*watcher).Start in goroutine 25
	github.com/bacalhau-project/bacalhau/pkg/lib/watcher/watcher.go:184 +0x386
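
In Go, "index out of range [-1]" typically means a slice was indexed with the result of a search that found nothing, since functions such as `slices.IndexFunc` return -1 for "not found". Whether that is what happens at docker.go:244 is not confirmed by this trace. The sketch below only illustrates the general failure mode and a guarded alternative; the `component` type, the `findComponent` helper, and the names "Engine" and "containerd" are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// component is a hypothetical stand-in for one entry in a version listing.
type component struct {
	Name    string
	Version string
}

// findComponent returns the component with the given name, or an error if it
// is absent, instead of indexing with the -1 that IndexFunc returns on a miss.
func findComponent(components []component, name string) (component, error) {
	idx := slices.IndexFunc(components, func(c component) bool {
		return c.Name == name
	})
	if idx < 0 {
		return component{}, errors.New("component not found: " + name)
	}
	return components[idx], nil
}

func main() {
	comps := []component{{Name: "containerd", Version: "1.7"}}
	// Indexing comps with an unchecked IndexFunc result would panic with
	// "index out of range [-1]" here, because "Engine" is not in the list.
	if _, err := findComponent(comps, "Engine"); err != nil {
		fmt.Println("handled:", err)
	}
}
```

Returning an error lets the caller decline the bid or log the problem, rather than crashing the whole compute node.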

The job is stuck in the processing loop:

Job successfully submitted. Job ID: j-2315bf20-c557-417f-ad63-b29244bfda5c
Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):

 TIME          EXEC. ID    TOPIC            EVENT         
 20:57:02.135              Submission       Job submitted 
 20:57:02.191  e-16e22445  Scheduling       Requested execution on n-668f1331 
 Processing    ....🐠................

The orchestrator marks the node as disconnected:

rock64@rockpro64:~$ sudo bacalhau serve --orchestrator
20:50:57 | INF Config loaded from: [/home/rock64/.bacalhau/config.yaml], and with data-dir /home/rock64/.bacalhau
20:50:57 | INF Starting bacalhau...
20:50:59 | INF Starting server backend=http://0.0.0.0:1234 listen=0.0.0.0:8438
20:50:59 | INF bacalhau node running name=n-059752c8-39f9-4cbd-8fea-91965f970aea
20:53:09 | INF handshake successful with node n-668f1331-5ee4-4638-bf38-3f32bb316acb
20:57:58 | INF Marking node as disconnected lastHeartbeat=2025-02-20T20:56:54.459572488Z node=n-668f1331-5ee4-4638-bf38-3f32bb316acb

@YuriyGavrilov

I restarted the compute node:

(base) yuriygavrilov@MBP-Yuriy evidence % bacalhau serve --compute --config Compute.Orchestrators=192.168.0.105
00:01:34 | INF Config loaded from: [/Users/yuriygavrilov/.bacalhau/config.yaml], and with data-dir /Users/yuriygavrilov/.bacalhau
00:01:34 | INF Starting bacalhau...
00:01:35 | INF Starting connection manager node_id=n-668f1331-5ee4-4638-bf38-3f32bb316acb start_time=2025-02-21T00:01:35.696804+03:00
00:01:35 | INF Attempting to establish connection node_id=n-668f1331-5ee4-4638-bf38-3f32bb316acb
00:01:35 | INF bacalhau node running name=n-668f1331-5ee4-4638-bf38-3f32bb316acb orchestrators=["192.168.0.105"]
00:01:35 | ERR Error handling message error="execution already exists: e-16e22445-bf28-4c61-b360-2ee7afce1bcd" Bacalhau-EventTime=1740085296045520020 Bacalhau-MessageID=seq-2 Bacalhau-PayloadEncoding=json Bacalhau-SeqNum=2 Bacalhau-Source=orchestrator-n-668f1331-5ee4-4638-bf38-3f32bb316acb Bacalhau-Subject=bacalhau.global.compute.n-668f1331-5ee4-4638-bf38-3f32bb316acb.in.msgs Bacalhau-Type=AskForBid PayloadEncoding=json Type=AskForBid

The job status check exits due to a timeout:

Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):

 TIME          EXEC. ID    TOPIC            EVENT         
 20:57:02.135              Submission       Job submitted 
 20:57:02.191  e-16e22445  Scheduling       Requested execution on n-668f1331 
                                             

Warning: Progress tracking timed out after 5m0s. The job is still running.
         

To get more details about the run, execute:
	bacalhau job describe j-2315bf20-c557-417f-ad63-b29244bfda5c

To get more details about the run executions, execute:
	bacalhau job executions j-2315bf20-c557-417f-ad63-b29244bfda5c

[Image: Web UI screenshot]


YuriyGavrilov commented Feb 21, 2025

It seems to be a problem with the macOS Bacalhau compute node. If I run both the compute node and the orchestrator on Linux, the "hello world" Docker job works fine.

My Mac has an Intel chip. Maybe something is missing in the installation script: curl -sL https://get.bacalhau.org/install.sh | bash
