
panics on darwin arm #803

Closed
4 tasks
mangas opened this issue Apr 5, 2021 · 23 comments

Comments

@mangas

mangas commented Apr 5, 2021

Summary of Bug

Following the docs on an ARM Mac (Apple M1) causes a panic. The same instructions work on amd64.

I'm happy to help look into it if anyone is available to pair on it. I'll have a look in the meantime, but some guidance would be appreciated.

starting ABCI with Tendermint
unexpected fault address 0x5f6c61697469a2bd
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x5f6c61697469a2bd pc=0x1036a6eb0]

Version

v4.2.0

Steps to Reproduce

git clone -b v4.2.0 https://github.com/cosmos/gaia
cd gaia
make install
gaiad init chooseanicehandle
wget https://github.com/cosmos/mainnet/raw/master/genesis.cosmoshub-4.json.gz
gzip -d genesis.cosmoshub-4.json.gz
mv genesis.cosmoshub-4.json ~/.gaia/config/genesis.json
gaiad start --p2p.seeds bf8328b66dceb4987e5cd94430af66045e59899f@public-seed.cosmos.vitwit.com:26656,[email protected]:26656,[email protected]:26656,ba3bacc714817218562f743178228f23678b2873@public-seed-node.cosmoshub.certus.one:26656,[email protected]:26656 --x-crisis-skip-assert-invariants

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@gsora
Contributor

gsora commented Apr 5, 2021

I wonder if Tendermint makes some running architecture assumptions...

What Go version are you compiling with?

Could you post the complete stack trace?
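
For anyone else hitting this, the details requested above can be gathered with something like the following. This is a minimal sketch: `report_env` is a hypothetical helper name, and the commented-out `gaiad` line assumes the binary is on your PATH. `GOTRACEBACK=all` is the standard Go runtime setting that prints every goroutine's stack when the process crashes.

```shell
# report_env is a hypothetical helper: it prints the platform and Go toolchain
# details that matter for an architecture-specific crash report.
report_env() {
  # Kernel name and machine type, e.g. "Darwin arm64" on an M1 Mac.
  uname -sm
  if command -v go >/dev/null 2>&1; then
    go version          # Go release used to build the binary
    go env GOOS GOARCH  # target OS and architecture Go compiles for
  else
    echo "go: not found on PATH"
  fi
}

report_env

# To capture the complete stack trace, redirect stderr to a file
# (assumption: append your usual start flags in place of the ellipsis):
# GOTRACEBACK=all gaiad start ... 2> gaiad-crash.log
```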

@faddat
Contributor

faddat commented Apr 6, 2021

I do not have an M1 yet but have run a bunch of chains on arm64 Linux.

Arm64 Linux works quite well, so I bet that it is an issue with macOS.

If you send me a DM on Twitter, I would be super happy to do a Google Meet + screenshare.

@mangas I've never seen anything like that segfault before. Interesting.

@tac0turtle
Member

I have an M1 and run Tendermint on it all the time for testing. Will test gaia shortly.

@tac0turtle
Member

tac0turtle commented Apr 6, 2021

I used two separate config files: one that was previously generated and one that was newly generated. The old one works and the new one produces the same errors. @shahankhatch was there any sort of change that could cause this?

@mangas
Author

mangas commented Apr 6, 2021

@marbar3778 could you post the files? I'm specifically interested in the database backend and possibly the indexer.

@mangas
Author

mangas commented Apr 6, 2021

It works when I compile with the command below and set the database to badgerdb.

This is the default command generated by make build, with badgerdb added to the tags:

go build -mod=readonly -tags "netgo ledger badgerdb" -ldflags '-X github.com/cosmos/cosmos-sdk/version.Name=gaia -X github.com/cosmos/cosmos-sdk/version.AppName=gaiad -X github.com/cosmos/cosmos-sdk/version.Version=v4.2.0 -X github.com/cosmos/cosmos-sdk/version.Commit=535be14a8bdbfeb0d950914b5baa2dc72c6b081c -X "github.com/cosmos/cosmos-sdk/version.BuildTags=netgo,ledger" -X github.com/tendermint/tendermint/version.TMCoreSemVer=v0.34.8 -w -s' -trimpath -o /Users/fa/git/gaia/build/ ./...

This is off tag v4.2.0.
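
Setting the database also means editing the node config. Below is a minimal sketch, assuming the Tendermint `config.toml` under `~/.gaia/config/` uses the `db_backend` key, as Tendermint v0.34 does. `set_db_backend` is a hypothetical helper, not part of gaia, and the binary still has to be built with the `badgerdb` tag as above.

```shell
# set_db_backend is a hypothetical helper: it rewrites the db_backend line of a
# Tendermint config.toml in place, keeping a .bak copy of the original file.
# Usage: set_db_backend <path-to-config.toml> <backend>
set_db_backend() {
  config_file="$1"
  backend="$2"
  # -i.bak is accepted by both GNU sed (Linux) and BSD sed (macOS).
  sed -i.bak -E "s|^db_backend = .*|db_backend = \"${backend}\"|" "$config_file"
}

# Example (assumes the default gaia home directory):
# set_db_backend "$HOME/.gaia/config/config.toml" badgerdb
```

A database written by one backend is generally not readable by another, so expect to resync after switching.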

@shahankhatch
Contributor

@marbar3778 Aside from sdk/tm version bumps, and some Makefile, doc and Docker updates, Gaia itself hasn't had changes that seem to impact building on an M1.

@faddat
Contributor

faddat commented May 11, 2021

@shahankhatch correct.

It builds just fine, but I wanted to confirm, and I love toys, so I picked up one of these M1 MBPs.

Can fully confirm @mangas's issue.

10:48AM INF Starting EventBus service impl=EventBus module=events
10:48AM INF Starting PubSub service impl=PubSub module=pubsub
unexpected fault address 0x5f6c61697469a2bd
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x5f6c61697469a2bd pc=0x104923010]

goroutine 26 [running]:
runtime.throw(0x10523fd01, 0x5)
        runtime/panic.go:1117 +0x54 fp=0x1400054d5a0 sp=0x1400054d570 pc=0x104237a74

@faddat
Contributor

faddat commented May 11, 2021

@mangas I don't know the "why" of this, but build it with the badgerdb flag and it should run fine.

PS:

Actually it runs fine for about 5 minutes, and then it crashes in a different way.

@cyborgshead

@faddat I have the same issue on M1, also on Nvidia Jetson AGX Xavier and Jetson Nano. Trace

This happens when trying to sync from other nodes in a network that has already launched (i.e. not a local setup). I also tried with badgerdb and hit the same issue.

@faddat
Contributor

faddat commented May 12, 2021

Thank you, it's helpful to know that we now have three confirmed.

Rpi works great... So now I'm very curious why the Jetson does not work.

@faddat
Contributor

faddat commented May 13, 2021

@litvintech Do you have a Raspberry Pi Model 4?

I think you would find that works on the RPI but not M1 ...which is weird...

@cyborgshead

@faddat I think the RPi is slow enough compared to the others that the chain does not halt there. On M1 the chain stops syncing in less than 30 seconds, on Xavier in around 1 minute, and on Nano after a couple of minutes (approximate times from memory, as I tested 2 months ago).

Anyway, I can't tell whether this is a general problem because there aren't enough reports. Does nobody run gaia and their chains on ARM?

@faddat
Contributor

faddat commented May 14, 2021

I have successfully run the latest gaia on ARM (raspi) but I moved my office and don't have one up on the latest version. Trying on my AWS graviton node now.

@faddat
Contributor

faddat commented May 14, 2021

It doesn't work on my Graviton either. As soon as I have a Raspberry Pi up I will test that, but I believe that we are dealing with an ecosystem-wide regression that prevents us from running on ARM CPUs.

@faddat
Contributor

faddat commented May 14, 2021

@faddat
Contributor

faddat commented May 14, 2021

ubuntu@ip-172-31-63-80:~/gaia/cmd/gaiad$ ./gaiad start --p2p.seeds bf8328b66dceb4987e5cd94430af66045e59899f@public-seed.cosmos.vitwit.com:26656,[email protected]:26656,[email protected]:26656,ba3bacc714817218562f743178228f23678b2873@public-seed-node.cosmoshub.certus.one:26656,[email protected]:26656,[email protected]:26656 --x-crisis-skip-assert-invariants
11:09AM INF starting ABCI with Tendermint
11:09AM INF Starting multiAppConn service impl=multiAppConn module=proxy
11:09AM INF Starting localClient service connection=query impl=localClient module=abci-client
11:09AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
11:09AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
11:09AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
11:09AM INF Starting EventBus service impl=EventBus module=events
11:09AM INF Starting PubSub service impl=PubSub module=pubsub
11:09AM INF Starting IndexerService service impl=IndexerService module=txindex
11:09AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=
11:09AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0

4.1.0 may resolve it, I know I've run the most recent gaia on pi

✔️ 4.1.0 works on arm64, bet it also works on M1.

@litvintech random thought: move your libs back to the cosmos-sdk version in 4.1.0, bet that works.

@cyborgshead

cyborgshead commented May 14, 2021

@faddat I tested with the current dev branch of cyber, which was upgraded to cosmos-sdk v42 (same as gaia 4.1.0). I will check again as soon as I finish the upgrade to cosmos-sdk v43.

@faddat
Contributor

faddat commented May 14, 2021

That's interesting, thank you. I'll track what other libraries and stuff changed as I do so enjoy running my chains on arm.

@tac0turtle
Member

fixed the issue here cosmos/cosmos-sdk#9345

@tac0turtle
Member

this is fixed with https://github.com/cosmos/cosmos-sdk/releases/tag/v0.42.5. Gaia needs a release to fix it
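
To see whether a given checkout already pins the fixed SDK release, one option is to read the version out of `go.mod`. This is a rough sketch under stated assumptions: `sdk_version_from_gomod` is a hypothetical helper, and it only reads the `require` entry, so a `replace` directive pointing at a fork would not be reflected.

```shell
# sdk_version_from_gomod is a hypothetical helper: it prints the cosmos-sdk
# version named in the require section of the given go.mod file.
# Usage: sdk_version_from_gomod <path-to-go.mod>
sdk_version_from_gomod() {
  awk '
    # Entry inside a require ( ... ) block: "<module> <version>".
    # Skip replace entries, whose second field is "=>".
    $1 == "github.com/cosmos/cosmos-sdk" && $2 != "=>" { print $2; exit }
    # Single-line form: "require <module> <version>".
    $1 == "require" && $2 == "github.com/cosmos/cosmos-sdk" { print $3; exit }
  ' "$1"
}

# Example, run from the root of a gaia checkout:
# sdk_version_from_gomod go.mod
```

Inside a checkout with Go installed, `go list -m github.com/cosmos/cosmos-sdk` reports the module version after any replacements are applied.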

@shahankhatch shahankhatch mentioned this issue May 25, 2021
6 tasks
@faddat
Contributor

faddat commented Jul 30, 2021

Resolved and safe to close.

@faddat
Contributor

faddat commented Dec 11, 2021

bump re: safe to close.


7 participants