Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to latest version of ouroboros-network and other deps #569

Merged
merged 12 commits into from
Feb 17, 2020

Conversation

dcoutts
Copy link
Contributor

@dcoutts dcoutts commented Feb 13, 2020

Most code changes due to ouroboros-network and cardano-ledger

Closes #566

@dcoutts dcoutts requested a review from Jimbo4350 February 13, 2020 16:28
@dcoutts dcoutts force-pushed the dcoutts/update-dependencies branch from c3d7be5 to 65622a2 Compare February 13, 2020 17:01
@intricate intricate self-assigned this Feb 13, 2020
@intricate intricate mentioned this pull request Feb 13, 2020
@intricate intricate force-pushed the dcoutts/update-dependencies branch 2 times, most recently from 2317874 to cd9ef4d Compare February 13, 2020 21:05
@intricate intricate changed the title Update to latest version of ouboros-network and other deps Update to latest version of ouroboros-network and other deps Feb 13, 2020
@intricate intricate requested a review from mrBliss February 13, 2020 21:09
Copy link
Contributor

@intricate intricate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, there's a problem that arises as a result of the changes in this PR.

I'm running the 3 node cluster that's used for benchmarking (benchmarking/cluster3nodes/) and, for some reason, the nodes aren't exchanging transactions with one another.

The local transaction submission protocol seems to be working as node-0 is indeed receiving the initial transactions submitted by the transaction generator and adding them to the mempool, but there seems to be an issue with the node-to-node transaction submission protocol since none of the other nodes ever receive any transactions.

I can confirm that everything works as expected prior to updating the dependencies. I'm currently doing some debugging to see if I can determine what's going on.

@erikd
Copy link
Contributor

erikd commented Feb 14, 2020

@intricate That is a really interesting side effect. Ideally we would figure out what went wrong here and add a test to the dependency that would catch this error in the dependency's test suite before it bubbles up into the node.

@intricate
Copy link
Contributor

intricate commented Feb 14, 2020

On this branch, it appears that I'm only seeing 2 mux-layer trace events related to MiniProtocolNum 4 (the node-to-node transaction submission protocol) and they're both MuxTraceChannelRecvStart events related to each of the other nodes in the cluster:

...
{"at":"2020-02-14T03:58:49.57Z","env":"1.5.0:cd9ef","ns":["cardano","node","Mux"],"data":{"event":"Channel Receive Start on MiniProtocolNum 4","kind":"MuxTrace","bearer":"ConnectionId {localAddress = 127.0.0.1:XXXXX, remoteAddress = 127.0.0.1:3002}"},...}
...
{"at":"2020-02-14T03:59:09.33Z","env":"1.5.0:cd9ef","ns":["cardano","node","Mux"],"data":{"event":"Channel Receive Start on MiniProtocolNum 4","kind":"MuxTrace","bearer":"ConnectionId {localAddress = 127.0.0.1:XXXXX, remoteAddress = 127.0.0.1:3001}"},...}
...

In Network.Mux:

    recv :: m (Maybe BL.ByteString)
    recv = do
        -- We receive CBOR encoded messages as ByteStrings (possibly partial) from the
        -- matching ingress queueu. This is the same queue the 'demux' thread writes to.
        traceWith tracer $ MuxTraceChannelRecvStart mc
        blob <- atomically $ do
            blob <- readTVar q
            if blob == BL.empty
                then retry
                else writeTVar q BL.empty >> return blob
        -- say $ printf "recv mid %s mode %s blob len %d" (show mid) (show md) (BL.length blob)
        traceWith tracer $ MuxTraceChannelRecvEnd mc blob
        return $ Just blob

After a successful recv, we should also trace a MuxTraceChannelRecvEnd. However, that never ends up happening for the node-to-node tx submission protocol. Could we be indefinitely blocking in that STM transaction?

@karknu
Copy link
Contributor

karknu commented Feb 14, 2020

Could we be indefinitely blocking in that STM transaction?

Yes, that thread will block forever if the peer never sends any data on the corresponding miniprotocol.

@intricate
Copy link
Contributor

In an effort to track down the problem, I've been testing out the 3-node cluster benchmarking scripts with builds of cardano-node based on master, but depending on different revisions of ouroboros-network.

IntersectMBO/ouroboros-network@a43961c

Node-to-node transaction submission works fine with the 3-node cluster for my build based on cardano-node master (f1a5b69d205599ad99c07f19ed9d6cf50575564a) with the following dependencies:

  • ouroboros-network@a43961c1133b9e1b0826f3d4d1bc5b93819037d6
  • cardano-base@965b8b143632faea16680a195e6de57091382700

To reproduce, see branch intricate/upd-deps-a43961c.

IntersectMBO/ouroboros-network@c1b9ba3

We are able to reproduce the same node-to-node transaction submission problem given a build based on cardano-node master with the following dependencies:

  • ouroboros-network@c1b9ba34ee2eac870a10e5194a9e2c85ddf595ec
  • cardano-base@965b8b143632faea16680a195e6de57091382700

To reproduce, see branch intricate/upd-deps-c1b9ba3.

Thoughts

As a result, I'm thinking that the problem has been introduced in ouroboros-network somewhere within the range a43961c1133b9e1b0826f3d4d1bc5b93819037d6..c1b9ba34ee2eac870a10e5194a9e2c85ddf595ec.

At a glance, I'm suspecting that the changes introduced by IntersectMBO/ouroboros-network#1536 could be the culprit, but I haven't looked at that PR very closely yet.

@karknu
Copy link
Contributor

karknu commented Feb 14, 2020

I'm looking into this.

@Jimbo4350
Copy link
Contributor

Jimbo4350 commented Feb 14, 2020

Side note: I've tested ./scripts/shelley-testnet-live.sh and transaction submission successfully
on f4e5cc6afcaead00f15a4f6560becc07c6f6dee2

[Mempool:Info:155] {"kind":"TraceMempoolAddedTx","txAdded":"tx: Tx 6875a2f9e27d625b with inputs [TxInUtxo a8386f9e #0], outputs: [TxOut 863000000000000 lovelace -> 2cWKMJemoBahs2gd4sQX1XgSLVLyAMrtNy2qgp4t2BYjBFygD3dLo4ejXC4p3C3pnc6qd]\nwitnesses: [\n    VKWitness: key = pub:3ec9c54e, key hash = f0f85d06, sig = <signature>\n]","mempoolSize":"MempoolSize {msNumTxs = 1, msNumBytes = 249}"}
[LocalTxSubmissionProtocol:Debug:155] TraceLabelPeer (ConnectionId {localAddress = /home/jordan/Repos/Work/userepotool/cardano-node/socket/0, remoteAddress = }) Send MsgAcceptTx
[LocalTxSubmissionProtocol:Debug:155] TraceLabelPeer (ConnectionId {localAddress = /home/jordan/Repos/Work/userepotool/cardano-node/socket/0, remoteAddress = }) Recv MsgDone

@intricate
Copy link
Contributor

intricate commented Feb 14, 2020

FYI: I have tested ./scripts/shelley-testnet-live.sh and transaction submission successfully
on f4e5cc6afcaead00f15a4f6560becc07c6f6dee2

[Mempool:Info:155] {"kind":"TraceMempoolAddedTx","txAdded":"tx: Tx 6875a2f9e27d625b with inputs [TxInUtxo a8386f9e #0], outputs: [TxOut 863000000000000 lovelace -> 2cWKMJemoBahs2gd4sQX1XgSLVLyAMrtNy2qgp4t2BYjBFygD3dLo4ejXC4p3C3pnc6qd]\nwitnesses: [\n    VKWitness: key = pub:3ec9c54e, key hash = f0f85d06, sig = <signature>\n]","mempoolSize":"MempoolSize {msNumTxs = 1, msNumBytes = 249}"}
[LocalTxSubmissionProtocol:Debug:155] TraceLabelPeer (ConnectionId {localAddress = /home/jordan/Repos/Work/userepotool/cardano-node/socket/0, remoteAddress = }) Send MsgAcceptTx
[LocalTxSubmissionProtocol:Debug:155] TraceLabelPeer (ConnectionId {localAddress = /home/jordan/Repos/Work/userepotool/cardano-node/socket/0, remoteAddress = }) Recv MsgDone

Right, but those logs are related to the LocalTxSubmissionProtocol which I've observed does work fine, but I'm currently experiencing issues with the node-to-node tx submission protocol.

@dcoutts dcoutts force-pushed the dcoutts/update-dependencies branch from 3080e30 to 934e0e6 Compare February 16, 2020 22:25
@dcoutts dcoutts requested a review from intricate February 16, 2020 22:53
@dcoutts dcoutts dismissed intricate’s stale review February 16, 2020 22:54

The fix is in, it now needs verifying.

@intricate intricate force-pushed the dcoutts/update-dependencies branch from 805df21 to 2caface Compare February 17, 2020 15:00
@intricate intricate force-pushed the dcoutts/update-dependencies branch from 2caface to e88e3e1 Compare February 17, 2020 15:01
Copy link
Contributor

@intricate intricate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💯

The issue looks like it's been resolved. Thanks @karknu @dcoutts

Copy link
Contributor

@intricate intricate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Forgot to address @mrBliss review.

Chairman --socket-path argument can be given multiple times.  It
indicates to which unix sockets to connect to (and check for consensus).

Chairman now identifies nodes by `SocketPath`, `CoreNodeId` is
not used now, maybe it can be removed from `ourobors-consensus` as well.

Nix scripts are not updated, and need to accomodate for chairman CLI
changes.
Copy link
Contributor

@intricate intricate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, LGTM for real this time.

@CodiePP
Copy link
Contributor

CodiePP commented Feb 17, 2020

ran cluster3nodes bm with tx-gen on top of this branch: looks fine
image

Copy link
Contributor

@CodiePP CodiePP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine for me

@intricate
Copy link
Contributor

Just awaiting a CI job fix from @deepfire (#577) and then we can merge.

Copy link
Contributor

@Jimbo4350 Jimbo4350 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just awaiting the chairman fix.

iohk-bors bot added a commit that referenced this pull request Feb 17, 2020
569: Update to latest version of ouroboros-network and other deps r=deepfire a=dcoutts

Most code changes due to ouroboros-network and cardano-ledger

Closes #566 

Co-authored-by: Duncan Coutts <[email protected]>
Co-authored-by: Luke Nadur <[email protected]>
Co-authored-by: Samuel Leathers <[email protected]>
Co-authored-by: Jordan Millar <[email protected]>
Co-authored-by: Marcin Szamotulski <[email protected]>
Co-authored-by: Kosyrev Serge <[email protected]>
@deepfire
Copy link
Contributor

bors r+
..as agreed with Duncan.

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Feb 17, 2020

Already running a review

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Feb 17, 2020

@iohk-bors iohk-bors bot merged commit a376106 into master Feb 17, 2020
@iohk-bors iohk-bors bot deleted the dcoutts/update-dependencies branch February 17, 2020 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update for latest consensus
10 participants