Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry_ack response: implemenation of fields "uptime, peercount, protocol_version" does not match documentation #3490

Closed
dsiganos opened this issue Oct 4, 2021 · 13 comments
Assignees
Labels
documentation This item indicates the need for or supplies updated or expanded documentation

Comments

@dsiganos
Copy link
Contributor

dsiganos commented Oct 4, 2021

telemetry_ack response: "uptime" field contains non-changing data, which doesn't make sense

@dsiganos dsiganos added the bug label Oct 4, 2021
@zhyatt
Copy link
Collaborator

zhyatt commented Oct 4, 2021

@dsiganos So you are seeing the same value for uptime when receiving telemetry responses beyond the telemetry message cache window?

@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 4, 2021

@zhyatt Hmmm, what is a telemetry cache window? Does telemetry get sampled only at specific times and not whenever I issue telemetry request?

@zhyatt
Copy link
Collaborator

zhyatt commented Oct 4, 2021

@zhyatt Hmmm, what is a telemetry cache window? Does telemetry get sampled only at specific times and not whenever I issue telemetry request?

Yeah, there is a cache window to reduce impact of repeat requests. I believe it may be 15s or 30s. Here is an update with regards to that, not sure where in the code all of that sits exactly though: #2650.

@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 4, 2021

This is what I get for uptime for one of the beta servers with v23.0.0.2, which was restarted within the last hour:

ds@tux:~/nano/pynanocoin$ date; ./telemetry_req.py -b --peer [::ffff:167.172.215.52]:54000
Mon  4 Oct 19:26:12 BST 2021
connecting to ::ffff:167.172.215.52:54000
NetID: B, VerMaxUsingMin: 18/18/18, MsgType: 13(telemetry_ack), Extensions: 00CA
Signature: 1C5C804923D1C3564E5A1F8FB774C0E4C9CA94ABC4038382C10EF53D6AE5D6FB8101614C839DF090B9F1933927EF6563D0BF1DE8B7F812A0112D9B6D65FA870A
Node ID: 810D551A6ADEF9D49C266335E4E6B9AF2B83534D1B0092C8CDF1B73C39FE6204
         node_31afcnf8oqqstkg4ersowmmdmdsdifbnt8r1kd6euwfq9iwzwri6tm4wga9g
Block Count: 49563134
Cemented Count: 49563134
Unchecked Count: 0
Account Count: 3701408
Bandwidth Cap: 10485760
Uptime: 81906368512 s
Peer Count: 23
Protocol Version: 57
Genesis Hash: 01A92459E69440D5C1088D3B31F4CA678BE944BAB3776C2E6B7665E9BD99BD5A
Major Version: 23
Minor Version: 0
Patch Version: 0
Pre-release Version: 2
Maker Version: 0
Timestamp: 1633371973861 ms
Active Difficulty: 18446726481523507200 (0xfffff00000000000)

The uptime cannot possibly be representing seconds of uptime as the protocol says.
https://github.com/nanocurrency/protocol/blob/b94bb467bf5264603b8a1870524791279988783e/reference/protocol.ksy#L382

@dsiganos dsiganos changed the title telemetry_ack response: "uptime" field contains non-changing data, which doesn;t make sense telemetry_ack response: "uptime" field contains non-changing data, which doesn't make sense Oct 4, 2021
@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 4, 2021

Minutes later, iuptime is still showing the same value 81906368512:

ds@tux:~/nano/pynanocoin$ date; ./telemetry_req.py -b --peer [::ffff:167.172.215.52]:54000
Mon  4 Oct 19:30:29 BST 2021
connecting to ::ffff:167.172.215.52:54000
NetID: B, VerMaxUsingMin: 18/18/18, MsgType: 13(telemetry_ack), Extensions: 00CA
Signature: 2DFD7C509B68DC2C014F6623C6D7BD0BFC6061FDE74C7686010B63A11605CF81E91C069D21BE85B6F8E11523C906551A7172301C6AA3F4806BBB27E130D7FE00
Node ID: 810D551A6ADEF9D49C266335E4E6B9AF2B83534D1B0092C8CDF1B73C39FE6204
         node_31afcnf8oqqstkg4ersowmmdmdsdifbnt8r1kd6euwfq9iwzwri6tm4wga9g
Block Count: 49563134
Cemented Count: 49563134
Unchecked Count: 0
Account Count: 3701408
Bandwidth Cap: 10485760
Uptime: 81906368512 s
Peer Count: 24
Protocol Version: 57
Genesis Hash: 01A92459E69440D5C1088D3B31F4CA678BE944BAB3776C2E6B7665E9BD99BD5A
Major Version: 23
Minor Version: 0
Patch Version: 0
Pre-release Version: 2
Maker Version: 0
Timestamp: 1633372230276 ms
Active Difficulty: 18446726481523507200 (0xfffff00000000000)

@zhyatt
Copy link
Collaborator

zhyatt commented Oct 4, 2021

So the telemetry RPC documentation mentions the uptime field being number of seconds since the UTC epoch at the point where the response is sent from the peer which is in accurate for the RPC response, but may be accurate for the message on the wire (assuming this is the time the node last started up).

Locally I assume this means the telemetry number subtracted from the current time since epoch will be arrive at number of seconds the node has been up. Does this better line up with what you are seeing (ignoring the issue of returning old values in telemetry)? Currently that node you reference above is reporting ~7000s of uptime (it is NF Beta 1)

@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 4, 2021

I cannot even understand what the "number of seconds since the UTC epoch at the point where the response is sent from the peer" means exactly. In any case, what I see is not an increasing number so it cannot be anything that is tracking time. I have logged this here so I can move on with other more important things. These are not recent regressions, I have been aware of these issues for a at least 2-3 months but I just hadn't logged them yet because i wasnt sure they were real problems or misunderstanding on my part.

@zhyatt
Copy link
Collaborator

zhyatt commented Oct 4, 2021

I don't think I explained this clearly and the docs need updating, so I will make some additional notes here for future reference if they are useful:

  • The uptime number captured locally and sent via the telemetry may be an Epoch timestamp of when that node last restarted, I'm not 100% sure. When the local node processes the telemetry response, it could be taking the current Epoch timestamp and subtracting the one from the other node telemetry = number of seconds it has been up. Regardless, clearly the example you gave was not seconds or a timestamp so something else is going on.
  • When I call RPC telemetry with raw option there are varying uptime numbers in seconds across the nodes, and they do shift over time as one would expect with repeated requests. And the number returned for our beta PR 1 lines up with when I reset them for the DB2 update today, so it is getting accurate data into the node RPC responses.

@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 5, 2021

I found the problem:

write (stream_a, boost::endian::native_to_big (uptime));

The 3 telemetry_ack fields: uptime, peercount, protocol_version are written in the wrong order in the telemetry_ack packet data.
The order documented is:

  • uptime
  • peercount
  • protocol_version
    but the order implemented is:
  • peercount
  • protocol_version
  • uptime

That means that all 3 fields hold the wrong values (according to the protocol documentation).

The questions is, do we change the documentation (https://github.com/nanocurrency/protocol/blob/b94bb467bf5264603b8a1870524791279988783e/reference/protocol.ksy#L382) to reflect the implementation or do we change the implementation to reflect the documentation?

The source code fix is trivial.

@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 5, 2021

The plan is to change the documentation to match the implementation, to preserve backwards compatibility.

@dsiganos dsiganos self-assigned this Oct 5, 2021
@dsiganos dsiganos changed the title telemetry_ack response: "uptime" field contains non-changing data, which doesn't make sense telemetry_ack response: implemenation of fields "uptime, peercount, protocol_version" does not match documentation Oct 5, 2021
@cryptocode
Copy link
Contributor

Since the spec is currently derived from the implementation, it's very helpful if https://github.com/nanocurrency/protocol/issues gets a ticket whenever code is merged with wire format changes. Same with database changes in the nanodb-specification repo. Once v23 is released, I plan to generate a new parser for nanocap to verify the protocol documentation.

@zhyatt
Copy link
Collaborator

zhyatt commented Oct 5, 2021

There are some lagging items to resolve on the protocol side, I will make an attempt at resolving them, including this issue, which appears to have been around since the telemetry was introduced a couple years ago.

@zhyatt zhyatt assigned zhyatt and unassigned dsiganos Oct 5, 2021
@zhyatt zhyatt added documentation This item indicates the need for or supplies updated or expanded documentation and removed bug labels Oct 6, 2021
@dsiganos
Copy link
Contributor Author

dsiganos commented Oct 7, 2021

We fixed the documentation to reflect the implementation so this ticket can be closed now.

@dsiganos dsiganos closed this as completed Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation This item indicates the need for or supplies updated or expanded documentation
Projects
None yet
Development

No branches or pull requests

3 participants