Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging and tracing design and initial implementation for authorities #620

Closed
velvia opened this issue Mar 2, 2022 · 6 comments
Closed
Assignees
Labels
Type: Documentation Improvements or additions to documentation

Comments

@velvia
Copy link
Contributor

velvia commented Mar 2, 2022

The goal is to have a design and initial implementation for logging and tracing semantics esp at authorities, but also any client / gateway code. The goals are a system that

  • makes tracing flow of any one transaction easy
  • makes it easy to derive good insights about what is happening
  • allows for observation using open source, widely used tools
  • makes it easy to debug a distributed, asynchronous system

The outcome would involve docs and initial code that uses proper context, scoping, keywords in order to achieve the above.

This is for an initial implementation. Subsequent tickets would be created that actualize being able to connect logs traces and metrics to external entities.

@velvia velvia self-assigned this Mar 2, 2022
@velvia velvia added Design Task Type: Documentation Improvements or additions to documentation labels Mar 2, 2022
@velvia velvia added this to the Post-GDC milestone Mar 2, 2022
@velvia
Copy link
Contributor Author

velvia commented Mar 3, 2022

Example of structured logging/tracing output:

2022-03-03T18:44:13.547825Z DEBUG test_native_transfer:process_tx{tx_digest=t#82fd2ee00ab4d23c498369ff4d6fc0fc1a74be6f56a6d0490022e0920577f4c7}: sui_core::authority_aggregator: Broadcasting transaction request to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-03T18:44:13.547869Z TRACE test_native_transfer:process_tx{tx_digest=t#82fd2ee00ab4d23c498369ff4d6fc0fc1a74be6f56a6d0490022e0920577f4c7}: sui_core::authority_aggregator: Transaction data: TransactionData { kind: Transfer(Transfer { recipient: k#be362f958600c6a2e8a5e9095e8f93c7eae325d9, object_ref: (576D52D6F34E52664105F74551A6022D96243090, SequenceNumber(0), o#6322b2b021badb66439b206114c5942a844e84be79719b2baa79448392eee16e) }), sender: k#81ce559e70da90db1faf09082bf5ff82e33cfd7d, gas_payment: (45A140DD1FD625B040A1F69650F5ECBA297229B5, SequenceNumber(0), o#c0a1c8300cdd8af7cf360c7acb87724c391a84fe91bb5433f205ee1aff55acb7) }
2022-03-03T18:44:13.553143Z DEBUG test_package_publish_command:process_tx{tx_digest=t#76ba2694f70f4d8d4b0874f4997040fdb7aad6aadfff9de3be4fe746deea8ecf}: sui_core::authority_aggregator: Received signatures response from authorities for transaction req broadcast num_errors=0 good_stake=3 bad_stake=0 num_signatures=3 has_certificate=true
2022-03-03T18:44:13.553223Z DEBUG test_package_publish_command:process_cert{tx_digest=t#76ba2694f70f4d8d4b0874f4997040fdb7aad6aadfff9de3be4fe746deea8ecf}: sui_core::authority_aggregator: Broadcasting certificate to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-03T18:44:13.555484Z DEBUG test_native_transfer:process_tx{tx_digest=t#82fd2ee00ab4d23c498369ff4d6fc0fc1a74be6f56a6d0490022e0920577f4c7}: sui_core::authority_aggregator: Received signatures response from authorities for transaction req broadcast num_errors=0 good_stake=3 bad_stake=0 num_signatures=3 has_certificate=true
2022-03-03T18:44:13.555544Z DEBUG test_native_transfer:process_cert{tx_digest=t#82fd2ee00ab4d23c498369ff4d6fc0fc1a74be6f56a6d0490022e0920577f4c7}: sui_core::authority_aggregator: Broadcasting certificate to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s

Things to note:

  • The above is traditional logging where everything is serialized to a string, but the important thing is that the structure in the logs can be sent to many different types of backends in a structured way, and more advanced ones can take advantage of the structure
  • The context is in brackets and we can see the tx digest number in the different steps of the transaction protocol. This will allow us to trace a single request and its flow.
  • At the end of each log line there are key-value pairs. These can enable smart filtering, even alerts

@velvia
Copy link
Contributor Author

velvia commented Mar 3, 2022

https://github.com/timberio/vector

^^ A low-profile logging/metrics/etc sidecar that might be super useful for us to connect to various observability backends

@akichidis
Copy link
Contributor

Super useful and interested to see the design of this! We'll definitely take inspiration for Narwhal as well

@velvia
Copy link
Contributor Author

velvia commented Mar 7, 2022

Noting this down as useful: notes about setting up tracing with subscribers and Bunyan etc. to prettify JSON tracing output

https://www.lpalmieri.com/posts/2020-09-27-zero-to-production-4-are-we-observable-yet/#5-structured-logging

@velvia
Copy link
Contributor Author

velvia commented Mar 8, 2022

Updated log output from integration tests showing tracing a single transaction across both gateway and authority:

7ab7774d1f7bd40848}: sui_core::authority_aggregator: Broadcasting transaction request to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-05T01:35:03.383791Z TRACE test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Transaction data: TransactionData { kind: Call(MoveCall { package: (0000000000000000000000000000000000000002, SequenceNumber(1), o#3104eb8786a94f58d88564c38e22f13d79e3868c5cf81c9c9228fe91465eccad), module: Identifier("ObjectBasics"), function: Identifier("transfer"), type_arguments: [], object_arguments: [(DA40C299F382CBC3C1EBEEA97351F5F185BAD359, SequenceNumber(1), o#d299113b3b52fd1b9dc01e3ba9cf70345faed592af04a56e287057f166ed2783)], shared_object_arguments: [], pure_arguments: [[145, 123, 205, 38, 175, 158, 193, 63, 122, 56, 238, 127, 139, 117, 186, 164, 89, 46, 222, 252]], gas_budget: 1000 }), sender: k#37ebb9c16574a57bcc7b52a6312a35991748be55, gas_payment: (3EE0283D2D12D5C49D0E4E2F509D07227A64ADF2, SequenceNumber(1), o#3ad1a71ee65e8e6675e6a0fb1e893e48c1820b274d3055d75f4abb850c9663e5) }
2022-03-05T01:35:03.385294Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.386500Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.387681Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Received signatures response from authorities for transaction req broadcast num_errors=0 good_stake=3 bad_stake=0 num_signatures=3 has_certificate=true
2022-03-05T01:35:03.391891Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Broadcasting certificate to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-05T01:35:03.394529Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Read inputs for transaction from DB num_inputs=3
2022-03-05T01:35:03.395917Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Finished execution of transaction with status Success { gas_used: 7 } gas_used=7

velvia added a commit that referenced this issue Mar 9, 2022
…ent) (#706)

* Initial observability doc and example of context/structured logging in authority_aggregator

* Start logging tx / cert processing in authority.rs

* Fix tracing_subscriber initialization to respect RUST_LOG environment var

* Add note about enabling logs

* Add JSON span logging output by setting SUI_JSON_SPAN_LOGS=1

* Add tx_kind to both gateway side and authority side transaction tags

* Cargo clippy

* CR feedback esp on docs

* Change authors in build files

* Add to observability doc
@velvia
Copy link
Contributor Author

velvia commented Mar 15, 2022

PR has been merged, going to file follow on issues:

  • distributed tracing
  • enabling various parts of logging infra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants