Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#620: Add structured logging to basic transaction flow (Authority/Client) #706

Merged
merged 12 commits into from
Mar 9, 2022
Merged

#620: Add structured logging to basic transaction flow (Authority/Client) #706

merged 12 commits into from
Mar 9, 2022

Conversation

velvia
Copy link
Contributor

@velvia velvia commented Mar 8, 2022

Initial PR for #620
The doc explains the goals - using structured logging so we can have proper context for entire transaction flows and be able to easily trace a transaction across processes, components, and async threads. Plus using proper key-value tagging to be able to easily filter logs.

Example of logging showing tracing a single TX across authority_aggregator on gateway/client side and authority:

7ab7774d1f7bd40848}: sui_core::authority_aggregator: Broadcasting transaction request to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-05T01:35:03.383791Z TRACE test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Transaction data: TransactionData { kind: Call(MoveCall { package: (0000000000000000000000000000000000000002, SequenceNumber(1), o#3104eb8786a94f58d88564c38e22f13d79e3868c5cf81c9c9228fe91465eccad), module: Identifier("ObjectBasics"), function: Identifier("transfer"), type_arguments: [], object_arguments: [(DA40C299F382CBC3C1EBEEA97351F5F185BAD359, SequenceNumber(1), o#d299113b3b52fd1b9dc01e3ba9cf70345faed592af04a56e287057f166ed2783)], shared_object_arguments: [], pure_arguments: [[145, 123, 205, 38, 175, 158, 193, 63, 122, 56, 238, 127, 139, 117, 186, 164, 89, 46, 222, 252]], gas_budget: 1000 }), sender: k#37ebb9c16574a57bcc7b52a6312a35991748be55, gas_payment: (3EE0283D2D12D5C49D0E4E2F509D07227A64ADF2, SequenceNumber(1), o#3ad1a71ee65e8e6675e6a0fb1e893e48c1820b274d3055d75f4abb850c9663e5) }
2022-03-05T01:35:03.385294Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.386500Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.387681Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.388835Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Checked locks and found mutable objects num_mutable_objects=2
2022-03-05T01:35:03.391841Z DEBUG test_move_call_args_linter_command:process_tx{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Received signatures response from authorities for transaction req broadcast num_errors=0 good_stake=3 bad_stake=0 num_signatures=3 has_certificate=true
2022-03-05T01:35:03.391891Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority_aggregator: Broadcasting certificate to authorities quorum_threshold=3 validity_threshold=2 timeout_after_quorum=60s
2022-03-05T01:35:03.394529Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Read inputs for transaction from DB num_inputs=3
2022-03-05T01:35:03.395917Z DEBUG test_move_call_args_linter_command:process_cert{tx_digest=t#7e5f08ab09ec80e3372c101c5858c96965a25326c21af27ab7774d1f7bd40848}: sui_core::authority: Finished execution of transaction with status Success { gas_used: 7 } gas_used=7

Log initialization is fixed to take in log levels from the environment RUST_LOG variable.

This is just a start - much more to come, but I think it gives everyone the basic idea and template to do more. TODOs (either this PR or subsequent ones):

  • Logging of spans, one should be able to see the time spans took
  • Automatic logging of expects and unwraps with context
  • more detailed logging everywhere

Copy link
Contributor

@Clay-Mysten Clay-Mysten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs look great, Evan! I left some optional suggestions. Otherwise, LGTM!

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, but the doc part is meant to be internal developer doc, i.e. it's for people developing Sui itself rather than on Sui. Is docs the right place?

/cc @Clay-Mysten

@Clay-Mysten
Copy link
Contributor

Hi Francois,

I ran this past Sam, and he thought it appropriate for the Contribute section of the site aimed at folks looking to augment Sui. I updated our Dev Portal sitemap outline to reflect this:
https://docs.google.com/document/d/1izze8way6Z6-NiTabJADowoUECnMIl4c8KKNoCyL1fA/edit#

Let me know if that doesn't make sense to you.

@velvia
Copy link
Contributor Author

velvia commented Mar 8, 2022

Thanks for the feedback @Clay-Mysten !
As for where docs go, will leave suggestions in your good hands.

Also just added ability to dump span events (start, end, duration) as JSON. Sample output:

{"v":0,"name":"sui","msg":"[PROCESS_CERT - START]","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.241421Z","target":"sui_core::authority_server","line":67,"file":"sui_core/src/authority_server.rs","tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493"}
{"v":0,"name":"sui","msg":"[PROCESS_CERT - EVENT] Read inputs for transaction from DB","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.246688Z","target":"sui_core::authority","line":393,"file":"sui_core/src/authority.rs","num_inputs":2,"tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493"}
{"v":0,"name":"sui","msg":"[PROCESS_CERT - EVENT] Finished execution of transaction with status Success { gas_used: 18 }","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.246759Z","target":"sui_core::authority","line":409,"file":"sui_core/src/authority.rs","gas_used":18,"tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493"}
{"v":0,"name":"sui","msg":"[DB_UPDATE_STATE - START]","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.247888Z","target":"sui_core::authority","line":430,"file":"sui_core/src/authority.rs","tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493"}
{"v":0,"name":"sui","msg":"[DB_UPDATE_STATE - END]","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.248114Z","target":"sui_core::authority","line":430,"file":"sui_core/src/authority.rs","tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493","elapsed_milliseconds":0}
{"v":0,"name":"sui","msg":"[PROCESS_CERT - END]","level":20,"hostname":"Evan-MLbook.lan","pid":51425,"time":"2022-03-08T22:48:11.248688Z","target":"sui_core::authority_server","line":67,"file":"sui_core/src/authority_server.rs","tx_digest":"t#d1385064287c2ad67e4019dd118d487a39ca91a40e0fd8e678dbc32e112a1493","elapsed_milliseconds":2}

@sblackshear
Copy link
Collaborator

This looks great, but the doc part is meant to be internal developer doc, i.e. it's for people developing Sui itself rather than on Sui. Is docs the right place?

/cc @Clay-Mysten

I agree with not co-mingling these with external developer docs. Maybe a dev-docs or contributor-docs folder would make sense?

Copy link
Collaborator

@gdanezis gdanezis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this! Probably worth also adding a note on how practically one can run an authority and observe its logs (or are they just printing on stderr by default?)

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super dope! I would pay good money to sit down and watch a demo of this.

Here's the main missing things to me:

  • I think the doc section on ## Ways to View Logs, Traces, Metrics should be expanded.
  • I don't think the markdown should live in our dev portal, except perhaps in a "for Sui core developers" section (/cc @Clay-Mysten ).

@velvia
Copy link
Contributor Author

velvia commented Mar 9, 2022

@Clay-Mysten I've moved the doc to a new dev-docs folder.

Thanks everyone for the feedback. I'm planning to restructure the "ways to view logs, traces, etc." section and write some stuff on the basic architecture of it, but leave the actual implementation of subscriber/viewing options to a later PR, in favour of getting this in faster.

@velvia velvia merged commit 18568c9 into MystenLabs:main Mar 9, 2022
@velvia velvia deleted the ec/initial-logging-pr branch March 9, 2022 23:35
@patrickkuo
Copy link
Contributor

Just noticed ./sui genesis and ./sui start no longer prints logs after this change.... @velvia can you take a look?

@patrickkuo
Copy link
Contributor

Just noticed ./sui genesis and ./sui start no longer prints logs after this change.... @velvia can you take a look?

Ok I just saw this in the doc

NOTE: For logs, set RUST_LOG=debug before invoking ./sui start.

most of the output of sui start and genesis are informative and are info level logging, do you think we should print INFO logs by default?

@velvia
Copy link
Contributor Author

velvia commented Mar 11, 2022

@patrickkuo Let me fix it in an upcoming PR, am looking into it

@huitseeker huitseeker mentioned this pull request Apr 15, 2022
mwtian pushed a commit that referenced this pull request Sep 12, 2022
Context:
The `BatchLoader` is establishing a primary->worker communication using a public address, which is adding latency and competing with outbound traffic.
The proper fix us to use the `BlockWaiter`.

The issue:
We would like a mitigation to be deployed and effective sooner.

The fix:
We inspect the worker addresses used by the `BatchLoader`, and rewrite their hostname to localhost when it matches the local primary.
mwtian pushed a commit that referenced this pull request Sep 12, 2022
)

We operate an executor with a bound on the concurrent number of messages (see #463, #559, #706).
PR #472 added logging for the bound being hit.

We expect the executors to operate for a long time at this limit (e.g. in recovery situation).
The spammy logging is not usfeful

This removes the logging of the concurrency bound being hit.
Fixes #759
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 29, 2022
Context:
The `BatchLoader` is establishing a primary->worker communication using a public address, which is adding latency and competing with outbound traffic.
The proper fix us to use the `BlockWaiter`.

The issue:
We would like a mitigation to be deployed and effective sooner.

The fix:
We inspect the worker addresses used by the `BatchLoader`, and rewrite their hostname to localhost when it matches the local primary.
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 29, 2022
…ystenLabs#763)

We operate an executor with a bound on the concurrent number of messages (see MystenLabs#463, MystenLabs#559, MystenLabs#706).
PR MystenLabs#472 added logging for the bound being hit.

We expect the executors to operate for a long time at this limit (e.g. in recovery situation).
The spammy logging is not usfeful

This removes the logging of the concurrency bound being hit.
Fixes MystenLabs#759
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants