-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Reduce the stalled logging for snapshot #625
Conversation
pghoard/basebackup/delta.py
Outdated
@@ -99,6 +99,7 @@ def progress_callback(progress_step: ProgressStep, progress_data: ProgressMetric | |||
progress_step.value, progress_data["handled"], elapsed | |||
) | |||
else: | |||
self.last_flush_time = time.monotonic() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better at the outer if, no need to have it in each branch. Also can we improve the log messages to better reflect what is happening? Right now it is a bit confusing.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #625 +/- ##
==========================================
- Coverage 91.01% 90.70% -0.32%
==========================================
Files 31 31
Lines 4917 4957 +40
==========================================
+ Hits 4475 4496 +21
- Misses 442 461 +19
|
a9c631a
to
3933b55
Compare
status = "FAILED" if not result.success else "successfully" | ||
log_msg = f"{operation_type.capitalize()} of key: {key}, " \ | ||
log_msg = f"{oper.capitalize()} of key: {key}, " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reused oper and already contained the necessary info
self.metrics.gauge("pghoard.seconds_since_backup_progress_stalled", 0, tags=tags) | ||
self.log.info( | ||
"Updated snapshot progress for %s to %d files; elapsed time since last check: %.2f seconds.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It confuses operators, and because it doesn't log frequently, we might think it is stuck. Because of progress_data["handled"] value
3933b55
to
7e3d0bf
Compare
Reduce the stalled logging for the snapshot process. Previously, we were not resetting last_flush_time, resulting in a lot of logs being printed if the previous base backup failed and the current progress surpassed the previous base backup progress for the snapshot process
7e3d0bf
to
e1ac0bb
Compare
Reduce the stalled logging for the snapshot process. Previously, we were not resetting last_flush_time, resulting in a lot of logs being printed if the previous base backup failed and the current progress surpassed the previous base backup progress for the snapshot process
About this change - What it does
Resolves: #xxxxx
Why this way