[fix] File transport under stress hangs logger #1318
Conversation
This looks like it was removed (by accident?) in 40c1ce4
Here is an example that, using winston master, should freeze fairly quickly, as it is an extreme example.
First: thank you for your PR @mempf – fixing this scenario once and for all is high on the project's list of priorities. Would it be possible for you to write a unit test verifying this fix works appropriately? Given the importance of this fix it's prudent to prevent future regressions.
I've played around with the PR and it seems to be working. However, I think the fix can be much simpler.
As for the tests: I've noticed this problem occurs on master when logging messages with a length of 16384 or longer. I've created a MWE below which should NOT work on master but does work with your patch @mempf, so the tests can be based around this.
'use strict';
const fs = require('fs');
const path = require('path');
const os = require('os');
const winston = require('winston');
const filename = path.join(__dirname, 'combined.log');
try {
  fs.unlinkSync(filename); // eslint-disable-line no-sync
} catch (ex) { } // eslint-disable-line no-empty

const logger = winston.createLogger({
  //
  // Custom format to only print out the message
  //
  format: winston.format.printf(info => info.message),
  transports: [
    new winston.transports.File({
      filename,
      //
      // By default winston already sets the
      // eol this way, but this is for the demo.
      //
      eol: os.EOL
    })
  ]
});

setInterval(() => {
  logger.log({
    level: 'info',
    //
    // Creates a message which is just under the `highWaterMark` limit.
    // See: https://nodejs.org/api/stream.html#stream_constructor_new_stream_writable_options
    // This should NOT work on `master`, but does work with @mempf's patch.
    //
    message: 'a'.repeat(16384 - os.EOL.length - 1)
  });
}, 100);
lib/winston/transports/file.js (Outdated)
@@ -89,7 +88,7 @@ module.exports = class File extends TransportStream {
     this._size = 0;
     this._pendingSize = 0;
     this._created = 0;
-    this._drains = 0;
+    this._drain = false;
     this._next = noop;
lib/winston/transports/file.js (Outdated)
      });
      return;
    }

    const written = this._stream.write(output, logged.bind(this));
    if (written === false) {
Move to L175.
lib/winston/transports/file.js (Outdated)
@@ -143,9 +142,21 @@ module.exports = class File extends TransportStream {
      this._endStream(() => this._rotateFile());
    }

    if (this._drain) {
This seems unnecessary. With the example you've provided in the PR I haven't been able to get into this block of code, but feel free to prove me wrong.
I am trying all my test cases and you seem to be right; this might be unnecessary. I haven't made a huge effort to investigate whether the log function of the file transport might be called again after we've been told to wait for a drain event. This check was to ensure we don't call .write on the stream again until it's ready, but if something else higher up already prevents that, making this check unnecessary, then it can be removed.
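For reference, a minimal sketch of the guard being discussed, assuming a plain fs write stream; this is illustrative only and not winston's actual implementation (the FileLikeTransport class name and its queue are hypothetical):

'use strict';
const fs = require('fs');

// Illustrative only: a tiny transport-like writer showing why .write() is not
// called again while the stream is still waiting to emit 'drain'.
class FileLikeTransport {
  constructor(filename) {
    this._stream = fs.createWriteStream(filename, { flags: 'a' });
    this._drain = false; // true while waiting for 'drain'
    this._queue = [];    // chunks deferred while waiting
  }

  log(output) {
    if (this._drain) {
      // The stream already asked us to stop; queue instead of writing.
      this._queue.push(output);
      return;
    }
    if (this._stream.write(output) === false) {
      // Buffer exceeded highWaterMark: wait for 'drain' before writing more.
      this._drain = true;
      this._stream.once('drain', () => {
        this._drain = false;
        this._queue.splice(0).forEach(chunk => this.log(chunk));
      });
    }
  }
}

Dropping the `if (this._drain)` check in such a sketch is only safe if every caller already stops invoking `log` once it has been told to wait, which is exactly the question raised above.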
The `_writev` method of winstonjs/winston-transport seems to call the `log` method multiple times. I've managed to get into the `if` statement by replacing the `setInterval` part of the script I provided in my previous comment with the following:
const infos = new Array(10).fill().map(() => ({
  level: 'info',
  //
  // Creates a message which is just under the `highWaterMark` limit.
  // See: https://nodejs.org/api/stream.html#stream_constructor_new_stream_writable_options
  // This should NOT work on `master`, but does work with @mempf's patch.
  //
  message: 'a'.repeat(16385 - os.EOL.length - 1)
}));

infos.forEach(info => logger.log(info));
So synchronous logging calls could still overload the stream buffer, then, if nothing prevents it.
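As a rough illustration of that point (a sketch, not taken from the PR; the file name is arbitrary): each synchronous `write()` past the `highWaterMark` still accepts the chunk, so the writable's internal buffer keeps growing until the event loop gets a chance to flush it.

'use strict';
const fs = require('fs');
const os = require('os');

const stream = fs.createWriteStream('overload-demo.log');
const chunk = 'a'.repeat(16384 - os.EOL.length - 1) + os.EOL;

for (let i = 0; i < 10; i++) {
  // write() starts returning false once the buffered bytes reach highWaterMark,
  // but it never rejects data: every synchronous call queues another chunk.
  const ok = stream.write(chunk);
  console.log(`write #${i + 1}: ok=${ok}, buffered=${stream.writableLength} bytes`);
}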
Revert "Greatly simplify drain mechanism based on PR feedback and testing". This reverts commit 7ff756b.
This test should fail on current winston 'master' and succeed on this PR
Thanks for adding the test. If @ChrisAlderson is 👍 then this should be good to go.
Looks great! Thanks for the contribution. We could add one more test for the synchronous logging; I have it below. You can add it @mempf or we can add it post-merge.

it('should handle a high volume of large writes synchronous', function (done) {
  const logger = winston.createLogger({
    transports: [new winston.transports.File({
      filename: logPath
    })]
  });
  const counters = {
    write: 0,
    read: 0
  };

  const msgs = new Array(10).fill().map(() => ({
    counter: ++counters.write,
    message: 'a'.repeat(16384 - os.EOL.length - 1)
  }));

  msgs.forEach(msg => logger.info(msg));

  setTimeout(function () {
    helpers.tryRead(logPath)
      .on('error', function (err) {
        assume(err).false();
        done();
      })
      .pipe(split())
      .on('data', function (d) {
        const json = JSON.parse(d);
        assume(json.level).equal('info');
        assume(json.message).equal('a'.repeat(16384 - os.EOL.length - 1));
        assume(json.counter).equal(++counters.read);
      })
      .on('end', function () {
        assume(counters.write).equal(counters.read);
        done();
      });
  }, 10000);
});
Added your synchronous test, nice catch with that one.
Looks good to merge @indexzero.
@mempf - Thanks for the fix. Unfortunately it still doesn't work for us. You say this, however:
"the @kramer65 code had to be slightly modified to allow the while(true) loop to give the event loop some time to breathe"
Can you share with us how you modified my code? We tried this in our production app, and although it doesn't stop the logging within a second like my example code does, it does make it hang within about 15 minutes. We've got a very high log throughput, and this hang is detrimental to understanding and managing our application (which handles financial decisions). Also: would you know a workaround for this? We might, for example, queue Winston logging altogether using something like kue. Would that be an option, or do you see problems with that?
@kramer65 This whole issue is about the file write buffer getting backed up because the OS is not writing it to disk fast enough. If you have enough sustained logging beyond what is being written to disk, for long enough, I imagine something in Node will break, maybe even run out of memory. Is it possible to monitor the RAM/heap usage of your application and see if it grows out of control? If I recall, the modified version of your example was simply to exit out of the while loop eventually. Since Node is single-threaded, a while(true) loop will run forever, never giving other code a chance to run.

Edit: I am also curious what behavior you see in your production use case with winston 2.x, and whether this is a regression in winston 3 or has always been a problematic use case.
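A minimal sketch of both suggestions, assuming a plain winston file transport (the file name and message size are arbitrary): `process.memoryUsage()` reports heap growth, and rescheduling each iteration with `setImmediate` instead of spinning in a `while (true)` loop gives the stream's 'drain' handlers and disk writes a chance to run.

'use strict';
const winston = require('winston');

const logger = winston.createLogger({
  transports: [new winston.transports.File({ filename: 'stress.log' })]
});

// Periodically report heap usage so runaway buffering shows up early.
setInterval(() => {
  const { heapUsed, rss } = process.memoryUsage();
  console.log(`heapUsed=${(heapUsed / 1e6).toFixed(1)}MB rss=${(rss / 1e6).toFixed(1)}MB`);
}, 5000);

// Instead of `while (true) { logger.info(...) }`, which never yields to the
// event loop, reschedule each iteration so 'drain' handlers and disk writes can run.
function logForever() {
  logger.info('a'.repeat(1024));
  setImmediate(logForever);
}
logForever();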
* [fix] Add drain handler back into file transport. This looks like it was removed (by accident?) in 40c1ce4
* [fix] Remove extra drain event
* Prevent this._drains from going negative
* Refactor drain mechanism to fix file transport logging under stress
* Removed early return when write returns false
* Greatly simplify drain mechanism based on PR feedback and testing
* Revert "Greatly simplify drain mechanism based on PR feedback and testing". This reverts commit 7ff756b.
* Changes and cleanup based on PR feedback
* Add test case that should cover the original reason for this change. This test should fail on current winston 'master' and succeed on this PR
* Test the content of the message as well just in case
* Add synchronous version of the file transport large message stress test
I noticed in one of our projects using winston3 that all our logging would stop after a few large (sometimes very large) log messages. The use of the file transport was the cause, and it seemed like it was not draining correctly. Upon investigation I found that the drain mechanism seemed flawed: it used a drain counter and did not reset it to zero when the drain event occurred.
Additionally, the file transport would attempt to write to the stream even after receiving the indicator to stop and wait for a drain event.
I have tested this against the examples provided by @juanda99 and @kramer65 mentioned in #1144 and they both seem to work, although the @kramer65 code had to be slightly modified to allow the while(true) loop to give the event loop some time to breathe.
I know this transport has been particularly tricky to get right and I hope this helps. Unfortunately, as it stands now, winston3 rc5 can still get the file transport stuck, leading to all logging getting stuck. I am trying to create an isolated test case, although my attempts so far have not gone well.
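A minimal sketch of the flaw described above, not winston's actual code (the onWriteRejected/onDrain helpers are hypothetical stand-ins for the transport's write path and its 'drain' listener): a counter that several rejected writes can increment before a single 'drain' decrements it once may never return to zero, while a boolean cleared on 'drain' cannot get stuck.

'use strict';

// Flawed shape: several rejected writes can increment the counter before one
// 'drain' event decrements it once, so it never returns to zero and the
// transport believes it must keep waiting.
let drains = 0;
function onWriteRejectedCounter() { drains++; }  // write() returned false
function onDrainCounter() { drains--; }          // one decrement per 'drain', never reset
const mustWaitCounter = () => drains > 0;

// Fixed shape: a boolean that is set when write() returns false and cleared
// unconditionally when 'drain' fires cannot get stuck in the waiting state.
let drain = false;
function onWriteRejectedFlag() { drain = true; }
function onDrainFlag() { drain = false; }
const mustWaitFlag = () => drain;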