-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipwb index outputs unrelated information into CDXJ file #639
Comments
Yes, this is a bug. The |
|
I am unable to replicate this behavior and want to nail down the why so we can also fix it in other places like you have provided in #640, @anatoly-scherbakov. Perhaps it is the WARC itself but maybe it is the OS. I am using macOS 10.15.4 Python 3.8.0 and ipwb 0.2019.08.15.0213 from pypi (reinstalled previous version for testing).
I agree that the first three lines should not be printed. I am curious as to why my freshly pulled copy did not provide the outdated version error. I also agree that we should use I would like the update command to be run automatically, but perhaps we should incorporate some smarts into it. Software that requires manually invocation for updating rarely gets updated. We were retaining pseudo-preferences for ipwb within the config file in For the crux of this issue, @anatoly-scherbakov can you verify if this is replicable when using the latest version of ipwb we released last week (v0.2020.4.24.1847)? Regardless of whether it is (i.e., it will rear its head again on the next update), I would like to identify why this is not replicable between our two instances of the same build version. |
Using loggers is certainly a good practice and I am in support for it. However, I suggested to reuse
I personally do not like built-in check for update feature in every app and almost hate per application auto update feature. This sounds so non-Unixy to me. Software distribution, update check/notification, and update installation are responsibilities of app repositories and update managers. In early days of Windows and Mac, individual applications used to have their own distribution strategies so it made more sense in those applications. I think @machawk1 likes this feature. I remember talking to him about moving this functionality in a separate thread or something so it does not block the main execution thread, but then the notification might be lost in other messages and not be as noticeable. Unintentional DoS is a good point when the script is invoked multiple times, thought indexing does not have to be executed once for each WARC file, it should accept multiple files, wildcard, or a directory path containing WARC (it should be implemented, if it does not do so already). |
Yep, it is what I am used to: the option to know, on launch, whether you are using the latest version of the software. Many Mac apps continue to do this. Those that use a package manager, e.g., homebrew or the macOS app store, rely on said manager to let the system know that software may be out-of-date. I do not think the "manager" we are anticipating to be the means of installation does this auto-check like homebrew does when installing any other package. Ultimately, it is not critically important to me and the automatic check on each run is invasive and could be strategically culled.
Right, it should only be called once on the initial invocation and not while iterating through each file...unless multiple instances are running as the usage pattern instead of feeding all WARCs to a single instance of ipwb. |
I was able to replicate it:
|
One way to minimize the footprint of automatic check would be to only call it when a version check is performed (i.e., |
@ibnesayeed I see here that you can replicate the version being reported but can you replicate it being included in a CDXJ? |
This is a good approach. For discussion (though I agree with your method), why not have it also be run with the index/replay help as well as the top level? |
Yes, bellow the fold I demonstrate it in the same container I illustrated the issue above:
Now, installing and running IPFS daemon and then trying to index again:
|
Yes, that's a point. But, my practice suggests such a PR usually never happens :)
If @machawk1 likes this feature - maybe we could revisit the option of creating a very special subcommand for it, the name of which explicitly and unambiguously explains what it does? |
I also refactored the code into |
This is a very good point. |
I completely agree.
This is sort-of what go-ipfs does to the extreme of |
In that case, does everyone agree to completely remove the code related to this version check? If ever needed, it can always be retrieved from Git history. |
One more thing I could add is that not everyone actually uses
or
and everything is done for you. Let's delegate this task to the systems which are built specifically for the purpose. |
I agree and interesting point on poetry, I am unfamiliar with it. Feel free to send a PR removing the version checks, @anatoly-scherbakov. |
#366 is related regarding the usage of other package managers and dependency specification (e.g., Pipfile). |
I am usually very happy when more lines of code are removed from a repository than added. ;-) Again, I am generally not in favor of implementing check for update feature in every tool, which has a lot of coupling and dictates one way of doing things that may not align with individual users' preferences and workflows. This may be a cool feature for desktop applications with UIs, but server apps or packages do not need such things in place. Often these services are run using bootstrapping scripts and there are no eyes to look at what was logged at the time of application boot. Auto updates (that were not part of IPWB in the first place) are even more dangerous when services are run in containers and state of those container images should be frozen, especially when big deployments require signing of application state by their DevOp and QA chain. That said, if one is using pip to install and manage IPWB, they can do something like:
Or
So, let's get rid of the built-in check for update feature and optionally document update commands in the README. |
I am creating a CDXJ file from a WARC archive in a bash script, like this:
I subsequently put the file into IPFS, where it can be easily looked up, and an issue can be demonstrated:
First three lines should not be printed: due to them, my CDXJ file is now invalid. I suggest to use
logger.warning
for these messages instead of usingprint()
. This way, the messages will be sent to stderr rather than to stdout. I aim to prepare a PR for that purpose shortly.The text was updated successfully, but these errors were encountered: