In the WAT files generated with the extractor, we have the following structure:
Envelope: {
  Format: "WARC",
  WARC-Header-Length: "298",
  Actual-Content-Length: "1343",
  WARC-Header-Metadata: {},
  Block-Digest: "sha1:XW7VSE74YCSE6AIJNT5AVSELMVBCIYYN",
  Payload-Metadata: {}
}
Block-Digest and Actual-Content-Length are not supposed to appear in the Envelope section.
The Payload-Metadata section also contains an Actual-Content-Length and an Entity-Digest.
The content and computation of these four metadata fields need to be clarified.
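To make the question concrete, here is a minimal sketch of how these values are commonly computed for WARC records, not necessarily what the extractor does. It assumes that Actual-Content-Length is the byte length of the record block (everything after the WARC header) and that digests follow the usual WARC convention of a base32-encoded SHA-1 prefixed with "sha1:" (the sample Block-Digest above has that shape: 32 base32 characters). Treating the Payload-Metadata values as the same computations over the HTTP entity body is also an assumption. The function names `block_digest`, `actual_content_length` and `http_payload` are hypothetical.

```python
import base64
import hashlib


def block_digest(data: bytes) -> str:
    """Return a WARC-style digest: 'sha1:' + base32(SHA-1(data)).

    A SHA-1 digest is 20 bytes (160 bits), so its base32 form is exactly
    32 characters with no '=' padding.
    """
    raw = hashlib.sha1(data).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def actual_content_length(data: bytes) -> str:
    """Return the length in bytes as a string, matching the WAT output style."""
    return str(len(data))


def http_payload(block: bytes) -> bytes:
    """Hypothetical helper: for an HTTP response block, return the entity body,
    i.e. everything after the blank line that terminates the HTTP headers.
    Returns b"" if no header terminator is found (an assumption)."""
    _headers, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else b""


if __name__ == "__main__":
    block = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
    # Envelope-level values computed over the whole block (assumption).
    print(actual_content_length(block), block_digest(block))
    # Payload-level values computed over the entity body only (assumption).
    payload = http_payload(block)
    print(actual_content_length(payload), block_digest(payload))
```

If the extractor computes the envelope values over the whole block and the Payload-Metadata values over the entity body, the two Actual-Content-Length fields would differ by the size of the HTTP headers, which is one way to check which interpretation it uses.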
According to the WAT specification (https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Metadata+File+Specification), the envelope structure should be:
"Envelope": {
  "Format": "WARC",
  "Payload-Metadata": {},
  "WARC-Header-Length": "298",
  "WARC-Header-Metadata": {}
}
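A quick way to spot the discrepancy across many records is to compare each Envelope's keys with the four keys listed in the specification excerpt. This is a minimal sketch assuming each WAT record is a JSON object with a top-level "Envelope" key, as in the examples in this issue; `SPEC_ENVELOPE_KEYS` and `check_envelope` are hypothetical names, and the expected key set is taken only from the excerpt quoted here, not from the full specification.

```python
import json

# Top-level Envelope keys listed in the specification excerpt quoted in this issue.
SPEC_ENVELOPE_KEYS = {"Format", "Payload-Metadata", "WARC-Header-Length", "WARC-Header-Metadata"}

# Envelope as produced by the extractor (values copied from the issue).
EXTRACTOR_RECORD_JSON = """
{
  "Envelope": {
    "Format": "WARC",
    "WARC-Header-Length": "298",
    "Actual-Content-Length": "1343",
    "WARC-Header-Metadata": {},
    "Block-Digest": "sha1:XW7VSE74YCSE6AIJNT5AVSELMVBCIYYN",
    "Payload-Metadata": {}
  }
}
"""


def check_envelope(wat_record: dict) -> tuple:
    """Return (unexpected_keys, missing_keys) for the record's Envelope."""
    keys = set(wat_record.get("Envelope", {}))
    return keys - SPEC_ENVELOPE_KEYS, SPEC_ENVELOPE_KEYS - keys


if __name__ == "__main__":
    extra, missing = check_envelope(json.loads(EXTRACTOR_RECORD_JSON))
    print("unexpected:", sorted(extra))
    print("missing:", sorted(missing))
```

Run against the extractor output above, the unexpected set is exactly Block-Digest and Actual-Content-Length, and nothing the excerpt requires is missing.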