If an API response is over some size threshold and you are using the API:Allrevisions API, MediaWiki truncates the data and does not include any revision data/metadata (i.e., it simply returns an empty list of revisions). This seems to be a special case of [this MediaWiki bug](https://phabricator.wikimedia.org/T86611), although I've not seen any reference to this particular case online.

In at least the version of MediaWiki I'm looking at, the API still returns a status of 200. Here is an example of such a request that I managed to extract out of `dumpgenerator`: https://wikitravel.org/wiki/en/api.php?list=allrevisions&arvlimit=1&arvdir=newer&arvcontinue=20210219051441|2674861&arvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|content|flags&continue=&meta=userinfo&uiprop=blockinfo|hasmsg&action=query&format=json

The page content seems to be about 8.5 MB and the API limit (8,388,608 bytes) is a bit less than that. It seems to be a spam edit.

This is the JSON version of the API response:
```json
{
  "batchcomplete": "",
  "continue": {
    "arvcontinue": "20210219051441|2674861",
    "continue": "-||userinfo"
  },
  "warnings": {
    "result": {
      "*": "This result was truncated because it would otherwise be larger than the limit of 8,388,608 bytes."
    },
    "main": {
      "*": "Subscribe to the mediawiki-api-announce mailing list at for notice of API deprecations and breaking changes."
    },
    "allrevisions": {
      "*": "Because \"arvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
    }
  },
  "query": {
    "allrevisions": [],
    "userinfo": {
      "id": 45359,
      "name": "Benjamin Mako Hill"
    }
  }
}
```
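For what it's worth, the truncation seems detectable directly from the `warnings` block rather than having to be inferred from the empty revision list. A minimal sketch (the function name is mine, not anything in `dumpgenerator.py`):

```python
# Sketch: detect MediaWiki's result-truncation warning in a parsed API
# response. Checks the warnings.result.* text that appears in the
# response shown above.
def is_truncated(api_json: dict) -> bool:
    """Return True if MediaWiki reported that it truncated the result."""
    result_warning = api_json.get("warnings", {}).get("result", {}).get("*", "")
    return "truncated" in result_warning

response = {
    "warnings": {"result": {"*": (
        "This result was truncated because it would otherwise be "
        "larger than the limit of 8,388,608 bytes.")}},
    "query": {"allrevisions": []},
}
print(is_truncated(response))  # True
```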
Because the API has not returned any revisions, the value of the `arvcontinue` token included in the data returned by the API does not change. As a result, `dumpgenerator.py` assumes that everything is well, stores all the data returned (i.e., nothing), and makes the same request again. This repeats over and over until a user intervenes.
How to fix this
At a minimum, I think dumpgenerator should notice that we're seeing the same continuation value across repeated subsequent "successful" (HTTP 200) requests and then error out. Maybe we want to add the additional stipulation that the returned revision list is empty?
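That check could be sketched roughly as below. This is not dumpgenerator's actual loop; `fetch_revisions` is a hypothetical stand-in for the real API call, and the threshold is arbitrary:

```python
# Sketch: error out when the same arvcontinue token comes back repeatedly
# with an empty revision list, instead of looping forever.

class StalledContinuationError(Exception):
    pass

def iterate_revisions(fetch_revisions, max_repeats=3):
    """Yield revisions, following arvcontinue until the API stops continuing."""
    arvcontinue = None
    repeats = 0
    while True:
        data = fetch_revisions(arvcontinue)
        revisions = data["query"]["allrevisions"]
        new_continue = data.get("continue", {}).get("arvcontinue")
        # Same continuation + no revisions means we are stuck, not done.
        if new_continue == arvcontinue and not revisions:
            repeats += 1
            if repeats >= max_repeats:
                raise StalledContinuationError(
                    f"arvcontinue {arvcontinue!r} repeated {repeats} times "
                    "with no revisions; response was probably truncated")
        else:
            repeats = 0
        yield from revisions
        if new_continue is None:
            return
        arvcontinue = new_continue
```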
A bolder approach would involve munging the continuation, e.g., adding one to the revision ID, or something similar. I can imagine why we might not want to support this in the tool, though: it appears to work in the specific case above but might not work in general.
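For illustration only, the munging could look like the sketch below. It assumes the `timestamp|revid` token shape seen in the example response, which is an implementation detail and not guaranteed stable; note that bumping the revision ID silently skips the stuck revision:

```python
# Sketch: skip past a stuck revision by incrementing the revision-id half
# of an arvcontinue token. Assumes the "timestamp|revid" shape.
def bump_continuation(arvcontinue: str) -> str:
    """Return a continuation token pointing one revision id past the input."""
    timestamp, revid = arvcontinue.rsplit("|", 1)
    return f"{timestamp}|{int(revid) + 1}"

print(bump_continuation("20210219051441|2674861"))  # 20210219051441|2674862
```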
I manually worked around it by removing "content" from the `arvprop` parameter (for just this single request), handcrafting XML for that single `<page>`, concatenating it to the results, and then restarting. Doing something like this automatically is definitely possible, but I'm not sure it's either worth it or a good idea.
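Automating that workaround might look roughly like this sketch, where `api_request` is a hypothetical stand-in for the tool's real HTTP call and the truncation check is simplified to the warning key seen in the response above:

```python
# Sketch: if a request comes back truncated with no revisions, retry it once
# with "content" dropped from arvprop so at least the metadata is saved.
ARVPROP = "ids|timestamp|user|userid|size|sha1|contentmodel|comment|content|flags"

def fetch_with_fallback(api_request, params: dict) -> dict:
    data = api_request(params)
    # MediaWiki puts the truncation notice under warnings["result"].
    truncated = "result" in data.get("warnings", {})
    if truncated and not data["query"]["allrevisions"]:
        fallback = dict(params)
        fallback["arvprop"] = "|".join(
            p for p in params["arvprop"].split("|") if p != "content")
        data = api_request(fallback)  # metadata only, no page text
    return data
```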
I'm happy to help code something up to fix this, but I'm honestly not sure what the best way to approach it would be.
yzqzss changed the title to "truncated API response for "allrevisions" causes infinite loop" and added a commit to saveweb/wikiteam3 that referenced this issue (Aug 9, 2023).