If an API response is over some size threshold and you are using the API:Allrevisions API, MediaWiki truncates the data and does not include any revision data/metadata (i.e., it simply returns an empty list of revisions). This seems to be a special case of [this MediaWiki bug](https://phabricator.wikimedia.org/T86611), although I've not seen any reference to this particular case online.

In at least the version of MediaWiki I'm looking at, the API still returns a status of 200. Here is an example of such a request that I managed to extract out of `dumpgenerator`: https://wikitravel.org/wiki/en/api.php?list=allrevisions&arvlimit=1&arvdir=newer&arvcontinue=20210219051441|2674861&arvprop=ids|timestamp|user|userid|size|sha1|contentmodel|comment|content|flags&continue=&meta=userinfo&uiprop=blockinfo|hasmsg&action=query&format=json

The page content seems to be about 8.5 MB and the API limit (8,388,608 bytes) is a bit less than that. It seems to be a spam edit.

This is the JSON version of the API response:
```json
{
  "batchcomplete": "",
  "continue": {
    "arvcontinue": "20210219051441|2674861",
    "continue": "-||userinfo"
  },
  "warnings": {
    "result": {
      "*": "This result was truncated because it would otherwise be larger than the limit of 8,388,608 bytes."
    },
    "main": {
      "*": "Subscribe to the mediawiki-api-announce mailing list at for notice of API deprecations and breaking changes."
    },
    "allrevisions": {
      "*": "Because \"arvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
    }
  },
  "query": {
    "allrevisions": [],
    "userinfo": {
      "id": 45359,
      "name": "Benjamin Mako Hill"
    }
  }
}
```
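For what it's worth, the truncation seems detectable directly from the `warnings` block rather than having to be inferred from the empty revision list. A minimal sketch (the function name is mine, not anything in `dumpgenerator.py`):

```python
# Sketch: detect MediaWiki's result-truncation warning in a parsed API
# response. Checks the warnings.result.* text that appears in the
# response shown above.
def is_truncated(api_json: dict) -> bool:
    """Return True if MediaWiki reported that it truncated the result."""
    result_warning = api_json.get("warnings", {}).get("result", {}).get("*", "")
    return "truncated" in result_warning

response = {
    "warnings": {"result": {"*": (
        "This result was truncated because it would otherwise be "
        "larger than the limit of 8,388,608 bytes.")}},
    "query": {"allrevisions": []},
}
print(is_truncated(response))  # True
```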
Because the API has not returned any revisions, the value of the `arvcontinue` token included in the data returned by the API does not change. As a result, `dumpgenerator.py` assumes that everything is well, stores all the data returned (i.e., nothing), and makes the same request again. This repeats over and over until a user intervenes.
How to fix this
At a minimum, I think dumpgenerator should notice that we're seeing the same continuation value across repeated subsequent "successful" (HTTP 200) requests and then error out. Maybe we want to add the additional stipulation that the returned revision list is empty?
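That check could be sketched roughly as below. This is not dumpgenerator's actual loop; `fetch_revisions` is a hypothetical stand-in for the real API call, and the threshold is arbitrary:

```python
# Sketch: error out when the same arvcontinue token comes back repeatedly
# with an empty revision list, instead of looping forever.

class StalledContinuationError(Exception):
    pass

def iterate_revisions(fetch_revisions, max_repeats=3):
    """Yield revisions, following arvcontinue until the API stops continuing."""
    arvcontinue = None
    repeats = 0
    while True:
        data = fetch_revisions(arvcontinue)
        revisions = data["query"]["allrevisions"]
        new_continue = data.get("continue", {}).get("arvcontinue")
        # Same continuation + no revisions means we are stuck, not done.
        if new_continue == arvcontinue and not revisions:
            repeats += 1
            if repeats >= max_repeats:
                raise StalledContinuationError(
                    f"arvcontinue {arvcontinue!r} repeated {repeats} times "
                    "with no revisions; response was probably truncated")
        else:
            repeats = 0
        yield from revisions
        if new_continue is None:
            return
        arvcontinue = new_continue
```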
A bolder approach would involve munging the continuation, e.g., adding one to the revision ID, or something similar. I can imagine why we might not want to support this in the tool, though: it appears to work in the specific case above but might not work in general.
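For illustration only, the munging could look like the sketch below. It assumes the `timestamp|revid` token shape seen in the example response, which is an implementation detail and not guaranteed stable; note that bumping the revision ID silently skips the stuck revision:

```python
# Sketch: skip past a stuck revision by incrementing the revision-id half
# of an arvcontinue token. Assumes the "timestamp|revid" shape.
def bump_continuation(arvcontinue: str) -> str:
    """Return a continuation token pointing one revision id past the input."""
    timestamp, revid = arvcontinue.rsplit("|", 1)
    return f"{timestamp}|{int(revid) + 1}"

print(bump_continuation("20210219051441|2674861"))  # 20210219051441|2674862
```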
I manually worked around it by removing "content" from the `arvprop` parameter (for just this single request), handcrafting XML for that single `<page>`, concatenating it to the results, and then restarting. Doing something like this automatically is definitely possible, but I'm not sure it's either worth it or a good idea.
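Automating that workaround might look roughly like this sketch, where `api_request` is a hypothetical stand-in for the tool's real HTTP call and the truncation check is simplified to the warning key seen in the response above:

```python
# Sketch: if a request comes back truncated with no revisions, retry it once
# with "content" dropped from arvprop so at least the metadata is saved.
ARVPROP = "ids|timestamp|user|userid|size|sha1|contentmodel|comment|content|flags"

def fetch_with_fallback(api_request, params: dict) -> dict:
    data = api_request(params)
    # MediaWiki puts the truncation notice under warnings["result"].
    truncated = "result" in data.get("warnings", {})
    if truncated and not data["query"]["allrevisions"]:
        fallback = dict(params)
        fallback["arvprop"] = "|".join(
            p for p in params["arvprop"].split("|") if p != "content")
        data = api_request(fallback)  # metadata only, no page text
    return data
```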
I'm happy to help code something up to fix this, but I'm honestly not sure what the best way to approach it would be.
yzqzss changed the title to "truncated API response for "allrevisions" causes infinite loop" and added a commit to saveweb/wikiteam3 that referenced this issue (Aug 9, 2023).