[BUG]: Duplicate entries for some feeds after update of Nextcloud News - use "guidHash" as the only message identifier #358

Closed
pprkut opened this issue Feb 5, 2021 · 9 comments
Labels
Component-Plugins-Nextloud Status-Not-Our-Bug Bug is present in some upstream libraries used by RSS Guard. Type-Defect This is BUG!!!

Comments

@pprkut

pprkut commented Feb 5, 2021

Brief description of the issue.

I updated Nextcloud News from 15.1.1 to 15.2.2, and now I'm seeing duplicate items in some feeds in rssguard (but not in the web interface of Nextcloud News).

It doesn't happen for all feeds, and I'm having difficulty figuring out the exact behavior. Some items are in there once, some twice and some three times by now (same feed). Other feeds show no such issue.

How to reproduce the bug?

The feed where it happens the worst for me is http://www.heise.de/newsticker/heise-atom.xml, but there's a chance it only affects items before the update of Nextcloud News.

Is there a way I can debug what's going on here myself?

Other information

  • OS: Linux
  • Desktop Environment: KDE Plasma
  • RSS Guard version: 3.8.4
@pprkut pprkut added the Type-Defect This is BUG!!! label Feb 5, 2021
@martinrotter
Owner

martinrotter commented Feb 5, 2021

Does the number of copies keep growing (duplicate -> triplicate -> ...) if you hit the "download messages" button for the feed several times, or does it stay the same?

The thing is, Nextcloud News made some changes to how they handle "message ID" numbers. It can happen that after installing the new News plugin version, you will see some messages TWICE (the old message with the "old" id and the new message with the "new" id). You should never see the same message "three" times.

Also, can you please test the behavior when you start RSS Guard with a clean "data" folder/profile? Do the messages still duplicate themselves over several feed updates?

@martinrotter
Owner

Also, I have now tested with the latest Nextcloud and the latest News (15.2.2):

[image]

and everything seems to work with your feed. I get no duplicates at all with a clean profile, so cleaning the profile should help. The News developers are to blame for this, not RSS Guard.

@martinrotter
Owner

Anyway, test my suggestions above and let me know whether it all worked.

@pprkut
Author

pprkut commented Feb 5, 2021

Thanks for checking! I'll keep an eye on this and see if it keeps happening. If it's just a one-off, I don't mind as much. Was just getting triggered when I saw items appearing for a third time.

I tried manually triggering a download, and that doesn't add duplicates, so it doesn't look like items are duplicated on every request.

FWIW, I checked the data in sqlite, and this is what I see:

sqlite> SELECT * FROM Messages WHERE title LIKE "Amazon plant Doppel%";
id    is_read  is_deleted  is_important  feed  title                                                                   url                                                                                                                                                      author  date_created   contents                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               is_pdeleted  enclosures  account_id  custom_id  custom_hash
----  -------  ----------  ------------  ----  ----------------------------------------------------------------------  -------------------------------------------------------------------------------------------------------------------------------------------------------  ------  -------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  -----------  ----------  ----------  ---------  --------------------------------
4939  1        0           0             13    Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington  https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag          1612342860000  <p><a target="_blank" rel="noreferrer" target="_blank" rel="noreferrer" href="https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/0/4/9/9/6/0/aerial-reduced-14824e088dffbaba.jpeg" alt="" /></a></p><p>Amazon-Mitarbeiter sollen in einem architektonisch ungewöhnlichen Gebäude in Arlington arbeiten. Im Zentrum soll der Mensch stehen.</p>  0                        1           5127       e288eae7e1b522a087c6dd45a028c96e
5287  1        0           0             13    Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington  https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag          1612342860000  <p><a target="_blank" rel="noreferrer" target="_blank" rel="noreferrer" href="https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/0/4/9/9/6/0/aerial-reduced-14824e088dffbaba.jpeg" alt="" /></a></p><p>Amazon-Mitarbeiter sollen in einem architektonisch ungewöhnlichen Gebäude in Arlington arbeiten. Im Zentrum soll der Mensch stehen.</p>  0                        1           5425       e288eae7e1b522a087c6dd45a028c96e
5449  1        0           0             13    Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington  https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag          1612342860000  <p><a target="_blank" rel="noreferrer" target="_blank" rel="noreferrer" href="https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/0/4/9/9/6/0/aerial-reduced-14824e088dffbaba.jpeg" alt="" /></a></p><p>Amazon-Mitarbeiter sollen in einem architektonisch ungewöhnlichen Gebäude in Arlington arbeiten. Im Zentrum soll der Mensch stehen.</p>  0                        1           5646       e288eae7e1b522a087c6dd45a028c96e
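The same check can be run across the whole table instead of one title at a time. Below is a minimal sketch, assuming only the column names visible in the output above (`feed`, `account_id`, `custom_id`, `custom_hash`); the in-memory table, the extra "unrelated" row and the helper name `find_duplicates` are made up for illustration and are not part of RSS Guard.

```python
import sqlite3

# Hypothetical in-memory copy of a few columns of the Messages table.
# Column names come from the query output above; the table itself is a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages ("
    "id INTEGER PRIMARY KEY, feed INTEGER, title TEXT, "
    "account_id INTEGER, custom_id TEXT, custom_hash TEXT)"
)

TITLE = "Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington"
HASH = "e288eae7e1b522a087c6dd45a028c96e"
rows = [
    (4939, 13, TITLE, 1, "5127", HASH),
    (5287, 13, TITLE, 1, "5425", HASH),
    (5449, 13, TITLE, 1, "5646", HASH),
    # Invented row that is stored only once, to show it is not reported.
    (6000, 13, "Some unrelated item", 1, "6001", "0" * 32),
]
conn.executemany("INSERT INTO Messages VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_duplicates(connection):
    """Return (feed, custom_hash, copies, custom_ids) for hashes stored more than once."""
    query = (
        "SELECT feed, custom_hash, COUNT(*) AS copies, GROUP_CONCAT(custom_id) "
        "FROM Messages "
        "GROUP BY account_id, feed, custom_hash "
        "HAVING COUNT(*) > 1 "
        "ORDER BY copies DESC"
    )
    return connection.execute(query).fetchall()


duplicates = find_duplicates(conn)
for feed, custom_hash, copies, custom_ids in duplicates:
    print(feed, custom_hash, copies, custom_ids)
```

Against a real profile, the `SELECT` statement alone could be pasted into the `sqlite3` shell; any hash listed with more than one `custom_id` is a message that was stored repeatedly.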

@martinrotter
Owner

martinrotter commented Feb 5, 2021

@pprkut Wow, perfect answer. Check out the columns "custom_id" and "custom_hash". These are absolutely crucial and their values are returned by the News server. They should UNIQUELY identify THE message.

In other words, if News suddenly returns a different "custom_id", RSS Guard automatically considers the message to be "different". These "custom_id"s are the key source of "uniqueness" when it comes to synchronized services like News, TT-RSS or Inoreader.

In all other attributes, those messages seem to be identical. So the situation here is clear: those "custom_id" values are maintained by the News server, and this is clearly News's fault. Anyway, keep me posted if something interesting appears.

EDIT: Also, note that there is quite a mess in News's API around the message "ID", as they actually have THREE IDs for each message and, honestly, I am quite unsure which one is the real unique ID. Maybe I got that wrong, maybe I did not. I will post a bug ticket upstream.
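To make the effect of the chosen key concrete, here is a small sketch of the idea, not RSS Guard's actual code. `ServerItem` and `store_items` are hypothetical names; the ids and the hash are the ones from the sqlite output earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerItem:
    """Hypothetical shape of one item as reported by the server."""
    id: int
    guid_hash: str
    title: str


def store_items(items, key):
    """Toy client-side store: an item with an already known key overwrites the old one."""
    store = {}
    for item in items:
        store[key(item)] = item
    return store


TITLE = "Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington"
HASH = "e288eae7e1b522a087c6dd45a028c96e"

# The same article as it was reported over time: the id changed, the hash did not.
responses = [
    ServerItem(5127, HASH, TITLE),
    ServerItem(5425, HASH, TITLE),
    ServerItem(5646, HASH, TITLE),
]

by_id = store_items(responses, key=lambda item: item.id)
by_hash = store_items(responses, key=lambda item: item.guid_hash)

print(len(by_id), len(by_hash))  # prints "3 1"
```

Keyed on the id, every change of id produces another stored copy; keyed on the hash, later responses overwrite the earlier one.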

@pprkut
Author

pprkut commented Feb 5, 2021

Thanks! I figured it might be this, but wasn't sure if custom_id really means the remote ID. I'll keep watching :)

@martinrotter
Owner

@pprkut It seems that this situation is a combination of two bugs: one major bug upstream (those multiple messages should not even be there) and my small mistake. The fix on my side is to use "guidHash" as the only and primary "ID" of a message.

You should follow up here and report your situation.

@pprkut
Author

pprkut commented Feb 5, 2021

For reference, this is what I see in Nextcloud's DB:

MariaDB [nextcloud]> SELECT * FROM oc_news_items WHERE title LIKE "Amazon plant Doppel%" \G
*************************** 1. row ***************************
               id: 5646
        guid_hash: e288eae7e1b522a087c6dd45a028c96e
      fingerprint: 3339bb2bce76e3a0341db5bd9054625d
     content_hash: 3339bb2bce76e3a0341db5bd9054625d
              rtl: 0
     search_index: amazon-mitarbeiter sollen in einem architektonisch ungewöhnlichen gebäude in arlington arbeiten. im zentrum soll der mensch stehen.amazon plant doppel-helix-förmiges hauptquartier aus glas in arlingtonhttps://www.heise.de/news/amazon-plant-doppel-helix-foermiges-hauptquartier-aus-glas-in-arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag
             guid: http://heise.de/-5044059
              url: https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag
            title: Amazon plant Doppel-Helix-förmiges Hauptquartier aus Glas in Arlington
           author: NULL
         pub_date: 1612342860
     updated_date: NULL
             body: <p><a target="_blank" rel="noreferrer" href="https://www.heise.de/news/Amazon-plant-Doppel-Helix-foermiges-Hauptquartier-aus-Glas-in-Arlington-5044059.html?wt_mc=rss.red.ho.ho.atom.beitrag.beitrag"><img src="https://www.heise.de/scale/geometry/450/q80//imgs/18/3/0/4/9/9/6/0/aerial-reduced-14824e088dffbaba.jpeg" alt="" /></a></p><p>Amazon-Mitarbeiter sollen in einem architektonisch ungewöhnlichen Gebäude in Arlington arbeiten. Im Zentrum soll der Mensch stehen.</p>
   enclosure_mime: NULL
   enclosure_link: NULL
  media_thumbnail: NULL
media_description: NULL
          feed_id: 13
           status: 0
           unread: 0
          starred: 0
    last_modified: 1612519212704904
1 row in set (0.064 sec)

@martinrotter martinrotter added this to the 3.9.0 milestone Feb 5, 2021
@martinrotter martinrotter removed the Status-Fixed Ticket is resolved. label Feb 5, 2021
@martinrotter martinrotter changed the title [BUG]: Duplicate entries for some feeds after update of Nextcloud News [BUG]: Duplicate entries for some feeds after update of Nextcloud News - use "guidHash" as the only message identifier Feb 5, 2021
@martinrotter martinrotter added Status-Not-Enough-Data Ticket creator must append more precise info to the ticket. Status-Not-Our-Bug Bug is present in some upstream libraries used by RSS Guard. and removed Status-Not-Enough-Data Ticket creator must append more precise info to the ticket. labels Feb 5, 2021
@martinrotter
Owner

@pprkut OK, final resolution from me:

  1. I will not change RSS Guard's internal DB "id" handling, because the root of the issue is that Nextcloud News automatically purges old messages after some time (and they had a bug which purged the wrong messages during the 15.2.2 transition). When those purged messages get included in News again, they are assigned a different "id" while keeping the same "guidHash". The change in "id" confuses RSS Guard, which treats the re-added message as a new one and does not overwrite the existing one.
  2. The behavior is very rare and can be avoided by setting "Maximum read count per feed" to -1 in the Nextcloud News settings.
  3. In the future, this may be properly fixed only if Nextcloud News comes with a sane API that provides a single "id" for each message.
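The purge-and-re-add sequence from point 1 can be imitated with a toy model. This is only a sketch under stated assumptions: `ToyNewsServer` is a made-up class, not Nextcloud News code, and deriving the hash as an MD5 of the guid is an assumption made for illustration, not something confirmed in this thread.

```python
import hashlib
from itertools import count


class ToyNewsServer:
    """Hypothetical stand-in for a server that purges and later re-imports items."""

    def __init__(self, first_id=1):
        self._next_id = count(first_id)
        self.items = {}  # id -> (guid, guid_hash)

    def add(self, guid):
        # Assumption for illustration: the hash depends only on the guid.
        guid_hash = hashlib.md5(guid.encode("utf-8")).hexdigest()
        item_id = next(self._next_id)  # ids are never reused
        self.items[item_id] = (guid, guid_hash)
        return item_id, guid_hash

    def purge(self, item_id):
        del self.items[item_id]


server = ToyNewsServer(first_id=5127)
guid = "http://heise.de/-5044059"

first_id, first_hash = server.add(guid)
server.purge(first_id)  # old item is cleaned up
second_id, second_hash = server.add(guid)  # the feed still lists it, so it comes back

print(first_id != second_id, first_hash == second_hash)  # prints "True True"
```

A client that keys its local copies on the id sees the re-imported item as new, which matches the repeated rows found in the sqlite dump.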

@martinrotter martinrotter removed this from the 3.9.0 milestone Feb 5, 2021