[CRITICAL] Not receiving realtime updates to desktop/web session #52437

Open
1 of 8 tasks
m-natarajan opened this issue Nov 12, 2024 · 63 comments
Assignees
Labels
AutoAssignerNewDotQuality Used to assign quality issues to engineers Bug Something is broken. Auto assigns a BugZero manager. Internal Requires API changes or must be handled by Expensify staff Reviewing Has a PR in review Weekly KSv2

Comments

m-natarajan commented Nov 12, 2024

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Version Number:
Reproducible in staging?: Needs Reproduction
Reproducible in production?: Needs Reproduction
If this was caught on HybridApp, is this reproducible on New Expensify Standalone?:
If this was caught during regression testing, add the test name, ID and link from TestRail:
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL:
Issue reported by: @quinthar
Slack conversation (hyperlinked to channel name): ts_external_expensify_quality

Action Performed:

  1. Login to staging.new.expensify.com as user A
  2. As user B send messages to user A

Expected Result:

User A receives message in real time

Actual Result:

User A sees the typing indicator but does not receive realtime updates in the desktop/web session, even though push notifications for the same messages arrive on mobile.

Workaround:

Can the user still use Expensify without this being fixed? Have you informed them of the workaround?

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • Android: Standalone
  • Android: HybridApp
  • Android: mWeb Chrome
  • iOS: Standalone
  • iOS: HybridApp
  • iOS: mWeb Safari
  • MacOS: Chrome / Safari
  • MacOS: Desktop

Screenshots/Videos

Add any screenshot/video evidence

image (18)

image (19)

image (20)

image (21)

Screen.Recording.2024-11-10.at.8.37.50.PM.mov


Current Issue Owner: @tgolen
@m-natarajan m-natarajan added Daily KSv2 Needs Reproduction Reproducible steps needed Bug Something is broken. Auto assigns a BugZero manager. AutoAssignerNewDotQuality Used to assign quality issues to engineers labels Nov 12, 2024

melvin-bot bot commented Nov 12, 2024

Triggered auto assignment to @deetergp (AutoAssignerNewDotQuality)


melvin-bot bot commented Nov 12, 2024

Triggered auto assignment to @trjExpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.

@melvin-bot melvin-bot bot added Weekly KSv2 and removed Weekly KSv2 labels Nov 12, 2024
@MelvinBot

This has been labelled "Needs Reproduction". Follow the steps here: https://stackoverflowteams.com/c/expensify/questions/16989

@trjExpensify
Contributor

@deetergp I'm assuming notification issues like this need to remain internal, but let me know if you don't think so and we can ask a C+ to get involved as a next step to try and reproduce.

I can't seem to repro this myself. The question from the thread is: "Why isn't the ping/pong detecting and fixing this?"
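
To make the question concrete, here is a minimal sketch of the kind of liveness check a ping/pong mechanism relies on: track when anything last arrived on the socket and flag the connection as stale once a timeout passes. `ConnectionWatchdog` and its methods are hypothetical names for illustration; this is not how pusher-js or the Expensify app implements it.

```typescript
// Hypothetical watchdog: not taken from pusher-js or the Expensify codebase.
class ConnectionWatchdog {
  private lastActivityMs: number;

  constructor(
    private readonly timeoutMs: number,
    // Injectable clock so the logic can be exercised without real timers.
    private readonly now: () => number = Date.now,
  ) {
    this.lastActivityMs = this.now();
  }

  // Call whenever any message (including a pong) arrives on the socket.
  recordActivity(): void {
    this.lastActivityMs = this.now();
  }

  // True when nothing has arrived for longer than timeoutMs.
  isStale(): boolean {
    return this.now() - this.lastActivityMs > this.timeoutMs;
  }
}
```

A caller would poll `isStale()` on an interval and force a disconnect and reconnect when it returns true. If a session shows a typing indicator but no messages, a check like this would not fire, because the socket is still receiving traffic.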

@melvin-bot melvin-bot bot added the Overdue label Nov 15, 2024
@quinthar
Contributor

This happened again; I can't figure out how to reproduce reliably though.


melvin-bot bot commented Nov 18, 2024

@deetergp, @trjExpensify Huh... This is 4 days overdue. Who can take care of this?

@trjExpensify trjExpensify moved this to CRITICAL in [#whatsnext] #quality Nov 18, 2024
@trjExpensify
Contributor

@deetergp thoughts on the above, will you be able to look at this today?

CC: @muttmuure I think this one is in the CRITICAL category for #quality, so I've moved it there.

@muttmuure
Contributor

muttmuure commented Nov 18, 2024 via email

@deetergp
Contributor

@trjExpensify I've spent a bit of time with this today and I also cannot seem to reproduce it. I've been having a protracted conversation between the ExpensiScotts (-fy.com & -fail.com) in splitscreen browser windows and they both come through fine. I'm looking at DM chat between DB & Kadie to see if there's anything "off" about what's in Auth and in the logs.

@melvin-bot melvin-bot bot removed the Overdue label Nov 18, 2024
@trjExpensify
Contributor

Gotcha. I'm sure DB would be happy to live debug or something, if you want to take it to the thread: https://expensify.slack.com/archives/C05LX9D6E07/p1731449676200089?thread_ts=1731299637.345689&cid=C05LX9D6E07

@melvin-bot melvin-bot bot added the Overdue label Nov 21, 2024

melvin-bot bot commented Nov 22, 2024

@deetergp, @trjExpensify Whoops! This issue is 2 days overdue. Let's get this updated quick!

@deetergp
Contributor

Spent a bit of time looking into this today and it is interesting. A log search for blob:"PusherError" returns tens of thousands of results for just the last 24 hours. They all have the 1006 error code, which Pusher's documentation describes as follows:

When a WebSocket connection is closed without a "close frame", the pusher-js library emits an error with code 1006. Usually this is caused by WebSocket-incompatible proxies, which can't close the connection in the correct way.

Looking specifically into @quinthar's logs, I see an interesting 1006 log line that pops up: Software caused connection abort. Between my own searching and ChatGPT, it sounds like poor network connectivity can be a culprit, as can "Version or Library Mismatch". I found some GH issues from 2021 that talk about needing to be on the latest (for the time) version of 9.x. Looking in our package.json file, it looks like we are on v 8.3.0. Maybe we need to update the version of the pusher client we are using?

I'm not sure how involved updating to a newer version might be, maybe @mountiny or @AndrewGable might have some insight?
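
As a rough illustration of where the 1006 code sits relative to Pusher's own close codes, here is a minimal TypeScript sketch. The names `classifyCloseCode`, `tallyCloseCodes`, and `ReconnectAdvice` are hypothetical and are not part of pusher-js or the Expensify codebase. The 4000-4299 ranges reflect my reading of the Pusher Channels protocol documentation, so treat them as an assumption to verify.

```typescript
// Hypothetical helper: names and categories are illustrative only.
type ReconnectAdvice =
  | "abnormal-closure"       // 1006: socket dropped without a close frame
  | "do-not-reconnect"       // 4000-4099 (assumed from Pusher protocol docs)
  | "reconnect-with-backoff" // 4100-4199 (assumed from Pusher protocol docs)
  | "reconnect-immediately"  // 4200-4299 (assumed from Pusher protocol docs)
  | "other";

function classifyCloseCode(code: number): ReconnectAdvice {
  if (code === 1006) return "abnormal-closure";
  if (code >= 4000 && code <= 4099) return "do-not-reconnect";
  if (code >= 4100 && code <= 4199) return "reconnect-with-backoff";
  if (code >= 4200 && code <= 4299) return "reconnect-immediately";
  return "other";
}

// Tally a batch of codes (for example, extracted from a log search)
// to see which category dominates.
function tallyCloseCodes(codes: number[]): Map<ReconnectAdvice, number> {
  const counts = new Map<ReconnectAdvice, number>();
  for (const code of codes) {
    const advice = classifyCloseCode(code);
    counts.set(advice, (counts.get(advice) ?? 0) + 1);
  }
  return counts;
}
```

Because 1006 is generated locally when the socket closes without a close frame, it says nothing about why the connection dropped, which fits how hard this has been to narrow down.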

@melvin-bot melvin-bot bot removed the Overdue label Nov 25, 2024

melvin-bot bot commented Nov 26, 2024

@deetergp @trjExpensify this issue was created 2 weeks ago. Are we close to a solution? Let's make sure we're treating this as a top priority. Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

@mountiny
Contributor

@deetergp I don't know the specifics of what updating Pusher would involve, but here is a PR from the last time we did it, and it seems to have gone fine without any specific testing. So I would check whether there are any breaking changes that should worry us and then try to update it. However, we are already on the latest officially stable version, 8.3.0 (https://www.npmjs.com/package/pusher-js?activeTab=versions); the next version, 8.4.0, is still a release candidate.

@deetergp
Contributor

Hmm… Maybe I'm confusing versions of other things. @quinthar Does this happen when you're using a poor connectivity setting in Dev Tools? Just trying to narrow down possible causes…

@melvin-bot melvin-bot bot added the Overdue label Nov 29, 2024
@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Weekly KSv2 labels Jan 28, 2025
@melvin-bot melvin-bot bot changed the title Not receiving realtime updates to desktop/web session [HOLD for payment 2025-02-04] Not receiving realtime updates to desktop/web session Jan 28, 2025
@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label Jan 28, 2025

@tgolen tgolen added Internal Requires API changes or must be handled by Expensify staff and removed Needs Reproduction Reproducible steps needed Awaiting Payment Auto-added when associated PR is deployed to production labels Jan 28, 2025
@tgolen tgolen changed the title [HOLD for payment 2025-02-04] Not receiving realtime updates to desktop/web session Not receiving realtime updates to desktop/web session Jan 28, 2025
@tgolen
Contributor

tgolen commented Jan 28, 2025

I'm removing the payment stuff from this one since it's an internal issue.

@tgolen tgolen added Reviewing Has a PR in review and removed Reviewing Has a PR in review labels Jan 28, 2025
@tgolen
Contributor

tgolen commented Jan 28, 2025

Daily Update

  • I still have not received any response from Pusher
  • I have replied to their email asking for an update, and they acknowledged my request (it really is tiresome)

Next Steps

  • Same

ETA

  • TBD

@tgolen
Contributor

tgolen commented Jan 29, 2025

Daily Update

  • I got a reply from Pusher support

Hi Tim,

Thanks for your patience. We have an update from our engineers, here's their response:

"""
Unfortunately, we are not able to provide more information. That being said, it appears that these clients who're not able to reconnect is because of some ssl issue. Could you please describe these clients? Maybe the issue is that there are some old versions of OS which has outdated ssl certificates for instance?

By the way, did you tried to implement the reconnect behavior from my previous message?
"""

They clearly didn't read my previous email so I am referring them to what I already told them.

Next Steps

  • Try to get more clarification from them if they can understand the situation

ETA

  • TBD

@tgolen
Contributor

tgolen commented Jan 31, 2025

Weekly Update

  • I'm getting pretty tired of trying to interface with their support, and it's frustrating, but I'm forging ahead
  • I still think that they should have some kind of knowledge about these errors if they try hard enough, so I want to keep pushing them

Next Steps

  • Continue the back-and-forth to try and get more information from them

ETA

  • TBD

@tgolen
Contributor

tgolen commented Jan 31, 2025

I have also pinged our shared slack channel with pusher to try and get some better support on this issue.

@tgolen
Contributor

tgolen commented Feb 3, 2025

Update

  • I got a response from Pusher asking to clear the browser cache and try again 😦 😬

Next Steps

  • Even after escalating the bad support through our channels, I feel that it's maybe time to give up trying to get any real support from them :(
  • I need to think about what next steps will be with this.
  • We could look into alternatives such as:
    • rolling our own
    • going with a pusher competitor as a backup
    • something else?

ETA

  • TBD

@tgolen
Contributor

tgolen commented Feb 6, 2025

Update

  • I was able to get on a call with two of the Pusher engineers this morning and get some ideas of things we could try to fix the reconnect problems we are seeing.

Next Steps

  • Look into this React-Native Pusher Client and try to implement it at least in Android to see if it improves the behavior. I was not aware they had a RN version, so this was good to find out.

ETA

  • Try to have a PR for this sometime next week.

@tgolen
Contributor

tgolen commented Feb 6, 2025

Some other suggestions they had:

  • Log if the client is backgrounded
  • Log if the client is in low-power mode
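
One way to act on these two suggestions is to attach both flags to every connection-error log entry. The sketch below is only an assumption about how that could look: `ConnectionErrorContext` and `buildConnectionErrorContext` are hypothetical names, the field names are not Expensify's log schema, and the app-state strings are modeled on the values React Native's `AppState` reports. How the low-power flag is obtained is platform-specific and left to the caller.

```typescript
// Hypothetical shapes: field names are illustrative, not Expensify's log schema.
type AppLifecycleState = "active" | "background" | "inactive";

interface ConnectionErrorContext {
  code: number | null;
  isBackgrounded: boolean;
  isLowPowerMode: boolean | null; // null when the platform cannot report it
  timestamp: string;
}

function buildConnectionErrorContext(
  code: number | null,
  appState: AppLifecycleState,
  isLowPowerMode: boolean | null,
  now: Date = new Date(),
): ConnectionErrorContext {
  return {
    code,
    // Treat anything other than "active" as backgrounded for logging purposes.
    isBackgrounded: appState !== "active",
    isLowPowerMode,
    timestamp: now.toISOString(),
  };
}
```

With these fields in the logs, the 1006 errors could be split by foreground versus background and by power mode to see whether either correlates with the failed reconnects.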

@tgolen
Contributor

tgolen commented Feb 14, 2025

Weekly Update

  • There has been steady progress on the E/App PR to implement the RN client

Next Steps

  • Wait for that PR to be merged and deployed

ETA

  • Friday, Feb. 21

@tgolen
Contributor

tgolen commented Feb 21, 2025

Weekly Update

  • The E/App PR to implement the RN client has been merged

Next Steps

  • Once the PR goes to production, @tgolen will monitor logs to see if there are improvements with the connection errors
  • I'll report back here in a week!

ETA

  • Friday, Feb. 27

@mallenexpensify mallenexpensify changed the title Not receiving realtime updates to desktop/web session [CRITICAL] Not receiving realtime updates to desktop/web session Feb 27, 2025
Projects
Status: CRITICAL
Development

No branches or pull requests

9 participants