Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(seer grouping): Save grouphash metadata before adding seer results #85804

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Feb 24, 2025

Once we turned on the grouphash metadata backfill for existing grouphashes, a weird thing started sometimes happening with new grouphashes: when we went to update their metadata with Seer results, we got a Cannot update an instance that has not yet been created error. The breadcrumbs show us querying for the metadata record (the instance in question), and presumably we find it, since we then test whether the result is truthy and only proceed if it is. Indeed, in the stack locals of the error, we can see that there's a metadata object there.

One odd thing we noticed is that the metadata object retrieved has an id of None, as if it only exists in memory and hasn't yet been saved to the database. That shouldn't be possible, since it's created with objects.create, which should immediately save the record, but perhaps creation is buffered and we're running into a race condition? To test that hypothesis, this PR forces the object to be saved in cases where the id is None. It also bails before the problematic update call if that save doesn't fix the problem, so ingestion can proceed without errors. Finally, it adds logging so we can see both how often it happens and whether the fix works.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 24, 2025
@lobsterkatie lobsterkatie force-pushed the kmclb-save-grouphash-metadata-before-adding-seer-results branch from 607c1ad to 2583a60 Compare February 24, 2025 22:26
@lobsterkatie lobsterkatie force-pushed the kmclb-save-grouphash-metadata-before-adding-seer-results branch from 2583a60 to 64c7668 Compare February 24, 2025 22:41
Copy link

codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/grouping/ingest/seer.py 25.00% 6 Missing ⚠️

❌ Your patch check has failed because the patch coverage (25.00%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #85804       +/-   ##
===========================================
+ Coverage   46.67%   87.88%   +41.21%     
===========================================
  Files        9669     9683       +14     
  Lines      548150   548819      +669     
  Branches    21318    21318               
===========================================
+ Hits       255824   482325   +226501     
+ Misses     292019    66187   -225832     
  Partials      307      307               

@lobsterkatie lobsterkatie marked this pull request as ready for review February 24, 2025 23:22
@lobsterkatie lobsterkatie requested a review from a team as a code owner February 24, 2025 23:22
Copy link
Member

@JoshFerge JoshFerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pasting slack comment for posterity:

so i'm thinking that the grouphash that we're passing into the function has a stale value of grouphashmetadata in cases where we're creating the grouphash for the same event. this is because when we create grouphash metadata, this doesn't automatically update the instantiated grouphash model that we have.
my thought is that we can call
grouphash.refresh_from_db() at the top of the create_or_update_grouphash_metadata_if_needed function, and that should fix this.
alternatively, i think we could also just say grouphash.metadata = metadata in

GroupHashMetadata.objects.create(**new_data)

@lobsterkatie
Copy link
Member Author

pasting slack comment for posterity:

so i'm thinking that the grouphash that we're passing into the function has a stale value of grouphashmetadata in cases where we're creating the grouphash for the same event. this is because when we create grouphash metadata, this doesn't automatically update the instantiated grouphash model that we have. my thought is that we can call grouphash.refresh_from_db() at the top of the create_or_update_grouphash_metadata_if_needed function, and that should fix this. alternatively, i think we could also just say grouphash.metadata = metadata in

GroupHashMetadata.objects.create(**new_data)

Agreed through IRL convo that this isn't the issue (otherwise it'd never work, rather than working in 99%+ of cases). Will update here once we figure out what is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants