feat: create testruns model in timeseries app #508
base: main
Conversation
Force-pushed from e31d268 to 8d1d0a2.
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main     #508      +/-   ##
==========================================
- Coverage   90.46%   89.98%    -0.49%
==========================================
  Files         463      324      -139
  Lines       13264     9044     -4220
  Branches     2116     1599      -517
==========================================
- Hits        11999     8138     -3861
+ Misses       1140      845      -295
+ Partials      125       61       -64
```

Flags with carried forward coverage won't be shown.
0d177e8
to
2fdefc4
Compare
just one q - do you want to do `RiskyRunSQL`? I'm not sure if we do that for timeseries things.
lgtm.

it would be nice to give the migrations a readable name, as well as rename the aggregate field to `failing_commits` to make it more obvious.

otherwise, I would maybe add another index on `(repo_id, branch)` for the main testruns table.

The experiments I did last week using ClickHouse made it clear that doing pre-aggregation on the high-cardinality `branch` column (lots of feature branches, but very few runs for each) does not make a ton of sense, and it's possible to use the raw data table to query per-test, or to aggregate per feature branch.

The pre-aggregation / materialized view does make sense for the main branch and "across all branches" though.
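The trade-off described above can be illustrated with a small pure-Python sketch (row shape, branch names, and values are all hypothetical, not the PR's schema): pre-aggregating per branch creates one bucket per branch, which is wasteful when most feature branches have only a handful of rows, while querying the raw rows for a single branch stays cheap.

```python
from collections import defaultdict

# Toy testrun rows: (branch, commit_sha, outcome). Illustrative only.
runs = [
    ("main", "c1", "failure"),
    ("main", "c2", "pass"),
    ("main", "c1", "flaky_fail"),
    ("feat/x", "c3", "pass"),   # feature branches: very few runs each
    ("feat/y", "c4", "failure"),
]

def failing_commits(rows):
    """Distinct commits with at least one failing or flaky-failing run."""
    return {sha for _, sha, outcome in rows if outcome in ("failure", "flaky_fail")}

# Pre-aggregating per branch: one bucket per branch, even for tiny branches.
per_branch = defaultdict(list)
for row in runs:
    per_branch[row[0]].append(row)

print(len(per_branch))                      # 3 buckets for only 5 rows
print(failing_commits(per_branch["main"]))  # {'c1'}
```

The aggregate only pays for itself on `main` (and "across all branches"), where many rows collapse into one bucket.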
```sql
COUNT(DISTINCT CASE WHEN outcome = 'failure' OR outcome = 'flaky_fail' THEN commit_sha ELSE NULL END) AS cwf,
time_bucket(interval '1 days', timestamp) as timestamp_bin,
```
Suggested change:

```sql
time_bucket(interval '1 days', timestamp) as timestamp_bin,
COUNT(DISTINCT CASE WHEN outcome = 'failure' OR outcome = 'flaky_fail' THEN commit_sha ELSE NULL END) AS failing_commits,
```
please spell out `failing_commits` instead of the cryptic `cwf`.

also IMO it's a bit more readable to visually group / separate the "group by" columns and the aggregates.
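The aggregate under discussion counts, per day bucket, the distinct commits that had at least one failing or flaky-failing run. A minimal pure-Python sketch of that semantics (the row shape and values are hypothetical, not the PR's schema; `time_bucket` here mimics Timescale's day-bucket flooring):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy rows: (timestamp, commit_sha, outcome). Illustrative only.
rows = [
    (datetime(2025, 2, 6, 9, 0), "c1", "failure"),
    (datetime(2025, 2, 6, 17, 0), "c1", "flaky_fail"),  # same commit, same day
    (datetime(2025, 2, 6, 18, 0), "c2", "pass"),
    (datetime(2025, 2, 7, 8, 0), "c3", "failure"),
]

def time_bucket(interval: timedelta, ts: datetime) -> datetime:
    """Floor ts to the start of its interval, like Timescale's time_bucket."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // interval) * interval

# Equivalent of COUNT(DISTINCT CASE WHEN ... THEN commit_sha END):
failing = defaultdict(set)
for ts, sha, outcome in rows:
    if outcome in ("failure", "flaky_fail"):
        failing[time_bucket(timedelta(days=1), ts)].add(sha)

# c1 is counted once on Feb 6 despite two failing runs that day.
print({bucket: len(shas) for bucket, shas in sorted(failing.items())})
```

Counting distinct commits (rather than runs) is what makes the retried `c1` failure land as a single data point in the Feb 6 bucket.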
```python
class Migration(migrations.Migration):
    dependencies = [
        ("timeseries", "0018_auto_20250206_1657"),
```
can you give these migrations a readable name?
also, is it possible to merge some of those together, or do they have to be separate migrations?
```sql
start_offset => '7 days',
end_offset => '1 days',
```
is there some documentation of what these mean? I thought we want to aggregate things for 60 days. what does the `7 days` here mean in relation to that?
there's docs here; we do want to aggregate for 60 days, but we won't be refreshing aggregated rows more than 7 days old, so if we somehow process a seven-day-old upload again, we won't refresh the continuous aggregate with that data.

I think we can tune this policy as we see fit going forward; this is kind of just a placeholder, and I don't think the `start_offset` even needs to be that far back.
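The relationship between the two offsets can be sketched in a few lines of Python (the helper and its timestamps are hypothetical, for illustration only): each scheduled refresh re-materializes raw data in the window `[now - start_offset, now - end_offset)`, so a late-arriving upload older than `start_offset` never triggers a refresh of its bucket.

```python
from datetime import datetime, timedelta

def refresh_window(now: datetime, start_offset: timedelta, end_offset: timedelta):
    """Span of raw data a continuous-aggregate refresh re-materializes:
    [now - start_offset, now - end_offset). Anything older than start_offset
    is never refreshed again, even if a late upload lands in that range."""
    return now - start_offset, now - end_offset

now = datetime(2025, 2, 10, 12, 0)
start, end = refresh_window(now, timedelta(days=7), timedelta(days=1))
print(start, end)  # 7 days back through 1 day back

# An 8-day-old upload falls outside the window, so its bucket stays stale:
late_upload = now - timedelta(days=8)
print(start <= late_upload < end)  # False
```

This matches the comment above: data is kept and aggregated for 60 days, but only the trailing 7-day window is ever re-refreshed.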
```python
        models.Index(
            fields=["repo_id", "test_id", "flags_hash"],
        ),
    ]
    constraints = [
        models.UniqueConstraint(
            fields=["repo_id", "test_id", "flags_hash"],
            name="flags_hash_test_id_unique",
        ),
```
I believe a unique constraint already implicitly creates an index that can be used for queries. which means this would be duplicated.
on the other hand, I don’t see any indices being defined on the materialized views. Do we not need those?
Timescale automatically creates an index on the group by condition of continuous aggregates: https://docs.timescale.com/use-timescale/latest/continuous-aggregates/create-index/#automatically-created-indexes
i'm going to remove the unique constraint for now
```python
test_id = models.BinaryField(null=False)
flags_hash = models.BinaryField(null=True)
```
is the `flags_hash` not part of the `test_id`?

we also store all the flags inline as the `flags` array (which IMO is a better idea than having a join table).

do we need the `flags_hash` at all in that case then?
> is the `flags_hash` not part of the `test_id`?

not anymore, i guess we can remove it since we're storing the flags in the table and we aren't using it in any of the CAggs

right, i just verified this locally with timescale and i think you're right; it also reflects what you saw with your testing in Alloy for the dedicated tests view.
Force-pushed from 2fdefc4 to 11856f5.
i think it should be fine in this case because, from what i understand, we usually do risky migrations for operations that will lock a table for a long time, like creating an index on a large table like uploads or repos. Since in this case we're creating a new table and then running operations on it while it's empty, I think it should be fine.
Force-pushed from 953bfa0 to 4bef0cc.
```python
migrations.AddIndex(
    model_name="measurement",
```
this seems unrelated? might be good to move to a different PR/migration.
Force-pushed from 4bef0cc to aadbae1.
Creates the Testrun and TestrunSummary models for TA. We start by creating the regular table and then make it a hypertable; then we create the continuous aggregates, set the cagg policies, and finally create the Testrun and TestrunBranchSummary models.
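The migration sequence described above maps onto Timescale DDL roughly as follows. This is a hedged sketch, not the PR's actual migration SQL: the table, view, and column names and the `schedule_interval` are illustrative assumptions; only the `time_bucket` expression, the aggregate, and the `7 days` / `1 days` offsets come from the discussion above.

```sql
-- 1. regular table, then hypertable (names are hypothetical)
CREATE TABLE testrun (
    timestamp  TIMESTAMPTZ NOT NULL,
    repo_id    BIGINT      NOT NULL,
    branch     TEXT,
    commit_sha TEXT,
    outcome    TEXT        NOT NULL
);
SELECT create_hypertable('testrun', 'timestamp');

-- 2. continuous aggregate over day buckets
CREATE MATERIALIZED VIEW testrun_branch_summary
WITH (timescaledb.continuous) AS
SELECT
    repo_id,
    branch,
    time_bucket(interval '1 days', timestamp) AS timestamp_bin,
    COUNT(DISTINCT CASE WHEN outcome IN ('failure', 'flaky_fail')
                        THEN commit_sha END) AS failing_commits
FROM testrun
GROUP BY repo_id, branch, timestamp_bin;

-- 3. refresh policy with the offsets discussed above
SELECT add_continuous_aggregate_policy('testrun_branch_summary',
    start_offset      => INTERVAL '7 days',
    end_offset        => INTERVAL '1 days',
    schedule_interval => INTERVAL '1 hour');  -- schedule is an assumption
```

Ordering matters: the table must be a hypertable before the continuous aggregate can be created, and the aggregate must exist before its refresh policy can be attached, which is why the migrations run in that sequence.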