Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutually_exclusive_ranges gives non-deterministic results in certain situations where zero-length ranges are allowed #659

Closed
1 of 5 tasks
sfc-gh-ancoleman opened this issue Aug 30, 2022 · 1 comment · Fixed by #660
Labels
bug Something isn't working

Comments

@sfc-gh-ancoleman
Copy link
Contributor

Describe the bug

The window functions used in the test only order by the lower_bound_column. If zero_length_ranges are allowed, it is possible for this to result in failure when it shouldn't.

Steps to reproduce

RECORD    STATUS    LOWER_BOUND            UPPER_BOUND  
1         good      2021-12-01 00:00:00    2022-01-01 00:00:00
2         ok        2022-01-01 00:00:00    2022-01-01 00:00:00
3         bad       2022-01-01 00:00:00    2022-02-01 00:00:00

Expected results

This data should pass a mutually_exclusive_ranges test with configuration

lower_bound_column: lower_bound
upper_bound_column: upper_bound
zero_length_range_allowed: true

and fail mutually_exclusive_ranges test with configuration

lower_bound_column: lower_bound
upper_bound_column: upper_bound
zero_length_range_allowed: false

Actual results

Both of the following are recordsets ordered by LOWER_BOUND. The database could produce either of the following orderings of the records given above.

Ordering A:

RECORD    STATUS    LOWER_BOUND            UPPER_BOUND  
1         good      2021-12-01 00:00:00    2022-01-01 00:00:00
2         ok        2022-01-01 00:00:00    2022-01-01 00:00:00
3         bad       2022-01-01 00:00:00    2022-02-01 00:00:00

Ordering B:

RECORD    STATUS    LOWER_BOUND            UPPER_BOUND  
1         good      2021-12-01 00:00:00    2022-01-01 00:00:00
3         bad       2022-01-01 00:00:00    2022-02-01 00:00:00
2         ok        2022-01-01 00:00:00    2022-01-01 00:00:00

If the first ordering is returned, the dataset will pass the test. If the 2nd ordering is returned, the dataset will fail the test. Which ordering is returned appears to be non-deterministic, at least in Snowflake.

Also, at least in Snowflake, there is no guarantee that the same ordering will be used in the different window functions. I.e., it's possible for next_lower_bound to use Ordering B, but is_last_record to use the reverse of Ordering A.

Screenshots and log output

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.4

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 1.0.0
   latest version: 1.0.0

Up to date!

Plugins:
  - snowflake: 1.0.0

Additional context

Are you interested in contributing the fix?

Yes, I am preparing a PR to include the upper_bound_column as a second field in the order by, which would still not quite provide a deterministic ordering, but should provide orderings that are identical in every way that matters for the test.

@joellabes
Copy link
Contributor

This makes a ton of sense, thank you for the well-fleshed out backstory and test case! Off to have a look at the PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants