Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: change the evaluation order of columns in Update and Insert statements #57123

Merged
merged 14 commits into from
Dec 18, 2024

Conversation

joechenrh
Copy link
Contributor

@joechenrh joechenrh commented Nov 5, 2024

What problem does this PR solve?

Issue Number: ref #56829

Problem Summary:

In the previous logic, when we use UPDATE or INSERT ON DUPLICATE, the new row will be generated in the following order:

  • Fill all the explicitly set columns.
  • Evaluate all auto-generated columns. For UPDATE and INSERT, they are evaluated in composeGeneratedColumns and doDupRowUpdate respectively.
  • Update on-update-now columns if necessary.

However, auto-generated columns may rely on the on-update-now columns to generate value. For example in #56829 (comment)

CREATE TABLE cache (
  cache_key varchar(512) NOT NULL,
  updated_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  expired_at datetime GENERATED ALWAYS AS (if(expires > 0, date_add(updated_at, interval expires second), date_add(updated_at, interval 99 year))) VIRTUAL,
  expires int(11),
  PRIMARY KEY (cache_key) /*T![clustered_index] CLUSTERED */,
  KEY idx_c_on_expired_at (expired_at)
);

expired_at is generated based on the latest timestamp value from updated_at. So we will get wrong expired_at value. Even worse, expired_at is the part of the index idx_c_on_expired_at. So querying data expired_at using index scan and full table scan will get different result, since in full table scan, expired_at is calculated in real-time.

This also explains #56829 (comment) why changing VIRTUAL to STORED will not yield such error, although this value itself is incorrect.

What changed and how does it work?

To address this problem, this PR refactor the logic of INSERT ON DUPLICATE and UPDATE. More specifically:

  • Move the evalation of auto-generated columns in updateRecord.
  • ForeignKey Check is also moved into updateRecord.
  • Extract errorHandler function for UPDATE and INSERT to handle error/warning in updateRecord.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2024
Copy link

tiprow bot commented Nov 5, 2024

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 80.97166% with 47 lines in your changes missing coverage. Please review.

Project coverage is 73.0050%. Comparing base (22c91d0) to head (12a3f93).
Report is 216 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #57123        +/-   ##
================================================
- Coverage   73.2085%   73.0050%   -0.2036%     
================================================
  Files          1679       1698        +19     
  Lines        462531     507757     +45226     
================================================
+ Hits         338612     370688     +32076     
- Misses       103136     115414     +12278     
- Partials      20783      21655       +872     
Flag Coverage Δ
unit 72.4228% <63.5627%> (+0.0676%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 40.1288% <ø> (-5.8702%) ⬇️

@joechenrh
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Nov 6, 2024

@joechenrh: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -430,8 +443,15 @@ func (e *InsertExec) initEvalBuffer4Dup() {
}

// doDupRowUpdate updates the duplicate row.
func (e *InsertExec) doDupRowUpdate(ctx context.Context, handle kv.Handle, oldRow []types.Datum, newRow []types.Datum,
extraCols []types.Datum, cols []*expression.Assignment, idxInBatch int, dupKeyMode table.DupKeyCheckMode, autoColIdx int) error {
func (e *InsertExec) doDupRowUpdate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only fixes the INSERT .. ON DUPLICATE UPDATE .. case. As I have tested, the UPDATE path also has the same bug:

CREATE TABLE cache (
  cache_key varchar(512) NOT NULL,
  updated_at datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  expired_at datetime GENERATED ALWAYS AS (if(expires > 0, date_add(updated_at, interval expires second), date_add(updated_at, interval 99 year))) VIRTUAL,
  expires int(11),
  PRIMARY KEY (cache_key) /*T![clustered_index] CLUSTERED */,
  KEY idx_c_on_expired_at (expired_at)
);

INSERT INTO cache(cache_key, expires) VALUES ('2001-01-01 11:11:11', 60) ON DUPLICATE KEY UPDATE expires = expires + 1;
update cache set expires = expires + 1 where cache_key = '2001-01-01 11:11:11';

Then the following two queries will have different result:

select /*+ force_index(test.cache, idx_c_on_expired_at) */ cache_key, expired_at from cache order by cache_key;
select /*+ ignore_index(test.cache, idx_c_on_expired_at) */ cache_key, expired_at from cache order by cache_key;

if err != nil {
return err

if _, err := updateRecord(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move the logic of handling generated column into the updateRecord? It seems that the function updateRecord also handles the ON UPDATE columns (and as we specified all ON UPDATE column in this functioin, these logic are meaningless).

Another possible solution is to remove the codes related to ON UPDATE columns in updateRecord function. However, as the same logic will be used multiple times (in INSERT .. ON DUPLICATE UPDATE and a normal UPDATE statement), I prefer to write the codes related to generated column in updateRecord to avoid repeating the codes.

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 20, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 10, 2024
@YangKeao YangKeao self-requested a review December 11, 2024 05:29
@joechenrh
Copy link
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Dec 11, 2024
@joechenrh
Copy link
Contributor Author

/retest

@joechenrh joechenrh changed the title executor: change the order of column value evaluation in doDupRowUpdate executor: change the evaluation order of columns in Update and Insert statements Dec 11, 2024
Copy link

ti-chi-bot bot commented Dec 16, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-12 05:53:51.808632211 +0000 UTC m=+504221.897434752: ☑️ agreed by YangKeao.
  • 2024-12-16 09:35:02.66825884 +0000 UTC m=+863092.757061377: ☑️ agreed by wjhuang2016.

@joechenrh
Copy link
Contributor Author

/hold
schrddl found another problem

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 17, 2024
@joechenrh
Copy link
Contributor Author

/unhold
The problem is related to plan builder, not executor. I will fix it in another PR.

@ti-chi-bot ti-chi-bot bot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/needs-triage-completed labels Dec 18, 2024
@joechenrh
Copy link
Contributor Author

/retest

@joechenrh
Copy link
Contributor Author

/retest-all

@joechenrh
Copy link
Contributor Author

/retest-required

@ti-chi-bot ti-chi-bot bot merged commit 5ac0b2e into pingcap:master Dec 18, 2024
24 checks passed
@joechenrh
Copy link
Contributor Author

/cherry-pick release-7.5

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Dec 24, 2024
@ti-chi-bot
Copy link
Member

@joechenrh: new pull request created to branch release-7.5: #58494.

In response to this:

/cherry-pick release-7.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@joechenrh
Copy link
Contributor Author

/cherry-pick release-7.1

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Dec 25, 2024
@ti-chi-bot
Copy link
Member

@joechenrh: new pull request created to branch release-7.1: #58524.

In response to this:

/cherry-pick release-7.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@joechenrh
Copy link
Contributor Author

/cherry-pick release-8.5

@ti-chi-bot
Copy link
Member

@joechenrh: new pull request created to branch release-8.5: #58708.

In response to this:

/cherry-pick release-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot pushed a commit that referenced this pull request Jan 9, 2025
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Jan 21, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request could not be created: failed to create pull request against pingcap/tidb#release-6.5 from head ti-chi-bot:cherry-pick-57123-to-release-6.5: the GitHub API request returns a 403 error: {"message":"You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later. If you reach out to GitHub Support for help, please include the request ID B9C8:1BF88E:FA678C:1F2CF15:678F16AC and timestamp 2025-01-21 03:38:21 UTC.","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits","status":"403"}

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jan 21, 2025
ti-chi-bot bot pushed a commit that referenced this pull request Feb 11, 2025
ti-chi-bot bot pushed a commit that referenced this pull request Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants