
support resolve specified lock keys #10292

Merged
merged 20 commits on Jun 10, 2019

Conversation

@zhangjinpeng87 (Contributor) commented Apr 28, 2019

What problem does this PR solve?

Currently, the ResolveLock command scans the whole region range and tries to resolve all keys involved in the transaction. When read-write conflicts are heavy on a hot region, ResolveLock is called very frequently, which puts the TiKV scheduler under very heavy load, and write requests are then blocked for a long while.

What is changed and how it works?

  • Add a txn_size field to the Prewrite request to indicate how many keys the transaction involves.
  • For a small transaction (one that involves a small number of keys), resolve only the specified lock keys; this is much faster than scanning a region (see the sketch below).

Should merge after pingcap/kvproto#386 and tikv/tikv#4589
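
For readers skimming the change, here is a minimal, hedged sketch of the intended behaviour; the type, function names, and the threshold value below are illustrative assumptions based on this description and the review thread, not the PR's actual code.

```go
package main

import "fmt"

// Lock is a minimal stand-in for the resolver's lock info; TxnSize mirrors
// the new field carried in the Prewrite request (illustrative only).
type Lock struct {
	Key     []byte
	TxnID   uint64
	TxnSize uint64
}

// bigTxnThreshold stands in for the BigTxnThreshold constant added by the
// PR; the exact value was still under discussion in review.
const bigTxnThreshold = 16

func resolveLock(l *Lock) {
	if l.TxnSize < bigTxnThreshold {
		// Small transaction: send a lite resolve that names the locked
		// key explicitly, so TiKV does not scan the whole region.
		fmt.Printf("lite resolve: txn %d, key %q\n", l.TxnID, l.Key)
		return
	}
	// Big transaction: fall back to the old behaviour and let TiKV scan
	// the region for all locks left by this transaction.
	fmt.Printf("full resolve: txn %d, scan whole region\n", l.TxnID)
}

func main() {
	resolveLock(&Lock{Key: []byte("k1"), TxnID: 1, TxnSize: 3})   // lite
	resolveLock(&Lock{Key: []byte("k2"), TxnID: 2, TxnSize: 500}) // full
}
```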

@@ -375,6 +381,7 @@ func (lr *LockResolver) getTxnStatus(bo *Backoffer, txnID uint64, primary []byte

func (lr *LockResolver) resolveLock(bo *Backoffer, l *Lock, status TxnStatus, cleanRegions map[RegionVerID]struct{}) error {
tikvLockResolverCountWithResolveLocks.Inc()
cleanWholeRegion := true
Contributor

Use cleanWholeRegion := l.TxnSize >= BigTxnThreshold; then there is no need to update it in the loop.

Contributor

We need to distinguish normal resolve and lite resolve in metrics.

Contributor Author

Good point.

Contributor Author

I prefer to set cleanWholeRegion = false in the resolve_lock_lite related code block; it makes the logic clearer.

Contributor

Hmm, I don't think this makes the logic clearer. A variable that is never changed should be clearer.
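
For readers following this thread, a small self-contained sketch of the two styles under discussion; the loop body is omitted and the names are illustrative, not the PR's actual code.

```go
package main

import "fmt"

const bigTxnThreshold = 16

type lock struct{ txnSize uint64 }

// styleA follows the reviewer's suggestion: derive the flag once from the
// lock's TxnSize and never mutate it afterwards.
func styleA(l lock) bool {
	cleanWholeRegion := l.txnSize >= bigTxnThreshold
	// ... resolve loop would run here without touching the flag ...
	return cleanWholeRegion
}

// styleB follows the author's preference: default to true and flip it to
// false inside the lite-resolve branch, keeping the decision next to the
// code that skips the region scan.
func styleB(l lock) bool {
	cleanWholeRegion := true
	if l.txnSize < bigTxnThreshold {
		// Lite resolve touches only the specified key; other locks may
		// remain in the region, so it cannot be cached as fully cleaned.
		cleanWholeRegion = false
	}
	return cleanWholeRegion
}

func main() {
	for _, size := range []uint64{3, 500} {
		fmt.Println(styleA(lock{size}), styleB(lock{size})) // both agree
	}
}
```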

@@ -33,6 +33,9 @@ import (
// ResolvedCacheSize is max number of cached txn status.
const ResolvedCacheSize = 2048

// A transaction that involves more keys than this threshold is treated as a `big transaction`.
const BigTxnThreshold = 256
Contributor

I think it is too large. A cop request may need 200+ round trips to resolve locks.

Contributor Author

@zhangjinpeng87 Apr 28, 2019

Yes, I changed txn_size to keys_in_region; it is more accurate.

@codecov

codecov bot commented May 30, 2019

Codecov Report

Merging #10292 into master will increase coverage by 0.0465%.
The diff coverage is 84.6153%.

@@               Coverage Diff                @@
##             master     #10292        +/-   ##
================================================
+ Coverage   79.5906%   79.6371%   +0.0465%     
================================================
  Files           415        415                
  Lines         88131      88141        +10     
================================================
+ Hits          70144      70193        +49     
+ Misses        12801      12768        -33     
+ Partials       5186       5180         -6

@zhangjinpeng87 changed the title from "[DNM] support resolve specified lock keys" to "support resolve specified lock keys" on May 30, 2019
@zhangjinpeng87 (Contributor Author)

@disksing @tiancaiamao PTAL

@zhangjinpeng87 (Contributor Author)

@tiancaiamao @disksing @MyonKeminta PTAL

@@ -477,6 +477,7 @@ func (c *twoPhaseCommitter) buildPrewriteRequest(batch batchKeys) *tikvrpc.Reque
LockTtl: c.lockTTL,
IsPessimisticLock: isPessimisticLock,
ForUpdateTs: c.forUpdateTS,
TxnSize: uint64(len(batch.keys)),
Contributor

I think using len(c.keys) is better. It is possible that many batches write to the same region, because we limit each batch to no more than 16KB.

Contributor Author

You should also consider the case of a batch write whose index keys are discrete, possibly one region per index key.

Contributor Author

How about setting bigTxnThreshold to a smaller number, like 16?

Contributor

ok
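
To make the trade-off in this thread concrete, here is a hedged, self-contained sketch with made-up numbers; the batch type and the region split below are assumptions for illustration, not the real commit structures.

```go
package main

import "fmt"

const bigTxnThreshold = 16

// batch is an illustrative stand-in for a per-region prewrite batch.
type batch struct {
	region string
	keys   [][]byte
}

func main() {
	// A 40-key transaction whose keys all land in one region but are split
	// into four batches because each batch is capped in size (the thread
	// mentions a 16KB limit per batch).
	txnKeys := make([][]byte, 40)
	for i := range txnKeys {
		txnKeys[i] = []byte(fmt.Sprintf("k%02d", i))
	}
	batches := []batch{
		{"r1", txnKeys[0:10]},
		{"r1", txnKeys[10:20]},
		{"r1", txnKeys[20:30]},
		{"r1", txnKeys[30:40]},
	}
	for i, b := range batches {
		perBatch := uint64(len(b.keys)) // what the diff sends as TxnSize
		perTxn := uint64(len(txnKeys))  // the alternative raised in review
		fmt.Printf("batch %d: per-batch=%d (lite=%v), per-txn=%d (lite=%v)\n",
			i, perBatch, perBatch < bigTxnThreshold,
			perTxn, perTxn < bigTxnThreshold)
	}
}
```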

@zhangjinpeng87 (Contributor Author)

@tiancaiamao @lysu PTAL

@disksing (Contributor) commented Jun 6, 2019

LGTM

@disksing (Contributor) commented Jun 6, 2019

It would be useful to add a config option for bigTxnThreshold later.

@tiancaiamao (Contributor)

LGTM

@tiancaiamao (Contributor)

/run-all-tests

@tiancaiamao added the status/LGT2 and component/tikv labels on Jun 10, 2019