-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][broker] PIP-192: Write the child ownership to ServiceUnitStateChannel
instead of ZK when handling bundle split
#18858
[improve][broker] PIP-192: Write the child ownership to ServiceUnitStateChannel
instead of ZK when handling bundle split
#18858
Conversation
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelTest.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
ServiceUnitStateChannel
instead of ZK when handling bundle splitServiceUnitStateChannel
instead of ZK when handling bundle split
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
369c5d4
to
e97ca14
Compare
Codecov Report
@@ Coverage Diff @@
## master #18858 +/- ##
============================================
- Coverage 47.43% 47.19% -0.25%
- Complexity 9535 10685 +1150
============================================
Files 632 712 +80
Lines 59839 69722 +9883
Branches 6234 7485 +1251
============================================
+ Hits 28384 32904 +4520
- Misses 28375 33098 +4723
- Partials 3080 3720 +640
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@Demogorgon314 I think it's incorrect in the PR's description, right? |
@codelipenghui Thanks for reminding me, I have fixed it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I think the right solution is to improve the NamespaceService to decide to write to Zookeeper or service unit channel? We also have Admin API to split the bundle, it will be a problem if the load manager write the bundle split to service unit state channel, but the Admin API can't write to the service unit channel?
Yes, then we need to wait for this PR merge first. #19102 |
e97ca14
to
568a39b
Compare
# Conflicts: # pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelTest.java
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Show resolved
Hide resolved
# Conflicts: # pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelTest.java
@codelipenghui I've had a second thought, Actually, we should keep this method in |
bundleFactory.invalidateBundleCache(bundle.getNamespaceObject()); | ||
updateFuture.complete(splitBundles); | ||
}).exceptionally(e -> { | ||
// Clean the new bundle when has exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about logging the error here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the updateFuture.completeExceptionally(e);
is called, the log will print in this place. https://github.com/apache/pulsar/pull/18858/files#diff-191ed1df4f5804c8fd6cdaf909cca01d071a110c3e701bdc98352f99e7a92e8eR534
FutureUtil.waitForAll(futureList) | ||
.whenComplete((__, ex) -> { | ||
if (ex != null) { | ||
log.warn("Clean new bundles failed,", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the consequences of this failure ?
can you please point me out to where we handle this inconsistent state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two cases to entry this failure.
- When
pubAsync
has an exception. - When
updateNamespaceBundles
,updateNamespaceBundlesForPolicies
has exception.
In these two cases, we won’t write the new bundle to ZK, so when the client tries to lookup, it will still find the old bundle, but some of the new bundle will be left in table-view, and when the broker is down, the unused bundle will be deleted in table-view.
&& (counter.incrementAndGet() < NamespaceService.BUNDLE_SPLIT_RETRY_LIMIT)) { | ||
pulsar.getExecutor().schedule(() -> splitServiceUnitOnceAndRetry(namespaceService, bundleFactory, | ||
bundle, serviceUnit, data, counter, startTime, completionFuture), 100, MILLISECONDS); | ||
} else if (ex instanceof IllegalArgumentException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we are doing the test on "ex" here and on "ex.getCause()" above ?
and why this case is so special ? we should at least log that we are passing from this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. It should use FutureUtil.unwrapCompletionException(ex)
here.
# Conflicts: # pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/pulsar/broker/loadbalance/extensions/channel/ServiceUnitStateChannelImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Master Issue: #16691
Motivation
When the channel handles the split event, we should write the child ownership to
ServiceUnitStateChannel
instead of zookeeper, since we are using theServiceUnitStateChannel
to store the bundle ownership.Modifications
ServiceUnitStateChannel
instead of ZK when handling bundle split.BadVersion
exception.Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: Demogorgon314#8