Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Broker gets block while creating topic using IsolatedBookieEnsemblePlacement in missing cache #20148

Closed
1 of 2 tasks
janmee opened this issue Apr 20, 2023 · 5 comments
Closed
1 of 2 tasks
Assignees
Labels
Stale type/bug The PR fixed a bug or issue reported a bug

Comments

@janmee
Copy link

janmee commented Apr 20, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Version

2.11.0

Minimal reproduce step

1、create namespace and set bookie affinity group
2、create topic under that namespace
3、broker will appear log "ZooKeeper session reconnection timeout. Notifying session is lost."

What did you expect to see?

create topic success

What did you see instead?

topic creating is block

Anything else?

`2023-04-19T21:53:11,365+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 4.166 seconds
2023-04-19T21:53:11,366+0800 [metadata-store-zk-session-watcher-11-1] INFO org.apache.pulsar.broker.PulsarService - Received metadata service session event: ConnectionLost
2023-04-19T21:53:11,367+0800 [metadata-store-zk-session-watcher-11-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - Received MetadataStore session event: ConnectionLost
2023-04-19T21:53:11,369+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 88,3 replyHeader:: 88,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:11,701+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 3.83 seconds
2023-04-19T21:53:11,702+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 89,3 replyHeader:: 89,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:12,035+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 3.496 seconds
2023-04-19T21:53:12,036+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 90,3 replyHeader:: 90,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:12,368+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 3.163 seconds
2023-04-19T21:53:12,369+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 91,3 replyHeader:: 91,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:12,702+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 2.83 seconds
2023-04-19T21:53:12,703+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 92,3 replyHeader:: 92,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:13,035+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 2.496 seconds
2023-04-19T21:53:13,036+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 93,3 replyHeader:: 93,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:13,369+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 2.163 seconds
2023-04-19T21:53:13,371+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 94,3 replyHeader:: 94,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:13,702+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 1.829 seconds
2023-04-19T21:53:13,703+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 95,3 replyHeader:: 95,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:14,035+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 1.496 seconds
2023-04-19T21:53:14,037+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 96,3 replyHeader:: 96,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:14,369+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 1.162 seconds
2023-04-19T21:53:14,370+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 97,3 replyHeader:: 97,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:14,702+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 0.829 seconds
2023-04-19T21:53:14,703+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 98,3 replyHeader:: 98,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:15,036+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 0.495 seconds
2023-04-19T21:53:15,037+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply session id: 0x24109391046d001e, packet:: clientPath:/ serverPath:/ finished:false header:: 99,3 replyHeader:: 99,4294969298,0 request:: '/,F response:: s{0,0,0,0,0,9,0,0,0,11,4294967580}
2023-04-19T21:53:15,176+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got notification session id: 0x24109391046d001e
2023-04-19T21:53:15,176+0800 [main-SendThread(11.238.117.36:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Got WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/11.238.117.36:8080 for session id 0x24109391046d001e
2023-04-19T21:53:15,369+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 0.162 seconds
2023-04-19T21:53:15,703+0800 [metadata-store-zk-session-watcher-11-1] INFO org.apache.pulsar.broker.PulsarService - Received metadata service session event: SessionLost
2023-04-19T21:53:15,978+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.broker.PulsarService - The session with metadata service was lost. Shutting down.

"main-EventThread" #19 daemon prio=5 os_prio=0 cpu=363.30ms elapsed=12.86s tid=0x00007fcd19a3b480 nid=0x3296 waiting on condition [0x00007fcaf02e5000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
- parking to wait for <0x0000100015adbd50> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock([email protected]/ForkJoinPool.java:3463)
at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3434)
at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.join([email protected]/CompletableFuture.java:2117)
at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.initialize(IsolatedBookieEnsemblePlacementPolicy.java:91)
at org.apache.pulsar.bookie.rackawareness.IsolatedBookieEnsemblePlacementPolicy.initialize(IsolatedBookieEnsemblePlacementPolicy.java:51)
at org.apache.bookkeeper.client.BookKeeper.initializeEnsemblePlacementPolicy(BookKeeper.java:583)
at org.apache.bookkeeper.client.BookKeeper.(BookKeeper.java:507)
at org.apache.bookkeeper.client.BookKeeper$Builder.build(BookKeeper.java:307)
at org.apache.pulsar.broker.BookKeeperClientFactoryImpl.create(BookKeeperClientFactoryImpl.java:83)
at org.apache.pulsar.broker.ManagedLedgerClientFactory.lambda$initialize$0(ManagedLedgerClientFactory.java:97)
at org.apache.pulsar.broker.ManagedLedgerClientFactory$$Lambda$715/0x0000000801192fb8.apply(Unknown Source)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent([email protected]/ConcurrentHashMap.java:1708)
- locked <0x0000100015aa9f60> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
at org.apache.pulsar.broker.ManagedLedgerClientFactory.lambda$initialize$1(ManagedLedgerClientFactory.java:95)
at org.apache.pulsar.broker.ManagedLedgerClientFactory$$Lambda$187/0x0000000800edc080.get(Unknown Source)
at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.lambda$asyncOpen$6(ManagedLedgerFactoryImpl.java:373)
at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$$Lambda$709/0x0000000801186698.apply(Unknown Source)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent([email protected]/ConcurrentHashMap.java:1708)
- locked <0x0000100014f25a88> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.asyncOpen(ManagedLedgerFactoryImpl.java:369)
at org.apache.pulsar.broker.service.BrokerService.lambda$createPersistentTopic$54(BrokerService.java:1429)
at org.apache.pulsar.broker.service.BrokerService$$Lambda$707/0x000000080117f898.accept(Unknown Source)
at java.util.concurrent.CompletableFuture.uniAcceptNow([email protected]/CompletableFuture.java:757)
at java.util.concurrent.CompletableFuture.uniAcceptStage([email protected]/CompletableFuture.java:735)
at java.util.concurrent.CompletableFuture.thenAccept([email protected]/CompletableFuture.java:2182)
at org.apache.pulsar.broker.service.BrokerService.createPersistentTopic(BrokerService.java:1407)
at org.apache.pulsar.broker.service.BrokerService.lambda$checkOwnershipAndCreatePersistentTopic$49(BrokerService.java:1375)
at org.apache.pulsar.broker.service.BrokerService$$Lambda$698/0x000000080117a5c0.accept(Unknown Source)
at java.util.concurrent.CompletableFuture.uniAcceptNow([email protected]/CompletableFuture.java:757)
at java.util.concurrent.CompletableFuture.uniAcceptStage([email protected]/CompletableFuture.java:735)
at java.util.concurrent.CompletableFuture.thenAccept([email protected]/CompletableFuture.java:2182)
at org.apache.pulsar.broker.service.BrokerService.checkOwnershipAndCreatePersistentTopic(BrokerService.java:1377)
at org.apache.pulsar.broker.service.BrokerService.lambda$loadOrCreatePersistentTopic$46(BrokerService.java:1350)
at org.apache.pulsar.broker.service.BrokerService$$Lambda$708/0x00000008011af880.run(Unknown Source)
at java.util.concurrent.CompletableFuture.uniRunNow([email protected]/CompletableFuture.java:819)
at java.util.concurrent.CompletableFuture.uniRunStage([email protected]/CompletableFuture.java:803)
at java.util.concurrent.CompletableFuture.thenRun([email protected]/CompletableFuture.java:2195)
at org.apache.pulsar.broker.service.BrokerService.loadOrCreatePersistentTopic(BrokerService.java:1346)
at org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$23(BrokerService.java:991)
at org.apache.pulsar.broker.service.BrokerService$$Lambda$701/0x00000008011ae8e8.apply(Unknown Source)
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:404)
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:238)
at org.apache.pulsar.broker.service.BrokerService.getTopic(BrokerService.java:990)
at org.apache.pulsar.broker.service.BrokerService.getTopic(BrokerService.java:952)
at org.apache.pulsar.broker.service.BrokerService.getTopicIfExists(BrokerService.java:944)
at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalCreateNonPartitionedTopicAsync$49(PersistentTopicsBase.java:530)
at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$616/0x000000080113d660.apply(Unknown Source)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire([email protected]/CompletableFuture.java:1150)
at java.util.concurrent.CompletableFuture.postComplete([email protected]/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete([email protected]/CompletableFuture.java:2147)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:262)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$8(ZKMetadataStore.java:212)
at org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$162/0x0000000800e8a448.processResult(Unknown Source)
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:489)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:712)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553)

Locked ownable synchronizers:
- None
`

in the broker log above, we can see zk client has replied request, but ZKMetadataStore still disconnect. In the stack of zk EventThread, we know that EventThread got block in IsolatedBookieEnsemblePlacementPolicy.initialize(IsolatedBookieEnsemblePlacementPolicy.java:91) .

image

in source code, we can see creating bookie client must wait until getting the node "/bookies" succeed from the cache. if node "/bookies" is not in cache, ZKMetadataStore will get it from zk. However, creating bookie client is running in the zk EventThread, and waiting for getting node "/bookies" complete, and the callback getting node "/bookies" from zk must run in EventThread。it will appear deadlock in this situation

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@janmee janmee added the type/bug The PR fixed a bug or issue reported a bug label Apr 20, 2023
@janmee janmee changed the title [Bug] ZKMetadataStore run into deadlock when creating topic using IsolatedBookieEnsemblePlacement in missing cache [Bug] Broker get block while creating topic using IsolatedBookieEnsemblePlacement in missing cache Apr 20, 2023
@janmee janmee changed the title [Bug] Broker get block while creating topic using IsolatedBookieEnsemblePlacement in missing cache [Bug] Broker gets block while creating topic using IsolatedBookieEnsemblePlacement in missing cache Apr 20, 2023
@Technoboy- Technoboy- self-assigned this Apr 23, 2023
@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 24, 2023
@tisonkun
Copy link
Member

#20303 handles a related issue that is possibly solve this one also.

cc @poorbarcode

@poorbarcode
Copy link
Contributor

#20303 handles a related issue that is possibly solving this one also.

No, the PR #20303 is not helpful for this issue. I'll take note of the issue and will try to fix it later.

@github-actions github-actions bot removed the Stale label Jun 16, 2023
@github-actions
Copy link

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jul 16, 2023
@Technoboy-
Copy link
Contributor

#21096 has fixed this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

No branches or pull requests

4 participants