Save the linked list of the executor from suffering GC nepotism #74
Conversation
@dmlloyd @carterkozak

@Test
public void gcNepotism() throws ExecutionException, InterruptedException {
    System.setProperty("jboss.threads.eqe.park-spins", Long.toString(Integer.MAX_VALUE));
    System.setProperty("jboss.threads.eqe.tail-lock", Boolean.FALSE.toString());
    Assume.assumeFalse(EnhancedQueueExecutor.TAIL_LOCK);
    EnhancedQueueExecutor executor = (new EnhancedQueueExecutor.Builder())
            .setCorePoolSize(10)
            .setMaximumPoolSize(20)
            .build();
    final int producers = 10;
    int total = 100_000;
    final CountDownLatch totalExecution = new CountDownLatch(total * producers);
    ExecutorService service = Executors.newFixedThreadPool(producers);
    Future<?>[] tasks = new Future[producers];
    for (int i = 0; i < producers; i++) {
        tasks[i] = service.submit(() -> {
            for (int j = 0; j < total; j++) {
                executor.execute(totalExecution::countDown);
            }
        });
    }
    for (int i = 0; i < producers; i++) {
        tasks[i].get();
    }
    Assert.assertTrue(totalExecution.await(30, TimeUnit.SECONDS));
    service.shutdown();
    executor.shutdown();
}

It fails with:

java.util.concurrent.ExecutionException: java.lang.AssertionError
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.jboss.threads.EnhancedQueueExecutorTest.gcNepotism(EnhancedQueueExecutorTest.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.lang.AssertionError
at org.jboss.threads.EnhancedQueueExecutor.tryExecute(EnhancedQueueExecutor.java:1740)
at org.jboss.threads.EnhancedQueueExecutor.execute(EnhancedQueueExecutor.java:755)
at org.jboss.threads.EnhancedQueueExecutorTest.lambda$gcNepotism$0(EnhancedQueueExecutorTest.java:176)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

i.e. it's failing on:
// GC Nepotism:
// We can save consumerNode::next from being dragged into
// old generation, if possible
consumerNode.compareAndSetNext(tailNextNext, null);
I've used compareAndSetNext because I'm not sure that this case is single-writer; maybe it is.
Maybe we should do a weakCompareAndSet here to avoid the heavier memory barrier?
IMO weakCompareAndSet is safe, but it will probably only make a difference on ARM/PowerPC servers: AFAIK newer x86s won't have any specific code path implemented in OpenJDK for this (but I could be wrong).
OK. Well we do care about ARM and AArch64 so I think that makes it valid.
If I need to provide a test to make the GC nepotism problem evident, I can do that as well :+1:
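For illustration only, a minimal sketch (hypothetical Node class and field names, not the real EQE internals) of how a strong vs. weak CAS on next could be exposed with AtomicReferenceFieldUpdater; on x86 both typically compile to the same lock cmpxchg, while the weak form can be cheaper on ARM/PowerPC:

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical node type, only to illustrate the strong vs. weak CAS choice
// being discussed; the real EQE node classes are different.
final class Node {
    volatile Node next;

    private static final AtomicReferenceFieldUpdater<Node, Node> NEXT =
            AtomicReferenceFieldUpdater.newUpdater(Node.class, Node.class, "next");

    boolean compareAndSetNext(Node expect, Node update) {
        // strong CAS: never fails spuriously, full volatile semantics
        return NEXT.compareAndSet(this, expect, update);
    }

    boolean weakCompareAndSetNext(Node expect, Node update) {
        // may fail spuriously and gives weaker ordering guarantees; acceptable
        // here because the null-ing write is only a GC-nepotism hint
        return NEXT.weakCompareAndSet(this, expect, update);
    }
}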
TaskNode taskNode = (TaskNode) headNext;
if (compareAndSetHead(head, taskNode)) {
    // save from GC nepotism
    head.setNextRelaxed(head);
Can you prove that this can never cause nodes to be skipped?
Good point: looking at the original code I see that it drains TaskNodes until tail::getNext is null after a shutdown; I can preserve the original logic to handle the case of a concurrent call to it.
I've changed the visibility guarantees while setting next too, taking "inspiration" from CLQ:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/8aaca72128077594ecf4ecdee761e78d4ecb8b7d/src/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java#L305
final void updateHead(Node<E> h, Node<E> p) {
    if (h != p && casHead(h, p))
        h.lazySetNext(h);
}
// cross-generational references, so better to "clean-up" head::next
// to save dragging head::next into the old generation.
// Clean-up cannot just null out next
head.setNextOrdered(head);
I need to see a correctness proof here. What are the possible observable output states, and why? Are the states all valid and already described above? What are the preconditions and postconditions?
It's mostly expressed in terms of acquire-release relationships between atomic operations:
- only one thread can win compareAndSetHead(head, taskNode) for a specific head, taskNode pair
- other threads can reach head while its head::next holds either the value before or the value after head.setNextOrdered
- "slow" producers (i.e. producers descheduled after reading this.tail for long enough that other producers have moved on and a consumer has already moved head):
  - before head.setNextOrdered(head): they try compareAndSetNext(null, newValue) and fail because, given that the consumer has been able to move past the node, next != null
  - after head.setNextOrdered(head): they try compareAndSetNext(null, newValue) and fail because next == head, which is not null
- slow consumers (i.e. consumers that have been descheduled after reading this.head and are about to read head::getNext):
  - before head.setNextOrdered(head): they see the same head::next value as the consumer that won casHead, hence they will attempt casHead and fail, because head has already moved on
  - after head.setNextOrdered(head): they see the new value (head::next == head) and they will re-attempt reading this.head. Given that head.setNextOrdered is a releasing write ordered after any previous write (including the winning casHead), loading head::next and finding it == head acquires that release, hence re-reading head will return at least the value published by the consumer that won casHead
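To make the shape of that interaction concrete, here is a toy, self-contained sketch of the consumer side; the names loosely mirror the discussion (winning the head CAS, self-linking the old head), but this is not the actual EQE code:

import java.util.concurrent.atomic.AtomicReference;

// Toy queue illustrating the head self-link protocol described above
// (a placeholder model, not the real EnhancedQueueExecutor internals).
final class ToyQueue {
    static final class Node {
        final Runnable task;                              // null only for the dummy head
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(Runnable task) { this.task = task; }
    }

    private final AtomicReference<Node> head = new AtomicReference<>(new Node(null));

    Runnable poll() {
        for (;;) {
            Node h = head.get();                          // read the current head
            Node n = h.next.get();                        // acquire-read head::next
            if (n == h) {
                continue;                                 // self-linked: another consumer detached h, re-read head
            }
            if (n == null) {
                return null;                              // queue is empty
            }
            if (head.compareAndSet(h, n)) {               // only one consumer wins for a given (h, n) pair
                h.next.lazySet(h);                        // ordered (releasing) write: detach and self-link the old head
                return n.task;
            }
            // lost the race: loop and observe either the newer head or the self-link
        }
    }
}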
The states already written in the overview of the class should be valid, because I've tried hard not to break the way we use null, and just added checks: head::next == head means that "something" has changed, in particular head and, by consequence, tail.
OK I think I understand all the possibilities here. But couldn't this be setNextRelaxed? We're not using node.next == node as the solitary indicator that it has been removed, so it's probably OK if some threads don't observe that (they would just spin past the node using the current behavior). Does it make a difference performance-wise?
Does it make a difference performance-wise?

setNextOrdered on x86 doesn't use any specific hardware barrier, thanks to the strong memory model of x86; the only perf hit would have come from using "updaters" (often negligible).
From a correctness pov yes: the only thing we care about here is atomicity, but sadly Unsafe doesn't provide the right methods to get just that: ideally what we need here is https://docs.oracle.com/javase/9/docs/api/java/lang/invoke/VarHandle.AccessMode.html#SET_OPAQUE , because the compareAndSet preceding it already provides the sequentially consistent write we need.
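On JDK 9+ the store flavours being weighed here map onto VarHandle access modes; a sketch with a hypothetical Node class (the real code predates VarHandle and relies on Unsafe-backed helpers):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical node showing the store modes discussed above, weakest to strongest.
final class Node {
    volatile Node next;

    private static final VarHandle NEXT;
    static {
        try {
            NEXT = MethodHandles.lookup().findVarHandle(Node.class, "next", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void setNextRelaxed(Node n)  { NEXT.set(this, n); }         // plain: atomic reference write, no ordering guarantees
    void setNextOpaque(Node n)   { NEXT.setOpaque(this, n); }   // atomic and coherent, still no acquire/release fences
    void setNextOrdered(Node n)  { NEXT.setRelease(this, n); }  // release: prior writes become visible to an acquiring reader
    void setNextVolatile(Node n) { NEXT.setVolatile(this, n); } // full volatile (sequentially consistent) store
}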
Is this safe in cases where EQE.head passes EQE.tail?
The tail reference may lag (currently by only one node, but the potential jdk14 CLQ optimization could allow more lag), which I think allows EQE.head to be EQE.tail.next. Clearing references may unlink tail from head and orphan submitted tasks.
It's very possible that I've missed an important piece here; I'm not confident that I've gotten this right.
It shouldn't, AFAIK.
A producer performs prev_tail::casNext(null, new_tail) and then casTail(prev_tail, new_tail), while a consumer can consume only prev_tail, not new_tail (yet).
Given that other producers are able to read this.tail (i.e. new_tail) and eventually this.tail == new_tail, others are always able to append new elements without losing anything along the way.
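A minimal, self-contained sketch of that two-step append in the Michael-Scott style (placeholder code, not the actual EQE producer path, which has more node states to handle):

import java.util.concurrent.atomic.AtomicReference;

// Two-step append: link the new node through next first (the linearization
// point), then swing the tail; the tail may therefore lag by one node.
final class ToyProducer {
    static final class Node {
        final Runnable task;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(Runnable task) { this.task = task; }
    }

    private final AtomicReference<Node> tail = new AtomicReference<>(new Node(null));

    void offer(Runnable task) {
        Node newNode = new Node(task);
        for (;;) {
            Node t = tail.get();                          // read the current tail
            if (t.next.compareAndSet(null, newNode)) {    // 1) publish via next: consumers can reach it now
                tail.compareAndSet(t, newNode);           // 2) best effort: failing just leaves tail lagging by one
                return;
            }
            Node n = t.next.get();                        // another producer won: help advance tail and retry
            if (n != null) {
                tail.compareAndSet(t, n);
            }
        }
    }
}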
@carterkozak Thinking about it again and again, we're probably "lucky" that we cannot lag > 1, but if we used the same jdk 14 optimization (as you've suggested) we would need, on the producer side, to chase the last tail starting from head (which in theory should be near(ish)), given that head has passed the lagging tail, when node.next == node (see https://github.com/openjdk-mirror/jdk7u-jdk/master/src/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java#L348).
From a correctness pov yes: the only thing we care about here is atomicity
@franz1981 you can use a plain write in this case, since reference writes are always atomic IIRC
@nitsanw thanks, yep, the JLS should ensure that reference equality works fine here, no tearing on the reference value indeed, thanks!
// GC Nepotism:
// save dragging headNext into old generation
// (although being a PoolThreadNode it won't make a big difference)
nextPoolThreadNode.setNextRelaxed(null);
Am I correct in understanding that this is presumed safe because nextPoolThreadNode has not yet been published?
Exactly, and because nextPoolThreadNode (if not published) is still owned by a single thread (the consumer thread itself); hence setNextRelaxed is fine, because no other thread would ever be able to read nextPoolThreadNode::next while this is happening.
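A small self-contained sketch of that single-owner reasoning (hypothetical names; the EQE case clears next on an unpublished node rather than setting it, but the ownership argument is the same):

import java.util.concurrent.atomic.AtomicReference;

// A node is mutated with plain writes while only its creating thread can
// reach it; the successful CAS that publishes it then acts as the release
// that makes those plain writes visible to every other thread.
final class ToyWaiterStack {
    static final class PoolThreadNode {
        final Thread thread;
        PoolThreadNode next;                      // plain field: written only while unpublished
        PoolThreadNode(Thread thread) { this.thread = thread; }
    }

    private final AtomicReference<PoolThreadNode> top = new AtomicReference<>();

    void addWaiter(Thread current) {
        PoolThreadNode node = new PoolThreadNode(current);
        for (;;) {
            PoolThreadNode t = top.get();
            node.next = t;                        // plain write is safe: 'node' is not published yet
            if (top.compareAndSet(t, node)) {     // publication point (release)
                return;                           // from here on, other threads may read node.next
            }
            // CAS failed: 'node' is still private to this thread, loop and rewrite next
        }
    }
}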
OK I think it looks good. Any closing remarks or final testing before I merge?
@dmlloyd I'm concerned about the assert on #74 (comment) that's failing on
The assertion is based on
Agree: in this case, the test I've written was failing on
This may be naive, but isn't it possible that tail is destined to become a dead task node but getAndClearTask hasn't run yet? In this case I think head may have progressed past the tail, but it should self-correct. Aside from this assert, I'm curious why TaskNode.task is volatile.
IIRC it was originally volatile because that was the means of publishing the task from the producer to the consumer. But maybe it no longer needs to be; if someone has time to do the requisite analysis I'd be open to the change.
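For reference, a sketch of the publication path being discussed (the field/updater shapes here are assumptions of mine; the actual analysis lives in PR #76):

import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Why TaskNode.task was volatile: the producer writes the task and the consumer
// must reliably observe it; getAndClearTask atomically takes the task so it runs
// exactly once and drops the reference so the GC can reclaim it sooner.
final class TaskNode {
    volatile Runnable task;

    private static final AtomicReferenceFieldUpdater<TaskNode, Runnable> TASK =
            AtomicReferenceFieldUpdater.newUpdater(TaskNode.class, Runnable.class, "task");

    TaskNode(Runnable task) {
        this.task = task;          // publication could instead piggyback on the CAS that links the node
    }

    Runnable getAndClearTask() {
        return TASK.getAndSet(this, null);
    }
}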
Running the
I've started a round of benchmarks, I'll post the results here in the morning.
@carterkozak Re the assertion error: so do you think it's worth investigating and fixing, even though it has been there for some time? I suggest running the benchmark with the CondCardMark JVM arg on/off.
Full suite of benchmarks for raw comparison (still using UseParallelOldGC): https://gist.github.com/carterkozak/0f265ca33bd668b4f1641da362f613dd
[benchmark result tables for this PR vs. master omitted]
There's not much difference in the values, but our tasks run very quickly and nodes die young for the most part.
@carterkozak Parallel GC needs UseCondCardMark to perform well on high-core machines. G1GC already has conditional card marking by default: regardless of tasks getting old in human terms, it mostly depends on the (young) heap size, which affects the promotion rate. Probably worth using GCProfiler (-prof gc) on the bench IMO.
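For example, a hypothetical JMH runner enabling the GC profiler and the flags discussed above (the benchmark class name is a placeholder):

import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class GcBenchRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("EnhancedQueueExecutorBenchmark")    // placeholder benchmark name
                .addProfiler(GCProfiler.class)                // equivalent to -prof gc on the command line
                .jvmArgsAppend("-XX:+UseParallelOldGC",
                               "-XX:+UseCondCardMark")        // toggle this flag on/off to compare
                .build();
        new Runner(opts).run();
    }
}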
Just want to note that the assertion problem does require a solution. I'm looking into it, but you are all welcome to look as well (chances are it will take me longer to find the problem anyway).
Well, I solved the assert mystery. It is indeed invalid (see #75 for the explanation).
Any other remarks before I merge?
Looks great, thanks! I've opened a PR (#76) addressing TaskNode.task volatility, describing why I believe it's a safe change.
This has turned out to be a relevant GC issue that can cause several problems (including OOM) in many linked-list implementations.
For reference:
I've solved the problem in the same way CLQ does, as explained here: https://github.com/AdoptOpenJDK/openjdk-jdk14/blob/f7165c322a6abefa34afa9eeb76dbd56f70aae09/src/java.base/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java#L149
In addition, I see that this comment gives an additional hint on how to handle, producer-side, the case where tail.next == tail (i.e. a self-link implicitly means "advance to head"): I haven't implemented this strategy, but I let the producer re-attempt reading this.tail if a consumer has passed over it. A sketch of that follows.
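A hedged sketch of that simpler strategy: instead of restarting the traversal from head as CLQ does, the producer just re-reads the shared tail when it observes a self-linked node (placeholder code, assuming, as argued in the discussion above, that the shared tail is advanced past a node before that node can be consumed and self-linked):

import java.util.concurrent.atomic.AtomicReference;

// Producer that treats tail.next == tail (a self-link) as "this node is dead,
// a consumer has already passed it", and reacts by re-reading the shared tail
// rather than chasing the live tail starting from head.
final class SelfLinkAwareProducer {
    static final class Node {
        final Runnable task;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(Runnable task) { this.task = task; }
    }

    private final AtomicReference<Node> tail = new AtomicReference<>(new Node(null));

    void offer(Runnable task) {
        Node newNode = new Node(task);
        for (;;) {
            Node t = tail.get();
            Node n = t.next.get();
            if (n == t) {
                continue;                         // self-linked: the shared tail has moved on, re-read it
            }
            if (n == null && t.next.compareAndSet(null, newNode)) {
                tail.compareAndSet(t, newNode);   // best effort: tail may lag by one node
                return;
            }
            if (n != null) {
                tail.compareAndSet(t, n);         // help a lagging tail forward before retrying
            }
        }
    }
}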