-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix explain action on query rewrite #17286
base: main
Are you sure you want to change the base?
Conversation
❌ Gradle check result for 3b159cd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a re-submittal of #17277 which has unresolved comments. Please reference earlier PRs when re-submitting them so that reviewers have the full context of previous comments (and can make sure they are resolved).
Code looks like it addresses the problem, but it's overly complex chaining an action listener with a null that gets ignored. Suggest moving the null check up to the top of the method and calling super.doExecute()
(basically the existing code, but wrapped in a null check with a shortcut return before creating more objects). Only if the query is non-null do you need to instantiate the actionListener and chain that code.
Also looks like some checks are failing like DCO and Change log.
if (request.query() == null) { | ||
rewriteListener.onResponse(request.query()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This completes the rewriteListener
with null
. I'm not sure that's what you intend. Following the code path it looks like it just eventually calls super.doExecute(task, request, listener);
so perhaps it's better to put that line here rather than chaining the listener.
That would allow you to simplify your code by just putting the wrapped ActionListener in the rewriteAndFetch()
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coordinator node must rewrite the request before forwarding it to any shard. Rewriting on shards is prohibited when there are pending asynchronous operations.
Rewritable.rewriteAndFetch()
to perform query rewriting to register and execute the async actions fromqueryRewriteContext.registerAsyncAction
- After rewriting the query, the code must pass the rewritten query (not the original query) to
super.doExecute()
. This ensures the updated request containing the rewritten query is used during execution.
I found there is another place in core that handle this rewritable issue before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found there is another place in core that handle this rewritable issue before
I see, but I'd make the same comments about that code. And your code has now diverged from that in that you're using an assert
for non-null, which won't actually run in production; you'd just pass the null query to rewriteAndFetch()
.
OpenSearch/server/src/main/java/org/opensearch/index/query/Rewriteable.java
Lines 111 to 119 in e397903
static <T extends Rewriteable<T>> void rewriteAndFetch( | |
T original, | |
QueryRewriteContext context, | |
ActionListener<T> rewriteResponse, | |
int iteration | |
) { | |
T builder = original; | |
try { | |
for (T rewrittenBuilder = builder.rewrite(context); rewrittenBuilder != builder; rewrittenBuilder = builder.rewrite(context)) { |
You'd be passing that null on line 112 and getting an NPE on line 119. So you need to put the null check back in closer to what you had (or what the linked code does).
The point I'm making both with your original code and the other example you linked is that this:
if (request.query() == null) {
rewriteListener.onResponse(request.query());
}
is really
if (request.query() == null) {
rewriteListener.onResponse(null);
}
And if you trace that into the rewrite listener you're passing a null rewrittenQuery
into this block:
rewrittenQuery -> {
request.query(rewrittenQuery);
super.doExecute(task, request, listener);
}
This sets request.query(null)
(but already was null!) and then calls super.doExecute()
with that "updated" request (which is really the same request with a null query).
So my suggestion (and I'd suggest the same on the linked code but that's out of scope) is to move the null check and super.doExecute()
call to the very top of the method, before spending any time creating new objects.
protected void doExecute(Task task, ExplainRequest request, ActionListener<ExplainResponse> listener) {
request.nowInMillis = System.currentTimeMillis();
// if there's no query we can't rewrite it
if (request.query() == null) {
super.doExecute(task, request, listener);
return;
}
// Rewrite the query if present
ActionListener<QueryBuilder> rewriteListener = ActionListener.wrap(...<existing code>...);
Rewriteable.rewriteAndFetch(
request.query(),
searchService.getIndicesService().getRewriteContext(() -> request.nowInMillis),
rewriteListener
);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot, Dan for the detailed explanation. previously, I thought all request would need a rewrite. now it makes sense. raised another revision accordingly. Appreciate for your time!
❌ Gradle check result for 2dcc19b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for a111f2f: Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 9e7e026: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17286 +/- ##
============================================
- Coverage 72.47% 72.46% -0.01%
- Complexity 65693 65697 +4
============================================
Files 5304 5304
Lines 304981 304990 +9
Branches 44238 44239 +1
============================================
- Hits 221033 221023 -10
+ Misses 65808 65792 -16
- Partials 18140 18175 +35 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Fen Qin <[email protected]>
Signed-off-by: Fen Qin <[email protected]>
❕ Gradle check result for a5778cb: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some style nits that I won't hold up approval on.
client().admin() | ||
.indices() | ||
.prepareCreate("twitter") | ||
.setMapping("user", "type=integer", "followers", "type=integer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Style Nit) It's not clear to the casual observer that this is a set of two key/value pairs. While this is valid, there are other options to use a Map
for setting these keys. This is a test class so it's really not a big deal.
.indices() | ||
.prepareCreate("twitter") | ||
.setMapping("user", "type=integer", "followers", "type=integer") | ||
.setSettings(Settings.builder().put(SETTING_NUMBER_OF_SHARDS, 2).put("index.number_of_routing_shards", 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Style Nit) It's strange to see one setting as a constant and the other one not. I think you can use INDEX_NUMBER_OF_ROUTING_SHARDS_SETTING
here.
Description
There is an exception when call explain in "by_doc_id" mode. Response looks like this:
The error were caught from
TransportExplainAction -> Rewriteable.java
because there still be asyncActions left when the flagassertNoAsyncTasks
returned as true.The conflict is down to:
The NeuralQueryBuilder or TermsQueryBuilder
doRewrite
method introduces asynchronous actions viaqueryRewriteContext.registerAsyncAction
. So it then returns a context with potentially unresolved async tasks.Rewritable.rewrite
then checksassertNoAsyncTasks
flag, throws the exception if there exists any async tasks left.Fix in this PR is resolve this conflict:
TransportExplainAction.
TermsQueryBuilder
Related Issues
Resolves - opensearch-project/neural-search#1126
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.