
Support s3 using repackage #482

Merged
merged 27 commits into from
Jan 26, 2025


xinyual
Collaborator

@xinyual xinyual commented Jan 17, 2025

Description

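This PR adds t2ppl (text-to-PPL) support for Spark PPL data sources such as S3 and CloudWatch. For these sources we cannot use the mapping API to get the schema and sample documents, so the frontend needs to pass both in the request. Because each column's data_type arrives as a Spark type string, the schema has to be re-parsed on our side. The request body looks like this: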

{
  "parameters": {
    "question": "what is the error rate yesterday",
    "index": "flight",
    "samples": [
      {
        "httpRequest": {
          "args": "",
          "country": "US",
          "headers": [
            { "name": "accept", "value": "*/*" },
            { "name": "accept-language", "value": "*" },
            { "name": "sec-fetch-mode", "value": "cors" },
            { "name": "user-agent", "value": "node" },
            { "name": "accept-encoding", "value": "br, gzip, deflate" }
          ],
          "httpVersion": "HTTP/1.1",
          "requestId": "ALAqbFTqvHcEWEQ=",
          "clientIp": "34.210.155.133",
          "httpMethod": "POST",
          "uri": "/dev"
        }
      }
    ],
    "schema": {
      "httpRequest": {
        "col_name": "httpRequest",
        "data_type": "struct<clientIp:string,country:string,headers:array<struct<name:string,value:string>>,uri:string,args:string,httpVersion:string,httpMethod:string,requestId:string>",
        "comment": null
      }
    },
    "type": "s3"
  }
}
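To illustrate what "re-parsing" the data_type string involves, here is a small, dependency-free sketch that flattens a Spark type string like the one above into dotted field paths. This is not the PR's approach: the review thread below shows the PR delegates parsing to Spark's `DataType.fromDDL`. `SparkTypeStringFlattener` and `flatten` are hypothetical names, and the sketch handles only `struct<...>`, `array<...>`, and simple leaf types (it does not handle `map<...>` or leaf types containing commas, such as `decimal(10,2)`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the PR itself uses Spark's DataType.fromDDL.
public class SparkTypeStringFlattener {
    private final String s;
    private int pos;

    private SparkTypeStringFlattener(String s) {
        this.s = s;
    }

    // Maps dotted field paths to leaf types, e.g.
    // "struct<a:string,b:array<struct<c:int>>>" -> {root.a=string, root.b.c=int}.
    public static Map<String, String> flatten(String root, String typeString) {
        Map<String, String> out = new LinkedHashMap<>();
        SparkTypeStringFlattener p = new SparkTypeStringFlattener(typeString.replace(" ", ""));
        p.parseType(root, out);
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("trailing input at " + p.pos);
        }
        return out;
    }

    private void parseType(String path, Map<String, String> out) {
        if (s.startsWith("struct<", pos)) {
            pos += "struct<".length();
            while (true) {
                int colon = s.indexOf(':', pos);
                if (colon < 0) {
                    throw new IllegalArgumentException("expected ':' after " + pos);
                }
                String name = s.substring(pos, colon);
                pos = colon + 1;
                parseType(path + "." + name, out);
                if (pos < s.length() && s.charAt(pos) == ',') {
                    pos++; // move on to the next field
                    continue;
                }
                expect('>');
                return;
            }
        } else if (s.startsWith("array<", pos)) {
            pos += "array<".length();
            // Arrays are transparent: element fields keep the array's path.
            parseType(path, out);
            expect('>');
        } else {
            // Leaf type: consume up to the next ',' or '>' (or end of input).
            int start = pos;
            while (pos < s.length() && s.charAt(pos) != ',' && s.charAt(pos) != '>') {
                pos++;
            }
            if (start == pos) {
                throw new IllegalArgumentException("empty type at " + pos);
            }
            out.put(path, s.substring(start, pos));
        }
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        String dataType = "struct<clientIp:string,headers:array<struct<name:string,value:string>>,uri:string>";
        System.out.println(flatten("httpRequest", dataType));
        // {httpRequest.clientIp=string, httpRequest.headers.name=string,
        //  httpRequest.headers.value=string, httpRequest.uri=string}
    }
}
```

Treating arrays as transparent means `headers.name` and `headers.value` become ordinary dotted fields, which matches how nested array-of-struct columns are usually referenced in a query. This is a simplifying assumption of the sketch.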

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 91.36691% with 12 lines in your changes missing coverage. Please review.

Project coverage is 82.31%. Comparing base (2b76e3c) to head (50cc2eb).
Report is 44 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../main/java/org/opensearch/agent/tools/PPLTool.java | 91.36% | 5 Missing and 7 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #482      +/-   ##
============================================
+ Coverage     81.78%   82.31%   +0.52%     
- Complexity      193      355     +162     
============================================
  Files            11       17       +6     
  Lines           961     1685     +724     
  Branches        137      240     +103     
============================================
+ Hits            786     1387     +601     
- Misses          121      204      +83     
- Partials         54       94      +40     


@yuye-aws
Member

@xinyual Can you update the PR description?

@@ -27,6 +27,7 @@
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.math.NumberUtils;
import org.apache.commons.text.StringSubstitutor;
import org.apache.spark.sql.types.*;
Member


Wildcard imports are not preferred.

build.gradle Outdated
@@ -46,6 +46,7 @@ plugins {
id 'com.diffplug.spotless' version '6.25.0'
id "io.freefair.lombok" version "8.10.2"
id "de.undercouch.download" version "5.6.0"
//id 'com.github.johnrengelman.shadow' version '8.1.1'
Member


nit: remove the commented-out line.

}


Member


nit: remove the newly added blank lines.

@@ -5,4 +5,5 @@

grant {
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.RuntimePermission "getClassLoader";
Member


where is this permission used?

Collaborator Author


It's used in AccessController.doPrivileged((PrivilegedExceptionAction) () -> DataType.fromDDL(schema)); parsing the DataType requires the class loader.
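A minimal, dependency-free sketch of the pattern described above, assuming a plugin whose policy grants the "getClassLoader" RuntimePermission. `PrivilegedParse` and `runPrivileged` are hypothetical names, and the Spark call `DataType.fromDDL(schema)` is replaced by a plain `getClassLoader()` call so the example runs without Spark on the classpath. Note that `AccessController` has been deprecated for removal since Java 17.

```java
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

@SuppressWarnings("removal") // AccessController is deprecated for removal since Java 17
public class PrivilegedParse {

    // Runs the action with the plugin's own granted permissions (for example the
    // "getClassLoader" RuntimePermission from plugin-security.policy).
    // Checked exceptions thrown by the action arrive wrapped in
    // PrivilegedActionException and are rethrown here as IllegalArgumentException;
    // unchecked exceptions propagate unchanged.
    static <T> T runPrivileged(PrivilegedExceptionAction<T> action) {
        try {
            return AccessController.doPrivileged(action);
        } catch (PrivilegedActionException e) {
            throw new IllegalArgumentException("privileged action failed", e.getException());
        }
    }

    public static void main(String[] args) {
        // Stand-in for DataType.fromDDL(schema): any call that needs the class loader.
        ClassLoader loader = runPrivileged(() -> PrivilegedParse.class.getClassLoader());
        System.out.println(loader != null);
    }
}
```

A generic helper like this also avoids the raw `(PrivilegedExceptionAction)` cast, because the lambda's result type is inferred as `T`.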

private void extractS3FieldToType(String prefix, Map<String, Object> structMap, Map<String, String> fieldToType) {
String type = (String) structMap.get("type");

if (StringUtils.equals(type, "array")) {
Member


Do we need to handle corner cases, e.g. when type is neither "array" nor "struct", or when the "fields" key is missing from structMap?

Collaborator Author


The input schema is generated by running "describe <index_name>" in the frontend, so we expect the type to be only "array", "struct", or a leaf type (which is what we want). Since the schema comes from "describe <index_name>", it must contain "fields".
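For comparison, here is a hypothetical sketch of a recursive walk in the spirit of `extractS3FieldToType` that also covers the corner cases raised in review. It is not the PR's code. It assumes the schema has already been parsed into `Map`/`List` values following Spark's `DataType` JSON layout (a struct is `{"type":"struct","fields":[...]}`, an array is `{"type":"array","elementType":...}`, and a leaf is a plain string such as `"string"`). `S3SchemaFlattener` and its helpers are illustrative names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the PR's implementation.
public class S3SchemaFlattener {

    // Recursively maps dotted field paths to leaf type names.
    @SuppressWarnings("unchecked")
    static void flatten(String prefix, Object type, Map<String, String> out) {
        if (type instanceof String) {
            out.put(prefix, (String) type); // leaf type such as "string"
            return;
        }
        if (!(type instanceof Map)) {
            throw new IllegalArgumentException("unexpected schema node at " + prefix);
        }
        Map<String, Object> node = (Map<String, Object>) type;
        String kind = (String) node.get("type");
        if ("struct".equals(kind)) {
            // A missing "fields" key is treated as an empty struct instead of failing.
            List<Map<String, Object>> fields =
                (List<Map<String, Object>>) node.getOrDefault("fields", Collections.emptyList());
            for (Map<String, Object> field : fields) {
                String name = (String) field.get("name");
                flatten(prefix.isEmpty() ? name : prefix + "." + name, field.get("type"), out);
            }
        } else if ("array".equals(kind)) {
            // Arrays are transparent: element fields keep the array's path.
            flatten(prefix, node.get("elementType"), out);
        } else {
            // Any other kind (e.g. "map") is recorded as-is rather than descended into.
            out.put(prefix, String.valueOf(kind));
        }
    }

    // Small builders for test data in the assumed JSON layout.
    static Map<String, Object> field(String name, Object type) {
        return Map.of("name", name, "type", type);
    }

    static Map<String, Object> struct(List<Map<String, Object>> fields) {
        return Map.of("type", "struct", "fields", fields);
    }

    static Map<String, Object> array(Object elementType) {
        return Map.of("type", "array", "elementType", elementType);
    }

    public static void main(String[] args) {
        Map<String, Object> header = struct(List.of(field("name", "string"), field("value", "string")));
        Map<String, Object> httpRequest = struct(List.of(
            field("clientIp", "string"),
            field("headers", array(header))));
        Map<String, String> out = new TreeMap<>();
        flatten("httpRequest", httpRequest, out);
        System.out.println(out);
        // {httpRequest.clientIp=string, httpRequest.headers.name=string, httpRequest.headers.value=string}
    }
}
```

Whether to fail fast or fall back on an unexpected node is a design choice. Given the author's point that the schema always comes from "describe <index_name>", failing fast on anything other than a string or a map keeps bad input visible.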

Signed-off-by: xinyual <[email protected]>
@zhichao-aws
Member

@xinyual The PR looks good to me overall. Since the codecov check fails (80.31% of the diff is hit; the target is 81.78%), I'll approve it once there are enough tests.

Signed-off-by: xinyual <[email protected]>
@xinyual
Collaborator Author

xinyual commented Jan 22, 2025

@zhichao-aws Hi Zhichao, I've added enough unit tests. Please approve this PR.

Signed-off-by: xinyual <[email protected]>
@xinyual xinyual merged commit 785404d into opensearch-project:main Jan 26, 2025
9 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/skills/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/skills/backport-2.x
# Create a new branch
git switch --create backport/backport-482-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 785404ddefaa835113d8a525bf7e50f4897b4ba5
# Push it to GitHub
git push --set-upstream origin backport/backport-482-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/skills/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-482-to-2.x.

@zane-neo zane-neo mentioned this pull request Jan 27, 2025
5 tasks
zane-neo pushed a commit that referenced this pull request Jan 27, 2025
* fix conflict

Signed-off-by: xinyual <[email protected]>

* fix dependency error

Signed-off-by: xinyual <[email protected]>

---------

Signed-off-by: xinyual <[email protected]>