Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Arrays for the Map scalar functions #11436

Closed
Tracked by #11429
goldmedal opened this issue Jul 12, 2024 · 8 comments · Fixed by #11712
Closed
Tracked by #11429

Support Arrays for the Map scalar functions #11436

goldmedal opened this issue Jul 12, 2024 · 8 comments · Fixed by #11712
Assignees
Labels
enhancement New feature or request

Comments

@goldmedal
Copy link
Contributor

Is your feature request related to a problem or challenge?

As @alamb mentioned in #11361 (comment), we should support not only scalar lists but also arrays for the map and make_map functions. However, I encountered some challenges, which I detailed in #11361 (comment). I left the disabled test cases in

query error
SELECT make_map(column1, column2, column3, column4) FROM t;
# TODO: support array value
# ----
# {a: 1, k1: 10}
# {b: 2, k3: 30}
# {d: 4, k5: 50}
query error
SELECT map(column5, column6) FROM t;
# TODO: support array value
# ----
# {k1:1, k2:2}
# {k3: 3}
# {k5: 5}
.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@dharanad
Copy link
Contributor

take

@jayzhan211
Copy link
Contributor

MakeMap is possible to be removed after #11434, so you can find out array support for MapFunc only

@dharanad
Copy link
Contributor

Apologies for the delay, prioritizing this issue.

@dharanad
Copy link
Contributor

dharanad commented Aug 7, 2024

Sharing my observation from DuckDB, both Map exprssion and map function are meant to do two different things.
But in Datafusion we are using map function to support Map expression.
In case of DuckDB Map expr & map function would yeild different result for same input.
Shouldn't we have two different function to support these two use cases ? Or any ideas on how should we support both usecases using using single map function.

Observations

D create table t as values
· ('a', 1, 'k1', 10, ['k1', 'k2'], [1, 2], 'POST', [[1,2,3]]),
· ('b', 2, 'k3', 30, ['k3'], [3], 'PUT', [[4]]),
‣ ('d', 4, 'k5', 50, ['k5'], [5], null, [[1,2]]);

D select map(col4, col5) from t;
┌───────────────────────┐
│    map(col4, col5)    │
│ map(varchar, integer) │
├───────────────────────┤
│ {k1=1, k2=2}          │
│ {k3=3}                │
│ {k5=5}                │
└───────────────────────┘

D select Map {col4: col5} from t;
┌────────────────────────────────────────────────────────┐
│ main.map(main.list_value(col4), main.list_value(col5)) │
│               map(varchar[], integer[])                │
├────────────────────────────────────────────────────────┤
│ {[k1, k2]=[1, 2]}                                      │
│ {[k3]=[3]}                                             │
│ {[k5]=[5]}                                             │
└────────────────────────────────────────────────────────┘

In the meantime i will continue my investigation.
The helper function which i have implemented in #11713 is to support arrays for map scalar function as per issue title.

I am considering introducing a boolean flag to differentiate whether the function was invoked from a Map expression or a map function call. And handle the invocation accordingly

cc: @goldmedal @jayzhan211

@alamb
Copy link
Contributor

alamb commented Aug 8, 2024

Shouldn't we have two different function to support these two use cases ? Or any ideas on how should we support both usecases using using single map functio

In general, I think DataFuion strives not to invent / determine semantics, so in my opinion we should follow other implementations rather than having our own custom behavior whenever possible.

I think by default we should follow the postgres model

If postgres doesn't support maps / this syntax then I think we should follow the DuckDB model

@jayzhan211
Copy link
Contributor

jayzhan211 commented Aug 8, 2024

@dharanad I think select Map {col4: col5} from t; is equivalent to SELECT MAP([col4], [col5]);, so we just need to rewrite the syntax to the existing map function.

@dharanad
Copy link
Contributor

dharanad commented Aug 9, 2024

Shouldn't we have two different function to support these two use cases ? Or any ideas on how should we support both usecases using using single map functio

In general, I think DataFuion strives not to invent / determine semantics, so in my opinion we should follow other implementations rather than having our own custom behavior whenever possible.

I think by default we should follow the postgres model

If postgres doesn't support maps / this syntax then I think we should follow the DuckDB model

I totally agree with you.

@dharanad
Copy link
Contributor

dharanad commented Aug 9, 2024

@dharanad I think select Map {col4: col5} from t; is equivalent to SELECT MAP([col4], [col5]);, so we just need to rewrite the syntax to the existing map function.

Thanks for clarifying, i will have to say that i totally misunderstood the problem here. Should have checked what Postgres/DuckDB were doing & clarified the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants