[SPARK-51290][SQL] Enable filling default values in DSv2 writes #50044

aokolnychyi · 2025-02-21T18:53:49Z

What changes were proposed in this pull request?

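This PR enables filling default values in DSv2 writes by passing supportColDefaultValue = true to TableOutputResolver.resolveOutputColumns when the analyzer resolves V2 write output columns.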

Why are the changes needed?

These changes are needed for proper support of default values for DSv2 connectors.

Does this PR introduce any user-facing change?

Users will be able to omit columns that have default values when writing to DSv2 tables. There is no impact on existing jobs.

How was this patch tested?

This patch comes with tests and adapts existing ones, including V2WriteAnalysisSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

aokolnychyi · 2025-02-21T19:01:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -3534,7 +3534,8 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
        TableOutputResolver.suitableForByNameCheck(v2Write.isByName,
          expected = v2Write.table.output, queryOutput = v2Write.query.output)
        val projection = TableOutputResolver.resolveOutputColumns(
-          v2Write.table.name, v2Write.table.output, v2Write.query, v2Write.isByName, conf)
+          v2Write.table.name, v2Write.table.output, v2Write.query, v2Write.isByName, conf,
+          supportColDefaultValue = true)


I don't think there is value in validating during writes whether the catalog declares SUPPORT_COLUMN_DEFAULT_VALUE in its capabilities. If a connector includes default value metadata in its schema, that should be enough to fill default values. The flag exists for ALTER and CREATE/REPLACE statements.
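
To illustrate the idea that default value metadata in the schema is sufficient on its own, here is a minimal, self-contained sketch of by-name resolution that fills in defaults for omitted columns. It is a toy model, not Spark's TableOutputResolver: Col, DefaultFillSketch, and resolveByName are hypothetical names, and extra columns in the input are simply ignored here.

```scala
// Toy model of by-name write resolution; NOT Spark's TableOutputResolver.
// All names here (Col, DefaultFillSketch, resolveByName) are hypothetical.
object DefaultFillSketch {
  // A table column with an optional default value taken from schema metadata.
  final case class Col(name: String, default: Option[Any] = None)

  // For each table column, take the value from the input row if present,
  // otherwise fall back to the column's default. Columns that have neither
  // are reported as missing. No catalog capability is consulted.
  def resolveByName(
      table: Seq[Col],
      row: Map[String, Any]): Either[String, Seq[Any]] = {
    val resolved: Seq[Either[String, Any]] = table.map { col =>
      row.get(col.name).orElse(col.default).toRight(col.name)
    }
    val missing = resolved.collect { case Left(name) => name }
    if (missing.nonEmpty) Left(s"Cannot find data for: ${missing.mkString(", ")}")
    else Right(resolved.collect { case Right(value) => value })
  }
}
```

In this sketch the decision to fill a value depends only on what the column itself carries, which mirrors the argument above that a separate capability check adds nothing at write time.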

aokolnychyi · 2025-02-21T19:05:37Z

sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala

@@ -718,6 +724,11 @@ private class BufferedRowsReader(
      schema: StructType,
      row: InternalRow): Any = {
    val index = schema.fieldIndex(field.name)
+
+    if (index >= row.numFields) {


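This is needed to support adding columns with default values at the end of the schema: rows buffered before the column was added have fewer fields than the current schema, so the field index can fall outside the row.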
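
The body of the new branch is not shown in the hunk above, so the following is only a sketch of one plausible way to handle a row that predates a trailing column. It assumes the reader falls back to the column's default (or null when there is none); Field, TrailingColumnSketch, and extract are hypothetical names and do not reproduce the actual test code.

```scala
// Toy model of reading a row that predates a newly added trailing column.
// Hypothetical names; this does not reproduce the code elided from the diff.
object TrailingColumnSketch {
  final case class Field(name: String, default: Option[Any] = None)

  // Rows are stored positionally. A row written before a column was appended
  // has fewer values than the current schema has fields.
  def extract(schema: Vector[Field], fieldName: String, row: Vector[Any]): Any = {
    val index = schema.indexWhere(_.name == fieldName)
    require(index >= 0, s"Unknown field: $fieldName")
    if (index >= row.length) {
      // The stored row has no slot for this field: fall back to the default,
      // or null when the field has none (an assumption of this sketch).
      schema(index).default.getOrElse(null)
    } else {
      row(index)
    }
  }
}
```

The bounds check is only valid for columns appended at the end, because inserting a column in the middle would shift the positions of the values already stored.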

aokolnychyi · 2025-02-21T23:16:11Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala

@@ -423,8 +423,8 @@ abstract class V2WriteAnalysisSuiteBase extends AnalysisTest {
    assertNotResolved(parsedPlan)
    assertAnalysisErrorCondition(
      parsedPlan,
-      expectedErrorCondition = "INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_FIND_DATA",
-      expectedMessageParameters = Map("tableName" -> "`table-name`", "colName" -> "`x`")
+      expectedErrorCondition = "INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_COLUMNS",


The expected error condition changes because of spark.sql.defaultColumn.useNullsForMissingDefaultValues. The new behavior is aligned with V1 writes.

aokolnychyi · 2025-02-21T23:21:07Z

cc @cloud-fan @szehon-ho @amaliujia @gengliangwang @dongjoon-hyun @viirya @huaxingao

[SPARK-51290][SQL] Enable filling default values in DSv2 writes

62c0840

github-actions bot added the SQL label Feb 21, 2025

Adapt tests

6ced18e
