
[FLINK-37245] Prevent masking null values in BinaryRowData #26100

Open · wants to merge 1 commit into master

Conversation

@mxm (Contributor) commented Feb 3, 2025

RowData#createFieldGetter is the go-to method for creating field getters for a given field type and position. The name FieldGetter#getFieldOrNull suggests that null is returned if the field has been set to null, but that is not always the case.

When a non-nullable field of a BinaryRowData has been set to null, a call to FieldGetter#getFieldOrNull returns a non-null value, interpreting whatever bytes back the field as an actual value instead of null.

Example:

  import org.apache.flink.table.data.GenericRowData;
  import org.apache.flink.table.data.RowData;
  import org.apache.flink.table.data.binary.BinaryRowData;
  import org.apache.flink.table.runtime.typeutils.RowDataSerializer;
  import org.apache.flink.table.types.logical.IntType;

  public static void main(String[] args) {
    IntType nullableIntType = new IntType(true);
    IntType nonNullableIntType = new IntType(false);
    RowDataSerializer rowDataSerializer = new RowDataSerializer(
            nullableIntType, nonNullableIntType
    );
    // Both fields are set to null, but only the first is declared nullable.
    BinaryRowData binaryRow = rowDataSerializer.toBinaryRow(GenericRowData.of(null, null));
    RowData.FieldGetter fieldGetter1 = RowData.createFieldGetter(nullableIntType, 0);
    RowData.FieldGetter fieldGetter2 = RowData.createFieldGetter(nonNullableIntType, 1);
    System.out.println(fieldGetter1.getFieldOrNull(binaryRow));
    System.out.println(fieldGetter2.getFieldOrNull(binaryRow));
  }

Output is:

null
0

The expected output would be that the second, non-nullable field also returns null, or raises a NullPointerException directly. That's not the case, because RowData#createFieldGetter only checks for null values (via a call to RowData#isNullAt(pos)) when the type is nullable.

It seems fair to always check for null fields instead of deferring this easy-to-forget check to the caller.
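The always-check behavior argued for here can be sketched as follows. This is a minimal illustration, not Flink's actual implementation: the RowData and FieldGetter interfaces below are reduced stand-ins containing only the members the example needs, and the getter factory is hypothetical.

```java
// Sketch of the proposed behavior for RowData.createFieldGetter: always
// consult the null bit, even when the declared type is non-nullable.
// The nested interfaces are minimal stand-ins, not the real Flink classes.
public class AlwaysCheckNullSketch {

    interface RowData {
        boolean isNullAt(int pos);
        int getInt(int pos);
    }

    interface FieldGetter {
        Object getFieldOrNull(RowData row);
    }

    /** Field getter for an INT field that honors the null bit unconditionally. */
    static FieldGetter createIntFieldGetter(int pos) {
        return row -> row.isNullAt(pos) ? null : row.getInt(pos);
    }

    public static void main(String[] args) {
        // Simulates a BinaryRowData whose null bit is set while the
        // fixed-length region still holds stale bytes (read back as 0).
        RowData nulledRow = new RowData() {
            @Override public boolean isNullAt(int pos) { return true; }
            @Override public int getInt(int pos) { return 0; }
        };
        System.out.println(createIntFieldGetter(1).getFieldOrNull(nulledRow)); // prints "null"
    }
}
```

Because the null check happens inside the getter, callers can no longer observe the stale backing bytes, regardless of the field's declared nullability.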

We needed to work around this bug in apache/iceberg#12049.

@flinkbot (Collaborator) commented Feb 3, 2025

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure — re-run the last Azure build

@davidradl (Contributor) left a comment


@mxm If I have understood this correctly, before the change a non-nullable field containing null would return a value. I am thinking this is correct: if a field is defined as non-nullable, then I would not expect to be able to get a null from it, so it seems reasonable to return a non-null value in this case. After the fix we can get a null from a non-nullable field, which does not seem correct. WDYT?

@mxm (Contributor, Author) commented Feb 6, 2025

I think there are different concerns here: (a) data integrity and (b) data semantics.

(a) Reading a field should always return its correct value. That's why the Flink runtime always checks isNullAt before attempting to read. In Iceberg, we don't always do that, because we rely on the FieldGetters for which this change is intended. If a field is null, it should read null. If a field is 42, it should read 42; but it shouldn't return 42 if its value is null. Unfortunately, this can happen without a null check when reading from BinaryRowData.
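The caller-side discipline described here — check isNullAt before every read — can be illustrated with a short sketch. The RowData interface below is a reduced stand-in with only the members the example needs, not Flink's actual class, and readIntField is a hypothetical helper.

```java
// Illustration of the defensive read pattern the Flink runtime uses:
// consult isNullAt before reading a field, so stale bytes behind a set
// null bit are never interpreted as a value.
public class CallerNullCheckSketch {

    interface RowData {
        boolean isNullAt(int pos);
        int getInt(int pos);
    }

    /** Reads an INT field defensively, checking the null bit first. */
    static Integer readIntField(RowData row, int pos) {
        if (row.isNullAt(pos)) {
            return null; // never touch the backing bytes of a nulled field
        }
        return row.getInt(pos);
    }

    public static void main(String[] args) {
        // Field 1 is nulled; its backing bytes (42) must not leak out.
        RowData row = new RowData() {
            @Override public boolean isNullAt(int pos) { return pos == 1; }
            @Override public int getInt(int pos) { return 42; }
        };
        System.out.println(readIntField(row, 0)); // prints "42"
        System.out.println(readIntField(row, 1)); // prints "null"
    }
}
```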

(b) Whether or not to allow null values for non-null types is a separate concern. This needs to be enforced by the layer that initializes the RowData. In Iceberg, there are actually writers which allow nulled "required" types.

Does that make sense?

@lincoln-lil (Contributor) commented

My intuitive understanding of the problem mentioned in Iceberg is: why not fix the nullability of the field?
RowData, as the basic structure of Flink SQL, follows SQL nullability semantics: a field declared not-null should not be filled with null values. On the other hand, if a nullable field has its null bit set, then its value can be treated as null directly, with no need to read the value in the MemorySegment (which is reused as much as possible in the SQL runtime implementation, saving some operations).
Accurate nullability is important to the SQL system, not only to ensure proper processing on the consumer side (e.g., the field may be used as the primary key of the target table, which requires not-null), but also to give the planner more optimization possibilities (e.g., predicate derivation; also, the runtime code can skip the null-check overhead for non-null fields, etc.).

@mxm (Contributor, Author) commented Feb 7, 2025

My intuitive understanding of the problem mentioned in Iceberg is: why not fix the nullability of the field?

In Iceberg, users can directly provide a stream of RowData, for which the precise schema is only known when writing the stream to the table. Once the table schema is available, Iceberg relies on the Flink FieldGetters to read values from the provided data, but it turns out that the result of FieldGetter#getFieldOrNull cannot be trusted. When a null field returns an arbitrary value, depending on what bytes happen to be stored for that field while the null flag is set, the result is plainly wrong.

Accurate nullability is important to the SQL system, not only to ensure proper processing on the consumer side (e.g., the field may be used as the primary key of the target table, which requires not-null), but also to give the planner more optimization possibilities (e.g., predicate derivation; also, the runtime code can skip the null-check overhead for non-null fields, etc.).

Thanks for highlighting the value of the non-null concept. I'm definitely not trying to get rid of non-null, but we need to be able to trust the values stored in RowData when using FieldGetters. If there are lower levels that don't want that check, that is fine too, they can directly operate on RowData and not use the FieldGetters.

What you say about skipping null checks is not in line with what Flink does. The Flink runtime performs null checks when reading from RowData, regardless of the nullability type. I believe the externally facing API of RowData#createFieldGetter(..) should do that as well.

I think it would be interesting to check how Flink handles user-provided null values for non-null types. I've seen the table.exec.sink.not-null-enforcer option, but it only applies to sinks. That wouldn't help in this case, because the null value would already have been resurrected by a missing isNullAt(..) check.

@lincoln-lil (Contributor) commented

For the first part, my question is: since the nullability info provided by users is not reliable, why not enforce the use of nullable types when using RowData#createFieldGetter? Am I missing something here?

Regarding Flink's current handling, you're right. I checked the code, and during the codegen phase, null checks are always added when accessing fields of Row types (not relying on FieldGetter).

@mxm (Contributor, Author) commented Feb 18, 2025

For the first part, my question is: since the nullability info provided by users is not reliable, why not enforce the use of nullable types when using RowData#createFieldGetter? Am I missing something here?

I understand where you are coming from, but not all Iceberg writers have the concept of non-null / "required" columns. A schema may have certain type definitions, but ultimately the writer decides what can be written.

4 participants