
Fix the nx_char type for numpy to and . #554

Open · wants to merge 4 commits into master
Conversation

@RubelMozumder (Collaborator) commented Feb 18, 2025

@lukaspie, you can also check this issue here.

@lukaspie (Collaborator) commented:

What is this solving? The linked issue is about integers, this is about strings/bytes.

Why don't we allow chararrays anymore?

@rettigl (Collaborator) commented Feb 18, 2025

Please see #555 and the way of solving this discussed there. We should drop the checks on array types generally and check the array dtype instead.

@RubelMozumder (Collaborator, Author) commented:

> What is this solving? The linked issue is about integers, this is about strings/bytes.
>
> Why don't we allow chararrays anymore?

chararray is a data structure like ndarray which stores data according to its primitive datatype. With the current implementation, we are not checking the data structure itself (list or ndarray), but rather the primitive types of that ndarray, such as int, bytes_, float, str_, bool.
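
For illustration, here is a minimal sketch (not the code from this PR) of what checking the primitive element type instead of the container type could look like; the helper name and the NX_CHAR type mapping are assumptions:

```python
import numpy as np

# Assumed mapping of NX_CHAR to Python/numpy scalar types (illustrative only).
NX_CHAR_TYPES = (str, np.str_, np.bytes_)

def elements_match_nx_char(value) -> bool:
    """Check the primitive element type, not the container (hypothetical helper)."""
    if isinstance(value, np.ndarray):
        # np.chararray is a subclass of np.ndarray, so checking the dtype kind
        # ('U' = unicode string, 'S' = byte string) covers both containers.
        return value.dtype.kind in ("U", "S")
    if isinstance(value, (list, tuple)):
        return all(isinstance(item, NX_CHAR_TYPES) for item in value)
    return isinstance(value, NX_CHAR_TYPES)

print(elements_match_nx_char(np.array(["a", "b"])))  # True, dtype kind 'U'
print(elements_match_nx_char(np.array([1, 2])))      # False, dtype kind 'i'
```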

@RubelMozumder (Collaborator, Author) commented:

@GinzburgLev

Can you remember which issue was handled in the modification https://github.com/FAIRmat-NFDI/pynxtools/blame/nx_char_type/src/pynxtools/dataconverter/helpers.py#L729?
Currently, this code converts an integer to a string and skips the validation warning.

Then I can also test my code for that issue as well.

@GinzburgLev (Contributor) commented:

> @GinzburgLev
>
> Can you remember which issue was handled in the modification https://github.com/FAIRmat-NFDI/pynxtools/blame/nx_char_type/src/pynxtools/dataconverter/helpers.py#L729? Currently, this code converts an integer to a string and skips the validation warning.
>
> Then I can also test my code for that issue as well.

This change was addressing the following issue: #393

The example from this issue should reproduce it: a NeXus file with one of the datasets of type NX_FLOAT filled with the integer 0 (such as the mpes example upload file, with sp.binned.attrs["metadata"]["energy_calibration"]["tof"] = 0 instead of 0.0). The is_valid_data_field() function in helpers.py successfully converted it to float(0.0), but this result was later interpreted as False (as in, failure to convert) by whatever was calling is_valid_data_field(). I can send you the files I used for testing (too big to be attached here).
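
To make the described pitfall concrete, here is a small, hypothetical illustration (plain Python, not the actual pynxtools code) of how a successful conversion to 0.0 can be mistaken for a failure when the caller only checks truthiness:

```python
def convert_to_float(value):
    """Hypothetical conversion helper that returns the converted value, or False on failure."""
    try:
        return float(value)          # the integer 0 converts fine -> 0.0
    except (TypeError, ValueError):
        return False                 # conversion really failed

result = convert_to_float(0)         # 0.0, a successful conversion
if not result:                       # but 0.0 is falsy, so this branch fires anyway
    print("treated as a failed conversion, although 0 -> 0.0 succeeded")

# Returning an explicit (success, value) pair, or testing `result is False`,
# keeps a valid 0.0 distinguishable from an actual failure.
```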

@RubelMozumder (Collaborator, Author) commented:

> @GinzburgLev
> Can you remember which issue was handled in the modification https://github.com/FAIRmat-NFDI/pynxtools/blame/nx_char_type/src/pynxtools/dataconverter/helpers.py#L729? Currently, this code converts an integer to a string and skips the validation warning.
> Then I can also test my code for that issue as well.

> This change was addressing the following issue: #393
>
> The example from this issue should reproduce it: a NeXus file with one of the datasets of type NX_FLOAT filled with the integer 0 (such as the mpes example upload file, with sp.binned.attrs["metadata"]["energy_calibration"]["tof"] = 0 instead of 0.0). The is_valid_data_field() function in helpers.py successfully converted it to float(0.0), but this result was later interpreted as False (as in, failure to convert) by whatever was calling is_valid_data_field(). I can send you the files I used for testing (too big to be attached here).

Thanks, I see it is not from your code. Can you help me to reproduce this error?

@GinzburgLev (Contributor) commented Feb 19, 2025

> Then I can also test my code for that issue as well.
>
> This change was addressing the following issue: #393
> The example from this issue should reproduce it: a NeXus file with one of the datasets of type NX_FLOAT filled with the integer 0 (such as the mpes example upload file, with sp.binned.attrs["metadata"]["energy_calibration"]["tof"] = 0 instead of 0.0). The is_valid_data_field() function in helpers.py successfully converted it to float(0.0), but this result was later interpreted as False (as in, failure to convert) by whatever was calling is_valid_data_field(). I can send you the files I used for testing (too big to be attached here).
>
> Thanks, I see it is not from your code. Can you help me to reproduce this error?

In the version of pynxtools that we had back then, I used dataconverter in the following way:

dataconverter --reader mpes --nxdl NXmpes raw_files/mZlhpEzOQY6Ms4hJbIIu3g/config_file.json raw_files/mZlhpEzOQY6Ms4hJbIIu3g/MoTe2_float.h5

dataconverter --reader mpes --nxdl NXmpes raw_files/mZlhpEzOQY6Ms4hJbIIu3g/config_file.json raw_files/mZlhpEzOQY6Ms4hJbIIu3g/MoTe2_int.h5

The first one worked fine, no errors; the second gave something like "Field /ENTRY[entry]/PROCESS_MPES[process]/energy_calibration/original_axis written without documentation." (aside from the usual messages). The resulting file was missing the corresponding field (or its value, I am not sure anymore) (output.nxs/entry/process/energy_calibration/original_axis).

The issue was present before this PR was merged:
#522

@RubelMozumder (Collaborator, Author) commented Feb 20, 2025

Currently, the verification of the NX data type no longer converts the data to another data type; instead, it raises a warning for any data type inconsistency. For example, the value 2.0 for NX_INT would not be converted to 2, and vice versa. In NeXus we have the specific data types NX_INT and NX_FLOAT for this case. If 2.0 and 2 are to be treated as equivalent, one can use NX_NUMBER instead.

Such conversion creates issues in some cases, e.g. numbers > 1 being converted to True, or a string such as "2" being converted to the int/float 2/2.0; the validator then silently passes them without any warning messages. Such an issue was also observed in #555. In that issue, a vector attribute has been defined as NX_CHAR in the application definition, which is incorrect. As the validator does not create any warning message here, nobody can detect that error coming from the application definition.
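
As a plain-Python illustration of the silent coercions described above (this is standard Python/numpy behaviour, not pynxtools code), all of the following "succeed" even though the input does not match the intended type:

```python
import numpy as np

# Each of these coercions succeeds silently; without an explicit type check,
# a validator relying on conversion alone would accept all of them.
print(bool(2))                          # True  -> any number > 1 becomes a "valid" bool
print(float("2"))                       # 2.0   -> a string becomes a "valid" float
print(int("2"))                         # 2     -> a string becomes a "valid" int
print(str(np.array([1.0, 0.0, 0.0])))   # "[1. 0. 0.]" -> a vector could pass an NX_CHAR check based on str()
```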

@RubelMozumder linked an issue Feb 20, 2025 that may be closed by this pull request
@RubelMozumder (Collaborator, Author) commented:

  • Nyaml needs to be fixed and merged before merging this PR, so that we can circumvent the errors.

@lukaspie (Collaborator) left a comment:


LGTM, thanks for the fix. Left some small comments.

Let's wait with the merge until we have the correct definitions here and the plugin tests pass.

Comment on lines +599 to +600
# Not to be confused with `np.byte` and `np.ubyte`, these store
# and integer of `8bit` and `unsigned 8bit` respectively.
(Collaborator):

Suggested change:
- # Not to be confused with `np.byte` and `np.ubyte`, these store
- # and integer of `8bit` and `unsigned 8bit` respectively.
+ # Not to be confused with `np.byte` and `np.ubyte`, these store
+ # integers of `8bit` and `unsigned 8bit` respectively.

What exactly is not to be confused? Maybe write "np.xxx is not to be confused..."
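
For context on what the comment is distinguishing (standard numpy facts, not code from this PR): np.byte and np.ubyte are 8-bit integer types, whereas np.bytes_ is the byte-string type relevant for NX_CHAR.

```python
import numpy as np

print(np.byte is np.int8)            # True  -> signed 8-bit integer
print(np.ubyte is np.uint8)          # True  -> unsigned 8-bit integer
print(issubclass(np.bytes_, bytes))  # True  -> a byte string, not an integer
```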

    return False

def check_all_children_for_callable(
    objects: Union[list, np.ndarray],
    checker: Optional[Callable] = None,
(Collaborator):

I suggest renaming this to callable.

Comment on lines +712 to +713
This function also converts bool value comes in str format. In case, it fails to
convert, it raises an Exception.
(Collaborator):

Suggested change:
- This function also converts bool value comes in str format. In case, it fails to
- convert, it raises an Exception.
+ This function also converts boolean value that are given as strings (i.e., "True" to True).

It doesn't really raise an Exception, but just a ValidationProblem.InvalidDatetime warning. What were you trying to say here?

    collector.collect_and_log(
        path, ValidationProblem.InvalidType, accepted_types, nxdl_type
    )
    return False, value
(Collaborator):

This return value of False here is interpreted as "undocumented field" in the calling function, so we get an additional, wrong warning if the dtype does not match.

(Collaborator, Author):

I see that it is just for internal logic (please see lines 750-766 in validation.py). In is_documented it checks the datatype and the node type (field or group), but the real type of the errors is collected and printed according to the validation error type in and from the `collect` function.

(Collaborator):

I don't understand what you are trying to say here. My point is that is_documented returns False if the datatype does not match, which I would say is wrong, because it is documented, but just has the wrong data type.

(Collaborator):

This behavior was btw. introduced in #522, I believe. Maybe @GinzburgLev or @lukaspie can comment?

Comment on lines +715 to +717
Returns two values:
boolean (True if the the value corresponds to nxdl_type, False otherwise)
converted_value bool value.
(Collaborator):

Add typing annotation. The calling function expects a single bool.
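
A sketch of what the requested annotation could look like, assuming the (bool, converted value) return described in the docstring; the parameter names here are placeholders, not necessarily the real signature:

```python
from typing import Any, Tuple

def is_valid_data_field(value: Any, nxdl_type: str, path: str) -> Tuple[bool, Any]:
    """Return (is_valid, possibly converted value); placeholder signature for illustration."""
    ...
```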

(Collaborator):

The return handling is correct; nevertheless, I suggest adding typing everywhere.

(Collaborator):

Adjust all occurrences of is_valid_data_field (e.g. line 552).

(Collaborator):

@RubelMozumder @sanbrock Returning False in L552 if the data type of a field does not fit effectively makes this field "undocumented". Is this intentional? The produced warnings, at least, are rather contradictory:

WARNING: The value at /ENTRY[entry]/data/@axes should be one of: (<class 'str'>, <class 'numpy.str_'>, <class 'numpy.bytes_'>), as defined in the NXDL as NX_CHAR.
WARNING: Field /ENTRY[entry]/data/@axes written without documentation.

(Collaborator):

@sanbrock @sherjeelshabih Also, apparently during this second round of checking for undocumented fields that were not found before, no checks for enumerations are done. I don't completely understand the logic of this checking, but I suggest that a check for enums should happen here as well.
Why can't we re-use the mechanism in recurse_tree here, i.e. the handling_map and corresponding functions?

    if not check(obj, *args):
        return False

def check_all_children_for_callable(
    objects: Union[list, np.ndarray],
(Collaborator):

We should extend this to all iterable data types (e.g. tuples, ...)
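
One possible way to cover arbitrary iterables (tuples, sets, generators, ...) instead of only list and np.ndarray; this is a hedged sketch of the idea, not the code in this PR, and the exact signature is assumed:

```python
from collections.abc import Iterable
from typing import Any, Callable, Optional, Union
import numpy as np

def check_all_children_for_callable(
    objects: Union[Iterable, np.ndarray],
    checker: Optional[Callable[..., bool]] = None,
    *args: Any,
) -> bool:
    """Apply `checker` to every element of any iterable container (sketch)."""
    if checker is None:
        return True
    if isinstance(objects, np.ndarray):
        objects = objects.ravel()  # flatten so each element is a scalar
    # Strings/bytes are iterable but should be treated as scalar values here.
    if isinstance(objects, (str, bytes)) or not isinstance(objects, Iterable):
        return bool(checker(objects, *args))
    return all(checker(obj, *args) for obj in objects)
```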

    tmp_arr = None
    if isinstance(objects, list):
        # Handles list and list of list
        tmp_arr = np.array(objects)
(Collaborator):

Will this work if the dtypes within a list are not the same?
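
The concern can be illustrated with plain numpy behaviour (not code from this PR): np.array() on a mixed-type list either promotes everything to a common dtype or falls back to dtype=object, so a single dtype check may not reflect the original element types.

```python
import numpy as np

print(np.array([1, 2.5]).dtype)        # float64 -> the int was silently promoted
print(np.array([1, "a"]).dtype)        # <U21    -> everything became a unicode string
print(np.array([1, {"a": 1}]).dtype)   # object  -> per-element checks would still be needed
```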

@rettigl (Collaborator) commented Feb 27, 2025

@RubelMozumder @sherjeelshabih I suggest we meet and discuss how to handle this PR and #557. It does not make sense to independently develop different solutions for the same problem.

@sherjeelshabih (Collaborator) commented:

Agreed. I already plan to merge what Rubel has here with some changes I want to keep in the other branch.

Successfully merging this pull request may close these issues: Datatype checking for array dtype.