-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the nx_char type for numpy to and . #554
base: master
Are you sure you want to change the base?
Conversation
What is this solving? The linked issue is about integers, this is about strings/bytes. Why don't we allow chararrays anymore? |
Please see #555 and the discussed way for solving this. We should remove arrays generally, and check the array dtype instead. |
chararry is a data structure like |
Can you remember which issue was handled in the modification https://github.com/FAIRmat-NFDI/pynxtools/blame/nx_char_type/src/pynxtools/dataconverter/helpers.py#L729 ? Then I can also test my code for that issue as well. |
This change was addressing the following issue: #393 The example from this issue should do: nexus file with one of the datasets with type NX_FLOAT, filled with integer = 0 (such as mpes example upload file, with sp.binned.attrs["metadata"]["energy_calibration"]["tof"] = 0 instead of 0.0). The is_valid_data_field() function in helpers.py successfully converted it to float(0.0), but this result was later interpreted as False (as in, failure to convert) by whatever was calling is_valid_data_field(). I can send you the files I used for testing (to big to be attached here). |
Thanks, I see it is not from your code. Can you help me to reproduce this error? |
In the version of pynxtools that we had back then, I used dataconverter in the following way:
The first one worked fine, no errors; the second gave something like "Field /ENTRY[entry]/PROCESS_MPES[process]/energy_calibration/original_axis written without documentation." (aside from usual messages). The resulting file was missing corresponding field (or its value, I am not sure anymore) (output.nxs/entry/process/energy_calibration/original_axis) The issue was present before this PR was merged: |
Currently, verification of NX data type does not convert the data to other data type anymore rather pop up an warning for any data type inconsistency. For example data Such conversion creates issues in some cases such as for numbers |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix. Left some small comments.
Let's wait with the merge until we have the correct definitions here and the plugin tests pass.
# Not to be confused with `np.byte` and `np.ubyte`, these store | ||
# and integer of `8bit` and `unsigned 8bit` respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Not to be confused with `np.byte` and `np.ubyte`, these store | |
# and integer of `8bit` and `unsigned 8bit` respectively. | |
# Not to be confused with `np.byte` and `np.ubyte`, these store | |
# integers of `8bit` and `unsigned 8bit` respectively. |
What exactly is not to be confused? Maybe write "np.xxx is not to be confused..."
return False | ||
def check_all_children_for_callable( | ||
objects: Union[list, np.ndarray], | ||
checker: Optional[Callable] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to rename this to callable
This function also converts bool value comes in str format. In case, it fails to | ||
convert, it raises an Exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function also converts bool value comes in str format. In case, it fails to | |
convert, it raises an Exception. | |
This function also converts boolean value that are given as strings (i.e., "True" to True). |
It doesn't really raise an Exception, but just a ValidationProblem.InvalidDatetime
warning. What were you trying to say here?
collector.collect_and_log( | ||
path, ValidationProblem.InvalidType, accepted_types, nxdl_type | ||
) | ||
return False, value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return value False here is interpreted as "undocumented field" in the calling function. So we get an additional wrong warning if the dtype does not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it is just for internal logic (please see line 750-766 in validation.py. In
is_docmentedit checks the datatype, node type (is field or group). But the real type of the errors are collect and print according to the validation error type in and from
collect` function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you are trying to say here. My point is that is_documented returns False if the datatype does not match, which I would say is wrong, because it is documented, but just has the wrong data type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior was btw. introduced in #522, I believe. Maybe @GinzburgLev or @lukaspie can comment?
Returns two values: | ||
boolean (True if the the value corresponds to nxdl_type, False otherwise) | ||
converted_value bool value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add typing annotation. The calling function expects a single bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return handling is correct, nevertheless I suggest to add typing everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjust all occurences of is_valid_data_field (e.g. line 552)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RubelMozumder @sanbrock Returning False in L552 if the data type of a field does not fit effectively makes this field "undocumented". Is this intentional? The produced warnings at least are rather contradictive:
WARNING: The value at /ENTRY[entry]/data/@axes should be one of: (<class 'str'>, <class 'numpy.str_'>, <class 'numpy.bytes_'>), as defined in the NXDL as NX_CHAR.
WARNING: Field /ENTRY[entry]/data/@axes written without documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanbrock @sherjeelshabih Also, apparently during this second round of checking for undocumented fields, that are not being found before, no checks for enumerations are being done. I don't really understand the logic of this checking completely, but I suggest a checking for enums should happen here as well.
Why can't we re-use the mechanism in recurse_tree here, i.e. the handling_map and corresponding functions?
if not check(obj, *args): | ||
return False | ||
def check_all_children_for_callable( | ||
objects: Union[list, np.ndarray], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extend this to all iterable data types (e.g. tuples, ...)
tmp_arr = None | ||
if isinstance(objects, list): | ||
# Handles list and list of list | ||
tmp_arr = np.array(objects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work of the dtypes within a list are not the same?
@RubelMozumder @sherjeelshabih I suggest to meet and discuss how to handle this PR and #557 . It does not make sense to independently develop different solutions for the same problem. |
Agreed. I already plan to merge what Rubel has here with some changes I want to keep in the other branch. |
@lukaspie, you can also check this issue with here.