Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for open enumerations #1521

Merged
merged 11 commits into from
Feb 13, 2025
Merged

Allow for open enumerations #1521

merged 11 commits into from
Feb 13, 2025

Conversation

lukaspie
Copy link
Contributor

@lukaspie lukaspie commented Dec 10, 2024

The idea of this PR is to allow for open enumerations, i.e., to allow values not explicitly given in the enumeration list. An example is the type field in NXsource (https://manual.nexusformat.org/classes/base_classes/NXsource.html#nxsource-type-field) which is too restrictive for some experiments.

Note that by default enumerations will be closed as for some fields/attributes, the values given in the enumerations really are the only sensible ones (a prominent example being NXentry/definition field in most application definitions.

  • Implement in nxdl.xsd
  • Adapt sphinx to render it nicely. This is an example rendering of NXsource/type which is now an open enum:
    image

Note that for some reason, the changes to the nxdl.xsd did not show up in my local build (would be in nxdl_desc.html#enumerationtype), so maybe this implementation is not yet correct. I could appreciate someone with more knowdledge of the NXDl language to chime in. This is also why this is currently still a draft.
EDIT: The changes to the nxdl.xsd are now also showing up in the HTML description of the NXDL:
image

Closes #1520

@lukaspie lukaspie linked an issue Dec 10, 2024 that may be closed by this pull request
3 tasks
@lukaspie lukaspie changed the title Allow for Open enumerations Allow for open enumerations Dec 10, 2024
@phyy-nx
Copy link
Contributor

phyy-nx commented Dec 10, 2024

There are a few places we want to consider open enumerations. #1486 adds identifierNAME and #1407 added changes to NXsource that would merit an open enumeration. @lukaspie you have made the latter change, but the former could be part of this PR as well. Consider rebasing this PR on the #1486 branch? Unless there was a third location we were considering?

Also in both locations, I think just adding open="true" is not enough. Consider adding language such as this: "Users are encouraged to use options from this list. However if your (identifier type or source type) is not listed, users can put their own values here. In that case, we encourage users to contact the NeXus developers to see if your (identifier type or source type) can be added to this list." That's a lot of words, I realize, but maybe there's a more succinct way to say this?

@lukaspie lukaspie force-pushed the 1520-open-enumerations branch from 1952b31 to ea0c66f Compare December 11, 2024 11:33
@lukaspie
Copy link
Contributor Author

There are a few places we want to consider open enumerations. #1486 adds identifierNAME and #1407 added changes to NXsource that would merit an open enumeration. @lukaspie you have made the latter change, but the former could be part of this PR as well. Consider rebasing this PR on the #1486 branch? Unless there was a third location we were considering?

I brought in the changes from #1486 now and already make the enum for the identifier type open (with the possibility of reverting if #1486 yields a different outcome what was is proposed there)

Afterwards, I went digging a bit for possible other uses of open enumerations within existing base classes / application definitions, using this search in the GitHub GUI

repo:nexusformat/definitions enumeration (path:/^applications\// OR path:/^base_classes\//)

resulting in these search results.

Here is a list of concepts where it might make sense to reconsider if the provided values are complete. I also added a bit of reasoning why this might be a good place for an open enumeration:

# Concept (with link) Reasoning
1 NXsensor/measurement There are many more sensor measurements that are possible (e.g. current, mechanical properties, etc.) and I don't see a way of making this list complete.
2 NXsource/mode It is already stated with a comment there that other sources could add to this list.
3 NXsource/target_material It is unclear to me if these are really all the possible target materials. This one is probably not urgent.

I think we could do 1 and 2 together with this PR and then subsequent changes to make other enumerations open in separate PRs. What do you think?

Also in both locations, I think just adding open="true" is not enough. Consider adding language such as this: "Users are encouraged to use options from this list. However if your (identifier type or source type) is not listed, users can put their own values here. In that case, we encourage users to contact the NeXus developers to see if your (identifier type or source type) can be added to this list." That's a lot of words, I realize, but maybe there's a more succinct way to say this?

I agree that would be helpful. Do you want to do this in the XML definitions themselves or just in the HTML rendering? Currently, I only modified the HTML, so if you look at the XML, you would only see the attribute open="true". We can add it in the XML as well, but then it must be done in every open enumeration separately. I would be open to both possibilities.

@lukaspie lukaspie added enhancement NIAC has requested The NIAC has requested this issue to be considered labels Dec 11, 2024
@lukaspie lukaspie requested a review from a team December 19, 2024 12:09
@lukaspie lukaspie force-pushed the 1520-open-enumerations branch from b464848 to 5a7fecb Compare January 7, 2025 10:34
@lukaspie lukaspie marked this pull request as ready for review January 7, 2025 10:38
@lukaspie lukaspie force-pushed the 1520-open-enumerations branch from 542f065 to b975817 Compare January 7, 2025 10:39
@phyy-nx
Copy link
Contributor

phyy-nx commented Jan 16, 2025

4 possible routes:

  1. Accept but make validators print a verbose-hidden info message if custom values are used for open enumerations
  2. Accept but make validators warn if custom values are used for open enumerations
  3. Middle ground: when using custom, require a flag like 'custom=True' in the data file (would limit typos)
  4. Note that validators could use a custom controlled vocabulary for a given facility

Straw poll in Telco: 3: 10 votes, 1: 3 votes, 4: 1 vote.

@lukaspie if you could implement 3 here, then we'll vote on this PR.

Any of these values or a custom value: becomes Any of these values or a custom value (if you use a custom value, also set @custom=True):

Plus the corresponding change to nxdl.xsd. Note if open=False, custom must = False

@lukaspie lukaspie force-pushed the 1520-open-enumerations branch from 4d7ecf7 to 6b70ea4 Compare January 17, 2025 14:09
mkuehbach pushed a commit to FAIRmat-NFDI/nexus_definitions that referenced this pull request Jan 17, 2025
@phyy-nx
Copy link
Contributor

phyy-nx commented Jan 27, 2025

Hi all, as discussed in the Telco, we can now move this PR to an online vote. NIAC committee members please vote on this PR using emojis on this comment. 👍 for yes, 👎 for no, anything else (for example 👀) to abstain. We need 14 votes to hit quorum so please review and vote!

@paulmillar
Copy link

I was wondering whether NIAC should have an opinion on introducing new terms as specialisation of existing terms.

Here is a made-up example.

Suppose, a particular NeXus file could be described using an NXsource/type with value "Pulsed Muon Source"; however, for this file, the source could be more accurately described as a "Ultra-fast Pulsed Muon Source". There may be some distinction between the more general term ("Pulsed Muon Source") and the more specific term ("Ultra-fast Pulsed Muon Source"). There could be some benefits from describing the file using the more precise term. The open enumeration would allow someone simply to use the more precise term ("Ultra-fast Pulsed Muon Source").

However, introducing a new term also has some negative aspects:

  • Creating such a specialisation would make it harder to search for pulsed muon sources (of any type) as this search would need to include both terms: "Pulsed Muon Source" and "Ultra-fast Pulsed Muon Source". Somehow, the agent (human or computer) searching would need to know that the latter is a specialisation of the former.
  • It also throws into question the definition of the more general term ("Pulsed Muon Source"). With the specific term ("Ultra-fast Pulsed Muon Source") now in used, it would be tempting to consider the general term ("Pulsed Muon Source") as an indication that the specific term ("Ultra-fast Pulsed Muon Source") doesn't apply -- if it did apply then the dataset would be described by the more specific term. However, there could easily exist datasets that have been described used the more general term ("Pulsed Muon Source"), but that could have been better described using the more specific term ("Ultra-fast Pulsed Muon Source"), had the more specific term existed when the file was written.

Introducing a new term as a specialisation ("Ultra-fast Pulsed Muon Source" vs "Pulsed Muon Source") allows for more accurate description, but introduces complexity. This creates a tension. The decision on whether (on balance) it is "better" to use an existing, broader term or introduce a newer, narrower term is non-trivial.

Given this is an open enumeration, the person introducing the new term (i.e., anyone writing a NeXus file) might not being aware of the consequences of their actions.

This is a place where NIAC could offer advice; e.g., through the documentation.

For example, the documentation could say something to the effect:

In general you should use an existing term in the enumeration where possible; new terms should describe situations that are not covered by the existing terms.

If we want, we could put in something stronger.

Please contact NIAC before introducing a new term that is a more specific version of an existing term.

@paulmillar
Copy link

Another thought: do we want to request that new values for open enumerations are reported back to NIAC?

With this information, we could add the new value(s) to the list in an updated version of NeXus. Doing this would reduce the risk of multiple incompatible values being selected by different facilities.

Alternatively, we could hold a list of enumeration values in lists outside of NeXus standard. This could allow for a faster process through which new values become visible to everyone.

@phyy-nx
Copy link
Contributor

phyy-nx commented Feb 3, 2025

@paulmillar thanks for your comments, but they make it unclear as to what is happening in this PR right now. This PR is under review and in a voting period. Do you want your comments to imply that this PR needs changes? I assume not because you voted in favor of the proposal. If not, can you move your comments to a new issue?

Again, I want to emphasize my appreciation of your review and your comments. It's great to get scrutiny and feedback.

@paulmillar
Copy link

@phyy-nx ,
Sorry for the confusion. The proposal is fine as-is, and gets a 👍 from me.

My comments are intended to describe ways in which (I think) the proposed changes could be improved. These are topics that (I think) NIAC should discuss. However, they are not intended to block this PR. Instead, they may be addressed after the PR is accepted.

The idea of creating issues on these comments sounds good to me. However, I propose only doing that once the PR is accepted. (Open an issue against something that isn't yet in the repo seems a little odd.)

Would that be OK?

@phyy-nx
Copy link
Contributor

phyy-nx commented Feb 3, 2025

No worries. The online voting method is new to the NIAC. It's totally fine if you'd like to do a issue after the PR is accepted. Thanks!

@phyy-nx
Copy link
Contributor

phyy-nx commented Feb 12, 2025

The vote has passed. Thanks all. @lukaspie please merge at your convenience.

@paulmillar please be sure any issues you raised that you'd like to pursue are moved to a new issue. Thanks.

@lukaspie lukaspie merged commit 545661e into main Feb 13, 2025
3 checks passed
@lukaspie lukaspie deleted the 1520-open-enumerations branch February 13, 2025 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement NIAC vote needed PR needs an approving vote from NIAC before merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Open enumerations
3 participants