Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerated properties should have to/from ICU4C APIs #6067

Open
Manishearth opened this issue Feb 4, 2025 · 12 comments
Open

Enumerated properties should have to/from ICU4C APIs #6067

Manishearth opened this issue Feb 4, 2025 · 12 comments
Labels
C-unicode Component: Props, sets, tries needs-approval One or more stakeholders need to approve proposal

Comments

@Manishearth
Copy link
Member

Our enumerated properties are opaque wrappers around integers, using the same values used by ICU4C.

It's reasonable to abstract away our internal implementation, but the ICU4C values are pretty standard across i18n code, and it's reasonable to wish to access these values.

We should offer to/from u8 (or u16) APIs.

I propose we name them to_icu4c_value() and from_icu4c_value(), linking to ICU4C docs. There is no fallibility, unknown values are accepted.

I am not opposed to just having Into/From and documenting the impl as stably producing ICU4C values.

cc @sffc @echeran @robertbastian

@Manishearth Manishearth added the C-unicode Component: Props, sets, tries label Feb 4, 2025
@Manishearth Manishearth added this to the ICU4X 2.0 Stretch ⟨P2⟩ milestone Feb 4, 2025
@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Feb 4, 2025
@sffc
Copy link
Member

sffc commented Feb 4, 2025

I'm fine adding these, since as you observed this is already part of our observable behavior via TrieValue. Prefer explicitly-named functions as usual.

Perhaps to_icu4c_int or to_icu4c_discriminant?

@Manishearth
Copy link
Member Author

Fine by me

@sffc
Copy link
Member

sffc commented Feb 11, 2025

The PR uses the names in the OP. I don't have a record of us deciding on those names over #6067 (comment).

@robertbastian
Copy link
Member

I implemented @Manishearth's proposal as you proposed two names. I'll leave this open if you want to find-replace the names.

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting and removed needs-approval One or more stakeholders need to approve proposal labels Feb 11, 2025
@sffc
Copy link
Member

sffc commented Feb 11, 2025

My impression of the names:

  1. to_icu4c_value sounds like it should return a ffi type for directly interoperating with ICU4C. But it's also fine.
  2. to_icu4c_discriminant is the most accurate description of what the function returns. It also sounds more strange, which is fine, because most users won't need this function unless they specifically want the ICU4C discriminant for some reason (e.g. harfbuzz).
  3. to_icu4c_int is also correct, but maybe less clear.

So my preference on the name would probably be: 2 > 1 > 3

@Manishearth
Copy link
Member Author

I like 2 as well. No strong opinion, 2 ~> 1 > 3.

@echeran
Copy link
Contributor

echeran commented Feb 11, 2025

I'm fine adding these, since as you observed this is already part of our observable behavior via TrieValue. Prefer explicitly-named functions as usual.

Perhaps to_icu4c_int or to_icu4c_discriminant?

I like to_icu4c_int. ICU4C expresses the values as int32_t. I do not prefer the alternative option _discriminant because that is a term I only see in Rust, and it is not used in ICU4C. I do not prefer the OP option to_icu4c_value because "value" is an overloaded term that could refer to the Unicode Property Value or the integer value that one might get back from by the ICU4C CPTrie

@sffc
Copy link
Member

sffc commented Feb 11, 2025

Another suggestion: to_raw_icu4c_value

@sffc
Copy link
Member

sffc commented Feb 14, 2025

@eggrobin pointed out that C enums are integers; there is no casting involved. However, @Manishearth points out that this depends on -fpermissive in C++.

With this information, I am slightly more okay with _value. My vote changes to:

Choices:

  1. to_icu4c_value
  2. to_icu4c_discriminant
  3. to_icu4c_int
  4. to_raw_icu4c_value

Vote:

  • @sffc: to_raw_icu4c_value ~> to_icu4c_discriminant ~> to_icu4c_value ~> to_icu4c_int

@sffc sffc added needs-approval One or more stakeholders need to approve proposal and removed discuss Discuss at a future ICU4X-SC meeting labels Feb 14, 2025
@Manishearth
Copy link
Member Author

@Manishearth: to_icu4c_value ~> to_icu4c_discriminant > to_icu4c_int ~>> to_raw_icu4c_value

(I do not think that raw is necessary and it feels like additional info to confuse people)

@eggrobin
Copy link
Member

@Manishearth points out that this depends on -fpermissive in C++.

That is a GCC flag. In C++, you need a cast; there is no such thing as fpermissive in clang or msvc.

@sffc sffc removed their assignment Feb 20, 2025
@sffc
Copy link
Member

sffc commented Feb 20, 2025

@echeran Please vote.

Manishearth added a commit to Manishearth/icu4x that referenced this issue Feb 21, 2025
They can be re-renamed when unicode-org#6067 is fixed, if desired.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-unicode Component: Props, sets, tries needs-approval One or more stakeholders need to approve proposal
Projects
None yet
Development

No branches or pull requests

5 participants