
Region is validated, but language is not #959

Open
thany opened this issue Feb 12, 2025 · 5 comments
Labels
c: locale (Component: locale identifiers), question

Comments

thany commented Feb 12, 2025

Synopsis

When constructing a new Intl.Locale object, any language subtag can be passed, but not every region subtag can.

Example code

Take, for example, a nonexistent language in an existing region:

let locale = new Intl.Locale('xxxxx-nl');
-> Intl.Locale { baseName: "xxxxx-NL", numeric: false, language: "xxxxx", region: "NL" }

So this works, presumably by assuming a custom language. The other way round, 'xxxxx' gets ignored:

let locale = new Intl.Locale('nl-xxxxx');
-> Intl.Locale { baseName: "nl-xxxxx", numeric: false, language: "nl" }

At least, it is ignored in the sense that 'xxxxx' is not treated as the region. If we pass the region explicitly, it fails:

let locale = new Intl.Locale('nl', { region: 'xxxxx' });
-> Uncaught RangeError: invalid value "xxxxx" for option region

Takeaways

  1. Lenient parsing for language - any language is allowed.
  2. Strict parsing for region - it is restricted to a supposed internal list of valid names.
  3. An unknown region in the locale string is ignored - probably assumed to be an arbitrary suffix, not a region.

Why is this an issue?

Languages do not evolve as quickly as (political) geographical regions do. This could mean that when a new region emerges, perhaps after a dispute is settled, no browser will accept it. Some kind of update would be required before a newly formed region is valid in a locale identifier.

This also means older browsers will treat recently emerged regions as invalid, which might offend the people living there.

But also, since languages evolve much more slowly than regions, it seems backwards to me that the language in a locale identifier is not validated at all. Presumably this is so that a custom or esoteric language can be specified (Vulcan, say), but then why isn't the same true for regions?

I'm guessing this is done because otherwise the parser can't tell which kind of subtag follows the first dash. For example, new Intl.Locale('nl-Latn') still has no region; 'Latn' is a script instead. But when properties are passed explicitly, as in the third example, the region property doesn't need to be parsed at all, so an unknown value could safely be treated as custom (or extraterrestrial). For me, this raises the question again: why must a region adhere to a predefined list of values, while the language is free to be anything at all?

eemeli (Member) commented Feb 12, 2025

This might be a duplicate of #951?

sffc added the c: locale (Component: locale identifiers) label Feb 12, 2025
sffc (Contributor) commented Feb 12, 2025

> Strict parsing for region - it is restricted to a supposed internal list of valid names.

Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).
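To illustrate the distinction, here is a minimal sketch of a purely syntactic region check, assuming the unicode_region_subtag production from UTS #35 (two ASCII letters or three digits, case-insensitive). isWellFormedRegion is a hypothetical helper, not part of Intl, and it does not model the alias canonicalization (for example of deprecated codes) that Intl.Locale also performs.

```javascript
// Hypothetical helper, assuming unicode_region_subtag = alpha{2} | digit{3}
// (UTS #35). It checks only the shape of the subtag; it does not consult
// any list of assigned regions.
function isWellFormedRegion(region) {
  return /^(?:[A-Za-z]{2}|[0-9]{3})$/.test(region);
}

// "JJ" is well-formed even though no region is assigned that code;
// "xxxxx" and "N1" are rejected purely because of their shape.
for (const region of ["NL", "nl", "419", "JJ", "xxxxx", "N1"]) {
  console.log(region, isWellFormedRegion(region));
}

// Intl.Locale rejects the malformed value from the issue with a RangeError.
try {
  new Intl.Locale("nl", { region: "xxxxx" });
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```

Under this sketch, any two-letter or three-digit code passes, whether or not it is currently assigned to a region.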

sffc added the question label Feb 12, 2025
anba (Contributor) commented Feb 12, 2025

The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.
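Because subtags carry no labels, their role follows from position and shape. A rough sketch, assuming the unicode_language_id grammar from UTS #35 with "-" as the only separator (language, then an optional four-letter script, an optional region, then variants). splitLanguageId is hypothetical; it skips extensions, private-use subtags, duplicate-variant checks and case canonicalization, all of which Intl.Locale handles.

```javascript
// Subtag shapes, assuming the unicode_language_id grammar of UTS #35.
const LANGUAGE = /^(?:[a-z]{2,3}|[a-z]{5,8})$/i;
const SCRIPT = /^[a-z]{4}$/i;
const REGION = /^(?:[a-z]{2}|[0-9]{3})$/i;
const VARIANT = /^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/i;

// Hypothetical helper: assigns each subtag a role by position and shape.
function splitLanguageId(tag) {
  const parts = tag.split("-");
  if (!LANGUAGE.test(parts[0])) {
    throw new RangeError(`invalid language subtag in "${tag}"`);
  }
  const result = { language: parts[0], script: undefined, region: undefined, variants: [] };
  let i = 1;
  if (i < parts.length && SCRIPT.test(parts[i])) result.script = parts[i++];
  if (i < parts.length && REGION.test(parts[i])) result.region = parts[i++];
  while (i < parts.length && VARIANT.test(parts[i])) result.variants.push(parts[i++]);
  if (i < parts.length) {
    throw new RangeError(`unexpected subtag "${parts[i]}" in "${tag}"`);
  }
  return result;
}

// Five letters after the language cannot be a script or a region,
// so the only remaining role is a variant.
console.log(splitLanguageId("nl-xxxxx").variants); // [ 'xxxxx' ]
console.log(splitLanguageId("nl-xxxxx").region); // undefined
console.log(splitLanguageId("xxxxx-nl").region); // nl
console.log(splitLanguageId("nl-Latn").script); // Latn
```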

thany (Author) commented Feb 13, 2025

> The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.

I already alluded to something like that, to quote myself:

> probably assumed to be an arbitrary suffix, not a region.

The issue can be reduced to a single question really:

Why is language not validated, and region is?

> Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).

Fair enough. It doesn't even matter to the discussion what exactly the region is validated against, be it a list, a pattern or something else. My issue is the discrepancy in validation (or the lack thereof) between language and region. My simple mind says: "whatever the reason for not validating languages is, it applies equally, if not more, to regions."

Maybe there's a perfectly good reason why custom languages are allowed and custom regions aren't, but I haven't come across it.

anba (Contributor) commented Feb 13, 2025

Language subtags are 2-3 or 5-8 alphabetic characters.

Code                                     Result
new Intl.Locale("x").language            RangeError
new Intl.Locale("xx").language           "xx"
new Intl.Locale("xxx").language          "xxx"
new Intl.Locale("xxxx").language         RangeError
new Intl.Locale("xxxxx").language        "xxxxx"
new Intl.Locale("xxxxxx").language       "xxxxxx"
new Intl.Locale("xxxxxxx").language      "xxxxxxx"
new Intl.Locale("xxxxxxxx").language     "xxxxxxxx"
new Intl.Locale("xxxxxxxxx").language    RangeError
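The table can be reproduced with a small script. A minimal sketch, assuming the unicode_language_subtag production from UTS #35 (2-3 or 5-8 ASCII letters); isWellFormedLanguage is a hypothetical helper, and the loop simply compares it with whatever Intl.Locale does in the engine you run it in. As with regions, this is a check on shape only: "xxxxx" passes without being a registered language.

```javascript
// Hypothetical helper, assuming unicode_language_subtag = alpha{2,3} | alpha{5,8}
// (UTS #35). Four letters is excluded (in BCP 47 that length is reserved).
function isWellFormedLanguage(language) {
  return /^(?:[A-Za-z]{2,3}|[A-Za-z]{5,8})$/.test(language);
}

// Compare the helper with Intl.Locale for "x" through "xxxxxxxxx".
for (let n = 1; n <= 9; n++) {
  const tag = "x".repeat(n);
  let viaIntl;
  try {
    viaIntl = JSON.stringify(new Intl.Locale(tag).language);
  } catch (e) {
    viaIntl = e.name; // "RangeError" when the tag is rejected
  }
  console.log(tag, isWellFormedLanguage(tag), viaIntl);
}
```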

4 participants