Editorial: Add GetLocale{Language,Script,Region,Variants} operations
Add new `GetLocale{Language,Script,Region,Variants}` operations to
retrieve the corresponding subtags from a locale tag.

These new operations are used in `ApplyOptionsToTag`,
`IsStructurallyValidLanguageTag`, and the `Intl.Locale.prototype`
accessor functions.

GetLocaleLanguage:
Returns the longest prefix matching `unicode_language_subtag`. The
previous definitions could be misinterpreted to match variant subtags
whose length is larger than the language subtag. For example, in "en-basiceng"
the longest substring matching `unicode_language_subtag` is "basiceng".
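
A minimal TypeScript sketch of the ambiguity described above. The function names and the regular expression are illustrative assumptions (the pattern only approximates `unicode_language_subtag` from UTS #35), and the input is assumed to already be a bare `unicode_language_id` without extensions:

```typescript
// Approximation of unicode_language_subtag: 2-3 or 5-8 ASCII letters.
const LANGUAGE_SUBTAG = /^(?:[a-z]{2,3}|[a-z]{5,8})$/i;

// Hypothetical misreading of the old wording: take the longest subtag
// anywhere in the tag that matches the language pattern.
function longestLanguageLikeSubtag(languageId: string): string {
  let best = "";
  for (const subtag of languageId.split("-")) {
    if (LANGUAGE_SUBTAG.test(subtag) && subtag.length > best.length) {
      best = subtag;
    }
  }
  return best;
}

// Reading made explicit by GetLocaleLanguage: the language is the first subtag.
function getLocaleLanguage(languageId: string): string {
  const first = languageId.split("-")[0];
  if (!LANGUAGE_SUBTAG.test(first)) {
    throw new RangeError("not a unicode_language_id: " + languageId);
  }
  return first;
}
```

For "en-basiceng" the first function picks the variant "basiceng", while the second returns "en".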

GetLocaleScript:
The previous definition from `Intl.Locale.prototype.script` is reused.

GetLocaleRegion:
Instead of using the previous definition from
`Intl.Locale.prototype.region`, it was rewritten to match the definition of
`GetLocaleScript` a bit more closely. To avoid confusing language and region
subtags, the leading language subtag is removed before searching for
`unicode_region_subtag`.
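
The reason for dropping the leading subtag can be sketched as follows. This is an illustrative approximation, not the spec text: the function name and the regular expression (two letters or three digits, after UTS #35) are assumptions, and the input is assumed to be a bare `unicode_language_id`:

```typescript
// Approximation of unicode_region_subtag: 2 ASCII letters or 3 digits.
const REGION_SUBTAG = /^(?:[a-z]{2}|[0-9]{3})$/i;

function getLocaleRegion(languageId: string): string | undefined {
  // Drop the language subtag first: a two-letter language such as "de"
  // would otherwise also match the region pattern.
  const tail = languageId.split("-").slice(1);
  for (const subtag of tail) {
    if (REGION_SUBTAG.test(subtag)) {
      return subtag;
    }
  }
  return undefined;
}
```

Without the `slice(1)`, the tag "de" would incorrectly report "de" as its region.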

GetLocaleVariants:
Uses the suggestion from code review in #822. The leading "-" character
is removed for consistency with the other three new operations.
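
A rough sketch of the variants lookup, written over a list of subtags rather than over substrings. The function name and the regular expression (an approximation of `unicode_variant_subtag` from UTS #35) are assumptions, and the input is assumed to be a bare `unicode_language_id`:

```typescript
// Approximation of unicode_variant_subtag:
// 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
const VARIANT_SUBTAG = /^(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/i;

function getLocaleVariants(languageId: string): string | undefined {
  const subtags = languageId.split("-");
  let start = subtags.length;
  // Walk backwards over trailing variant subtags. The first subtag is never
  // consumed: it is the language, even if it happens to be 5-8 letters long.
  while (start > 1 && VARIANT_SUBTAG.test(subtags[start - 1])) {
    start--;
  }
  if (start === subtags.length) {
    return undefined;
  }
  // Joining from `start` yields the suffix without its leading "-".
  return subtags.slice(start).join("-");
}
```

With this sketch, "sl-rozaj-biske" yields "rozaj-biske" and "en-US" yields `undefined`.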

`get Intl.Locale.prototype.{language,script,region}` now simply call
the new abstract operations to retrieve the subtags.

`ApplyOptionsToTag` uses the new operations to retrieve the subtags from
the original language tag when the corresponding option is absent. The
updated `languageId` is now manually constructed through string
concatenation instead of using subtag matching.
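
The concatenation step can be sketched like this. `buildLanguageId` is a hypothetical helper name; it assumes the four pieces have already been resolved (from the options or from the original tag) and that absent pieces are `undefined`:

```typescript
function buildLanguageId(
  language: string,
  script?: string,
  region?: string,
  variants?: string,
): string {
  // The language is always present; the other parts are appended in order.
  let languageId = language;
  if (script !== undefined) {
    languageId += "-" + script;
  }
  if (region !== undefined) {
    languageId += "-" + region;
  }
  if (variants !== undefined) {
    // `variants` carries no leading "-", so one is added here.
    languageId += "-" + variants;
  }
  return languageId;
}
```

For example, `buildLanguageId("en", undefined, "GB", "oxendict")` produces "en-GB-oxendict".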

`IsStructurallyValidLanguageTag` now calls `GetLocaleVariants` to
retrieve the variant subtags. The variable `lang` was renamed to
`languageId` for consistency with the rest of the spec and because
`lang` can be more easily misinterpreted to stand for "language".
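
One way to picture the duplicate check on the returned variants. `hasDuplicateVariants` is a hypothetical name, and the input is assumed to be already lowercased, as it is in `IsStructurallyValidLanguageTag`:

```typescript
function hasDuplicateVariants(variants: string | undefined): boolean {
  if (variants === undefined) {
    return false;
  }
  const subtags = variants.split("-");
  // A subtag is a duplicate if its first occurrence is at an earlier index.
  return subtags.some((subtag, index) => subtags.indexOf(subtag) !== index);
}
```

Under this sketch, "rozaj-biske" passes and "rozaj-rozaj" is rejected.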

`CanonicalizeUnicodeLocaleId` was changed to fix the incorrect
redeclaration warning for `extension` from ecmarkup:
- Instead of using yet another way to retrieve the Unicode extension
  sequence, simply use the existing term "Unicode locale extension
  sequence". (The existing term already makes sure that substrings in
  private-use subtags are ignored, so we don't have to worry about
  `pu_extensions`.)
- "Unicode locale extension sequences" include the leading "-"
  character, so `newExtension` actually needs to be initialised with
  "-u".
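
A small sketch of why `newExtension` has to start with "-u" once the extension sequence includes its leading "-". The `Components` shape loosely mirrors the record returned by `UnicodeExtensionComponents`; the interface and function names are assumptions made for illustration:

```typescript
interface Keyword {
  key: string;
  value: string;
}

interface Components {
  attributes: string[];
  keywords: Keyword[];
}

function buildUnicodeExtension(components: Components): string {
  // Start with "-u" so the result can replace a sequence such as "-u-ca-gregory".
  let newExtension = "-u";
  for (const attr of components.attributes) {
    newExtension += "-" + attr;
  }
  for (const keyword of components.keywords) {
    newExtension += "-" + keyword.key;
    if (keyword.value !== "") {
      newExtension += "-" + keyword.value;
    }
  }
  return newExtension;
}
```

If the string started with "u" instead, replacing "-u-ca-gregory" in "en-u-ca-gregory" would drop the separator and yield "enu-ca-gregory".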
anba committed Sep 27, 2023
1 parent 58f5d94 commit e15d12d
Showing 2 changed files with 95 additions and 43 deletions.
120 changes: 86 additions & 34 deletions spec/locale.html
@@ -91,23 +91,14 @@ <h1>
1. Set _tag_ to CanonicalizeUnicodeLocaleId(_tag_).
1. Assert: _tag_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
1. Let _languageId_ be the longest prefix of _tag_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. If _language_ is not *undefined*, then
-  1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal replaced by the string _language_.
-1. If _script_ is not *undefined*, then
-  1. If _languageId_ does not contain a <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal, then
-    1. Set _languageId_ to the string-concatenation of the <emu-not-ref>substring</emu-not-ref> of _languageId_ matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal, *"-"*, _script_, and the rest of _languageId_.
-  1. Else,
-    1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal replaced by the string _script_.
-1. If _region_ is not *undefined*, then
-  1. If _languageId_ does not contain a <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal, then
-    1. Let _variants_ be the longest suffix of _languageId_ that is a consecutive sequence of <emu-not-ref>substrings</emu-not-ref> in which each element is a *"-"* followed by a <emu-not-ref>substring</emu-not-ref> that is matched by the <code>unicode_variant_subtag</code> Unicode locale nonterminal. If there is no such suffix, set _variants_ to the empty String.
-    1. Set _languageId_ to the string-concatenation of:
-      * the longest <emu-not-ref>substring</emu-not-ref> of _languageId_ that is immediately followed by _variants_
-      * *"-"*
-      * _region_
-      * _variants_
-  1. Else,
-    1. Set _languageId_ to _languageId_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal replaced with the string _region_.
+1. If _language_ is *undefined*, set _language_ to GetLocaleLanguage(_languageId_).
+1. If _script_ is *undefined*, set _script_ to GetLocaleScript(_languageId_).
+1. If _region_ is *undefined*, set _region_ to GetLocaleRegion(_languageId_).
+1. Let _variants_ be GetLocaleVariants(_languageId_).
+1. Set _languageId_ to _language_.
+1. If _script_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _script_.
+1. If _region_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _region_.
+1. If _variants_ is not *undefined*, set _languageId_ to the string-concatenation of _languageId_, *"-"*, and _variants_.
1. Set _tag_ to _tag_ with the <emu-not-ref>substring</emu-not-ref> matched by the <code>unicode_language_id</code> Unicode locale nonterminal replaced by the string _languageId_.
1. Return CanonicalizeUnicodeLocaleId(_tag_).
</emu-alg>
@@ -326,10 +317,7 @@ <h1>get Intl.Locale.prototype.language</h1>
<emu-alg>
1. Let _loc_ be the *this* value.
1. Perform ? RequireInternalSlot(_loc_, [[InitializedLocale]]).
-1. Let _locale_ be _loc_.[[Locale]].
-1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
-1. Let _lang_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. Return the longest <emu-not-ref>substring</emu-not-ref> of _lang_ matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal.
+1. Return GetLocaleLanguage(_loc_.[[Locale]]).
</emu-alg>
</emu-clause>

@@ -339,11 +327,7 @@ <h1>get Intl.Locale.prototype.script</h1>
<emu-alg>
1. Let _loc_ be the *this* value.
1. Perform ? RequireInternalSlot(_loc_, [[InitializedLocale]]).
-1. Let _locale_ be _loc_.[[Locale]].
-1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
-1. Let _lang_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. If _lang_ contains a subtag matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal, return that subtag.
-1. Return *undefined*.
+1. Return GetLocaleScript(_loc_.[[Locale]]).
</emu-alg>
</emu-clause>

@@ -353,14 +337,7 @@ <h1>get Intl.Locale.prototype.region</h1>
<emu-alg>
1. Let _loc_ be the *this* value.
1. Perform ? RequireInternalSlot(_loc_, [[InitializedLocale]]).
-1. Let _locale_ be _loc_.[[Locale]].
-1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
-1. Let _lang_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. Let _subtags_ be a List of the subtags of _lang_.
-1. NOTE: A <code>unicode_region_subtag</code> subtag is only valid immediately after an initial <code>unicode_language_subtag</code> subtag, optionally with a single <code>unicode_script_subtag</code> subtag between them. In that position, <code>unicode_region_subtag</code> cannot be confused with any other valid subtag because all their productions are disjoint.
-1. If the length of _subtags_ is greater than 1 and _subtags_[1] can be matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal, return _subtags_[1].
-1. If the length of _subtags_ is greater than 2 and _subtags_[2] can be matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal, return _subtags_[2].
-1. Return *undefined*.
+1. Return GetLocaleRegion(_loc_.[[Locale]]).
</emu-alg>
</emu-clause>
</emu-clause>
@@ -390,4 +367,79 @@ <h1>Properties of Intl.Locale Instances</h1>
<li>[[Numeric]] is a Boolean value specifying whether numeric sorting is used by the locale, or is *undefined*. This internal slot only exists if the [[RelevantExtensionKeys]] internal slot of %Locale% contains *"kn"*.</li>
</ul>
</emu-clause>

+<emu-clause id="sec-intl-locale-abstracts">
+<h1>Abstract Operations for Locale Objects</h1>
+
+<emu-clause id="sec-getlocalelanguage" type="abstract operation">
+<h1>
+GetLocaleLanguage (
+_locale_: a String,
+): a String
+</h1>
+<dl class="header">
+</dl>
+<emu-alg>
+1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
+1. Let _languageId_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. Assert: The first subtag of _languageId_ can be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal.
+1. Return the first subtag of _languageId_.
+</emu-alg>
+</emu-clause>
+
+<emu-clause id="sec-getlocalescript" type="abstract operation">
+<h1>
+GetLocaleScript (
+_locale_: a String,
+): a String or *undefined*
+</h1>
+<dl class="header">
+</dl>
+<emu-alg>
+1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
+1. Let _languageId_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. Assert: _languageId_ contains at most one subtag that can be matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal.
+1. If _languageId_ contains a subtag matched by the <code>unicode_script_subtag</code> Unicode locale nonterminal, return that subtag.
+1. Return *undefined*.
+</emu-alg>
+</emu-clause>
+
+<emu-clause id="sec-getlocaleregion" type="abstract operation">
+<h1>
+GetLocaleRegion (
+_locale_: a String,
+): a String or *undefined*
+</h1>
+<dl class="header">
+</dl>
+<emu-alg>
+1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
+1. Let _languageId_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. NOTE: A <code>unicode_region_subtag</code> subtag is only valid immediately after an initial <code>unicode_language_subtag</code> subtag, optionally with a single <code>unicode_script_subtag</code> subtag between them. In that position, <code>unicode_region_subtag</code> cannot be confused with any other valid subtag because all their productions are disjoint.
+1. Assert: The first subtag of _languageId_ can be matched by the <code>unicode_language_subtag</code> Unicode locale nonterminal.
+1. Let _languageIdTail_ be the suffix of _languageId_ following the first subtag.
+1. Assert: _languageIdTail_ contains at most one subtag that can be matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal.
+1. If _languageIdTail_ contains a subtag matched by the <code>unicode_region_subtag</code> Unicode locale nonterminal, return that subtag.
+1. Return *undefined*.
+</emu-alg>
+</emu-clause>
+
+<emu-clause id="sec-getlocalevariants" type="abstract operation">
+<h1>
+GetLocaleVariants (
+_locale_: a String,
+): a String or *undefined*
+</h1>
+<dl class="header">
+</dl>
+<emu-alg>
+1. Assert: _locale_ can be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal.
+1. Let _languageId_ be the longest prefix of _locale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. If there is a non-empty suffix of _languageId_ that is a consecutive sequence of <emu-not-ref>substrings</emu-not-ref> in which each element is a *"-"* followed by a <emu-not-ref>substring</emu-not-ref> that is matched by the <code>unicode_variant_subtag</code> Unicode locale nonterminal, then
+  1. Let _variants_ be the longest such suffix.
+  1. Return the substring of _variants_ from 1.
+1. Return *undefined*.
+</emu-alg>
+</emu-clause>
+</emu-clause>
</emu-clause>
18 changes: 9 additions & 9 deletions spec/locales-currencies-tz.html
@@ -57,10 +57,11 @@ <h1>
1. Let _lowerLocale_ be the ASCII-lowercase of _locale_.
1. If _lowerLocale_ cannot be matched by the <code>unicode_locale_id</code> Unicode locale nonterminal, return *false*.
1. If _lowerLocale_ uses any of the backwards compatibility syntax described in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard #35 Part 1 Core, Section 3.3 BCP 47 Conformance</a>, return *false*.
-1. Let _lang_ be the longest prefix of _lowerLocale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
-1. Let _langRefinements_ be the longest suffix of _lang_ following a non-empty prefix matched by the <code>unicode_language_subtag</code> or <code>unicode_script_subtag</code> Unicode locale nonterminal.
-1. If _langRefinements_ contains any duplicate <emu-not-ref>substrings</emu-not-ref> matched greedily by the <code>unicode_variant_subtag</code> Unicode locale nonterminal, return *false*.
-1. Let _allExtensions_ be the suffix of _lowerLocale_ following _lang_.
+1. Let _languageId_ be the longest prefix of _lowerLocale_ matched by the <code>unicode_language_id</code> Unicode locale nonterminal.
+1. Let _variants_ be GetLocaleVariants(_languageId_).
+1. If _variants_ is not *undefined*, then
+  1. If _variants_ contains any duplicate subtags, return *false*.
+1. Let _allExtensions_ be the suffix of _lowerLocale_ following _languageId_.
1. If _allExtensions_ contains a <emu-not-ref>substring</emu-not-ref> matched by the <code>pu_extensions</code> Unicode locale nonterminal, let _extensions_ be the prefix of _allExtensions_ preceding the longest such <emu-not-ref>substring</emu-not-ref>. Otherwise, let _extensions_ be _allExtensions_.
1. If _extensions_ is not the empty String, then
1. If _extensions_ contains any duplicate singleton subtags, return *false*.
@@ -86,18 +87,17 @@ <h1>
</dl>
<emu-alg>
1. Let _localeId_ be the String value resulting from performing the algorithm to transform _locale_ to canonical form per <a href="https://unicode.org/reports/tr35/#LocaleId_Canonicalization">Unicode Technical Standard #35 Part 1 Core, Annex C LocaleId Canonicalization</a> (note that the algorithm begins with canonicalizing syntax only).
-1. If _localeId_ contains a <emu-not-ref>substring</emu-not-ref> matched by the <code>pu_extensions</code> Unicode locale nonterminal, let _localeWithoutPrivateUse_ be the prefix of _localeId_ preceding the longest such <emu-not-ref>substring</emu-not-ref>. Otherwise, let _localeWithoutPrivateUse_ be _localeId_.
-1. Let _extension_ be the longest <emu-not-ref>substring</emu-not-ref> of _localeWithoutPrivateUse_ that is a Unicode locale extension sequence. If there is no such <emu-not-ref>substring</emu-not-ref>, set _extension_ to the empty String.
-1. [id="step-canonicalizeunicodelocaleid-u-extension"] If _extension_ is not the empty String, then
-  1. Let _newExtension_ be *"u"*.
+1. [id="step-canonicalizeunicodelocaleid-u-extension"] If _localeId_ contains a substring that is a Unicode locale extension sequence, then
+  1. Let _extension_ be the String value consisting of the substring of the Unicode locale extension sequence within _localeId_.
+  1. Let _newExtension_ be *"-u"*.
   1. Let _components_ be UnicodeExtensionComponents(_extension_).
   1. For each element _attr_ of _components_.[[Attributes]], do
     1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _attr_.
   1. For each Record { [[Key]], [[Value]] } _keyword_ in _components_.[[Keywords]], do
     1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Key]].
     1. If _keyword_.[[Value]] is not the empty String, then
       1. Set _newExtension_ to the string-concatenation of _newExtension_, *"-"*, and _keyword_.[[Value]].
-  1. Assert: _newExtension_ is not equal to *"u"*.
+  1. Assert: _newExtension_ is not equal to *"-u"*.
   1. Set _localeId_ to a copy of _localeId_ in which the first appearance of <emu-not-ref>substring</emu-not-ref> _extension_ has been replaced with _newExtension_.
1. Return _localeId_.
</emu-alg>
