Locale related APIs -- language matching #46

Open
zbraniecki opened this issue Oct 16, 2015 · 39 comments
Labels
c: locale Component: locale identifiers Proposal Larger change requiring a proposal s: comment Status: more info is needed to move forward

Comments

@zbraniecki
Member

UPDATE: Latest Proposal - Nov 2015

I started working on a spec and polyfill for the Intl.Locale API.

I should have a proposal for this next week. Currently it has five functions:

  • isStructurallyValidLanguageTag(locale)
  • canonicalizeLocaleList(locales)
  • resolveLocale(availableLocales, requestedLocales, options, relevantExtensionKeys, localeData)
  • prioritizeAvailableLocales(availableLocales, requestedLocales, defaultLocale)
  • getDirection(locale)

The rationale for these five is:

  • isStructurallyValidLanguageTag
    This allows testing a language tag for sanitization/validation purposes.
  • canonicalizeLocaleList
    This is the core function enabling all language negotiation operations.
  • resolveLocale
    This exposes the core function used by all Intl formatters, which allows people to write their own custom formatters or polyfills.
  • prioritizeAvailableLocales
    This is the basic language negotiation function for all localization libraries. It takes a requestedLocales list (navigator.languages or custom), availableLocales (app-provided locales), and a fallback defaultLocale (the default app locale, not the default host environment locale) and returns the list of prioritized available locales that best match the user's requested locales.
  • getDirection
    A simple function allowing l10n libraries to set html.dir based on their negotiated locale.
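
As a rough illustration of three of these functions, here is a minimal sketch built on APIs that exist today (Intl.getCanonicalLocales and Intl.Locale). The function names mirror the proposal, but the bodies are assumptions made for illustration, and the RTL_LANGUAGES set is a hypothetical, deliberately incomplete list rather than real locale data.

```javascript
// Sketch only: approximates the proposed functions, not the spec text.

function isStructurallyValidLanguageTag(locale) {
  if (typeof locale !== 'string') return false;
  try {
    // Intl.getCanonicalLocales throws a RangeError for a malformed tag.
    Intl.getCanonicalLocales(locale);
    return true;
  } catch (e) {
    return false;
  }
}

function canonicalizeLocaleList(locales) {
  // Canonicalizes casing and drops duplicates, keeping the first occurrence.
  return Intl.getCanonicalLocales(locales);
}

// Hypothetical, incomplete list of right-to-left languages (assumption).
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur']);

function getDirection(locale) {
  const { language } = new Intl.Locale(locale);
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}
```

A library could then set document.documentElement.dir = getDirection(negotiatedLocale) once negotiation has picked a locale; a real implementation would need script-aware data instead of a hard-coded language list.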

@caridy - One question I'd like help with is whether to use Set instead of Array. In all cases where we carry a list of locales, we always want it deduplicated, and Set does just that. It would be sweet to use it instead of an Array, but its API seems significantly less user-friendly than Array's: it's impossible to just retrieve locales[0], it's not easy to map, etc.
Should I stick to Array or try to use Set?

@caridy
Contributor

caridy commented Oct 16, 2015

Two things:

  • You can certainly use a Set internally as part of the logic, but let's stick to an array for arguments; I haven't seen any precedent for using a Set as an argument in Ecma.
  • Instead of implementing a new polyfill (once we settle on the API), you can just submit a PR for Intl.js, which already implements those abstract operations.
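
The first point can be sketched as follows: a Set does the deduplication internally, while callers only ever see arrays. The helper name dedupeLocales is hypothetical and used here purely for illustration.

```javascript
// Hypothetical helper: arrays in, arrays out, Set only as an internal detail.
function dedupeLocales(locales) {
  // A Set keeps insertion order, so the first occurrence of each tag wins.
  return [...new Set(locales)];
}

const requested = dedupeLocales(['en-US', 'fr', 'en-US', 'de', 'fr']);
const primary = requested[0]; // index access still works on the result
```

Because the result is a plain array, locales[0], map and filter keep working for consumers.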

@zbraniecki
Member Author

sure! thanks for the feedback :)

@zbraniecki
Member Author

@caridy - I started playing with Intl.js and ended up separating src/locale.js out of src/core.js and then carved out the common functions into src/utils.js because src/core.js is huuge :)
Not sure if you're interested in any sort of refactor like that - if not, I'll bring it back into src/core.js.

Branch is here: https://github.com/zbraniecki/Intl.js/tree/locale

I also wrote a simple polyfill just for Intl.Locale based on Gecko implementation - https://github.com/zbraniecki/IntlLocale

Let's now talk about the API and get it agreed upon (before I start the daunting task of working on the spec)

@caridy
Contributor

caridy commented Oct 21, 2015

@zbraniecki I added the module system into Intl.js to be able to split it into small chunks, I didn't get to split core into small chunks. In other words, go for it, we will appreciate it. Just open a PR there, and we can discuss the details.

On the other hand, I'm more interested in the API discussion :)

@zbraniecki
Member Author

@caridy Cool. Do you have any comments on the API I proposed in comment 1? I'd like to make Locale a good unified language negotiation API for other Intl APIs and localization frameworks.

I believe those five functions are enough to achieve that. On top of that, all of that code except getDirection is already in the Intl API, just not exposed, so it should not add any cost for the vendors deploying it.

@zbraniecki
Member Author

I guess it's not a good idea to refactor a major piece of code and introduce a new feature for debate in it at the same time :)

So I updated my locale branch to just introduce IntlLocale to Intl.js polyfill now - https://github.com/zbraniecki/Intl.js/tree/locale

@caridy - does it look like something you may be interested in?

@caridy
Contributor

caridy commented Nov 6, 2015

update: we need to cook the proposal for the next meeting in two weeks. will try to coordinate some time to work on this.

@zbraniecki
Member Author

We've talked about this with @caridy and he mentioned that Intl formatters may be interested in switching to something like prioritizeLocales for language negotiation.

This would mean that we don't have to expose resolveLocale at all; just exposing prioritizeLocales will be enough for both localization frameworks and custom Intl formatters to match the behavior of the platform ones.

The only thing that comes to mind is that we may want a depth parameter to limit the length of the resolved chain: if an Intl formatter will only take the first one anyway, there's no reason to resolve the whole chain. At the same time, l10n frameworks operate under more uncertainty (l20n, for example, falls back when a single l10n string cannot be resolved in a given language at runtime), so they need the full "depth".

That would leave us with:

  • isStructurallyValidLanguageTag
  • canonicalizeLocaleList
  • prioritizeLocales(availableLocales, requestedLocales, defaultLocale, depth)
  • getDirection

(and later)

  • firstDayOfTheWeek (or weekendInfo returning firstDay, weekendStarts and weekendEnds)

wrt. prioritizeLocales you can see an implementation here: https://github.com/l20n/l20n.js/blob/v1.0.x/lib/l20n/intl.js#L434

It uses LookupAvailableLocales which is a new function that acts exactly like LookupSupportedLocales but instead of returning a requested locale that has a match in available ones, it returns the available locale that has a match in requested ones.
So if the user requests 'sv-SE' and we have 'sv', LookupSupportedLocales will return 'sv-SE', while the new function will return 'sv'. Which is what l10n and intl can later use to load the right resources.
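
To make the sv-SE example concrete, here is a minimal sketch of the idea, assuming plain right-to-left truncation of each requested tag. It is not the l20n.js implementation linked above; the function names follow the comment, while the handling of defaultLocale and depth is an assumption for illustration.

```javascript
// Sketch: return the *available* locale that a requested tag falls back to.
function lookupAvailableLocale(availableLocales, requestedLocale) {
  let candidate = requestedLocale;
  while (candidate !== '') {
    if (availableLocales.includes(candidate)) return candidate;
    let pos = candidate.lastIndexOf('-');
    if (pos === -1) return undefined;
    // Also drop a dangling single-character subtag such as the "u" in "de-u-co".
    if (pos >= 2 && candidate[pos - 2] === '-') pos -= 2;
    candidate = candidate.slice(0, pos);
  }
  return undefined;
}

// Assumed semantics: ordered, deduplicated matches, then the default locale,
// never returning more than `depth` entries.
function prioritizeLocales(availableLocales, requestedLocales, defaultLocale, depth = Infinity) {
  const available = Intl.getCanonicalLocales(availableLocales);
  const result = [];
  for (const requested of Intl.getCanonicalLocales(requestedLocales)) {
    if (result.length >= depth) break;
    const match = lookupAvailableLocale(available, requested);
    if (match !== undefined && !result.includes(match)) result.push(match);
  }
  if (defaultLocale !== undefined && result.length < depth) {
    const [fallback] = Intl.getCanonicalLocales(defaultLocale);
    if (!result.includes(fallback)) result.push(fallback);
  }
  return result;
}
```

Under these assumptions, prioritizeLocales(['sv', 'en-US', 'de'], ['sv-SE', 'de-AT'], 'en-US') yields ['sv', 'de', 'en-US'], and passing a depth of 1 stops after 'sv', which is all an Intl formatter would need.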

@rxaviers
Member

I'm interested to know the exact algorithm and how this relates to https://github.com/rxaviers/ecma402-fix-lookup-matcher please.

@caridy
Copy link
Contributor

caridy commented Nov 13, 2015

@rxaviers that's precisely why we will abstract the operation that produces the parent list; we can then update it to take the proper algorithm step into consideration.

@caridy
Copy link
Contributor

caridy commented Nov 23, 2015

@caridy caridy added this to the 3rd Edition milestone Nov 23, 2015
@caridy
Contributor

caridy commented Nov 23, 2015

@rxaviers Ideally we can get https://github.com/rxaviers/ecma402-fix-lookup-matcher to fit into getParentLocales(); in other words, make the lookup matcher fix so we can expose it via getParentLocales(). Can you work with @zbraniecki to get that fleshed out?

@rxaviers
Member

@caridy, note that https://github.com/rxaviers/ecma402-fix-lookup-matcher is about finding the bundle, not the parent; the bundle is the first entry of the chain. For example, given the locale zh-TW, you start at the zh-Hant bundle and then inherit from zh-Hant's parents, probably using getParentLocales().
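
The zh-TW case can be sketched roughly as below. This is not the algorithm from ecma402-fix-lookup-matcher; findBundle, truncateLookup and the candidate order are hypothetical and only illustrate why expanding likely subtags first (via the existing Intl.Locale.prototype.maximize) can select a different bundle than plain truncation. The sketch assumes maximize() yields a language, script and region for the input.

```javascript
// Hypothetical helper: pick the starting bundle after adding likely subtags.
function findBundle(availableBundles, requested) {
  // maximize() adds likely subtags, e.g. "zh-TW" becomes "zh-Hant-TW".
  const { language, script, region } = new Intl.Locale(requested).maximize();
  const candidates = [
    `${language}-${script}-${region}`,
    `${language}-${script}`,
    `${language}-${region}`,
    language,
  ];
  return candidates.find((tag) => availableBundles.includes(tag));
}

// Plain right-to-left truncation, shown only for contrast.
function truncateLookup(availableBundles, requested) {
  let candidate = requested;
  while (candidate) {
    if (availableBundles.includes(candidate)) return candidate;
    candidate = candidate.slice(0, Math.max(candidate.lastIndexOf('-'), 0));
  }
  return undefined;
}
```

With available bundles ['zh', 'zh-Hant', 'en'] and a request for zh-TW, findBundle is expected to select zh-Hant, whereas truncateLookup lands on zh. Walking the parents of the selected bundle would then be a separate step.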

@caridy
Contributor

caridy commented Nov 24, 2015

Yes @rxaviers. All those algorithms require the inheritance path to decide about data availability, and that is the piece we want to expose, so the same logic can be done in user-land as well.

@rxaviers
Member

rxaviers commented Dec 7, 2015

It's on my to-do list, but I have little spare time at the moment to go through it. Please alert me if I'm blocking you; I will try to catch up, time permitting. Thanks

@zbraniecki
Member Author

Here's the initial POC of the patch for Intl.js to implement the abstract locale operations discussed in the slides @caridy linked on Nov 23rd: andyearnshaw/Intl.js@master...zbraniecki:locale

@caridy - does it look good? If so, I'll work on the spec patch.

@caridy
Contributor

caridy commented Jan 14, 2016

@zbraniecki let's work on the spec, and then we can review the implementation. In principle, it looks good.

@zbraniecki
Member Author

Here's the first draft of the spec - master...zbraniecki:locale-api
I didn't write any algo for resolveLocaleInfo because I'm not sure how to approach accessing the necessary information from CLDR.

@rxaviers
Member

@caridy @zbraniecki sorry for the long delay in replying and for my terse comments above; the end of the year has been intense. First of all, great job with the Locale API. It's a good way of exposing this info and will definitely benefit custom libraries and polyfills.

How can I help you to get https://github.com/rxaviers/ecma402-fix-lookup-matcher into getParentLocales()? Note the first draft above gives wrong results [details].

@zbraniecki
Member Author

Heads up, this has been advanced to Stage 2 as of today. I'm going to update the spec/polyfill now to address @rxaviers' and TC39's feedback and will submit it to reviewers in a couple of weeks.

@stasm
Contributor

stasm commented Feb 5, 2016

CLDR defines groups of supplemental data for each territory, like timeData or weekData. An incomplete table can be consulted here:

http://www.unicode.org/cldr/charts/28/supplemental/territory_information.html

And the full data set here:

http://unicode.org/repos/cldr/trunk/common/supplemental/supplementalData.xml

Perhaps a good future-proof solution would be to base the method names on these groups, e.g.:

Intl.Locale('fr-FR').getWeekData();

which would return:

{
    minDays: {
        count: 4
    },
    firstDay: {
        day: 'mon'
    },
    weekendStart: {
        day: 'sat'
    },
    weekendEnd: {
        day: 'sun'
    }
}

Or:

Intl.Locale('fr-FR').getTimeData();

which would return:

{
    hours: {
        preferred: 'H',
        allowed: 'H hB'
    }
}

And for a different French-speaking territory:

Intl.Locale('fr-CA').getTimeData();

which would return:

{
    hours: {
        preferred: 'h',
        allowed: 'h hb H hB'
    }
}

This is a bit verbose but it maps the CLDR data one-to-one which should make it much easier to extend it in the future.
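
A rough sketch of what such a one-to-one mapping could look like in user-land follows. The TIME_DATA table and the getTimeData helper are hypothetical; the values are copied from the examples above rather than read from CLDR, and only Intl.getCanonicalLocales is an existing API.

```javascript
// Hypothetical data table keyed by locale; values copied from the examples above.
const TIME_DATA = {
  'fr-FR': { hours: { preferred: 'H', allowed: 'H hB' } },
  'fr-CA': { hours: { preferred: 'h', allowed: 'h hb H hB' } },
};

// Hypothetical accessor mirroring the proposed getTimeData() shape.
function getTimeData(locale) {
  const [tag] = Intl.getCanonicalLocales(locale);
  const data = TIME_DATA[tag];
  if (data === undefined) throw new RangeError(`No time data for ${tag}`);
  // Return a copy so callers cannot mutate the shared table.
  return { hours: { ...data.hours } };
}

// Convenience: the space-separated "allowed" string as an array of symbols.
function allowedHourSymbols(locale) {
  return getTimeData(locale).hours.allowed.split(' ');
}
```

Keeping the returned object in the same shape as the CLDR group means a new attribute in the data would surface without an API change, which seems to be the future-proofing argument made above.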

@rxaviers
Member

rxaviers commented Feb 5, 2016

FWIW, find CLDR in the JSON format here https://github.com/unicode-cldr/cldr-core/tree/master/supplemental.

@zbraniecki zbraniecki changed the title Intl.Locale API Locale related APIs Sep 23, 2016
@zbraniecki
Member Author

As I mentioned in #6 I landed mozIntl.getCalendarInfo for calendar information in Gecko.

Also, instead of getDirection, we're going to use mozIntl.getLocaleInfo which works very similarly to getCalendarInfo.

I'll be happy to revisit it and move away from that API toward any standard that ECMA402 will come up with.

@jacobrask

Is there any further discussion on Intl.getCalendarInfo happening somewhere else? How did that API work out for you in Gecko?

@caridy
Contributor

caridy commented May 8, 2017

We haven't discussed it recently, but it's probably something we will get back to at some point.

@brettz9

brettz9 commented Dec 13, 2017

Some of the information in the IANA language subtag registry may be incomplete, but I wonder whether the information it does contain could be used to supply getLocaleInfo with the macrolanguages of encompassed languages?

This would, I imagine, be useful for i18n: e.g., a user with language "in" (an encompassed language with parent macrolanguage "ms") could be directed to an "ms" locale if an "in" one could not be found. The writing system could conceivably be different, but if the "Suppress-Script" information were available for languages too (see #205), compatibility here could be detected as well.

@sffc sffc added s: help wanted Status: help wanted; needs proposal champion c: locale Component: locale identifiers and removed enhancement labels Mar 19, 2019
@hugovdm

hugovdm commented Apr 12, 2019

I'm also looking for something like resolveLocale or prioritizeLocales: an API that helps with language matching (for example, selecting the best locale based on the user's preferred list of locales and a developer-provided list of locales supported by an app).

@opyh

opyh commented Aug 4, 2019

https://www.unicode.org/reports/tr35/#LanguageMatching proposes an algorithm to find a best fit for given sets of available and requested languages.

Could this be referenced in the spec - or maybe even integrated?

https://github.com/rxaviers/cldrjs/blob/master/doc/bundle_lookup_matcher.md describes more details about issues with this algorithm.

@sffc
Contributor

sffc commented Sep 19, 2019

I see several discussions in this thread. Let's move discussion of Intl.getCalendarInfo() over to #6, and use this thread to focus on the language matcher issues.

@sffc sffc changed the title Locale related APIs Locale related APIs -- language matching Sep 19, 2019
@johan

johan commented Feb 23, 2020

I'm developing an internal API for Gecko which will be used by various date/time calendars and pickers.

At the moment the proposal for the API to retrieve the calendar data is:

let info = Intl.getCalendarInfo(locales);

which returns:

{
  "firstDayOfWeek": 7,
  "weekendStart": 6,
  "weekendEnd": 1,
  "locale": "de",
  "calendar": "gregory" 
}

[...]

As I said - this is an internal API, but maybe it'll help us design ECMA 402 API for that dataset.

FWIW: if weekday references end up as numbers instead of weekday name abbreviations as in CLDR (probably a good thing), it would be internally consistent to sync those numbers with ECMA-262's Week Day values (the return value of Date.prototype.getDay), ranging from Sun=0 to Sat=6.
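
A minimal sketch of that alignment, assuming CLDR-style abbreviations ('sun' through 'sat') as input. The helper names are hypothetical; Date.prototype.getDay is the only existing API used.

```javascript
// Ordered so that the array index equals the Date.prototype.getDay() value.
const CLDR_DAYS = ['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat'];

// Hypothetical helper: CLDR abbreviation to ECMA-262 week day number (Sun=0 .. Sat=6).
function cldrDayToWeekDay(day) {
  const index = CLDR_DAYS.indexOf(day);
  if (index === -1) throw new RangeError(`Unknown CLDR weekday: ${day}`);
  return index;
}

// With a shared numbering, comparing against a Date needs no extra mapping.
function isFirstDayOfWeek(date, firstDay) {
  return date.getDay() === cldrDayToWeekDay(firstDay);
}
```

For example, isFirstDayOfWeek(new Date(2020, 1, 24), 'mon') is true, since 24 February 2020 was a Monday.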

@sffc sffc added Proposal Larger change requiring a proposal s: comment Status: more info is needed to move forward and removed s: help wanted Status: help wanted; needs proposal champion labels Jun 5, 2020
@sffc sffc removed this from the 4th Edition milestone Jun 5, 2020
@sffc sffc added the User Preferences Related to user preferences label Jun 5, 2020
@ryzokuken
Member

Intl.Locale is Stage 4 and the follow-on Intl LocaleInfo is Stage 3. Can this be closed?

@sffc
Contributor

sffc commented Jun 5, 2021

Intl.Locale is Stage 4 and the follow-on Intl LocaleInfo is Stage 3. Can this be closed?

No. As suggested in #46 (comment), this issue is about language matching. It would be fixed by the Intl.LocaleMatcher proposal.

https://github.com/tc39/proposal-intl-localematcher

@sffc sffc removed the User Preferences Related to user preferences label Jun 5, 2021