Goals and non-goals #77

Merged
merged 8 commits into from
Jun 1, 2020

Conversation

stasm
Collaborator

@stasm stasm commented Apr 16, 2020

This is based on #59 and the discussion in the March 23rd meeting. The language is sometimes a bit hand-wavy because many details remain to be discussed and decided. If you think I have included too much detail, please let me know.

Let's have another discussion about this in the upcoming monthly meeting next week. I'd like for this list to represent the common agreement of the group with respect to the expectations of the outcome of our work.

Rendered document.

@CLAassistant

CLAassistant commented Apr 16, 2020

CLA assistant check
All committers have signed the CLA.


- Ensure that the data model is interoperable with existing interchange
formats. In particular, specify how to losslessly convert translations to
and from XLIFF.
Collaborator

I think it is worth mentioning SSML here too.

@stasm
Collaborator Author

stasm commented May 14, 2020

I've updated the PR to reflect the points raised in the last monthly meeting:

  • I added a goal about storing structured data alongside translations, like markup.
  • I added canonical in front of the data model.
  • I removed the part about specifying how parsing errors should be handled.

Other comments which I haven't directly addressed in the document:

  • Data model for translation units vs. data model for resources (collections) of units. I left that out on purpose. I think it deserves its own discussion. I wouldn't want to block the goal setting on it, however, so I tried to phrase the goals such that they work in both scenarios.
  • Different representations for authoring and runtime. After thinking about this, I see it as a solution rather than a goal per se. I tried to capture the intent in goal no. 6 about enabling a range of different implementations and solutions.
  • Backward compatibility. It came up during the meeting but not as a hard requirement. I think the goal no. 4 about the interoperability with XLIFF can be a proxy for this; if legacy formats can be converted to XLIFF, and the future standard can too, then we'll have ensured backward compatibility.

I also made a significant change to the layout of the document. I realized that in my previous attempt the goals were mixed with deliverables. I suspect this made discussing them harder, since some bullet points contained a bit of an implicit solution.

In the current version, I tried to make goals sound like statements about what we want to achieve through the work of the group. I then moved concrete deliverables into a separate section of the document.

I also added a bit more detail to the Non-Goals section. I didn't make any changes to the non-goals themselves, but I countered each non-goal with a positive statement about what we'd like to do instead.

escape sequences, whitespace, markup, as well as parsing errors.

3. A specification for lossless conversion between the data model and XLIFF.

Collaborator

Should we do the same as before and not specify XLIFF, or can we try together to define at least some base formats beyond XLIFF?

Collaborator

IIUC, in the l10n tools world, XLIFF is the de facto standard format that they all support for importing/exporting, regardless of which other file formats they support. It's an open standard that is not vendor-specific. The core of it is implemented across most tools, even if some may attempt an embrace,extend,... strategy.

Collaborator

@nbouvrette nbouvrette May 15, 2020

I'd argue that XLIFF is probably the most complete (and complex) standard, and that it is not consistently adopted to the same extent across different products. It might be interesting to also consider the opposite kind of file format (the simplest), which consists of only keys and values, for example a .properties file. If we could support both, then we should have a complete and flexible solution.

Collaborator Author

This is a great point. It has been my experience as well that many vendors say they support XLIFF, and then it turns out that everyone supports a different subset of XLIFF.
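
To make the "simplest format" end of this spectrum concrete, here is a minimal sketch of a lossless round trip through a key/value file. It is an illustration only: the format is a simplified stand-in for .properties (real .properties files have more escaping rules), and the names `serialize`, `parse`, `escapeValue`, and `unescapeValue` are hypothetical, not part of any proposal in this PR.

```javascript
// Hypothetical, simplified key=value format. Real .properties files have
// more rules (Unicode escapes, ':' separators, line continuations).
function escapeValue(value) {
  // Escape backslashes first, then newlines, so each value stays on one line.
  return value.replace(/\\/g, "\\\\").replace(/\n/g, "\\n");
}

function unescapeValue(text) {
  // Undo both escapes in a single left-to-right pass.
  return text.replace(/\\(\\|n)/g, (_, ch) => (ch === "n" ? "\n" : "\\"));
}

function serialize(messages) {
  // messages: Map<string, string>; keys are assumed to contain no '=' or newline.
  return [...messages]
    .map(([key, value]) => `${key}=${escapeValue(value)}`)
    .join("\n");
}

function parse(source) {
  const messages = new Map();
  for (const line of source.split("\n")) {
    const separator = line.indexOf("=");
    if (separator === -1) continue; // skip blank or malformed lines
    messages.set(line.slice(0, separator), unescapeValue(line.slice(separator + 1)));
  }
  return messages;
}

const original = new Map([
  ["greeting", "Hello, { $name }!"],
  ["multiline", "First line\nSecond line"],
  ["path", "C:\\temp"],
]);
const roundTripped = parse(serialize(original));
```

A conversion is lossless in this sense when `roundTripped` holds exactly the same keys and values as `original`. Anything richer than a flat string (markup, comments, metadata) would need a convention on top of a format like this, which is where XLIFF sits at the other end of the range.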

Member

@zbraniecki zbraniecki left a comment

LGTM!

Collaborator

@echeran echeran left a comment

LGTM

4. Ensure interoperability with existing interchange formats, in particular
with XLIFF.

5. Ensure that the standard can integrate with existing TMS and CAT
Contributor

Should we explain what the abbreviations stand for?

Suggested change
5. Ensure that the standard can integrate with existing TMS and CAT
5. Ensure that the standard can integrate with existing TMS (Translation
Management System) and CAT (Computer Assisted Translation)

Contributor

Just noticed these are included in the glossary, so perhaps not needed

Collaborator

We can probably just link these expressions to the glossary.

Member

@srl295 srl295 left a comment

can't join today but LGTM

@rxaviers
Contributor

In the meeting today, there was a suggestion to include an extra goal about formatting dates, numbers, etc., but no consensus on how to phrase it: we talked about API and factoid as ways to express that goal. As an alternative suggestion, we could phrase it so that one of the goals is to "Express internationalization formatting features"...

By the way, should we also say something about formatting control?

@aphillips
Member

@rxaviers I think that formatting control (i.e. subformat patterns) is an important aspect of the formatting features. And I think we must not overlook this aspect of message formatting (it is, after all, kind of the point).

I have some thoughts about how the doc you linked to talks about the problem: I think that understanding user personas is important here. Most CX developers do not want to force translators, or rely on translators, to make decisions about how to display a given value, because we have learned that this is difficult to get right. The concept of skeletons is both powerful and produces far superior results, since translators don't have to modify or guess at the intentions of the developer (and the developer doesn't have to know or guess what the target locale needs). This is the difference in current MF between 0 and 1 in this:

This is hard to localize: {0,date,MM-dd-yyyy} and this isn't: {1,date,::yyyyMd}
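
The contrast can be sketched in JavaScript. This is only a rough analogy and rests on assumptions: `formatFixedPattern` and `formatFields` are hypothetical helper names, and the `Intl.DateTimeFormat` field options are used here as a stand-in for an ICU skeleton such as `::yyyyMd`, not as an exact equivalent.

```javascript
// Fixed pattern, analogous to {0,date,MM-dd-yyyy}: field order and separators
// are baked in, so every locale gets the same layout unless a translator
// rewrites the pattern by hand.
function formatFixedPattern(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}-${date.getUTCFullYear()}`;
}

// Skeleton-like request, loosely analogous to {1,date,::yyyyMd}: only the
// fields are named; locale data decides order, separators, and padding.
function formatFields(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "numeric",
    day: "numeric",
    timeZone: "UTC", // pinned so the example does not depend on the host time zone
  }).format(date);
}

const date = new Date(Date.UTC(2020, 4, 19));
const fixed = formatFixedPattern(date);
const localized = ["en-US", "de-DE", "ja-JP"].map((locale) => [
  locale,
  formatFields(date, locale),
]);

console.log(fixed);
for (const [locale, text] of localized) console.log(locale, text);
```

With the fixed pattern, the translator is the only one who can adapt the layout. With the field-based request, nobody has to touch the message for the date to come out in a locale-appropriate form.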

@zbraniecki
Member

@aphillips I don't understand how your example relates to the formatting control between developer and localizer.

@aphillips
Member

@zbraniecki I'm sure that what I wrote wasn't complete enough to be clear. Every "format" has some tension between the developer (person writing the code), CX designer (person designing the interface), customer (person using the software--they choose locale, time zone, and other personal preferences), and the data itself (a price in US dollars can be converted to another currency, but can't just be displayed in another currency).

When I read the link, I saw basically two personas--the developer and the "localizer". If the latter means "translator", then "giving them control" is IMO the wrong direction, since that's the old pattern string (control == responsibility). If "localizer" means (as I suppose it's more likely to mean) "the people responsible for the localized CX", then maybe I don't have any conflict with the doc. But I would not contrast that with developers per se. FWIW, I don't want developers making decisions about the exact format either.

I'm probably in violent agreement with the doc if it's expressing that the presentation (including options) is in the format wholly separate from code, since that would mean that developers never write code to control formatting minutiae (myDateFormat.setHourCycle('24h')) and translators don't have to futz with it in their translations (This is hard to localize: {0,date, dd'.' MMM yyyy}). Instead it is expressed where it can be localized (by the localizer??) in the format where necessary (with sane rational defaults from the customer's context used most of the time). Does that make more sense?

@romulocintra
Collaborator

@aphillips the concept sounds interesting, but to be honest I cannot imagine how we can achieve this, because it changes all my preconceptions of the i18n and l10n flow as it is today... It is a paradigm shift that will surely improve the overall lifecycle... but I'm still intrigued and would love to have more examples or information to brainstorm and think about it...

@zbraniecki
Member

zbraniecki commented May 19, 2020

Every "format" has some tension between the developer (person writing the code), CX designer (person designing the interface), customer (person using the software--they choose locale, time zone, and other personal preferences), and the data itself (a price in US dollars can be converted to another currency, but can't just be displayed in another currency).

At Mozilla we often say that the developer puts their localizer hat on and they translate to the first language - English.

I think from that perspective, you have the developer who is encoding some UX designed by some designer. This can be the same person, in larger projects it usually is not.

Now, they know what they want, for example: "we want to show a sentence 'Today is: DATE' and format the date to be 'long'". So they can write elem.textContent = l10nCtx.formatValue("my-msg", {date: new Date()}) and then they put their localizer hat on and write my-msg = Today is { $date } (I'm simplifying a lot).

The user knows that they like their long dates to be displayed as YYYY, MM, dddd. Those two bits of information can be merged by the system by the developer saying something like:

elem.textContent = l10nCtx.formatValue(`my-msg`, {
  date: l10nCtx.Date(new Date(), { dateStyle: "long" })
});

and then the localization system can pick the skeleton or even pattern for what "long" date is according to the user preference, if the user has an override.

Now, let's assume the user has no custom preference for how to display short dates, as I'm sure you can agree is the case for the vast majority of users.

Now, a German translator writes a translation to German: my-msg = Heute ist { $date }, a Polish translator to Polish: my-msg = Dzis jest { $date } and so on.

So far, so good. We expect the vast majority of cases to work this way. In this case, the "developer" defined the format as long and all localizations use it.

But imagine there is a localization for which the default date format doesn't work: for example, it looks awkward in Polish, or it doesn't fit in German.

In Fluent, a German localizer can say my-msg = Heute ist { DATE($date, dateStyle: "medium") } and the message fits and looks good.
The German localizer overrode the styling of the date.

I think this level of flexibility is going to be very useful to avoid painful developer dirty-hacks to support a single locale's edge cases.

Does it make sense?
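
A minimal sketch of that flow, under loud assumptions: the catalog shape, `resolveOptions`, and `formatMessage` are hypothetical and are neither Fluent's API nor anything this group has agreed on. Only `Intl.DateTimeFormat` and its `dateStyle` option are real.

```javascript
// Hypothetical message catalog: each message is a list of parts. A placeholder
// part may carry formatting options written by the localizer.
const catalog = {
  en: { "my-msg": ["Today is ", { arg: "date" }] },
  pl: { "my-msg": ["Dzis jest ", { arg: "date" }] },
  de: { "my-msg": ["Heute ist ", { arg: "date", options: { dateStyle: "medium" } }] },
};

// Localizer-provided options win over the developer's defaults.
function resolveOptions(developerOptions, localizerOptions) {
  return { ...developerOptions, ...localizerOptions };
}

function formatMessage(locale, id, args) {
  return catalog[locale][id]
    .map((part) => {
      if (typeof part === "string") return part;
      const { value, options } = args[part.arg];
      const resolved = resolveOptions(options, part.options);
      // Time zone pinned to UTC so the example does not depend on the host.
      return new Intl.DateTimeFormat(locale, { ...resolved, timeZone: "UTC" }).format(value);
    })
    .join("");
}

// The developer asks for a "long" date once, for every locale.
const args = {
  date: { value: new Date(Date.UTC(2020, 4, 19)), options: { dateStyle: "long" } },
};

const english = formatMessage("en", "my-msg", args); // uses the developer's "long"
const german = formatMessage("de", "my-msg", args); // localizer overrides to "medium"
console.log(english);
console.log(german);
```

The point of the sketch is where the override lives: in the German message, not in a locale-specific branch of the calling code.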

@rxaviers
Contributor

Still about formatting control... Technically, we only have the distinction between code and localization messages. Who touches which depends on the process. Therefore, I believe the processes that @zbraniecki and @aphillips described above both make sense. Whether a translator or a customer experience (CX) designer (at PayPal, we call it content designer) is the one supposed to make changes to the localization messages depends on the process defined by the product or company.

Back to goals: could we simply mention that formatting control (as a feature) should be supported, and then define the details elsewhere? Thoughts?

@stasm
Collaborator Author

stasm commented May 19, 2020

Perhaps I'm approaching the goal setting too strictly, but I'd leave the topic of formatting control out of the goals document. It sounds like a design principle to me. There's #61 which I think is related wrt. how much we want translators to be in control, as well as #64 about the guidelines which will help us evaluate syntax proposals (skeletons, arguments, etc.).

I'll add a goal about i18n formatting. I liked the phrasing @rxaviers suggested. How we get there is a topic for a design principles discussion and/or requirements.

@aphillips
Member

@rxaviers I think I agreed that this was important before sidetracking us 🙉 and I think it should be mentioned (even if all we mean is "how do we express subformats"). @stasm, that sounds reasonable.


Off topic rambling...

@zbraniecki: yeah, we're more or less on the same page. I tend to be much more specific about roles than just "localizer" (the better to write user stories). In an era where, for better or worse, language conversion (i.e. translation) is a commodity operation (and, increasingly, is done by a machine), anything that relies on the translator to do something is a source of pain.

@romulocintra Hopefully @zbraniecki's explanation made the concepts clearer. Writing low-level formatting details into the code is brittle. I favor exposing every formatting feature possible/reasonable in the message formatting language: I would regard the need to call [the equivalent of] setFormatByArgument[Index|Name] as a defect in our design.

  - Added a goal about using i18n formatters.

  - Removed the goal about interoperability with interchange formats.

  - Rephrased the goal about compatibility with TMS/CAT to mention
    `localization roundtrip` instead.

  - Replaced `lossless conversion` with `one-to-one mapping` in the
    deliverable about XLIFF.

  - Added a deliverable about a conformance test suite.
@stasm
Collaborator Author

stasm commented May 20, 2020

I updated the PR according to the discussion from Monday:

  • I added a goal about using i18n formatters to format numbers, dates, etc.
  • I removed the goal about interoperability with interchange formats.
  • I rephrased the goal about compatibility with TMS/CAT to mention localization roundtrip instead.
  • I replaced lossless conversion with one-to-one mapping in the deliverable about XLIFF.
  • I added a deliverable about a conformance test suite.

@stasm
Collaborator Author

stasm commented May 20, 2020

There's one more thought I'd like to share, triggered by @DavidFatDavidF's comment about the compatibility as a design principle. I agree with this view, and I just filed #88 to discuss this further.

I'm even starting to think that the goal no. 5, which is phrased as follows after my most recent changes:

Ensure compatibility with the localization roundtrip workflows.

…could be removed and instead be a design principle. It sounds much more like a how than a what. It helps us put bounds on proposals, and it will surely confine the design of the standard.

I didn't remove it because we didn't discuss this during the meeting, but I'd like to hear what people think about it.

4. Represent structured data alongside translations, such as markup, comments,
and metadata.

5. Ensure compatibility with the localization roundtrip workflows.
Collaborator

Suggested change
5. Ensure compatibility with the localization roundtrip workflows.
5. Be compatible with localization roundtrip workflows.

I think something like this line should stay as a goal, to keep us linked to the existing world. Perhaps expressing it slightly differently would help it fit better with the others?

Collaborator Author

@DavidFatDavidF I think you have a better grasp on what this goal entails than I do. What do you think about the proposed phrasing?

Collaborator

I think that the meaning of this one is that all and any features of the new message format should be capable of L10n roundtrip.
I kind of think that the word workflow(s) is superfluous here..
Maybe the wording should be something like
5. Be capable of localization roundtrip

Compatibility probably survives here from the time when this goal mentioned the XLIFF standard that you could comply with..

Collaborator Author

Thank you! I think your suggestion is in line with @eemeli's and I like the simple wording too. I'll update the document.

@nciric nciric self-requested a review May 21, 2020 19:30
@echeran
Collaborator

echeran commented May 27, 2020

The document after the changes looks good to me.

I'm even starting to think that the goal no. 5, which is phrased as follows after my most recent changes:

Ensure compatibility with the localization roundtrip workflows.

…could be removed and instead be a design principle. It sounds much more like a how than a what. It helps us put bounds on proposals, and it will surely confine the design of the standard.

I didn't remove it because we didn't discuss this during the meeting, but I'd like to hear what people think about it.

As far as whether to remove goal number 5, I could justify it either way. My slight preference is to leave it as it is, and it fits a little better here than elsewhere.

@stasm
Collaborator Author

stasm commented May 28, 2020

As far as whether to remove goal number 5, I could justify it either way. My slight preference is to leave it as it is, and it fits a little better here than elsewhere.

Let's leave it as an explicit goal, perhaps with a slightly improved wording.

@stasm stasm changed the title from "Draft of goals and non-goals" to "Goals and non-goals" Jun 1, 2020
@stasm
Collaborator Author

stasm commented Jun 1, 2020

@romulocintra I haven't seen any opposition to merging this. Would you like to go ahead and merge this PR?

@romulocintra romulocintra merged commit 829a6ba into unicode-org:master Jun 1, 2020
@romulocintra
Collaborator

Well done Group!!!
