Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use UTF-8 Encoding by Default On Windows On Non English Languages #29755

Merged
merged 13 commits into from
Jan 20, 2023

Conversation

nagilson
Copy link
Member

@nagilson nagilson commented Jan 5, 2023

Fix for: #8833, #24757, #29151, and @rawieo's remark: microsoft/vstest#821

Before:
image

After:
image

The main fix is setting the CHCP and OutputEncoding to UTF8 by default on Windows, if a custom language is set.

Things to know:

  • Make sure this works in CMD and Powershell (Works in Powershell, no easy way to test it in CMD. Hasn't been tested, but Powershell seems to the trickier one because it has its own custom preference variable)
  • Make this take effect if the system language is not english as well. Note that this could be weird if VS has different behavior, but VSLANG is respected above OS lang.
  • Add tests if possible ( I did not add tests, they would be a massive pain. There is a fairly close attempt in PR history But we would have to change the SDK to make it testable.)
  • dotnet new duplicated a lot of the old code... but it's also fixed because this runs through Main in the CLI. We could consider removing that code, but I didn't to lower risk of breaks.
  • MSBuild does not respect the environment variables so that's still english ATM. We should do the work to align those once this is approved, I can migrate/copy this code to MSBuild. MSBuild should respect upstream tool language requests msbuild#1596
  • dotnet --debug will have broken encoding, because it runs before the encoding is set. Otherwise the debugger would not be able to attach to the encoding changing code. We don't expect many people to be impacted by this, probably not worth fixing.
    image
  • UTF8 was not super-officially supported until Win10. We should only do this if it's supported to not break Win7 below, especially.

@KalleOlaviNiemitalo
Copy link
Contributor

Does this affect the emoji that dotnet watch outputs? The OEM codepage of the culture might not include those characters.

public virtual void Error(string message, string emoji = "❌")

@nagilson
Copy link
Member Author

nagilson commented Jan 5, 2023

Does this affect the emoji that dotnet watch outputs? The OEM codepage of the culture might not include those characters.

public virtual void Error(string message, string emoji = "❌")

@KalleOlaviNiemitalo That's a really good point. I was considering setting the page to UTF-8. It does indeed break the emoji in certain languages. UTF-8, meanwhile, is fine, though it's somewhat concerning in legacy scenarios.

@KalleOlaviNiemitalo
Copy link
Contributor

I suspect any Console.OutputEncoding change should be limited to Windows only. Is the original problem reproducible on other operating systems?

@nagilson
Copy link
Member Author

nagilson commented Jan 5, 2023

Not sure, but you're right in that InputEncoding doesn't work for certain System.Runtime.Versioning.UnsupportedOSPlatform, like tvos or android, so I am planning to see what happens when you try to use it on an unsupported OS before merging. Thanks for taking a look.

@KalleOlaviNiemitalo
Copy link
Contributor

KalleOlaviNiemitalo commented Jan 6, 2023

I mean, if I'm using Linux via a terminal from the 1980s and have set LANG=fi_FI.ISO-8859-1 to make the tty I/O compatible with that, then I'd like the dotnet CLI to keep the same encoding even if some other environment variable overrides the language. So if the original issues do not reproduce on Linux then it's not worth breaking this scenario just to match Windows.

@nagilson
Copy link
Member Author

nagilson commented Jan 6, 2023

Yes, this does not appear to repro on Linux. I'll check other platforms I can access in a bit, but limiting it to Windows for now 👍

@nagilson nagilson changed the title Set the CHCP Language Page to the Corresponding Culture Use UTF-8 Encoding by Default On Windows Jan 6, 2023
@nagilson
Copy link
Member Author

nagilson commented Jan 6, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nagilson nagilson changed the title Use UTF-8 Encoding by Default On Windows Use UTF-8 Encoding by Default On Windows On Non English Languages Jan 6, 2023
… appears to be rather difficult to get the changed encoding if it happens mid process. We would probably need to write more code for this. Let's see if we want to do that.
…cause it appears to be rather difficult to get the changed encoding if it happens mid process. We would probably need to write more code for this. Let's see if we want to do that."

This reverts commit eada426.
@nagilson nagilson changed the title Use UTF-8 Encoding by Default On Windows On Non English Languages Use UTF-8 Encoding by Default On Windows On Non English Languages & Change Culture For All Based on OS Main Language Jan 7, 2023
@nagilson nagilson marked this pull request as ready for review January 7, 2023 00:24
@nagilson nagilson requested a review from a team January 9, 2023 17:41
Copy link
Member

@dsplaisted dsplaisted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I don't know glob/loc details such as the different between CurrentCulture and CurrentUICulture, so we should get someone who does to review this.

@nagilson
Copy link
Member Author

Thanks, do you have any contacts for loc?

dotnet/dotnet-api-docs#8767 The docs are kinda confusing so I filed an issue here to ask.

@dsplaisted
Copy link
Member

@tarekgh can probably review this or point us to someone who can.

if (OperatingSystem.IsWindows() && Environment.OSVersion.Version.Major >= 10) // Encoding is only an issue on Windows, UTF-8 is only officially supported on 10+.
{
Console.OutputEncoding = DefaultMultilingualEncoding;
Console.InputEncoding = DefaultMultilingualEncoding; // Setting both encodings causes a change in the CHCP, making it so we dont need to P-Invoke ourselves.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik would setting UTF-8 encoding to the console be ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this, https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page, after the May 2019 Windows 10 update UTF-8 is officially supported. Since this only happens in Windows 10+, and when using languages besides English, which already have a bad encoding without this change, I think we can take this change for 8.

cc @dsplaisted for potential approval.

… dotnet to use OS language, but still change the encoding in any case where the language isn't en
@nagilson nagilson changed the title Use UTF-8 Encoding by Default On Windows On Non English Languages & Change Culture For All Based on OS Main Language Use UTF-8 Encoding by Default On Windows On Non English Languages Jan 11, 2023
@nagilson
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Pull request contains merge conflicts.

Copy link
Member

@dsplaisted dsplaisted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nagilson nagilson enabled auto-merge January 20, 2023 17:49
@nagilson nagilson merged commit a23e88e into dotnet:main Jan 20, 2023
JaynieBai pushed a commit to dotnet/msbuild that referenced this pull request May 12, 2023
…8503)

Fixes #1596

Changes Made
SetConsoleUI now calls into a helper which sets the encoding to support non-en languages and checks if an environment variable exists to change the language to.

Testing
Setting DOTNET_CLI_UI_LANGUAGE=ja now changes msbuild correctly:
image

Doing a complicated build (aka building MSBuild) to use multiple threads shows other threads seem to use the same UI culture:

image

See that chcp remains the same after execution:
image

(Was set to 65001 temporarily but back to the original page before execution.)

Notes
Much of this code is a port of this code: dotnet/sdk#29755
There are some details about the code here.

[!] In addition, it will introduce a breaking change for msbuild just like the SDK.
The break is documented here for the sdk: dotnet/docs#34250
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants