Give API to measure the space that a string occupies #218
Comments
IIRC, determining the width of a string is a pretty hard problem actually. There are all sorts of crazy Unicode edge cases to handle, there are some assumptions we make in the code manually (eg box-drawing chars are single-width always). Adding @adiviness as he has been working in that area quite a lot. I'd really doubt that we'd be adding another API to conhost. Is there an equivalent API on *nix that we could use for inspiration? |
That assumption (box-drawing chars are always single-width) has been false for CJK languages since the 1980s with some fonts, but it holds for other fonts (like my Sarasa Gothic). |
I have seen multiple libraries trying to "guess" the actual width of a string, like https://github.com/martinheidegger/varsize-string . If we had an accurate API, it would greatly help people writing console applications. |
on the unix side, there isn't a great API for terminal emulators. the closest are the wcwidth and wcswidth functions. they effectively operate on code points. most programs (both terminal emulators and editors/tools) tend to just use common examples of complicated rules:
the original question was about the rendering box needed for a particular grapheme in a particular font. this shouldn't matter, but in practice, a lot of fonts (including monospace ones) aren't consistent in their widths/heights. they can be narrower or wider than a single cell requiring manual intervention to center/scale them in the respective cells. freetype/fontconfig are the standard font related libraries in the unix world for rendering. along those lines, wide-characters (i.e. CJK) should be taking up two cells even if the font gets it wrong. otherwise you easily run out of sync with the console's idea of cursor location and the remote application's idea of cursor location. i grok that this might be a fundamental limitation in the existing Windows console code and is not trivial to resolve. hth. |
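To make the per-code-point approach concrete, here is a minimal Python sketch in the spirit of `wcwidth`/`wcswidth`. It is not the libc implementation: it only uses the standard `unicodedata` module, and the helper names `cell_width` and `string_width` are made up for illustration. It treats East Asian Wide/Fullwidth code points as two cells, combining marks and format characters as zero, and everything else (including ambiguous-width code points such as box-drawing characters) as one.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate cell count for one code point (hypothetical helper)."""
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category == "Cc":
        # C0/C1 control characters have no printable width.
        return -1
    if category in ("Mn", "Me", "Cf"):
        # Combining marks and format characters (e.g. ZWJ) take no cell.
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        # Wide and Fullwidth code points (most CJK) take two cells.
        return 2
    # Everything else, including ambiguous-width code points, takes one.
    return 1


def string_width(s: str) -> int:
    """Sum per-code-point widths; -1 if any code point is non-printable."""
    total = 0
    for ch in s:
        width = cell_width(ch)
        if width < 0:
            return -1
        total += width
    return total
```

Because this works on code points rather than grapheme clusters, it shares the limitation described above: sequences whose width depends on their neighbours are not handled, and the result only helps if the terminal uses the same rules.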
+1 for having such a function. |
The most important thing is not how you measure the width; it is that the terminal app's and the console app's measurements agree with each other. When the widths don't match, it will mess up all ncurses apps and tmux/screen. So instead of providing another platform-dependent function, I strongly suggest using a widely used library like utf8proc (with this patch) to determine character width. It mostly follows the Unicode standard. There are also characters with situational width, depending on locale. Make sure your app can handle this, or just use the library. |
@kghost @miniksa |
From @alabuzhev in #10592
|
fyi: there is new Unicode Terminal Complex Script Support, or TCSS proposal |
@DHowett-MSFT - I'm super interested in helping with this. At the minimum, you can count on Terminal.Gui as being a test case. Please feel free to reach out (tig (at) kindel (dot) com). |
Note that there's also Contour's Unicode Core proposal, which has already been adopted by a number of other terminals, and at least one application that I'm aware of. |
FWIW To be in line with what is/becomes the default in Windows Terminal, I used the code effective in v1.22 to measure the displayed width of strings. The results are really promising. |
This page mentions a proposal from unicode.org on proper support for complex scripts in text terminals. I translated it into Russian. I hope my Russian-speaking colleagues find it useful! |
Because the result of wcwidth() execution is not always reliable in practice (for example, it often returns -1 for characters that do occupy screen space, which needs to be taken into account somehow), I used the following hack: I measured the actual width in terminal cells for the first 1,114,111 Unicode characters. The measurement was performed using the following algorithm:
And so on for each character. I did this in the GNOME terminal with default settings. I attach the result, as well as the source code of the program for measurement - you can run it in MS Terminal or in any other terminal and compare the results. Be prepared for the process to be quite lengthy: it took me two days. |
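The steps of that measurement are not preserved in this thread, so the following is not the author's program. It is a sketch of one common way to measure width empirically: move to column 1, print the text, ask the terminal for the cursor position with the `ESC [ 6 n` device status report, and read the column from the `ESC [ row ; col R` reply. The `write` and `read_reply` callables are assumptions standing in for raw terminal I/O (for example a tty in raw mode), and the sketch assumes the text fits on one line without wrapping.

```python
import re
from typing import Callable, Tuple

# Cursor position report sent by the terminal: ESC [ row ; col R
CPR_PATTERN = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: str) -> Tuple[int, int]:
    """Extract the 1-based (row, col) from a cursor position report."""
    match = CPR_PATTERN.search(reply)
    if match is None:
        raise ValueError("no cursor position report in reply")
    return int(match.group(1)), int(match.group(2))


def measure_cells(
    text: str,
    write: Callable[[str], None],
    read_reply: Callable[[], str],
) -> int:
    """Print text from column 1 and return how many cells the cursor moved.

    write/read_reply are hypothetical hooks for raw terminal I/O.
    """
    write("\r")        # return to column 1
    write(text)        # let the terminal lay the text out
    write("\x1b[6n")   # request a cursor position report
    _, col = parse_cpr(read_reply())
    write("\r\x1b[K")  # go back and erase the line
    return col - 1     # columns are 1-based
```

Each measurement costs a full round trip to the terminal, which is consistent with the long run time reported above.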
There are some quite reasonable thoughts on this here: |
The primary issue with having an API that asks the terminal is that it requires a costly cross-process roundtrip. Console applications on Windows are already some of the worst of any OS when it comes to performance precisely due to this issue. However, we can solve that by exposing the measurement as a function in kernel32.dll. Since we own the OS and platform, we can simply build the internal APIs needed to inject the Terminal's idea of Unicode into the console processes it owns. This would make it as fast as I'm not a big fan of the comment you linked, because text attributes that influence text measurement would hurt performance. Right now, an ideal terminal can infer the cursor position purely from measuring text after VT parsing (even if VT line renditions are used). If the linked comment's idea were adopted, this wouldn't work anymore, and the attributes would need to be checked during text iteration. I expect that this would halve the performance, or something of that order. @o-sdn-o's character geometry proposal, on the other hand, wouldn't have this issue. The codepoints they're proposing would be part of the grapheme cluster segmentation that would need to happen regardless. It would allow terminal applications precise control over the width of ambiguous-width codepoints (or those few awkward codepoints that are narrow/wide but should be the opposite). I'm not sure whether the vertical sizing, rotation, and halving are needed in practice though. I think they should only be adopted if a stronger need for them arises. I think, if anything, the introduction of a |
Vertical sizing and halving are just side effects. One of the points in this approach: the terminal can operate internally exclusively with 1x1 fragments, this will dramatically simplify its life. Receiving a 3x3 cluster, it breaks it into nine independent (adjacent for the time being) objects of size 1x1. If necessary, for example, for the purpose of copying a selection, it can reassemble this cluster back into a monolith. |
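A rough sketch of that idea in Python, with made-up names (`Fragment`, `split_cluster`, `reassemble`) that do not come from the proposal: a cluster of a given size is split into 1x1 fragments that each remember their cluster and their position inside it, and a selection is turned back into clusters by keeping one entry per top-left fragment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """One 1x1 cell of a larger cluster (hypothetical structure)."""
    cluster: str  # the full grapheme cluster this cell belongs to
    cols: int     # cluster width in cells
    rows: int     # cluster height in cells
    x: int        # column of this cell inside the cluster
    y: int        # row of this cell inside the cluster


def split_cluster(cluster: str, cols: int, rows: int) -> List[Fragment]:
    """Break a cols x rows cluster into independent 1x1 fragments."""
    return [
        Fragment(cluster, cols, rows, x, y)
        for y in range(rows)
        for x in range(cols)
    ]


def reassemble(fragments: List[Fragment]) -> List[str]:
    """Recover clusters from fragments, e.g. when copying a selection.

    Each cluster is emitted once, at its top-left fragment.
    """
    return [f.cluster for f in fragments if f.x == 0 and f.y == 0]
```

With this layout, the grid itself only ever stores 1x1 cells; the cluster size matters only when splitting on input and reassembling on output.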
This is an extension to #57.
Under a certain console/PTY, assume the font family/size is specified, give a string, and return the space (a bit mask of the character matrix?) it would occupy.