Diacritical marks are ignored when differentiating #169
Comments
It seems I can only download a single diff file from the site, yet import requires two separate files, one for each side. So here's a link instead: https://editor.mergely.com/UsHJyREC/
Or rather, because the merge left/right arrows are there, it does know that there are differences; it just doesn't show them with colors as expected. While the diacritic mark is small by itself, in actual use it always goes on top of or under another letter. So ideally, both the diacritic and the letter it sits on should be highlighted in the diff. Otherwise, highlight the whole word.
@hadithmv, thanks for reporting the issue. There are actually two separate diffs being run. The first is by line, and only detects lines that are different. The second is by character, within the line. It appears the second diff may be failing on diacritical marks for some reason.
Here's how another diff/compare tool fixed this particular issue: https://gitlab.gnome.org/GNOME/meld/-/commit/d272016a481a9f9563d318a79ce9767cb8ad69ef. It might be of help.
Sorry, this has been at the bottom of the pile for a while. This is a tough one. Normally, diacriticals can be combined into a single character, e.g. […]. For example, if we look at […]. I am not a Unicode expert, so I'm a bit lost: while it behaves like a diacritical, as far as I can tell
\u064E is not in those ranges and is identified as an "Arabic Fatha", unlike \u0300, which is a "Diacritical". My guess is that the Fatha cannot be combined into a single character using normalize, so whenever Mergely encounters one of these Fathas, it should include the adjacent character in the diff (i.e. […]).
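To make the normalize point concrete, here is a minimal sketch using the standard `String.prototype.normalize`. The sample strings are my own picks for illustration (they are not taken from the attached diff), and this only shows the behaviour, not what Mergely does internally.

```javascript
// "e" + COMBINING GRAVE ACCENT (U+0300) has a precomposed form, U+00E8.
const latin = 'e\u0300';
// ARABIC LETTER BEH (U+0628) + ARABIC FATHA (U+064E) has no precomposed form.
const arabic = '\u0628\u064E';

const latinNfc = latin.normalize('NFC');
const arabicNfc = arabic.normalize('NFC');

console.log(latin.length, latinNfc.length);   // 2 1
console.log(arabic.length, arabicNfc.length); // 2 2

// Both marks are nonspacing marks (Unicode general category Mn),
// even though only one of them composes under NFC.
const isNonspacingMark = (ch) => /^\p{Mn}$/u.test(ch);
console.log(isNonspacingMark('\u0300'), isNonspacingMark('\u064E')); // true true
```

So NFC folds the Latin pair into one code unit, while the Arabic pair stays as two, which is why a character-level diff still sees the Fatha as a separate element.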
The problem seems to be specifically with nonspacing marks; diacriticals might only be a subset of them. Only a small number of "common" marks can be normalized into a single character. So the issue here is handling the situation when a mark cannot be normalized (which I think would happen a lot).
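One possible way to handle marks that cannot be normalized is to keep each mark attached to its base character before comparing. The sketch below is only an illustration of that idea, not the fix that shipped: `splitIntoClusters` and `differingClusterIndexes` are hypothetical names, and the comparison is a naive position-by-position check rather than a real diff.

```javascript
// Split text into units of one base character plus any marks that follow it.
// \P{M} matches one non-mark code point; \p{M}* matches trailing combining marks.
// The second alternative catches marks that have no preceding base character.
function splitIntoClusters(text) {
  return text.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

// Hypothetical helper: positions at which the two cluster arrays differ.
// This is a positional comparison for illustration only, not a diff algorithm.
function differingClusterIndexes(lhs, rhs) {
  const a = splitIntoClusters(lhs);
  const b = splitIntoClusters(rhs);
  const out = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) out.push(i);
  }
  return out;
}

// beh, teh  vs.  beh + fatha, teh
console.log(splitIntoClusters('\u0628\u064E\u062A').length);               // 2
console.log(differingClusterIndexes('\u0628\u062A', '\u0628\u064E\u062A')); // [ 0 ]
```

Because the Fatha travels with the letter it sits on, the reported difference covers both the base letter and the mark. Where it is available, `Intl.Segmenter` with `granularity: 'grapheme'` is another way to get similar clusters.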
🎉 This issue has been resolved in version 5.3.0 🎉 The release is available on: Your semantic-release bot 📦🚀
While Mergely does not by default ignore things like whitespace, case and accents, it does ignore diacritical marks.
I've included a picture as well as a diff file that shows this for a couple of languages:
RL2vASt1.zip