Diacritical marks are ignored when differentiating #169

hadithmv · 2022-10-02T22:38:43Z

While Mergely does not ignore things like whitespace, case, and accents by default, it does ignore diacritical marks.

I've included a picture as well as a diff file that shows this for a couple of languages:

RL2vASt1.zip

hadithmv · 2022-10-02T22:44:10Z

It seems I can only download a single diff file from the site, yet import requires two separate files for two different sides. So instead here's a link: https://editor.mergely.com/UsHJyREC/

hadithmv · 2022-10-02T22:51:34Z

Or rather: because the merge left/right arrows are there, it does know that there are differences; it just doesn't highlight them with colors as expected.

While the diacritic mark is small by itself, in actual use it always sits on top of or under another letter. So ideally, both the diacritic and the letter it sits on should be highlighted during differentiation, or otherwise the whole word.

wickedest · 2022-10-03T12:18:50Z

@hadithmv, thanks for reporting the issue. There are actually 2 separate diffs being run. The 1st is by line, and only detects lines that are different. The 2nd is by character, within the line. It appears the 2nd diff may be failing on diacritical marks for some reason.
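To make the two-pass description concrete, here is a minimal sketch under stated assumptions: it is not Mergely's code, and the helper names `lineChanged` and `charSpan` are hypothetical. The second pass simply trims the common prefix and suffix (a stand-in for a real character diff) and works on UTF-16 code units. For كلمة vs كَلمة it isolates only U+064E, a mark with no advance width of its own, which is one plausible reason a highlight on it would not be visible.

```typescript
// Hypothetical two-pass diff sketch; NOT Mergely's implementation.
// Pass 1 compares whole lines; pass 2 narrows a changed line down to the
// differing span by trimming the common prefix and suffix (UTF-16 units).

interface CharSpan {
  start: number; // index of the first differing unit in the right-hand line
  end: number;   // index one past the last differing unit
  text: string;  // the differing text on the right-hand side
}

// Pass 1: a line is "changed" if it is not identical.
function lineChanged(left: string, right: string): boolean {
  return left !== right;
}

// Pass 2: locate the differing span inside a changed line.
function charSpan(left: string, right: string): CharSpan {
  let prefix = 0;
  const maxPrefix = Math.min(left.length, right.length);
  while (prefix < maxPrefix && left[prefix] === right[prefix]) prefix++;

  // Count matching units from the end, without overlapping the prefix.
  let suffix = 0;
  while (
    suffix < left.length - prefix &&
    suffix < right.length - prefix &&
    left[left.length - 1 - suffix] === right[right.length - 1 - suffix]
  ) {
    suffix++;
  }

  const start = prefix;
  const end = right.length - suffix;
  return { start, end, text: right.slice(start, end) };
}

const before = "\u0643\u0644\u0645\u0629";      // كلمة
const after = "\u0643\u064E\u0644\u0645\u0629"; // كَلمة
console.log(lineChanged(before, after), charSpan(before, after));
```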

hadithmv · 2024-06-11T15:02:42Z

It's been a while, and I don't know if this would be of help, but,

This is how VS Code shows it in its diff view: it highlights the exact place where the change occurs. Since VS Code is open source, it might be worth checking how they get this to work.

hadithmv · 2024-06-15T07:00:21Z

here's how another diff compare tool fixed this particular issue:

https://gitlab.gnome.org/GNOME/meld/-/commit/d272016a481a9f9563d318a79ce9767cb8ad69ef

might be of help.

wickedest · 2024-06-15T11:25:36Z

Sorry, this has been at the bottom of the pile for a while.

This is a tough one. Normally, a diacritical can be combined with its base letter into a single character, e.g. a + ´ = á. This can be done with str.normalize("NFC"), but normalize does not work in this case.
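A small sketch of the normalization behaviour described above, using only the standard `String.prototype.normalize`; the helper name `nfcLengths` is made up for illustration. Latin "a" plus U+0301 (combining acute accent) composes to the single code point U+00E1, while Kaf plus Fatha has no precomposed form, so NFC leaves both code points in place.

```typescript
// Compare UTF-16 lengths before and after NFC normalization.
function nfcLengths(s: string): { before: number; after: number } {
  return { before: s.length, after: s.normalize("NFC").length };
}

const latinPair = "a\u0301";       // "a" + COMBINING ACUTE ACCENT
const arabicPair = "\u0643\u064E"; // ARABIC LETTER KAF + ARABIC FATHA

console.log(nfcLengths(latinPair));  // { before: 2, after: 1 }  (becomes U+00E1)
console.log(nfcLengths(arabicPair)); // { before: 2, after: 2 }  (nothing to compose)
```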

For example, if we look at كلمة and كَلمة:
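Dumping the code points of both words shows what the diff actually receives; `codePoints` is an illustrative helper, not part of Mergely. The only difference is the extra U+064E after the Kaf (U+0643).

```typescript
// List each code point of a string as 4-digit uppercase hex.
function codePoints(s: string): string[] {
  // Array.from iterates a string by code point, not by UTF-16 unit.
  return Array.from(s, (ch) =>
    ("0000" + ch.codePointAt(0)!.toString(16).toUpperCase()).slice(-4),
  );
}

const plainWord = "\u0643\u0644\u0645\u0629";        // كلمة
const fathaWord = "\u0643\u064E\u0644\u0645\u0629";  // كَلمة
console.log(codePoints(plainWord)); // [ '0643', '0644', '0645', '0629' ]
console.log(codePoints(fathaWord)); // [ '0643', '064E', '0644', '0645', '0629' ]
```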

I am not a Unicode expert, so I'm a bit lost: while it behaves like a diacritical, as far as I can tell \u064E is something else. I understand diacriticals to be in these ranges (source):

0300-036F
1AB0-1AFF
1DC0-1DFF
20D0-20FF
2DE2-2DFF
FE20-FE2F
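A quick check of the ranges above against the Unicode "Nonspacing_Mark" general category (`\p{Mn}` in a `u`-flag regular expression); `inDiacriticalRanges` is a hypothetical helper. U+0301 falls inside the first range, U+064E falls outside all of them, yet both are nonspacing marks as far as Unicode is concerned.

```typescript
// The combining-diacritical ranges listed above, as inclusive [start, end].
const DIACRITICAL_RANGES: ReadonlyArray<[number, number]> = [
  [0x0300, 0x036f],
  [0x1ab0, 0x1aff],
  [0x1dc0, 0x1dff],
  [0x20d0, 0x20ff],
  [0x2de2, 0x2dff],
  [0xfe20, 0xfe2f],
];

function inDiacriticalRanges(cp: number): boolean {
  return DIACRITICAL_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Unicode general category Mn (Nonspacing_Mark); built at runtime.
const NONSPACING_MARK = new RegExp("^\\p{Mn}$", "u");

for (const cp of [0x0301, 0x064e]) {
  const ch = String.fromCodePoint(cp);
  // Prints the hex code point, range membership, and Mn membership.
  console.log(cp.toString(16), inDiacriticalRanges(cp), NONSPACING_MARK.test(ch));
}
```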

\u064E is not in those ranges and is identified as an "Arabic Fatha", unlike \u0300, which is identified as a "Diacritical".

My guess is that the Fatha cannot be combined into a single character using normalize, so whenever Mergely encounters one, it should include the adjacent base character in the diff (i.e. ['0643', '064E']). But that isn't going to be easy. For starters, outside of the ranges above, I have no idea how many Unicode characters behave this way.
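The idea of pulling in the adjacent character can be sketched as widening a changed span until it no longer starts or ends inside a base-plus-marks sequence. This is a hypothetical helper (`widenToBase`), not the change that shipped; it assumes spans are UTF-16 indices and only handles marks in the Basic Multilingual Plane, which covers U+064E.

```typescript
// Matches any Unicode combining mark (general category M).
const MARK_RE = new RegExp("\\p{M}", "u");

// True if the UTF-16 unit at index i is a combining mark (BMP only).
function isMarkAt(s: string, i: number): boolean {
  return i >= 0 && i < s.length && MARK_RE.test(s[i]);
}

// Widen [start, end) so the span covers whole base+mark sequences.
function widenToBase(s: string, start: number, end: number): [number, number] {
  // Walk back over marks until we reach the base character they sit on.
  while (start > 0 && isMarkAt(s, start)) start--;
  // Walk forward over any marks that still belong to the last base character.
  while (end < s.length && isMarkAt(s, end)) end++;
  return [start, end];
}

const word = "\u0643\u064E\u0644\u0645\u0629"; // كَلمة
const [from, to] = widenToBase(word, 1, 2);   // span that held only U+064E
console.log(from, to, word.slice(from, to));  // 0 2 "كَ" (U+0643 U+064E)
```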

wickedest · 2024-06-15T11:30:36Z

The problem seems to be specifically with nonspacing marks; diacriticals may be only a subset of them. Only a small number of "common" marks can be normalized into a single character. So the issue here is handling the situation when a mark cannot be normalized (which I think would happen a lot).
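One way to handle marks that cannot be normalized is to change what the character diff compares: split each line into units of one base code point plus any following nonspacing marks (`\p{Mn}`), so "كَ" is compared against "ك" as a whole. This is a hypothetical sketch (`splitClusters`, `changedClusters`), not necessarily how 5.3.0 fixed it, and the positional loop is only a placeholder for a real LCS-based diff.

```typescript
// Unicode general category Mn (Nonspacing_Mark).
const NONSPACING_RE = new RegExp("\\p{Mn}", "u");

// Split into units of one base code point followed by its nonspacing marks.
function splitClusters(s: string): string[] {
  const clusters: string[] = [];
  for (const ch of Array.from(s)) { // Array.from iterates by code point
    if (clusters.length > 0 && NONSPACING_RE.test(ch)) {
      clusters[clusters.length - 1] += ch; // attach mark to the previous unit
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

// Naive positional comparison: return right-hand clusters that differ.
function changedClusters(left: string, right: string): string[] {
  const a = splitClusters(left);
  const b = splitClusters(right);
  const out: string[] = [];
  for (let i = 0; i < b.length; i++) {
    if (a[i] !== b[i]) out.push(b[i]);
  }
  return out;
}

console.log(changedClusters("\u0643\u0644\u0645\u0629", "\u0643\u064E\u0644\u0645\u0629"));
```

Here the only changed unit is "كَ" (U+0643 U+064E), which gives the highlight a visible base letter. Grapheme segmentation with `Intl.Segmenter` (granularity `"grapheme"`) is a standard alternative that also groups spacing marks and emoji sequences.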


github-actions · 2024-06-16T19:37:21Z

🎉 This issue has been resolved in version 5.3.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

wickedest added the bug label Oct 3, 2022

wickedest added a commit that referenced this issue Jun 16, 2024

feat: Supports unicode diacritical marks when rendering line diff (fixes #169)

5f3034f

wickedest closed this as completed in a469a65 Jun 16, 2024

github-actions bot pushed a commit that referenced this issue Jun 16, 2024

chore(release): 5.3.0 [skip ci]

64a20ee

# [5.3.0](v5.2.0...v5.3.0) (2024-06-16)

### Features

* Supports unicode diacritical marks when rendering line diff (fixes [#169](#169)) ([#197](#197)) ([a469a65](a469a65))

github-actions bot added the released label Jun 16, 2024

hadithmv mentioned this issue Sep 18, 2024

Still cant seem to be able to get mergely diacritical marks differentiating to show #204

Closed
