Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect behavior of SortedDictionary when used with OrdinalIgnoreCase comparer #29112

Closed
AndreyChechel opened this issue Mar 28, 2019 · 4 comments

Comments

@AndreyChechel
Copy link

Number of entries added to a SortedDictionary shouldn't be affected by the order of insertion.
The issue is reproducible in non-Windows environments (checked in .NET Core 2.1 / 2.2).
See playground.

There are 5 entries being inserted in the sample below. The dictionary is supposed to have 4 entries, since two records have the equal keys in terms of OrdinalIgnoreCase comparer,

  • ["narınnarın"] = 3
  • ["narinnarin"] = 4

Proving snippet:

var equal = string.Equals("narınnarın", "narinnarin", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(equal); // "True" (on Linux/Mac)

Sample

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Versioning;

public class Program
{
  public static void Main()
  {
    Console.WriteLine(Environment.OSVersion);

    // ------------------------------------------------------------------------------
    // Test Case 1
    // Expected output: 4
    // Actual output: 4
    // ------------------------------------------------------------------------------

    var dict1 = new SortedDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    dict1["naryn"] = 1;
    dict1["нарын"] = 2;
    dict1["narınnarın"] = 3;
    dict1["narinnarin"] = 4;
    dict1["ナルインナルイン"] = 5;

    Console.WriteLine(dict1.Count); // Outputs 4 - OK

    // ------------------------------------------------------------------------------
    // Test Case 2 (changed order of insertion)
    // Expected output: 4
    // Actual output: 5
    // ------------------------------------------------------------------------------

    var dict2 = new SortedDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    dict2["naryn"] = 1;
    dict2["нарын"] = 2;
    dict2["ナルインナルイン"] = 3;
    dict2["narınnarın"] = 4;
    dict2["narinnarin"] = 5;

    Console.WriteLine(dict2.Count); // Outputs 5 (Expected 4)
  }
}

Actual Output

Unix 5.0.3.200
4
5

Expected Output

Unix 5.0.3.200
4
4
@pgovind
Copy link

pgovind commented Sep 16, 2019

I spent some time debugging this and here's what I found:

I don't think there is a bug in the implementation of SortedDictionary(or SortedSet which is what it leverages). The interesting bit is the following lines:

            var equal = string.Equals("narınnarın", "narinnarin", StringComparison.OrdinalIgnoreCase);
            Console.WriteLine(equal); // True on Mac
            var rightCompare = StringComparer.OrdinalIgnoreCase.Compare("naryn", "narınnarın"); // equals -1
            var leftCompare = StringComparer.OrdinalIgnoreCase.Compare("naryn", "narinnarin"); // equals 16

This is important because "narınnarın" is the right child of "naryn". When we go looking for "narinnarin", we check Compare("naryn", "narinnarin"). If "narınnarın" and "narinnarin" are indeed equal, we should have gone right at "naryn" and found "narınnarın". Instead, we go left and find null and insert into the SortedSet. Why are rightCompare and leftCompare different? This may be a string compare bug. Tagging @tarekgh to potentially take a look?

@tarekgh
Copy link
Member

tarekgh commented Sep 17, 2019

When you are comparing "naryn" and "narınnarın", the first different characters are 'y' and 'ı'. The 'y' ordinal value is 0x79 and the ordinal value of 'ı' is 0x131. This means 'y' < 'ı' which means "naryn" < "narınnarın" and that is why you got negative number.

When you are comparing "naryn" and "narinnarin", the first different characters are 'y' and 'i'. The 'y' ordinal value is 0x79 and the ordinal value of 'i' is 0x69. This means 'y' > 'i' which means "naryn" > "narinnarin" and that is why you got positive number.

Note, the casing is done only for checking the equality but not for deciding which string less than or greater the other string.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@GrabYourPitchforks
Copy link
Member

Seems like it might be an ICU bug? Per @pgovind's comment, the collation engine is returning s1 < s2, s1 > s3, and s2 == s3. Those three relationships when taken together are nonsensical.

@GrabYourPitchforks GrabYourPitchforks self-assigned this Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Jun 17, 2020
Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Dec 17, 2024
@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Dec 31, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2025
@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Jan 31, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants