Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix undefined behaviour in Array.cs #83116

Merged
merged 1 commit into from
Mar 8, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions src/libraries/System.Private.CoreLib/src/System/Array.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1332,31 +1332,31 @@ public static unsafe int IndexOf<T>(T[] array, T value, int startIndex, int coun
if (sizeof(T) == sizeof(byte))
{
int result = SpanHelpers.IndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<byte[]>(array)), startIndex),
ref Unsafe.Add(ref Unsafe.As<T, byte>(ref MemoryMarshal.GetArrayDataReference(array)), startIndex),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates an extra generic method instantiation. It should be ok - we are unlikely to hit this path for too many different Ts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its typically fine when we can restrict the callers, but this is a general public API accessible to Span<T> and that can be hit for any user-defined T.

Given we've notably hit several aliasing bugs in the past for the JIT and had to fix them, including ones that caused invalid codegen, this seems like something we should generally be protecting against.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3rd party code also notably looks to code in the BCL to see what it can and cannot do a lot of the time.

There are plenty of users that will see this code and copy it as is, not understanding the limitations or that they may need to fix it if the JIT ever changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does address computation only, without dereferencing anything. The JIT should not run into any aliasing issues with it given the optimizations that it does today. In case the JIT starts doing more aggressive optimizations that depend on no aliasing, there are likely more issues to fix.

We have said many times that CoreLib is special and that it is not ok to copy code from it in general. If people copy code without understanding what it does and whether it is ok to copy it, it is their problem. I do not know how to help them.

I do not have a problem with taking this change. I am just saying that it does not make sense to fix all instances of this issue in CoreLib.

Copy link
Member

@tannergooding tannergooding Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does address computation only, without dereferencing anything.

I don't believe it has to. We had general bugs in GetArrayDataReference which is why the generic version was made intrinsic. The non-generic version has, afaik, the same general issue since it is not intrinsic yet today.

For example we hit #79760 (comment) which was a fairly complex issue related to assertion prop and inlining caused by the aliasing of T[] to RawArrayData. There were at least 3-4 other issues that were also caused by the general aliasing.

In this case, GetArrayDataReference will still itself have a deref, because it's doing ref Unsafe.As<RawData>(array).Data and so a similar issue to the above can still occur.

We have said many times that CoreLib is special and that it is not ok to copy code from it in general. If people copy code without understanding what it does and whether it is ok to copy it, it is their problem. I do not know how to help them.

I think the only "fix" here is to ensure that such code is annotated with an explanation of why it is ok for corelib and not elsewhere. Even within the BCL team such knowledge is not always common place for these types of specialized scenarios and may end up in unintended places due to copy/paste, refactoring, etc.

Having a small code comment calling out something like "// DANGEROUS: Depends on runtime internals" is effectively a minimum safety net to help prevent these types of mistakes.

Copy link
Member

@tannergooding tannergooding Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixing the non-generic GetArrayDataReference(Array) to be intrinsic and may be another valid fix.

Doing that would also notably remove one of the Unsafe.As generic instantiations.

Copy link
Contributor Author

@MichalPetryka MichalPetryka Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, GetArrayDataReference will still itself have a deref, because it's doing ref Unsafe.As(array).Data and so a similar issue to the above can still occur.

Not true anymore since it was made an intrinsic.

I think fixing the non-generic GetArrayDataReference(Array) to be intrinsic and may be another valid fix.

Doing that would also notably remove one of the Unsafe.As generic instantiations.

This would work too, I didn't do it in my GADR PR though since I wasn't sure what nodes to emit there to handle both SZ and MD arrays.

Could you remind me what was the issue with making JIT intrinsics not cause instantiations when handled by the JIT?

Copy link
Member

@jkotas jkotas Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does address computation only, without dereferencing anything. The JIT should not run into any aliasing issues with it given the optimizations that it does today

I don't believe it has to. We had general bugs in GetArrayDataReference which is why the generic version was made intrinsic.

I meant memory writes. I believe that the aliasing bugs only occur with interleaving memory reads and writes given what the JIT can do today.

Non-generic GetArrayDataReference(Array) has to do more work to deal with MD arrays. We would have to do expansion of the non-generic GetArrayReference based on the actual type of the incoming argument to avoid the extra work for SD arrays.

Could you remind me what was needed to make JIT intrinsics not cause instantiations when handled by the JIT?

#82917 (comment)

Unsafe.As<T, byte>(ref value),
count);
return (result >= 0 ? startIndex : 0) + result;
}
else if (sizeof(T) == sizeof(short))
{
int result = SpanHelpers.IndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<short[]>(array)), startIndex),
ref Unsafe.Add(ref Unsafe.As<T, short>(ref MemoryMarshal.GetArrayDataReference(array)), startIndex),
Unsafe.As<T, short>(ref value),
count);
return (result >= 0 ? startIndex : 0) + result;
}
else if (sizeof(T) == sizeof(int))
{
int result = SpanHelpers.IndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<int[]>(array)), startIndex),
ref Unsafe.Add(ref Unsafe.As<T, int>(ref MemoryMarshal.GetArrayDataReference(array)), startIndex),
Unsafe.As<T, int>(ref value),
count);
return (result >= 0 ? startIndex : 0) + result;
}
else if (sizeof(T) == sizeof(long))
{
int result = SpanHelpers.IndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<long[]>(array)), startIndex),
ref Unsafe.Add(ref Unsafe.As<T, long>(ref MemoryMarshal.GetArrayDataReference(array)), startIndex),
Unsafe.As<T, long>(ref value),
count);
return (result >= 0 ? startIndex : 0) + result;
Expand Down Expand Up @@ -1580,7 +1580,7 @@ public static unsafe int LastIndexOf<T>(T[] array, T value, int startIndex, int
{
int endIndex = startIndex - count + 1;
int result = SpanHelpers.LastIndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<byte[]>(array)), endIndex),
ref Unsafe.Add(ref Unsafe.As<T, byte>(ref MemoryMarshal.GetArrayDataReference(array)), endIndex),
Unsafe.As<T, byte>(ref value),
count);

Expand All @@ -1590,7 +1590,7 @@ ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<byte[]>(array))
{
int endIndex = startIndex - count + 1;
int result = SpanHelpers.LastIndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<short[]>(array)), endIndex),
ref Unsafe.Add(ref Unsafe.As<T, short>(ref MemoryMarshal.GetArrayDataReference(array)), endIndex),
Unsafe.As<T, short>(ref value),
count);

Expand All @@ -1600,7 +1600,7 @@ ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<short[]>(array)
{
int endIndex = startIndex - count + 1;
int result = SpanHelpers.LastIndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<int[]>(array)), endIndex),
ref Unsafe.Add(ref Unsafe.As<T, int>(ref MemoryMarshal.GetArrayDataReference(array)), endIndex),
Unsafe.As<T, int>(ref value),
count);

Expand All @@ -1610,7 +1610,7 @@ ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<int[]>(array)),
{
int endIndex = startIndex - count + 1;
int result = SpanHelpers.LastIndexOfValueType(
ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Unsafe.As<long[]>(array)), endIndex),
ref Unsafe.Add(ref Unsafe.As<T, long>(ref MemoryMarshal.GetArrayDataReference(array)), endIndex),
Unsafe.As<T, long>(ref value),
count);

Expand Down