Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] iOS - Size reduction: optimize corelib runtime type checks in generic functions #54849

Closed
Tracked by #61815
imhameed opened this issue Jun 28, 2021 · 4 comments
Closed
Tracked by #61815
Assignees
Labels
area-Codegen-meta-mono runtime-mono specific to the Mono runtime
Milestone

Comments

@imhameed
Copy link
Contributor

Seen while investigating size differences between legacy Mono and netcore Mono when used for iOS. Functions that call ThrowHelper.ThrowForUnsupportedNumericVectorBaseType are a great example of what goes wrong; with both mini and LLVM we fail to elide impossible branches and we end up repeatedly generating kilobytes of noisy dead code across several functions.

@imhameed imhameed added area-Codegen-meta-mono runtime-mono specific to the Mono runtime labels Jun 28, 2021
@imhameed imhameed added this to the 6.0.0 milestone Jun 28, 2021
@imhameed imhameed self-assigned this Jun 28, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 28, 2021
@imhameed imhameed removed the untriaged New issue has not been triaged by the area owner label Jun 28, 2021
@SamMonoRT SamMonoRT changed the title [mono] Size reduction: optimize corelib runtime type checks [mono] iOS - Size reduction: optimize corelib runtime type checks Jun 28, 2021
@imhameed imhameed changed the title [mono] iOS - Size reduction: optimize corelib runtime type checks [mono] iOS - Size reduction: optimize corelib runtime type checks in generic functions Jul 7, 2021
@imhameed
Copy link
Contributor Author

imhameed commented Jul 8, 2021

I wrote a toy implementation of an intrinsic that allows some type equality comparisons to be evaluated during AOT compilation here: imhameed@4d747b0#diff-ce66e94ce2cdb8e54003a124a4057ce5d5dc5325ad32c9601a4662f8fa39367aR1808

I've disabled partial generic sharing in this exploratory branch to get genuinely monomorphic instantiations of generic functions. Shared generic instantiations fetch auxiliary data through some sort of generic context. I don't completely understand this yet, but there is some documentation covering shared generics here. For posterity, here is my understanding of what partial generic sharing (as distinct from generic sharing in general) is meant to accomplish, as explained to me by Zoltan:

Suppose you have a generic function that takes one type parameter. Call it f<T>. Partial generic sharing is apparently meant to allow sharing generated code between instantiations taking primitive types and instantiations taking enums whose underlying types are the same primitives--e.g. the code generated for f<int> will also correctly handle a f<enum-whose-underlying-type-is-int>.

This is fine when the function is parametric over its type parameters but can be problematic otherwise. Examples:

Both cause lots of branchy runtime type checking code to be generated.

For these examples, we'd ideally generate specialized monomorphic code for each supported primitive type (ideally this would just wrap a single instruction, maybe with a safepoint poll on function entry), plus a single function that does nothing but immediately raise an exception, shared across all other invalid types.

Some preliminary size results from full AOT compilation of System.Private.CoreLib.dll on amd64 Linux:

  • with partial generic sharing enabled: 9.01 MiB of executable code in .text,
  • with partial generic sharing disabled: 8.73 MiB of executable code in .text, and
  • with partial generic sharing disabled and the aforementioned intrinsic: 8.07 MiB of executable code in .text.

This is a reduction of ~985.66 KiB (the result is ~89% as large as the baseline).

Improvements are more modest on a linked version of the stdlib. Here are some local numbers for size-comparison/MySingleView, with a linked/trimmed copy of System.Private.CoreLib.dll-llvm.o compiled by hand:

  • with partial generic sharing: 1.87 MiB of executable code in __text,
  • with partial generic sharing disabled: 1.80 MiB of executable code in __text, and
  • with partial generic sharing disabled and the aforementioned intrinsic: 1.71 MiB of executable code in __text.

The same for System.Private.CoreLib.dll.o, compiled by hand:

  • with partial generic sharing: 688 KiB of executable code in __text,
  • with partial generic sharing disabled: 665 KiB of executable code in __text, and
  • with partial generic sharing disabled and the aforementioned intrinsic: 665 KiB of executable code in __text.

This is an improvement of ~191.32 KiB (the result is ~92% as large as the baseline).

We have a dead code elimination pass and some constant folding passes that operate on mini IR so I'm not sure why the GSHAREDVT instantiations of some System.Numerics.Vector functions (e.g. System_Numerics_Vector_1_T_GSHAREDVT_ScalarDivide_T_GSHAREDVT_T_GSHAREDVT) aren't minimized into an immediate call to mono_arch_throw_exception, which is partly why System.Private.CoreLib.dll.o isn't even smaller with my changes. I haven't investigated this yet.

@lambdageek
Copy link
Member

I guess the motivation for partial generic sharing is "we want some valuetype sharing, but we don't want gsharedvt where we can't even copy values to a registers". But it doesn't seem like it's finding a lot of opportunities to pay off and in the case of these helper functions that use typeof(T) effectively as compile-time checks, it's actually an optimization blocker.

In the long term, we will want more control over AOT specialization (particularly for numerics heavy code using the static virtual interface methods feature), so perhaps we would want some version of partial generic sharing (perhaps more aggressive - so not only ints and enums are shared, but maybe wrappers around vector types).

But I think for now it makes sense to take the new intrinsic and to flip the default for partial generic sharing to off, and provide an option to turn it back on through command line options.

@imhameed imhameed modified the milestones: 6.0.0, 7.0.0 Jul 8, 2021
@vargaz
Copy link
Contributor

vargaz commented Jul 10, 2021

The reason this type of sharing is used is mainly to support enums, i.e. for a method Sort, a Sort<T_INT> instance could handle all enums whose base type is int. So this is not really an optimization but required to be able to handle enum and generics efficiently.

The problem in this case is that optimizing this code requires higher level optimizations than the ones currently implemented in the mono JIT. I.e.

    IL_0000:  ldtoken    !!T
    IL_0005:  call       [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle([System.Runtime]System.RuntimeTypeHandle)
    IL_000a:  ldtoken    [System.Runtime]System.Byte
    IL_000f:  call       [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle([System.Runtime]System.RuntimeTypeHandle)
    IL_0014:  call       bool [System.Runtime]System.Type::op_Inequality([System.Runtime]System.Type,
                                                                         [System.Runtime]System.Type)
    IL_0019:  brfalse    IL_015f

In theory, the JIT could determine that the result of ldtoken !!T is a type which is either int, or an enum whose base type is int, and eliminate the comparison, but it requires the lowering of ldtoken instruction to happen later, i.e. the JIT IR would need to include an LDTOKEN opcode, and a JIT pass which operates on them.

@imhameed
Copy link
Contributor Author

imhameed commented Aug 20, 2021

In the specific case I'm calling out here (ThrowHelper.ThrowForUnsupportedNumericVectorBaseType), ints don't have the same behavior as an enum whose base type is int. In theory full specialization with after-the-fact deduplication could achieve the same goal as generic sharing and partial generic sharing, although it requires a more whole-program view of the project being compiled so it's not a localized, simple, or easy change. And as you pointed out in a chat a while ago, doing deduplication using the same tools that already exist for C++ involves support from the linker, but ICF/"comdat folding" isn't universally available. Also possibly related is #52559.

vargaz added a commit to vargaz/runtime that referenced this issue Sep 27, 2021
Comparisons of the form
```
        internal static void ThrowForUnsupportedIntrinsicsVectorBaseType<T>() where T : struct
        {
            if (typeof(T) != typeof(byte) && typeof(T) != typeof(sbyte) &&
            ...
        }
```
are very common in BCL code. They are compiled to:
```
    IL_0001:  ldtoken    !!T
    IL_0006:  call       [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle([System.Runtime]System.RuntimeTypeHandle)
    IL_000b:  ldtoken    [System.Runtime]System.Byte
    IL_0010:  call       [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle([System.Runtime]System.RuntimeTypeHandle)
    IL_0015:  call       bool [System.Runtime]System.Type::op_Inequality([System.Runtime]System.Type,
                                                                         [System.Runtime]System.Type)
    IL_001a:  brfalse    IL_015a
```

In gshared code, !!T is usually constrained to be a reference, or to have a primitive base type.
Use this fact to eliminate some of the Type::op_Equality/op_Inequality () calls.

Fixes dotnet#54849.
@vargaz vargaz closed this as completed in b8db0ea Sep 28, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Codegen-meta-mono runtime-mono specific to the Mono runtime
Projects
None yet
Development

No branches or pull requests

3 participants