InlineIfLambda doesn't seem to apply when using |>
#12388
Comments
This might be more of a feature request than a bug report, now that I think about it.
I think it would be better to write a benchmark with BenchmarkDotNet to measure performance and allocations more reliably (allocations may come from non-inlined code, e.g. closures). Judging from your numbers, in the pipe case the compiler can't really optimize the code, so real currying takes place (nested closures are called one by one), resulting in substantially worse performance.
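To make the currying point concrete, here is a sketch of the same push-stream combinators written as ordinary, non-inline functions. The names ofRangeC, mapC, filterC, foldC and the PushStream type abbreviation are hypothetical, chosen only for illustration. Every stage is a closure, so each element is pushed through a chain of nested Invoke calls; this is an assumption about roughly what the pipe version degrades to (compare the decompiled PipeLine method), not a description of the compiler's exact output.

```fsharp
// Hypothetical non-inline push-stream combinators, for illustration only.
// A push stream takes a receiver (returns false to stop) and reports
// whether the source was fully consumed.
type PushStream<'T> = ('T -> bool) -> bool

let ofRangeC (b: int) (e: int) : PushStream<int> =
    fun r ->
        let mutable i = b
        while i <= e && r i do
            i <- i + 1
        i > e

// Each combinator allocates a new closure that wraps the previous stage.
let mapC (f: 'T -> 'U) (ps: PushStream<'T>) : PushStream<'U> =
    fun r -> ps (fun v -> r (f v))

let filterC (f: 'T -> bool) (ps: PushStream<'T>) : PushStream<'T> =
    fun r -> ps (fun v -> if f v then r v else true)

let foldC (f: 'S -> 'T -> 'S) (z: 'S) (ps: PushStream<'T>) : 'S =
    // 's' is captured by a closure, so the compiler turns it into a ref cell.
    let mutable s = z
    ps (fun v -> s <- f s v; true) |> ignore
    s

// Sum of the even numbers in 1..10001, as in the benchmarks.
let result =
    ofRangeC 0 10000
    |> mapC ((+) 1)
    |> filterC (fun v -> (v &&& 1) = 0)
    |> mapC int64
    |> foldC (+) 0L

printfn "%d" result
```

The result is the same as the inlined versions (2 + 4 + ... + 10000 = 25005000); only the amount of indirection per element differs.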
BenchmarkDotNet numbers:
Decompiled into C# (dnSpy can't decompile to F# yet):

public long Baseline()
{
    long s = 0L;
    for (int i = 0; i < 10001; i++)
    {
        int j = i + 1;
        if ((j & 1) == 0)
        {
            s += (long)j;
        }
    }
    return s;
}
public long Linq()
{
    return Enumerable.Range(0, 10001).Select(new Func<int, int>(Program.Linq@40.Invoke)).Where(new Func<int, bool>(Program.Linq@40-1.Invoke)).Select(new Func<int, long>(Program.Linq@40-2.Invoke)).Sum();
}
public long PipeLine()
{
    FSharpFunc<FSharpFunc<int, bool>, bool> _instance = Program.PipeLine@45.@_instance;
    FSharpFunc<FSharpFunc<int, bool>, bool> arg = new Program.PipeLine@46-2(@_instance);
    FSharpFunc<FSharpFunc<long, bool>, bool> fsharpFunc = new Program.PipeLine@47-4(arg);
    FSharpRef<long> fsharpRef = new FSharpRef<long>(0L);
    bool flag = fsharpFunc.Invoke(new Program.PipeLine@48-6(fsharpRef));
    return fsharpRef.contents;
}
public long Implicit()
{
    long num = 0L;
    int num2 = 0;
    for (;;)
    {
        bool flag;
        if (num2 <= 10000)
        {
            int num3 = num2;
            int num4 = 1 + num3;
            if ((num4 & 1) == 0)
            {
                long num5 = (long)num4;
                num += num5;
                flag = true;
            }
            else
            {
                flag = true;
            }
        }
        else
        {
            flag = false;
        }
        if (!flag)
        {
            break;
        }
        num2++;
    }
    bool flag2 = num2 > 10000;
    return num;
}
public long Explicit()
{
    FSharpRef<long> fsharpRef = new FSharpRef<long>(0L);
    int num = 0;
    for (;;)
    {
        bool flag;
        if (num <= 10000)
        {
            int num2 = num;
            flag = Program.r@56(fsharpRef, 1 + num2);
        }
        else
        {
            flag = false;
        }
        if (!flag)
        {
            break;
        }
        num++;
    }
    bool flag2 = num > 10000;
    return fsharpRef.contents;
}
internal static bool r@56(FSharpRef<long> s, int v)
{
    if ((v & 1) == 0)
    {
        long num = (long)v;
        s.contents += num;
        return true;
    }
    return true;
}

From this I would not suspect
Can you please add the MemoryDiagnoser attribute to your benchmark class? I think the compiler could benefit from "un-currying" and "lambdafying" to perform better in pipeline cases. Also, I think if you stick to the "explicit pipeline" style (e.g. map (fun x -> x + 1), etc.), your performance will be more or less the same as the "explicit" one.
I added MemoryDiagnoser and this is what I got:
The code I use with BenchmarkDotNet:

open System
open System.Linq
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

module PushStream =
    let inline ofRange b e ([<InlineIfLambda>] r) =
        let mutable i = b
        while i <= e && r i do
            i <- i + 1
        i > e

    let inline filter ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
        ps (fun v -> if f v then r v else true)

    let inline map ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
        ps (fun v -> r (f v))

    let inline fold ([<InlineIfLambda>] f) z ([<InlineIfLambda>] ps) =
        let mutable s = z
        ps (fun v -> s <- f s v; true) |> ignore
        s

open PushStream

[<MemoryDiagnoser>]
[<RyuJitX64Job>]
type FsInlineBenchmark() =
    class
        [<Benchmark>]
        member x.Baseline() =
            let mutable s = 0L
            for i = 0 to 10000 do
                let i = i + 1
                if (i &&& 1) = 0 then
                    s <- s + int64 i
            s

        [<Benchmark>]
        member x.Linq() =
            Enumerable.Range(0,10001).Select((+) 1).Where(fun v -> (v &&& 1) = 0).Select(int64).Sum()

        [<Benchmark>]
        member x.PipeLine () =
            ofRange 0 10000
            |> map ((+) 1)
            |> filter (fun v -> (v &&& 1) = 0)
            |> map int64
            |> fold (+) 0L

        [<Benchmark>]
        member x.Implicit () =
            fold (+) 0L (map int64 (filter (fun v -> (v &&& 1) = 0) (map ((+) 1) (ofRange 0 10000))))

        [<Benchmark>]
        member x.Explicit () =
            fold (+) 0L (fun r -> map int64 (fun r -> filter (fun v -> (v &&& 1) = 0) (fun r -> map ((+) 1) (fun r -> ofRange 0 10000 r) r) r) r)
    end

BenchmarkRunner.Run<FsInlineBenchmark>() |> ignore
Found a potential workaround:

let inline (|>>) ([<InlineIfLambda>] v : _ -> _) ([<InlineIfLambda>] f : _ -> _) = f v

let pipeLineVariant () =
    ofRange 0 10000
    |>> map ((+) 1)
    |>> filter (fun v -> (v &&& 1) = 0)
    |>> map int64
    |>> fold (+) 0L

let v = pipeLineVariant ()

This seems to generate much better optimized code than when I use |>.
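A quick way to sanity-check the workaround before benchmarking is to confirm that every style still computes the same value. The script below is a sketch assembled from the PushStream combinators and the |>> operator quoted in this thread; only the value names viaPipe, viaVariant and viaImplicit are new. It assumes F# 6 or later (for InlineIfLambda) and needs no BenchmarkDotNet reference.

```fsharp
// Self-contained check (no BenchmarkDotNet): all pipeline styles from the
// issue should produce the same sum. Requires F# 6+ for [<InlineIfLambda>].
module PushStream =
    let inline ofRange b e ([<InlineIfLambda>] r) =
        let mutable i = b
        while i <= e && r i do
            i <- i + 1
        i > e

    let inline filter ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
        ps (fun v -> if f v then r v else true)

    let inline map ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
        ps (fun v -> r (f v))

    let inline fold ([<InlineIfLambda>] f) z ([<InlineIfLambda>] ps) =
        let mutable s = z
        ps (fun v -> s <- f s v; true) |> ignore
        s

open PushStream

// Pipe operator whose two operands are both marked InlineIfLambda.
let inline (|>>) ([<InlineIfLambda>] v : _ -> _) ([<InlineIfLambda>] f : _ -> _) = f v

// Standard |> pipeline (the slow case reported in the issue).
let viaPipe =
    ofRange 0 10000
    |> map ((+) 1)
    |> filter (fun v -> (v &&& 1) = 0)
    |> map int64
    |> fold (+) 0L

// Same pipeline written with the |>> workaround.
let viaVariant =
    ofRange 0 10000
    |>> map ((+) 1)
    |>> filter (fun v -> (v &&& 1) = 0)
    |>> map int64
    |>> fold (+) 0L

// Nested-call ("implicit") form.
let viaImplicit =
    fold (+) 0L (map int64 (filter (fun v -> (v &&& 1) = 0) (map ((+) 1) (ofRange 0 10000))))

printfn "%d %d %d" viaPipe viaVariant viaImplicit
```

All three values should equal 25005000; the differences between the styles are expected to show up only in speed and allocations, which is what the benchmark measures.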
PipeLineVariant is the updated pipeline example using |>>.
Related: fsharp/fslang-suggestions#1105
Yes, please make this a language suggestion or feature request. It is by design as things stand.
Repro steps
I am experimenting with InlineIfLambda and tried to implement a simple push stream library to see whether I get some performance gains. I do see some pretty exciting gains from using InlineIfLambda, but when I use the |> forward pipe operator, inlining doesn't seem to apply. I have written a simple push stream library with some tests to illustrate the issue. Make sure to run it in Release mode without a debugger attached.
The numbers I see:
First, it's pretty exciting to see that a functional pipeline can come close to matching the baseline, but I would like the pipeline code to get the same gains as the explicit case.
Expected behavior
The test cases pipeLine, explicit and implicit all have the same performance, roughly close to the baseline.
Actual behavior
The test case pipeLine performs worse than explicit and implicit, as it seems inlining is not applied when using |>.
Known workarounds
Don't use |> when inlining lambdas has significant performance benefits.
Related information