InlineIfLambda doesn't seem to apply when using `|>` #12388

mrange · 2021-11-13T20:34:41Z

Repro steps

I am experimenting with InlineIfLambda and tried to implement a simple push stream library to see if I get some performance gains. I do see some pretty exciting ones from using InlineIfLambda but when I use the |> forward pipeline it doesn't seem inlining applies.

I have written a simple push stream library with some tests to illustrate the issue I see. Make sure to run it in Release mode without debugger applied.

open System
open System.Diagnostics

type 'T PushStream = ('T->bool)->bool

module PushStream =
  let inline ofRange b e ([<InlineIfLambda>] r) =
      let mutable i = b
      while i <= e && r i do
        i <- i + 1
      i >= e
    
  let inline filter ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
    ps (fun v -> if f v then r v else true)

  let inline map ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
    ps (fun v -> r (f v))

  let inline fold ([<InlineIfLambda>] f) z ([<InlineIfLambda>] ps) =
    let mutable s = z
    ps (fun v -> s <- f s v; true) |> ignore
    s

// Simplistic performance measurement
let now =
  let sw = Stopwatch ()
  sw.Start ()
  fun () -> sw.ElapsedMilliseconds

let time n a =
  let inline cc i = GC.CollectionCount i

  GC.Collect (2, GCCollectionMode.Forced, true) |> ignore
  GC.WaitForFullGCComplete () |> ignore

  let bcc0, bcc1, bcc2  = cc 0, cc 1, cc 2
  let before            = now ()
  for i = 1 to n do
    a () |> ignore
  let after             = now ()
  let acc0, acc1, acc2  = cc 0, cc 1, cc 2

  let cc      = acc0 - bcc0, acc1 - bcc1, acc2 - bcc2
  let elapsed = after - before
  elapsed, cc

open PushStream


// The baseline, the pipeline implemented as plain old loop
let baseLine () =
  let mutable s = 0L
  for i = 0 to 10000 do
    let i = i + 1
    if (i &&& 1) = 0 then
      s <- s + int64 i
  s

// How I would like the code to work
let pipeLine () =
  ofRange 0 10000
  |> map ((+) 1)
  |> filter (fun v -> (v &&& 1) = 0)
  |> map int64
  |> fold (+) 0L

// Trying to improve performance by avoid use of |>
let implicit () =
  fold (+) 0L (map int64 (filter (fun v -> (v &&& 1) = 0) (map ((+) 1) (ofRange 0 10000))))

// Trying to improve performance by writing explicit lambdas
let explicit () =
  fold (+) 0L (fun r -> map int64 (fun r -> filter (fun v -> (v &&& 1) = 0) (fun r -> map ((+) 1) (fun r -> ofRange 0 10000 r) r) r) r)

let testCases = 
  [|
    "baseLine"  , baseLine
    "pipeLine"  , pipeLine
    "implicit"  , implicit
    "explicit"  , explicit
  |]

for nm, tc in testCases do
  let r = tc ()
  printfn "Result of %s : %d" nm r

for nm, tc in testCases do
  printfn "Warming up: %s" nm
  for i = 0 to 1000 do
    tc () |> ignore

for nm, tc in testCases do
  printfn "Running: %s ..." nm
  let elapsed, cc = time 20000 tc
  printfn "  ... took %d ms with cc: %A" elapsed cc

The numbers I see

Running: baseLine ...
  ... took 137 ms with cc: (0, 0, 0)
Running: pipeLine ...
  ... took 801 ms with cc: (1, 0, 0)
Running: implicit ...
  ... took 185 ms with cc: (0, 0, 0)
Running: explicit ...
  ... took 172 ms with cc: (0, 0, 0)

First it's pretty exciting to see that functional pipeline can come close to matching the baseline but I would like the pipeline code to get the same gains as the explicit case.

Expected behavior

The testCases pipeLine, explicit and implicit all have the same performance which is roughly close to the baseline.

Actual behavior

Testcase pipeLine performs worse than explicit and implicit as it seems the inlining is not applied when using |>

Known workarounds

Don't use |> when inlining lambdas has significant performance benefits.

Related information

Operating system: Windows 10
.NET Runtime kind: .NET 6.0.100

The text was updated successfully, but these errors were encountered:

mrange · 2021-11-13T20:35:02Z

This might be more of a feature request than bug request when I think about it.

En3Tho · 2021-11-13T21:19:55Z

I think it's better to make a benchmark using Benchmark.Net to better measure perf and allocations (which might come from non-inlined code, eg closures).

Judging from your numbers in case of pipe compiler can't really optimize the code so real currying takes action (calling nested closures one by one) resulting in a substantially worse perf.

mrange · 2021-11-14T09:38:04Z

Benchmark.NET numbers:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i5-3570K CPU 3.40GHz (Ivy Bridge), 1 CPU, 4 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT DEBUG
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT


|   Method |       Mean |     Error |    StdDev |
|--------- |-----------:|----------:|----------:|
| Baseline |   6.821 us | 0.0540 us | 0.0505 us |
|     Linq | 149.053 us | 0.2072 us | 0.1837 us |
| PipeLine |  34.643 us | 0.0765 us | 0.0716 us |
| Implicit |   9.110 us | 0.0487 us | 0.0456 us |
| Explicit |   8.298 us | 0.0136 us | 0.0121 us |

mrange · 2021-11-14T09:49:17Z

Decompiled into C# (no decompile to F# yet in dnspy)

	public long Baseline()
	{
		long s = 0L;
		for (int i = 0; i < 10001; i++)
		{
			int j = i + 1;
			if ((j & 1) == 0)
			{
				s += (long)j;
			}
		}
		return s;
	}

	public long Linq()
	{
		return Enumerable.Range(0, 10001).Select(new Func<int, int>(Program.Linq@40.Invoke)).Where(new Func<int, bool>(Program.Linq@40-1.Invoke)).Select(new Func<int, long>(Program.Linq@40-2.Invoke)).Sum();
	}

	public long PipeLine()
	{
		FSharpFunc<FSharpFunc<int, bool>, bool> _instance = Program.PipeLine@45.@_instance;
		FSharpFunc<FSharpFunc<int, bool>, bool> arg = new Program.PipeLine@46-2(@_instance);
		FSharpFunc<FSharpFunc<long, bool>, bool> fsharpFunc = new Program.PipeLine@47-4(arg);
		FSharpRef<long> fsharpRef = new FSharpRef<long>(0L);
		bool flag = fsharpFunc.Invoke(new Program.PipeLine@48-6(fsharpRef));
		return fsharpRef.contents;
	}

	public long Implicit()
	{
		long num = 0L;
		int num2 = 0;
		for (;;)
		{
			bool flag;
			if (num2 <= 10000)
			{
				int num3 = num2;
				int num4 = 1 + num3;
				if ((num4 & 1) == 0)
				{
					long num5 = (long)num4;
					num += num5;
					flag = true;
				}
				else
				{
					flag = true;
				}
			}
			else
			{
				flag = false;
			}
			if (!flag)
			{
				break;
			}
			num2++;
		}
		bool flag2 = num2 > 10000;
		return num;
	}

	public long Explicit()
	{
		FSharpRef<long> fsharpRef = new FSharpRef<long>(0L);
		int num = 0;
		for (;;)
		{
			bool flag;
			if (num <= 10000)
			{
				int num2 = num;
				flag = Program.r@56(fsharpRef, 1 + num2);
			}
			else
			{
				flag = false;
			}
			if (!flag)
			{
				break;
			}
			num++;
		}
		bool flag2 = num > 10000;
		return fsharpRef.contents;
	}
	
        internal static bool r@56(FSharpRef<long> s, int v)
	{
		if ((v & 1) == 0)
		{
			long num = (long)v;
			s.contents += num;
			return true;
		}
		return true;
	}

From this I would not suspect Implicit and Explicit to perform differently but at least on my machine there seems to be a diff.

En3Tho · 2021-11-14T09:54:59Z

Can you please add MemoryDiagnoser attribute to your benchmark class?

I think the compiler could benefit from "un-currying" and "lambdafying" to perform better in pipeline cases.

Also, I think if you stick to "explicit pipeline" style, your perf will be more or less the same with "explicit" one, eg map (fun x -> x + 1) etc

mrange · 2021-11-14T10:03:37Z

I added MemoryAnalyzer and this is what I got

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i5-3570K CPU 3.40GHz (Ivy Bridge), 1 CPU, 4 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]    : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT DEBUG
  RyuJitX64 : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT

Job=RyuJitX64  Jit=RyuJit  Platform=X64

|   Method |       Mean |     Error |    StdDev | Allocated |
|--------- |-----------:|----------:|----------:|----------:|
| Baseline |   6.866 us | 0.0690 us | 0.0646 us |         - |
|     Linq | 147.643 us | 0.4201 us | 0.3929 us |     400 B |
| PipeLine |  34.323 us | 0.1181 us | 0.1105 us |     168 B |
| Implicit |   9.064 us | 0.0410 us | 0.0383 us |         - |
| Explicit |   8.260 us | 0.0234 us | 0.0219 us |      24 B |

mrange · 2021-11-14T10:05:43Z

The code I use with Benchmark.NET

open System
open System.Linq

open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

module PushStream =
  let inline ofRange b e ([<InlineIfLambda>] r) =
      let mutable i = b
      while i <= e && r i do
        i <- i + 1
      i > e
    
  let inline filter ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
    ps (fun v -> if f v then r v else true)

  let inline map ([<InlineIfLambda>] f) ([<InlineIfLambda>] ps) ([<InlineIfLambda>] r) =
    ps (fun v -> r (f v))

  let inline fold ([<InlineIfLambda>] f) z ([<InlineIfLambda>] ps) =
    let mutable s = z
    ps (fun v -> s <- f s v; true) |> ignore
    s

open PushStream

[<MemoryDiagnoser>]
[<RyuJitX64Job>]
type FsInlineBenchmark() =
  class
    [<Benchmark>]
    member x.Baseline() =
      let mutable s = 0L
      for i = 0 to 10000 do
        let i = i + 1
        if (i &&& 1) = 0 then
          s <- s + int64 i
      s

    [<Benchmark>]
    member x.Linq() =
      Enumerable.Range(0,10001).Select((+) 1).Where(fun v -> (v &&& 1) = 0).Select(int64).Sum()

    [<Benchmark>]
    member x.PipeLine () =
      ofRange 0 10000
      |> map ((+) 1)
      |> filter (fun v -> (v &&& 1) = 0)
      |> map int64
      |> fold (+) 0L

    [<Benchmark>]
    member x.Implicit () =
      fold (+) 0L (map int64 (filter (fun v -> (v &&& 1) = 0) (map ((+) 1) (ofRange 0 10000))))

    [<Benchmark>]
    member x.Explicit () =
      fold (+) 0L (fun r -> map int64 (fun r -> filter (fun v -> (v &&& 1) = 0) (fun r -> map ((+) 1) (fun r -> ofRange 0 10000 r) r) r) r)

  end
BenchmarkRunner.Run<FsInlineBenchmark>() |> ignore

mrange · 2021-11-14T16:07:02Z

Found a potential work-around

let inline (|>>) ([<InlineIfLambda>] v : _ -> _) ([<InlineIfLambda>] f : _ -> _) = f v

let pipeLineVariant () =
  ofRange 0 10000
  |>> map ((+) 1)
  |>> filter (fun v -> (v &&& 1) = 0)
  |>> map int64
  |>> fold (+) 0L
let v = pipeLineVariant ()

This seems to generate code that is much more optimal than if I use |>. I will update the benchmark with this later.

mrange · 2021-11-14T16:14:41Z

PipeLineVariant is the updated pipeline example using |>> above

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i5-3570K CPU 3.40GHz (Ivy Bridge), 1 CPU, 4 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]    : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT DEBUG
  RyuJitX64 : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT

Job=RyuJitX64  Jit=RyuJit  Platform=X64

|          Method |       Mean |     Error |    StdDev | Allocated |
|---------------- |-----------:|----------:|----------:|----------:|
|        Baseline |   6.906 us | 0.0613 us | 0.0573 us |         - |
|            Linq | 149.370 us | 0.2530 us | 0.2113 us |     400 B |
|        PipeLine |  34.717 us | 0.1054 us | 0.0986 us |     168 B |
| PipeLineVariant |   9.149 us | 0.0313 us | 0.0293 us |         - |
|        Implicit |   9.144 us | 0.0283 us | 0.0264 us |         - |
|        Explicit |   8.341 us | 0.0149 us | 0.0132 us |      24 B |

En3Tho · 2022-02-05T10:50:05Z

Related: fsharp/fslang-suggestions#1105

dsyme · 2022-02-07T13:01:11Z

This might be more of a feature request than bug request when I think about it.

Yes, please make this a language suggestion or feature request. It is by-design as things stand

mrange added the Bug label Nov 13, 2021

mrange changed the title ~~InlineIfLambda doesn't apply when using |>~~ InlineIfLambda doesn't seem to apply when using |> Nov 13, 2021

dsyme added Feature Improvement and removed Bug labels Nov 15, 2021

dsyme closed this as completed Feb 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InlineIfLambda doesn't seem to apply when using `|>` #12388

InlineIfLambda doesn't seem to apply when using `|>` #12388

mrange commented Nov 13, 2021

mrange commented Nov 13, 2021

En3Tho commented Nov 13, 2021

mrange commented Nov 14, 2021

mrange commented Nov 14, 2021 •

edited

Loading

En3Tho commented Nov 14, 2021 •

edited

Loading

mrange commented Nov 14, 2021

mrange commented Nov 14, 2021 •

edited

Loading

mrange commented Nov 14, 2021 •

edited

Loading

mrange commented Nov 14, 2021

En3Tho commented Feb 5, 2022 •

edited

Loading

dsyme commented Feb 7, 2022

InlineIfLambda doesn't seem to apply when using |> #12388

InlineIfLambda doesn't seem to apply when using |> #12388

Comments

mrange commented Nov 13, 2021

mrange commented Nov 13, 2021

En3Tho commented Nov 13, 2021

mrange commented Nov 14, 2021

mrange commented Nov 14, 2021 • edited Loading

En3Tho commented Nov 14, 2021 • edited Loading

mrange commented Nov 14, 2021

mrange commented Nov 14, 2021 • edited Loading

mrange commented Nov 14, 2021 • edited Loading

mrange commented Nov 14, 2021

En3Tho commented Feb 5, 2022 • edited Loading

dsyme commented Feb 7, 2022

InlineIfLambda doesn't seem to apply when using `|>` #12388

InlineIfLambda doesn't seem to apply when using `|>` #12388

mrange commented Nov 14, 2021 •

edited

Loading

En3Tho commented Nov 14, 2021 •

edited

Loading

mrange commented Nov 14, 2021 •

edited

Loading

mrange commented Nov 14, 2021 •

edited

Loading

En3Tho commented Feb 5, 2022 •

edited

Loading