Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JpegEncoder - Optimize Some Low Hanging Fruit #1533

Merged
merged 15 commits into from
Feb 4, 2021

Conversation

JimBobSquarePants
Copy link
Member

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

I had a go at optimizing some of the methods identified by @tkp1n here
#1476 (comment)

BEFORE

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.102
  [Host]     : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
Method Mean Error StdDev Ratio
'System.Drawing Jpeg' 17.37 ms 0.092 ms 0.082 ms 1.00
'ImageSharp Jpeg' 23.58 ms 0.220 ms 0.205 ms 1.36

AFTER

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.102
  [Host]     : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
Method Mean Error StdDev Ratio RatioSD
'System.Drawing Jpeg' 17.98 ms 0.149 ms 0.132 ms 1.00 0.00
'ImageSharp Jpeg' 21.33 ms 0.302 ms 0.296 ms 1.19 0.02

@JimBobSquarePants
Copy link
Member Author

JimBobSquarePants commented Feb 3, 2021

This is all working on my SB2. I'll have to crack open the mac to see what's happening here. 😢
Also working on my 2015 MacBook Pro

@JimBobSquarePants
Copy link
Member Author

Definitely a Jit issue. Removing inlining changed the error. I’ll manually inline everything into the method.

@codecov
Copy link

codecov bot commented Feb 3, 2021

Codecov Report

Merging #1533 (461e59d) into master (edff37d) will decrease coverage by 0.04%.
The diff coverage is 72.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1533      +/-   ##
==========================================
- Coverage   83.52%   83.47%   -0.05%     
==========================================
  Files         742      742              
  Lines       32802    32830      +28     
  Branches     3671     3667       -4     
==========================================
+ Hits        27399    27406       +7     
- Misses       4689     4709      +20     
- Partials      714      715       +1     
Flag Coverage Δ
unittests 83.47% <72.34%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...geSharp/Formats/Jpeg/Components/GenericBlock8x8.cs 87.87% <ø> (ø)
...rc/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs 85.65% <37.83%> (-4.51%) ⬇️
...omponents/Encoder/YCbCrForwardConverter{TPixel}.cs 64.28% <50.00%> (ø)
src/ImageSharp/Formats/Jpeg/Components/RowOctet.cs 91.42% <90.90%> (+3.92%) ⬆️
src/ImageSharp/Formats/Jpeg/JpegEncoderCore.cs 94.86% <100.00%> (+0.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edff37d...461e59d. Read the comment docs.

@JimBobSquarePants
Copy link
Member Author

Everything works now. Will try and recreate the JIT issue and open an issue upstream

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one suggestion.

@saucecontrol
Copy link
Contributor

Is this code just for dividing your 8x8 blocks by the quantization table? If so, why not calculate the reciprocal of the QT values up front so you can multiply instead? Also, you can probably use vroundps (Avx.RoundToNearestInteger) instead of the offset calc you currently have, assuming you're just truncating that result later.

@JimBobSquarePants
Copy link
Member Author

Is this code just for dividing your 8x8 blocks by the quantization table? If so, why not calculate the reciprocal of the QT values up front so you can multiply instead? Also, you can probably use vroundps (Avx.RoundToNearestInteger) instead of the offset calc you currently have, assuming you're just truncating that result later.

I'll be honest @saucecontrol and say it's been a while since I looked at all this so I've forgotten a lot of how jpeg works. I'm just limiting changes to known bottlenecks (that part was actually the least of the three).

Regarding Avx.RoundToNearestInteger. I thought best to leave the expected result for both pipelines identical for now.

@saucecontrol
Copy link
Contributor

Ha, I've just started re-learning everything I've ever semi-understood about JPEG. Didn't expect the quantization to be much of a bottleneck, but it seems you got some good gains from widening the vectors, so it may be worth looking into some other time.

@JimBobSquarePants JimBobSquarePants merged commit 27135a0 into master Feb 4, 2021
@JimBobSquarePants JimBobSquarePants deleted the js/jpeg-encoder-perf branch February 4, 2021 05:17
@JimBobSquarePants
Copy link
Member Author

Ha, I've just started re-learning everything I've ever semi-understood about JPEG

Man... I cannot wait to see what you bring to the table.

@JimBobSquarePants
Copy link
Member Author

JimBobSquarePants commented Feb 4, 2021

Getting so close now...

BenchmarkDotNet=v0.12.1.1467-nightly, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.102
  [Host]       : .NET 5.0.2 (5.0.220.61120), X64 RyuJIT
  .Net 5.0 CLI : .NET 5.0.2 (5.0.220.61120), X64 RyuJIT

Job=.Net 5.0 CLI  Arguments=/p:DebugType=portable  Toolchain=.NET 5.0
IterationCount=5  LaunchCount=1  WarmupCount=5
Method Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
'System.Drawing Load, Resize, Save' 412.63 ms 40.228 ms 6.225 ms 1.00 0.00 - - - 11 KB
'ImageSharp Load, Resize, Save' 167.79 ms 13.961 ms 2.160 ms 0.41 0.01 333.3333 - - 1,988 KB
'ImageMagick Load, Resize, Save' 441.81 ms 20.898 ms 5.427 ms 1.07 0.02 - - - 53 KB
'ImageFree Load, Resize, Save' 274.35 ms 10.106 ms 1.564 ms 0.67 0.01 6000.0000 6000.0000 6000.0000 91 KB
'MagicScaler Load, Resize, Save' 79.77 ms 7.361 ms 1.912 ms 0.19 0.00 - - - 104 KB
'SkiaSharp Canvas Load, Resize, Save' 261.03 ms 9.696 ms 2.518 ms 0.63 0.01 - - - 99 KB
'SkiaSharp Bitmap Load, Resize, Save' 258.87 ms 9.651 ms 1.493 ms 0.63 0.01 - - - 84 KB
'NetVips Load, Resize, Save' 160.20 ms 9.794 ms 2.543 ms 0.39 0.01 - - - 47 KB

JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021
JpegEncoder - Optimize Some Low Hanging Fruit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants