Add PackFromRgbPlanes AVX2 vectorised implementation for Rgba32 and Rgba24 pixels #1242
Closed
17 commits
5771c2c Add PackFromRgbPlanes AVX2 vectorised implementation for Rgba32 and R… (john-h-k)
883344c Fix build (JimBobSquarePants)
e907126 Fix slicing (JimBobSquarePants)
f7289ee Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
a50fc32 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
0510783 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
09f464f Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
4a90255 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
c35b3a8 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
660b110 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
6ed1928 Merge remote-tracking branch 'upstream/master' into add-packfromrgbpl… (JimBobSquarePants)
dd071a2 Fix refs (JimBobSquarePants)
f449283 Fix slicing (JimBobSquarePants)
3b216c7 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
cf78b85 Merge branch 'master' into add-packfromrgbplanes (JimBobSquarePants)
3298603 Merge remote-tracking branch 'origin/master' into add-packfromrgbplanes (antonfirsov)
f4cabf2 ImageMaths -> Numerics (antonfirsov)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -128,6 +128,103 @@ internal static void NormalizedFloatToByteSaturate(ReadOnlySpan<float> source, S | |
} | ||
} | ||
|
||
internal static void PackBytesToUInt32SaturateChannel4( | ||
ReadOnlySpan<byte> channel0, | ||
ReadOnlySpan<byte> channel1, | ||
ReadOnlySpan<byte> channel2, | ||
Span<byte> dest) | ||
{ | ||
DebugGuard.IsTrue(channel1.Length == channel0.Length, nameof(channel1), "Input spans must be of same length!"); | ||
DebugGuard.IsTrue(channel2.Length == channel0.Length, nameof(channel2), "Input spans must be of same length!"); | ||
DebugGuard.MustBeGreaterThanOrEqualTo(dest.Length, channel0.Length * 4, nameof(dest)); | ||
|
||
#if SUPPORTS_RUNTIME_INTRINSICS | ||
HwIntrinsics.PackBytesToUInt32SaturateChannel4Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
|
||
// I can't immediately see a way to do this operation efficiently with Vector<T> or Vector4<T>. TODO | ||
Review comment: There is none :)
||
#elif SUPPORTS_EXTENDED_INTRINSICS | ||
// ExtendedIntrinsics.PackBytesToUInt32SaturateChannel4Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
#else | ||
// BasicIntrinsics256.PackBytesToUInt32SaturateChannel4Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
#endif | ||
|
||
// Deal with the remainder: | ||
if (channel0.Length > 0) | ||
{ | ||
PackBytesToUInt32SaturateChannel4Remainder(channel0, channel1, channel2, dest); | ||
} | ||
} | ||
|
||
private static void PackBytesToUInt32SaturateChannel4Remainder( | ||
ReadOnlySpan<byte> channel0, | ||
ReadOnlySpan<byte> channel1, | ||
ReadOnlySpan<byte> channel2, | ||
Span<byte> dest) | ||
{ | ||
DebugGuard.MustBeGreaterThanOrEqualTo(dest.Length, channel0.Length * 4, nameof(dest)); | ||
|
||
ref byte s0Base = ref MemoryMarshal.GetReference(channel0); | ||
ref byte s1Base = ref MemoryMarshal.GetReference(channel1); | ||
ref byte s2Base = ref MemoryMarshal.GetReference(channel2); | ||
ref byte dBase = ref MemoryMarshal.GetReference(dest); | ||
|
||
for (int i = 0, j = 0; i < channel0.Length; i += 1, j += 4) | ||
{ | ||
Unsafe.Add(ref dBase, j) = Unsafe.Add(ref s0Base, i); | ||
Unsafe.Add(ref dBase, j + 1) = Unsafe.Add(ref s1Base, i); | ||
Unsafe.Add(ref dBase, j + 2) = Unsafe.Add(ref s2Base, i); | ||
Unsafe.Add(ref dBase, j + 3) = 0xFF; | ||
} | ||
} | ||
|
||
internal static void PackBytesToUInt24( | ||
ReadOnlySpan<byte> channel0, | ||
ReadOnlySpan<byte> channel1, | ||
ReadOnlySpan<byte> channel2, | ||
Span<byte> dest) | ||
{ | ||
DebugGuard.IsTrue(channel1.Length == channel0.Length, nameof(channel1), "Input spans must be of same length!"); | ||
DebugGuard.IsTrue(channel2.Length == channel0.Length, nameof(channel2), "Input spans must be of same length!"); | ||
DebugGuard.MustBeGreaterThanOrEqualTo(dest.Length, channel0.Length * 3, nameof(dest)); | ||
|
||
#if SUPPORTS_RUNTIME_INTRINSICS | ||
HwIntrinsics.PackBytesToUInt24Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
|
||
// I can't immediately see a way to do this operation efficiently with Vector<T> or Vector4<T>. TODO | ||
#elif SUPPORTS_EXTENDED_INTRINSICS | ||
// ExtendedIntrinsics.PackBytesToUInt24Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
#else | ||
// BasicIntrinsics256.PackBytesToUInt24Reduce(ref channel0, ref channel1, ref channel2, ref dest); | ||
#endif | ||
|
||
// Deal with the remainder: | ||
if (channel0.Length > 0) | ||
{ | ||
PackBytesToUInt24Remainder(channel0, channel1, channel2, dest); | ||
} | ||
} | ||
|
||
private static void PackBytesToUInt24Remainder( | ||
ReadOnlySpan<byte> channel0, | ||
ReadOnlySpan<byte> channel1, | ||
ReadOnlySpan<byte> channel2, | ||
Span<byte> dest) | ||
{ | ||
DebugGuard.MustBeGreaterThanOrEqualTo(dest.Length, channel0.Length * 3, nameof(dest)); | ||
|
||
ref byte s0Base = ref MemoryMarshal.GetReference(channel0); | ||
ref byte s1Base = ref MemoryMarshal.GetReference(channel1); | ||
ref byte s2Base = ref MemoryMarshal.GetReference(channel2); | ||
ref byte dBase = ref MemoryMarshal.GetReference(dest); | ||
|
||
for (int i = 0, j = 0; i < channel0.Length; i += 1, j += 3) | ||
{ | ||
Unsafe.Add(ref dBase, j) = Unsafe.Add(ref s0Base, i); | ||
Unsafe.Add(ref dBase, j + 1) = Unsafe.Add(ref s1Base, i); | ||
Unsafe.Add(ref dBase, j + 2) = Unsafe.Add(ref s2Base, i); | ||
} | ||
} | ||
|
||
[MethodImpl(InliningOptions.ColdPath)] | ||
private static void ConvertByteToNormalizedFloatRemainder(ReadOnlySpan<byte> source, Span<float> dest) | ||
{ | ||
|
@@ -192,6 +289,16 @@ private static void VerifySpanInput(ReadOnlySpan<byte> source, Span<float> dest, | |
$"length should be divisible by {shouldBeDivisibleBy}!"); | ||
} | ||
|
||
[Conditional("DEBUG")] | ||
private static void VerifySpanInput(ReadOnlySpan<byte> source, Span<byte> dest, int shouldBeDivisibleBy) | ||
{ | ||
DebugGuard.IsTrue(source.Length == dest.Length, nameof(source), "Input spans must be of same length!"); | ||
DebugGuard.IsTrue( | ||
ImageMaths.ModuloP2(dest.Length, shouldBeDivisibleBy) == 0, | ||
nameof(source), | ||
$"length should be divisible by {shouldBeDivisibleBy}!"); | ||
} | ||
|
||
[Conditional("DEBUG")] | ||
private static void VerifySpanInput(ReadOnlySpan<float> source, Span<byte> dest, int shouldBeDivisibleBy) | ||
{ | ||
|
38 changes: 38 additions & 0 deletions
src/ImageSharp/PixelFormats/PixelImplementations/Rgb24.PixelOperations.cs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
// Copyright (c) Six Labors. | ||
// Licensed under the Apache License, Version 2.0. | ||
|
||
using System; | ||
using System.Runtime.InteropServices; | ||
|
||
namespace SixLabors.ImageSharp.PixelFormats | ||
{ | ||
/// <content> | ||
/// Provides optimized overrides for bulk operations. | ||
/// </content> | ||
public partial struct Rgb24 | ||
{ | ||
/// <summary> | ||
/// <see cref="PixelOperations{TPixel}"/> implementation optimized for <see cref="Rgb24"/>. | ||
/// </summary> | ||
internal partial class PixelOperations : PixelOperations<Rgb24> | ||
{ | ||
/// <inheritdoc /> | ||
public override void PackFromRgbPlanes( | ||
Configuration configuration, | ||
ReadOnlySpan<byte> redChannel, | ||
ReadOnlySpan<byte> greenChannel, | ||
ReadOnlySpan<byte> blueChannel, | ||
Span<Rgb24> destination) | ||
{ | ||
Guard.NotNull(configuration, nameof(configuration)); | ||
Guard.IsTrue(redChannel.Length == greenChannel.Length, nameof(redChannel), "Red channel must be same size as green channel"); | ||
Guard.IsTrue(greenChannel.Length == blueChannel.Length, nameof(greenChannel), "Green channel must be same size as blue channel"); | ||
Guard.DestinationShouldNotBeTooShort(redChannel, destination, nameof(destination)); | ||
|
||
destination = destination.Slice(0, redChannel.Length); | ||
|
||
SimdUtils.PackBytesToUInt24(redChannel, greenChannel, blueChannel, MemoryMarshal.AsBytes(destination)); | ||
} | ||
} | ||
} | ||
} |
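For reference, the output layout the two packing helpers are expected to produce can be sketched in safe scalar code. This is only an illustration, not the code from the PR: `PlanarPackSketch` and its `Pack` method are hypothetical names, and the sketch assumes three bytes per pixel for Rgb24 (R, G, B) and four for Rgba32 (R, G, B, A with alpha set to 0xFF).

```csharp
using System;

public static class PlanarPackSketch
{
    // Hypothetical helper: interleaves three byte planes into packed pixels.
    // bytesPerPixel is 3 for an Rgb24-style layout or 4 for an Rgba32-style layout.
    public static byte[] Pack(
        ReadOnlySpan<byte> r,
        ReadOnlySpan<byte> g,
        ReadOnlySpan<byte> b,
        int bytesPerPixel)
    {
        if (bytesPerPixel != 3 && bytesPerPixel != 4)
        {
            throw new ArgumentOutOfRangeException(nameof(bytesPerPixel));
        }

        if (g.Length != r.Length || b.Length != r.Length)
        {
            throw new ArgumentException("All channels must have the same length.");
        }

        var dest = new byte[r.Length * bytesPerPixel];

        // i walks the source planes, j walks the interleaved destination.
        for (int i = 0, j = 0; i < r.Length; i++, j += bytesPerPixel)
        {
            dest[j] = r[i];
            dest[j + 1] = g[i];
            dest[j + 2] = b[i];

            if (bytesPerPixel == 4)
            {
                dest[j + 3] = 0xFF; // opaque alpha
            }
        }

        return dest;
    }
}
```

The loop is bounded by the channel length rather than the destination length, and the destination holds `bytesPerPixel` bytes for every source byte, so a three-byte pixel type needs the three-byte packer.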
If I'm not missing something here, could we pack these with less work, in a manner similar to how it's done in the JPEG color converter?
ImageSharp/src/ImageSharp/Formats/Jpeg/Components/Decoder/ColorConverters/JpegColorConverter.FromYCbCrSimdAvx2.cs
Line 44 in 120080b
Yep, it's the same problem. The unpack operations work in-lane, so whatever starts in the upper lane of the input will be in the upper lanes of the output. The current code re-permutes after each round to keep things in the right place throughout. The other options are:
Option 2 will be cheaper since extract costs the same as a permute, and as with the YCbCr conversion, you can get by with only permuting 3 inputs for 4 outputs.
Thought so, thanks for confirming. 👍
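The in-lane behaviour of the AVX2 unpack instructions discussed in this thread can be seen in a small experiment. The sketch below is not the PR's implementation: `InLaneUnpackSketch` is a hypothetical name, and permuting the inputs before the unpack is just one way to restore sequential order, chosen here for illustration. It uses `Avx2.UnpackLow` and `Avx2.Permute4x64` when AVX2 is available, and otherwise falls back to a scalar model of the same semantics.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class InLaneUnpackSketch
{
    // Interleaves the low 8 bytes of each 128-bit lane of a and b.
    // With permuteFirst, the two middle 64-bit blocks of each input are swapped
    // beforehand, so the output holds the first 16 (a, b) pairs in order.
    public static byte[] UnpackLow(byte[] a, byte[] b, bool permuteFirst)
    {
        if (a.Length != 32 || b.Length != 32)
        {
            throw new ArgumentException("Both inputs must hold exactly 32 bytes.");
        }

        var result = new byte[32];

        if (Avx2.IsSupported)
        {
            Vector256<byte> va = MemoryMarshal.Read<Vector256<byte>>(a);
            Vector256<byte> vb = MemoryMarshal.Read<Vector256<byte>>(b);

            if (permuteFirst)
            {
                // Selects 64-bit blocks 0, 2, 1, 3 (lowest block first).
                const byte SwapMiddle = 0b_11_01_10_00;
                va = Avx2.Permute4x64(va.AsUInt64(), SwapMiddle).AsByte();
                vb = Avx2.Permute4x64(vb.AsUInt64(), SwapMiddle).AsByte();
            }

            Vector256<byte> packed = Avx2.UnpackLow(va, vb);
            for (int i = 0; i < 32; i++)
            {
                result[i] = packed.GetElement(i);
            }

            return result;
        }

        // Scalar model of the same behaviour for machines without AVX2.
        for (int lane = 0; lane < 2; lane++)
        {
            int src = permuteFirst ? lane * 8 : lane * 16;
            for (int k = 0; k < 8; k++)
            {
                result[(lane * 16) + (2 * k)] = a[src + k];
                result[(lane * 16) + (2 * k) + 1] = b[src + k];
            }
        }

        return result;
    }
}
```

Without the permute, the upper half of the result is built from input bytes 16 to 23, which is the "whatever starts in the upper lane stays in the upper lane" effect described above. With the permute, the upper half is built from input bytes 8 to 15 instead.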