
Implement the SSE hardware intrinsics. #15538

Merged 32 commits into dotnet:master on Jan 17, 2018

Conversation


@tannergooding tannergooding commented Dec 15, 2017

This implements basic codegen support for all currently declared SSE intrinsics except for the Store intrinsics.

@@ -107,7 +108,7 @@ bool emitter::IsDstSrcSrcAVXInstruction(instruction ins)
{
return IsAVXInstruction(ins) &&
(ins == INS_movlpd || ins == INS_movlps || ins == INS_movhpd || ins == INS_movhps || ins == INS_movss ||
ins == INS_movlhps || ins == INS_sqrtss || ins == INS_sqrtsd || ins == INS_cvtss2sd || ins == INS_cvtsd2ss);
Member Author

INS_movlhps was incorrectly listed as DstSrcSrc, which will cause it to generate a different result on AVX vs non-AVX machines.


void emitter::emitIns_SIMD_R_R_R_I(instruction ins, regNumber reg, regNumber reg1, regNumber reg2, int ival, var_types simdtype)
{
if (UseVEXEncoding() && reg1 != reg)
Member Author

For both this and emitIns_SIMD_R_R_R above: Do we have a work item tracking support for reg2 being a memory operand when it is properly aligned?

Currently we are generating things like

C4E1791000           vmovupd  xmm0, xmmword ptr [rax]
C4E14858F0           vaddps   xmm6, xmm0

While in some cases we could generate vaddps xmm6, [rax]


reg2 being a memory operand when it is properly aligned?

And in what circumstances will the memory operand be aligned? Only for local variables I guess.

Member Author

Yeah. It definitely can't be used everywhere, and isn't something I think needs to happen immediately.

But, for the cases where the memory will always be aligned (locals are one example), it would be useful to generate the more efficient code.


Only for local variables I guess

@tannergooding @mikedn

That is not true. In general, devs writing vectorized code peel off the start of their loops to align the data and do all the remaining calculations on aligned data, until the end of the loop where processing may require a change of logic.

This means that all data accessed in the main loops will be aligned, and generating anything but vaddps xmm6, [rax] would be an unacceptable loss of CPU cycles.
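For illustration, here is a minimal sketch of the peel-then-aligned-loop pattern being described. It uses the method names that eventually shipped in System.Runtime.Intrinsics.X86 (LoadVector128/LoadAlignedVector128/Store/Add), which differ slightly from the draft Load/LoadAligned names used in this thread, and the element-wise float add is just an assumed example workload.

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static unsafe class AlignedLoopSketch
{
    public static void AddInPlace(float* dst, float* src, int length)
    {
        int i = 0;

        // Peel scalar iterations until dst reaches a 16-byte boundary.
        while (i < length && ((ulong)(dst + i) % 16) != 0)
        {
            dst[i] += src[i];
            i++;
        }

        // Main loop: dst is now 16-byte aligned, so the aligned load is safe.
        // src may still be unaligned, so it uses the unaligned load.
        for (; i + 4 <= length; i += 4)
        {
            Vector128<float> a = Sse.LoadAlignedVector128(dst + i);
            Vector128<float> b = Sse.LoadVector128(src + i);
            Sse.Store(dst + i, Sse.Add(a, b)); // Store is the unaligned store.
        }

        // Tail: handle the remaining elements as scalars.
        for (; i < length; i++)
        {
            dst[i] += src[i];
        }
    }
}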


In general, devs writing vectorized code peel off the start of their loops to align the data

The key here is "in general". The JIT has no way to know whether the address is aligned, and it cannot make such assumptions simply because "in general" developers do that.

I suspect that it will work if the developer uses LoadAligned to indicate that the memory address is aligned.
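For illustration, a sketch of the call-site pattern under discussion, using the shipped method names (in the thread's draft API this is Sse.LoadAligned): the aligned load is the developer's assertion that the pointer is 16-byte aligned, which is what would let the JIT fold the operand into addps xmm, xmmword ptr [addr].

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static unsafe class AlignedFoldSketch
{
    public static Vector128<float> AddFromAligned(Vector128<float> v, float* addr)
    {
        // LoadAlignedVector128 carries the alignment contract; folding it into
        // the add is the optimization being discussed, not something this code
        // guarantees by itself.
        return Sse.Add(v, Sse.LoadAlignedVector128(addr));
    }
}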

Member Author

I agree that, "in general", users writing performant code will try to keep accesses aligned to avoid reads/writes that cross cache line or page boundaries.

However, as @mikedn points out, assuming alignment would be incorrect, and detecting it is likely hard.

It's probably worth a separate discussion (will log a bug in a bit) but I believe it is something we should try to enable (maybe a special intrinsic, compiler hint, or attribute to tell the JIT that memory accesses will be aligned for a given method).


It's probably worth a separate discussion (will log a bug in a bit)

I think that is the right approach, as this is not that easy to get right.

@@ -103,6 +103,133 @@ void CodeGen::genSSEIntrinsic(GenTreeHWIntrinsic* node)
op2Reg = op2->gtRegNum;
emit->emitIns_SIMD_R_R_R(INS_addps, targetReg, op1Reg, op2Reg, TYP_SIMD16);
break;

case NI_SSE_And:
Member Author

All the cases (so far) are essentially the same with only the instruction (or immediate value) differing.

It might be worthwhile including this info in the hwintrinsiclist and having them be entries in an array for constant time lookup...


Yes, unlike "classic" instructions, SSE/AVX instructions have a more consistent format, so it should be possible to drive most codegen from tables. Basically you need to add to the list a "form" value that can be used to select between R_R_R and R_R_R_I and, of course, the instruction itself.

Well, there's also the problem of memory operands but that's something that the IR node will indicate. So R_R_R in the hw intrinsic list really stands for R_R_RM.

Member Author

I was actually just thinking of having a single array, something where the entries are {instruction, immediate}. Helper functions (like get_IsSupported) would have INS_INVALID and instructions without an immediate would probably have -1.

There can then be two general groupings. One for instructions without an immediate (those that use SIMD_R_R_R) and one for instructions with an immediate (those that use SIMD_R_R_R_I).

Other instructions (such as Load) which have different semantics can be in their own case grouping.

I don't think we necessarily need a form value, unless you think that would be more efficient.

@fiigii fiigii Dec 15, 2017

I do not want to tabulate the codegen of the HW intrinsics, which won't always be this simple.
For example, intrinsicID + baseType is not enough to indicate codegen for some intrinsics. I would add a new field into GenTreeHWIntrinsic to represent overloads that would be used here.


I don't think we necessarily need a form value, unless you think that would be more efficient.

The form thing was just a suggestion. Any solution that cuts down the amount of repeated code should be good enough, provided that it doesn't cause other problems.

I would add a new field into GenTreeHWIntrinsic to represent overloads that would be used here.

Hmm? It's not clear why that would be better.

Member Author

I don't think we should do it for all instructions, but a number of them (especially the binary arithmetic and comparison operators) will all be fairly common and follow roughly the same code path.

Cutting down duplicate code will likely be worthwhile for these operations.

@fiigii fiigii Dec 15, 2017

Hmm? It's not clear why that would be better.

@mikedn For example, we need an additional field to distinguish these two intrinsics (intrinsicID + baseType is not enough).

Vector128<float> GatherVector128(float* baseAddress, Vector128<long> index, byte scale);
Vector128<float> GatherVector128(float* baseAddress, Vector128<int> index, byte scale);

My current plan is to add an overloads field to GenTreeHWIntrinsic to bring that information from the importer to codegen.


Here is the complete list of GatherVector128 to show the necessity.

        /// <summary>
        /// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i32gather_epi64 (__int64 const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<long> GatherVector128(long* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i32gather_epi64 (__int64 const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<ulong> GatherVector128(ulong* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128 _mm_i32gather_ps (float const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128d _mm_i32gather_pd (double const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<double> GatherVector128(double* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i64gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i64gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i64gather_epi64 (__int64 const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<long> GatherVector128(long* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128i _mm_i64gather_epi64 (__int64 const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<ulong> GatherVector128(ulong* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128 _mm_i64gather_ps (float const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);
        /// <summary>
        /// __m128d _mm_i64gather_pd (double const* base_addr, __m128i vindex, const int scale)
        /// </summary>
        public static unsafe Vector128<double> GatherVector128(double* baseAddress, Vector128<long> index, byte scale) => GatherVector128(baseAddress, index, scale);


intrinsicID + baseType is not enough

Hmm, can't we use a different intrinsic id for each overload?


Hmm, can't we use a different intrinsic id for each overload?

That would break the intrinsic lookup, because a method name has to map to a single intrinsic id in the current [Intrinsic] system.

HARDWARE_INTRINSIC(SSE_Multiply, "Multiply", SSE)
HARDWARE_INTRINSIC(SSE_Or, "Or", SSE)
HARDWARE_INTRINSIC(SSE_Reciprocal, "Reciprocal", SSE)
HARDWARE_INTRINSIC(SSE_ReciprocalSqrt, "ReciprocalSquareRoot", SSE)
Member Author

This name is getting fixed in #15471, will update after it is merged.

@tannergooding
Member Author

I split and categorized the hardware intrinsic tests to make it easier to tell what coverage exists for each ISA and each instruction.

A good portion of the code could be shared and it might be worth refactoring sooner rather than later.

@tannergooding
Member Author

For the CompareGreaterThan intrinsics, the hardware manual recommends using the inverse operations or using software emulation (swapping the operands and using a different predicate).

Using the inverse operations is preferred, correct? I was trying to think of reasons why software emulation would be suggested, but came up blank...
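For reference, a sketch of the operand-swap emulation (shipped method names, illustration only): since left > right is the same ordered comparison as right < left, a GreaterThan result can be produced with the LessThan predicate and swapped operands.

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class CompareSketch
{
    public static Vector128<float> GreaterThanViaSwap(Vector128<float> left, Vector128<float> right)
    {
        // cmpltps with swapped operands yields the same mask as a greater-than:
        // all-ones where left > right, all-zeros otherwise (including when
        // either lane is NaN, matching the ordered predicate).
        return Sse.CompareLessThan(right, left);
    }
}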

@tannergooding
Member Author

FYI. @fiigii

I figured I would start on this while waiting for the scalar intrinsic PR to get merged. I believe you are going to be focusing on AVX/AVX2, so I hopefully am not duplicating work you've already done here 😄

void emitter::emitIns_SIMD_R_R_R_I(
instruction ins, regNumber reg, regNumber reg1, regNumber reg2, int ival, var_types simdtype)
{
if (UseVEXEncoding() && reg1 != reg)

Parenthesize reg1 != reg

@fiigii

fiigii commented Dec 15, 2017

@tannergooding Thank you so much for the work!

so I hopefully am not duplicating work you've already done here

My next step is to implement Load/Store of AVX/AVX2/SSE/SSE2 because these memory-access intrinsics need a bit more infrastructure.

I recommend you start with the simple intrinsics that generate a single instruction (e.g., arithmetic, bit manipulation, comparison, etc.). Then certain helper intrinsics can be expanded to combinations of these simple intrinsics.

@fiigii

fiigii commented Dec 15, 2017

For both this and emitIns_SIMD_R_R_R above: Do we have a work item tracking support for reg2 being a memory operand when it is properly aligned?

Yes, this feature is in my plan; we have to enable containment analysis on hardware intrinsics in the future. In C++, users can request this codegen by dereferencing pointers to vectors, but C# has no pointers to managed types. So this feature would considerably impact CQ.
BTW, it's not only "reg2 being a memory operand"; reg1 needs the same treatment so we can fold Store*.

@mikedn

mikedn commented Dec 15, 2017

but C# has no pointer to managed type

Hmm, it does have that - ref. Aka byref in JIT.

@fiigii

fiigii commented Dec 15, 2017

this feature is in my plan; we have to enable containment analysis on hardware intrinsics in the future

To be clear.

var res = Avx.Add(vec1, Avx.Load(address));

should generate

vaddps ymm1, ymm2, [address]

@mikedn

mikedn commented Dec 15, 2017

To be clear...

Yes. Though I suppose it should also work with Avx.Add(vec1, *p) where p is a pointer of the appropriate vector type. In fact, it's not clear if Load is actually needed. Only LoadAligned is obviously needed because there's no way to encode the fact that the pointer is properly aligned in IL.

@fiigii

fiigii commented Dec 15, 2017

Hmm, it does have that - ref. Aka byref in JIT.

But that requires overloading the intrinsics with ref and allowing conversion of a ref element to/from ref Vector128/256<T>.

@mikedn

mikedn commented Dec 15, 2017

But that requires overloading the intrinsics with ref and allowing conversion of a ref element to/from ref Vector128/256.

Only the load/store intrinsics would need that. I think that was discussed but I don't remember how exactly we ended up with only the pointer versions. Anyway, it doesn't matter too much; they can be added if deemed useful at some point.

@fiigii

fiigii commented Dec 15, 2017

Only the load/store intrinsics would need that.

I meant C++ can compile:

// vec1 is a __m256 and addr is float*
auto a = _mm256_add_ps(vec1, *(__m256*)addr);

to

vaddps  ymm0, ymm0, ymmword ptr [addr]

But, in C#, we have to rely on folding Load into the consumer intrinsics.

@fiigii

fiigii commented Dec 15, 2017

Only LoadAligned is obviously needed because there's no way to encode the fact that the pointer is properly aligned in IL.

Most of the aligned and unaligned instructions have no difference in performance or behavior on modern Intel CPUs when the load/store falls within a single cache line. Users should be responsible for the alignment of their pointers.

@mikedn

mikedn commented Dec 15, 2017

I meant C++ can compile:

Yes, and similar code will work in C# except that you can't generate addps xmm0, xmmword ptr [addr] because SSE2 requires memory operands to be aligned and the JIT has no way to know if addr is 16 byte aligned or not.

But, in C#, we have to rely on folding Load into the comsuer intrinsics.

For SSE2, yes. Sse2.Add(v1, Sse2.LoadAligned(addr)) should generate addps xmm0, xmmword ptr[addr].

@fiigii

fiigii commented Dec 17, 2017

you can't generate addps xmm0, xmmword ptr [addr] because SSE2 requires memory operands to be aligned and the JIT has no way to know if addr is 16 byte aligned or not.

Sse2.Add(v1, Sse2.LoadAligned(addr)) should generate addps xmm0, xmmword ptr[addr].

@mikedn addps does not require 16-byte alignment. Only Type 1 instructions require explicit alignment:
[screenshot: table of instruction exception types and their memory-alignment requirements]

@tannergooding
Member Author

@fiigii, there are some instructions, like addsubpd, whose documentation explicitly indicates (on the actual instruction page) that the source operand must be aligned or #GP will be generated.

It would be useful to determine which section of the manual is accurate.

@mikedn

mikedn commented Dec 17, 2017

Hmm, too lazy to dig through the docs at this hour. Let's test:

__m128 a[42];
__asm addps xmm0, xmmword ptr [a+1]

Crashes with "read access violation". Runs fine with [a] or [a+16].
But:

__m128 a[42];
__asm vaddps xmm0, xmm0, xmmword ptr [a+1]

Runs fine.

So yes, classic (non-VEX-encoded) SSE2 instructions require 16-byte alignment.

@fiigii

fiigii commented Dec 17, 2017

@mikedn @tannergooding Sorry, I was wrong. Type 2 instructions also require alignment with the legacy SSE encoding:
[screenshot: exception-type table showing the Type 2 alignment requirement]

@fiigii

fiigii commented Dec 17, 2017

In summary, we should fold Load/Store for all VEX-encoded instructions.
BTW, @mikedn, could you point me to any code from which I can learn how to enable containment analysis for SIMD operations? I am not very familiar with this part.

@mikedn

mikedn commented Dec 18, 2017

could you point me to any code from which I can learn how to enable containment analysis for SIMD operations

See the various ContainCheckX functions in lowerxarch.cpp. There's ContainCheckSIMD but it's rather limited in what it does. See ContainCheckBinary too.

In short:

  • If the second operand is a memory operand (GT_IND usually) you can make it contained by calling MakeSrcContained.
  • If the second operand is a local variable then you may call SetRegOptional to tell the register allocator that it does not need to load the variable into a register (if it's not already in a register).
  • If the first operand is a memory operand/local variable and the instruction is commutative, then you could try to swap the operands to take advantage of containment.
  • AFAIR there aren't many, if any, SSE/AVX instructions that have an RMW (read/modify/write) form (e.g. no addps xmmword ptr [mem], xmm0). That simplifies things, as handling that form is rather convoluted.

@tannergooding
Member Author

Note: CI results are inaccurate until https://github.com/dotnet/coreclr/issues/15618 is resolved, as the IsSupported check is returning false and no intrinsics actually have code being generated.

I am testing locally to ensure Windows x64 is working, but that is all the validation I am doing at this time.

@tannergooding
Member Author

@fiigii, @mikedn.

Implemented SSE_Shuffle using GT_LIST for the operands. It's fairly straightforward and can be fairly well isolated from the other code paths.

@fiigii

fiigii commented Dec 25, 2017

@tannergooding I will change the importer and codegen of the HW intrinsics to a table-driven implementation, so @mikedn's solution ("add an integer data member to GenTreeHWIntrinsic and store the immediate in it") may be the better way to unify the exception code.

@tannergooding
Member Author

Implemented Set, SetAll, and SetZero intrinsics.

All that's left now is Load, Store, and StaticCast (and improving codegen in general).


argList = argList->Rest();
op3 = argList->Current();
ival = op3->AsIntConCommon()->IconValue();
Member Author

This fails for the Shuffle_r test, but passes for the Shuffle_ro test. Seems to be because op3 is a GT_CAST when optimizations are not enabled.


👍

@tannergooding tannergooding Dec 27, 2017

Looks like this is because we aren't forcing expansion of the intrinsics (on first pass), which causes other downstream processing on the args to occur.

Member Author

assert(sig->numArgs == 1);
op1 = impSIMDPopStack(TYP_FLOAT);
retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, intrinsic, TYP_FLOAT, 16);
break;

I think all the helper intrinsics should be expanded to other "real" intrinsics in the importer. That would simplify the subsequent optimization work.

Member Author

Do you mean expand to the existing SIMD intrinsics used by System.Numerics.Vector<T>?


No, I mean the helper intrinsics should be converted to other HW intrinsics. For example, Sse.SetAll(v) may be expanded to Sse.Shuffle(aNewLocalVar(v), 0), and Sse.SetZero<float> may be expanded to Sse.Xor(v, v).
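For illustration, the two proposed expansions written out at the API level (shipped method names; the real transformation would happen in the importer, and the shipped Shuffle takes two vectors plus a control byte rather than the two-argument form sketched above).

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class HelperExpansionSketch
{
    // SetAll: once the scalar sits in lane 0 of a vector, shufps with a
    // control byte of 0 replicates lane 0 into all four lanes.
    public static Vector128<float> BroadcastLane0(Vector128<float> valueInLane0) =>
        Sse.Shuffle(valueInLane0, valueInLane0, 0);

    // SetZero: xorps of a register with itself always produces all-zero bits.
    public static Vector128<float> Zero(Vector128<float> v) =>
        Sse.Xor(v, v);
}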

@tannergooding
Member Author

Had to resolve a merge conflict with the System.Math.Round intrinsic PR (#14736).

Went ahead and added the function header comments and changed isNumericType to be varTypeIsArithmetic instead.

Will merge once tests are green.

@tannergooding tannergooding merged commit e522589 into dotnet:master Jan 17, 2018
@sandreenko

Do not want to bother you, but frankenjit is broken again:
f:\codegenmirror\src\ndp\clr\src\jit\emitxarch.cpp(4140): error C3861: 'emitHandleMemOp': identifier not found [F:\CodegenMirror\src\NDP\clr\src\jit\frankenjit\frankenjit.nativeproj]

So new functions should be under #ifndef LEGACY_BACKEND

@tannergooding
Member Author

Do not want to bother you, but frankenjit is broken again:

No worries. I should have remembered I needed to ifdef the R_R_A_I method after just fixing the others.

Fix should be #15900. I tested locally on the TFS sources and it looks like the JIT builds successfully.
