
[x86] conv_depthwise_5x5 optimization #6979

Merged 23 commits into PaddlePaddle:develop on Sep 29, 2021

Conversation

@lxwlaq (Collaborator) commented on Sep 16, 2021:

| conv           | old    | new     | rate |
|----------------|--------|---------|------|
| 1×240×14×14 s1 | 0.3609 | 0.0802  | 300% |
| 1×96×28×28 s2  | 0.1745 | 0.04215 | 300% |
| 1×120×28×28 s1 | 0.7046 | 0.1082  | 600% |
| 1×72×56×56 s2  | 0.3437 | 0.0982  | 200% |

| model             | old   | new   | rate  |
|-------------------|-------|-------|-------|
| MobileNetV3 large | 17.49 | 12.48 | 28.6% |
| MobileNetV3 small | 9.32  | 5.34  | 42.7% |

@paddle-bot-old

Thanks for your contribution!

namespace math {
#define Min(a, b) (a < b ? a : b)
#define ROUNDUP(a, b) ((((a) + (b)-1) / (b)) * (b))
void conv_depthwise_5x5s1s2(const float* din,
Collaborator: Please implement this as two separate functions, conv_depthwise_5x5s1 and conv_depthwise_5x5s2.

__m256 _bias = flag_bias ? _mm256_maskload_ps(bias + c, bias_mask)
: _mm256_set1_ps(0.f);

if (stride == 1) {
Collaborator: Split the implementation into the two separate functions to avoid the if-else branching on stride; a sketch follows below.
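A minimal sketch of the suggested split, with hypothetical, simplified signatures (the real kernels would keep the full parameter list of conv_depthwise_5x5s1s2); the stride check then happens once at dispatch time rather than inside the kernel:

```cpp
// Hypothetical simplified signatures; shape/bias/activation parameters omitted.
void conv_depthwise_5x5s1(const float* din, float* dout, int ch, int h, int w) {
  // stride-1 5x5 depthwise kernel body
}

void conv_depthwise_5x5s2(const float* din, float* dout, int ch, int h, int w) {
  // stride-2 5x5 depthwise kernel body
}

// Dispatch once, outside the per-pixel loops.
void conv_depthwise_5x5(const float* din, float* dout, int ch, int h, int w,
                        int stride) {
  if (stride == 1) {
    conv_depthwise_5x5s1(din, dout, ch, h, w);
  } else {
    conv_depthwise_5x5s2(din, dout, ch, h, w);
  }
}
```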

impl_ = new DepthwiseConv<PRECISION(kFloat), PRECISION(kFloat)>;
}
if (dw_kernel && kps_equal && no_dilation && flag_dw &&
(flag_dw_5x5 || paddings[0] == 1)) {
Collaborator: Does the `paddings[0] == 1` restriction refer to the 3x3 depthwise case? As far as I can see, the original implementation supported all paddings.

@@ -89,11 +88,11 @@ void Conv2dCompute<PRECISION(kFloat), PRECISION(kFloat)>::PrepareForRun() {
const int iw = param.x->dims()[3];

//! select conv impl
if (this->device_ctx->avx_level() != AVXType::AVX_NONE) {
Collaborator: Is the non-AVX case covered by a fallback implementation here?

lite::x86::math::unpack8_m256(&output_pack_, param.output);
} else if (pack_size == 4) {
lite::x86::math::unpack4_m128(&output_pack_, param.output);
LOG(FATAL) << "weights scale size must equal to filter size";
Collaborator: This error message looks wrong; it should say that conv_depthwise only supports 3x3 and 5x5 kernels, for example as below.
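A hypothetical wording for the corrected message:

```cpp
LOG(FATAL) << "[X86] conv_depthwise only supports 3x3 and 5x5 kernels";
```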

for (auto stride : {1, 2}) {
for (auto pad : {0, 1}) {
for (auto bias : {false, true}) {
for (auto act : {"relu", "leaky_relu"}) {
Collaborator: Please also add a relu6 case to this unit test.

Collaborator: And please add a hard_swish unit test as well; a sketch follows below.
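A minimal sketch of how the activation loop above could be extended, assuming the test harness already understands the "relu6" and "hard_swish" act strings:

```cpp
for (auto act : {"relu", "relu6", "leaky_relu", "hard_swish"}) {
  // existing test body unchanged
}
```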

r = _mm256_blendv_ps(r,
_mm256_mul_ps(negative_slope, r),
_mm256_cmp_ps(r, zero, 2));
} else {
Collaborator: Please add a branch for the hard_swish activation here, e.g. along the lines of the sketch below.
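A minimal sketch of a hard_swish branch using AVX intrinsics, assuming the common parameterization hard_swish(x) = x * clamp(x + offset, 0, threshold) / scale with offset = 3, threshold = 6, scale = 6 (the actual values should come from the op's act_param):

```cpp
__m256 offset = _mm256_set1_ps(3.f);
__m256 threshold = _mm256_set1_ps(6.f);
__m256 scale_inv = _mm256_set1_ps(1.f / 6.f);
__m256 tmp = _mm256_add_ps(r, offset);                // x + offset
tmp = _mm256_max_ps(tmp, _mm256_setzero_ps());        // clamp below at 0
tmp = _mm256_min_ps(tmp, threshold);                  // clamp above at threshold
r = _mm256_mul_ps(_mm256_mul_ps(r, tmp), scale_inv);  // x * clamp(...) / scale
```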

@@ -418,27 +417,32 @@ void TestConvDepthwise(Place place, float abs_error = 2e-5) {
// time-consuming
for (int64_t n : {1, 3, 4}) {
for (auto win : {3, 4, 7, 16, 30}) {
std::vector<int64_t> dims{n, 32, win, win};
Collaborator: Please add TestConvDepthwise to the unit-test suite; currently only this one test is included (screenshot of the current test registration omitted).

@@ -74,6 +74,18 @@ void pack_padding8_m256(lite::Tensor* input,
const int channel_num,
const std::vector<int>& paddings);

void packC8_with_Cleft(const float* din,
Collaborator: In the next PR, please add a comment describing what this function does, similar to the example below.
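A hypothetical example of such a doc comment; the description and the parameters beyond din are inferred from the function name and are only illustrative:

```cpp
// Hypothetical sketch: packs the input feature map into a channel-blocked C8
// layout, processing full blocks of 8 channels and then the leftover
// (channel_num % 8) channels so arbitrary channel counts are supported.
void packC8_with_Cleft(const float* din,
                       float* dout,
                       const int channel_num,
                       const int height,
                       const int width);
```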


r3 = _mm256_blendv_ps(r3,
_mm256_mul_ps(negative_slope, r3),
_mm256_cmp_ps(r3, zero, 2));
} else {
Collaborator: In the next PR, please add the hard_swish activation branch here as well (see the sketch in the earlier comment).

_mm256_mul_ps(negative_slope, r3),
_mm256_cmp_ps(r3, zero, 2));
} else {
LOG(FATAL) << "[X86] activation type not supported";
Collaborator: Please change this message to: LOG(FATAL) << "[X86] activation type " << static_cast<int>(act_type) << " not supported";

_mm256_mul_ps(negative_slope, r),
_mm256_cmp_ps(r, zero, 2));
} else {
LOG(FATAL) << "[X86] activation type not supported";
Collaborator: Same as above.

@chenjiaoAngel (Collaborator): Please address the review comments above in the next PR.

@chenjiaoAngel (Collaborator) left a review:

LGTM

@chenjiaoAngel merged commit 11fed88 into PaddlePaddle:develop on Sep 29, 2021.
lxwlaq added a commit to lxwlaq/Paddle-Lite that referenced this pull request on Nov 17, 2021.