[x86] conv_depthwise_5x5 optimization #6979
lxwlaq commented on Sep 16, 2021 (edited)
conv | old | new | rate |
---|---|---|---|
1×240×14×14 s1 | 0.3609 | 0.0802 | 300% |
1×96×28×28 s2 | 0.1745 | 0.04215 | 300% |
1×120×28×28 s1 | 0.7046 | 0.1082 | 600% |
1×72×56×56 s2 | 0.3437 | 0.0982 | 200% |

model | old | new | rate |
---|---|---|---|
MobileNetV3 large | 17.49 | 12.48 | 28.6% |
MobileNetV3 small | 9.32 | 5.34 | 42.7% |
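
The tables do not say how the "rate" column is defined. As a rough cross-check only, the sketch below computes two common readings from the old/new timings: a speedup factor (old / new) and a relative reduction ((old - new) / old). The helper names are made up for illustration. The model rows agree with the relative-reduction reading (17.49 → 12.48 gives about 28.6%), while the per-conv figures look like coarse roundings.

```cpp
#include <cassert>

// Hypothetical helpers, not part of the PR.
// Speedup factor: how many times faster the new kernel is.
double speedup(double old_t, double new_t) {
  assert(new_t > 0.0);
  return old_t / new_t;
}

// Relative reduction of the measured time, in percent.
double reduction_percent(double old_t, double new_t) {
  assert(old_t > 0.0);
  return (old_t - new_t) / old_t * 100.0;
}
```

For example, `speedup(0.3609, 0.0802)` is about 4.5 for the 1×240×14×14 s1 case.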
…into depthwise_conv2d
Thanks for your contribution!
namespace math {
#define Min(a, b) (a < b ? a : b)
#define ROUNDUP(a, b) ((((a) + (b)-1) / (b)) * (b))
void conv_depthwise_5x5s1s2(const float* din,
Implement this as two functions: conv_depthwise_5x5s1 and conv_depthwise_5x5s2.
__m256 _bias = flag_bias ? _mm256_maskload_ps(bias + c, bias_mask)
                         : _mm256_set1_ps(0.f);

if (stride == 1) {
Split this into two separate functions to reduce the if-else checks.
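
One way to read this suggestion, as a minimal scalar sketch rather than the PR's AVX kernel: fix the stride at compile time so the inner loops carry no runtime `if (stride == 1)` branch, and expose one entry point per stride. The names `conv_depthwise_5x5s1` and `conv_depthwise_5x5s2` come from the review comment. The signatures, the single-channel layout, the no-padding assumption, and the helper `depthwise_5x5_ref` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar single-channel 5x5 depthwise convolution without padding.
// Stride is a template parameter, so no stride branch remains at run time.
template <int Stride>
std::vector<float> depthwise_5x5_ref(const std::vector<float>& in, int ih,
                                     int iw, const std::vector<float>& w) {
  assert(ih >= 5 && iw >= 5);
  assert(in.size() == static_cast<std::size_t>(ih) * iw && w.size() == 25);
  const int oh = (ih - 5) / Stride + 1;
  const int ow = (iw - 5) / Stride + 1;
  std::vector<float> out(static_cast<std::size_t>(oh) * ow);
  for (int y = 0; y < oh; ++y) {
    for (int x = 0; x < ow; ++x) {
      float acc = 0.f;
      for (int ky = 0; ky < 5; ++ky) {
        for (int kx = 0; kx < 5; ++kx) {
          acc += in[(y * Stride + ky) * iw + (x * Stride + kx)] * w[ky * 5 + kx];
        }
      }
      out[y * ow + x] = acc;
    }
  }
  return out;
}

// Hypothetical entry points, one per stride, as the reviewer proposes.
std::vector<float> conv_depthwise_5x5s1(const std::vector<float>& in, int ih,
                                        int iw, const std::vector<float>& w) {
  return depthwise_5x5_ref<1>(in, ih, iw, w);
}

std::vector<float> conv_depthwise_5x5s2(const std::vector<float>& in, int ih,
                                        int iw, const std::vector<float>& w) {
  return depthwise_5x5_ref<2>(in, ih, iw, w);
}
```

The caller then picks the function once, outside the hot loop, based on the stride.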
impl_ = new DepthwiseConv<PRECISION(kFloat), PRECISION(kFloat)>;
}
if (dw_kernel && kps_equal && no_dilation && flag_dw &&
    (flag_dw_5x5 || paddings[0] == 1)) {
Does "paddings only supports 1" refer to 3x3_dw? As far as I can see, the original implementation supports all padding values.
@@ -89,11 +88,11 @@ void Conv2dCompute<PRECISION(kFloat), PRECISION(kFloat)>::PrepareForRun() {
const int iw = param.x->dims()[3];

//! select conv impl
if (this->device_ctx->avx_level() != AVXType::AVX_NONE) {
The non-AVX implementation has been filled in as well, right?
lite/kernels/x86/conv_depthwise.cc (outdated)
lite::x86::math::unpack8_m256(&output_pack_, param.output);
} else if (pack_size == 4) {
lite::x86::math::unpack4_m128(&output_pack_, param.output);
LOG(FATAL) << "weights scale size must equal to filter size";
This error message looks wrong. Shouldn't it say that conv_depthwise only supports 3x3 and 5x5?
for (auto stride : {1, 2}) {
  for (auto pad : {0, 1}) {
    for (auto bias : {false, true}) {
      for (auto act : {"relu", "leaky_relu"}) {
Please add a unit test for relu6.
And a unit test for hard_swish as well.
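
For the two requested tests, a scalar reference is usually enough to check the vectorized output against. The sketch below is an illustration, not code from this PR. It assumes the common definitions relu6(x) = min(max(x, 0), 6) and hard_swish(x) = x * min(max(x + offset, 0), threshold) / scale with defaults threshold = 6, scale = 6, offset = 3. The function names and default parameters are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scalar references for checking kernel output in unit tests.
inline float relu6_ref(float x) {
  return std::min(std::max(x, 0.f), 6.f);
}

inline float hard_swish_ref(float x, float threshold = 6.f, float scale = 6.f,
                            float offset = 3.f) {
  assert(scale != 0.f);
  // Clamp (x + offset) to [0, threshold], then scale the product.
  return x * std::min(std::max(x + offset, 0.f), threshold) / scale;
}
```

A test would apply these element by element to the plain convolution result and compare against the fused kernel within the test's `abs_error`.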
r = _mm256_blendv_ps(r,
                     _mm256_mul_ps(negative_slope, r),
                     _mm256_cmp_ps(r, zero, 2));
} else {
Add a branch for the hard_swish activation.
@@ -418,27 +417,32 @@ void TestConvDepthwise(Place place, float abs_error = 2e-5) {
// time-consuming
for (int64_t n : {1, 3, 4}) {
  for (auto win : {3, 4, 7, 16, 30}) {
    std::vector<int64_t> dims{n, 32, win, win};
@@ -74,6 +74,18 @@ void pack_padding8_m256(lite::Tensor* input,
const int channel_num,
const std::vector<int>& paddings);

void packC8_with_Cleft(const float* din,
In the next PR, add a comment describing what this function does. Something like:
r3 = _mm256_blendv_ps(r3,
                      _mm256_mul_ps(negative_slope, r3),
                      _mm256_cmp_ps(r3, zero, 2));
} else {
In the next PR, add the hard_swish activation branch.
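
A possible shape for such a branch, offered as a sketch under assumptions rather than the eventual implementation: it applies hard_swish to eight floats at a time with AVX intrinsics, in the same style as the leaky_relu branch above, and falls back to scalar code for the tail or when AVX is not enabled at compile time. The function name `hard_swish_inplace` and the default parameters (threshold = 6, scale = 6, offset = 3) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: hard_swish(x) = x * clamp(x + offset, 0, threshold) / scale,
// applied in place to n floats.
void hard_swish_inplace(float* data, std::size_t n, float threshold = 6.f,
                        float scale = 6.f, float offset = 3.f) {
  assert(scale != 0.f);
  std::size_t i = 0;
#if defined(__AVX__)
  const __m256 zero = _mm256_setzero_ps();
  const __m256 vthreshold = _mm256_set1_ps(threshold);
  const __m256 voffset = _mm256_set1_ps(offset);
  const __m256 vinv_scale = _mm256_set1_ps(1.f / scale);
  for (; i + 8 <= n; i += 8) {
    __m256 r = _mm256_loadu_ps(data + i);
    // gate = clamp(r + offset, 0, threshold)
    __m256 gate = _mm256_min_ps(
        _mm256_max_ps(_mm256_add_ps(r, voffset), zero), vthreshold);
    r = _mm256_mul_ps(_mm256_mul_ps(r, gate), vinv_scale);
    _mm256_storeu_ps(data + i, r);
  }
#endif
  // Scalar tail, and the whole range when AVX is unavailable.
  for (; i < n; ++i) {
    const float gate = std::min(std::max(data[i] + offset, 0.f), threshold);
    data[i] = data[i] * gate / scale;
  }
}
```

The vector path multiplies by 1 / scale instead of dividing, so results can differ from the scalar path in the last bit. Tests should compare with a tolerance.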
                      _mm256_mul_ps(negative_slope, r3),
                      _mm256_cmp_ps(r3, zero, 2));
} else {
  LOG(FATAL) << "[X86] activation type not supported";
Please change this line to: LOG(FATAL) << "[X86] activation type" << static_cast<int>(act_type) << "not supported";
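
The cast matters because a scoped enum cannot be streamed directly. A minimal sketch of the resulting message, using `std::ostringstream` in place of the project's `LOG(FATAL)` macro so it runs standalone; the enum `ActType`, its values, and the helper name are hypothetical. The sketch also puts spaces around the number so that it does not run into the surrounding words.

```cpp
#include <sstream>
#include <string>

// Hypothetical activation enum; the real type and values may differ.
enum class ActType : int { kRelu = 1, kRelu6 = 2, kHardSwish = 10 };

// Builds the diagnostic text that would be passed to the logger.
std::string unsupported_act_message(ActType act_type) {
  std::ostringstream os;
  os << "[X86] activation type " << static_cast<int>(act_type)
     << " not supported";
  return os.str();
}
```

With these hypothetical values, `unsupported_act_message(ActType::kHardSwish)` yields `[X86] activation type 10 not supported`.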
                     _mm256_mul_ps(negative_slope, r),
                     _mm256_cmp_ps(r, zero, 2));
} else {
  LOG(FATAL) << "[X86] activation type not supported";
Same as above.
Please address the review comments above in the next PR.
LGTM