
[x86] conv_depthwise_5x5 optimization #6979

Merged 23 commits into PaddlePaddle:develop on Sep 29, 2021

Conversation

@lxwlaq (Collaborator) commented on Sep 16, 2021:

| conv           | old    | new     | rate |
|----------------|--------|---------|------|
| 1×240×14×14 s1 | 0.3609 | 0.0802  | 300% |
| 1×96×28×28 s2  | 0.1745 | 0.04215 | 300% |
| 1×120×28×28 s1 | 0.7046 | 0.1082  | 600% |
| 1×72×56×56 s2  | 0.3437 | 0.0982  | 200% |

| model             | old   | new   | rate  |
|-------------------|-------|-------|-------|
| MobileNetV3 large | 17.49 | 12.48 | 28.6% |
| MobileNetV3 small | 9.32  | 5.34  | 42.7% |

@paddle-bot-old

Thanks for your contribution!

namespace math {
#define Min(a, b) (a < b ? a : b)
#define ROUNDUP(a, b) ((((a) + (b)-1) / (b)) * (b))
void conv_depthwise_5x5s1s2(const float* din,
Collaborator: Please implement this as two separate functions, conv_depthwise_5x5s1 and conv_depthwise_5x5s2.

__m256 _bias = flag_bias ? _mm256_maskload_ps(bias + c, bias_mask)
: _mm256_set1_ps(0.f);

if (stride == 1) {
Collaborator: Split the implementation into the two separate functions to avoid the if-else branching on stride; a sketch follows below.
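A minimal sketch of the suggested split, with hypothetical, simplified signatures (the real kernels would keep the full parameter list of conv_depthwise_5x5s1s2); the stride check then happens once at dispatch time rather than inside the kernel:

```cpp
// Hypothetical simplified signatures; shape/bias/activation parameters omitted.
void conv_depthwise_5x5s1(const float* din, float* dout, int ch, int h, int w) {
  // stride-1 5x5 depthwise kernel body
}

void conv_depthwise_5x5s2(const float* din, float* dout, int ch, int h, int w) {
  // stride-2 5x5 depthwise kernel body
}

// Dispatch once, outside the per-pixel loops.
void conv_depthwise_5x5(const float* din, float* dout, int ch, int h, int w,
                        int stride) {
  if (stride == 1) {
    conv_depthwise_5x5s1(din, dout, ch, h, w);
  } else {
    conv_depthwise_5x5s2(din, dout, ch, h, w);
  }
}
```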

impl_ = new DepthwiseConv<PRECISION(kFloat), PRECISION(kFloat)>;
}
if (dw_kernel && kps_equal && no_dilation && flag_dw &&
(flag_dw_5x5 || paddings[0] == 1)) {
Collaborator: Does the `paddings[0] == 1` restriction refer to the 3x3 depthwise case? As far as I can see, the original implementation supported all paddings.

@@ -89,11 +88,11 @@ void Conv2dCompute<PRECISION(kFloat), PRECISION(kFloat)>::PrepareForRun() {
const int iw = param.x->dims()[3];

//! select conv impl
if (this->device_ctx->avx_level() != AVXType::AVX_NONE) {
Collaborator: Is the non-AVX case covered by a fallback implementation here?

lite::x86::math::unpack8_m256(&output_pack_, param.output);
} else if (pack_size == 4) {
lite::x86::math::unpack4_m128(&output_pack_, param.output);
LOG(FATAL) << "weights scale size must equal to filter size";
Collaborator: This error message looks wrong; it should say that conv_depthwise only supports 3x3 and 5x5 kernels, for example as below.
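A hypothetical wording for the corrected message:

```cpp
LOG(FATAL) << "[X86] conv_depthwise only supports 3x3 and 5x5 kernels";
```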

for (auto stride : {1, 2}) {
for (auto pad : {0, 1}) {
for (auto bias : {false, true}) {
for (auto act : {"relu", "leaky_relu"}) {
Collaborator: Please also add a relu6 case to this unit test.

Collaborator: And please add a hard_swish unit test as well; a sketch follows below.
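A minimal sketch of how the activation loop above could be extended, assuming the test harness already understands the "relu6" and "hard_swish" act strings:

```cpp
for (auto act : {"relu", "relu6", "leaky_relu", "hard_swish"}) {
  // existing test body unchanged
}
```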

r = _mm256_blendv_ps(r,
_mm256_mul_ps(negative_slope, r),
_mm256_cmp_ps(r, zero, 2));
} else {
Collaborator: Please add a branch for the hard_swish activation here, e.g. along the lines of the sketch below.
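A minimal sketch of a hard_swish branch using AVX intrinsics, assuming the common parameterization hard_swish(x) = x * clamp(x + offset, 0, threshold) / scale with offset = 3, threshold = 6, scale = 6 (the actual values should come from the op's act_param):

```cpp
__m256 offset = _mm256_set1_ps(3.f);
__m256 threshold = _mm256_set1_ps(6.f);
__m256 scale_inv = _mm256_set1_ps(1.f / 6.f);
__m256 tmp = _mm256_add_ps(r, offset);                // x + offset
tmp = _mm256_max_ps(tmp, _mm256_setzero_ps());        // clamp below at 0
tmp = _mm256_min_ps(tmp, threshold);                  // clamp above at threshold
r = _mm256_mul_ps(_mm256_mul_ps(r, tmp), scale_inv);  // x * clamp(...) / scale
```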

@@ -418,27 +417,32 @@ void TestConvDepthwise(Place place, float abs_error = 2e-5) {
// time-consuming
for (int64_t n : {1, 3, 4}) {
for (auto win : {3, 4, 7, 16, 30}) {
std::vector<int64_t> dims{n, 32, win, win};
Collaborator: Please add TestConvDepthwise to the unit-test suite; currently only this one test is included (screenshot of the current test registration omitted).

@@ -74,6 +74,18 @@ void pack_padding8_m256(lite::Tensor* input,
const int channel_num,
const std::vector<int>& paddings);

void packC8_with_Cleft(const float* din,
Collaborator: In the next PR, please add a comment describing what this function does, similar to the example below.
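A hypothetical example of such a doc comment; the description and the parameters beyond din are inferred from the function name and are only illustrative:

```cpp
// Hypothetical sketch: packs the input feature map into a channel-blocked C8
// layout, processing full blocks of 8 channels and then the leftover
// (channel_num % 8) channels so arbitrary channel counts are supported.
void packC8_with_Cleft(const float* din,
                       float* dout,
                       const int channel_num,
                       const int height,
                       const int width);
```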


r3 = _mm256_blendv_ps(r3,
_mm256_mul_ps(negative_slope, r3),
_mm256_cmp_ps(r3, zero, 2));
} else {
Collaborator: In the next PR, please add the hard_swish activation branch here as well (see the sketch in the earlier comment).

_mm256_mul_ps(negative_slope, r3),
_mm256_cmp_ps(r3, zero, 2));
} else {
LOG(FATAL) << "[X86] activation type not supported";
Collaborator: Please change this message to: LOG(FATAL) << "[X86] activation type " << static_cast<int>(act_type) << " not supported";

_mm256_mul_ps(negative_slope, r),
_mm256_cmp_ps(r, zero, 2));
} else {
LOG(FATAL) << "[X86] activation type not supported";
Collaborator: Same as above.

@chenjiaoAngel (Collaborator): Please address the review comments above in the next PR.

@chenjiaoAngel (Collaborator) left a review:

LGTM

@chenjiaoAngel merged commit 11fed88 into PaddlePaddle:develop on Sep 29, 2021.
lxwlaq added a commit to lxwlaq/Paddle-Lite that referenced this pull request on Nov 17, 2021.