Master-slave full replication keeps timing out #2263

Closed
wangshao1 opened this issue Jan 3, 2024 · 0 comments · Fixed by #2633
Labels
☢️ Bug Something isn't working

Collaborator

wangshao1 commented Jan 3, 2024

Is this a regression?

Yes

Description

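Symptom:
Running slaveof to sync the data keeps reporting timeout, and the data sync never completes.
Cause:
The Pika master node is configured with a gigabit NIC, but the NIC speed had degraded to 100 Mb/s.
Investigation:
Running netstat -na|grep {port} showed a severe backlog of response packets. Monitoring showed memory rising continuously, and the NIC bandwidth had dropped to 100 Mb/s. The suspicion is that the data requested over rsync exceeded the NIC bandwidth, so requests could not get their responses in time, and application-layer retransmission by the rsync client made the congestion worse. Besides Pika, the physical machine hosting this Pika node also ran a proxy node.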
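
To make the suspected mechanism concrete, here is a rough back-of-the-envelope sketch in Python. It is not Pika code: the function names and the 50 MB/s offered load are hypothetical, chosen only for illustration. It converts the degraded link speed to bytes per second and shows how fast unsent response data would pile up if the master offers more than the link can carry.

```python
def link_capacity_bytes_per_sec(link_mbps: float) -> float:
    """Convert a link speed in megabits per second to bytes per second."""
    return link_mbps * 1_000_000 / 8


def backlog_after(seconds: float, send_rate_bytes_per_sec: float, link_mbps: float) -> float:
    """Bytes left queued after `seconds` when the sender offers more than
    the link can carry. Returns 0 when the link keeps up."""
    surplus = send_rate_bytes_per_sec - link_capacity_bytes_per_sec(link_mbps)
    return max(0.0, surplus) * seconds


if __name__ == "__main__":
    capacity = link_capacity_bytes_per_sec(100)  # degraded NIC: 12.5 MB/s
    # Hypothetical offered load of 50 MB/s, for illustration only.
    queued = backlog_after(10, 50_000_000, 100)
    print(f"link capacity: {capacity / 1e6:.1f} MB/s")
    print(f"backlog after 10 s: {queued / 1e6:.0f} MB")
```

A growing backlog of this kind would be consistent with both the rising memory usage and the client-side timeouts described above.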
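Operational steps:

  1. The DBA took the proxy on this Pika node's machine offline to reduce network bandwidth usage.
  2. Changed the rate-limit parameter for rsync master-slave replication (throttle-bytes-per-second) to 10 MB and restarted the Pika node.
  3. The slave node began slowly syncing the historical data. Once the historical data and the backlogged binlog had both been synced, the slave was promoted to master.
  4. The old master machine was taken offline for warranty repair, and a new slave was started for the new master.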
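
The 10 MB value in step 2 can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes "10 MB" means 10 MiB per second, and the 100 GiB dataset size is a hypothetical figure used to estimate how long such a throttled full sync might take.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Assumption: "10 MB" is read as 10 MiB per second.
throttle = 10 * MIB
# A 100 Mb/s link expressed in bytes per second.
link_capacity = 100 * 1_000_000 // 8


def fits_link(throttle_bytes: int, capacity_bytes: int) -> bool:
    """True when the throttle stays below what the link can carry."""
    return throttle_bytes < capacity_bytes


def sync_hours(dataset_bytes: int, throttle_bytes: int) -> float:
    """Lower bound on full-sync duration at a given throttle, in hours."""
    return dataset_bytes / throttle_bytes / 3600


if __name__ == "__main__":
    print(f"throttle-bytes-per-second candidate: {throttle}")
    print(f"fits a 100 Mb/s link: {fits_link(throttle, link_capacity)}")
    # Hypothetical 100 GiB dataset.
    print(f"estimated sync time: {sync_hours(100 * GIB, throttle):.2f} h")
```

Under these assumptions the throttle leaves only about 2 MB/s of headroom on the degraded link, which fits with taking the proxy offline first in step 1.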

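Steps to reproduce:
Master: write a batch of data into RocksDB, then limit the master's network bandwidth with wondershaper (sudo wondershaper -a {device} -u 102400). To remove the limit, run sudo wondershaper -a {device} -c.
Slave: run the slaveof force command to sync data from the master.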
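
For repeated runs, the two wondershaper commands above can be wrapped in a small helper. This is a sketch, not part of Pika: the function names are hypothetical and `eth0` is a placeholder device name. The helper only builds and prints the commands, so nothing is changed on the machine until you run them yourself. If the rate is interpreted in Kbps, as in the commonly used wondershaper script, 102400 corresponds to roughly 100 Mb/s.

```python
import shlex


def limit_cmd(device: str, upload_kbps: int = 102400) -> list[str]:
    """Command that caps upload bandwidth on `device` (rate taken from the issue)."""
    return ["sudo", "wondershaper", "-a", device, "-u", str(upload_kbps)]


def clear_cmd(device: str) -> list[str]:
    """Command that removes the bandwidth limit from `device`."""
    return ["sudo", "wondershaper", "-a", device, "-c"]


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the master's real interface.
    for cmd in (limit_cmd("eth0"), clear_cmd("eth0")):
        print(shlex.join(cmd))
```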

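TODO:

  1. Support adjusting the rsync rate limit dynamically.
  2. Support adjusting the rsync client request timeout dynamically.
  3. Fix the issue where parameter changes do not take effect.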
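
As a rough illustration of what item 1 could look like, here is a minimal token-bucket limiter whose rate can be changed while it is in use. This is a sketch in Python for readability only. It is not Pika's implementation, and the class name, method names, and injectable clock are all hypothetical.

```python
import threading
import time


class AdjustableThrottle:
    """Token-bucket style limiter whose rate can be changed at runtime."""

    def __init__(self, bytes_per_second: int, clock=time.monotonic):
        self._rate = bytes_per_second
        self._clock = clock
        self._tokens = float(bytes_per_second)  # start with one second of budget
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        # Add budget for the elapsed time, capped at one second's worth.
        now = self._clock()
        earned = (now - self._last) * self._rate
        self._tokens = min(float(self._rate), self._tokens + earned)
        self._last = now

    def set_rate(self, bytes_per_second: int) -> None:
        """Apply a new limit immediately, without restarting anything."""
        with self._lock:
            self._refill()
            self._rate = bytes_per_second
            self._tokens = min(self._tokens, float(bytes_per_second))

    def acquire(self, wanted: int) -> int:
        """Return how many of `wanted` bytes may be sent right now."""
        with self._lock:
            self._refill()
            granted = min(wanted, int(self._tokens))
            self._tokens -= granted
            return granted
```

A sender would call `acquire` before each write and send only the granted number of bytes, while an admin command handler could call `set_rate` to lower or raise the limit on a live node.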

Please provide a link to a minimal reproduction of the bug

No response

Screenshots or videos

Screenshot 2024-01-03 18 04 37
The slave node keeps reporting timeouts.

Please provide the version you discovered this bug in (check about page for version information)

Pika version: 3.5+

Anything else?

No response
