onConfigurationCommitted in a business state machine sometimes receives a configuration in the form 0.0.0.0:0, which looks like a bug #931

Closed
tdxafpdq opened this issue Feb 10, 2023 · 5 comments

Comments

@tdxafpdq

Your question

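In my own business state machine, public class ******StateMachine extends StateMachineAdapter{},
I @Override the method public void onConfigurationCommitted(final Configuration conf) {}.
The goal is that whenever the conf changes, onConfigurationCommitted is triggered
and the latest conf it receives is persisted to a MySQL database. That way the cluster is configured once, and the cluster nodes do not need to be reconfigured even after a service restart.

There is one problem, though:
The argument passed to onConfigurationCommitted is correct most of the time, for example: 192.168.0.1:8080,192.168.0.2:8080,192.168.0.3:8080
Occasionally it arrives in a malformed form, and I have not yet found a pattern, for example: 0.0.0.0:0,0.0.0.0:0,0.0.0.0:0

Could the maintainers confirm whether this is a bug? How can it be avoided or fixed? Thanks!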
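
While the root cause is unknown, one defensive option is to refuse to persist a configuration that contains the placeholder address. The following is a minimal sketch, not part of SOFAJRaft: ConfGuard, peers and isPersistable are hypothetical names, and it assumes the configuration is available as the comma-separated string shown above. It only stops a bad value from reaching the database; it does not explain why the value appears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a SOFAJRaft class): decides whether a
// comma-separated peer list is safe to persist.
final class ConfGuard {
    // The placeholder endpoint reported in this issue.
    static final String PLACEHOLDER = "0.0.0.0:0";

    private ConfGuard() {}

    // Splits a comma-separated peer list, trimming whitespace and
    // dropping empty items.
    static List<String> peers(String conf) {
        List<String> out = new ArrayList<>();
        if (conf == null) {
            return out;
        }
        for (String p : conf.split(",")) {
            String t = p.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // True only when the list is non-empty and no peer is the placeholder.
    // A peer that is the placeholder followed by ":" and more fields is
    // rejected too.
    static boolean isPersistable(String conf) {
        List<String> list = peers(conf);
        if (list.isEmpty()) {
            return false;
        }
        for (String p : list) {
            if (p.equals(PLACEHOLDER) || p.startsWith(PLACEHOLDER + ":")) {
                return false;
            }
        }
        return true;
    }
}
```

Inside onConfigurationCommitted, the MySQL write could then be skipped (and a warning logged) whenever ConfGuard.isPersistable(conf.toString()) returns false, assuming conf.toString() yields the comma-separated form quoted in this issue.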

Environment

  • SOFAJRaft version: 1.3.10.bugfix_2
  • JVM version (e.g. java -version): 1.8.0_291
  • OS version (e.g. uname -a): Linux servernode 5.10.0-60.56.0.84.oe2203.x86_64 #1 SMP Thu Sep 15 06:37:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Maven version: 3.6.3
  • IDE version: IDEA2020.1
@tdxafpdq
Author

image
Why is the IP 0.0.0.0:0 here? Could this be the cause?
In testing, it runs correctly on Windows; the problem occurs on CentOS and openEuler.

@killme2008
Contributor

killme2008 commented Feb 18, 2023

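0.0.0.0:0 is the initial value; you should only see it if decoding failed. Are all nodes running the same jraft version? Was this really triggered by a membership change? How was the change triggered? Was the peer information in the change correct? Please check the logs.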
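
To illustrate how an "initial value" can leak out when decoding fails, here is a small stand-in sketch. SketchPeer is a hypothetical class written for this example and is not SOFAJRaft's PeerId; it only assumes that an address object starts out as 0.0.0.0:0 and that a failed parse leaves it unchanged.

```java
// Hypothetical stand-in for a peer address; not SOFAJRaft's PeerId.
final class SketchPeer {
    private String ip = "0.0.0.0"; // initial value before any successful parse
    private int port = 0;          // initial value before any successful parse

    // Parses "ip:port". Returns false and leaves the initial values
    // untouched when the input is malformed.
    boolean parse(String s) {
        if (s == null) {
            return false;
        }
        String[] parts = s.trim().split(":");
        if (parts.length != 2 || parts[0].isEmpty()) {
            return false;
        }
        try {
            int p = Integer.parseInt(parts[1]);
            if (p < 0 || p > 65535) {
                return false;
            }
            this.ip = parts[0];
            this.port = p;
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    @Override
    public String toString() {
        return ip + ":" + port;
    }
}
```

If the caller ignores the boolean result of parse, a failed decode is indistinguishable from a real address until the value is printed, which matches the symptom of seeing 0.0.0.0:0 in place of every peer.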

@tdxafpdq
Author

tdxafpdq commented Feb 20, 2023

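> 0.0.0.0:0 is the initial value; you should only see it if decoding failed. Are all nodes running the same jraft version? Was this really triggered by a membership change? How was the change triggered? Was the peer information in the change correct? Please check the logs.

(1) All nodes run the same jraft version, 1.3.10.bugfix_2.
(2) From the logs: when a node restarts and replays its log, every line printed from the state machine's onConfigurationCommitted method is in the 0.0.0.0:0 form. With 2 nodes it is 0.0.0.0:0,0.0.0.0:0; with 3 nodes it is 0.0.0.0:0,0.0.0.0:0,0.0.0.0:0.
(3) The nodes were added with CliClientService.addPeer():
I first create a single-node cluster and then add nodes one at a time. When adding the third or fourth node, the onConfigurationCommitted method of the node being added intermittently receives a conf of the form 0.0.0.0:0,0.0.0.0:0,0.0.0.0:0.
There is also a second problem. Take a 3-node cluster that is already working normally and kill the leader. The remaining 2 nodes start voting, which is expected. However, one of them ends up in the branch of private void preVote() {} that logs LOG.warn("Node {} can't do preVote as it is not in conf <{}>.", getNodeId(), this.conf);. The reason is that this.conf (of type ConfigurationEntry) also holds the malformed value 0.0.0.0:0,0.0.0.0:0,0.0.0.0:0. As a result, unless the failed node is restarted, the remaining 2 nodes can never complete an election.
(4) A guess with no evidence so far: could encryption software used during packaging have encrypted some file, so that it cannot be read on the server, decoding fails, and this behavior follows?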
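
The stalled election in (3) follows from the membership check implied by the quoted log message: a node whose own address is not in its conf does not start a pre-vote. The sketch below is a plain-Java stand-in for that check, not code from NodeImpl; PreVoteCheck and canPreVote are hypothetical names, and peers are compared as plain strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the membership check; this is not NodeImpl code.
final class PreVoteCheck {
    private PreVoteCheck() {}

    // A node may start a pre-vote only if its own address is listed in
    // the comma-separated conf.
    static boolean canPreVote(String self, String conf) {
        Set<String> members = new HashSet<>();
        for (String p : conf.split(",")) {
            members.add(p.trim());
        }
        return members.contains(self);
    }
}
```

With the healthy conf 192.168.0.1:8080,192.168.0.2:8080,192.168.0.3:8080, node 192.168.0.2:8080 passes the check. With 0.0.0.0:0,0.0.0.0:0,0.0.0.0:0 no real address can ever match, so the node stays in the warning branch for as long as it holds that conf.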

I look forward to your reply. Thank you very much!

@killme2008
Contributor

killme2008 commented Feb 20, 2023

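Adding a node is always logged. Please go through the logs as a whole and trace it yourself:

https://github.com/sofastack/sofa-jraft/blob/master/jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java#L400

Confirm that the peer information being added is correct.

As for the encryption, I have no experience with that and cannot comment. If you suspect it, try without encryption first to rule it out.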

@killme2008
Contributor

If there is no further feedback, I will close this for now.
