Incremental sync exits abnormally #729

Closed
jiejieling opened this issue Jun 14, 2022 · 22 comments

Comments

@jiejieling

jiejieling commented Jun 14, 2022

Source database version: v4.4.14
Target database version: v4.4.14
MongoShake version: v2.7.2

After incremental sync has been running for a while, the log reports:
[2022/06/14 13:13:15 CST] [INFO] Syncer[cus-mg-10_sh_1] try to update checkpoint mandatory from 7108956952679415809[1655183023, 1] to {1655183222 1}
[2022/06/14 13:13:15 CST] [CRIT] Syncer[cus-mg-10_sh_1] filter newestTs[7108957807377907713[1655183222, 1]] smaller than previous timestamp[{1655183403 1}]

and then the process exits abnormally.
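
For readers decoding the two log lines above: the 64-bit values appear to be MongoDB timestamps packed as seconds in the upper 32 bits and an increment in the lower 32 bits, which matches the bracketed pairs printed next to them. A minimal Python sketch, using helper names (`split_ts`, `pack_ts`) that are illustrative and not part of MongoShake, shows that the newest timestamp in the CRIT line is 181 seconds behind the previous one:

```python
from typing import Tuple


def split_ts(packed: int) -> Tuple[int, int]:
    """Split a 64-bit packed timestamp into (seconds, increment)."""
    return packed >> 32, packed & 0xFFFFFFFF


def pack_ts(seconds: int, increment: int) -> int:
    """Inverse of split_ts: pack (seconds, increment) into one integer."""
    return (seconds << 32) | increment


# Values copied from the INFO and CRIT log lines in this issue.
checkpoint_from = split_ts(7108956952679415809)
newest = split_ts(7108957807377907713)
previous = (1655183403, 1)

print(checkpoint_from)           # (1655183023, 1)
print(newest)                    # (1655183222, 1)
print(previous[0] - newest[0])   # 181 (seconds)
print(newest < previous)         # True: the condition the CRIT line reports
```

The comparison on the last line is only the ordering that the log message describes; how MongoShake arrives at the two timestamps internally is not shown here.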

@zhangst
Collaborator

zhangst commented Jun 14, 2022

Please provide the configuration file and the log file, with sensitive data removed.

@jiejieling
Author

jiejieling commented Jun 14, 2022 via email

@jiejieling
Author

jiejieling commented Jun 14, 2022

The configuration is as follows:
conf.version = 10
id = mongoshake
master_quorum = false
full_sync.http_port = 9101
incr_sync.http_port = 9100
system_profile_port = 9200
log.level = info
log.dir = /data/log/mongo-shake/
log.file = collector.log
log.flush = false
sync_mode = all
mongo_urls = mongodb://xxx:[email protected]:27017
mongo_ssl_root_ca_file =
tunnel = direct
tunnel.address = mongodb://xxx:[email protected]:20001/admin?connect=direct
tunnel.message = raw
mongo_connect_mode = secondaryPreferred
filter.namespace.black =
filter.namespace.white =
filter.pass.special.db =
filter.ddl_enable = true
filter.oplog.gids = false
checkpoint.storage.url = mongodb://xxx:[email protected]:20001/admin?connect=direct
checkpoint.storage.db = mongoshake
checkpoint.storage.collection = ckpt_default
checkpoint.storage.url.mongo_ssl_root_ca_file =
checkpoint.start_position = 1970-01-01T00:00:00Z
transform.namespace =
full_sync.reader.collection_parallel = 6
full_sync.reader.write_document_parallel = 8
full_sync.reader.document_batch_size = 128
full_sync.reader.parallel_thread = 1
full_sync.reader.parallel_index = _id
full_sync.collection_exist_drop = true
full_sync.create_index = foreground
full_sync.executor.insert_on_dup_update = false
full_sync.executor.filter.orphan_document = false
full_sync.executor.majority_enable = false
incr_sync.mongo_fetch_method = change_stream
incr_sync.change_stream.watch_full_document = false
incr_sync.oplog.gids =
incr_sync.shard_key = collection
incr_sync.shard_by_object_id_whitelist =
incr_sync.worker = 8
incr_sync.tunnel.write_thread = 8
incr_sync.target_delay = 0
incr_sync.worker.batch_queue_size = 128
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 128
incr_sync.executor.upsert = false
incr_sync.executor.insert_on_dup_update = false
incr_sync.conflict_write_to = none
incr_sync.executor.majority_enable = false
special.source.db.flag =

@jiejieling
Author

The log is as follows:
collector.log

@zhangst
Collaborator

zhangst commented Jun 14, 2022

> The log is as follows: collector.log

This log file contains no errors. Please check whether it is the right file.

@jiejieling
Author

collector.log.zip
Sorry, I attached the wrong file.

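@zhangst
Collaborator

zhangst commented Jun 14, 2022

Is this problem easy to reproduce? If so, please try version 2.7.3 as well.
Something is probably going wrong when the oplog is filtered. I am still looking into it; 2.7.3 changed that logic.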

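@jiejieling
Author

I have already tried 2.7.3 and hit the same problem. I never ran into this with 2.6.x. It is not easy to reproduce and I have not found a pattern; it generally shows up after about 4 hours of running.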

@zhangst
Collaborator

zhangst commented Jun 14, 2022

Then please run 2.7.3 with debug logging enabled and report back once it reproduces.
I will keep looking through the code to find where the problem is.

@jiejieling
Author

ok

@jiejieling
Author

Reproduced on 2.7.3. I filtered the business data out of the log.
collector.log

@zhangst
Collaborator

zhangst commented Jun 14, 2022

With 2.6.x, was your incr_sync.mongo_fetch_method set to oplog or change_stream?

@zhangst
Collaborator

zhangst commented Jun 14, 2022

On 2.7.3, you can try changing incr_sync.mongo_fetch_method to oplog and see whether the problem still occurs.
This is probably caused by the ChangeStream method having no periodic noop heartbeat.
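
The workaround suggested here amounts to a one-line change in the collector configuration. A minimal Python sketch of that edit, assuming the `key = value` layout of the configuration posted earlier in this thread; `set_option` is a hypothetical helper, not a MongoShake tool:

```python
def set_option(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key = value`, replacing the existing line or appending one."""
    out = []
    found = False
    for line in conf_text.splitlines():
        # Split on the first '=' only, since values such as URLs may contain '='.
        name = line.split("=", 1)[0].strip()
        if name == key:
            out.append(f"{key} = {value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Two lines taken from the configuration in this issue.
sample = "incr_sync.mongo_fetch_method = change_stream\nincr_sync.worker = 8\n"
updated = set_option(sample, "incr_sync.mongo_fetch_method", "oplog")
print(updated, end="")
```

Editing the file by hand works just as well; the sketch is only meant to make explicit which key changes and that every other setting stays as it was.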

@jiejieling
Author

jiejieling commented Jun 14, 2022 via email

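@jiejieling
Author

> On 2.7.3, you can try changing incr_sync.mongo_fetch_method to oplog and see whether the problem still occurs. This is probably caused by the ChangeStream method having no periodic noop heartbeat.

I am testing this now. It has not reproduced so far; I will keep watching.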

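@jiejieling
Author

> On 2.7.3, you can try changing incr_sync.mongo_fetch_method to oplog and see whether the problem still occurs. This is probably caused by the ChangeStream method having no periodic noop heartbeat.

With oplog, it has run for 14 hours so far without reproducing the problem.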

zhangst pushed a commit that referenced this issue Jun 15, 2022
@zhangst
Collaborator

zhangst commented Jun 15, 2022

I have released version 2.7.4. Please give it a try.

@jiejieling
Author

> I have released version 2.7.4. Please give it a try.

OK.

@jiejieling
Author

Version 2.7.4 has been running normally so far. It looks like the fix works.

@jiejieling
Author

After 2 days of testing, the problem has not reproduced.

@zhangst zhangst closed this as completed Jun 16, 2022
@jiejieling
Author

jiejieling commented Oct 11, 2022 via email

@jiejieling
Author

jiejieling commented Oct 11, 2022 via email
